István Szita and András Lőrincz from the Department of Information Systems at Eötvös University in Hungary have taught AI agents to play Ms. Pac-Man. Their paper, “Learning to Play Using Low-Complexity Rule-Based Policies: Illustrations through Ms. Pac-Man”, was published in the Journal of Artificial Intelligence Research. The study showed that AI agents can successfully be taught how to strategize through reinforcement learning.
Szita and Lőrincz chose the game Ms. Pac-Man for their study because it enabled them to test a variety of teaching methods. In the original Pac-Man, released in 1980, players must eat dots, avoid being eaten by ghosts, and score big points by eating flashing ghosts. The player's movements depend heavily on those of the ghosts, whose routes are deterministic, so players can find patterns and predict future movements. In Ms. Pac-Man, the ghosts' routes are randomized, so players can't work out an optimal action sequence in advance. They must constantly watch the ghosts' movements and make decisions based on their observations. In their study, Szita and Lőrincz taught their AI agent to do the same.
The Hungarian researchers used the "cross-entropy method" to drive their AI's learning, and rule-based policies to define how the agent transforms its observations into actions. They gave their Ms. Pac-Man program a selection of possible conditions, such as “ghost nearby”, and possible actions, such as “move away”. The program randomly combined conditions with actions to produce rules, then played games using random combinations of those rules to work out which ones performed best, as sketched below.
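To make the cross-entropy idea concrete, here is a minimal Python sketch of how such a learner could work: it keeps one inclusion probability per candidate rule, samples rule sets, keeps the best-scoring fraction, and nudges the probabilities toward that elite. The rule names, parameters, and the play_game() scorer are illustrative assumptions, not the paper's actual components.

```python
import random

# Hypothetical condition/action pairs; the paper uses a richer,
# Ms. Pac-Man-specific set of observations and actions.
CANDIDATE_RULES = [
    ("ghost_nearby", "move_away"),
    ("edible_ghost_on_board", "chase_ghost"),
    ("power_dot_nearby", "eat_power_dot"),
    ("all_moves_equal", "keep_direction"),
]

def rules_from_mask(mask):
    """Turn a vector of include/exclude flags into a concrete rule set."""
    return [rule for rule, on in zip(CANDIDATE_RULES, mask) if on]

def play_game(rule_set):
    """Stand-in for playing Ms. Pac-Man with the given rules; the real
    system would return the score achieved over one or more games."""
    return random.random()  # placeholder fitness

def cross_entropy_method(n_iters=50, n_samples=100, elite_frac=0.1, alpha=0.6):
    # One inclusion probability per candidate rule, initially uninformative.
    probs = [0.5] * len(CANDIDATE_RULES)
    for _ in range(n_iters):
        # Sample rule sets: each rule is included independently
        # with its current probability.
        samples = [[random.random() < p for p in probs]
                   for _ in range(n_samples)]
        # Score each sampled rule set by playing with it.
        scored = sorted(samples,
                        key=lambda m: play_game(rules_from_mask(m)),
                        reverse=True)
        elite = scored[: max(1, int(elite_frac * n_samples))]
        # Shift each probability toward the rule's frequency among the elite.
        for i, p in enumerate(probs):
            freq = sum(mask[i] for mask in elite) / len(elite)
            probs[i] = (1 - alpha) * p + alpha * freq
    # Read off the most likely rule set under the learned distribution.
    return [rule for rule, p in zip(CANDIDATE_RULES, probs) if p > 0.5]
```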
When the program has to make a decision, it checks its rule list starting with the highest-priority rules, which matters when two rules conflict. The most important rule, it learned, is to avoid being eaten by ghosts. The next says that if there is an edible ghost on the board, the agent should chase it, because eating ghosts yields the most points. The agent also learned that if all moves look equally good, it shouldn't turn back, since the dots in that direction have already been eaten.
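The priority-ordered decision step might look something like the following sketch, where the agent walks its rules from highest to lowest priority and acts on the first one that fires. The observation fields and rule conditions here are hypothetical, loosely modeled on the rules described above.

```python
# A minimal sketch of priority-ordered decision making. Conditions are
# hypothetical predicates over a game-state observation; names are illustrative.
def decide(observation, rule_list, default_action="keep_direction"):
    """Walk the rules from highest to lowest priority and return the
    action of the first rule whose condition holds, so that conflicts
    are resolved by priority."""
    for condition, action in rule_list:
        if condition(observation):
            return action
    return default_action

# Priority order mirrors what the agent learned: survival first,
# then chasing edible ghosts, then not turning back onto eaten dots.
rule_list = [
    (lambda obs: obs["ghost_distance"] < 3, "move_away_from_ghost"),
    (lambda obs: obs["edible_ghost_on_board"], "chase_edible_ghost"),
    (lambda obs: obs["all_moves_equal"], "do_not_turn_back"),
]

action = decide({"ghost_distance": 5,
                 "edible_ghost_on_board": True,
                 "all_moves_equal": False}, rule_list)
# -> "chase_edible_ghost": no ghost is dangerously close, so the second rule fires.
```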
The resulting program narrowly outperformed average human players. However, it failed to discover certain tactics that humans find useful, such as waiting for ghosts to approach before eating a power dot to get the most out of it. In other words, there is still much for this AI agent to learn. Phew!