
PAC MAN AI GAME

This project implements a Deep Convolutional Q-Learning (DCQN) agent that learns to play Ms. Pacman.

Part 0 – Installing and Importing:

  • I started by installing the gymnasium package, along with its atari and box2d dependencies, and ale-py for the Atari environments. I also installed swig.
  • Then, I imported essential libraries such as os, random, numpy, torch and its neural network modules (nn, optim, functional), collections.deque for replay memory, and torch.utils.data.

Part 1 – Building the AI:

  • I defined the Network class, which represents the neural network architecture for the Q-function. This network consists of several convolutional layers (with batch normalization and ReLU activations) to process image states, followed by fully connected layers, finally outputting action_size Q-values.
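
The architecture described above can be sketched roughly as follows. This is a minimal illustration, not the exact network from the project: the number of layers, the channel counts, the kernel sizes, and the 128x128 RGB input resolution are all assumptions chosen so that the shapes line up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Network(nn.Module):
    """Convolutional Q-network: image state in, one Q-value per action out."""

    def __init__(self, action_size):
        super().__init__()
        # Layer sizes below are assumptions for illustration only.
        self.conv1 = nn.Conv2d(3, 32, kernel_size=8, stride=4)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)
        self.bn2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1)
        self.bn3 = nn.BatchNorm2d(64)
        self.conv4 = nn.Conv2d(64, 128, kernel_size=3, stride=1)
        self.bn4 = nn.BatchNorm2d(128)
        # With a 128x128 input the spatial size shrinks 31 -> 14 -> 12 -> 10,
        # so the flattened feature vector has 128 * 10 * 10 entries.
        self.fc1 = nn.Linear(128 * 10 * 10, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, action_size)

    def forward(self, state):
        x = F.relu(self.bn1(self.conv1(state)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        x = F.relu(self.bn4(self.conv4(x)))
        x = torch.flatten(x, start_dim=1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # raw Q-values, no activation on the output
```

Passing a batch of shape (batch, 3, 128, 128) returns a tensor of shape (batch, action_size). A different input resolution would require changing the input size of fc1.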

Part 2 – Training the AI:

  • Environment Setup: I initialized the MsPacmanNoFrameskip-v0 environment from Gymnasium, and extracted the state shape, state size, and the number of possible actions.
  • Hyperparameters: I set up key hyperparameters for the training process, including learning_rate, minibatch_size, and discount_factor.
  • Preprocessing: I created a preprocess_frame function using PIL and transforms to resize the incoming game frames and convert them into PyTorch tensors suitable for the neural network.
  • DCQN Agent Implementation: I implemented the Agent class, which encapsulates the core DCQN logic:
    • It initializes a local_qnetwork and a target_qnetwork (both instances of the Network class).
    • It uses an Adam optimizer for updating the local network.
    • A deque is used as a replay memory to store experiences.
    • The step method adds new experiences (state, action, reward, next_state, done) to the memory and triggers the learn method once enough experiences have accumulated.
    • The act method implements the epsilon-greedy policy: it either picks the action with the highest value according to the local Q-network or picks a random action for exploration.
    • The learn method performs a Q-learning update: it samples experiences from memory, computes Q-targets using the target network, calculates the MSE loss between expected and target Q-values, and updates the local Q-network using the optimizer.
  • Training Loop: I ran a training loop for number_episodes episodes.
    • In each episode, the agent interacts with the environment, performing actions based on its current policy (with exploration decaying over time).
    • Experiences are stored and used to train the agent.
    • The score for each episode is tracked, and an average score over the last 100 episodes is printed.
    • Training stops early if the average score reaches a predefined threshold (500.0), and the trained Q-network’s weights are saved.
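
The Agent logic described in the list above might look roughly like this. It is a sketch under stated assumptions: the default hyperparameter values and the memory capacity are placeholders, make_network is a hypothetical factory standing in for the Network class, and states are assumed to be tensors with a leading batch dimension of 1. How and when the target network is refreshed is not stated above, so that part is left out.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class Agent:
    def __init__(self, make_network, action_size, learning_rate=5e-4,
                 minibatch_size=64, discount_factor=0.99, capacity=10_000):
        # All default values here are placeholders, not the project's settings.
        self.action_size = action_size
        self.minibatch_size = minibatch_size
        self.discount_factor = discount_factor
        self.local_qnetwork = make_network()
        self.target_qnetwork = make_network()
        self.target_qnetwork.load_state_dict(self.local_qnetwork.state_dict())
        self.optimizer = optim.Adam(self.local_qnetwork.parameters(),
                                    lr=learning_rate)
        self.memory = deque(maxlen=capacity)  # replay memory

    def step(self, state, action, reward, next_state, done):
        # Store the experience, then learn once enough have accumulated.
        self.memory.append((state, action, reward, next_state, done))
        if len(self.memory) > self.minibatch_size:
            self.learn(random.sample(self.memory, k=self.minibatch_size))

    def act(self, state, epsilon=0.0):
        # Epsilon-greedy: explore with probability epsilon, else exploit.
        if random.random() < epsilon:
            return random.randrange(self.action_size)
        self.local_qnetwork.eval()
        with torch.no_grad():
            action_values = self.local_qnetwork(state)
        self.local_qnetwork.train()
        return int(action_values.argmax(dim=1).item())

    def learn(self, experiences):
        states, actions, rewards, next_states, dones = zip(*experiences)
        states = torch.cat(states)
        next_states = torch.cat(next_states)
        actions = torch.tensor(actions, dtype=torch.long).unsqueeze(1)
        rewards = torch.tensor(rewards, dtype=torch.float32).unsqueeze(1)
        dones = torch.tensor(dones, dtype=torch.float32).unsqueeze(1)
        # Q-target: r + gamma * max_a' Q_target(s', a'), zeroed at episode end.
        with torch.no_grad():
            next_q = self.target_qnetwork(next_states).max(dim=1, keepdim=True)[0]
        q_targets = rewards + self.discount_factor * next_q * (1.0 - dones)
        q_expected = self.local_qnetwork(states).gather(1, actions)
        loss = F.mse_loss(q_expected, q_targets)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.item()
```

Computing the targets inside torch.no_grad() keeps gradients from flowing into the target network, so only the local network is updated by the optimizer.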
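
The control flow of the training loop can be sketched as below. Only number_episodes, the 100-episode average, and the 500.0 threshold come from the description above; the epsilon schedule values, max_steps, and the train function itself are illustrative assumptions. The env argument is expected to follow the Gymnasium reset/step interface.

```python
from collections import deque


def train(env, agent, preprocess, number_episodes=2000, max_steps=10_000,
          epsilon_start=1.0, epsilon_end=0.01, epsilon_decay=0.995,
          target_score=500.0):
    """Run episodes until the 100-episode average reaches target_score."""
    epsilon = epsilon_start
    recent_scores = deque(maxlen=100)  # scores of the last 100 episodes
    average = 0.0
    for episode in range(1, number_episodes + 1):
        observation, _ = env.reset()
        state = preprocess(observation)
        score = 0.0
        for _ in range(max_steps):
            action = agent.act(state, epsilon)
            observation, reward, terminated, truncated, _ = env.step(action)
            next_state = preprocess(observation)
            done = terminated or truncated
            agent.step(state, action, reward, next_state, done)
            state = next_state
            score += reward
            if done:
                break
        recent_scores.append(score)
        epsilon = max(epsilon_end, epsilon_decay * epsilon)  # decay exploration
        average = sum(recent_scores) / len(recent_scores)
        print(f"Episode {episode}\tAverage score: {average:.2f}")
        if average >= target_score:
            # This is where the trained weights would be saved, for example
            # with torch.save(agent.local_qnetwork.state_dict(), <path>).
            return episode, average
    return number_episodes, average
```

Because the average is taken over a deque with maxlen=100, early episodes are averaged over however many scores exist so far, which means a single lucky first episode could trigger the early stop.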

Part 3 – Visualizing the Results:

  • I provided functions to visualize the trained agent’s performance:
    • show_video_of_model records a video of the agent playing in the environment.
    • show_video displays the generated video directly in the notebook using display.HTML.
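
The display step can be sketched as follows, assuming the recording is stored as an MP4 file. The helper video_html, the default file name, and the HTML attributes are hypothetical; only the use of display.HTML comes from the description above.

```python
import base64


def video_html(mp4_bytes):
    """Build an HTML <video> tag with the MP4 data embedded as base64."""
    encoded = base64.b64encode(mp4_bytes).decode("ascii")
    return (
        '<video controls autoplay loop style="height: 400px;">'
        f'<source src="data:video/mp4;base64,{encoded}" type="video/mp4" />'
        "</video>"
    )


def show_video(path="video.mp4"):
    """Render the recorded video inline; intended for a notebook session."""
    from IPython import display  # imported here so the helper above works anywhere
    with open(path, "rb") as video_file:
        display.display(display.HTML(video_html(video_file.read())))
```

Embedding the bytes directly in the page avoids depending on the notebook server being able to serve the file from disk.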

In essence, I have built, trained, and evaluated a deep reinforcement learning agent capable of playing Ms. Pacman using a Deep Convolutional Q-Network.
