This project implements a Deep Convolutional Q-Learning (DCQN) agent to play Ms. Pacman.
Part 0 – Installing and Importing:
- I started by installing the gymnasium package, along with its atari and box2d dependencies, and ale-py for the Atari environments. I also installed swig, which the box2d build requires.
- Then, I imported the essential libraries: os, random, numpy, torch and its neural network modules (nn, optim, functional), collections.deque for the replay memory, and torch.utils.data.
Part 1 – Building the AI:
- I defined the Network class, which represents the neural network architecture for the Q-function. The network consists of several convolutional layers (each with batch normalization and a ReLU activation) that process image states, followed by fully connected layers; the final layer outputs action_size Q-values, one per action.
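The architecture described above can be sketched roughly as follows. This is a minimal illustration, not the exact network: the number of layers, the channel counts, the kernel sizes, the 128x128 input resolution, and the seed argument are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Network(nn.Module):
    """Convolutional Q-network: image batch in, one Q-value per action out."""

    def __init__(self, action_size, seed=42):
        super().__init__()
        torch.manual_seed(seed)
        # Assumed layer sizes, chosen for 3 x 128 x 128 inputs.
        self.conv1 = nn.Conv2d(3, 32, kernel_size=8, stride=4)   # -> 32 x 31 x 31
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)  # -> 64 x 14 x 14
        self.bn2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1)  # -> 64 x 12 x 12
        self.bn3 = nn.BatchNorm2d(64)
        self.fc1 = nn.Linear(64 * 12 * 12, 512)
        self.fc2 = nn.Linear(512, action_size)

    def forward(self, state):
        x = F.relu(self.bn1(self.conv1(state)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        x = x.flatten(start_dim=1)  # keep the batch dimension
        x = F.relu(self.fc1(x))
        return self.fc2(x)          # raw Q-values, no activation
```

Passing a batch of shape (N, 3, 128, 128) through this sketch yields a tensor of shape (N, action_size).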
Part 2 – Training the AI:
- Environment Setup: I initialized the MsPacmanNoFrameskip-v0 environment from Gymnasium and extracted the state shape, state size, and number of possible actions.
- Hyperparameters: I set up the key hyperparameters for training, including learning_rate, minibatch_size, and discount_factor.
- Preprocessing: I created a preprocess_frame function that uses PIL and transforms to resize the incoming game frames and convert them into PyTorch tensors suitable for the neural network.
- DCQN Agent Implementation: I implemented the Agent class, which encapsulates the core DCQN logic:
  - It initializes a local_qnetwork and a target_qnetwork (both instances of the Network class).
  - It uses an Adam optimizer to update the local network.
  - A deque serves as the replay memory that stores experiences.
  - The step method adds each new experience (state, action, reward, next_state, done) to the memory and triggers the learn method once enough experiences have accumulated.
  - The act method implements the epsilon-greedy policy: it either uses the local Q-network to choose the greedy action or picks a random action for exploration.
  - The learn method performs a Q-learning update: it samples experiences from memory, computes Q-targets using the target network, calculates the MSE loss between expected and target Q-values, and updates the local Q-network using the optimizer.
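The agent logic listed above can be sketched as follows. This is a simplified illustration under stated assumptions: make_network is a hypothetical factory that returns a fresh Q-network, the default hyperparameter values and the capacity argument are placeholders, and states are assumed to be tensors that already carry a batch dimension.

```python
import random
from collections import deque

import torch
import torch.nn.functional as F
import torch.optim as optim


class Agent:
    def __init__(self, make_network, action_size, learning_rate=5e-4,
                 minibatch_size=64, discount_factor=0.99, capacity=10_000):
        self.action_size = action_size
        self.minibatch_size = minibatch_size
        self.discount_factor = discount_factor
        self.local_qnetwork = make_network()
        self.target_qnetwork = make_network()
        # Start both networks from the same weights.
        self.target_qnetwork.load_state_dict(self.local_qnetwork.state_dict())
        self.optimizer = optim.Adam(self.local_qnetwork.parameters(), lr=learning_rate)
        self.memory = deque(maxlen=capacity)  # replay memory

    def step(self, state, action, reward, next_state, done):
        """Store one experience and learn once enough have accumulated."""
        self.memory.append((state, action, reward, next_state, done))
        if len(self.memory) > self.minibatch_size:
            self.learn(random.sample(self.memory, k=self.minibatch_size))

    def act(self, state, epsilon=0.0):
        """Epsilon-greedy: greedy action with probability 1 - epsilon."""
        if random.random() > epsilon:
            self.local_qnetwork.eval()
            with torch.no_grad():
                q_values = self.local_qnetwork(state)
            self.local_qnetwork.train()
            return int(q_values.argmax(dim=1).item())
        return random.randrange(self.action_size)

    def learn(self, experiences):
        """One Q-learning update on a minibatch of experiences."""
        states, actions, rewards, next_states, dones = zip(*experiences)
        states = torch.cat(states)
        next_states = torch.cat(next_states)
        actions = torch.tensor(actions, dtype=torch.long).unsqueeze(1)
        rewards = torch.tensor(rewards, dtype=torch.float32).unsqueeze(1)
        dones = torch.tensor(dones, dtype=torch.float32).unsqueeze(1)
        with torch.no_grad():  # targets come from the target network
            next_q = self.target_qnetwork(next_states).max(dim=1, keepdim=True)[0]
        q_targets = rewards + self.discount_factor * next_q * (1 - dones)
        q_expected = self.local_qnetwork(states).gather(1, actions)
        loss = F.mse_loss(q_expected, q_targets)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.item()
```

How and when the target network is refreshed is not stated above, so this sketch leaves that step out.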
- Training Loop: I ran a training loop for number_episodes episodes.
  - In each episode, the agent interacts with the environment, choosing actions according to its current policy (with exploration decaying over time).
  - Experiences are stored and used to train the agent.
  - The score for each episode is tracked, and the average score over the last 100 episodes is printed.
  - Training stops early if that average reaches a predefined threshold (500.0), and the trained Q-network’s weights are saved.
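The control flow of such a loop can be sketched without the environment itself. In this illustration, run_episode is a hypothetical callable that plays one episode at the given exploration rate and returns its score, and the epsilon schedule values are assumptions; only the 100-episode window and the 500.0 threshold come from the description above.

```python
from collections import deque


def train(run_episode, number_episodes=2000, epsilon_start=1.0,
          epsilon_end=0.01, epsilon_decay=0.995, solved_score=500.0):
    """Run episodes until the 100-episode average reaches solved_score.

    Returns (episodes_run, final_average).
    """
    scores_window = deque(maxlen=100)  # scores of the last 100 episodes
    epsilon = epsilon_start
    average = 0.0
    for episode in range(1, number_episodes + 1):
        score = run_episode(epsilon)
        scores_window.append(score)
        # Decay exploration, but never below epsilon_end.
        epsilon = max(epsilon_end, epsilon_decay * epsilon)
        average = sum(scores_window) / len(scores_window)
        if episode % 100 == 0:
            print(f"Episode {episode}\tAverage Score: {average:.2f}")
        if average >= solved_score:
            print(f"Solved in {episode} episodes\tAverage Score: {average:.2f}")
            # The caller would save the local Q-network's weights here,
            # e.g. with torch.save(network.state_dict(), path).
            return episode, average
    return number_episodes, average
```

Keeping the scores in a deque with maxlen=100 means old episodes drop out automatically, so the average always covers at most the last 100 episodes.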
Part 3 – Visualizing the Results:
- I provided functions to visualize the trained agent’s performance:
  - show_video_of_model records a video of the agent playing in the environment.
  - show_video displays the generated video directly in the notebook using display.HTML.
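The display step can be sketched as below. This is one common way to embed a recorded MP4 in a notebook, not necessarily how show_video is written; the helper name video_html and the 400px height are assumptions for illustration.

```python
import base64
from pathlib import Path


def video_html(path):
    """Return an HTML <video> tag with the MP4 at `path` embedded as base64."""
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return (
        '<video autoplay loop controls style="height: 400px;">'
        f'<source src="data:video/mp4;base64,{encoded}" type="video/mp4" />'
        "</video>"
    )
```

In a notebook, the returned string can be wrapped in IPython's display.HTML and passed to display.display to render the video inline.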
In essence, I have built, trained, and evaluated a deep reinforcement learning agent capable of playing Ms. Pacman using a Deep Convolutional Q-Network.

