
Lunar Landing Using an AI Agent

Training a Deep Q-Learning Agent for Lunar Lander

This write-up details the development and training of a Deep Q-Learning agent, built around a Deep Q-Network (DQN), that solves the Lunar Lander environment from Gymnasium. The objective was to train an agent to land a spacecraft safely on a designated landing pad.

 

  1. Environment Setup and Library Imports

We began by installing the necessary gymnasium packages, including support for Box2D environments, which Lunar Lander is based on. Key libraries such as torch for neural network operations, numpy for numerical computations, and random for exploration strategies were imported. We ensured the environment LunarLander-v3 was correctly initialized, extracting the state_size (8 observations) and number_actions (4 discrete actions) required for the agent’s neural network.

 

  2. Building the Deep Q-Network (DQN) Architecture

A Network class was defined by subclassing torch.nn.Module to construct the neural network. The network consists of three fully connected layers, with ReLU activations applied after the first two. The input layer’s size corresponds to the environment’s state_size, and the output layer’s size matches number_actions, so the network predicts one Q-value for each possible action.
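The architecture described above might look roughly like the sketch below. The hidden width of 64 and the fixed seed are assumptions made for illustration; the write-up does not state the layer sizes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Network(nn.Module):
    """Maps a state vector to one Q-value per action."""

    def __init__(self, state_size: int, number_actions: int,
                 hidden_size: int = 64, seed: int = 42):
        super().__init__()
        torch.manual_seed(seed)  # assumed: reproducible weight initialisation
        self.fc1 = nn.Linear(state_size, hidden_size)
        self.fc2 = nn.Linear(hidden_size, hidden_size)
        self.fc3 = nn.Linear(hidden_size, number_actions)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # raw Q-values, no activation on the output


# One 8-dimensional state in, four Q-values out.
q_values = Network(state_size=8, number_actions=4)(torch.zeros(1, 8))
print(q_values.shape)  # torch.Size([1, 4])
```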

 

  3. Hyperparameter Initialization

Crucial hyperparameters for the DQN training were set:

learning_rate: 5e-4 for the Adam optimizer.

minibatch_size: 100 for sampling experiences during learning.

discount_factor: 0.99 to determine the importance of future rewards.

replay_buffer_size: 10000 to store past experiences for replay.

interpolation_parameter: 1e-3 for the soft update of the target network.
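Written as Python constants, the values listed above would look like this. The last lines are only an illustration of what a discount factor of 0.99 implies: a reward 10 steps ahead still counts for about 90% of its value, while one 100 steps ahead counts for a little over a third.

```python
learning_rate = 5e-4            # Adam step size
minibatch_size = 100            # experiences sampled per learning step
discount_factor = 0.99          # weight applied to future rewards
replay_buffer_size = 10_000     # maximum number of stored experiences
interpolation_parameter = 1e-3  # soft-update rate for the target network

# Illustration: how much a delayed reward is worth after discounting.
weight_after_10_steps = discount_factor ** 10    # roughly 0.90
weight_after_100_steps = discount_factor ** 100  # roughly 0.37
print(weight_after_10_steps, weight_after_100_steps)
```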

 

  4. Implementing Experience Replay

A ReplayMemory class was implemented to store the agent’s experiences as (state, action, reward, next_state, done) tuples. This replay buffer lets the agent learn from a diverse set of past interactions, which breaks the correlations between consecutive observations and improves learning stability. A push method adds new experiences, and a sample method retrieves random minibatches for training.
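A replay buffer of this kind can be sketched with the standard library alone. This version uses a bounded deque so the oldest experiences are evicted automatically; that choice, and the decision to return plain tuples rather than tensors, are assumptions for illustration and not necessarily how the original class is written.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity: int):
        # A bounded deque drops the oldest experience once capacity is reached.
        self.memory = deque(maxlen=capacity)

    def push(self, experience):
        self.memory.append(experience)

    def sample(self, batch_size: int):
        # Uniform sampling without replacement, then regroup by field.
        batch = random.sample(self.memory, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.memory)


memory = ReplayMemory(capacity=3)
for t in range(5):
    memory.push(([float(t)] * 8, 0, 1.0, [float(t + 1)] * 8, False))
print(len(memory))  # 3: the two oldest experiences were evicted

states, actions, rewards, next_states, dones = memory.sample(batch_size=2)
```

In a training loop, the sampled fields would typically be stacked into tensors before being passed to the network.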

 

  5. Developing the DQN Agent

The Agent class encapsulates the core DQN logic:

__init__: Initializes two Network instances – a local_qnetwork for action selection and learning, and a target_qnetwork for calculating target Q-values, ensuring training stability. An Adam optimizer is set up, and the ReplayMemory is instantiated.

step: Stores new experiences in the replay buffer and triggers the learn function periodically.

act: Implements an ε-greedy policy, where the agent either exploits its learned knowledge (choosing the action with the highest predicted Q-value) or explores new actions randomly with a probability determined by epsilon.

learn: Performs the Q-learning update. It calculates target Q-values using the target network and current rewards, then computes the mean squared error loss between the expected Q-values (from the local network) and the target Q-values. The optimizer minimizes this loss, and a soft_update mechanism is used to gradually update the target network parameters towards the local network’s parameters.
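The act, learn and soft_update steps described above can be sketched as standalone functions. This is an illustrative version under stated assumptions, not the original Agent class: make_network is a hypothetical stand-in for the Network class, the hidden width of 64 is assumed, and the batch is expected as tensors with actions, rewards and dones shaped (batch, 1).

```python
import random

import torch
import torch.nn as nn
import torch.nn.functional as F


def make_network(state_size=8, number_actions=4, hidden_size=64):
    # Hypothetical stand-in for the Network class; hidden size is assumed.
    return nn.Sequential(
        nn.Linear(state_size, hidden_size), nn.ReLU(),
        nn.Linear(hidden_size, hidden_size), nn.ReLU(),
        nn.Linear(hidden_size, number_actions),
    )


def act(local_qnetwork, state, epsilon, number_actions=4):
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if random.random() < epsilon:
        return random.randrange(number_actions)
    with torch.no_grad():
        q_values = local_qnetwork(state.unsqueeze(0))
    return int(q_values.argmax(dim=1).item())


def soft_update(local_qnetwork, target_qnetwork, interpolation_parameter):
    """target <- tau * local + (1 - tau) * target, parameter by parameter."""
    tau = interpolation_parameter
    with torch.no_grad():
        for target_p, local_p in zip(target_qnetwork.parameters(),
                                     local_qnetwork.parameters()):
            target_p.copy_(tau * local_p + (1.0 - tau) * target_p)


def learn(local_qnetwork, target_qnetwork, optimizer, batch,
          discount_factor=0.99, interpolation_parameter=1e-3):
    states, actions, rewards, next_states, dones = batch
    # Bootstrapped targets come from the target network; no gradient flows here.
    with torch.no_grad():
        next_q = target_qnetwork(next_states).max(dim=1, keepdim=True).values
        q_targets = rewards + discount_factor * next_q * (1.0 - dones)
    # Q-values the local network assigns to the actions actually taken.
    q_expected = local_qnetwork(states).gather(1, actions)
    loss = F.mse_loss(q_expected, q_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    soft_update(local_qnetwork, target_qnetwork, interpolation_parameter)
    return loss.item()


local_qnetwork, target_qnetwork = make_network(), make_network()
optimizer = torch.optim.Adam(local_qnetwork.parameters(), lr=5e-4)
batch = (
    torch.randn(100, 8),            # states
    torch.randint(0, 4, (100, 1)),  # actions (int64 indices)
    torch.randn(100, 1),            # rewards
    torch.randn(100, 8),            # next_states
    torch.zeros(100, 1),            # dones (0.0 = episode continues)
)
loss = learn(local_qnetwork, target_qnetwork, optimizer, batch)
action = act(local_qnetwork, torch.zeros(8), epsilon=0.0)
```

Multiplying the bootstrapped term by (1 - dones) removes the future-value estimate for terminal transitions, so the target for a final step is just its reward.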

The agent was trained up to episode 100 to produce this initial result. As you can see, the landing is still not accurate, but a lot of progress has been made:

  6. Training the Agent

The agent was trained for up to 2000 episodes, each capped at 1000 timesteps. An ε-greedy strategy was used, with epsilon decaying from 1.0 to 0.01 over the course of training to balance exploration and exploitation. Performance was monitored by tracking the average score over the last 100 episodes, and training stopped once that average reached 200.0 or more, indicating that the environment was successfully solved. The best-performing model’s weights were saved to ‘checkpoint.pth’.
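The control flow of such a training loop can be sketched independently of the agent. In the version below, run_episode is a hypothetical callable that plays one episode with the given epsilon and returns its total score. The multiplicative decay rate of 0.995 and the requirement of a full 100-episode window before declaring success are assumptions; the write-up only gives the start and end values of epsilon.

```python
from collections import deque


def train(run_episode, number_episodes=2000, epsilon_start=1.0,
          epsilon_end=0.01, epsilon_decay=0.995,
          solved_score=200.0, window=100):
    """Run episodes until the rolling average score reaches solved_score.

    Returns (episode_number, average) on success, or (None, last_average).
    """
    recent_scores = deque(maxlen=window)  # scores of the last `window` episodes
    epsilon = epsilon_start
    average = float("-inf")
    for episode in range(1, number_episodes + 1):
        recent_scores.append(run_episode(epsilon))
        # Assumed schedule: multiplicative decay, floored at epsilon_end.
        epsilon = max(epsilon_end, epsilon_decay * epsilon)
        average = sum(recent_scores) / len(recent_scores)
        if len(recent_scores) == window and average >= solved_score:
            # This is the point at which the model weights would be saved,
            # e.g. to 'checkpoint.pth'.
            return episode, average
    return None, average


# Stand-in scores that rise by 3 per episode: 0, 3, 6, ...
fake_scores = iter(range(0, 6000, 3))
solved_at, average = train(lambda epsilon: next(fake_scores))
print(solved_at, average)  # 118 202.5
```

With the assumed decay rate, epsilon falls from 1.0 to its floor of 0.01 after roughly 900 episodes.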

 

Outcome: The agent successfully solved the Lunar Lander environment within 1000 episodes, achieving an average score of 212.48.

 

  7. Visualizing the Results

After training, a show_video_of_model function was used to render the trained agent’s performance in the LunarLander-v3 environment and save it as an MP4 video. This video was then embedded and displayed in the notebook using IPython.display.HTML, visually confirming the agent’s ability to navigate and land the spacecraft.
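The frame-capture part of such a function might look like the sketch below. collect_frames is a hypothetical helper, not the original show_video_of_model. It assumes an environment that follows the Gymnasium API and was created with render_mode="rgb_array", plus a policy callable that maps an observation to an action.

```python
def collect_frames(env, policy, max_steps=1000):
    """Roll out one episode and return the rendered frames in order."""
    frames = []
    state, _ = env.reset()
    for _ in range(max_steps):
        frames.append(env.render())  # RGB array when render_mode="rgb_array"
        state, reward, terminated, truncated, _ = env.step(policy(state))
        if terminated or truncated:
            break
    env.close()
    return frames
```

With a real environment, this would be called as collect_frames(gym.make("LunarLander-v3", render_mode="rgb_array"), policy), where policy wraps the trained agent's greedy action selection. The returned frames can then be written to an MP4 with a video library such as imageio, assuming its ffmpeg plugin is installed.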
