Policy Gradient and Q-learning Mouse Agents

I trained two mouse agents, one using a policy gradient method and one using Q-learning, to learn policies that lead them to cookies and cheese while avoiding salads.

The policy gradient agent is trained with the REINFORCE algorithm, and the Q-learning agent uses tabular Q-learning. I wrote the tabular Q-learning agent in Kotlin and the REINFORCE agent in Python.
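
As a rough illustration of the two methods named above, here is a minimal Python sketch of a single tabular Q-learning update and of the discounted returns that REINFORCE uses to weight its log-probability gradients. This is not the code from this project: the function names, the hyperparameters alpha and gamma, and the state and action labels are all hypothetical, and the Q-learning step is shown in Python rather than Kotlin only to keep the example in one language.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning step and return the new Q(s, a).

    q is a dict keyed by (state, action); missing entries default to 0.0.
    alpha (learning rate) and gamma (discount) are assumed values.
    """
    # Bootstrapped value of the best next action; zero at episode end.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def discounted_returns(rewards, gamma=0.9):
    """Compute the return G_t = r_t + gamma * G_{t+1} for every step.

    REINFORCE scales the gradient of log pi(a_t | s_t) by G_t.
    """
    returns = []
    g = 0.0
    # Walk the episode backwards so each G_t reuses G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


if __name__ == "__main__":
    # Hypothetical two-action gridworld step: the mouse moves right and
    # receives a reward of 1.0 (for example, for reaching a cookie).
    q = defaultdict(float)
    q_learning_update(q, "s0", "right", 1.0, "s1", ["left", "right"])
    print(dict(q))

    # Hypothetical three-step episode with a reward only at the end.
    print(discounted_returns([0.0, 0.0, 1.0]))
```

The Q-learning step updates a single table entry from one transition, whereas REINFORCE waits for a full episode and then uses the returns to adjust the policy parameters. The policy network and the gradient step itself are left out of this sketch.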