May – July 2020
I learnt about the concepts and applications of reinforcement learning in the projects for this class.
Tools used: Python, OpenAI Gym
Selected Results
Figure 1: TD(λ): Values of λ < 1 do better than TD(1). In particular, TD(1) continues to perform poorly, as seen in the previous experiment. Among the TD methods, lower values of λ such as 0 and 0.3 do better than higher values such as 0.8.
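To illustrate the kind of comparison behind Figure 1, here is a minimal sketch of TD(λ) with accumulating eligibility traces. It is not the project code. The environment (a five-state bounded random walk), the step size, the episode count, and the function names are all assumptions chosen for illustration.

```python
import random


def td_lambda_random_walk(lam, alpha=0.05, episodes=2000, n_states=5, seed=0):
    """Estimate state values of a bounded random walk with online TD(lambda).

    Assumed setup (hypothetical, not taken from the project): indices 0 and
    n_states + 1 are terminal, the reward is 1 on reaching the right terminal
    and 0 otherwise, and there is no discounting.
    """
    rng = random.Random(seed)
    values = [0.0] * (n_states + 2)  # terminal values stay at 0
    for _ in range(episodes):
        traces = [0.0] * (n_states + 2)
        state = (n_states + 1) // 2  # start in the middle state
        while 0 < state < n_states + 1:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == n_states + 1 else 0.0
            td_error = reward + values[next_state] - values[state]
            traces[state] += 1.0  # accumulating trace for the visited state
            for s in range(1, n_states + 1):
                values[s] += alpha * td_error * traces[s]
                traces[s] *= lam  # decay every trace by lambda
            state = next_state
    return values[1:-1]


def rmse(estimate):
    """Root-mean-square error against the known true values i / (n + 1)."""
    n = len(estimate)
    true_values = [(i + 1) / (n + 1) for i in range(n)]
    return (sum((e - t) ** 2 for e, t in zip(estimate, true_values)) / n) ** 0.5


if __name__ == "__main__":
    for lam in (0.0, 0.3, 0.8, 1.0):
        print(f"lambda={lam}: RMSE={rmse(td_lambda_random_walk(lam)):.4f}")
```

Running the script prints one error figure per λ. The ranking it produces depends on the assumed step size and episode count, so it should not be read as a reproduction of the figure.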
Figure 2: DQN: Hyperparameter tuning for Lunar Lander: A smaller ε-decay value of 0.05 lets the agent balance exploration and exploitation better than the other values in the tested range.
Figure 3: Linear Programming: For the game of Soccer (a two-player, zero-sum, repeated game with imperfect information), the Foe-Q algorithm converges within 5 × 10^5 episodes.