Machine Learning

January – April 2020

I learnt about the concepts and applications of machine learning in the projects for this class.
Specifically, I learnt how to:
- tune parameters for supervised learning algorithms,
- apply clustering and dimensionality reduction techniques to improve performance for a chosen metric,
- apply randomized optimization techniques to solve NP-hard problems, and
- apply fundamental Reinforcement Learning algorithms to solve Markov Decision Process (MDP) problems.
Tools used: Python, sklearn, OpenAIGym, MDPToolbox, MLROSE
Algorithms used
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning

Selected Results

Figure 1: Comparison of the area under the precision-recall curve for supervised learning algorithms. This metric works well for the wines dataset, which is imbalanced. SVM and Neural Networks both perform well in this case. However, Neural Networks take longer to train than SVM. Consequently, SVM is the best choice among the algorithms available.
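
To make the metric in Figure 1 concrete, here is a minimal sketch of the step-wise area under the precision-recall curve (often called average precision), written in plain Python. The toy labels and scores are hypothetical and are not taken from the wines dataset. The sketch assumes there are no tied scores.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (average precision).

    y_true: list of 0/1 labels; scores: classifier scores, higher = more positive.
    Assumes no tied scores (ties would make the ranking ambiguous).
    """
    total_positives = sum(y_true)
    if total_positives == 0:
        raise ValueError("need at least one positive example")

    # Rank examples from most to least confident.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at this rank, weighted by the recall step 1 / total_positives.
            ap += (true_positives / rank) / total_positives
    return ap


# Hypothetical imbalanced toy data: 2 positives among 10 examples.
labels = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
good_scores = [0.15, 0.2, 0.9, 0.3, 0.1, 0.25, 0.4, 0.8, 0.35, 0.05]
poor_scores = [0.9, 0.8, 0.1, 0.7, 0.6, 0.5, 0.4, 0.2, 0.3, 0.35]

ap_good = average_precision(labels, good_scores)
ap_poor = average_precision(labels, poor_scores)
print(f"good ranking: {ap_good:.3f}, poor ranking: {ap_poor:.3f}")
```

A classifier that labels every example as negative would still reach 80% accuracy on this toy data, which is why a ranking-based metric is more informative when classes are imbalanced.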

Figure 2: Comparison of randomized optimization algorithms for the Max K Color problem. Genetic Algorithm converges to a high value of fitness while other algorithms take longer. MIMIC, notably, converges quickly but to a much lower value of fitness.
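
As an illustration of the kind of search compared in Figure 2, the following is a minimal sketch of random-restart hill climbing on a small graph coloring instance. It is written in plain Python and does not use MLROSE. The graph, the helper names, and the fitness definition (number of edges whose endpoints have different colors, to be maximized) are assumptions made for this example.

```python
import random


def fitness(coloring, edges):
    """Number of edges whose two endpoints received different colors."""
    return sum(1 for u, v in edges if coloring[u] != coloring[v])


def randomized_hill_climb(n_nodes, edges, k, max_attempts=200, restarts=10, seed=0):
    """Random-restart hill climbing: recolor one random node at a time and
    keep the move whenever the fitness does not get worse."""
    rng = random.Random(seed)
    best_state, best_fit = None, -1
    for _ in range(restarts):
        state = [rng.randrange(k) for _ in range(n_nodes)]
        state_fit = fitness(state, edges)
        attempts = 0
        while attempts < max_attempts and state_fit < len(edges):
            node = rng.randrange(n_nodes)
            old_color = state[node]
            state[node] = rng.randrange(k)
            new_fit = fitness(state, edges)
            if new_fit > state_fit:
                state_fit, attempts = new_fit, 0   # improvement: reset the counter
            elif new_fit == state_fit:
                attempts += 1                      # sideways move: keep it
            else:
                state[node] = old_color            # worse: undo the move
                attempts += 1
        if state_fit > best_fit:
            best_state, best_fit = list(state), state_fit
    return best_state, best_fit


# Hypothetical 6-node graph that admits a proper 3-coloring.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
best_state, best_fit = randomized_hill_climb(n_nodes=6, edges=edges, k=3)
print(best_state, best_fit, "of", len(edges), "edges properly colored")
```

Genetic algorithms and MIMIC search the same space with populations of candidate colorings rather than a single state, which is one reason their convergence behavior differs.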

Figure 3: Learning curves comparing dimensionality reduction algorithms on the diabetes dataset. PCA has the highest accuracy (74%) among the algorithms compared. A likely explanation is that PCA reduces the number of dimensions to a manageable size by combining the original features, rather than, for example, removing selected features. Consequently, the accuracy improves.
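
The setup behind Figure 3 can be sketched roughly as follows with sklearn. This is only an illustration under stated assumptions: the data is synthetic (generated with `make_classification`, not the diabetes dataset), logistic regression stands in for whichever classifier was actually used, and the number of components is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 8 features, only some of which are informative.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, n_redundant=2, random_state=0
)

# Scale first (PCA is sensitive to feature scale), project onto 4 principal
# components, then fit a classifier on the reduced representation.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=4),
    LogisticRegression(max_iter=1000),
)

# 5-fold cross-validated accuracy of the whole pipeline.
scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()
print(f"mean accuracy with PCA: {mean_accuracy:.3f}")
```

Putting PCA inside the pipeline means the projection is refit on each training fold, so the held-out fold does not leak into the dimensionality reduction step.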

Figure 4: For 16 states, ε = 0.5 (i.e. an equal amount of exploration and exploitation) results in a higher reward (80,000). Notably, pure exploration (ε = 1) is not preferred because it results in a relatively smaller reward (40,000).
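
The role of ε in Figure 4 can be illustrated with a minimal sketch of tabular Q-learning with ε-greedy action selection. The environment here is a hypothetical deterministic 4x4 grid (16 states) with a reward of 1 at the goal. It is not the environment used for the figure, and the hyperparameters are assumptions, so the reward values will not match those reported above.

```python
import random

N = 4                                           # grid side, giving 16 states
GOAL = N * N - 1                                # bottom-right corner
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right


def step(state, action):
    """Deterministic move; bumping into a wall leaves the state unchanged."""
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    row = min(max(row + d_row, 0), N - 1)
    col = min(max(col + d_col, 0), N - 1)
    next_state = row * N + col
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def q_learning(epsilon, episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    """Tabular Q-learning; returns the Q-table and the total reward collected."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N * N)]
    total_reward = 0.0
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))          # explore
            else:
                best = max(q[state])                          # exploit, random tie-break
                action = rng.choice([a for a, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next action unless terminal.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            total_reward += reward
            state = next_state
            if done:
                break
    return q, total_reward


q_table, total = q_learning(epsilon=0.5)
print(f"epsilon=0.5: reached the goal in {int(total)} of 500 episodes")
```

With ε = 1 the agent still learns Q-values, but it never acts on them while training, so the reward it collects depends entirely on random movement.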