100 Days Of Machine Learning Code

Jul 6, 2018

Here I pledge to dedicate 1 hour a day to coding or learning Machine Learning for the next 100 days!

Day 0: July 06, 2018

Today’s Progress

Learned about word embeddings in the Sequence Models course on Coursera, covering Word2Vec, negative sampling, GloVe word vectors, and sentiment classification.
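
As a note to self, the negative-sampling objective, as I understand it from the course, for a center word $$c$$ and an observed context word $$o$$ with $$k$$ sampled negative words is $$\log \sigma(u_o^\top v_c) + \sum_{i=1}^{k} \log \sigma(-u_{w_i}^\top v_c)$$, where $$\sigma$$ is the sigmoid, $$v_c$$ and $$u_o$$ are the center and context embeddings, and the negatives $$w_i$$ are drawn from a noise distribution. Instead of a full softmax over the vocabulary, each update only involves $$k+1$$ binary classifiers, which is what makes the training so much cheaper.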

Thoughts

It is interesting to see how word embedding algorithms have been simplified over time while still achieving very good results. In this case, being simpler even yields better performance.

Link to work: None

Day 1: July 07, 2018

Today’s Progress

  • Learned the basics and intuition behind Singular Value Decomposition (SVD). Here are two very good videos explaining it:
  • Continued with Sequence Models course on Coursera.
    • Learned about debiasing word embeddings, i.e. how to remove bias from biased word embeddings. For example, “doctor” should not be closer to “man” than to “woman”, and “mother” and “father” should be equally similar along the “gender” axis (a small sketch follows this list).
    • Programming assignments
      • Debiasing word embeddings.
      • Emojify (word embedding + LSTM) - outputs a suitable emoji given a sentence.
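
As a reminder of the debiasing idea, here is a minimal sketch with toy vectors (not my assignment code; the vectors are made up for illustration):

    import numpy as np

    # toy vectors for illustration; real ones would come from GloVe/Word2Vec
    g = np.array([1.0, 0.0])         # "gender" direction, e.g. e_woman - e_man
    e_doctor = np.array([0.3, 0.8])  # has an unwanted component along g

    def neutralize(e, g):
        """Remove the component of e along the bias direction g (a projection)."""
        return e - (np.dot(e, g) / np.dot(g, g)) * g

    print(neutralize(e_doctor, g))   # [0. 0.8] -> no gender component left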

Thoughts

  • SVD is so simple yet powerful. I should try to use it to solve some real problems (a minimal usage sketch follows this list).
  • I am still not familiar with Keras syntax.
  • It is quite impressive to see how well the “Emojify” works even with a relatively small data set.
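
Since I want to remember how simple SVD is to use, here is a minimal low-rank approximation sketch with NumPy (the matrix is a random toy example):

    import numpy as np

    A = np.random.rand(6, 4)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    k = 2  # best rank-2 approximation in the Frobenius norm (Eckart-Young)
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
    print(np.linalg.norm(A - A_k))  # sqrt of the sum of squared dropped singular values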

Link to work:

Due to the honor code on Coursera, I am not allowed to share my code from the course publicly.

Day 2: July 08, 2018

Today’s Progress

Thoughts

  • Beam search seems like an engineering approach for large search spaces where exact algorithms like BFS and DFS would be too time-consuming. Is there really no better way than beam search? (A minimal sketch of the idea is below.)
  • From the error analysis of beam search, I learned how to prioritize error sources, in this case deciding whether the beam search or the RNN itself is at fault. Similarly, when training a neural network, one has to determine whether it is overfitting or underfitting in order to locate the most important error source.
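
To make the idea concrete for myself, here is a minimal beam-search sketch; step_log_probs_fn is a hypothetical function returning log-probabilities over the vocabulary for the next token:

    import numpy as np

    def beam_search(step_log_probs_fn, beam_width, max_len, bos, eos):
        beams = [(0.0, [bos])]  # (cumulative log-probability, token sequence)
        for _ in range(max_len):
            candidates = []
            for log_p, seq in beams:
                if seq[-1] == eos:           # finished sequences are kept as-is
                    candidates.append((log_p, seq))
                    continue
                log_probs = step_log_probs_fn(seq)
                # only the top extensions of each beam can survive pruning
                for tok in np.argsort(log_probs)[-beam_width:]:
                    candidates.append((log_p + log_probs[tok], seq + [int(tok)]))
            # keep the beam_width best partial sequences overall
            beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        return max(beams, key=lambda c: c[0])[1]

In practice the course also normalizes the score by sequence length, since summing log-probabilities otherwise favors short outputs.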

Link to work: None

Day 3: July 09, 2018

Today’s Progress

  • Continued with and finished Sequence Models course on Coursera.
    • Videos
    • Programming assignments
      • Neural Machine Translation with Attention: in this exercise, I built a Neural Machine Translation (NMT) model that translates human-readable dates (“25th of June, 2009”) into machine-readable dates (“2009-06-25”). This is achieved using an attention model.
      • Trigger word detection: here I practiced converting audio clips into spectrograms and built a sequence model consisting of a 1-D convolution layer and GRUs (a rough sketch of such an architecture follows this list).
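
For my own reference, a rough Keras-style sketch of a 1-D convolution + GRU sequence labeler of this kind. This is not the course solution; the layer sizes and input shape are illustrative assumptions:

    from tensorflow.keras import layers, models

    Tx, n_freq = 5511, 101  # illustrative: spectrogram time steps x frequency bins
    model = models.Sequential([
        # the convolution shortens the time dimension and extracts local features
        layers.Conv1D(196, kernel_size=15, strides=4, activation="relu",
                      input_shape=(Tx, n_freq)),
        layers.Dropout(0.2),
        layers.GRU(128, return_sequences=True),
        layers.Dropout(0.2),
        layers.GRU(128, return_sequences=True),
        # one sigmoid output per time step: "was the trigger word just said?"
        layers.TimeDistributed(layers.Dense(1, activation="sigmoid")),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")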

Thoughts

Today I finished my roughly six-week sprint through the deep learning specialization from deeplearning.ai. I realized, just as Andrew Ng said, that deep learning is such a superpower that normal people like you and me can use it to make huge changes in this world and eventually make it a better place.

Link to work:

Certificate for my deep learning specialization.

Day 4: July 10, 2018

Today’s Progress

  • Reviewed Coursera videos about RNNs, the vanishing gradient problem in RNNs, and GRUs.
  • Started the Practical Reinforcement Learning course on Coursera. Learned about Markov Decision Processes and the cross-entropy method for reinforcement learning. I am still stuck setting up its development environment with Kitematic/Docker.

Thoughts

None for today.

Day 5: July 11, 2018

Today’s Progress

Day 6: July 12, 2018

Today’s Progress

  • Used the cross-entropy method to solve the Taxi-v2 problem. The gist of the method is sketched below.
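
A minimal sketch of the tabular cross-entropy method (this is not my notebook code; it uses the 2018-era gym API, and the hyperparameters are illustrative): sample episodes from the current stochastic policy, keep the elite episodes above a reward percentile, and refit the policy to the elite state-action pairs.

    import numpy as np
    import gym

    env = gym.make("Taxi-v2")
    n_states, n_actions = env.observation_space.n, env.action_space.n
    policy = np.full((n_states, n_actions), 1.0 / n_actions)  # start uniform

    def play_episode(t_max=200):
        states, actions, total_reward = [], [], 0.0
        s = env.reset()
        for _ in range(t_max):
            a = np.random.choice(n_actions, p=policy[s])
            s_next, r, done, _ = env.step(a)
            states.append(s); actions.append(a); total_reward += r
            s = s_next
            if done:
                break
        return states, actions, total_reward

    for _ in range(50):                         # training iterations
        sessions = [play_episode() for _ in range(250)]
        rewards = [r for _, _, r in sessions]
        threshold = np.percentile(rewards, 70)  # keep the top ~30% of episodes
        elite = [(ss, aa) for ss, aa, r in sessions if r >= threshold]
        counts = np.zeros_like(policy)
        for ss, aa in elite:
            for s, a in zip(ss, aa):
                counts[s, a] += 1
        seen = counts.sum(axis=1) > 0
        # refit: states visited in elite episodes get the empirical action distribution
        policy[seen] = counts[seen] / counts[seen].sum(axis=1, keepdims=True)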

Thoughts

I am still not familiar with Python. So much time was wasted just because I did not know that in for item in items, rebinding item does not modify the list. It seems that I need an organized learning session for Python.
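
A tiny example of the pitfall that cost me the time:

    items = [1, 2, 3]
    for item in items:
        item = item * 2      # rebinds the local name only; the list is untouched
    print(items)             # [1, 2, 3]

    for i, item in enumerate(items):
        items[i] = item * 2  # assigning through the index does modify the list
    print(items)             # [2, 4, 6]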

Link to work: Cross-entropy method notebook.

Day 7: July 13, 2018

Today’s Progress

  • Started writing a reinforcement learning script to solve CartPole-v0 from scratch. Getting familiar with setting up gym and managing states as well as actions; the basic interaction loop is sketched below.
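
For reference, the basic interaction loop looks roughly like this (gym API as of 2018; the random policy is just a placeholder):

    import gym

    env = gym.make("CartPole-v0")
    obs = env.reset()  # state: cart position, cart velocity, pole angle, pole angular velocity
    done, total_reward = False, 0.0
    while not done:
        action = env.action_space.sample()  # placeholder: push left (0) or right (1) at random
        obs, reward, done, info = env.step(action)
        total_reward += reward
    print(total_reward)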

Day 8: July 14, 2018

Today’s Progress

  • Worked further on my Q-learning script for CartPole-v0. The rough structure is done, but it still needs debugging.

Day 9: July 15, 2018

Today’s Progress

  • My first hand-written Q-learning script worked and converged to a relatively good result. The core update rule is sketched below.
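
A minimal sketch of the tabular Q-learning core (CartPole observations are continuous, so s and s_next here assume an already-discretized state index; the hyperparameters are illustrative):

    import numpy as np

    alpha, gamma, eps = 0.1, 0.99, 0.1  # learning rate, discount, exploration

    def choose_action(Q, s, n_actions, rng):
        """Epsilon-greedy: mostly exploit the table, sometimes explore."""
        if rng.rand() < eps:
            return rng.randint(n_actions)
        return int(np.argmax(Q[s]))

    def q_update(Q, s, a, r, s_next, done):
        """One Q-learning step: move Q(s, a) toward the TD target."""
        target = r if done else r + gamma * np.max(Q[s_next])
        Q[s, a] += alpha * (target - Q[s, a])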

Thoughts

  • Some bugs that I fixed today were caused by misunderstanding the concept. Next time, before writing the code, I should always make sure I fully understand the math and intuition behind it.
  • I encountered numerical instability in the softmax function. The instability can be fixed by subtracting the maximum from all elements before exponentiating. The sample code could look like:
    import numpy as np

    max_a = np.max(actions_set)               # largest logit
    logits_exp = np.exp(actions_set - max_a)  # shifting prevents overflow in exp
    probs = logits_exp / np.sum(logits_exp)

Day 10: July 16, 2018

Today’s Progress

  • Tried to understand how tensorflow.js works by reading an example project. Due to my lack of JavaScript knowledge, I did not get very far.

Day 11: July 17, 2018

Today’s Progress

  • Decided to use the CartPole problem as an example for practicing TensorFlow and started reimplementing Q-learning using TensorFlow. The code is here.

Day 12: July 18, 2018

Today’s Progress

  • Recently I have barely 1 hour per day for machine learning, and this will last until 26.07.2018. Today I finished my draft of the TensorFlow version of Q-learning. However, it still does not converge and remains to be fixed.

Day 13: July 19, 2018

Today’s Progress

  • Further polished the TensorFlow version of Q-learning, which seems really slow to converge. To address this problem in a more targeted way, I started watching this video, which will hopefully give me a better understanding of the issue.

Day 14: July 20, 2018

Today’s Progress

  • Still struggling to make my code converge on the CartPole problem.
  • Watched the videos: Training a neural network to play a game with TensorFlow and Open AI (parts 1 2 3 4)

Thoughts

  • I still lack the skills to fully utilize TensorFlow. My next short-term plan is to become familiar with TensorFlow.

Day 15: July 21, 2018

Today’s Progress

Day 16: July 22, 2018

Today’s Progress

Day 17: July 23, 2018

Today’s Progress

  • Read the paper One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning. In the paper, the authors present a method combining the model-agnostic meta-learning algorithm (MAML) with 1-D convolution layers, which enables robots to imitate a task shown by a human in a video. Although the video in the test phase may contain different but similar objects and a different human demonstrator, the method still cannot infer a completely new task.

Day 18: July 24, 2018

Today’s Progress

Day 19: July 25, 2018

Today’s Progress

Day 20: July 26, 2018

Today’s Progress

Day 21: July 27, 2018

Today’s Progress

Day 22: July 28, 2018

Today’s Progress

  • Reviewed and organized the learning materials and code from the deep learning specialization by deeplearning.ai.

Day 23: July 29, 2018

Today’s Progress

  • Finished organizing the learning materials and code from the deep learning specialization by deeplearning.ai.

Day 24: July 30, 2018

Today’s Progress

Day 25: July 31, 2018

Today’s Progress

Day 26: August 1, 2018

Today’s Progress

Day 27: August 2, 2018

Today’s Progress

Day 28: August 3, 2018

Today’s Progress

  • Continued with the course Deep Learning with TensorFlow on Cognitive Class, progressing to the lab section in Module 3 - Recurrent Neural Networks. This course is mainly about practicing the use of TensorFlow for deep learning tasks.

Day 29: August 4, 2018

Today’s Progress

Day 30: August 5, 2018

Today’s Progress

Day 31: August 6, 2018

Today’s Progress

Day 32: August 20, 2018

During the last two-week vacation, I made no progress in the machine learning area.

Today’s Progress

Day 33: August 21, 2018

Today’s Progress

Day 34: August 22, 2018

Today’s Progress

Day 35: August 24, 2018

Today’s Progress

Day 36: August 25, 2018

Today’s Progress

  • Reworked and verified the week 1 quiz in the course Bayesian Methods for Machine Learning.
  • Tried to set up Kitematic on Windows 10. It works in general but does not support visualization yet, which might be related to a conflict between Hyper-V and VirtualBox.

Day 37: August 27, 2018

Today’s Progress

  • Learned about latent variable models and probabilistic clustering in the course Bayesian Methods for Machine Learning.
  • Advantages of probabilistic clustering (a small sketch of the first point follows this list):
    • Allows tuning hyperparameters: a GMM, for example, can be used to pick the proper number of clusters using a training and a validation set, while methods like K-Means perform similarly on the training and validation sets as the number of clusters increases.
    • Builds a generative model of the data, which can be used to generate new data points.
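
A minimal sketch of that first point with scikit-learn (the synthetic blobs and their centers are arbitrary choices for illustration):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.RandomState(0)
    # three well-separated 2-D Gaussian blobs
    X = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 2)) for c in (0, 5, 10)])
    rng.shuffle(X)
    X_train, X_val = X[:400], X[400:]

    # the validation log-likelihood should peak near the true number of components
    for k in range(1, 6):
        gmm = GaussianMixture(n_components=k, random_state=0).fit(X_train)
        print(k, gmm.score(X_val))  # average per-sample log-likelihood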

Day 38: August 28, 2018

Today’s Progress

  • Learned about the Gaussian Mixture Model and its training method, Expectation Maximization, in the course Bayesian Methods for Machine Learning.
  • Advantages of Expectation Maximization compared to stochastic gradient descent methods:
    • EM trains faster (fewer iterations).
    • EM handles complicated constraints better, such as keeping covariance matrices positive semi-definite.
  • Drawback of EM: it can get stuck in a local maximum.

Day 39: August 29, 2018

Today’s Progress

  • Reviewed Maximum A Posteriori (MAP) estimation and some simple examples of it.
  • Continued to learn details of Expectation Maximization.

Day 40: August 30, 2018

Today’s Progress

Day 41: September 01, 2018

Today’s Progress

  • Learned about K-Means from the EM perspective in the course Bayesian Methods for Machine Learning. Interestingly, K-Means is actually Expectation Maximization for a Gaussian Mixture Model, but
    • with the covariance matrices fixed to the identity matrix, $$\Sigma_c = I$$
    • with a simplified E-step (approximating $$p(t_i \mid x_i, \theta)$$ with a delta function); a small sketch of this view follows below
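
A minimal K-Means sketch written in this E-step/M-step form (for my own understanding, not optimized; assumes X is a float NumPy array):

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        rng = np.random.RandomState(seed)
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(n_iter):
            # "E-step": hard-assign each point to its nearest center,
            # the delta-function approximation of p(t_i | x_i, theta)
            dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
            labels = dists.argmin(axis=1)
            # "M-step": with Sigma_c = I fixed, the optimal center is the mean
            for c in range(k):
                if np.any(labels == c):
                    centers[c] = X[labels == c].mean(axis=0)
        return centers, labels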

Day 42: September 03, 2018

Today’s Progress

Day 43: September 13, 2018

Today’s Progress

Day 44: September 14, 2018

Today’s Progress

  • Learned about how to do Exploratory Data Analysis (EDA).
    • Purpose:
      • Better understanding of the data
      • Building an intuition about the data
      • Generating hypotheses
      • Finding insights
    • Aspects:
      • Getting domain knowledge (e.g., to make use of value ranges or to create new features)
      • Checking whether the data is intuitive and agrees with domain knowledge
      • Understanding how the data was generated, mostly in order to set up a proper validation
  • Learned how to visualize the feature importance of a given data set
    # Suppose X is the feature matrix (a pandas DataFrame) and y the class labels
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.ensemble import RandomForestClassifier

    rf = RandomForestClassifier()
    rf.fit(X, y)

    # plot each feature's importance, labeled with its column name
    plt.plot(rf.feature_importances_)
    plt.xticks(np.arange(X.shape[1]), X.columns.tolist(), rotation=90)
    plt.show()
    
    This code will plot the feature importance of all features.

Day 45: September 18, 2018

Today’s Progress

Day 46: September 27, 2018

Today’s Progress

  • Submitted my first Kaggle result to the challenge. My ranking on the leaderboard is extremely bad, though.