I started following the Deep Learning Curriculum written by Jacob Hilton, and here is what I learnt from the exercise in Topic 1 - Transformers. My solution is in the Colab notebook T1-Transformers-solution.ipynb.
It took me around 20 hours to finish the exercise, and it was totally worth it. Throughout the process I learnt:
- How to implement the transformer model end-to-end.
- How to gather and clean data for a transformer model.
- How to implement positional embeddings, attention, the feed-forward network (FFN), and residual connections, and put them all together into a transformer model.
- Switching between `LayerNorm(x + SubLayer(x))` (post-LN) and `x + SubLayer(LayerNorm(x))` (pre-LN) didn't noticeably affect model performance in my experiments.
- How to program in PyTorch more fluently; I also gathered a bunch of utility functions for later use.
- How to debug the model by inspecting gradient flow and using `torchviz.make_dot` to check the model structure clearly.
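To make the pre-LN vs post-LN comparison concrete, here is a minimal sketch of a residual sublayer wrapper that can switch between the two orderings with a flag. The class name `SublayerConnection` and the `pre_norm` parameter are my own illustrative choices, not names from the exercise:

```python
import torch
import torch.nn as nn

class SublayerConnection(nn.Module):
    """Residual connection around a sublayer, with switchable LayerNorm placement.

    pre_norm=True  -> x + SubLayer(LayerNorm(x))   (pre-LN)
    pre_norm=False -> LayerNorm(x + SubLayer(x))   (post-LN)
    """
    def __init__(self, d_model: int, pre_norm: bool = True):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.pre_norm = pre_norm

    def forward(self, x, sublayer):
        if self.pre_norm:
            return x + sublayer(self.norm(x))
        return self.norm(x + sublayer(x))

# The sublayer can be any module mapping d_model -> d_model,
# e.g. an attention block or FFN; a Linear stands in here.
wrap = SublayerConnection(d_model=16, pre_norm=True)
x = torch.randn(2, 8, 16)
out = wrap(x, nn.Linear(16, 16))
print(out.shape)  # torch.Size([2, 8, 16])
```

Because both orderings live behind one flag, you can A/B the two variants with an otherwise identical training loop.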
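For the positional-embedding piece, a common choice is the fixed sinusoidal encoding from "Attention Is All You Need". The function below is a self-contained sketch (the name `sinusoidal_positions` is mine):

```python
import math
import torch

def sinusoidal_positions(max_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal positional encodings:
    PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    """
    pos = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
    div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)  # even dimensions
    pe[:, 1::2] = torch.cos(pos * div)  # odd dimensions
    return pe

pe = sinusoidal_positions(max_len=50, d_model=16)
print(pe.shape)  # torch.Size([50, 16])
```

The resulting tensor is simply added to the token embeddings before the first transformer block; a learned `nn.Embedding` over positions is the usual alternative.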
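The gradient-flow debugging trick can be sketched as a small utility that reports the mean absolute gradient per parameter after a backward pass; near-zero values hint at vanishing gradients and `None` flags parameters cut off from the loss. The helper name `check_grad_flow` is hypothetical:

```python
import torch
import torch.nn as nn

def check_grad_flow(named_parameters):
    """Return {param_name: mean |grad|} after loss.backward().

    None means the parameter received no gradient at all,
    which usually indicates a detached part of the graph.
    """
    report = {}
    for name, p in named_parameters:
        if p.requires_grad:
            report[name] = None if p.grad is None else p.grad.abs().mean().item()
    return report

# Tiny model standing in for a transformer block.
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))
loss = model(torch.randn(8, 4)).sum()
loss.backward()
report = check_grad_flow(model.named_parameters())
for name, g in report.items():
    print(f"{name}: {g}")
```

The same dictionary can be fed to a bar plot to visualise how gradient magnitude decays across layers, complementing the structural view that `torchviz.make_dot` gives.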