This project implements the core components of the “Attention Is All You Need” paper to help understand the inner workings of Transformers.
Project case study
Transformer Model from Scratch (PyTorch)
An implementation of the Transformer architecture (specifically a Decoder-only GPT-style model) built from scratch in PyTorch. Trained on the Shakespeare dataset from Kaggle to generate Shakespearean-like text.
Key outcomes
- Architecture: Decoder-only Transformer (suited for text generation).
- Components: Multi-Head Attention, Sinusoidal Positional Encoding, Feed-Forward Networks, Layer Normalization.
- Dataset: Character-level tokenization of the Shakespeare dataset.
- Hardware: Optimized for CUDA-enabled GPUs