Project case study

Transformer Model from Scratch (PyTorch)

An implementation of the Transformer architecture (specifically a Decoder-only GPT-style model) built from scratch in PyTorch. Trained on the Shakespeare dataset from Kaggle to generate Shakespearean-like text.

Key outcomes

  • Architecture: Decoder-only Transformer (suited for text generation).
  • Components: Multi-Head Attention, Sinusoidal Positional Encoding, Feed-Forward Networks, Layer Normalization.
  • Dataset: Character-level tokenization of the Shakespeare dataset.
  • Hardware: Optimized for CUDA-enabled GPUs

This project implements the core components of the “Attention Is All You Need” paper to help understand the inner workings of Transformers.