Project case study

Transformer Model from Scratch (PyTorch)

An implementation of the Transformer architecture (specifically a Decoder-only GPT-style model) built from scratch in PyTorch. Trained on the Shakespeare dataset from Kaggle to generate Shakespearean-like text.

Back to projects Ask about this work

Key outcomes

Architecture: Decoder-only Transformer (suited for text generation).
Components: Multi-Head Attention, Sinusoidal Positional Encoding, Feed-Forward Networks, Layer Normalization.
Dataset: Character-level tokenization of the Shakespeare dataset.
Hardware: Optimized for CUDA-enabled GPUs

This project implements the core components of the “Attention Is All You Need” paper to help understand the inner workings of Transformers.

Previous LLM Fine-Tuning for Information Extraction (NER) Next PostgreSQL MCP Server