Get Book Access
to improve your skills
More than 8,000 books sold
4.4 stars on Amazon

Under the Hood of Large Language Models

Architectures, Training Foundations, and Multimodal Advances

Master transformer architectures, training pipelines, and multimodal systems with clear explanations and hands-on PyTorch projects. Build transformers from scratch while learning attention mechanisms, tokenization, and embeddings. For engineers ready to design AI systems.

Improve your programming skills

What You'll Get from This Book

5 chapters spanning over 720 pages

More than 190 explanatory code blocks

More than 35 practical exercises

1 quiz to test your knowledge

2 Practical "Real World" Projects

About This Book

The Gap Between Using and Understanding

You can use a tool without understanding it. You can call an API, adjust parameters, and observe results. Many do exactly this, and for some purposes, it suffices.

But something essential remains out of reach.

When you don't understand the inner workings, you cannot truly optimize. You cannot diagnose unexpected behavior. You cannot adapt the system to novel challenges. You cannot design new architectures that push boundaries.

You remain dependent rather than empowered. Reactive rather than creative. A user rather than a builder.

This book bridges that gap.

What Lives Beneath the Surface

Large language models rest on foundations that are both mathematically elegant and practically complex. Transformer architectures. Attention mechanisms that weigh context with precision. Embedding spaces where meaning becomes geometry. Training pipelines that process billions of tokens. Optimization strategies that balance capability with efficiency.

These are not abstract concepts meant only for research papers. They are the concrete building blocks of systems you can understand layer by layer, component by component.

This book takes you inside. Not through vague analogies, but through clear explanations grounded in real implementations. Not through intimidating jargon, but through language that illuminates rather than obscures.

From Confusion to Clarity

Perhaps you've tried to learn this material before. You've read papers dense with notation. You've encountered tutorials that skip crucial steps. You've felt the frustration of concepts that seem just beyond grasp.

That confusion is not a reflection of your capability. It reflects the way the material has been presented.

This book takes a different approach. It respects both the complexity of the subject and your ability to master it when guided with patience and precision.

Each chapter builds progressively. Each concept receives the attention it deserves. Each technical detail connects to practical understanding.

The path from confusion to clarity is not mysterious. It requires only that someone light the way with care.

What You Will Gain

This book reveals the complete picture of how large language models work, from foundational principles to advanced implementations.

The Architecture Revealed

You will understand transformer architectures not as black boxes, but as systems you can visualize and reason about. The attention mechanism that allows models to weigh context. The feed-forward networks that process information. The normalization techniques that stabilize training. The positional encodings that give sequence awareness.

Each component serves a purpose. Each design choice reflects engineering insight. You will see the structure clearly.
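
To give a taste of how concrete this gets, here is a minimal sketch of the scaled dot-product attention at the heart of every transformer. It is an illustration in PyTorch only; the tensor shapes and names are our own, not taken from the book's project code.

    # Minimal scaled dot-product attention (illustrative sketch).
    import torch
    import torch.nn.functional as F

    def attention(q, k, v):
        # q, k, v: (batch, seq_len, d_model)
        d = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d ** 0.5  # similarity of every token pair
        weights = F.softmax(scores, dim=-1)          # normalize into attention weights
        return weights @ v                           # context-weighted sum of values

    q = k = v = torch.randn(1, 8, 64)
    out = attention(q, k, v)  # -> shape (1, 8, 64)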

The Training Journey

Creating a large language model requires more than just architecture. It requires understanding how models learn from data at scale.

You will explore the complete training pipeline. How raw text becomes tokens. How tokens become embeddings in high-dimensional space. How massive datasets flow through optimization algorithms. How loss functions guide learning. How computational resources scale with model size.

This knowledge transforms training from mystery into methodology.
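
As a preview of that methodology, the sketch below compresses one training step into a toy example: raw text becomes token ids, token ids become embeddings, and a cross-entropy loss guides learning. The character-level vocabulary here is a deliberately tiny stand-in; real pipelines use learned tokenizers over billions of tokens.

    # One toy causal-LM training step (hypothetical, character-level).
    import torch
    import torch.nn as nn

    text = "hello world"
    vocab = {ch: i for i, ch in enumerate(sorted(set(text)))}  # raw text -> token ids
    ids = torch.tensor([vocab[ch] for ch in text])

    embed = nn.Embedding(len(vocab), 32)  # tokens -> points in embedding space
    head = nn.Linear(32, len(vocab))      # embeddings -> next-token logits

    x, y = ids[:-1], ids[1:]              # predict each next character
    logits = head(embed(x))
    loss = nn.functional.cross_entropy(logits, y)  # the loss that guides learning
    loss.backward()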

From Text to Multimodality

Language models no longer work with text alone. Modern systems integrate images, audio, and video into unified representations.

You will discover how multimodal architectures extend foundational principles. How vision transformers process images. How audio encoders handle speech. How cross-modal attention bridges different types of information.

The future of AI is multimodal. This book prepares you for that future.
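
To make "cross-modal attention" concrete, here is a minimal sketch in which text tokens attend over image patch embeddings. The dimensions and the 7x7 patch grid are illustrative assumptions, not the layout of any specific model.

    # Cross-modal attention sketch: text queries attend over image patches.
    import torch
    import torch.nn as nn

    d_model = 64
    text_tokens = torch.randn(1, 10, d_model)    # e.g. 10 text tokens
    image_patches = torch.randn(1, 49, d_model)  # e.g. a 7x7 grid of ViT patches

    cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
    fused, _ = cross_attn(query=text_tokens, key=image_patches, value=image_patches)
    # fused: (1, 10, 64) -- each text token now carries visual context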

Practical Implementation

Understanding theory means nothing without the ability to apply it. Throughout this book, you will work with real code, real architectures, and real challenges.

Build a transformer from scratch in PyTorch. Train domain-specific tokenizers. Implement audio transcription systems. Extract and work with video embeddings.

These projects transform understanding into capability.
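
For example, training a small BPE tokenizer takes only a few lines with the Hugging Face tokenizers library, the same tool used in the book's Project 2. The two-sentence legal corpus below is a stand-in for a real domain corpus.

    # Train a tiny domain-specific BPE tokenizer (stand-in corpus).
    from tokenizers import Tokenizer, models, trainers, pre_tokenizers

    corpus = ["The plaintiff filed a motion.", "The court granted summary judgment."]

    tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
    trainer = trainers.BpeTrainer(vocab_size=1000, special_tokens=["[UNK]", "[PAD]"])
    tokenizer.train_from_iterator(corpus, trainer)

    print(tokenizer.encode("The court filed a motion.").tokens)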

Who This Book Is For

This book serves engineers and technical thinkers who want to move beyond surface-level knowledge.

You Are Ready If

You want to understand the systems you work with at a fundamental level. You are comfortable with Python and basic machine learning concepts. You have the patience to learn step by step. You value depth over shortcuts.

You may be a machine learning engineer expanding your expertise. A software developer transitioning into AI. A researcher building on solid foundations. A technical professional preparing for advanced work in this field.

What matters most is your intention: to truly understand, not just to use.

What This Book Is Not

This is not an introduction to programming or machine learning basics. This is not a recipe book of quick fixes. This is not a superficial survey of trending topics.

This is deep, technical exploration for those ready to invest in genuine mastery.

The Transformation: From Mystery to Mastery

The journey this book offers is one of progressive revelation.

Where You Are Now

Perhaps large language models feel opaque to you. Perhaps you understand pieces but not the whole. Perhaps you can use them but not modify them. Perhaps you sense there is deeper knowledge just out of reach.

These feelings are common. They reflect not limitation, but the beginning of real learning.

The Path Forward

As you progress through this book, opacity gives way to clarity. Fragments connect into coherent understanding. Passive use transforms into active design. What seemed unreachable becomes accessible.

This transformation happens gradually, chapter by chapter, concept by concept. Not through sudden revelation, but through patient accumulation of understanding.

Where You Will Arrive

By the end of this journey, you will see large language models with new eyes. You will understand their architecture at every level. You will grasp their training dynamics. You will recognize their capabilities and limitations. You will have the foundation to design, optimize, and extend these systems.

You will have moved from dependence to competence. From mystery to mastery.

This is not an end point, but a new beginning. With this foundation, you can explore further, build more, and contribute to the field with confidence.

An Invitation

This book represents hundreds of pages of carefully structured explanation, dozens of code examples, and practical projects designed to solidify your understanding.

It offers you a path from where you are to where you want to be.

The path requires effort. It requires focus. It requires patience with yourself as you work through challenging material.

But the destination is worth the journey.

If you are ready to stop treating large language models as magic and start understanding them as the elegant, structured systems they truly are, this book is here to guide you.

Take the first step with clarity and confidence. The rest will follow.

AI systems are no longer optional in technical work—they're everywhere. Yet most people interact with these systems without understanding how they actually function. This creates dependency rather than mastery. As LLMs become more integrated into software, products, and infrastructure, the engineers who understand their inner workings—not just their APIs—will be the ones who can optimize, debug, adapt, and innovate. This book gives you that foundation at a moment when it matters most.

Understanding how large language models work transforms you from a user of tools into a builder of systems. You'll grasp the architecture behind transformers, attention mechanisms, and training pipelines. You'll work with real implementations in PyTorch. You'll build transformers from scratch and train custom tokenizers. This depth allows you to diagnose problems, optimize performance, and design novel solutions—capabilities that separate competent engineers from exceptional ones.

Most resources either stay at the surface level or drown you in research paper notation without clear explanation. This book takes a different path. It respects both the complexity of the subject and your ability to master it when guided with patience and precision. Each concept receives the attention it deserves. Theory connects directly to practical implementation through hands-on projects. The explanations illuminate rather than obscure. You move progressively from foundational principles to advanced implementations—from confusion to clarity.

You should be comfortable with Python and have basic machine learning concepts under your belt. This is not an introduction to programming or ML fundamentals. The book assumes you're ready for deep, technical exploration. If you have that foundation and the patience to learn step by step, you're ready.

The book grants free access to the e-learning platform, which includes:
  • Complete repository code with all examples from the book
  • Free chapters from the entire library of published programming books
  • Free premium customer support
  • Additional learning resources

You're not alone on this path. The support structure is there to help you move forward when you encounter obstacles.

Table of contents

Chapter 1: What Are LLMs? From Transformers to Titans

1.1 From GPT to LLaMA, Claude, Gemini, Mistral, DeepSeek

1.2 Decoder-Only vs Encoder-Decoder vs Mixture-of-Experts (MoE)

1.3 Scaling Laws: Kaplan, Chinchilla, and Data–Model Trade-Offs

Practical Exercises – Chapter 1

Chapter 1 Summary – From Transformers to Titans

Chapter 2: Tokenization and Embeddings

2.1 Byte Pair Encoding (BPE), WordPiece, SentencePiece

2.2 Training Custom Tokenizers for Domain-Specific Tasks

2.3 Subword, Character-Level, and Multimodal Embeddings

Practical Exercises – Chapter 2

Chapter 2 Summary

Chapter 3: Anatomy of an LLM

3.1 Multi-Head Attention, Rotary Embeddings, and Normalization Strategies

3.2 Transformer Depth vs Width, Position Encoding Tricks (ALiBi, RoPE)

3.3 Advanced Architectures: SwiGLU, GQA, Attention Sparsity

Practical Exercises – Chapter 3

Chapter 3 Summary – Anatomy of an LLM

Chapter 4: Training LLMs from Scratch

4.1 Data Collection, Cleaning, Deduplication, and Filtering

4.2 Curriculum Learning, Mixture Datasets, and Synthetic Data

4.3 Infrastructure: Distributed Training, GPUs vs TPUs vs Accelerators

4.4 Cost Optimization & Sustainability in Large-Scale Training

Practical Exercises – Chapter 4

Chapter 5: Beyond Text: Multimodal LLMs

5.1 Text+Image Models (LLaVA, Flamingo, GPT-4o, DeepSeek-VL)

5.2 Audio & Speech Integration (Whisper, SpeechLM)

5.3 Video and Cross-Modal Research Directions

Practical Exercises – Chapter 5

Chapter 5 Summary – Beyond Text: Multimodal LLMs

Quiz

Questions

Answers

Project 1: Build a Toy Transformer from Scratch in PyTorch

0. Setup

1. Tiny Dataset & Character Tokenizer

2. Model Components

3. The Tiny GPT-Style Model

4. Training Loop (Causal LM)

Project 2: Train a Custom Domain-Specific Tokenizer (e.g., for legal or medical texts)

0. Setup

1. Gather a Representative Mini-Corpus

2. Train a BPE Tokenizer (🤗 tokenizers)

3. Train a SentencePiece Tokenizer (Unigram or BPE)

4. Wrap Your Tokenizer for Transformers

Reviews

What our readers are saying about this book

Explore the reviews to understand why this book is a great choice! Discover how others have gained from the knowledge and insights it provides. Get a taste of the exciting content that awaits you and see if this book is the perfect fit for your journey.

Recommended by dozens of people
Review from Amazon

Piter

An excellent study guide. This book helped me understand modern artificial intelligence in greater depth and detail. It even includes examples and code snippets. Although I'm not a programmer, I find this purchase useful. I now understand the meanings behind the chat's responses to my questions, and I think I'm beginning to understand why it responds the way it does. It's amazing: when you understand how a machine works, controlling it becomes less scary, and the thought of communicating with Skynet has vanished. It's not an easy book, but believe me, it's very useful for our time.

Review from Amazon

Sivan Kish

Under the Hood of Large Language Models is one of the most detailed and well-structured guides to modern AI available today. It explains complex ideas — like transformer architectures, embeddings, and multimodal systems — in a way that’s both technically precise and easy to follow. The mix of theory and hands-on projects makes it perfect for anyone who wants to go beyond surface knowledge and truly understand how large language models work. Each chapter builds naturally on the last, helping readers gain real confidence in advanced AI concepts. A must-read for developers, engineers, and students who want a clear path from fundamentals to expert-level understanding.

Start your learning journey today

Unlock Access

The choice is yours: paperback, eBook, or a Full Access Pass to our entire library.

Paperback on Amazon
$49.90
Buy it on Amazon
  • Paperback shipped from Amazon
  • Free code repository access
  • Premium customer support
Book Access
$24.90
  • Digital eLearning platform
  • Free additional video content
  • Cost-effective
  • Premium customer support
  • Easy copy-paste code resources
  • Learn anywhere
Entire Library Unlimited Access
$8.25/mo
Learn more
  • Everything from Book Access
  • Unlimited Book Library Access
  • 50% Off on Paperback Books
  • Early Access to New Launches
  • Exclusive Video Content
  • Monthly Book Recommendations
  • Unlimited book updates
  • 24/7 VIP Customer Support
  • Programming Challenges
FAQs

Find answers to common questions about book formats, purchasing options, and subscription details.

Our subscription plan offers unlimited access to our entire library of programming books for a year. It's a cost-effective way to enhance your learning journey.
To purchase books, simply browse our collection, select the ones you want, and proceed to checkout. We offer various payment options for your convenience.
Our books are available in both digital and print formats. You can choose the format that suits your preference and reading style.
Once you've purchased a book, you can access it through your account dashboard. From there, you can download the digital version or view your order history.
You can cancel your subscription easily from your dashboard. If you need any assistance, please contact our support team; they will help you with the cancellation process and any related inquiries.

This book is part of our AI Engineering learning path.

More Books on this Learning Path

Feature Engineering for Modern Machine Learning with Scikit-Learn

View this book

Data Engineering Foundations

View this book

Deep Learning and AI Superhero

View this book

Machine Learning Hero

View this book