Components of Generative AI
Here’s an expanded explanation of the key components of Generative AI, now including simple, runnable Python code examples for many of them (mostly using PyTorch or popular libraries like tiktoken, sentence-transformers, and minimal from-scratch implementations).
These examples are kept minimal and educational — they show the core idea in working condition, not full production-scale training.
1. Data (Training Corpus)
Huge text/image/code datasets.
No code example needed — but imagine loading billions of tokens from Common Crawl, The Pile, GitHub, LAION-5B, etc.
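To make the idea concrete anyway, here is a toy preprocessing pipeline over a tiny in-memory "corpus" (real pipelines stream terabytes from disk or object storage, but the shape of the work is the same: tokenize, then chunk into fixed-length training sequences):

```python
corpus = [
    "Generative AI learns patterns from data.",
    "Transformers predict the next token.",
    "Large corpora come from the web, books, and code.",
]

# Toy whitespace "tokenizer"; real pipelines use BPE tokenizers (see section 2)
tokens = [tok for doc in corpus for tok in doc.split()]

# Chunk the token stream into fixed-length training sequences
seq_len = 4
chunks = [tokens[i:i + seq_len] for i in range(0, len(tokens) - seq_len + 1, seq_len)]
print(len(tokens), "tokens →", len(chunks), "training sequences")
```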
2. Tokenization
Breaking text → list of token IDs.
# pip install tiktoken
import tiktoken
# GPT-4 / cl100k_base tokenizer (very common in 2025–2026)
encoding = tiktoken.get_encoding("cl100k_base")
text = "Generative AI is transforming technology in 2026!"
tokens = encoding.encode(text)
print("Tokens (IDs) :", tokens)
print("Token count :", len(tokens))
print("Decoded back :", encoding.decode(tokens))
print("Decoded tokens :", [encoding.decode([t]) for t in tokens])
# Example output: the exact IDs are an implementation detail of the encoding,
# but "Decoded back" always round-trips to the original string exactly.
3. Embeddings
Tokens → dense vectors (capturing meaning).
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer
import torch
model = SentenceTransformer("all-MiniLM-L6-v2") # ~80 MB, fast & good quality
sentences = [
    "The king is strong",
    "The queen is powerful",
    "Apple is a fruit",
    "Apple released iPhone 17",
]
embeddings = model.encode(sentences) # shape: (4, 384)
print("Embedding shape :", embeddings.shape)
print("Similarity king ↔ queen:", torch.nn.functional.cosine_similarity(
    torch.tensor(embeddings[0]), torch.tensor(embeddings[1]), dim=0
).item())  # usually ~0.65–0.75
# Same word, different context → different embeddings in contextual models
4–5. Neural Networks + Attention Mechanisms (Transformer core)
Minimal self-attention from scratch (very simplified):
import torch
import torch.nn.functional as F
def simple_self_attention(x):
    # x shape: (batch=1, seq_len, d_model)
    d_k = x.size(-1)
    # In real models: three separate learned projections produce Q, K, V
    Q = K = V = x  # naive for demo
    scores = torch.matmul(Q, K.transpose(-2, -1)) / (d_k ** 0.5)  # scaled dot-product
    attn_weights = F.softmax(scores, dim=-1)
    output = torch.matmul(attn_weights, V)
    return output
# Tiny example
torch.manual_seed(42)
x = torch.randn(1, 5, 64) # 5 tokens, 64-dim embeddings
out = simple_self_attention(x)
print("Input shape :", x.shape)
print("Output shape:", out.shape) # same shape
Real transformers use multi-head attention, causal masking, etc.
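As a sketch of one of those refinements, here is the same toy attention with a causal mask added, so that position i can only attend to positions ≤ i (still single-head and without learned projections, for demonstration only):

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x):
    # x shape: (batch, seq_len, d_model)
    d_k = x.size(-1)
    Q = K = V = x  # still no learned projections, demo only
    scores = torch.matmul(Q, K.transpose(-2, -1)) / (d_k ** 0.5)
    # Causal mask: block attention to future positions (strict upper triangle)
    seq_len = x.size(1)
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    attn_weights = F.softmax(scores, dim=-1)
    return torch.matmul(attn_weights, V)

torch.manual_seed(42)
x = torch.randn(1, 5, 64)
out = causal_self_attention(x)
print(out.shape)  # torch.Size([1, 5, 64])
```

Note that the first position can only attend to itself, so its output equals its input vector, which is a handy sanity check.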
6. Training (Next-token prediction – core objective)
Very simplified training loop idea:
import torch
import torch.nn as nn
import torch.optim as optim
# Fake tiny model: just embedding + linear (real = many transformer layers)
vocab_size = 50000
embed_dim = 128
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),  # predicts next-token logits
)
optimizer = optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()
# Fake batch: token ids (batch=2, seq_len=6)
inputs = torch.tensor([[   5,  234,   89, 1543,   12, 9999],
                       [1024,   67,  543,   21, 8765, 4321]])
targets = inputs[:, 1:]   # target at position t is the token at t+1 (shift left)
inputs = inputs[:, :-1]
# Forward + loss (real training does many layers, attention, etc.)
logits = model(inputs) # shape: (2, 5, 50000)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
print("Loss:", loss.item())
optimizer.zero_grad()
loss.backward()
optimizer.step()  # one gradient step; real training loops over many batches
7. Parameters
Parameters are the learned weights of the network. Modern 2026 open models: roughly 8B–405B parameters.
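Counting them for any PyTorch model is a one-liner; a sketch using the same toy model shape as section 6:

```python
import torch.nn as nn

vocab_size, embed_dim = 50000, 128
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)
# Embedding: 50000*128, Linear: 128*50000 weights + 50000 biases
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # 12,850,000
```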
8. Decoding (Generation with sampling)
Simple autoregressive generation + temperature sampling:
import torch
import torch.nn.functional as F
def generate_simple(model, start_tokens, max_new=30, temperature=0.8):
    model.eval()
    generated = start_tokens.clone()
    for _ in range(max_new):
        with torch.no_grad():
            logits = model(generated)[:, -1, :]  # logits at the last position
            logits = logits / temperature
            probs = F.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, num_samples=1)
        generated = torch.cat([generated, next_token], dim=1)
    return generated
# Usage with the toy model from section 6 (a real model would be GPT-like):
# start = torch.tensor([[40, 3021]])  # any two valid token ids
# output_ids = generate_simple(model, start)
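A common refinement of plain temperature sampling is top-p (nucleus) sampling: sample only from the smallest set of tokens whose cumulative probability exceeds p. A minimal sketch for a single position (p=0.9 and temperature=0.8 are just typical defaults):

```python
import torch
import torch.nn.functional as F

def top_p_sample(logits, p=0.9, temperature=0.8):
    # logits shape: (vocab_size,)
    probs = F.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep a token if the mass before it is still < p (the top token always stays)
    keep = (cumulative - sorted_probs) < p
    sorted_probs = sorted_probs * keep
    sorted_probs = sorted_probs / sorted_probs.sum()  # renormalize over the nucleus
    idx = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_ids[idx]

torch.manual_seed(0)
logits = torch.tensor([2.0, 1.0, 0.1, -3.0])
token = top_p_sample(logits)
print(token.item())
```

With these logits the nucleus contains only the two most likely tokens, so low-probability tail tokens are never sampled.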
9. Fine-tuning
Usually LoRA/QLoRA nowadays → change only ~0.1–1% of parameters.
No minimal code here — involves peft library + trl / SFTTrainer.
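The full peft/trl pipeline is beyond a snippet, but the core LoRA idea fits in a few lines: freeze the base weight and learn a low-rank update. A toy from-scratch sketch (this is not the peft API; rank and alpha are typical defaults):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base Linear plus a trainable low-rank update: W x + (B A x) * scale."""
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(128, 128))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable {trainable} / total {total}")  # 2048 / 18560
```

Because B starts at zero, the adapted layer initially computes exactly what the base layer does; training then moves only the small A and B matrices. (In this toy 128-dim layer the adapter fraction is large; at 7B+ scale it drops to the ~0.1–1% range.)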
10. Prompting
No code — just string engineering:
System: You are a helpful Python expert.
Think step by step before answering.
User: Write a function that reverses a string without using [::-1]
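In code, the "string engineering" usually means building a list of role-tagged messages; the dict schema below follows the common OpenAI-style chat format (adapt the keys to whatever API you use):

```python
messages = [
    {"role": "system", "content": "You are a helpful Python expert. "
                                  "Think step by step before answering."},
    {"role": "user", "content": "Write a function that reverses a string "
                                "without using [::-1]"},
]

# Flatten into a plain-text prompt, as chat templates do under the hood
prompt = "\n".join(f"{m['role'].capitalize()}: {m['content']}" for m in messages)
print(prompt)
```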
11. Inference
Running the model → covered in generation example above.
Modern tricks: quantization (bitsandbytes), FlashAttention-2/3, speculative decoding.
12. Safety
No simple code — usually extra models (moderation API) or refusal fine-tuning.
13. Evaluation
Example: perplexity (lower = better language modeling)
import torch

# perplexity = exp(average negative log-likelihood per token)
loss = 2.3  # average cross-entropy from training/eval
perplexity = torch.exp(torch.tensor(loss))
print("Perplexity:", perplexity.item())  # ≈ 9.97; ~10 is decent for a small model
14. Deployment
No code — usually FastAPI / vLLM / TGI / Triton servers.
Quick Summary Table (2026 perspective)
| Component | Typical Library/Tool (2026) | Key Idea in Code |
|---|---|---|
| Tokenization | tiktoken, sentencepiece, HF | text → [40, 3021, 2956, ...] |
| Embeddings | sentence-transformers, torch.nn | token id → 384–8192 dim vector |
| Attention | torch.nn.MultiheadAttention | Q·Kᵀ / √d → softmax → weighted V |
| Training | PyTorch + AdamW + CrossEntropy | predict shifted tokens |
| Decoding | custom sampling loop | multinomial(probs / temperature) |
| Fine-tuning | peft (LoRA), trl (SFT) | update small adapter weights |
Would you like me to expand any of these code examples (e.g. full minimal transformer, LoRA fine-tuning snippet, top-p sampling, etc.)?