Components of Generative AI
Here’s an expanded explanation of the key components of Generative AI, now including simple, runnable Python code examples for many of them (mostly using PyTorch or popular libraries like tiktoken, sentence-transformers, and minimal from-scratch implementations).
These examples are kept minimal and educational — they show the core idea in working condition, not full production-scale training.
1. Data (Training Corpus)
Huge text/image/code datasets.
No code example needed — but imagine loading billions of tokens from Common Crawl, The Pile, GitHub, LAION-5B, etc.
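To make the idea concrete anyway, here is a toy preprocessing pipeline over a tiny in-memory "corpus" (real pipelines stream terabytes from disk or object storage, but the shape of the work is the same: tokenize, then chunk into fixed-length training sequences):

```python
corpus = [
    "Generative AI learns patterns from data.",
    "Transformers predict the next token.",
    "Large corpora come from the web, books, and code.",
]

# Toy whitespace "tokenizer"; real pipelines use BPE tokenizers (see section 2)
tokens = [tok for doc in corpus for tok in doc.split()]

# Chunk the token stream into fixed-length training sequences
seq_len = 4
chunks = [tokens[i:i + seq_len] for i in range(0, len(tokens) - seq_len + 1, seq_len)]
print(len(tokens), "tokens →", len(chunks), "training sequences")
```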
2. Tokenization
Breaking text → list of token IDs.
# pip install tiktoken
import tiktoken
# GPT-4 / cl100k_base tokenizer (very common in 2025–2026)
encoding = tiktoken.get_encoding("cl100k_base")
text = "Generative AI is transforming technology in 2026!"
tokens = encoding.encode(text)
print("Tokens (IDs) :", tokens)
print("Token count :", len(tokens))
print("Decoded back :", encoding.decode(tokens))
print("Decoded tokens :", [encoding.decode([t]) for t in tokens])
# Example output: the exact IDs are an implementation detail of the encoding,
# but "Decoded back" always round-trips to the original string exactly.
3. Embeddings
Tokens → dense vectors (capturing meaning).
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer
import torch
model = SentenceTransformer("all-MiniLM-L6-v2") # ~80 MB, fast & good quality
sentences = [
    "The king is strong",
    "The queen is powerful",
    "Apple is a fruit",
    "Apple released iPhone 17",
]
embeddings = model.encode(sentences) # shape: (4, 384)
print("Embedding shape :", embeddings.shape)
print("Similarity king ↔ queen:", torch.nn.functional.cosine_similarity(
    torch.tensor(embeddings[0]), torch.tensor(embeddings[1]), dim=0
).item())  # usually ~0.65–0.75
# Same word, different context → different embeddings in contextual models
4–5. Neural Networks + Attention Mechanisms (Transformer core)
Minimal self-attention from scratch (very simplified):
import torch
import torch.nn.functional as F
def simple_self_attention(x):
    # x shape: (batch=1, seq_len, d_model)
    d_k = x.size(-1)
    # In real models: three separate learned projections produce Q, K, V
    Q = K = V = x  # naive for demo
    scores = torch.matmul(Q, K.transpose(-2, -1)) / (d_k ** 0.5)  # scaled dot-product
    attn_weights = F.softmax(scores, dim=-1)
    output = torch.matmul(attn_weights, V)
    return output
# Tiny example
torch.manual_seed(42)
x = torch.randn(1, 5, 64) # 5 tokens, 64-dim embeddings
out = simple_self_attention(x)
print("Input shape :", x.shape)
print("Output shape:", out.shape) # same shape
Real transformers use multi-head attention, causal masking, etc.
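As a sketch of one of those refinements, here is the same toy attention with a causal mask added, so that position i can only attend to positions ≤ i (still single-head and without learned projections, for demonstration only):

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x):
    # x shape: (batch, seq_len, d_model)
    d_k = x.size(-1)
    Q = K = V = x  # still no learned projections, demo only
    scores = torch.matmul(Q, K.transpose(-2, -1)) / (d_k ** 0.5)
    # Causal mask: block attention to future positions (strict upper triangle)
    seq_len = x.size(1)
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    attn_weights = F.softmax(scores, dim=-1)
    return torch.matmul(attn_weights, V)

torch.manual_seed(42)
x = torch.randn(1, 5, 64)
out = causal_self_attention(x)
print(out.shape)  # torch.Size([1, 5, 64])
```

Note that the first position can only attend to itself, so its output equals its input vector, which is a handy sanity check.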
6. Training (Next-token prediction – core objective)
Very simplified training loop idea:
import torch
import torch.nn as nn
import torch.optim as optim
# Fake tiny model: just embedding + linear (real = many transformer layers)
vocab_size = 50000
embed_dim = 128
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),  # predicts next-token logits
)
optimizer = optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()
# Fake batch: token ids (batch=2, seq_len=6)
inputs = torch.tensor([[   5,  234,   89, 1543,   12, 9999],
                       [1024,   67,  543,   21, 8765, 4321]])
targets = inputs[:, 1:]   # target at position t is the token at t+1 (shift left)
inputs = inputs[:, :-1]
# Forward + loss (real training does many layers, attention, etc.)
logits = model(inputs) # shape: (2, 5, 50000)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
print("Loss:", loss.item())
optimizer.zero_grad()
loss.backward()
optimizer.step()  # one gradient step; real training loops over many batches
7. Parameters
Parameters are the learned weights of the network. Modern 2026 open models: roughly 8B–405B parameters.
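Counting them for any PyTorch model is a one-liner; a sketch using the same toy model shape as section 6:

```python
import torch.nn as nn

vocab_size, embed_dim = 50000, 128
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)
# Embedding: 50000*128, Linear: 128*50000 weights + 50000 biases
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # 12,850,000
```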
8. Decoding (Generation with sampling)
Simple autoregressive generation + temperature sampling:
import torch
import torch.nn.functional as F
def generate_simple(model, start_tokens, max_new=30, temperature=0.8):
    model.eval()
    generated = start_tokens.clone()
    for _ in range(max_new):
        with torch.no_grad():
            logits = model(generated)[:, -1, :]  # logits at the last position
            logits = logits / temperature
            probs = F.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, num_samples=1)
        generated = torch.cat([generated, next_token], dim=1)
    return generated
# Usage with the toy model from section 6 (a real model would be GPT-like):
# start = torch.tensor([[40, 3021]])  # any two valid token ids
# output_ids = generate_simple(model, start)
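A common refinement of plain temperature sampling is top-p (nucleus) sampling: sample only from the smallest set of tokens whose cumulative probability exceeds p. A minimal sketch for a single position (p=0.9 and temperature=0.8 are just typical defaults):

```python
import torch
import torch.nn.functional as F

def top_p_sample(logits, p=0.9, temperature=0.8):
    # logits shape: (vocab_size,)
    probs = F.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep a token if the mass before it is still < p (the top token always stays)
    keep = (cumulative - sorted_probs) < p
    sorted_probs = sorted_probs * keep
    sorted_probs = sorted_probs / sorted_probs.sum()  # renormalize over the nucleus
    idx = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_ids[idx]

torch.manual_seed(0)
logits = torch.tensor([2.0, 1.0, 0.1, -3.0])
token = top_p_sample(logits)
print(token.item())
```

With these logits the nucleus contains only the two most likely tokens, so low-probability tail tokens are never sampled.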
9. Fine-tuning
Usually LoRA/QLoRA nowadays → change only ~0.1–1% of parameters.
No minimal code here — involves peft library + trl / SFTTrainer.
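The full peft/trl pipeline is beyond a snippet, but the core LoRA idea fits in a few lines: freeze the base weight and learn a low-rank update. A toy from-scratch sketch (this is not the peft API; rank and alpha are typical defaults):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base Linear plus a trainable low-rank update: W x + (B A x) * scale."""
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(128, 128))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable {trainable} / total {total}")  # 2048 / 18560
```

Because B starts at zero, the adapted layer initially computes exactly what the base layer does; training then moves only the small A and B matrices. (In this toy 128-dim layer the adapter fraction is large; at 7B+ scale it drops to the ~0.1–1% range.)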
10. Prompting
No code — just string engineering:
System: You are a helpful Python expert.
Think step by step before answering.
User: Write a function that reverses a string without using [::-1]
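In code, the "string engineering" usually means building a list of role-tagged messages; the dict schema below follows the common OpenAI-style chat format (adapt the keys to whatever API you use):

```python
messages = [
    {"role": "system", "content": "You are a helpful Python expert. "
                                  "Think step by step before answering."},
    {"role": "user", "content": "Write a function that reverses a string "
                                "without using [::-1]"},
]

# Flatten into a plain-text prompt, as chat templates do under the hood
prompt = "\n".join(f"{m['role'].capitalize()}: {m['content']}" for m in messages)
print(prompt)
```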
11. Inference
Running the model → covered in generation example above.
Modern tricks: quantization (bitsandbytes), FlashAttention-2/3, speculative decoding.
12. Safety
No simple code — usually extra models (moderation API) or refusal fine-tuning.
13. Evaluation
Example: perplexity (lower = better language modeling)
import torch

# perplexity = exp(average negative log-likelihood per token)
loss = 2.3  # average cross-entropy from training/eval
perplexity = torch.exp(torch.tensor(loss))
print("Perplexity:", perplexity.item())  # ≈ 9.97; ~10 is decent for a small model
14. Deployment
No code — usually FastAPI / vLLM / TGI / Triton servers.
Quick Summary Table (2026 perspective)
| Component | Typical Library/Tool (2026) | Key Idea in Code |
|---|---|---|
| Tokenization | tiktoken, sentencepiece, HF | text → [40, 3021, 2956, ...] |
| Embeddings | sentence-transformers, torch.nn | token id → 384–8192 dim vector |
| Attention | torch.nn.MultiheadAttention | Q·Kᵀ / √d → softmax → weighted V |
| Training | PyTorch + AdamW + CrossEntropy | predict shifted tokens |
| Decoding | custom sampling loop | multinomial(probs / temperature) |
| Fine-tuning | peft (LoRA), trl (SFT) | update small adapter weights |
Would you like me to expand any of these code examples (e.g. full minimal transformer, LoRA fine-tuning snippet, top-p sampling, etc.)?