Building Intuition for Generative AI Models
On top of building internal AI tools, I've increasingly found myself in a teaching role. Below are the practical takeaways I presented on the last slide of my presentation, "Inside the Black Box." The goal was to build an intuition for transformer-based generative AI models by diving into their technical details with practical examples.
The analogy I used for building an intuition of these AI models was surfing. In surfing, you need to get in the water and figure out how the ocean works: how wind and the ocean floor shape the wave, how to read a wave and get into position, and how your board and fin setup change with conditions. You can watch videos all day and practice on a surf skate, but you'll learn much faster and develop an intuition by actually getting out into the ocean and surfing. Similarly, users need to build an intuition of these AI models to use them effectively, and that starts with understanding how they work. Certain tricks and prompting ideas help, but what happens when the architecture changes or the next model comes out? Hopefully, these takeaways will be helpful!
Practical Takeaways
Model Knowledge and Context: Training Process
Large-scale models undergo a lengthy training process, are trained on vast, often unknown, datasets, and have knowledge cutoffs. When you ask about internal or company-specific topics, they won't have the necessary context. Feed them the right information for your task.
Stochastic/Probabilistic Systems: Next Word Prediction
These systems can produce different outputs each run, even for identical prompts. Always validate your results.
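At each step of generation, the model samples the next token from a probability distribution rather than always picking the single most likely word. A minimal sketch, using an invented toy distribution (the tokens and probabilities here are illustrative, not from a real model):

```python
import random
from collections import Counter

# Hypothetical next-token distribution for some prompt; real models produce
# a distribution over tens of thousands of tokens at every step.
next_token_probs = {"vast": 0.5, "blue": 0.3, "calm": 0.2}

def sample_next_token(probs: dict[str, float]) -> str:
    """Draw one token according to its probability: the core stochastic step."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# The same "prompt" yields different continuations across runs.
counts = Counter(sample_next_token(next_token_probs) for _ in range(1000))
print(counts)  # roughly 500/300/200, but the exact split varies per run
```

Because the draw is random, two identical requests can walk down different token paths, which is why outputs differ between runs.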
Requesting Reviews and Advice: Sycophancy
When asking for reviews, advice, or critical feedback, avoid injecting personal preferences or views to get transparent responses.
Emergent Behaviors: Scale of Models and Training Process
The training process, data, and sheer scale of these models are leading to unexpected behaviors – some deliberate, others unknown. Be aware of:
- Hallucinations (e.g., Grok's "MechaHitler" episode; Gemini's historically inaccurate image generations)
- Deception (e.g., blackmail: Anthropic demonstrated that models may choose undesirable actions when pushed to their limits - https://www.anthropic.com/research/agentic-misalignment)
- Sycophancy (e.g., ChatGPT (GPT-4o) went through a period of excessively flattering users; research shows models may deliberately lie or refrain from correcting users depending on the prompt)
- Obscuring Facts (e.g., ask DeepSeek, Qwen, or other Chinese-based models about Tiananmen Square or Winnie the Pooh)
- Jailbreaks – Models can be "jailbroken" with clever tricks
Prompt Structuring: Architecture – Attention Heads
Attention tends to be strongest near the front of the prompt, so when structuring your prompts, put the most important information at the start.
Context Window: Architecture
The context window is like working memory: the model can only attend to a limited number of tokens (word pieces, not whole words) at once. Match your task to the model's abilities and its maximum context length.
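One practical consequence: when a conversation outgrows the context window, older content has to be dropped or summarized. A rough sketch of budget-based trimming, using whitespace word counts as a crude stand-in for a real tokenizer (real systems count BPE tokens, and the messages below are invented):

```python
def rough_token_count(text: str) -> int:
    # Crude proxy: real tokenizers split into learned sub-word pieces.
    return len(text.split())

def trim_to_context(messages: list[str], max_tokens: int) -> list[str]:
    """Walk backwards from the newest message, keeping what fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = rough_token_count(msg)
        if used + cost > max_tokens:
            break  # everything older than this is dropped
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [
    "System: you are a helpful assistant",
    "User: summarize our Q3 roadmap",
    "Assistant: here is a summary ...",
    "User: now draft an email about it",
]
print(trim_to_context(history, max_tokens=12))  # keeps only the newest message
```

Newest-first trimming is the simplest policy; production systems often pin the system prompt and summarize the middle instead of dropping it outright.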
Precision of Prompts: Tokenization Process
Precision matters. Spell out acronyms, use precise language, and be detailed in your task-specific instructions. Different wordings produce different tokens, and tokens, along with special tokens, are what actually trigger the model's pathways.
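To see why spelling things out helps at the token level, here is a toy greedy longest-match tokenizer over an invented vocabulary. Real tokenizers (e.g. BPE) learn their vocabularies from data; the vocabulary and example strings below are made up for illustration:

```python
# Tiny hypothetical vocabulary: common words are whole entries,
# so unfamiliar strings must be assembled from short fragments.
VOCAB = {"machine", "learning", "m", "l", "op", "s", "ops"}

def tokenize(text: str) -> list[str]:
    """Greedily match the longest vocabulary entry at each position."""
    tokens, i = [], 0
    text = text.lower().replace(" ", "")
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest match first
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character falls back to itself
            i += 1
    return tokens

print(tokenize("MLOps"))             # fragments into sub-word pieces
print(tokenize("machine learning"))  # maps to whole, familiar tokens
```

An unexpanded acronym arrives as a pile of small fragments the model must reassemble, while the spelled-out phrase lands on tokens it has seen millions of times.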
Prompt Structuring: Latent Space – Anchoring
The model thinks in a high-dimensional concept space. To guide it, provide clear anchors: use specific examples, names, or concrete terms instead of vague descriptions.
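"Closeness" in that concept space is commonly measured with cosine similarity. A sketch with invented 3-dimensional embeddings (real models use hundreds or thousands of learned dimensions; these vectors and phrases are purely illustrative):

```python
import math

# Hypothetical embeddings: specific phrasing lands near the concept you want,
# while a vague reference sits far away in the space.
embeddings = {
    "quarterly sales report": [0.9, 0.1, 0.2],
    "Q3 revenue summary":     [0.85, 0.15, 0.25],
    "that document":          [0.2, 0.6, 0.7],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

target = embeddings["quarterly sales report"]
for phrase in ("Q3 revenue summary", "that document"):
    print(phrase, round(cosine_similarity(target, embeddings[phrase]), 3))
```

A concrete anchor like "Q3 revenue summary" scores far closer to the intended concept than "that document", which is the geometric intuition behind using specific names and examples.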
Prompt Structuring: Superposition – Disambiguation
The model compresses multiple concepts into shared neurons, causing ambiguity. Clarify meaning explicitly (e.g., "Apple, the tech company" vs. "apple, the fruit").
Controlling Model Response: Hyperparameters
Hyperparameters are settings that influence how a model generates responses. Adjusting these can help tailor outputs to your specific needs:
- Temperature (creativity)
- Top P (diversity)
- Presence Penalty (novelty)
- Reasoning Effort
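A rough sketch of how two of these knobs work under the hood: temperature rescales the model's raw scores (logits) before they become probabilities, and top-p keeps only the most likely tokens up to a cumulative probability cutoff. The logits below are invented for illustration:

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs: list[float], top_p: float) -> list[int]:
    """Keep the smallest set of indices whose cumulative probability >= top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four candidate tokens

sharp = softmax_with_temperature(logits, temperature=0.2)
flat = softmax_with_temperature(logits, temperature=2.0)
print(max(sharp), max(flat))  # low temperature concentrates probability mass

print(top_p_filter(softmax_with_temperature(logits, 1.0), top_p=0.6))  # [0, 1]
```

This sketch covers temperature and top-p only; presence penalty and reasoning effort act elsewhere in the pipeline (penalizing already-used tokens, and allotting more "thinking" tokens, respectively).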
Tools for Better Outputs: Training Process
Models have training data cutoffs and limitations. Use the appropriate tools to get the best results:
- For the latest information, use the web search tool.
- For math operations or data analysis, use the coding interpreter tool.
Choosing the Right Model: Data and Training Process
Data and training process affect which model you choose for particular tasks:
- GPT-4o: General tasks, text, images
- o3: Reasoning tasks; spends extra time simulating "thinking" before answering
- Claude Sonnet: Coding
Conclusion
As these AI models become more prevalent across enterprise and commercial tools, building an intuition will be necessary for working with them effectively. We are now in an era where knowing how to use these models and integrating them into your specific workflows will set you and your organization apart from the rest.
Control your infrastructure, understand your data, own your platform, and maintain model independence.