AI 101: key terminology explained
Everyone’s talking about AI. Boards and government ministers are asking about it. Your competitors are deploying it. Your team is using it, possibly without telling you. And somewhere in every conversation, someone drops a term … LLM, agent, inference, embedding, MCP, tokens … and half the room nods like they understand.
Let's fix that. This is AI 101.
What is a Large Language Model (LLM)?
A Large Language Model is the engine underneath most of what people call “AI”. ChatGPT, Claude, Gemini, Llama, Kimi, Qwen … these are all LLMs. The name tells you something useful. These models are large (trained on enormous amounts of text), they’re about language (they understand and generate text), and they’re models (mathematical representations of patterns in that data).
An LLM has read, in effect, a significant portion of the internet, millions of books, scientific papers, legal documents, code repositories and more. From all of that, it has learned patterns: how language works, how ideas connect, how problems get solved. When you ask it a question, it uses those patterns to generate a response. That response is “probabilistic”, i.e. it is what the model judges most likely to come next, given your input and what it learned in training.
What it’s not doing is looking things up in a database or following a script, like traditional software and algorithms do. The LLM is predicting what the most useful and coherent response looks like. That distinction matters a lot for understanding both its power and its limitations.
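To make “predicting” concrete, here is a toy sketch in Python. It is not how any real model is implemented: the candidate words and their scores are made up for illustration, and a real LLM scores tens of thousands of candidates using billions of parameters. The idea it shows is the same, though: turn scores into probabilities, then pick a word.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores.values())
    exps = {word: math.exp(s - highest) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Hypothetical scores a model might assign to candidate next words
# after the prompt "The capital of France is". The numbers are invented.
scores = {"Paris": 6.0, "Lyon": 2.5, "a": 1.0, "beautiful": 0.5}

probabilities = softmax(scores)
most_likely = max(probabilities, key=probabilities.get)

# Sampling picks a word in proportion to its probability, which is why
# the same prompt can produce different answers on different runs.
rng = random.Random(0)
sampled = rng.choices(
    list(probabilities), weights=list(probabilities.values()), k=1
)[0]

print("most likely:", most_likely, round(probabilities[most_likely], 3))
print("sampled:", sampled)
```

Nothing here is looked up in a database. The answer is whichever word scores highest, or a word drawn at random in proportion to those scores.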
The “large” part is significant. The difference between a small model and a large one is a trade-off between speed and capability. Smaller models are generally faster and cheaper to run. Larger models, trained on more data with more parameters, can reason across complex problems, follow nuanced instructions, and handle ambiguity in ways smaller models can’t.
What is an AI Agent?
If an LLM is a brain, an agent is that brain with hands and legs.
An AI agent can take actions such as browsing the web, running code, reading files, sending emails, querying databases, filling in forms, and interacting with other software systems. It doesn’t just answer a question; it works through a task or workflow across multiple steps, making decisions along the way.
Think of it like the difference between asking someone a question and hiring someone to do a job. An LLM answers your question. An agent takes your goal, figures out the steps to get there, executes them, and comes back with the result.
Agents are where AI goes from interesting to genuinely transformative for business. A well-designed agent can handle an entire workflow … reading a brief, researching the market, drafting a response, and sending it for review … without a human touching it until the end. This isn’t theoretical; organisations are running these in production today.
The key concept is autonomy combined with tools. Agents have access to tools (APIs, applications, databases) and the judgment to decide when and how to use them. The quality of that judgment depends on the underlying model, the quality of the agent’s design, and how well it’s been configured for your specific context.
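The “autonomy plus tools” idea can be sketched as a simple loop. Everything below is a stand-in invented for illustration: `search_market`, `draft_reply` and `choose_next_step` are hypothetical names, and in a real agent the decision step would be a call to an LLM rather than a few `if` statements.

```python
def search_market(topic):
    """Stand-in tool: a real agent would call a search API here."""
    return f"3 competitors found for {topic}"

def draft_reply(notes):
    """Stand-in tool: a real agent would ask the model to write this."""
    return f"Draft based on: {notes}"

# The tools the agent is allowed to use, looked up by name.
TOOLS = {"search_market": search_market, "draft_reply": draft_reply}

def choose_next_step(goal, history):
    """Stand-in for the model's judgment.

    Returns (tool_name, argument), or None when the goal is met.
    """
    if not history:
        return ("search_market", goal)
    if len(history) == 1:
        return ("draft_reply", history[-1][1])
    return None

def run_agent(goal, max_steps=5):
    """Decide, act, record the result, repeat until done or out of steps."""
    history = []
    for _ in range(max_steps):
        step = choose_next_step(goal, history)
        if step is None:
            break
        tool_name, argument = step
        result = TOOLS[tool_name](argument)
        history.append((tool_name, result))
    return history

history = run_agent("office coffee suppliers")
for tool_name, result in history:
    print(tool_name, "->", result)
```

The `max_steps` cap is a deliberate design choice in this sketch: an agent that decides its own next step needs a limit on how long it can keep going.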
What is Inference?
Inference is what happens when a model is used. You type something in, the model processes it, and produces an output. That’s inference. The model is inferring an output from what it learned in training and what you just gave it.
Why does the word matter? Because in AI, there's a hard distinction between training (building the model) and inference (using it). Training happens once, or occasionally. Inference happens every single time someone uses the model.
For business leaders, inference is where the cost and privacy questions live. Every time your team asks an AI a question, inference is happening somewhere. On whose servers? With whose data? Under what terms? These are not abstract concerns. They are operational, commercial and legal ones.
Inference speed and cost also matter. Fast, cheap inference means your team can use AI interactively throughout their day. Slow or expensive inference means they’ll use it sparingly, or find workarounds. The architecture choices you make about where and how inference runs have downstream consequences for adoption and value.
What is an Embedding?
An embedding is a way of representing information … a document, an image, audio … as a list of numbers. Specifically, numbers that capture the meaning of that information, not just its surface form.
This is powerful because traditional keyword search finds documents that contain the words you searched for. Embedding-based search finds documents that mean what you're looking for, even if they use completely different words. With embeddings, if you search for “quarterly revenue shortfall” you will find documents that discuss “Q3 missed targets” or “revenue below forecast”, because the embeddings of those phrases sit close together in meaning-space.
This is what makes “chat with your documents” products work. Your documents get converted to embeddings, stored in what’s called a vector database, and when you ask a question, the system finds the most semantically relevant chunks and hands them to the LLM to answer from. It's not magic … it's embeddings doing the heavy lifting.
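Here is a minimal sketch of “close together in meaning-space”. The three-number vectors are invented for illustration; a real embedding model produces hundreds or thousands of numbers per item, and a vector database does this comparison at scale. Cosine similarity, used below, is one common way to measure closeness.

```python
import math

def cosine_similarity(a, b):
    """Score how closely two vectors point the same way (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings. In practice an embedding model generates these.
documents = {
    "Q3 missed targets": [0.9, 0.8, 0.1],
    "Revenue below forecast": [0.8, 0.9, 0.2],
    "Office party planning": [0.1, 0.0, 0.9],
}

# Made-up embedding for the query "quarterly revenue shortfall".
query_vector = [0.85, 0.85, 0.15]

# Rank documents from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query_vector, documents[name]),
    reverse=True,
)

for name in ranked:
    print(round(cosine_similarity(query_vector, documents[name]), 3), name)
```

The two revenue documents score close to 1 and the party-planning document scores far lower, even though none of them shares a word with the query.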
For business, this means you can give an AI access to your internal knowledge, such as contracts, policies, research and reports, and have it answer questions from that knowledge accurately, without the model having “learned” your data in any permanent sense.
What types of Models exist?
Not all AI models are the same. The broad categories worth knowing:
Foundation models are the large, general-purpose models trained on massive datasets. GPT-4, Claude, Gemini, Llama, Mistral, Qwen etc. are foundation models. They can do almost anything language-related, to varying degrees of quality.
Fine-tuned models are foundation models that have been further trained on specific data to perform better in a particular domain such as legal, medical, financial, customer service. They retain the general capability but are sharper in their target area.
Open models (sometimes called open-source models) are models whose weights (i.e. the actual trained parameters) have been released publicly. Llama (Meta), Mistral, Qwen (Alibaba), Kimi (Moonshot) and others are open models. Anyone can download them, run them, and modify them. This matters enormously for privacy, cost, and control, which is covered in the next blog.
Proprietary models are owned and operated by their creators: OpenAI's GPT-5, Anthropic's Claude, Google's Gemini etc. These run on the vendor’s infrastructure. You access them via API, and your data passes through their systems, which is convenient but not without implications.
Multimodal models can handle inputs beyond plain prose, such as images, audio, video and code. Recent releases of GPT, Claude and Gemini are all multimodal. This matters because the real world isn’t just text, and the most powerful workflows often combine modalities.
Embedding models are a specific type, designed purely to convert content into the numerical representations described above. They’re not for generating answers. They’re for finding and retrieving relevant content.
What is MCP (Model Context Protocol)?
MCP is an open standard that lets AI models connect to external tools, data sources and services in a consistent way. Think of it as a universal plug: instead of every AI vendor building custom integrations with every application, MCP gives developers one standard way to expose their tools so any compatible AI can use them. It’s what allows an AI agent to, say, read files from your computer, query a database, or send a Slack message, rather than just responding with text.
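Under the hood, MCP messages are JSON-RPC 2.0 requests and responses. The sketch below imitates that shape in plain Python so you can see what “one standard way to expose a tool” looks like. It is a simplified illustration, not the official SDK: `query_database` is a hypothetical tool returning canned data, and a real server also describes each tool's inputs and handles connection setup.

```python
import json

def query_database(sql):
    """Hypothetical tool: returns canned data instead of touching a real database."""
    return "42 rows"

# Tools this toy server exposes, looked up by name.
TOOLS = {"query_database": query_database}

def handle_request(raw_request):
    """Answer a JSON-RPC style request for the two tool methods sketched here."""
    request = json.loads(raw_request)
    if request["method"] == "tools/list":
        # Simplified: real tool listings also carry descriptions and input schemas.
        result = {"tools": [{"name": name} for name in TOOLS]}
    elif request["method"] == "tools/call":
        params = request["params"]
        text = TOOLS[params["name"]](**params["arguments"])
        result = {"content": [{"type": "text", "text": text}]}
    else:
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": request["id"], "error": error})
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})

# What an AI application would send to ask for a tool to be run.
call = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "query_database", "arguments": {"sql": "SELECT 1"}},
})

response = json.loads(handle_request(call))
print(response["result"]["content"][0]["text"])
```

Because every tool is listed and called through the same message shapes, the AI application does not need custom code for each integration.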
What are tokens?
Tokens are the basic units an AI model uses to read and write text. They are roughly equivalent to word fragments. The word “running” might be one token, “unbelievable” might be three. When people talk about a model’s context window (how much it can hold in working memory at once) they’re measuring it in tokens. Tokens also determine cost when using cloud AI services: you’re billed on how many tokens go in (your input) and how many come out (the model’s response). Longer documents and conversations consume more tokens.
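The billing arithmetic is simple enough to sketch. The prices and usage figures below are hypothetical, chosen only to show the calculation. Real prices vary by vendor and model, and output tokens are often priced higher than input tokens.

```python
# Hypothetical prices in US dollars per million tokens. Check your vendor's rates.
PRICE_PER_MILLION_INPUT = 3.00
PRICE_PER_MILLION_OUTPUT = 15.00

def estimate_cost(input_tokens, output_tokens):
    """Estimate the cost of one request from its input and output token counts."""
    total = (
        input_tokens * PRICE_PER_MILLION_INPUT
        + output_tokens * PRICE_PER_MILLION_OUTPUT
    )
    return total / 1_000_000

# Assumed usage: each request sends 2,000 tokens and gets 500 back,
# and the team makes 10,000 requests a month.
cost_per_request = estimate_cost(input_tokens=2_000, output_tokens=500)
monthly_cost = cost_per_request * 10_000

print(f"per request: ${cost_per_request:.4f}")
print(f"per month:   ${monthly_cost:.2f}")
```

Under these made-up numbers, each request costs a little over a cent and the month comes to about $135. Double the length of the documents you send in and the input side of that bill doubles with it.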
Summary
AI is about pattern recognition at unprecedented scale. LLMs are extraordinarily capable at language tasks. Agents extend that capability into action. Inference is what happens when you use a model. Embeddings make your own knowledge searchable and useful. MCP gives agents a standard way to connect to tools. Tokens are the unit you pay in.