What RAG and fine-tuning actually do
Retrieval-augmented generation (RAG) leaves the model's weights untouched. At query time, it searches a knowledge base — usually a vector database like Pinecone or Weaviate — pulls the most relevant chunks, and feeds them to the model as context. The model reasons over facts it was handed, not facts it memorized.
Fine-tuning changes the model itself. You take a base model and continue training it on your examples, adjusting its weights so the behavior you want becomes the default. The knowledge or style is baked in, not retrieved.
The distinction matters because it decides how your system handles change. RAG updates the moment you update the data. Fine-tuning requires a new training run every time the underlying knowledge shifts.
When to use RAG
Choose RAG when answers depend on facts that change — product docs, policies, pricing, support history, or any internal knowledge base. Because retrieval happens at query time, a document edit is live immediately with no retraining.
RAG also wins when you need citations. Since the model answers from retrieved passages, you can show users exactly which source backed each claim — essential for legal, finance, healthcare, and any domain where a wrong answer is expensive.
It's the cheaper and faster path to production for most business use cases: a knowledge assistant, a support copilot, a search-over-documents tool. You ship value without a training pipeline.
When to use fine-tuning
Fine-tuning earns its keep when you need consistent style, tone, or output format that prompting can't reliably enforce — a specific JSON schema, a brand voice, a domain-specific phrasing the base model keeps drifting away from.
It also helps with narrow classification and extraction tasks at scale, where a smaller fine-tuned model can match a larger general model at a fraction of the cost and latency.
What fine-tuning does not do well is inject fresh facts. Teaching a model new knowledge by fine-tuning is expensive, leaks at the edges, and goes stale. For knowledge, reach for RAG.
The pragmatic answer: usually both, in order
In practice the strongest systems combine them: RAG to ground answers in current data, and a lightly fine-tuned model to lock the format and behavior. But order matters. Start with RAG and good prompting, measure where it falls short, and only fine-tune to close a specific, named gap.
We scope this choice during the Blueprint phase — before any training budget is spent — so the architecture matches the outcome you're actually paying for.