You want AI that knows your business. Not generic AI that gives generic answers. AI that knows your pricing, your policies, your 300-page operations manual, and the weird exception you make for enterprise clients on Tuesdays.
There are three ways to get there. Prompt engineering costs $0. RAG costs $3K-12K to set up. Fine-tuning costs $5K-25K per model. Most businesses only need one of them. Here is how to decide.
Quick decision: If AI needs to reference your documents, use RAG. If AI needs to change how it writes or thinks, use fine-tuning. If neither, good prompts are enough. 80% of businesses we work with need RAG and nothing else.
Prompt Engineering: The Free Option That Gets You 60% There
Prompt engineering is writing better instructions for ChatGPT, Claude, or whatever model you use. No code. No infrastructure. Just clearer, more specific directions.
What it costs
$0. Just your time. You can get surprisingly far by spending an afternoon writing a solid system prompt. A well-crafted prompt with role definitions, formatting rules, and example outputs can handle 60% of business tasks out of the box.
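To make "role definitions, formatting rules, and example outputs" concrete, here is a minimal sketch of assembling such a system prompt. The sections and wording are illustrative, not a prescribed template:

```python
# A minimal sketch of a structured system prompt.
# The section names and example values are illustrative.
def build_system_prompt(role: str, rules: list[str], example: str) -> str:
    rule_lines = "\n".join(f"- {r}" for r in rules)
    return (
        f"You are {role}.\n\n"
        f"Formatting rules:\n{rule_lines}\n\n"
        f"Example output:\n{example}"
    )

prompt = build_system_prompt(
    role="a support agent for an e-commerce company",
    rules=["Answer in under 100 words", "Always name the policy you cite"],
    example="Per our Returns Policy, items can be returned within 30 days.",
)
print(prompt)
```

Paste the result into the system prompt field of whatever model you use. The structure matters more than the exact wording: role, constraints, then a worked example.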
When it works
- General content drafting (blog posts, emails, social media)
- Summarizing meeting notes or long documents you paste in
- Brainstorming product names, ad copy, or marketing angles
- Translating or reformatting text between styles
When it fails
The moment AI needs to know something specific to your business, prompts break down. "Write me a proposal" works. "What's our refund policy for Product X?" fails. The model simply does not have that information. You can paste your policy into the chat, sure. But that does not scale when you have 500 products, 10,000 case files, or a knowledge base that changes weekly.
RAG: The Right Answer for 80% of Businesses
Retrieval-Augmented Generation sounds complicated. It is not. Here is the plain-English version: before the AI answers your question, it searches your documents first. Then it uses what it found to give you an accurate, grounded answer.

How it actually works
Your documents (PDFs, Google Docs, emails, SOPs, Notion pages) get split into small chunks and stored in a vector database. When someone asks a question, the system finds the 5-10 most relevant chunks, feeds them to the AI alongside the question, and the AI generates an answer based on your actual data. Not its training data. Your data.
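Here is that pipeline in miniature. This toy sketch swaps the embedding model and vector database for simple word-count similarity, just to show the shape of chunking and retrieval; a real system would call an embedding model and store vectors in a proper database:

```python
import math
import re
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size word chunks.
    Real systems use smarter, overlap-aware chunking."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count.
    Real pipelines call an embedding model instead."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks most similar to the question. These are what
    get fed to the model alongside the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The design point: retrieval and generation are separate steps. You can improve chunking, swap the similarity function, or change the model independently, which is why RAG systems are easy to iterate on.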
What it costs
- Setup: $3,000-$12,000 depending on data volume and complexity
- Monthly: $200-$500 for hosting, embeddings, and model API calls
- Data prep: 2-4 weeks to clean, chunk, and index your documents
Real example
A law firm uploads 10,000 case files into a RAG system. A paralegal asks: "Find precedent for wrongful termination involving non-compete clauses in New York." The system searches across all 10,000 files, pulls the three most relevant cases, and summarizes the precedent with direct citations. Without RAG, the paralegal spends 6 hours doing that manually. With RAG, it takes 30 seconds.
Why RAG wins for most businesses
- Works with any model. GPT-4o, Claude, Llama, Mistral. Swap models without rebuilding anything.
- Your data stays current. Update a document and the AI knows about it immediately.
- Answers are traceable. The system can cite which document it pulled from, so you can verify accuracy.
- Works with cloud models or, for sensitive data, local ones like Ollama.
Tools like LlamaIndex make building RAG pipelines significantly easier than rolling your own from scratch. For most teams, RAG plus solid prompts is the entire solution.
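The traceability point above comes down to carrying each chunk's source through the pipeline. A small sketch (the model call itself is omitted; this only shows the prompt assembly, and the field names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str  # e.g. the document name, so answers can cite it

def answer_with_citations(question: str, retrieved: list[Chunk]) -> str:
    """Assemble a grounded prompt: the model sees the retrieved text
    tagged with its source, so its answer can cite and be verified.
    The actual model call is omitted."""
    context = "\n".join(f"[{c.source}] {c.text}" for c in retrieved)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"
```

When the model is instructed to quote the `[source]` tags in its answer, a human can open the cited document and check, which is exactly the verification loop the bullet above describes.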
Fine-Tuning: When Your AI Needs to Think Like Your Industry
Fine-tuning is fundamentally different from RAG. RAG gives the model reference material. Fine-tuning changes how the model behaves. You are teaching it new patterns, new terminology, new ways of writing.
When RAG is not enough
- Medical terminology: General models get clinical language wrong in subtle ways. A fine-tuned model trained on 2,000 radiology reports writes like a radiologist, not like a college student who read a Wikipedia article.
- Legal brief formatting: Courts require exact structures, citation formats, and procedural language. Prompts get you close. Fine-tuning gets you compliant.
- Brand voice at scale: If you need 5,000 product descriptions that all sound like your brand, not like "AI wrote this," fine-tuning is how you get there.
What it costs
- Per model: $5,000-$25,000 depending on model size and data quality
- Data requirements: 500-5,000 examples of the behavior you want (input/output pairs)
- Timeline: 4-8 weeks including data preparation, training, and evaluation
- Ongoing: Re-training costs when your requirements change
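The "500-5,000 examples of input/output pairs" above usually ends up as a JSONL file. This sketch serializes pairs in the chat-style format OpenAI's fine-tuning API expects; other providers use similar but not identical schemas, and the example pairs here are invented:

```python
import json

def to_jsonl(pairs: list[tuple[str, str]],
             system: str = "You write in the Acme brand voice.") -> str:
    """Serialize (input, output) pairs as chat-format JSONL training data.
    The system message and example content are illustrative."""
    lines = []
    for user, assistant in pairs:
        lines.append(json.dumps({"messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]}))
    return "\n".join(lines)

pairs = [
    ("Describe the ZX-9 blender.",
     "Meet the ZX-9: cafe-grade blending in a countertop footprint."),
    ("Describe the Aero kettle.",
     "The Aero kettle heats a full litre in 90 seconds, whisper-quiet."),
]
print(to_jsonl(pairs))
```

Most of the 4-8 week timeline goes into producing pairs like these, because data quality, not model size, is what determines whether the fine-tune works.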
One sentence on LoRA: it is a technique that lets you fine-tune a model by adjusting only 1-2% of its parameters, which drops the cost by 10x and makes fine-tuning feasible on a single GPU instead of a cluster.
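The "1-2% of parameters" claim is just arithmetic: instead of updating a full d-by-k weight matrix, LoRA learns two low-rank matrices of shapes (d, r) and (r, k). A quick check with illustrative layer sizes:

```python
def lora_trainable_fraction(d: int, k: int, r: int) -> float:
    """Fraction of a d-by-k weight matrix's parameters that LoRA trains:
    r*(d + k) adapter weights instead of all d*k original weights."""
    return (r * (d + k)) / (d * k)

# For a 4096x4096 projection matrix with rank r=16 (sizes are illustrative):
frac = lora_trainable_fraction(4096, 4096, 16)
print(f"{frac:.2%}")  # prints "0.78%"
```

Under 1% of the weights trained means under 1% of the optimizer state and gradients in memory, which is why a single GPU suffices.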
The Decision Flowchart
Forget the nuance for a moment. Here is the simple version:
Step 1: Does the AI need your specific data?
If no, stop. Prompt engineering is enough. Write a good system prompt with role definitions, constraints, and examples. You are done.
Step 2: Does it need to reference your documents?
If yes, build RAG. Upload your knowledge base, connect a vector database, and let the AI search your data before answering. This covers customer support, internal Q&A, sales enablement, legal research, and 90% of business use cases.
Step 3: Does it need to change HOW it writes or thinks?
If the model needs to produce outputs in a specific style, use domain-specific terminology natively, or follow formatting rules that prompts cannot reliably enforce, then fine-tune. This is the exception, not the rule.
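The three steps above are simple enough to write down as code. A sketch, with the question names paraphrasing the flowchart:

```python
def choose_approach(needs_your_data: bool,
                    references_documents: bool,
                    must_change_style: bool) -> list[str]:
    """Encode the three-step flowchart. Returns the stack to build,
    cheapest layer first."""
    stack = ["prompt engineering"]      # step 1: always start here
    if needs_your_data and references_documents:
        stack.append("RAG")             # step 2: ground answers in your docs
    if must_change_style:
        stack.append("fine-tuning")     # step 3: the exception, not the rule
    return stack
```

Note that fine-tuning only ever appears in addition to good prompts (and usually RAG), never instead of them.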
Can You Combine Them?
Yes. The best AI systems use all three together.
A fine-tuned model that speaks your industry language, connected to a RAG pipeline that searches your knowledge base, wrapped in a system prompt that defines the persona and guardrails. That is the gold standard. Enterprise companies like Bloomberg (BloombergGPT) and Harvey (legal AI) use exactly this stack.
But here is the thing: start with RAG alone. Ship it. Measure the accuracy. Collect the edge cases where it fails. Only then do you have the data to justify fine-tuning. Companies that skip straight to fine-tuning waste $15K-25K learning what they could have learned for $5K with a RAG prototype.
- Week 1-4: Build RAG system, connect your documents, deploy to a test group
- Week 5-8: Collect feedback, identify failure patterns, optimize prompts
- Week 9-12 (only if needed): Fine-tune on the specific failure cases RAG could not handle
Cloud vs Local: Where Should Your AI Run?
This is a separate decision from RAG vs fine-tuning, but it comes up in every conversation. Quick breakdown:
- Cloud (OpenAI, Anthropic, Google): Fastest to deploy, lowest upfront cost, best model quality. Fine for most businesses. Your data is processed on their servers.
- Local (Ollama, vLLM): Data never leaves your network. Required for HIPAA compliance, attorney-client privilege, financial regulations, and government contracts.
If you handle patient records, legal case files, or regulated financial data, you likely need private AI infrastructure. Everyone else should start with cloud APIs and only move to local when there is a regulatory or cost reason to do so.
The AI customization decision is not about picking the fanciest technology. It is about matching the solution to the problem. Start cheap, measure what breaks, and invest only where the data tells you to.
Need help figuring out which approach fits your business? Talk to our AI implementation team and we will map your use case to the right stack in a 30-minute call. No pitch, just clarity.