Learn Architecture

Source · Google Cloud Generative AI Leader exam guide + Vertex AI grounding, RAG, and tuning documentation

Why this matters

Exam Guide, Domain: Techniques to improve gen AI output

Most of the quality a business gets from gen AI comes from technique, not from swapping models. Better prompts, grounding on trusted data, and retrieval-augmented generation (RAG) turn a generic model into a reliable assistant that cites facts and hallucinates less. A leader who understands these levers can raise output quality at low cost and knows when the expensive option -- fine-tuning -- is actually warranted.

The concept

Google Cloud docs: Grounding and retrieval-augmented generation on Vertex AI

Prompt design shapes output by giving clear instructions, context, examples, and constraints. Grounding connects the model to authoritative sources so answers are based on your data or verified references rather than only the model's memory. RAG is a common grounding pattern: relevant documents are retrieved (often via embeddings and vector search) and inserted into the prompt so the model answers from that supplied context, and can cite it.
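The retrieve-then-insert flow described above can be sketched in a few lines. This is a minimal illustration, not the Vertex AI API: the `DOCS` store, the bag-of-words `embed` function (a stand-in for a learned embedding model), and the helper names `retrieve` and `build_prompt` are all assumptions made for the example, and no model is actually called.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory knowledge base; a real system would use a vector store.
DOCS = {
    "refunds": "Refunds are issued within 14 days of an approved return request.",
    "shipping": "Standard shipping takes 3 to 5 business days within the country.",
    "warranty": "The warranty covers manufacturing defects for 12 months.",
}


def embed(text: str) -> Counter:
    """Toy embedding: word counts. Stands in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 2) -> list[str]:
    """Return ids of the k most similar documents, skipping zero-similarity ones."""
    query = embed(question)
    scored = sorted(
        ((cosine(query, embed(text)), doc_id) for doc_id, text in DOCS.items()),
        reverse=True,
    )
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(question: str, k: int = 2) -> str:
    """Insert the retrieved documents into the prompt so the model can cite them."""
    doc_ids = retrieve(question, k)
    context = "\n".join(f"[{doc_id}] {DOCS[doc_id]}" for doc_id in doc_ids)
    return (
        "Answer using only the sources below and cite the source id in brackets.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("How long does shipping take?"))
```

The prompt that comes out carries both the instruction and the supporting text, which is what lets the model answer from supplied context instead of memory.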

Fine-tuning further trains the model on domain examples to change its behavior or style. The decision order for most teams is: prompt first, then ground or add RAG, and fine-tune only when prompting plus grounding still cannot deliver the needed quality. Prompting and grounding are cheaper, faster, and easier to update than retraining.
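As a rough illustration, the decision order can be written as a small rule. The function name `next_lever` and its boolean inputs are hypothetical simplifications; real teams judge quality against their own evaluation criteria.

```python
def next_lever(quality_ok: bool, prompt_improved: bool, grounded: bool) -> str:
    """Suggest the next technique to try, cheapest first."""
    if quality_ok:
        return "ship"
    if not prompt_improved:
        return "improve the prompt"
    if not grounded:
        return "add grounding or RAG"
    # Only reached when prompting plus grounding still leave a quality gap.
    return "consider fine-tuning"


if __name__ == "__main__":
    print(next_lever(quality_ok=False, prompt_improved=True, grounded=False))
```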

Worked scenario

Exam Guide: reduce hallucination and ground responses

A support team wants answers drawn strictly from the current knowledge base. Fine-tuning would bake today's articles into the model, which would then go stale as the articles change. Instead they use RAG: each question retrieves the top matching articles by embedding similarity, and the model answers from those, citing sources. When an article changes, they re-index -- no retraining. Hallucination drops because the model is told to answer only from retrieved context and, when nothing matches, to say it does not know. This is the grounding-first pattern the exam rewards.
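The scenario's two key behaviors, re-indexing instead of retraining and refusing when nothing matches, can be sketched as follows. Everything here is an assumption for illustration: the `KnowledgeIndex` class, the word-count `embed` stand-in for a real embedding model, the `min_score` threshold of 0.2, and the `REFUSAL` wording. The sketch stops at building the prompt and does not call a model.

```python
import math
import re
from collections import Counter

REFUSAL = "I don't know based on the current knowledge base."


def embed(text: str) -> Counter:
    """Toy embedding: word counts. Stands in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class KnowledgeIndex:
    """Hypothetical in-memory index of support articles."""

    def __init__(self) -> None:
        self._articles: dict[str, tuple[str, Counter]] = {}

    def upsert(self, article_id: str, text: str) -> None:
        """Index a new article or re-index a changed one. No model retraining."""
        self._articles[article_id] = (text, embed(text))

    def search(self, question: str, k: int = 3, min_score: float = 0.2):
        """Return up to k (id, text) pairs whose similarity clears min_score."""
        query = embed(question)
        hits = sorted(
            ((cosine(query, vec), aid, text)
             for aid, (text, vec) in self._articles.items()),
            reverse=True,
        )
        return [(aid, text) for score, aid, text in hits[:k] if score >= min_score]


def grounded_prompt(index: KnowledgeIndex, question: str) -> str:
    """Build a context-only prompt, or return the refusal if nothing matches."""
    hits = index.search(question)
    if not hits:
        return REFUSAL  # nothing retrieved: answer "I don't know", skip the model
    sources = "\n".join(f"[{aid}] {text}" for aid, text in hits)
    return (
        "Answer only from the articles below and cite the article id. "
        "If they do not contain the answer, say you do not know.\n"
        f"{sources}\nQuestion: {question}"
    )


if __name__ == "__main__":
    kb = KnowledgeIndex()
    kb.upsert("kb-1", "Password resets are sent by email within 5 minutes.")
    print(grounded_prompt(kb, "How are password resets sent?"))
    print(grounded_prompt(kb, "What is the office address?"))
```

Calling `upsert` again with the same article id replaces the stored text, so the next question is answered from the updated article.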

How it connects

Google Cloud: Responsible AI and grounding practices

Grounding relies on the embeddings and vector search covered in the fundamentals, and runs on the Vertex AI stack. It is also a Responsible AI control: citing sources and declining to answer when unsure supports transparency and human oversight. Agents (next topic) often use RAG as one of their tools.

Common traps

Jumping to fine-tuning first when better prompts or grounding would solve the problem more cheaply and stay current.
Believing grounding eliminates hallucination -- it only reduces hallucination, and the prompt must still instruct the model to answer only from retrieved context.
Treating RAG data as static: if the source is not re-indexed when it changes, answers go stale even though the model was never retrained.
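One way to guard against the stale-index trap is to compare a fingerprint of each source article with the fingerprint recorded at indexing time. The sketch below assumes articles are plain strings keyed by id; the function names and the choice of SHA-256 hashes are illustrative, not a prescribed Vertex AI mechanism.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Content hash used to detect that an article has changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def articles_to_reindex(source: dict[str, str], indexed: dict[str, str]) -> list[str]:
    """Return ids of articles that are new or whose content no longer matches.

    source:  article id -> current text in the knowledge base
    indexed: article id -> fingerprint stored when the article was last indexed
    """
    return sorted(
        article_id
        for article_id, text in source.items()
        if indexed.get(article_id) != fingerprint(text)
    )


if __name__ == "__main__":
    current = {"kb-1": "Resets are sent by SMS.", "kb-2": "Refunds take 14 days."}
    stored = {
        "kb-1": fingerprint("Resets are sent by email."),
        "kb-2": fingerprint("Refunds take 14 days."),
    }
    print(articles_to_reindex(current, stored))
```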

Key takeaways

Order of levers: prompt, then ground or add RAG, then fine-tune only if a gap remains.
RAG retrieves relevant documents into the prompt so the model answers from -- and can cite -- your data.
Grounding reduces hallucination and supports transparency; it does not make the model infallible.