Learn Architecture

Source · Google Cloud Generative AI Leader exam guide + Google Cloud generative AI documentation

Why this matters

Exam Guide, Domain: Fundamentals of Gen AI

Every Google Cloud generative AI product sits on top of the same core ideas: large language models, tokens, prompts, and embeddings. As a Gen AI Leader you rarely write model code, but you must judge feasibility, cost, and risk. If you cannot tell when a foundation model is the right tool and when a classic rule engine or a fine-tuned classifier would serve better, you risk greenlighting projects that fail. The fundamentals are the vocabulary that lets you translate a business goal into a solvable AI problem.

The concept

Google Cloud docs: Introduction to generative AI

Generative AI creates new content -- text, images, code, audio -- rather than only classifying or predicting a label. Modern text generation is driven by large language models (LLMs), a type of foundation model: a very large model pre-trained on broad data that can be adapted to many downstream tasks.

LLMs work with tokens, not whole words. A token is a chunk of text (often a word-piece); billing and context limits are measured in tokens. The model predicts the next token given prior tokens, one at a time. A prompt is the input instruction plus any context you supply. An embedding is a numeric vector that captures the meaning of text, so that similar meanings sit near each other in vector space -- this is what powers semantic search and retrieval.
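To make "similar meanings sit near each other in vector space" concrete, here is a minimal sketch of semantic ranking by cosine similarity. The three-dimensional vectors and the phrases are hypothetical and hand-written for illustration; real embeddings come from an embedding model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, embeddings):
    """Return the stored phrases sorted from most to least similar to the query."""
    return sorted(
        embeddings,
        key=lambda phrase: cosine_similarity(query_vec, embeddings[phrase]),
        reverse=True,
    )


# Hypothetical toy embeddings (assumption: not produced by any real model).
embeddings = {
    "refund policy": [0.9, 0.1, 0.0],
    "return an item": [0.8, 0.2, 0.1],
    "store opening hours": [0.0, 0.2, 0.9],
}

# Hypothetical vector standing in for the query "how do I get my money back".
query_vec = [0.85, 0.15, 0.05]

ranking = rank_by_similarity(query_vec, embeddings)
# The two refund-related phrases rank above the unrelated one.
```

The point of the sketch is that retrieval compares vectors, not keywords: the query shares no words with "refund policy", yet it lands close to it in this toy space.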

Worked scenario

Exam Guide: identify appropriate Gen AI use cases

A retailer wants to auto-summarize thousands of product reviews. A leader asks: is this generative? Yes -- it produces new summary text, so an LLM fits. Next: what are the risks? The model may hallucinate, stating facts not present in the reviews. Because input and output are both billed per token, very long reviews raise cost, so you cap input length. Because ratings must be exact, you compute the numeric star average in conventional code and let the LLM handle only the prose. This split -- LLM for language, deterministic code for facts and math -- is the fundamentals reasoning a leader is expected to show.
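The split described in this scenario could look roughly like the sketch below. It makes several assumptions: the function names, the sample reviews, the prompt wording, and the 500-character cap are all hypothetical, and the cap uses characters as a simple stand-in for a token limit. No model is called; the sketch only builds the prompt that would be sent to an LLM.

```python
from statistics import mean

MAX_REVIEW_CHARS = 500  # hypothetical cap; a real system would budget in tokens


def star_average(ratings):
    """Compute the exact star average in deterministic code, not with the LLM."""
    return round(mean(ratings), 2)


def build_summary_prompt(reviews, max_chars=MAX_REVIEW_CHARS):
    """Trim each review to the cap and assemble a summarization prompt."""
    trimmed = [review[:max_chars] for review in reviews]
    review_lines = "\n".join(f"- {review}" for review in trimmed)
    return (
        "Summarize the following product reviews in three sentences. "
        "Use only information stated in the reviews.\n"
        + review_lines
    )


# Hypothetical sample data.
reviews = [
    "Battery lasts all day and the screen is bright.",
    "Solid build, but the charger stopped working after a month.",
    "Too heavy for travel. " * 100,  # a very long review that will be trimmed
]
ratings = [5, 4, 2]

average = star_average(ratings)         # facts and math: conventional code
prompt = build_summary_prompt(reviews)  # language: text handed to the LLM
```

The star average never passes through the model, so it cannot be hallucinated, and the cap bounds how much text each review can contribute to the billed input.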

How it connects

Google Cloud: Generative AI on Vertex AI overview

These primitives recur throughout the stack. Tokens explain Vertex AI pricing. Embeddings underpin grounding and retrieval-augmented generation. The tendency of LLMs to hallucinate is exactly why Responsible AI, grounding, and human oversight exist. Master the fundamentals and every later topic becomes an application of them.

Common traps

Assuming all AI is generative -- classification, forecasting, and recommendation are often better solved by traditional ML than by an LLM.
Confusing tokens with words: pricing and context windows are in tokens, and one word can be several tokens.
Trusting fluent output as factual -- a confident, well-written answer can still be a hallucination.
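The gap between words and tokens can be illustrated with a back-of-the-envelope estimate. This is a sketch under stated assumptions: the ratio of roughly four characters per token is a rough heuristic for English text, not a real tokenizer, and the price constant is hypothetical. Actual counts come from the model's own tokenizer and actual rates from the current pricing page.

```python
import math

CHARS_PER_TOKEN = 4                # assumption: rough heuristic for English text
PRICE_PER_1K_INPUT_TOKENS = 0.001  # hypothetical price, for illustration only


def estimate_tokens(text):
    """Approximate the token count from the character count (not a tokenizer)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_input_cost(texts, price_per_1k=PRICE_PER_1K_INPUT_TOKENS):
    """Return (estimated total tokens, estimated cost) for a batch of inputs."""
    total_tokens = sum(estimate_tokens(text) for text in texts)
    return total_tokens, total_tokens / 1000 * price_per_1k


# One long word counts as a single word but as several estimated tokens.
word = "internationalization"
word_count = len(word.split())
token_estimate = estimate_tokens(word)

# Hypothetical batch: 1,000 short reviews.
batch = ["Great product"] * 1000
total_tokens, cost = estimate_input_cost(batch)
```

Budgeting by word count would undercount here; estimating in tokens, even crudely, keeps cost and context-window planning in the right unit.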

Key takeaways

Generative AI creates new content; LLMs are foundation models adapted to many tasks.
Tokens drive cost and context limits; embeddings encode meaning for semantic search.
Hallucination is intrinsic to LLMs, which motivates grounding and human oversight.