Cheat sheet: AIS-05

RAG & Memory

Retrieval-augmented generation (RAG) fetches relevant text at query time and adds it to the model's prompt, grounding answers in real sources. Memory applies the same idea across turns and sessions.

Key points

Index pipeline: Chunk documents -> embed each chunk into a vector -> store the vectors in a vector store that supports nearest-neighbor search.
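
A minimal Python sketch of the index step, assuming a toy bag-of-words `embed` in place of a real embedding model and a hypothetical in-memory `InMemoryVectorStore` in place of a real vector database. Chunking here just splits on blank lines, and the chunk ids (`manual#0`, `manual#1`) are illustrative handles that can be cited later.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy embedding (hypothetical stand-in for a real model):
    an L2-normalized bag-of-words vector keyed by lowercase term."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    length = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / length for term, c in counts.items()}


class InMemoryVectorStore:
    """Hypothetical minimal store: rows of (chunk_id, text, vector)."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, str, dict[str, float]]] = []

    def add(self, chunk_id: str, text: str) -> None:
        self.rows.append((chunk_id, text, embed(text)))


def index_document(store: InMemoryVectorStore, doc_id: str, document: str) -> int:
    """Chunk (here: split on blank lines) -> embed -> store. Returns the chunk count."""
    chunks = [p.strip() for p in document.split("\n\n") if p.strip()]
    for i, chunk in enumerate(chunks):
        # The chunk id doubles as a citation handle at answer time.
        store.add(f"{doc_id}#{i}", chunk)
    return len(chunks)


store = InMemoryVectorStore()
manual = "Factory reset: hold the power button for 10 seconds.\n\nWarranty: one year."
print(index_document(store, "manual", manual))  # 2
```

A production store would typically persist the vectors and use an approximate nearest-neighbor index instead of a flat list.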

Retrieve + generate: Embed the query -> find the nearest chunks (semantic search) -> optionally re-rank -> augment the prompt -> generate an answer with citations.
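
A sketch of the query side under the same toy assumptions: a bag-of-words `embed`, exact brute-force cosine search over a dict of chunks, and a hypothetical keyword-coverage `rerank` standing in for the dedicated re-ranking model (often a cross-encoder) a production system would more likely use. `augment` tags each chunk with its id so the model can cite it; the chunk texts are made up.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> dict[str, float]:
    """Toy L2-normalized bag-of-words vector (stand-in for a real embedding model)."""
    counts = Counter(tokens(text))
    length = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / length for t, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so their dot product is the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def retrieve(query: str, chunks: dict[str, str], k: int = 3) -> list[str]:
    """Embed the query and return the ids of the k most similar chunks (exact search)."""
    q = embed(query)
    return sorted(chunks, key=lambda cid: cosine(q, embed(chunks[cid])), reverse=True)[:k]


def rerank(query: str, ids: list[str], chunks: dict[str, str]) -> list[str]:
    """Hypothetical re-ranker: order candidates by the share of query terms each contains."""
    terms = set(tokens(query))

    def coverage(cid: str) -> float:
        return len(terms & set(tokens(chunks[cid]))) / (len(terms) or 1)

    return sorted(ids, key=coverage, reverse=True)


def augment(query: str, ids: list[str], chunks: dict[str, str]) -> str:
    """Put the retrieved chunks in the prompt, each tagged [id] so the answer can cite it."""
    context = "\n".join(f"[{cid}] {chunks[cid]}" for cid in ids)
    return ("Answer using only the sources below and cite them by [id].\n"
            f"{context}\n\nQuestion: {query}")


chunks = {
    "manual#reset": "To factory reset, hold the power button for 10 seconds.",
    "manual#warranty": "The warranty covers one year of repairs.",
    "manual#wifi": "To join a network, open Settings and choose Wi-Fi.",
}
query = "How do I factory-reset?"
top = rerank(query, retrieve(query, chunks, k=2), chunks)
print(augment(query, top, chunks))  # the prompt that would be sent to the model
```

Retrieving a few more candidates than will fit in the prompt and letting the re-ranker pick the best is the usual reason to have two stages.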

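Chunking: Chunk size and overlap are tuning knobs. Chunks that are too large dilute relevance and waste context; chunks that are too small lose meaning. Overlap repeats some text between neighboring chunks so that content near a boundary keeps part of its surrounding context.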
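
One common way to apply size and overlap is a sliding window. A minimal sketch, assuming chunks are measured in whitespace-separated words (real pipelines usually count model tokens) and using defaults of 200 words with 40 words of overlap purely as placeholders, not recommended values:

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; each window repeats the last
    `overlap` words of the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))
print(chunk_words(text, size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Raising `overlap` costs extra chunks (more storage and more near-duplicate hits at query time), which is why it is tuned together with `size`.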

Memory: Short-term memory is the in-context transcript; long-term memory is facts or summaries persisted across sessions, often stored and recalled through the same embed-and-retrieve pipeline.
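
A minimal sketch of both tiers, assuming a hypothetical `Memory` class: short-term memory is a bounded `deque` of recent turns that would be sent verbatim in the prompt, and long-term memory is a list of facts recalled with the same kind of toy bag-of-words similarity used for documents and saved to a JSON file between sessions. The turn limit, file format, and example facts are assumptions for illustration.

```python
import json
import math
import re
from collections import Counter, deque


def embed(text: str) -> dict[str, float]:
    """Toy L2-normalized bag-of-words vector (stand-in for a real embedding model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    length = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / length for t, c in counts.items()}


class Memory:
    """Hypothetical two-tier memory for one user."""

    def __init__(self, max_turns: int = 6) -> None:
        # Short-term: recent turns only; the oldest drops off once maxlen is reached.
        self.transcript: deque = deque(maxlen=max_turns)
        # Long-term: facts or summaries that should outlive the session.
        self.facts: list[str] = []

    def add_turn(self, role: str, text: str) -> None:
        self.transcript.append((role, text))

    def remember(self, fact: str) -> None:
        self.facts.append(fact)

    def recall(self, query: str, k: int = 2) -> list[str]:
        """Embed-and-retrieve over stored facts; facts with zero similarity are skipped."""
        q = embed(query)
        scored = []
        for fact in self.facts:
            f = embed(fact)
            scored.append((sum(w * f.get(t, 0.0) for t, w in q.items()), fact))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for score, fact in scored[:k] if score > 0]

    def save(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(self.facts, fh)  # only long-term facts are persisted

    def load(self, path: str) -> None:
        with open(path, encoding="utf-8") as fh:
            self.facts = json.load(fh)


mem = Memory(max_turns=2)
mem.remember("User's device model is X200.")
mem.remember("User prefers short answers.")
for i in range(3):
    mem.add_turn("user", f"message {i}")
print(list(mem.transcript))  # [('user', 'message 1'), ('user', 'message 2')]
print(mem.recall("Which device model does the user have?", k=1))  # ["User's device model is X200."]
```

Many systems also summarize turns as they fall out of the short-term window and store the summary as a long-term fact; that step is omitted here.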

Build RAG in this order: chunk sensibly, embed, store, retrieve the top-k chunks, re-rank them, and inject them into the prompt with a citation format. Then evaluate retrieval quality on its own: if the right chunk is not retrieved, the answer will be wrong however capable the model is.
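
Retrieval can be scored separately from generation with a small labeled set that maps each test query to the chunk ids that answer it. A minimal sketch of two standard metrics, hit rate@k and mean reciprocal rank (MRR); the queries, ids, and retrieved lists are made up for illustration:

```python
def hit_rate_at_k(results: dict[str, list[str]], gold: dict[str, set[str]], k: int) -> float:
    """Fraction of queries whose top-k retrieved ids contain at least one relevant chunk."""
    hits = sum(1 for q, relevant in gold.items() if relevant & set(results.get(q, [])[:k]))
    return hits / len(gold)


def mean_reciprocal_rank(results: dict[str, list[str]], gold: dict[str, set[str]]) -> float:
    """Average of 1/rank of the first relevant chunk per query (0 if none was retrieved)."""
    total = 0.0
    for q, relevant in gold.items():
        for rank, cid in enumerate(results.get(q, []), start=1):
            if cid in relevant:
                total += 1.0 / rank
                break
    return total / len(gold)


gold = {"factory reset": {"manual#reset"}, "warranty length": {"manual#warranty"}}
results = {
    "factory reset": ["manual#wifi", "manual#reset"],
    "warranty length": ["manual#wifi", "manual#setup"],
}
print(hit_rate_at_k(results, gold, k=1))  # 0.0
print(hit_rate_at_k(results, gold, k=2))  # 0.5
print(mean_reciprocal_rank(results, gold))  # 0.25
```

The "warranty length" query is the failure described above: the right chunk never reaches the prompt, so no amount of prompt or model tuning can fix that answer.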

Examples

Grounded answer: Ask 'how do I factory-reset?' -> retrieve the reset-procedure chunk -> the model answers with the real steps and cites the manual section.
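
One cheap check on a grounded answer is that every citation it contains points at a chunk that was actually retrieved. A minimal sketch, assuming the model was asked to cite sources as `[chunk_id]`; the ids and answer texts are made up:

```python
import re

# Matches a bracketed citation such as [manual#reset] and captures the id inside.
CITATION = re.compile(r"\[([^\[\]]+)\]")


def check_citations(answer: str, retrieved_ids: set[str]) -> tuple[bool, list[str]]:
    """Return (grounded, unknown_ids). grounded is True only when the answer cites
    at least one source and every cited id was among the retrieved chunks."""
    cited = CITATION.findall(answer)
    unknown = [cid for cid in cited if cid not in retrieved_ids]
    return bool(cited) and not unknown, unknown


retrieved = {"manual#reset"}
print(check_citations("Hold the power button for 10 seconds [manual#reset].", retrieved))
# (True, [])
print(check_citations("Unplug it overnight [forum#42].", retrieved))
# (False, ['forum#42'])
```

This only confirms that citations point at retrieved chunks; it does not verify that the cited text actually supports the claim.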

Honest miss: If nothing relevant is retrieved, a good system says 'I do not have that information' rather than inventing a procedure.
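
A minimal sketch of an honest-miss guard, assuming the retriever returns `(similarity, chunk_id, text)` tuples and that `generate` is whatever function calls the model (a hypothetical fake is used here). The 0.35 similarity floor is an arbitrary placeholder; a sensible value depends on the embedding model and is best tuned on labeled queries.

```python
from typing import Callable

REFUSAL = "I do not have that information."


def respond(
    question: str,
    scored_chunks: list[tuple[float, str, str]],
    generate: Callable[[str], str],
    min_score: float = 0.35,
) -> str:
    """scored_chunks: (similarity, chunk_id, text) from the retriever.
    generate: the function that calls the model (hypothetical here)."""
    # Drop chunks below the relevance floor; 0.35 is a placeholder, not a recommendation.
    context = [(cid, text) for score, cid, text in scored_chunks if score >= min_score]
    if not context:
        return REFUSAL  # honest miss: nothing relevant, so do not call the model at all
    sources = "\n".join(f"[{cid}] {text}" for cid, text in context)
    prompt = (
        "Answer only from the sources below and cite them by [id]. "
        f"If they do not contain the answer, reply exactly: {REFUSAL}\n"
        f"{sources}\n\nQuestion: {question}"
    )
    return generate(prompt)


def fake_model(prompt: str) -> str:
    return "(model output)"  # hypothetical stand-in for a real LLM call


weak = [(0.12, "manual#warranty", "The warranty covers one year of repairs.")]
print(respond("How do I pair a second remote?", weak, fake_model))
# I do not have that information.
```

Two guards are layered here: the score floor skips generation when retrieval comes back empty-handed, and the prompt instruction covers chunks that pass the floor but still do not contain the answer.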

Reference

index: chunk -> embed -> store in vector store
query: embed -> nearest-neighbor search -> re-rank -> augment -> generate
long-term memory reuses the same embed-and-retrieve pipeline
