RAG (Retrieval-Augmented Generation): A Complete Guide That Makes It Clear for Beginners

How are documents chunked? How does search work when a query arrives? A Step-by-Step Explanation

Introduction: Why Learn RAG?

Many of today's AI chatbots, document search systems, and company assistants are built with RAG (Retrieval-Augmented Generation).

If you are:

  • learning Data Science
  • looking to learn AI/LLM engineering
  • planning to build a chatbot or document search system

then understanding RAG is an essential skill.

In this blog post we explain RAG step by step, so that even a beginner with zero prior knowledge can follow along.


What Is RAG? (Simple Definition)

RAG = Retrieval + Generation

This means:

  • Retrieval → finding the relevant information in your documents
  • Generation → the LLM generates the answer

In simple terms:

RAG is a system in which the AI first retrieves the right information from your documents and then generates its answer based on that information.


Understanding It Through a Real-Life Example

Imagine you have:

  • 1000 PDF files
  • Company policy
  • Research notes

A user asks:

"How many days of sick leave do I get?"

Instead of guessing, the AI:

  1. Finds the documents related to sick leave
  2. Reads that part
  3. Answers based on it

This process is RAG.


Main Parts of a RAG System

A RAG system typically has these 5 components:

  1. Documents
  2. Chunking
  3. Embeddings
  4. Vector Database
  5. LLM

Step 1: Document Loading (Preparing the Data)

We start with a large document.

Example: a document that covers

  • Leave policy
  • Sick leave
  • Work from home

This document comes as long paragraphs.

Problem:

Searching a large document as one block is slow, and a match on the whole document does not tell you which part holds the answer.

So the next step is:

Chunking


Step 2: Chunking (the Most Important Concept)

Chunking means:

Splitting a large document into small pieces (chunks).

Example:

Original document:

  • Section 1 → Leave policy
  • Section 2 → Sick leave
  • Section 3 → Work from home

After chunking:

Chunk 1:
Leave Policy
Employees can take 20 days annual leave.

Chunk 2:
Sick Leave
Employees can take up to 10 days of sick leave when ill.

Chunk 3:
Work From Home
Employees can work from home when approved.

Why Create Chunks?

  • Search becomes faster
  • Only the relevant part is retrieved
  • Memory use is more efficient
  • Accuracy improves, because the LLM sees less unrelated text

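What Chunk Size Should You Use?

A typical rule of thumb:

  • 300–500 words per chunk
  • 50–100 words of overlap

Example (500-word chunks, 50-word overlap):

Chunk 1 → words 1–500
Chunk 2 → words 451–950
Chunk 3 → words 901–1400

Why overlap?

So that an important sentence sitting on a chunk boundary is not cut off and lost.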
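
The sliding-window idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production splitter: the function name `chunk_words` and its parameters are assumptions made for this example, and it splits on whitespace only. With `chunk_size=500` and `overlap=50`, each new chunk starts 450 words after the previous one.

```python
def chunk_words(text, chunk_size=500, overlap=50):
    """Split text into word-based chunks; neighbours share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical 1400-word document: "w1 w2 ... w1400"
text = " ".join(f"w{i}" for i in range(1, 1401))
chunks = chunk_words(text)
print(len(chunks))            # 3
print(chunks[1].split()[0])   # w451
```

Real splitters often cut on sentence or section boundaries instead of a fixed word count, so a chunk does not end mid-sentence.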


Step 3: Embedding (Turning Text into Numbers)

To let the computer work with the meaning of text:

The text is converted into a vector (a list of numbers).

Example:

Text:
"Sick leave allowed when ill"

Vector:
[0.78, 0.22, 0.19, 0.55]

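This process is called embedding. The 4-number vector above is only illustrative; real embedding models typically produce vectors with hundreds or thousands of numbers.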
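
To make the idea of "text in, list of numbers out" concrete, here is a toy stand-in for an embedding model. It is an assumption-heavy sketch: `VOCAB` and `toy_embed` are made up for this example, and counting words from a fixed list captures far less meaning than a real, trained embedding model would.

```python
import math

# Hypothetical tiny vocabulary; a real embedding model learns its own features.
VOCAB = ["sick", "leave", "annual", "home", "work", "ill"]


def toy_embed(text):
    """Return a unit-length vector of word counts over VOCAB (toy only)."""
    words = text.lower().split()
    counts = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(c * c for c in counts))
    # Normalise so texts of different lengths are comparable.
    return [c / norm for c in counts] if norm else counts


vector = toy_embed("Sick leave allowed when ill")
print(len(vector))  # 6, one number per vocabulary word
```

In a real system you would call an embedding model here instead; the rest of the pipeline only needs the same function to be used for both chunks and queries.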


Step 4: Vector Database (Storage)

Now all the chunk vectors are stored in a vector database.

Common vector databases and search libraries:

  • FAISS
  • Pinecone
  • Chroma
  • Weaviate
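
What such a store does can be sketched with a plain Python list. This is a hedged illustration, not how any of the tools above are implemented: `TinyVectorStore` is a hypothetical class, the 3-number vectors are made up, and the search is a brute-force scan using squared Euclidean distance, whereas real vector databases use specialised indexes to stay fast on millions of vectors.

```python
class TinyVectorStore:
    """In-memory list of (text, vector) pairs with brute-force search."""

    def __init__(self):
        self.items = []

    def add(self, text, vector):
        # Keep the original text next to its vector so it can be returned later.
        self.items.append((text, list(vector)))

    def nearest(self, query_vector, k=1):
        """Return the k stored texts closest to query_vector."""
        def distance(item):
            _, vec = item
            return sum((a - b) ** 2 for a, b in zip(vec, query_vector))
        return [text for text, _ in sorted(self.items, key=distance)[:k]]


store = TinyVectorStore()
store.add("Leave Policy", [0.9, 0.1, 0.0])      # made-up vectors
store.add("Sick Leave", [0.1, 0.9, 0.1])
store.add("Work From Home", [0.0, 0.1, 0.9])
print(store.nearest([0.2, 0.8, 0.0]))  # ['Sick Leave']
```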

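Step 5: What Happens When a Query Arrives?

The user asks:

"How many days of sick leave do I get?"

Query Processing: Step-by-Step

Step 1: Convert the Query into a Vector

The query is embedded with the same embedding model that was used for the chunks.

"How many days of sick leave?"
→ Query Vector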

Step 2: Vector Search

The system compares the query vector with each chunk vector:

Chunk 1 → Leave policy
Chunk 2 → Sick leave ← MATCH
Chunk 3 → Work from home
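
One common way to do this comparison is cosine similarity, which scores how closely two vectors point in the same direction. The sketch below uses made-up 3-number vectors for the three chunks and the query; the names `cosine_similarity`, `chunk_vectors`, and `query_vector` are assumptions for this example, and other similarity measures are also used in practice.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Made-up vectors, for illustration only.
chunk_vectors = {
    "Chunk 1: Leave policy": [0.9, 0.2, 0.1],
    "Chunk 2: Sick leave": [0.2, 0.9, 0.1],
    "Chunk 3: Work from home": [0.1, 0.1, 0.9],
}
query_vector = [0.3, 0.8, 0.0]

scores = {
    name: cosine_similarity(query_vector, vec)
    for name, vec in chunk_vectors.items()
}
best_match = max(scores, key=scores.get)
print(best_match)  # Chunk 2: Sick leave
```

In practice the top few chunks are usually kept rather than only the single best one, so the LLM has some surrounding context.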

Step 3: Send the Relevant Chunk to the LLM

Question:
"How many days of sick leave?"

Context:
"Sick leave allowed for 10 days."

Step 4: Generate the Final Answer

Employees can take 10 days of sick leave.
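
Steps 3 and 4 amount to combining the question and the retrieved chunk into one prompt and handing that prompt to the LLM. Below is a minimal sketch of the prompt-building part only. The function name `build_prompt` and the instruction wording are assumptions for this example, and the actual LLM call is left out because it depends on which model or API you use.

```python
def build_prompt(question, chunks):
    """Combine retrieved chunks and the user question into one prompt string."""
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "How many days of sick leave do employees get?",
    ["Sick leave allowed for 10 days."],
)
print(prompt)
```

Telling the model to rely only on the supplied context is what steers it toward the retrieved document instead of a guess.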

A Simple Real-Life Analogy

Imagine you are in a library.

Someone asks:

"Where is the Python book?"

You don't read the whole library.

Instead, you:

  1. Search the index
  2. Find the relevant shelf
  3. Open the book
  4. Give the answer

Library = Vector DB
Search = Retrieval
Answer = Generation

That is RAG.


Simple RAG Pipeline Summary

  1. Load document
  2. Split into chunks
  3. Convert chunks to vectors
  4. Store in vector database
  5. Convert query to vector
  6. Search similar chunks
  7. Send chunks to LLM
  8. Generate final answer

Final Summary

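If you remember only this, you understand RAG:

  • The document is split into chunks
  • Each chunk is turned into a vector
  • The query is turned into a vector and used to search for similar chunks
  • The LLM generates the answer from the retrieved chunks

That is the full RAG workflow.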