In the Weeds: RAG from Scratch with txtai
A technical walkthrough of building a Retrieval-Augmented Generation pipeline with txtai — from document ingestion to query-time generation.
What RAG Actually Solves
Language models know a lot of things. They also confidently make things up. If you ask a general question — "explain photosynthesis" — you'll get a solid answer. If you ask about your company's Q3 revenue or the specifics of your internal API documentation, the model will either refuse or hallucinate.
Retrieval-Augmented Generation (RAG) solves this by giving the model access to your data at query time. Instead of relying on training data, the model retrieves relevant documents from your corpus and generates answers grounded in actual sources.
txtai is a Python library that handles the retrieval side of this equation. It's all-in-one: embeddings, vector search, document processing, and LLM integration. Let's build a RAG pipeline from scratch.
Architecture Overview
A RAG pipeline has three stages:
- Ingest: Convert documents into embeddings (numerical representations) and store them in a vector index.
- Retrieve: When a query arrives, embed it and find the most similar documents in the index.
- Generate: Pass the retrieved documents as context to an LLM, which generates an answer grounded in those sources.
txtai handles all three stages. Let's build each one.
Stage 1: Document Ingestion
Install txtai:
```bash
pip install txtai[pipeline]
```
Build an embeddings index from a collection of documents:
```python
from txtai import Embeddings

# Create embeddings instance
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True
})

# Your documents - could be files, database records, web pages
documents = [
    (0, {"text": "Q3 revenue was $4.2M, up 12% YoY", "source": "quarterly_report.pdf"}),
    (1, {"text": "The REST API requires Bearer token authentication", "source": "api_docs.md"}),
    (2, {"text": "PTO requests must be submitted 2 weeks in advance", "source": "employee_handbook.pdf"}),
    # ... hundreds or thousands more
]

embeddings.index(documents)
embeddings.save("my-index")
```
In production, you'd read these from files. txtai includes pipelines for PDF, HTML, and plain text:
```python
from txtai.pipeline import Textractor

textractor = Textractor()
text = textractor("quarterly_report.pdf")
```
For large document sets, chunk your text. A document that's 50 pages long shouldn't be a single embedding — the semantic meaning gets diluted. Split into paragraphs or fixed-size chunks with overlap:
```python
def chunk_text(text, size=500, overlap=50):
    words = text.split()
    chunks = []
    for i in range(0, len(words), size - overlap):
        chunk = " ".join(words[i:i + size])
        chunks.append(chunk)
    return chunks
```
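As a sanity check, here is the same chunker in action (repeated so the example is self-contained), confirming that consecutive chunks share the expected overlap:

```python
def chunk_text(text, size=500, overlap=50):
    words = text.split()
    chunks = []
    for i in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[i:i + size]))
    return chunks

# 1,200 synthetic "words" split into 500-word chunks stepping by 450
text = " ".join(f"word{i}" for i in range(1200))
chunks = chunk_text(text)

print(len(chunks))  # 3
# The last 50 words of chunk 0 are the first 50 words of chunk 1
print(chunks[0].split()[-50:] == chunks[1].split()[:50])  # True
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, so it remains findable at retrieval time.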
Stage 2: Retrieval
Query the index to find relevant documents:
```python
# Load existing index
embeddings = Embeddings()
embeddings.load("my-index")

# Search
results = embeddings.search("What was Q3 revenue?", limit=3)
for result in results:
    print(f"Score: {result['score']:.3f}")
    print(f"Text: {result['text']}")
    print(f"Source: {result['source']}")
    print()
```
The search returns documents ranked by semantic similarity. "What was Q3 revenue?" matches "Q3 revenue was $4.2M, up 12% YoY" even though the exact words differ — that's the power of embedding-based search over keyword search.
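Under the hood, that similarity score is typically cosine similarity between embedding vectors. A toy illustration with hand-made 3-dimensional vectors (real models produce hundreds of dimensions, but the geometry is the same):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

# Imaginary embeddings: the query and the revenue sentence point in
# similar directions; the PTO sentence points elsewhere.
query   = [0.9, 0.1, 0.0]   # "What was Q3 revenue?"
revenue = [0.8, 0.2, 0.1]   # "Q3 revenue was $4.2M, up 12% YoY"
pto     = [0.1, 0.2, 0.9]   # "PTO requests must be submitted 2 weeks in advance"

print(cosine(query, revenue) > cosine(query, pto))  # True
```

No word in the query needs to appear in the document; only the directions of the vectors matter.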
Tuning Retrieval
Retrieval quality makes or breaks your RAG pipeline. A few things that matter:
Embedding model selection. all-MiniLM-L6-v2 is fast and good enough for most use cases. For higher quality, try BAAI/bge-large-en-v1.5 or intfloat/e5-large-v2. Bigger models are slower but produce better embeddings.
Chunk size. Too small and you lose context. Too large and you dilute relevance. 200-500 words per chunk is a reasonable starting range. Experiment with your specific data.
Retrieval count. More retrieved documents give the LLM more context but also more noise. Start with 3-5 and adjust based on answer quality.
Hybrid search. txtai supports combining vector search with BM25 keyword search. This catches cases where exact keyword matches matter (product codes, names, dates) alongside semantic similarity:
```python
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True,
    "hybrid": True
})
```
Stage 3: Generation
Now combine retrieval with generation. txtai integrates with LLMs directly:
```python
from txtai import Embeddings, LLM

embeddings = Embeddings()
embeddings.load("my-index")

llm = LLM("TheBloke/Mistral-7B-Instruct-v0.2-GGUF")

def rag_query(question):
    # Retrieve relevant context
    results = embeddings.search(question, limit=3)
    context = "\n\n".join(r["text"] for r in results)

    # Generate answer with context
    prompt = f"""Based on the following context, answer the question.
If the answer isn't in the context, say so.

Context:
{context}

Question: {question}

Answer:"""
    return llm(prompt)

answer = rag_query("What authentication does the API use?")
print(answer)
```
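One practical wrinkle: context windows are finite, so blindly concatenating retrieved chunks can overflow the prompt. A simple character-budget guard is one way to cap it (the 4,000-character budget here is an arbitrary assumption; tune it to your model's context size):

```python
def build_context(results, budget=4000):
    """Join retrieved texts, dropping lower-ranked chunks once the budget is hit."""
    parts, used = [], 0
    for r in results:  # results arrive sorted by score, best first
        text = r["text"]
        if used + len(text) > budget:
            break
        parts.append(text)
        used += len(text) + 2  # account for the "\n\n" separator
    return "\n\n".join(parts)

# Three 1,500-character chunks against a 4,000-character budget
results = [{"text": "a" * 1500}, {"text": "b" * 1500}, {"text": "c" * 1500}]
context = build_context(results)
print(len(context.split("\n\n")))  # 2 - the third chunk didn't fit
```

Because the results are ranked, trimming from the tail drops the least relevant chunks first.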
For cloud LLMs, swap the model:
```python
# Using OpenAI-compatible API (works with LocalAI too)
llm = LLM("openai/gpt-4", api_key="your-key")
```
This is where LocalAI shines — point txtai at your local instance for a fully private RAG pipeline. No data leaves your network.
Production Considerations
Incremental updates. Don't rebuild the entire index when documents change. txtai supports upsert operations:
```python
embeddings.upsert([(new_id, {"text": new_text, "source": new_source})])
```
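To decide which documents actually need re-indexing, a common pattern (not txtai-specific; a stdlib sketch) is to keep a content hash per document and only upsert the ones that changed:

```python
import hashlib

def changed_docs(documents, seen_hashes):
    """Return docs whose text hash differs from the last indexed version.

    documents: iterable of (id, fields) tuples, as used by txtai.
    seen_hashes: dict mapping id -> previous content hash (updated in place).
    """
    updates = []
    for doc_id, fields in documents:
        digest = hashlib.sha256(fields["text"].encode("utf-8")).hexdigest()
        if seen_hashes.get(doc_id) != digest:
            seen_hashes[doc_id] = digest
            updates.append((doc_id, fields))
    return updates

seen = {}
docs = [(0, {"text": "Q3 revenue was $4.2M"}), (1, {"text": "PTO policy"})]
print(len(changed_docs(docs, seen)))  # 2 - everything is new on the first pass
print(len(changed_docs(docs, seen)))  # 0 - nothing changed

docs[0] = (0, {"text": "Q3 revenue was $4.3M (restated)"})
print(len(changed_docs(docs, seen)))  # 1 - only the edited document
```

Feed only the returned list to the upsert call, so re-embedding cost stays proportional to how much actually changed.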
Metadata filtering. In production, you often want to restrict search to specific document categories:
```python
results = embeddings.search(
    "SELECT text, source FROM txtai WHERE similar('your query') AND source = 'api_docs.md'"
)
```
Yes, that's SQL. txtai supports SQL-based queries over your embeddings, which is incredibly useful for combining semantic search with structured filters.
Evaluation. Measure retrieval quality by creating a test set of questions with known answers. Track: does the retrieval stage return the correct source document? Does the generation stage produce an accurate answer? These are different failure modes that need different fixes.
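The retrieval half of that evaluation can be automated with a small harness. A sketch — the `(question, expected_source)` test-set format is an assumption, and `fake_search` below stands in for your real `embeddings.search`:

```python
def retrieval_hit_rate(test_set, search_fn, k=3):
    """Fraction of questions whose expected source appears in the top-k results.

    test_set: list of (question, expected_source) pairs.
    search_fn: callable(question, limit) -> list of dicts with a "source" key.
    """
    hits = 0
    for question, expected in test_set:
        results = search_fn(question, limit=k)
        if any(r.get("source") == expected for r in results):
            hits += 1
    return hits / len(test_set)

# Stub search function standing in for embeddings.search
def fake_search(question, limit):
    if "revenue" in question:
        return [{"source": "quarterly_report.pdf"}]
    return [{"source": "employee_handbook.pdf"}]

test_set = [
    ("What was Q3 revenue?", "quarterly_report.pdf"),
    ("How do I request PTO?", "employee_handbook.pdf"),
    ("What auth does the API use?", "api_docs.md"),
]
print(retrieval_hit_rate(test_set, fake_search))  # 2/3 - the API question missed
```

Track this number as you change chunk sizes or embedding models; if hit rate is low, no amount of prompt tuning on the generation side will fix the answers.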
Integrations
txtai + n8n: n8n can trigger RAG queries as part of automation workflows. A customer support email comes in, n8n calls your RAG pipeline to find relevant documentation, and drafts a response.
txtai + Supabase: Use Supabase MCP as your document source. Store documents in Supabase, index them with txtai, and keep everything synchronized.
txtai + Smolagents: Smolagents can use your RAG pipeline as a tool. Build a research agent that queries your knowledge base alongside the web.
Getting Started
- Install txtai: pip install txtai[pipeline]
- Index ten documents
- Run a search query and check the results
- Add an LLM for answer generation
- Iterate on chunk size and retrieval parameters
RAG isn't complicated. It's retrieval (find the right information) plus generation (say something useful about it). txtai handles both, and getting a basic pipeline running takes under an hour.
The hard part isn't the code. It's curating your data and tuning your retrieval. Start simple, measure what works, and build from there.
Tools in this post
- LocalAI: Drop-in OpenAI API replacement for local inference
- n8n: Open-source workflow automation with AI integration
- Smolagents: Lightweight AI agent framework by Hugging Face
- Supabase MCP: Connect AI agents to Supabase database, auth, and edge functions
- txtai: All-in-one embeddings database and RAG framework