In the Weeds: RAG from Scratch with txtai
A technical walkthrough of building a Retrieval-Augmented Generation pipeline with txtai — from document ingestion to query-time generation.
What RAG Actually Solves
Language models know a lot of things. They also confidently make things up. If you ask a general question — "explain photosynthesis" — you'll get a solid answer. If you ask about your company's Q3 revenue or the specifics of your internal API documentation, the model will either refuse or hallucinate.
Retrieval-Augmented Generation (RAG) solves this by giving the model access to your data at query time. Instead of relying on training data, the model retrieves relevant documents from your corpus and generates answers grounded in actual sources.
txtai is a Python library that handles the retrieval side of this equation. It's all-in-one: embeddings, vector search, document processing, and LLM integration. Let's build a RAG pipeline from scratch.
Architecture Overview
A RAG pipeline has three stages:
- Ingest: Convert documents into embeddings (numerical representations) and store them in a vector index.
- Retrieve: When a query arrives, embed it and find the most similar documents in the index.
- Generate: Pass the retrieved documents as context to an LLM, which generates an answer grounded in those sources.
txtai handles all three stages. Let's build each one.
Stage 1: Document Ingestion
Install txtai:
```bash
pip install txtai[pipeline]
```
Build an embeddings index from a collection of documents:
```python
from txtai import Embeddings

# Create embeddings instance
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True
})

# Your documents - could be files, database records, web pages
documents = [
    (0, {"text": "Q3 revenue was $4.2M, up 12% YoY", "source": "quarterly_report.pdf"}),
    (1, {"text": "The REST API requires Bearer token authentication", "source": "api_docs.md"}),
    (2, {"text": "PTO requests must be submitted 2 weeks in advance", "source": "employee_handbook.pdf"}),
    # ... hundreds or thousands more
]

embeddings.index(documents)
embeddings.save("my-index")
```
In production, you'd read these from files. txtai includes pipelines for PDF, HTML, and plain text:
```python
from txtai.pipeline import Textractor

textractor = Textractor()
text = textractor("quarterly_report.pdf")
```
For large document sets, chunk your text. A document that's 50 pages long shouldn't be a single embedding — the semantic meaning gets diluted. Split into paragraphs or fixed-size chunks with overlap:
```python
def chunk_text(text, size=500, overlap=50):
    words = text.split()
    chunks = []
    for i in range(0, len(words), size - overlap):
        chunk = " ".join(words[i:i + size])
        chunks.append(chunk)
    return chunks
```
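As a sanity check, here is the same chunker in action (repeated so the example is self-contained), confirming that consecutive chunks share the expected overlap:

```python
def chunk_text(text, size=500, overlap=50):
    words = text.split()
    chunks = []
    for i in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[i:i + size]))
    return chunks

# 1,200 synthetic "words" split into 500-word chunks stepping by 450
text = " ".join(f"word{i}" for i in range(1200))
chunks = chunk_text(text)

print(len(chunks))  # 3
# The last 50 words of chunk 0 are the first 50 words of chunk 1
print(chunks[0].split()[-50:] == chunks[1].split()[:50])  # True
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, so it remains findable at retrieval time.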
Stage 2: Retrieval
Query the index to find relevant documents:
```python
# Load existing index
embeddings = Embeddings()
embeddings.load("my-index")

# Search
results = embeddings.search("What was Q3 revenue?", limit=3)
for result in results:
    print(f"Score: {result['score']:.3f}")
    print(f"Text: {result['text']}")
    print(f"Source: {result['source']}")
    print()
```
The search returns documents ranked by semantic similarity. "What was Q3 revenue?" matches "Q3 revenue was $4.2M, up 12% YoY" even though the exact words differ — that's the power of embedding-based search over keyword search.
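Under the hood, that similarity score is typically cosine similarity between embedding vectors. A toy illustration with hand-made 3-dimensional vectors (real models produce hundreds of dimensions, but the geometry is the same):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

# Imaginary embeddings: the query and the revenue sentence point in
# similar directions; the PTO sentence points elsewhere.
query   = [0.9, 0.1, 0.0]   # "What was Q3 revenue?"
revenue = [0.8, 0.2, 0.1]   # "Q3 revenue was $4.2M, up 12% YoY"
pto     = [0.1, 0.2, 0.9]   # "PTO requests must be submitted 2 weeks in advance"

print(cosine(query, revenue) > cosine(query, pto))  # True
```

No word in the query needs to appear in the document; only the directions of the vectors matter.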
Tuning Retrieval
Retrieval quality makes or breaks your RAG pipeline. A few things that matter:
Embedding model selection. all-MiniLM-L6-v2 is fast and good enough for most use cases. For higher quality, try BAAI/bge-large-en-v1.5 or intfloat/e5-large-v2. Bigger models are slower but produce better embeddings.
Chunk size. Too small and you lose context. Too large and you dilute relevance. 200-500 words per chunk is a reasonable starting range. Experiment with your specific data.
Retrieval count. More retrieved documents give the LLM more context but also more noise. Start with 3-5 and adjust based on answer quality.
Hybrid search. txtai supports combining vector search with BM25 keyword search. This catches cases where exact keyword matches matter (product codes, names, dates) alongside semantic similarity:
```python
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True,
    "hybrid": True
})
```
Stage 3: Generation
Now combine retrieval with generation. txtai integrates with LLMs directly:
```python
from txtai import Embeddings, LLM

embeddings = Embeddings()
embeddings.load("my-index")

llm = LLM("TheBloke/Mistral-7B-Instruct-v0.2-GGUF")

def rag_query(question):
    # Retrieve relevant context
    results = embeddings.search(question, limit=3)
    context = "\n\n".join(r["text"] for r in results)

    # Generate answer with context
    prompt = f"""Based on the following context, answer the question.
If the answer isn't in the context, say so.

Context:
{context}

Question: {question}

Answer:"""
    return llm(prompt)

answer = rag_query("What authentication does the API use?")
print(answer)
```
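One practical wrinkle: context windows are finite, so blindly concatenating retrieved chunks can overflow the prompt. A simple character-budget guard is one way to cap it (the 4,000-character budget here is an arbitrary assumption; tune it to your model's context size):

```python
def build_context(results, budget=4000):
    """Join retrieved texts, dropping lower-ranked chunks once the budget is hit."""
    parts, used = [], 0
    for r in results:  # results arrive sorted by score, best first
        text = r["text"]
        if used + len(text) > budget:
            break
        parts.append(text)
        used += len(text) + 2  # account for the "\n\n" separator
    return "\n\n".join(parts)

# Three 1,500-character chunks against a 4,000-character budget
results = [{"text": "a" * 1500}, {"text": "b" * 1500}, {"text": "c" * 1500}]
context = build_context(results)
print(len(context.split("\n\n")))  # 2 - the third chunk didn't fit
```

Because the results are ranked, trimming from the tail drops the least relevant chunks first.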
For cloud LLMs, swap the model:
```python
# Using OpenAI-compatible API (works with LocalAI too)
llm = LLM("openai/gpt-4", api_key="your-key")
```
This is where LocalAI shines — point txtai at your local instance for a fully private RAG pipeline. No data leaves your network.
Production Considerations
Incremental updates. Don't rebuild the entire index when documents change. txtai supports upsert operations:
```python
embeddings.upsert([(new_id, {"text": new_text, "source": new_source})])
```
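To decide which documents actually need re-indexing, a common pattern (not txtai-specific; a stdlib sketch) is to keep a content hash per document and only upsert the ones that changed:

```python
import hashlib

def changed_docs(documents, seen_hashes):
    """Return docs whose text hash differs from the last indexed version.

    documents: iterable of (id, fields) tuples, as used by txtai.
    seen_hashes: dict mapping id -> previous content hash (updated in place).
    """
    updates = []
    for doc_id, fields in documents:
        digest = hashlib.sha256(fields["text"].encode("utf-8")).hexdigest()
        if seen_hashes.get(doc_id) != digest:
            seen_hashes[doc_id] = digest
            updates.append((doc_id, fields))
    return updates

seen = {}
docs = [(0, {"text": "Q3 revenue was $4.2M"}), (1, {"text": "PTO policy"})]
print(len(changed_docs(docs, seen)))  # 2 - everything is new on the first pass
print(len(changed_docs(docs, seen)))  # 0 - nothing changed

docs[0] = (0, {"text": "Q3 revenue was $4.3M (restated)"})
print(len(changed_docs(docs, seen)))  # 1 - only the edited document
```

Feed only the returned list to the upsert call, so re-embedding cost stays proportional to how much actually changed.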
Metadata filtering. In production, you often want to restrict search to specific document categories:
```python
results = embeddings.search(
    "SELECT text, source FROM txtai WHERE similar('your query') AND source = 'api_docs.md'"
)
```
Yes, that's SQL. txtai supports SQL-based queries over your embeddings, which is incredibly useful for combining semantic search with structured filters.
Evaluation. Measure retrieval quality by creating a test set of questions with known answers. Track: does the retrieval stage return the correct source document? Does the generation stage produce an accurate answer? These are different failure modes that need different fixes.
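The retrieval half of that evaluation can be automated with a small harness. A sketch — the `(question, expected_source)` test-set format is an assumption, and `fake_search` below stands in for your real `embeddings.search`:

```python
def retrieval_hit_rate(test_set, search_fn, k=3):
    """Fraction of questions whose expected source appears in the top-k results.

    test_set: list of (question, expected_source) pairs.
    search_fn: callable(question, limit) -> list of dicts with a "source" key.
    """
    hits = 0
    for question, expected in test_set:
        results = search_fn(question, limit=k)
        if any(r.get("source") == expected for r in results):
            hits += 1
    return hits / len(test_set)

# Stub search function standing in for embeddings.search
def fake_search(question, limit):
    if "revenue" in question:
        return [{"source": "quarterly_report.pdf"}]
    return [{"source": "employee_handbook.pdf"}]

test_set = [
    ("What was Q3 revenue?", "quarterly_report.pdf"),
    ("How do I request PTO?", "employee_handbook.pdf"),
    ("What auth does the API use?", "api_docs.md"),
]
print(retrieval_hit_rate(test_set, fake_search))  # 2/3 - the API question missed
```

Track this number as you change chunk sizes or embedding models; if hit rate is low, no amount of prompt tuning on the generation side will fix the answers.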
Integrations
txtai + n8n: n8n can trigger RAG queries as part of automation workflows. A customer support email comes in, n8n calls your RAG pipeline to find relevant documentation, and drafts a response.
txtai + Supabase: Use Supabase MCP as your document source. Store documents in Supabase, index them with txtai, and keep everything synchronized.
txtai + Smolagents: Smolagents can use your RAG pipeline as a tool. Build a research agent that queries your knowledge base alongside the web.
Getting Started
- Install txtai: pip install txtai[pipeline]
- Index ten documents
- Run a search query and check the results
- Add an LLM for answer generation
- Iterate on chunk size and retrieval parameters
RAG isn't complicated. It's retrieval (find the right information) plus generation (say something useful about it). txtai handles both, and getting a basic pipeline running takes under an hour.
The hard part isn't the code. It's curating your data and tuning your retrieval. Start simple, measure what works, and build from there.
Tools in this post
- LocalAI: Drop-in OpenAI API replacement for local inference
- n8n: Open-source workflow automation with AI integration
- Smolagents: Lightweight AI agent framework by Hugging Face
- Supabase MCP: Connect AI agents to Supabase database, auth, and edge functions
- txtai: All-in-one embeddings database and RAG framework