Creating Your Own PDF-Bot: A Guide to Building a Local RAG for PDFs with Langchain, Chroma, and Ollama
In this post, we’ll dive into the process of building a powerful PDF-Bot—a lightweight, local retrieval-augmented generation (RAG) system that lets you index and query PDF documents. By combining Langchain, Chroma, and Ollama, you can develop a tool that processes PDFs, extracts useful text chunks, stores them in a vector database, and generates detailed answers using an LLM. Whether you’re a developer eager to expand your project toolkit or simply curious about modern document processing, this guide is for you.

Overview of PDF-Bot
PDF-Bot is designed to work entirely on your local machine. It automatically ingests PDF files from a designated folder, splits the content into manageable pieces, indexes these chunks in a Chroma vector database, and finally uses an Ollama-powered language model to generate context-based answers. This system is ideal for those who want to explore retrieval-augmented generation without relying on external APIs.
Key features include:
PDF Ingestion: Automatically load and process PDF files from a designated folder.
Text Splitting: Utilize Langchain's RecursiveCharacterTextSplitter to break documents into chunks.
Vector Database: Persistently store document chunks using Chroma.
Retrieval-Augmented Generation: Perform similarity searches to retrieve relevant context and generate thorough answers.
Modular Architecture: Benefit from a clear separation of concerns across various modules.
Prerequisites
Before we start, ensure you have:
Python 3.8+ installed
Basic terminal/Python knowledge
Ollama installed and running locally
4-8GB RAM free (for LLM operations)
Structure
The project follows a modular design, which makes it easy to maintain and extend. Here’s a quick look at the file organization:
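Based on the modules walked through below, the layout looks roughly like this (the folder name and exact file names are illustrative; check the repo for the authoritative version):

```
pdf-bot/
├── config.py               # settings: Ollama URL, paths, model names
├── document_loader.py      # load PDFs from the data folder
├── text_splitter.py        # split documents into chunks
├── ollama_embeddings.py    # embedding function via Ollama
├── database.py             # Chroma vector store
├── process_documents.py    # ingestion pipeline
├── llm_handler.py          # retrieval + answer generation
├── main.py                 # entry point
├── requirements.txt
├── data/                   # your PDFs go here
└── chroma_db/              # persisted vector database (created on first run)
```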
Step 1: Project Setup
1.1 Create Project Structure
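If you prefer to scaffold this from the terminal, something like the following works (the pdf-bot folder name is just a placeholder):

```bash
mkdir -p pdf-bot/data && cd pdf-bot
touch config.py document_loader.py text_splitter.py ollama_embeddings.py \
      database.py process_documents.py llm_handler.py main.py requirements.txt
```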
1.2 Install Dependencies
Create requirements.txt:
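The exact versions are up to you; a plausible set covering Langchain, Chroma, and PDF parsing looks like this:

```text
langchain
langchain-community
langchain-text-splitters
chromadb
pypdf
```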
Next, install the dependencies using:
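```bash
pip install -r requirements.txt
```

Doing this inside a fresh virtual environment keeps the Langchain and Chroma versions isolated from your other projects.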
1.3 Downloading and Serving Ollama
Before you run PDF-Bot, you must download and serve Ollama, the language model backend powering our answer generation. Follow these steps to get it up and running:
Download Ollama: Visit the Ollama official website to download the latest version for your operating system.
Install Ollama: Follow the installation instructions provided on the website to set up Ollama on your machine.
Run the Ollama Server: Once installed, start the Ollama server. By default, the server listens on http://localhost:11434. If you need a different URL or port, update the BASE_URL in config.py.
Verify the Server: Open a browser or use a command-line tool (like curl) to ensure that Ollama is running at the specified URL.
For example, run:
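```bash
curl http://localhost:11434
```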
You should receive a response indicating that the server is active.
Step 2: Core Components Explained
2.1 Configuration (config.py)
Key Settings:
BASE_URL: Your Ollama server URL
CHROMA_PATH: Directory where the Chroma vector database is persisted
DATA_FOLDER: Where your PDFs live
CHAT_MODEL: Local LLM via Ollama
EMBEDDING_MODEL: Text embedding model
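Putting those settings together, a minimal config.py could look like the sketch below. The paths and the nomic-embed-text embedding model are assumptions; use whichever models you have pulled locally.

```python
# config.py -- central configuration for PDF-Bot
BASE_URL = "http://localhost:11434"   # Ollama server URL
CHROMA_PATH = "./chroma_db"           # directory where the Chroma DB is persisted
DATA_FOLDER = "./data"                # folder containing your PDFs
CHAT_MODEL = "llama3"                 # local chat LLM served by Ollama
EMBEDDING_MODEL = "nomic-embed-text"  # text embedding model
```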
2.2 Document Processing Pipeline
PDF Loading (document_loader.py):
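The loader's only job is to read every PDF in DATA_FOLDER and return Langchain Document objects (one per page). A minimal sketch using PyPDFDirectoryLoader, which is one reasonable loader choice:

```python
# document_loader.py -- load every PDF in the data folder as Langchain Documents
from langchain_community.document_loaders import PyPDFDirectoryLoader

from config import DATA_FOLDER


def load_documents():
    """Read all PDFs in DATA_FOLDER and return one Document per page."""
    loader = PyPDFDirectoryLoader(DATA_FOLDER)
    return loader.load()
```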
Text Splitting (text_splitter.py):
Chunk Size: This parameter (set to 800) determines the maximum number of characters in each text chunk. It ensures that large documents are broken down into manageable pieces for processing.
Chunk Overlap: This parameter (set to 80) specifies the number of characters that will be shared between consecutive chunks. The overlap helps maintain context at the boundaries, ensuring that important information isn’t lost if it spans across two chunks.
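With those two parameters, the splitter module is a thin wrapper around RecursiveCharacterTextSplitter. A sketch under those assumptions:

```python
# text_splitter.py -- split Documents into overlapping chunks
from langchain_text_splitters import RecursiveCharacterTextSplitter


def split_documents(documents):
    """Break Documents into ~800-character chunks with 80 characters of overlap."""
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=800,
        chunk_overlap=80,
        length_function=len,
    )
    return splitter.split_documents(documents)
```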
Ollama Embedding (ollama_embeddings.py):
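This module only has to hand Langchain an embedding function backed by the local Ollama server. A sketch using the OllamaEmbeddings wrapper from langchain_community, with the model name and URL taken from config.py:

```python
# ollama_embeddings.py -- embedding function backed by the local Ollama server
from langchain_community.embeddings import OllamaEmbeddings

from config import BASE_URL, EMBEDDING_MODEL


def get_embedding_function():
    """Return an embedding function that calls Ollama for every chunk and query."""
    return OllamaEmbeddings(model=EMBEDDING_MODEL, base_url=BASE_URL)
```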
Vector Database (database.py):
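The database module wraps Chroma with the persistence directory from config.py. The helper names below are illustrative, not necessarily the ones used in the repo:

```python
# database.py -- persistent Chroma vector store for the document chunks
from langchain_community.vectorstores import Chroma

from config import CHROMA_PATH
from ollama_embeddings import get_embedding_function


def get_vector_store():
    """Open (or create) the Chroma collection persisted at CHROMA_PATH."""
    return Chroma(
        persist_directory=CHROMA_PATH,
        embedding_function=get_embedding_function(),
    )


def add_chunks(chunks):
    """Embed and store a list of Document chunks."""
    db = get_vector_store()
    db.add_documents(chunks)
    return db
```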
Finally, processing the documents (process_documents.py):
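This step simply chains the pieces above: load, split, index. A sketch:

```python
# process_documents.py -- load PDFs, split them, and index the chunks in Chroma
from document_loader import load_documents
from text_splitter import split_documents
from database import add_chunks


def process_documents():
    """Run the full ingestion pipeline: PDFs -> chunks -> vector database."""
    documents = load_documents()
    chunks = split_documents(documents)
    add_chunks(chunks)
    print(f"Indexed {len(chunks)} chunks from {len(documents)} pages.")


if __name__ == "__main__":
    process_documents()
```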
2.3 Setting Up the Main App
Answer Generation (llm_handler.py):
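This is where retrieval-augmented generation happens: run a similarity search against Chroma, stuff the top chunks into a prompt, and ask the local LLM. The prompt wording and k=5 below are assumptions:

```python
# llm_handler.py -- retrieve relevant chunks and generate an answer with Ollama
from langchain_community.llms import Ollama

from config import BASE_URL, CHAT_MODEL
from database import get_vector_store

PROMPT_TEMPLATE = """Answer the question using only the context below.

Context:
{context}

Question: {question}
"""


def answer_question(question: str, k: int = 5) -> str:
    """Similarity-search the vector store and answer from the retrieved context."""
    db = get_vector_store()
    results = db.similarity_search(question, k=k)
    context = "\n\n---\n\n".join(doc.page_content for doc in results)

    llm = Ollama(model=CHAT_MODEL, base_url=BASE_URL)
    prompt = PROMPT_TEMPLATE.format(context=context, question=question)
    return llm.invoke(prompt)
```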
And, at last, the entry point of the program (main.py):
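A minimal entry point ties ingestion and querying together behind a simple command-line loop:

```python
# main.py -- entry point: index the PDFs, then answer questions interactively
from process_documents import process_documents
from llm_handler import answer_question


def main():
    process_documents()  # build or update the vector database
    while True:
        question = input("\nAsk a question (or 'quit' to exit): ").strip()
        if question.lower() in {"quit", "exit"}:
            break
        print(answer_question(question))


if __name__ == "__main__":
    main()
```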
Step 3: Initialize Ollama Models
Run these commands in your terminal:
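These pull the chat model and an embedding model. llama3 matches the chat model used in this guide; nomic-embed-text is the embedding model assumed in the config sketch above, so substitute your own if you chose differently:

```bash
ollama pull llama3
ollama pull nomic-embed-text
```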
Keep Ollama running in the background!
Step 4: Add Your First PDF
Place your PDFs in the ./data folder. Any short PDF (a paper, a manual, or a report) works as a first test document.
Step 5: Run the Bot!
5.1 First-Time Setup
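From the project root, start the bot via the entry point from Step 2 (assuming main.py as sketched above):

```bash
python main.py
```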
The system will:
Process PDFs → Split text → Create vector database
Perform similarity search
Generate answer using Llama3
5.2 Sample Output
Troubleshooting Tips
Ollama Not Responding? Check that the server is still running (start it with ollama serve if needed) and that BASE_URL in config.py matches the address it listens on.
Missing Dependencies? Reinstall with pip install -r requirements.txt, ideally inside your project's virtual environment.
Empty Chroma DB? Delete the ./chroma_db folder and rerun the processing step.
Further Improvement
This is a fairly simple project, but there are several ways to extend it:
Add a web interface with Gradio
Support other document types (DOCX, TXT)
Implement batch processing for large PDF collections
Conclusion
You've just built a fully local RAG system for PDF analysis! This tutorial covered:
PDF ingestion and text processing
Vector database setup with Chroma
Local LLM integration via Ollama
If you get stuck anywhere, feel free to check my GitHub repo for the source code.