Dockerize Local RAG with Models

Containerize with Ollama, BGE-M3, and MultiBERT for a complete local RAG system

Chunting Wu
Sep 16, 2024

Previously, I introduced a generic RAG template, in which I mentioned that there are three core components needed to build a high-quality RAG:

  1. an embedding model with semantic understanding
  2. an LLM with contextualized knowledge
  3. result compression by reranking

When all three are in place, you get a high-quality RAG, whether or not you fine-tune.

Add high-quality sources and accurate prompts, and you've got a complete RAG.

Simple, right?

Is it possible to containerize such a simple yet useful implementation and run it completely locally? Yes, of course.

Let’s take the three models mentioned in the previous template as an example (a rough sketch of how they fit together follows this list).

  1. Ollama plus TAIDE.
  2. BGE-M3 for embedding.
  3. ms-marco-MultiBERT-L-12 as the reranker.
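
Wired together, these three pieces form a pipeline of embed → retrieve → rerank → generate. The sketch below is only an illustration under a few assumptions, not the article’s actual code: it uses the FlagEmbedding, flashrank, numpy, and ollama Python packages, assumes an Ollama server is already running on its default port with a chat model pulled (llama3 is a placeholder; the template uses TAIDE), and uses a made-up three-document corpus.

```python
import numpy as np
from FlagEmbedding import BGEM3FlagModel          # BGE-M3 embeddings
from flashrank import Ranker, RerankRequest       # lightweight cross-encoder reranker
import ollama                                     # client for the local Ollama server

documents = [
    "Ollama runs LLMs locally and serves them over an HTTP API on port 11434.",
    "BGE-M3 is a multilingual embedding model released by BAAI.",
    "FlashRank ships small rerankers such as ms-marco-MultiBERT-L-12.",
]
query = "How can I run a language model completely locally?"

# 1. Embedding with semantic understanding: encode corpus and query with BGE-M3.
embedder = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
doc_vecs = embedder.encode(documents)["dense_vecs"]
query_vec = embedder.encode([query])["dense_vecs"][0]
cosine = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
top_ids = np.argsort(cosine)[::-1][:3]            # naive in-memory retrieval

# 2. Result compression by reranking: keep only the best passages.
ranker = Ranker(model_name="ms-marco-MultiBERT-L-12")
passages = [{"id": int(i), "text": documents[i]} for i in top_ids]
reranked = ranker.rerank(RerankRequest(query=query, passages=passages))
context = "\n".join(p["text"] for p in reranked[:2])

# 3. LLM with contextualized knowledge: let the local model answer from the context.
answer = ollama.chat(
    model="llama3",                               # placeholder; the template uses TAIDE
    messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}],
)
print(answer["message"]["content"])
```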

Ollama with Models

Ollama is a completely local LLM framework; you can pull down whichever model you want to use with ollama pull.
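
As a small, hedged sketch (not from the article), this is how a client could pull and query a model inside a containerized Ollama over its default port 11434, using the official ollama Python package; the host URL and model name are placeholders.

```python
import ollama

# Point the client at the Ollama container's published port (11434 by default).
client = ollama.Client(host="http://localhost:11434")

# Same effect as running `ollama pull llama3` inside the container.
client.pull("llama3")  # placeholder model name; the template pulls TAIDE

reply = client.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Say hello from the local model."}],
)
print(reply["message"]["content"])
```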
