Dockerize Local RAG with Models

Containerize with Ollama, BGE-M3, and MultiBERT for a complete local RAG system

Chunting Wu
Sep 16, 2024

Previously, I introduced a generic RAG template, in which I mentioned that there are three core components needed to build a high-quality RAG:

  1. an embedding model with semantic understanding
  2. an LLM with contextualized knowledge
  3. result compression by reranking

When all three are in place, you get a high-quality RAG, whether or not you fine-tune.

Add high-quality sources and accurate prompts, and you've got a complete RAG.

Simple, right?

Is it possible to containerize such a simple yet useful implementation and run it completely locally? Yes, of course.

Let’s take the three models mentioned in the previous template as an example (a rough sketch of how they fit together follows this list).

  1. Ollama plus TAIDE.
  2. BGE-M3 for embedding.
  3. ms-marco-MultiBERT-L-12 as the reranker.
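
Wired together, these three pieces form a pipeline of embed → retrieve → rerank → generate. The sketch below is only an illustration under a few assumptions, not the article’s actual code: it uses the FlagEmbedding, flashrank, numpy, and ollama Python packages, assumes an Ollama server is already running on its default port with a chat model pulled (llama3 is a placeholder; the template uses TAIDE), and uses a made-up three-document corpus.

```python
import numpy as np
from FlagEmbedding import BGEM3FlagModel          # BGE-M3 embeddings
from flashrank import Ranker, RerankRequest       # lightweight cross-encoder reranker
import ollama                                     # client for the local Ollama server

documents = [
    "Ollama runs LLMs locally and serves them over an HTTP API on port 11434.",
    "BGE-M3 is a multilingual embedding model released by BAAI.",
    "FlashRank ships small rerankers such as ms-marco-MultiBERT-L-12.",
]
query = "How can I run a language model completely locally?"

# 1. Embedding with semantic understanding: encode corpus and query with BGE-M3.
embedder = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
doc_vecs = embedder.encode(documents)["dense_vecs"]
query_vec = embedder.encode([query])["dense_vecs"][0]
cosine = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)
top_ids = np.argsort(cosine)[::-1][:3]            # naive in-memory retrieval

# 2. Result compression by reranking: keep only the best passages.
ranker = Ranker(model_name="ms-marco-MultiBERT-L-12")
passages = [{"id": int(i), "text": documents[i]} for i in top_ids]
reranked = ranker.rerank(RerankRequest(query=query, passages=passages))
context = "\n".join(p["text"] for p in reranked[:2])

# 3. LLM with contextualized knowledge: let the local model answer from the context.
answer = ollama.chat(
    model="llama3",                               # placeholder; the template uses TAIDE
    messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}],
)
print(answer["message"]["content"])
```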

Ollama with Models

Ollama is a completely local LLM framework; you can pull down whichever model you want to use with ollama pull.
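
As a small, hedged sketch (not from the article), this is how a client could pull and query a model inside a containerized Ollama over its default port 11434, using the official ollama Python package; the host URL and model name are placeholders.

```python
import ollama

# Point the client at the Ollama container's published port (11434 by default).
client = ollama.Client(host="http://localhost:11434")

# Same effect as running `ollama pull llama3` inside the container.
client.pull("llama3")  # placeholder model name; the template pulls TAIDE

reply = client.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Say hello from the local model."}],
)
print(reply["message"]["content"])
```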
