Dockerize Local RAG with Models
Containerize with Ollama, BGE-M3, and MultiBERT for a complete local RAG system
Previously, I introduced a generic RAG template and mentioned that three core components are needed to build a high-quality RAG:
- an embedding model with semantic understanding
- an LLM with contextualized knowledge
- result compression through reranking
When all three are in place, you get a high-quality RAG, whether or not you fine-tune.
Add high-quality sources and accurate prompts, and you’ve got a complete RAG.
Simple, right?
Is it possible to containerize such a simple yet useful implementation and run it completely locally? Yes, of course.
Let’s take the three models mentioned in the previous template as an example.
- Ollama plus TAIDE
- BGE-M3 for embedding
- ms-marco-MultiBERT-L-12 as reranker
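Before wiring anything into Docker, here is a minimal Python sketch of how these three models fit together on a plain local install. The library choices (FlagEmbedding, flashrank, the ollama client), the sample documents, and the llama3 placeholder tag are my assumptions for illustration, not necessarily how the template wires them up.

```python
# Minimal local RAG sketch with the three models above.
# Assumes: pip install FlagEmbedding flashrank ollama numpy, an Ollama server
# on localhost:11434, and a chat model already pulled (tag below is a placeholder).
import numpy as np
import ollama
from FlagEmbedding import BGEM3FlagModel
from flashrank import Ranker, RerankRequest

docs = [
    "Ollama serves LLMs through a local HTTP API on port 11434.",
    "BGE-M3 produces multilingual dense embeddings for retrieval.",
    "FlashRank ships ms-marco-MultiBERT-L-12 as a lightweight multilingual reranker.",
]
query = "Which model handles the embedding step?"

# 1. Embedding with semantic understanding: BGE-M3 dense vectors + dot-product retrieval.
embedder = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)
doc_vecs = embedder.encode(docs)["dense_vecs"]
query_vec = embedder.encode([query])["dense_vecs"][0]
scores = doc_vecs @ query_vec
candidates = [docs[i] for i in np.argsort(scores)[::-1][:3]]

# 2. Compress the candidate set with the reranker.
ranker = Ranker(model_name="ms-marco-MultiBERT-L-12")
reranked = ranker.rerank(RerankRequest(
    query=query,
    passages=[{"id": i, "text": t} for i, t in enumerate(candidates)],
))
context = "\n".join(p["text"] for p in reranked[:2])

# 3. LLM with contextualized knowledge (swap in your TAIDE tag; "llama3" is a placeholder).
reply = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": f"Answer from this context:\n{context}\n\nQ: {query}"}],
)
print(reply["message"]["content"])
```

The same three steps are what we will containerize next; only the hosting of the models changes.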
Ollama with Models
Ollama is a fully local LLM framework; you can pull down any model you want to use with the ollama pull command.
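If you prefer to drive the pull from Python instead of the CLI, the ollama client can do the same thing over the local API. The host URL below is the default, and the model tag is a placeholder — substitute the TAIDE (or any other) tag you actually use.

```python
# Pull and smoke-test a model through the local Ollama API.
import ollama

client = ollama.Client(host="http://localhost:11434")
client.pull("llama3")  # equivalent to running: ollama pull llama3 (tag is a placeholder)

resp = client.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp["message"]["content"])
```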