RAGTime with langroid

6 minute read Published: 2024-04-21

Motivation

Retrieval-Augmented Generation (RAG) was quite popular in the AI space in 2023. The big players may have moved on to newer toys, but it still seems useful for those who want to run their LLMs locally.

Others have written about RAG, but, put simply, it is the idea of combining search with an LLM. The search step first finds the parts of your documents relevant to a query, and those parts are then passed to the LLM as context. This lets the LLM respond with information it has never seen before (like your personal notes). Grounding the model in that extra context tends to reduce hallucinations, and the search results double as a list of citations the user can check the response against.
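To make that flow concrete, here's a toy sketch of retrieve-then-generate. The keyword-overlap search stands in for the vector search a real system (langroid included) performs, and none of these function names are langroid APIs:

def search(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Toy retrieval: rank documents by word overlap with the query.
    # Real RAG systems embed document chunks and run a vector similarity search.
    words = set(query.lower().split())
    return sorted(docs, key=lambda d: len(words & set(d.lower().split())), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # The generation step: stuff the retrieved chunks into the prompt
    # so the LLM answers from them and the user can check the sources.
    context = "\n\n".join(search(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"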

As I've previously posted about, I run a TTS engine which produces a podcast feed of all of the articles I've saved into Omnivore. An intermediate step in that pipeline converts the articles from HTML into plain text. I also use Obsidian for my personal notes, and it stores every note as a Markdown file. These easily accessible file formats are perfect for feeding into an LLM with RAG.
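Collecting those files is a couple of lines with pathlib. The directories below are placeholders; point them at your own vault and article dump:

from pathlib import Path

# Placeholder locations; substitute wherever your notes and articles live.
notes = Path.home() / "obsidian-vault"
articles = Path.home() / "articles-txt"

doc_paths = sorted(
    str(p)
    for d in (notes, articles)
    for p in d.rglob("*")
    if p.suffix in {".md", ".txt"}
)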

One of the authors of langroid shows up in the /r/LocalLLaMA subreddit fairly often to promote it, so I took it for a spin. The TL;DR: it works, but it still has the air of a research project, and some of its edges are sharp.

The Code

I mainly worked off of this example in a Jupyter notebook.

You'll need to start by installing ollama and downloading an LLM that you can run locally:

$ ollama pull mistral:7b-instruct-v0.2-q4_K_M

or if you want the newest Zuckbot

$ ollama pull llama3:8b

Create some flavor of virtual environment (I prefer direnv with pyenv), then install the required libs:

pip install "langroid[hf-embeddings]" jupyterlab notebook

(You can drop the last two if you don't want to run it in a notebook)

Fire up the notebook

jupyter notebook --port=9999

And put the following into the first cell. Assuming everything is installed correctly, the cell should complete successfully.

# Sanity check: these imports should all succeed if everything installed correctly.
import langroid as lr
import langroid.language_models as lm
from langroid.agent.special.doc_chat_agent import DocChatAgent, DocChatAgentConfig
from sentence_transformers import SentenceTransformer, CrossEncoder, util

Next, we create an agent and ask it a question to make sure the LLM is installed correctly.

llm_config = lm.OpenAIGPTConfig(
    chat_model="ollama/mistral:7b-instruct-v0.2-q4_K_M",  # or ollama/llama3:8b
    chat_context_length=16_000,  # adjust based on model
    max_output_tokens=100,
    temperature=0.2,
    stream=True,
    timeout=45,
)
agent = lr.ChatAgent(
    lr.ChatAgentConfig(llm=llm_config)
)


agent.llm_response("What's cooking good looking?")

Which yields this LLM response:

 I'm an artificial intelligence and don't have the ability to cook, but I can certainly help you with some delicious recipe ideas! Here's one for a classic Italian dish that's sure to impress:
**Spaghetti Bolognese**
Ingredients:
- 1 pound ground beef or a mix of ground beef and pork
- 2 tablespoons olive oil
- 1 onion, finely chopped
-

(I originally ran into a bug at this point. You can read about it in this PR. My PR has been merged and a release has been cut, so it shouldn't be an issue for anyone else. 🥳)

Time to ingest some documents:

doc_paths = ["/paths/to/my/saved/notes/and/articles"]
config = DocChatAgentConfig(
    # default vector-db is LanceDB, using the BAAI/bge-large-en-v1.5
    # embedding model via SentenceTransformers
    llm=llm_config,
    doc_paths=doc_paths,
    system_message="""
    Concisely answer my questions about docs. Start by asking me what I want to know.
    """,
)

agent = DocChatAgent(config)
task = lr.Task(agent)
task.run()
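task.run() starts an interactive chat loop in the notebook. If you'd rather fire off a one-shot question from a cell, the standard ChatAgent method should work on a DocChatAgent too (a sketch; I stuck with the interactive loop):

# One-off query instead of the interactive task loop.
# llm_response returns a ChatDocument; its .content field holds the text.
response = agent.llm_response("your question here")
print(response.content)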

Finally, let's ask it a question! I am a big Matt Levine fan. I remembered I had recently read about the BOXX ETF in his newsletter, and I don't think there is any way this recently created ETF could be in the LLM's training data. Thus, I asked, "what is BOXX?".

It ran for a while, printed bits of the relevant documents it found (which I've omitted), and gave a quite passable answer:

BOXX is a money-market exchange-traded fund offered by Alpha Architect, which allows its holders to receive Treasury bill interest rates without paying taxes on the income until they sell their shares. It uses a complex options strategy and a tax loophole that favors ETFs (Exchange-Traded Funds).
The goal is to provide investors with a place to park their money that closely tracks short-term Treasuries, offers safety similar to short-term Treasury

Final Thoughts

My final challenge with langroid was trying to run the unit tests before submitting my PR. The tests were failing because I didn't have the OPENAI_API_KEY environment variable set. I stupidly set it to a valid key, and the unit tests quickly burned through 10 bucks in OpenAI API calls. The project should really default to not using OpenAI for tests, and there should be a huge warning: "We are about to spend $ to run tests. Are you really sure you want to do this?!?"
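Something like this pytest guard is what I have in mind. To be clear, this is a hypothetical sketch of the idea, not langroid's actual test setup:

# conftest.py -- hypothetical guard; not langroid's actual test setup.
import os
import pytest

def pytest_collection_modifyitems(config, items):
    # Skip tests marked as "paid" unless the user explicitly opts in.
    if os.environ.get("I_AGREE_TO_SPEND_MONEY") != "1":
        skip = pytest.mark.skip(reason="set I_AGREE_TO_SPEND_MONEY=1 to run tests that call the OpenAI API")
        for item in items:
            if "paid" in item.keywords:
                item.add_marker(skip)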

I haven't found a place for this in my day-to-day workflow, but I enjoyed setting it up and learning about langroid.

What do you think? Do you use a personal LLM with fine-tuning or RAG in your day-to-day? Which one, and do you find it useful? Respond to me on Mastodon: [email protected]