Let’s be honest — the internet doesn’t need another blog post about RAG.
There are already countless articles and tutorials out there. This one? Just another drop in the ocean. Or, to borrow a line from Blade Runner: "All those blog posts will be lost in time, like tears in rain."
But before it fades away, there’s one thing I want to burn into your brain - like a hot iron:
It’s 2025 and RAG is NOT DEAD. It is so alive.
And today, we’re going to prove it. In this post, we’ll explore Agentic RAG as part of my open-source course: PhiloAgents.
Ready? Let’s begin!

What is Agentic RAG?
Before we get into why Agentic RAG is such a big deal, it’s worth taking a step back.
Because to really appreciate the benefits, you first need to understand what RAG is … and where its limitations are.
RAG (Retrieval-Augmented Generation) is a technique for building LLM-powered applications that can access external knowledge sources. The idea is simple: instead of relying only on what the LLM “remembers” from training, you give it relevant context at runtime - reducing hallucinations without needing to finetune the model.
The classic (or “naïve”) RAG workflow can be split into three main steps.
Retrieval step → Search for external knowledge (databases, realtime APIs, web search, etc.)
Augmentation step → Inject those documents into the LLM’s prompt.
Generation step → Let the LLM use that context to generate a response.
Here’s what that looks like in practice.
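To make those three steps concrete, here’s a minimal sketch of a naïve RAG pipeline. The function name, the Groq model id, and the retriever argument are illustrative placeholders, not the PhiloAgents implementation.

# Naive RAG in three steps (a sketch; names and model id are placeholders).
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq

def naive_rag(question: str, retriever) -> str:
    # 1. Retrieval: search the external knowledge source.
    docs = retriever.invoke(question)
    context = "\n\n".join(doc.page_content for doc in docs)

    # 2. Augmentation: inject the retrieved documents into the prompt.
    prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer using only the context below.\n\nContext:\n{context}"),
        ("human", "{question}"),
    ])

    # 3. Generation: let the LLM use that context to produce a response.
    llm = ChatGroq(model="llama-3.3-70b-versatile")
    return (prompt | llm).invoke({"context": context, "question": question}).content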

But this “naïve” approach comes with a couple of important limitations:
It only looks at one external knowledge source.
It retrieves context once … and hopes for the best.
That’s fine for simple use cases. But if you want your AI system to reason, explore, and adapt to a production environment (e.g. a videogame simulation) ... it quickly starts to feel pretty limited.
And that’s exactly where Agentic RAG comes in.

Agentic RAG uses an agent as the retrieval component. This gives your system the ability to reason and act, instead of just running through a static pipeline.
Here’s what that unlocks:
✅ The agent can decide whether or not to retrieve information
🧰 It can choose the right tool for the job (vector DB, API, etc.)
🔁 It can refine the query until it gets what it needs
This dynamic approach is exactly what we needed for our PhiloAgents. Sometimes, they need to dig into MongoDB for detailed info - like biographical context or core concepts. Other times, they can just generate a quick response with what they already know (LLM weights).
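Here’s a tiny sketch of that “decide whether to retrieve” idea: bind the retriever tool to the LLM and let it choose between calling the tool or answering directly. The retriever_tool is assumed to already exist (we’ll set one up further down), and the Groq model id is an assumption, not the PhiloAgents config.

# Sketch: the agent decides between the retrieval path and the direct-answer path.
from langchain_groq import ChatGroq

llm_with_tools = ChatGroq(model="llama-3.3-70b-versatile").bind_tools([retriever_tool])

reply = llm_with_tools.invoke("Alan, what is a Turing Machine?")
if reply.tool_calls:
    ...  # retrieval path: run the tool, refine the query if needed, try again
else:
    ...  # response path: the answer comes straight from the model's weights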
Now that you’ve got a feel for what Agentic RAG is all about ... Let’s build it.
Implementing an Agentic RAG workflow
As I explained in the previous article of this series, every NPC response you saw in the video above is generated by a LangGraph workflow.
To get a bit more technical, this LangGraph workflow runs behind a FastAPI application, which “talks” to the Phaser game UI using WebSockets (but don’t worry, more on this in future lessons).
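If you’re curious how those pieces can fit together, here’s a rough sketch of a FastAPI WebSocket endpoint invoking a compiled LangGraph workflow. The endpoint path, message shape, and the graph variable are assumptions, not the actual PhiloAgents API.

# Sketch: a LangGraph workflow behind a FastAPI WebSocket endpoint.
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/chat")
async def chat(websocket: WebSocket):
    await websocket.accept()
    while True:
        message = await websocket.receive_text()
        # `graph` would be the compiled LangGraph workflow described below.
        result = await graph.ainvoke({"messages": [("user", message)]})
        await websocket.send_text(result["messages"][-1].content)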
Now, you might be thinking: “Whoa, this workflow must be super complex.” But here’s the surprising part: It’s actually pretty simple!
The workflow itself is built around 5 main nodes (or 7, if we count the mandatory __start__ and __end__ nodes that every LangGraph workflow needs).
And if you visualize it using LangGraph Studio, it looks something like this.
The workflow kicks off with the __start__ node, which connects to the conversation_node. This is where the decision-making begins. The conversation_node evaluates the situation and chooses between two paths:
🧠 Retrieval path → Use RAG to pull in external context
✍️ Response path → Skip retrieval and generate an answer directly using Groq’s Llama 3.3 70B
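Here’s a minimal sketch of how that two-path routing can be wired in LangGraph, assuming the prebuilt tools_condition helper; the actual PhiloAgents router may look different.

# Sketch: route from conversation_node to retrieval or straight to the end.
from langgraph.graph import END
from langgraph.prebuilt import tools_condition

graph_builder.add_conditional_edges(
    "conversation_node",
    tools_condition,
    {
        "tools": "retrieve_philosopher_context",  # retrieval path
        END: END,                                 # response path: answer directly
    },
)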
Now, while both paths are useful, I’m especially interested in the first one, because that’s where the core of the Agentic RAG loop lives. The heart of this loop revolves around three key nodes:
conversation_node
retrieve_philosopher_context
summarize_context_node
These nodes work together in a feedback loop, making sure the conversation_node always has the context it needs before generating a response.
When the conversation_node decides that it needs more context to answer properly, it triggers the retrieve_philosopher_context node. This is where the magic happens.
The retrieve_philosopher_context node talks to MongoDB to fetch all the relevant information related to the user’s question.
For example:
→ If the user asks Alan Turing about the Turing Machine, this node will go fetch everything it knows about that concept from the long-term memory.
But here’s the interesting part: to interact with MongoDB, the LLM needs a tool — and in LangGraph, we can define that tool as a node using the ToolNode abstraction.
This is exactly where we set up our retriever tool in the code:
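The original post shows this setup as a screenshot; here’s a rough sketch of what a MongoDB-backed retriever tool can look like, assuming langchain_mongodb and create_retriever_tool. The connection string, namespace, index name, and embedding model are placeholders, not the PhiloAgents settings.

# Sketch: a MongoDB vector store wrapped as a retriever tool.
from langchain.tools.retriever import create_retriever_tool
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_mongodb import MongoDBAtlasVectorSearch

vector_store = MongoDBAtlasVectorSearch.from_connection_string(
    "mongodb+srv://<user>:<password>@<cluster>/",
    namespace="philoagents.long_term_memory",
    embedding=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"),
    index_name="vector_index",
)

retriever_tool = create_retriever_tool(
    vector_store.as_retriever(search_kwargs={"k": 3}),
    name="retrieve_philosopher_context",
    description="Search the philosopher's long-term memory for relevant context.",
)

tools = [retriever_tool]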
To define the retriever node (retrieve_philosopher_context), we can simply use the ToolNode class from LangGraph, passing in our list of available tools (just one in this case: the MongoDB retriever tool).
Here’s how easy that looks in code:
from langgraph.prebuilt import ToolNode
retriever_node = ToolNode(tools)
Then, we add this node to our LangGraph workflow just like any other node:
graph_builder.add_node("retrieve_philosopher_context", retriever_node)
Simple enough, right? But … there’s a catch.
Sometimes, this node will retrieve a lot of context. Maybe too much. And that comes with two main problems:
Slower response times
More tokens consumed (aka higher costs)
That’s why we need one more key player in our Agentic RAG loop: the summarize_context_node. This node takes all the retrieved information and distills it down into a concise summary, injecting that directly back into the conversation_node.
The result?
→ A clean, compact version of the relevant knowledge, ready to help the agent generate a high-quality response.
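To make that concrete, here’s a sketch of what such a summarization node can look like, plus the edges that close the feedback loop. The prompt wording, state shape, and model id are assumptions, not the PhiloAgents code.

# Sketch: compress the retrieved context before it flows back into conversation_node.
from langchain_core.messages import HumanMessage
from langchain_groq import ChatGroq

def summarize_context_node(state):
    # The last message holds the raw output of the retriever tool.
    retrieved = state["messages"][-1].content
    summary = ChatGroq(model="llama-3.3-70b-versatile").invoke(
        "Summarize the following context in a few sentences, keeping only "
        f"what is relevant to the conversation:\n\n{retrieved}"
    )
    return {"messages": [HumanMessage(content=summary.content)]}

# Wire the feedback loop: retrieve -> summarize -> back to conversation_node.
graph_builder.add_node("summarize_context_node", summarize_context_node)
graph_builder.add_edge("retrieve_philosopher_context", "summarize_context_node")
graph_builder.add_edge("summarize_context_node", "conversation_node")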
If you want a more detailed and thorough walkthrough of the full LangGraph workflow … I’ve got you covered 👇
Also make sure to read Paul’s article on building production-ready RAG agents! Both resources are complementary, and we recommend checking out both to level up your understanding and get the full picture.
Happy coding, and see you next Wednesday! 👋