Let’s be honest — the internet doesn’t need another blog post about RAG.
There are already countless articles and tutorials out there. This one? Just another drop in the ocean. Or, to borrow a line from Blade Runner: "All those blog posts will be lost in time, like tears in rain."
But before it fades away, there’s one thing I want to burn into your brain - like a hot iron:
It’s 2025 and RAG is NOT DEAD. It is so alive.
And today, we’re going to prove it. In this post, we’ll explore Agentic RAG as part of my open-source course: PhiloAgents.
Ready? Let’s begin!

What is Agentic RAG?
Before we get into why Agentic RAG is such a big deal, it’s worth taking a step back.
Because to really appreciate the benefits, you first need to understand what RAG is … and where its limitations are.
RAG (Retrieval-Augmented Generation) is a technique for building LLM-powered applications that can access external knowledge sources. The idea is simple: instead of relying only on what the LLM “remembers” from training, you give it relevant context at runtime - reducing hallucinations without needing to finetune the model.
The classic (or “naïve”) RAG workflow can be split into three main steps.
Retrieval step → Search for external knowledge (databases, realtime APIs, web search, etc.)
Augmentation step → Inject those documents into the LLM’s prompt.
Generation step → Let the LLM use that context to generate a response.
Here’s what that looks like in practice.
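To make those three steps concrete, here’s a minimal sketch of a naïve RAG pipeline. The function name, the Groq model id, and the retriever argument are illustrative placeholders, not the PhiloAgents implementation.

# Naive RAG in three steps (a sketch; names and model id are placeholders).
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq

def naive_rag(question: str, retriever) -> str:
    # 1. Retrieval: search the external knowledge source.
    docs = retriever.invoke(question)
    context = "\n\n".join(doc.page_content for doc in docs)

    # 2. Augmentation: inject the retrieved documents into the prompt.
    prompt = ChatPromptTemplate.from_messages([
        ("system", "Answer using only the context below.\n\nContext:\n{context}"),
        ("human", "{question}"),
    ])

    # 3. Generation: let the LLM use that context to produce a response.
    llm = ChatGroq(model="llama-3.3-70b-versatile")
    return (prompt | llm).invoke({"context": context, "question": question}).content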

But this “naïve” approach comes with a couple of important limitations:
It only looks at one external knowledge source.
It retrieves context once … and hopes for the best.
That’s fine for simple use cases. But if you want your AI system to reason, explore, and adapt to a production environment (e.g. a videogame simulation) ... it quickly starts to feel pretty limited.
And that’s exactly where Agentic RAG comes in.

Agentic RAG uses an agent as the retrieval component. This gives your system the ability to reason and act, instead of just running through a static pipeline.
Here’s what that unlocks:
✅ The agent can decide whether or not to retrieve information
🧰 It can choose the right tool for the job (vector DB, API, etc.)
🔁 It can refine the query until it gets what it needs
This dynamic approach is exactly what we needed for our PhiloAgents. Sometimes, they need to dig into MongoDB for detailed info - like biographical context or core concepts. Other times, they can just generate a quick response with what they already know (LLM weights).
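Here’s a tiny sketch of that “decide whether to retrieve” idea: bind the retriever tool to the LLM and let it choose between calling the tool or answering directly. The retriever_tool is assumed to already exist (we’ll set one up further down), and the Groq model id is an assumption, not the PhiloAgents config.

# Sketch: the agent decides between the retrieval path and the direct-answer path.
from langchain_groq import ChatGroq

llm_with_tools = ChatGroq(model="llama-3.3-70b-versatile").bind_tools([retriever_tool])

reply = llm_with_tools.invoke("Alan, what is a Turing Machine?")
if reply.tool_calls:
    ...  # retrieval path: run the tool, refine the query if needed, try again
else:
    ...  # response path: the answer comes straight from the model's weights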
Now that you’ve got a feel for what Agentic RAG is all about ... Let’s build it.
Implementing an Agentic RAG workflow
As I explained in the previous article of this series, every NPC response you saw in the video above is generated by a LangGraph workflow.
To get a bit more technical, this LangGraph workflow runs behind a FastAPI application, which “talks” to the Phaser game UI using WebSockets (but don’t worry, more on this in future lessons).
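If you’re curious how those pieces can fit together, here’s a rough sketch of a FastAPI WebSocket endpoint invoking a compiled LangGraph workflow. The endpoint path, message shape, and the graph variable are assumptions, not the actual PhiloAgents API.

# Sketch: a LangGraph workflow behind a FastAPI WebSocket endpoint.
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/chat")
async def chat(websocket: WebSocket):
    await websocket.accept()
    while True:
        message = await websocket.receive_text()
        # `graph` would be the compiled LangGraph workflow described below.
        result = await graph.ainvoke({"messages": [("user", message)]})
        await websocket.send_text(result["messages"][-1].content)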
Now, you might be thinking: “Whoa, this workflow must be super complex.” But here’s the surprising part: It’s actually pretty simple!
The workflow itself is built around 5 main nodes (or 7, if we count the mandatory __start__ and __end__ nodes that every LangGraph workflow needs).
And if you visualize it using LangGraph Studio, it looks something like this.
The workflow kicks off with the __start__ node, which connects to the conversation_node. This is where the decision-making begins. The conversation_node evaluates the situation and chooses between two paths:
🧠 Retrieval path → Use RAG to pull in external context
✍️ Response path → Skip retrieval and generate an answer directly using Groq’s Llama 3.3 70B
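Here’s a minimal sketch of how that two-path routing can be wired in LangGraph, assuming the prebuilt tools_condition helper; the actual PhiloAgents router may look different.

# Sketch: route from conversation_node to retrieval or straight to the end.
from langgraph.graph import END
from langgraph.prebuilt import tools_condition

graph_builder.add_conditional_edges(
    "conversation_node",
    tools_condition,
    {
        "tools": "retrieve_philosopher_context",  # retrieval path
        END: END,                                 # response path: answer directly
    },
)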
Now, while both paths are useful, I’m especially interested in the first one, because that’s where the core of the Agentic RAG loop lives. The heart of this loop revolves around three key nodes:
conversation_node
retrieve_philosopher_context
summarize_context_node
These nodes work together in a feedback loop, making sure the conversation_node always has the context it needs before generating a response.
When the conversation_node decides that it needs more context to answer properly, it triggers the retrieve_philosopher_context node. This is where the magic happens.
The retrieve_philosopher_context node talks to MongoDB to fetch all the relevant information related to the user’s question.
For example:
→ If the user asks Alan Turing about the Turing Machine, this node will go fetch everything it knows about that concept from the long-term memory.
But here’s the interesting part: to interact with MongoDB, the LLM needs a tool — and in LangGraph, we can define that tool as a node using the ToolNode abstraction.
This is exactly where we set up our retriever tool in the code:
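The original post shows this setup as a screenshot; here’s a rough sketch of what a MongoDB-backed retriever tool can look like, assuming langchain_mongodb and create_retriever_tool. The connection string, namespace, index name, and embedding model are placeholders, not the PhiloAgents settings.

# Sketch: a MongoDB vector store wrapped as a retriever tool.
from langchain.tools.retriever import create_retriever_tool
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_mongodb import MongoDBAtlasVectorSearch

vector_store = MongoDBAtlasVectorSearch.from_connection_string(
    "mongodb+srv://<user>:<password>@<cluster>/",
    namespace="philoagents.long_term_memory",
    embedding=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"),
    index_name="vector_index",
)

retriever_tool = create_retriever_tool(
    vector_store.as_retriever(search_kwargs={"k": 3}),
    name="retrieve_philosopher_context",
    description="Search the philosopher's long-term memory for relevant context.",
)

tools = [retriever_tool]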
To define the retriever node (retrieve_philosopher_context), we can simply use the ToolNode class from LangGraph, passing in our list of available tools (just one in this case: the MongoDB retriever tool).
Here’s how easy that looks in code:
from langgraph.prebuilt import ToolNode
retriever_node = ToolNode(tools)
Then, we add this node to our LangGraph workflow just like any other node:
graph_builder.add_node("retrieve_philosopher_context", retriever_node)
Simple enough, right? But … there’s a catch.
Sometimes, this node will retrieve a lot of context. Maybe too much. And that comes with two main problems:
Slower response times
More tokens consumed (aka higher costs)
That’s why we need one more key player in our Agentic RAG loop: the summarize_context_node. This node takes all the retrieved information and distills it down into a concise summary, injecting that directly back into the conversation_node.
The result?
→ A clean, compact version of the relevant knowledge, ready to help the agent generate a high-quality response.
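To make that concrete, here’s a sketch of what such a summarization node can look like, plus the edges that close the feedback loop. The prompt wording, state shape, and model id are assumptions, not the PhiloAgents code.

# Sketch: compress the retrieved context before it flows back into conversation_node.
from langchain_core.messages import HumanMessage
from langchain_groq import ChatGroq

def summarize_context_node(state):
    # The last message holds the raw output of the retriever tool.
    retrieved = state["messages"][-1].content
    summary = ChatGroq(model="llama-3.3-70b-versatile").invoke(
        "Summarize the following context in a few sentences, keeping only "
        f"what is relevant to the conversation:\n\n{retrieved}"
    )
    return {"messages": [HumanMessage(content=summary.content)]}

# Wire the feedback loop: retrieve -> summarize -> back to conversation_node.
graph_builder.add_node("summarize_context_node", summarize_context_node)
graph_builder.add_edge("retrieve_philosopher_context", "summarize_context_node")
graph_builder.add_edge("summarize_context_node", "conversation_node")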
If you want a more detailed and thorough walkthrough of the full LangGraph workflow … I’ve got you covered 👇
Also make sure to read Paul’s article on building production-ready RAG agents! Both resources are complementary, and we recommend checking out both to level up your understanding and get the full picture.
Happy coding, and see you next Wednesday! 👋