The Architecture of Phone Calling Agents

Phone Calling Agents Course | Lesson 0

Nov 12, 2025

You have no idea how long I’ve been waiting for this, builders…

I’m so excited to officially kick off this course with my friend. A few months ago, we launched our first collaboration, a multimodal agent connected to WhatsApp. This time, we’re taking things even further.

Our mission? To teach you how a real-time voice agent works: not just how it listens and responds, but how it can actually handle real phone calls.

🙋 “Wait, Miguel… are you saying I’ll be able to call a number and have an AI agent pick up?” Exactly.

And that’s not all: you’ll even learn how to make the agent call you back. But let’s not get ahead of ourselves; we’ll cover that in later lessons.

In this first article of the series, our goal is simple:

✅ Walk you through what we’re building, how the course is structured, and what kind of content you can expect along the way.

Ready to build something amazing? Let’s go!

What will you learn in this course?

This diagram shows the application you’ll have built by the end of the course.

Before we get started, there’s something important I need to say:

⚠️ This is not a simple course.

In other words, it’s not your typical plug-and-play tutorial where you have an app running locally in 5 minutes. This course brings together a wide range of technologies and concepts (don’t worry, we’ll walk through each one) and it will take some hands-on work from you to get the most out of it.

Our goal is that, by the time you finish, you’ll feel completely confident adapting everything you’ve learned to your own projects and use cases.

Now that the disclaimer’s out of the way, let’s talk about the fun part: what are we actually building?

🏢 We’re going to create a real estate company, but with a twist: the employees will be real-time voice agents.

gray wooden house — Image by todd kent (source: Unsplash)

This system will be able to take calls, answer property questions in real time, and even call potential clients back — handling both inbound and outbound conversations seamlessly.

The diagram above shows all the tools we’ll be using. You can also find a breakdown of each one, along with the lessons we’ll cover week by week, in the next section.

What’s the structure of the course?

Building an agent-based real estate company isn’t a simple task, so we’ve divided this course into four lessons (not including today’s Lesson 0).

Each new lesson will bring a significant batch of fresh code, all of which will be pushed to our GitHub repository as we progress.

If you check the repo right now, you’ll notice it’s empty by design — we want to build up the system step by step, gaining a deeper understanding as the architecture evolves week by week.

🧑‍💻 Follow along and check the repo here

Now, let me give you a short overview of what we’ll cover in each lesson.

Lesson 1 - FastRTC LangGraph Agent

📰 Article → November 19: Building Realtime Voice Agents with FastRTC (Miguel Otero Pedrido)
🎙️ Live Session → November 23

In this first lesson, we’ll show you how to transform your LangGraph agent into a FastRTC-powered agent.

We’ll start by exploring FastRTC, an open-source library that makes it easy to build real-time audio applications. You’ll get familiar with its core concepts: how it streams audio, processes live voice input, and generates responses on the fly.

Then, we’ll take it a step further by combining FastRTC with LangGraph, transforming your regular agent into a fully interactive realtime voice agent.
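Before an agent can answer, it has to decide when you have finished speaking. FastRTC takes care of this for you with voice activity detection, so the snippet below is only a conceptual sketch of the idea, not the library’s implementation. The `PauseDetector` class, the energy threshold, and the frame counts are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PauseDetector:
    """Buffers audio frames and releases an utterance once the speaker pauses."""

    silence_threshold: float = 0.02  # assumed energy cutoff for "silence"
    pause_frames: int = 3            # quiet frames in a row that count as a pause
    _buffer: List[float] = field(default_factory=list, init=False)
    _quiet_run: int = field(default=0, init=False)
    _heard_speech: bool = field(default=False, init=False)

    def push(self, frame: List[float]) -> Optional[List[float]]:
        """Feed one frame; return the buffered speech when a pause is detected."""
        energy = sum(s * s for s in frame) / max(len(frame), 1)
        if energy >= self.silence_threshold:
            # Speech frame: keep it and reset the silence counter.
            self._heard_speech = True
            self._quiet_run = 0
            self._buffer.extend(frame)
            return None
        if not self._heard_speech:
            return None  # leading silence, nothing to reply to yet
        self._quiet_run += 1
        if self._quiet_run < self.pause_frames:
            return None
        # Enough silence after speech: hand over the utterance and reset.
        utterance, self._buffer = self._buffer, []
        self._quiet_run, self._heard_speech = 0, False
        return utterance


detector = PauseDetector()
quiet, loud = [0.0] * 4, [0.5] * 4
results = [detector.push(f) for f in [quiet, loud, loud, quiet, quiet, quiet]]
print(results[-1] is not None)  # True: the third quiet frame closes the turn
```

Only the last call returns audio. That returned utterance is what would be handed to the speech-to-text model and then to the agent.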

By the end of this lesson, you’ll have an application running on your laptop that lets you interact with your agent through voice.

Lesson 2 - Superlinked for Realtime Property Search

📰 Article → November 26
🎙️ Live Session → November 30

In this lesson, we’ll introduce Superlinked, the framework that will serve as the entry point for our real estate agents to access and reason about property data.

You’ll learn the fundamentals of Superlinked — how it integrates with Qdrant, and how to create indexes capable of handling multiple data types (numeric, categorical, and more). This will give your agent a powerful, flexible way to search, rank, and retrieve the most relevant listings.

🏠 The end goal? When a user asks for something like:

“A nice apartment in Barrio de Salamanca for under 300,000 euros, with at least 3 bedrooms and 2 bathrooms.”

Your agent will be able to find and return the exact matching properties using Superlinked.
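To see what "exact matching" means for that request, it helps to write the constraints down as structured data. In the course this job belongs to Superlinked on top of Qdrant. The plain-Python stand-in below does not use Superlinked’s API; it only shows the kind of hard filters an agent would need to extract from the sentence. `Property`, `PropertyQuery`, `search`, and the four sample listings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Property:
    title: str
    neighborhood: str
    price_eur: int
    bedrooms: int
    bathrooms: int


@dataclass(frozen=True)
class PropertyQuery:
    """Structured constraints an agent might extract from a spoken request."""

    neighborhood: Optional[str] = None
    max_price_eur: Optional[int] = None  # exclusive upper bound ("under X")
    min_bedrooms: int = 0
    min_bathrooms: int = 0

    def matches(self, p: Property) -> bool:
        if self.neighborhood and p.neighborhood.lower() != self.neighborhood.lower():
            return False
        if self.max_price_eur is not None and p.price_eur >= self.max_price_eur:
            return False
        return p.bedrooms >= self.min_bedrooms and p.bathrooms >= self.min_bathrooms


def search(listings: List[Property], query: PropertyQuery) -> List[Property]:
    """Return the listings that satisfy every constraint, cheapest first."""
    hits = [p for p in listings if query.matches(p)]
    return sorted(hits, key=lambda p: p.price_eur)


# Made-up listings, for illustration only.
listings = [
    Property("Listing A", "Barrio de Salamanca", 285_000, 3, 2),
    Property("Listing B", "Barrio de Salamanca", 320_000, 4, 2),
    Property("Listing C", "Chamberí", 250_000, 3, 2),
    Property("Listing D", "Barrio de Salamanca", 270_000, 2, 1),
]

query = PropertyQuery(
    neighborhood="Barrio de Salamanca",
    max_price_eur=300_000,
    min_bedrooms=3,
    min_bathrooms=2,
)
found = search(listings, query)
print([p.title for p in found])
```

A word like "nice" has no hard filter. That fuzzy part of the request is where a semantic search layer earns its place next to the numeric and categorical constraints.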

We’ll also share a few advanced tricks to make your agents more natural and conversational — including how to call tools dynamically during a dialogue without breaking the flow of conversation.

Lesson 3 - Improving STT and TTS Systems

📰 Article → December 3
🎙️ Live Session → December 7

Up until now, our agent has relied on Moonshine for speech-to-text (STT) and Kokoro for text-to-speech (TTS). This setup has worked well enough — delivering decent transcription quality and natural-sounding voice output — but there’s a lot of room for improvement in both accuracy and performance.

Here’s what’s been happening under the hood: your input audio frames are collected, processed by the Moonshine STT model for transcription, and then spoken back using Kokoro.

The flow is smooth, but the transcription quality can sometimes lag behind modern standards, and the voice output could be even more expressive and efficient.

💡 That’s why we’re upgrading both ends of the system.

We’ll be moving from Moonshine → faster-whisper for faster, higher-quality transcriptions, and from Kokoro → Orpheus 3B for richer, more natural voice synthesis.
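One way to make this kind of upgrade painless is to keep the speech models behind small interfaces, so the rest of the pipeline never knows which model is running. The sketch below is an assumption about how you could structure it, not the course’s actual code: `SpeechToText`, `TextToSpeech`, and `VoicePipeline` are hypothetical names, and the two toy backends stand in for real models such as Moonshine, faster-whisper, Kokoro, or Orpheus.

```python
from typing import Callable, Iterator, Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> Iterator[bytes]: ...


class VoicePipeline:
    """Audio in, audio out: transcribe, ask the agent, speak the answer."""

    def __init__(self, stt: SpeechToText, tts: TextToSpeech, agent: Callable[[str], str]):
        self.stt = stt
        self.tts = tts
        self.agent = agent

    def reply(self, audio: bytes) -> Iterator[bytes]:
        text = self.stt.transcribe(audio)
        answer = self.agent(text)
        # Yield audio chunk by chunk so playback can start before synthesis ends.
        yield from self.tts.synthesize(answer)


# Toy backends so the sketch runs without any model installed.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class WordChunkTTS:
    def synthesize(self, text: str) -> Iterator[bytes]:
        for word in text.split():
            yield word.encode("utf-8")


pipeline = VoicePipeline(EchoSTT(), WordChunkTTS(), agent=lambda t: f"You said: {t}")
chunks = list(pipeline.reply(b"two bedrooms"))
print(chunks)
```

With this shape, swapping a model means writing one new class with a `transcribe` or `synthesize` method and passing it to `VoicePipeline`. Nothing else changes.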

Since Orpheus 3B is a large model that can’t be run locally (at least, not on most laptops), we’ll deploy it on Runpod, serving it through Llama.cpp. We’ll use the GGUF version, which is highly efficient for inference, but we’ll dive deeper into that when the time comes.

Serving Orpheus 3B with Llama.cpp Server

By the end of this lesson, you’ll be able to experiment with a voice that sounds like this:

Lesson 4 - Deployment, Monitoring and Twilio Integration

📰 Article → December 10
🎙️ Live Session → December 14

Finally, we’ve arrived at Lesson 4, where everything comes together. In this session, we’ll show you how to deploy your FastAPI app to Runpod — this time using CPU pods for an efficient and cost-friendly setup.

You’ll also learn how to add an observability layer with Opik, giving you full visibility into your system’s behavior and performance.

And then comes the most exciting part of the course… connecting everything to Twilio.

Creating a TwiML App to connect phone calls with your FastAPI application

By the end of this lesson, you’ll have a fully functional system capable of receiving incoming calls and even making outbound calls to any number you choose.

(Just maybe don’t start by pranking your friends with weird calls... or do 😄)
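If you’re wondering what the "glue" between a phone call and your app looks like, here is a small preview. When a call arrives, Twilio asks your webhook for TwiML instructions, and a `<Connect><Stream>` response tells it to pipe the call audio to a WebSocket. The sketch below only builds that XML with the standard library. The helper name `build_stream_twiml` and the `wss://example.com/voice` URL are placeholders, and the real endpoint will depend on how the FastAPI app is set up in Lesson 4.

```python
import xml.etree.ElementTree as ET


def build_stream_twiml(websocket_url: str) -> str:
    """Return TwiML that connects the call audio to a WebSocket endpoint."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=websocket_url)
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


# Placeholder URL: Twilio expects a secure (wss://) WebSocket address here.
twiml = build_stream_twiml("wss://example.com/voice")
print(twiml)
```

Your web app would return this string with an XML content type from the URL configured in the TwiML App.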

Next steps

Here’s how you can stay in the loop and make the most out of this course:

1. Follow the journey on GitHub

All the code for this course will be pushed to the repo week by week, as we build our agent-based real estate system together.

👉 Head over to the repo, star it, and follow the updates so you don’t miss a single drop of new content.

2. Join our first (FREE) live session this Sunday!

Check the recorded live session here: The Architecture of Phone Calling Agents (Live Session!), by Miguel Otero Pedrido, Nov 17

This Sunday, we’re hosting our first live session — and it’s completely free to join! We’ll present the course, walk you through the architecture, and answer all your questions live.

You’ll be able to join the session on YouTube, LinkedIn Live, or Substack Live — whichever platform suits you best.

⚠️ Important: This will be the only free live session of the series. Starting next week, both the Wednesday articles and future live sessions (one for each lesson) will be exclusive to Premium subscribers.

See you there, builders! 🎙️