5 Comments
Louis-François Bouchard

Thanks for having us, Miguel! :) Super happy to be the first guest post, and super happy to jump in and share our insights from Towards AI.

Nicos

Man, with your protein-laden post you are setting such a high bar it will be hard to keep up with the level 🤣.

In no time Miguel's full-time job will be securing high-quality guest chime-ins. Good problem to have, if you ask me.

It will help rise above the noise and fluff. 🙏

Nicos

A warm welcome to Louis-François on your fabulous waves, Miguel! I skimmed through this fascinating yet dense deep dive. My intention is not to bluntly critique but to help your fans have better clarity/intuition from the outset: why bother?

As was stated numerous times in the post, CAG is a different beast; a real-world setup requires a thorough understanding of the underlying mechanisms (transformers, the KV cache, FSDP, 3D compute, etc.).

Use cases warranting CAG implementation

——

- You are an enterprise that has expensive GPU capacity and wants to strike a cost-effective latency/throughput balance

- Your typical sessions are either very long, or the same context is reused across many sessions (users)

- The context can't change; otherwise (like with every caching mechanism) it has to be recomputed. A sketch of what that caching looks like follows this list.
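
To make the caching point above concrete, here is a minimal sketch of what CAG's savings boil down to: prefill the static context once, keep the attention KV cache, and let each query pay only for its own tokens. It assumes a Hugging Face causal LM and a recent transformers version that supports reusing `past_key_values` in `generate`; the model name, texts, and the `answer` helper are placeholders, not anything from the post.

```python
# Minimal sketch of KV-cache reuse (the core idea behind CAG).
# Assumes a recent Hugging Face transformers; model name and texts are placeholders.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM follows the same pattern
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# 1) Pay the prefill cost once: run the static context through the model
#    and keep the attention key/value cache it produces.
static_context = "<< long, unchanging reference document goes here >>"
ctx_ids = tokenizer(static_context, return_tensors="pt").input_ids
with torch.no_grad():
    prefix_cache = model(ctx_ids, use_cache=True).past_key_values

# 2) Every query reuses that cache, so only the question tokens get prefilled.
def answer(question: str) -> str:
    cache = copy.deepcopy(prefix_cache)  # don't let generation mutate the shared cache
    q_ids = tokenizer(question, return_tensors="pt").input_ids
    ids = torch.cat([ctx_ids, q_ids], dim=-1)
    with torch.no_grad():
        out = model.generate(ids, past_key_values=cache, max_new_tokens=50)
    return tokenizer.decode(out[0, ids.shape[-1]:], skip_special_tokens=True)

print(answer("What does the document say about refunds?"))
```

The only way to swap the context here is to pay the prefill again, which is exactly the "can't change" constraint in the list above.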

What are we missing?

- Evaluate, evaluate, evaluate: perfecting the prompt, taming hallucinations, and alignment all require multilayer/multistage evaluations. Highly dynamic. Forget “black box” CAG. Well, if you insist on doing open-heart surgery on all kinds of CAG hookups, that's a different pay grade; not even a post-doc might help :-)

- An enterprise has to sift through terabytes of data, swapping it in and out for analysis by the LLM. Forget about the limited context window, even with CAG machinery.

RAG, with its own demons, is here to save you. Or sort of, because more often than not it is clunky and ugly, with way too many moving parts and steps (thank you, non-determinism and never-ending evaluations).

If one is super adventurous and pragmatic (and has loads of determination), front-ending CAG with RAG would be the way to go; a rough sketch of that idea follows below. Of course, the devil is in the details. But it is not necessarily CAG vs. RAG.
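
One way to read "front-ending CAG with RAG" is: retrieve once per session, then treat the retrieved context as the static block whose KV cache gets precomputed and reused across follow-up turns. The sketch below only shows the shape of that idea; the keyword retriever, the corpus, and the `prefill` stub are toy stand-ins, not a real stack.

```python
# Rough sketch of RAG in front of CAG: retrieval runs once per session,
# then the retrieved context is cached (as in the KV-cache sketch above)
# and reused for follow-up turns. Retriever and prefill are toy stand-ins.
from functools import lru_cache

CORPUS = {
    "refunds.md": "Refunds are issued within 14 days of purchase ...",
    "shipping.md": "Standard shipping takes 3-5 business days ...",
}

def retrieve(query: str, k: int = 1) -> str:
    """Toy retriever: rank documents by word overlap with the query."""
    scored = sorted(
        CORPUS.items(),
        key=lambda item: -len(set(query.lower().split()) & set(item[1].lower().split())),
    )
    return "\n\n".join(text for _, text in scored[:k])

@lru_cache(maxsize=8)
def prefill(context: str) -> str:
    """Stand-in for the KV-cache prefill step; memoised so a context
    retrieved for one session is only ever prefilled once."""
    return f"<kv-cache covering {len(context)} chars>"

def answer_turn(session_query: str, followup: str) -> str:
    # RAG picks the context, CAG amortises its prefill across turns.
    context_cache = prefill(retrieve(session_query))
    return f"[decode '{followup}' against {context_cache}]"

print(answer_turn("how do refunds work?", "and how long do they take?"))
```

Whether that buys anything depends on how often the same retrieved context actually recurs across turns or users, which is the same caveat as for plain CAG.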

Toni Petrov

Hi Miguel, this is very helpful, but I'm afraid the context you tested with is pretty small. Yes, for a few thousand tokens (or maybe even tens of thousands) it will be more practical, but if we have a lot of data we want to inject (think 100,000+ tokens), I wonder whether RAG or CAG will work better.

Miguel Otero Pedrido

I agree, that's a very good point. We should test this approach with a larger amount of data and see how it performs 👌
