NeuroGen Intelligence Report NIR-008: Beyond Basic RAG

Prepared by: NeuroGen Research
Date: April 10, 2026
Classification: Market Research & Competitive Positioning
Research basis: Lewis et al. (2020) "Retrieval-Augmented Generation" — Facebook AI; Gao et al. (2024) "RAG Survey"; Asai et al. (2024) "Self-RAG" — University of Washington; Liu et al. (2023) "Lost in the Middle" — Stanford
Related reading: NIR-000 "Solving Context Rot"


1. Executive Summary

Most AI knowledge systems in production today work the same way: take the question, embed it as a vector, find the nearest matches in a database, and feed the top results to the model. This approach has a name — basic RAG (Retrieval-Augmented Generation) — and it was a reasonable starting point when it was introduced in 2020.

It is no longer good enough.

The research is unambiguous. Basic RAG routinely retrieves noisy chunks that dilute answer quality. It fails on any question that requires combining information from multiple places in a document. It has no structural understanding of what it is reading. And it forgets everything the moment a session ends — forcing users to re-explain context they gave the system yesterday.

Businesses feel the consequences every day. A contract review that overlooks an indemnification clause. A research summary that cites 30 papers but ignores the 170 others, several of which contradict its conclusion. A customer-facing AI that confidently repeats a policy that was updated last month. The output looks plausible. The damage is invisible until it isn't.

NeuroGen was built to move past these failure modes. The platform's knowledge layer — NeuroGen Cortex — treats documents and memory the way a skilled analyst does: with structure, with focused retrieval, with verification, and with the ability to remember what matters across sessions. This is not basic RAG with more tokens. It is a fundamentally different approach to how AI reads.

This report does three things:

  1. Explains why basic RAG fails and what four major research papers prescribe as the fix.
  2. Defines the business cost of sticking with basic retrieval in legal, finance, research, and customer-facing workflows.
  3. Positions NeuroGen Cortex — the knowledge layer behind every NeuroGen AI capability — as the production answer to the problems the research identifies.

2. The Science: Why Basic RAG Fails

2.1 The Original Promise

Retrieval-Augmented Generation was introduced by Lewis et al. at Facebook AI in 2020. The insight was simple and powerful: a language model that retrieves relevant documents before answering produces more accurate, more specific, and more factual responses than one that tries to answer from memory alone.

"We build RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index... RAG models generate more specific, diverse and factual language than a parametric-only baseline."

— Lewis et al. (2020)

This was a genuine advance, and it became the industry default. A single retrieval step, a single vector store, a single generation pass. For straightforward factual questions against a clean corpus, it worked well enough to build on.
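
The single-pass pipeline can be sketched in a few lines. The code below is a toy illustration of the retrieve-then-generate pattern, not any production system: a bag-of-words counter stands in for a real dense embedding model, and the corpus and function names are invented for the example.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; real systems use dense neural vectors.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Single retrieval pass: rank every chunk by similarity, keep the top k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query, chunks, k=2):
    # Single generation pass: stuff the top-k chunks into the prompt.
    context = "\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

chunks = [
    "The indemnification clause caps liability at 2x fees.",
    "Payment terms are net 30 from invoice date.",
    "Termination requires 90 days written notice.",
]
print(build_prompt("What are the payment terms?", chunks, k=1))
```

For a clean factual question against a small corpus, this is genuinely all there is to it, which is exactly why the pattern spread so fast.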

But "well enough" quietly became "not nearly enough" as AI moved into domains where the questions are not simple and the documents are not clean.

2.2 The Noise Problem

The first failure mode is well documented. Gao et al.'s comprehensive 2024 survey of RAG systems catalogues it directly:

"Challenges faced by RAG include noise, rejection of valid information, context window limitations, and robustness issues. Noise refers to retrieval of passages that contain topic-relevant information but do not help answer the question, leading to confusion in generation."

— Gao et al. (2024)

The mechanism is unavoidable. Basic RAG retrieves the top k chunks with the highest similarity to the query. If k is 5 and only 2 of those chunks actually contain the answer, the other 3 are noise. That noise competes for the model's attention, dilutes the signal, and can actively mislead the final answer. Increasing k to catch more of the answer makes the noise problem worse. Decreasing k risks missing the answer entirely.

There is no good value of k. The architecture itself is the problem.
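
The trade-off can be made concrete with a toy ranking. The similarity scores below are invented for illustration: two chunks contain the answer, three are same-topic distractors, and no choice of k delivers both full recall and low noise.

```python
# Hypothetical cosine-similarity scores from a single vector search.
# Only the two "answer" chunks actually help answer the question.
scores = {
    "answer_part_1": 0.82,
    "answer_part_2": 0.64,  # relevant, but outranked by two distractors
    "distractor_a": 0.79,   # same topic, no answer content
    "distractor_b": 0.71,
    "distractor_c": 0.58,
}

def top_k(scores, k):
    return [c for c, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:k]]

for k in (2, 3, 5):
    hits = top_k(scores, k)
    recall = sum(h.startswith("answer") for h in hits) / 2
    noise = sum(not h.startswith("answer") for h in hits) / k
    print(f"k={k}: answer recall {recall:.0%}, noise share {noise:.0%}")
```

With k=2, half the answer is missing; with k=5, the full answer arrives wrapped in 60% noise. Every k on this toy ranking loses on one axis or the other.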

2.3 The Multi-Hop Problem

Basic RAG's second — and more serious — failure is structural. A single retrieval pass cannot answer questions that require combining information from multiple locations.

Consider a question like: "Compare the security recommendations in section 3 of this document with the implementation details in section 7." A single embedding of the question lands somewhere between the two topics and matches neither region well. The retrieval system either returns a blend that captures neither side, or it picks one side and ignores the other. Either way, the answer is wrong.

Gao et al. identify this as a core limitation:

"For multi-step reasoning tasks, the retrieval component may fail to capture all relevant information in a single pass, particularly when the answer requires synthesizing across multiple documents or distant passages within a single document."

— Gao et al. (2024)

The fix is not a better embedding model. It is a completely different retrieval strategy — one that breaks complex questions into sub-questions, runs each one separately, and synthesizes the results. Basic RAG does not do this.
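
The decomposition step can be sketched as follows. This is a toy pattern-match for the "compare X with Y" shape from the example above; a production system would use an LLM for this step, and the function name is illustrative.

```python
# Toy sketch of question decomposition. A real system would use an LLM
# here; this pattern match only illustrates the idea.

def decompose(question):
    q = question.strip().rstrip(".")
    if q.lower().startswith("compare ") and " with " in q:
        left, _, right = q[len("Compare "):].partition(" with ")
        return [
            f"What does the document say about {left}?",
            f"What does the document say about {right}?",
        ]
    return [question]  # simple questions pass through unchanged

subs = decompose(
    "Compare the security recommendations in section 3 "
    "of this document with the implementation details in section 7."
)
for s in subs:
    print(s)
```

Each sub-question now embeds cleanly near one region of the document, so a separate retrieval pass per sub-question can hit both targets before a final synthesis step combines the results.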

2.4 The Adaptive Retrieval Problem

Not every question needs retrieval. Some questions need one retrieval pass. Some need three. Some need to retrieve, examine what came back, then retrieve again based on what was found. Basic RAG treats every query the same way, running the same pipeline regardless of complexity.

Asai et al. (2024) proposed a concrete fix in their Self-RAG paper:

"Self-RAG learns to retrieve, generate, and critique text passages and its own generation through the use of reflection tokens. This enables the LLM to adaptively retrieve passages on-demand, and generate and reflect on retrieved passages and its own generations using special critic tokens."

— Asai et al. (2024)

The key word is adaptive. A production knowledge system should analyze a question first — understand its complexity, identify what it is asking for, decide how to retrieve — and only then run the retrieval pipeline best suited to the question at hand. Basic RAG skips this step entirely.
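
The routing decision can be sketched like this. Self-RAG learns the decision with reflection tokens; the keyword heuristic below is only a stand-in to show the control flow that basic RAG skips, and the strategy labels are invented for the example.

```python
# Sketch of the "analyze before retrieving" step. The cues and strategy
# names are illustrative; a learned classifier would replace them.

def plan_retrieval(question):
    q = question.lower()
    if any(cue in q for cue in ("compare", "difference between", "across")):
        return "decompose + multi-pass retrieval"
    if any(q.startswith(w) for w in ("what is", "who ", "when ")):
        return "single retrieval pass"
    return "retrieve, examine, retrieve again"

print(plan_retrieval("Compare section 3 with section 7"))
print(plan_retrieval("What is the notice period?"))
```

The point is not the heuristic but the shape of the pipeline: the question is classified first, and only then does a retrieval strategy run.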

2.5 The Structural Blindness Problem

Basic RAG treats documents as flat bags of text. Once a document is chunked and embedded, the retrieval layer has no idea that the chunks belong to sections, that sections belong to chapters, that some chunks are headers and others are footnotes, or that chunk #47 is adjacent to chunk #48 in the source.

This is the most architectural of the failures, and it is the one that years of long-context research keep coming back to. Liu et al. (2023) demonstrated the consequences directly:

"We find that performance is often highest when relevant information occurs at the beginning or end of the input context, and significantly degrades when models must access relevant information in the middle of long contexts... this is even the case for explicitly long-context models."

— Liu et al. (2023), "Lost in the Middle"

The point is that documents have structure, and a good knowledge system should use that structure. An AI that knows where a chunk came from can navigate to adjacent content, jump to a specific section, follow a cross-reference, or pull a definition from the glossary at the end. An AI that only knows embedding vectors can do none of those things — and it pays the price every time a question depends on content buried anywhere other than the start or end of a document.

2.6 What the Research Collectively Demands

Put the four papers together and a picture emerges of what a production knowledge system actually needs:

| Requirement | Source | What it means in practice |
|---|---|---|
| Retrieval should augment generation | Lewis et al. (2020) | Baseline — retrieve, then generate |
| Pre-filter before vector search to eliminate noise | Gao et al. (2024) | Don't dump top-k blindly into context |
| Multi-hop questions need multi-pass retrieval | Gao et al. (2024) | Decompose complex questions first |
| Retrieval strategy should adapt to question complexity | Asai et al. (2024) | Analyze before retrieving |
| Documents should be navigable structures, not flat text | Liu et al. (2023) + industry long-context research | Position tracking, section awareness, targeted navigation |

Basic RAG checks exactly one of these boxes. NeuroGen Cortex checks all five.


3. The Business Cost of Basic RAG

This is not an academic problem. The consequences show up in every business that relies on AI to read anything longer than a paragraph.

Legal. A firm runs a 400-page acquisition agreement through an AI review tool. The AI surfaces 43 of 46 material clauses. The three it misses are buried in an indemnification schedule that cross-references back to section 3.2 — a multi-hop question the basic RAG system could not handle. The client signs. The missed clauses surface a year later in litigation.

Finance. An analyst asks an AI to summarize risk factors from a 10-K filing. The summary is coherent and well-written. It omits the single most significant risk — which appeared in a supplementary schedule whose chunks embedded to vectors that did not match the query closely enough. Basic RAG never retrieved them. The model never saw them.

Research. A team asks an AI to synthesize findings across 200 academic papers on a controversial topic. The synthesis cites 30 papers that support its conclusion. The other 170 — including several that directly contradict it — were below the retrieval threshold. The team publishes a white paper that competitors tear apart within a week.

Customer Service. A company's support chatbot is grounded in a knowledge base of product documentation. A customer asks a specific question about a feature that was updated last month. The chatbot retrieves the old version of the documentation because its embeddings are still in the vector store. The answer is confidently, professionally, and completely wrong. The customer cancels.

Enterprise Search. An employee asks the internal AI assistant: "What did we decide about the Acme pricing negotiation and who was in the meeting?" The answer comes back with a plausible-sounding summary that is a composite of three unrelated meetings. There is no way to tell unless the employee already knows the correct answer — in which case the tool was unnecessary.

In every case, the failure pattern is identical. The AI speaks confidently while operating on incomplete, poorly targeted, or out-of-context information. The output is plausible, coherent, and wrong — the worst possible combination for any workflow where accuracy matters.


4. NeuroGen Cortex: How NeuroGen Solves It

NeuroGen Cortex is the knowledge layer that sits behind every NeuroGen AI capability — contract review, research synthesis, customer-facing chatbots, internal search, analyst workflows. It is the reason a query against a 10-page document and a query against a 10,000-page archive produce equally accurate answers.

Cortex does not replace the language model. It replaces everything the language model reads before it answers.

4.1 Structural Reading, Not Flat Retrieval

When a document arrives in NeuroGen, Cortex parses it into a navigable structure rather than a flat sequence of text blocks. Sections, headings, tables, footnotes, and cross-references are preserved and indexed. Every piece of content knows where it came from — which section, which page, which neighbours — so the AI can move through the document the way a careful reader does: find the relevant area, read around it for context, follow references, and verify findings.

This is the opposite of basic RAG, which strips structure during ingestion and can never recover it.
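
The difference can be sketched with a position-aware chunk store. The code below is illustrative only, assuming a made-up data model rather than the actual Cortex internals: each chunk keeps its section and position, which enables operations a flat vector store cannot express.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    section: str
    position: int  # order of the chunk within the source document
    text: str

class StructuredIndex:
    """Chunks that remember where they came from. Illustrative sketch,
    not the actual Cortex data model."""

    def __init__(self, chunks):
        self.chunks = sorted(chunks, key=lambda c: c.position)

    def neighbours(self, chunk, radius=1):
        # Read around a hit for context — impossible with flat vectors.
        return [c for c in self.chunks
                if 0 < abs(c.position - chunk.position) <= radius]

    def section(self, name):
        # Jump straight to a named section.
        return [c for c in self.chunks if c.section == name]

idx = StructuredIndex([
    Chunk("3. Security", 47, "Recommend TLS 1.3 for all endpoints."),
    Chunk("3. Security", 48, "Rotate keys every 90 days."),
    Chunk("7. Implementation", 112, "Endpoints currently use TLS 1.2."),
])
hit = idx.section("3. Security")[0]
print([c.text for c in idx.neighbours(hit)])
```

Once position and section survive ingestion, "read the surrounding paragraphs" and "jump to section 7" become cheap index lookups instead of impossible requests.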

4.2 Adaptive Retrieval for Real Questions

Cortex analyzes every question before retrieving anything. Simple questions get a simple retrieval pass. Complex questions — the ones involving comparisons, cross-references, or multi-step reasoning — are automatically decomposed into sub-questions, each executed separately against the source material, with the results synthesized into a complete answer.

Users never see this. They ask their question and get an answer. Behind the scenes, Cortex is doing the work a human researcher would do: asking the right follow-up questions and confirming the findings before responding.

4.3 Pre-Filtered Search, Not Top-K Gambling

Before any vector search runs, Cortex filters candidate passages using a keyword index — a fast, structural layer that eliminates obviously irrelevant content before the slower, fuzzier semantic layer sees it. The semantic layer then scores only the pre-filtered candidates, producing retrieval results that are both faster and cleaner than any single-pass approach.

This is how Cortex solves the noise problem that basic RAG has no answer for.
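
The two-stage shape can be sketched with an inverted keyword index. This is an illustrative toy, not the Cortex implementation: stage one discards chunks sharing no terms with the query, and only the survivors would reach a semantic scorer.

```python
from collections import defaultdict

def build_keyword_index(chunks):
    # Inverted index: term -> set of chunk ids containing it.
    index = defaultdict(set)
    for i, chunk in enumerate(chunks):
        for term in chunk.lower().split():
            index[term].add(i)
    return index

def prefilter(query, chunks, index):
    # Stage 1: keep only chunks that share at least one query term.
    candidates = set()
    for term in query.lower().split():
        candidates |= index.get(term, set())
    return sorted(candidates)

chunks = [
    "Payment terms are net 30 from invoice date.",
    "Termination requires 90 days written notice.",
    "The indemnification clause caps liability at 2x fees.",
]
index = build_keyword_index(chunks)
survivors = prefilter("indemnification liability", chunks, index)
print(survivors)  # only these ids go on to the slower semantic scorer
```

Because the keyword stage is a set lookup rather than a similarity scan, it is cheap to run over the whole corpus, and the expensive semantic stage never sees the obviously irrelevant chunks.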

4.4 Grounded Answers With Verifiable Sources

Every answer Cortex produces is tied to specific passages in the source material. Users can verify any claim in seconds. When a question cannot be fully answered from the available content, Cortex says so clearly — rather than fabricating a plausible guess, which is the failure mode that makes basic RAG dangerous in regulated domains.

A reliable "I don't know, here is what is available" is infinitely more valuable than a confident hallucination. Cortex knows the difference.

4.5 Memory That Persists Across Sessions

Conventional AI chatbots are stateless. Every conversation starts from zero. NeuroGen is different: Cortex extracts and stores the facts, preferences, terminology, and relationships that emerge during conversations, and makes them available to the AI in every future session.

The effect compounds. An analyst who works with NeuroGen for six months has a knowledge layer that knows their domain, remembers which documents matter, understands their team's terminology, and recalls the conclusions of prior work. This is the difference between a tool that keeps making you start over and one that gets measurably better at your specific job the longer you use it.
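
The persistence idea can be sketched minimally. In the toy below, a JSON file stands in for a real memory store, and the class and method names are invented for illustration; extraction, ranking, and per-tenant isolation are all omitted.

```python
import json
import os
import tempfile

class MemoryStore:
    """Minimal cross-session memory: facts persist in a file between
    sessions. Illustrative only — not a real memory subsystem."""

    def __init__(self, path):
        self.path = path

    def remember(self, user, fact):
        data = self._load()
        data.setdefault(user, []).append(fact)
        with open(self.path, "w") as f:
            json.dump(data, f)

    def recall(self, user):
        return self._load().get(user, [])

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
os.remove(path)  # start with no file, mimicking a brand-new store

MemoryStore(path).remember("analyst_1", "Prefers summaries under 200 words")

# A later session, with a completely fresh object, still sees the fact.
print(MemoryStore(path).recall("analyst_1"))
```

The second `MemoryStore` instance models a new session: nothing survives in process memory, yet the recalled fact is still there, which is the property stateless chatbots lack.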

4.6 Logarithmic Cost, Not Linear

Basic RAG platforms charge for every token in the input. Cortex is different: because it retrieves precisely rather than dumping top-k results into context, cost scales logarithmically with document size. Processing a million-token archive costs a fraction of what a brute-force approach would charge, because only the relevant content is ever read in depth.

Combined with per-operation tracking, configurable budget limits, and full audit trails, this makes enterprise AI spend a predictable line item rather than a surprise at the end of the month.


5. What This Looks Like in Practice

5.1 Questions Basic RAG Cannot Answer

Every capability in this list is a question basic RAG systems routinely fail on. Cortex answers all of them consistently:

  • "Find every clause in this 600-page contract that references the indemnification section, and summarize how they interact." (Multi-hop retrieval across a long document)
  • "Compare the risk factors in this company's last three 10-K filings and highlight what changed." (Cross-document synthesis)
  • "Across these 200 research papers, which ones contradict the hypothesis in paper 47?" (Adaptive retrieval at scale)
  • "Based on what I told you last month about our pricing model, does this proposal align with it?" (Cross-session memory)
  • "In this codebase, which modules depend on the authentication service, and which of those would break if I change its interface?" (Structural navigation across a large corpus)

5.2 Accurate Answers at Any Scale

NeuroGen customers routinely run queries against document collections that would collapse every single-model approach. A multi-hundred-page contract. A full SEC filing with its exhibits. A research archive spanning thousands of papers. A multi-million-token codebase. In every case, Cortex returns answers grounded in the actual content, with references to the specific sections the answer came from.

Users can verify any claim in seconds rather than re-reading the source material themselves.

5.3 One Knowledge Layer, Every Workflow

NeuroGen Cortex is not a standalone tool. It is the shared foundation underneath every AI capability in the NeuroGen platform. The same knowledge layer that reviews contracts powers research synthesis, drives customer-facing chatbots, grounds internal search, supports analyst workflows, and serves as the long-term memory for autonomous agents.

Every team in an organization benefits from the same underlying capability, and every improvement to Cortex improves every workflow at once.

5.4 Enterprise-Grade Data Controls

Retention is configurable per deployment: 30 days, 90 days, 365 days, or indefinite. All stored content is encrypted at rest. Training opt-in is explicit and off by default. Full audit logs are available for compliance review. Memory stored by Cortex respects the same per-user, per-tenant isolation boundaries as the rest of the platform — one customer's data is never visible to another.


6. Competitive Landscape

| Capability | Basic RAG Platforms | NeuroGen Cortex |
|---|---|---|
| Retrieval strategy | Single-pass top-k vector similarity | Adaptive, multi-pass, pre-filtered |
| Noise handling | Top-k gambling — noise in, noise out | Keyword pre-filter eliminates noise before semantic search |
| Multi-hop questions | Routinely fails | Automatic question decomposition |
| Document structure | Flat chunks, no positional awareness | Navigable structure with section and position tracking |
| Verification | None — confidence theater | Grounded answers with explicit uncertainty handling |
| Memory across sessions | Session-only; every chat starts fresh | Persistent, compounding cross-session memory |
| Effective document scale | Capped by context window | No practical ceiling — accuracy maintained across archives |
| Cost behavior | Linear with input size | Logarithmic — read only what matters |
| Data controls | Minimal or absent | Configurable retention, encrypted at rest, per-tenant isolation |

7. Conclusion

Basic RAG was a reasonable starting point in 2020. In 2026, it is the quiet cause of almost every AI failure that keeps enterprise buyers awake at night — the confidently wrong answer, the missed clause, the omitted risk, the outdated policy, the hallucinated citation. The research community has known this for years. The market is only now catching up.

The fix is not a bigger context window, not a better embedding model, and not a larger vector database. The fix is a different architecture — one that treats documents as structured objects, retrieves adaptively, pre-filters noise, verifies its own answers, and remembers what matters across sessions.

That architecture has a name at NeuroGen: Cortex. It is the knowledge layer behind every AI capability in the platform, and it is the reason NeuroGen delivers accurate, grounded answers on documents of any size — regardless of how basic RAG would have handled them.

For organizations whose AI touches anything that matters — contracts, filings, research, code, customer conversations, institutional memory — the difference is not a better model. It is a better layer for the model to read from. That is what NeuroGen Cortex provides, and that is why "beyond basic RAG" is a problem NeuroGen customers no longer have.


References

  1. Lewis, P., Perez, E., Piktus, A., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." arXiv:2005.11401 [cs.CL]. Facebook AI Research.
  2. Gao, Y., Xiong, Y., Gao, X., et al. (2024). "Retrieval-Augmented Generation for Large Language Models: A Survey." arXiv:2312.10997 [cs.CL].
  3. Asai, A., Wu, Z., Wang, Y., et al. (2024). "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection." arXiv:2310.11511 [cs.CL]. University of Washington.
  4. Liu, N.F., Lin, K., Hewitt, J., et al. (2023). "Lost in the Middle: How Language Models Use Long Contexts." arXiv:2307.03172 [cs.CL]. Stanford University.
  5. Tsinghua University NLP Group (2024). "LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding." arXiv:2308.14508 [cs.CL].
