AI Search & Discovery

What Actually Gets a Biomedical Paper Cited by ChatGPT

Generative Engine Optimisation for biomedical papers — the one peer-reviewed study on what works, the 2026 citation-overlap data, and the tactics currently being sold to researchers that the evidence does not support.

Published 17 April 2026 · ~14 min read

Google's AI Overviews now trigger on 48% of tracked queries as of February 2026, up from 31% a year earlier — a 58% year-on-year increase on BrightEdge's panel of nine industries. Education queries went from triggering AI Overviews 18% of the time to 83% in that same window. In parallel, Ahrefs' March 2026 analysis of 863,000 keyword SERPs and 4 million AI Overview URLs found that only 38% of AI-Overview citations also rank in Google's top 10 for the same query — down from 76% in a comparable July 2025 analysis. A separate BrightEdge panel published 12 February 2026 put the overlap at roughly 17%.

A biomedical paper's discoverability used to be decided by a ranking algorithm and read by a human. Increasingly, it is being decided by a retrieval system and read — first, and sometimes only — by a language model.

Generative Engine Optimisation (GEO) is the practice of making your paper extractable and quotable by that language model. It is a real technical question, not a marketing category, and there is one piece of peer-reviewed controlled research on what actually moves citations inside generative answers. Most of what is currently sold to researchers as “AI SEO” is not supported by that research, and some of it is directly contradicted.

This post walks through what the evidence says, what it does not say, and what a biomedical PI should actually do with it.

Key takeaway

In the only peer-reviewed controlled study on GEO (Aggarwal et al., KDD 2024), adding direct quotations to a source lifted its visibility in generative-engine answers by 41%, adding statistics by 33%, explicitly citing sources by 28%, and improving fluency by 28%. Keyword stuffing was the only tested tactic that hurt visibility, at −9%. For a biomedical paper, this maps directly onto how the abstract and introduction are written.

What is generative engine optimisation, actually?

The term GEO was introduced in Aggarwal, Murahari, Rajpurohit, Kalyan, Narasimhan and Deshpande (2023), published at KDD 2024. As of April 2026, this remains the only peer-reviewed controlled study of what tactics actually cause a source to be cited more often inside a generative-engine answer. Everything else in the public literature is either secondary commentary, a vendor case study, or a literature review. That matters, because the paper's findings are specific and some of them disagree with what is being sold.

The authors built a benchmark they call GEO-bench: 10,000 real queries drawn from nine datasets (MS MARCO, ORCAS-I, Natural Questions, AllSouls, LIMA, Davinci-Debate, Perplexity.ai Discover, ELI5, and GPT-4-generated queries), split 80% informational / 10% transactional / 10% navigational, spanning 25 domains including Arts, Health, and Games. For each query, they randomly selected one of the retrieved sources, applied a candidate optimisation tactic to it, and measured whether the tactic changed that source's visibility in the generated answer. The headline metric is Position-Adjusted Word Count, which rewards a source both for being mentioned and for being mentioned early or at length in the answer.
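The Position-Adjusted Word Count idea is easy to approximate in code. The sketch below is an illustrative simplification, not the paper's exact formula: it credits a source with the word count of every answer sentence that cites it, weighted so that earlier sentences count more.

```python
def position_adjusted_word_count(answer_sentences, source_id):
    """Toy approximation of a position-adjusted word-count metric.

    answer_sentences: list of (sentence_text, cited_source_id) tuples,
    in the order they appear in the generated answer. Earlier sentences
    get a higher positional weight, so a source quoted early and at
    length outscores one mentioned briefly at the end.
    """
    n = len(answer_sentences)
    score = 0.0
    for idx, (text, cited) in enumerate(answer_sentences):
        if cited == source_id:
            weight = (n - idx) / n  # 1.0 for the first sentence, decaying linearly
            score += weight * len(text.split())
    return score

# Invented example answer: three sentences citing two sources.
answer = [
    ("Compound X inhibited pathogen Y at 16-fold lower concentrations.", "paper_a"),
    ("A 2024 review reached a similar conclusion.", "paper_b"),
    ("Earlier work had reported weaker effects.", "paper_a"),
]
```

Under this toy metric, a source quoted first and at length dominates one mentioned once in passing, which is the behaviour the tested GEO tactics are trying to exploit.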

GEO is not a rebrand of SEO. The two mechanisms reward different properties of the same page.

Classical SEO rewards ranking. A search engine fetches your page, a ranking model places your URL in a list, and the user either clicks your URL or does not. What moves the needle: link equity, crawl depth, title-tag match, on-page keyword presence, site speed, backlinks.

GEO rewards retrieval plus extraction. A retrieval-grounded language model — ChatGPT with browsing, Perplexity, Google AI Mode, Claude with web access, Google AI Overviews — fetches several candidate sources for a query, extracts a sentence or paragraph from each, and stitches those extractions into the generated answer. The user's surface action is reading the generated paragraph. The click back to your page, if it happens at all, is downstream of whether a sentence from your page made it into that paragraph.

Classical SEO versus GEO: two different mechanisms, two different optimisation targets
[Figure: side-by-side flow comparison. Classical SEO (ranking): user query → ranking model scores URLs → ranked list of URLs shown → user clicks a URL. Optimise for rank position; the URL is the deliverable. GEO (retrieval + extraction): user query → retrieve candidate sources (query fan-out, typically 5–10) → extract quotable sentences from each fetched page → generated answer paragraph. Optimise for extraction; a quotable sentence is the deliverable.]
Figure 1. The SEO pipeline ends when a URL is returned and clicked. The GEO pipeline ends when a generated paragraph is shown to the user; the click, if any, is a downstream consequence of the page having contained a sentence the retrieval system could extract. Biomedical writing conventions are optimised for neither mechanism, but are further from GEO than from SEO.

For a biomedical paper this distinction matters more than for most other content, because biomedical writing is optimised for neither. Your title is declarative and often flattened by copy-editing. Your abstract uses field-specific shorthand. Your full text often sits behind a paywall the retrieval layer cannot reach — in our April 2026 measurement of the top 50 biomedical journals by h-index, 45 of 50 were blocked to compliant AI crawlers by either server-level 403 or robots.txt disallow. And your figures and tables carry the numeric payload, but a text-only retrieval system cannot see them.

How a retrieval-based AI system decides what to cite

The user enters a query. The retrieval system expands it — often using query fan-out, where the model rewrites the original question into several related queries and pulls candidate pages for each variant. The practical consequence is that a page the user never typed a query for can end up in the answer, and a page that ranks first for the user's original query can be absent. Ahrefs' March 2026 measurement shows this: only 38% of URLs cited in AI Overviews appear in the top 10 for the original query. The other 62% are reached through fan-out variants or from positions 11 and below.
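A minimal sketch of that fan-out effect, using a toy retrieval index (all queries and URLs here are invented): a page can enter the candidate pool through a rewritten variant even though it never surfaces for the user's original query.

```python
# Toy index: which URLs each query variant surfaces (assumed data).
INDEX = {
    "compound x pathogen y": ["review-site.com", "journal-abstract.com"],
    "how is pathogen y treated": ["clinical-guide.org", "preprint-server.org/x"],
    "compound x mechanism of action": ["preprint-server.org/x"],
}

def rewrite(query):
    # Stand-in for the model's query-rewriting step.
    return ["how is pathogen y treated", "compound x mechanism of action"]

def fan_out(query):
    """Union the candidate URLs over the original query and its rewrites."""
    seen = []
    for q in [query] + rewrite(query):
        for url in INDEX.get(q, []):
            if url not in seen:
                seen.append(url)
    return seen

# preprint-server.org/x enters the pool only through fan-out variants,
# even though it never ranks for the user's original query.
print(fan_out("compound x pathogen y"))
```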

For each retrieved URL, the system attempts to fetch the page. If the fetch succeeds and the page returns readable text, the retrieval layer extracts candidate passages and scores them on how completely they answer the user's question. The extractions with the highest scores are the ones the model cites.
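The extraction step can be sketched the same way. The scorer below uses simple token overlap; real systems use embedding similarity, but the shape is the same: the fetched page is reduced to its single most quotable passage.

```python
import re

def best_extraction(page_text, query):
    """Pick the sentence from a fetched page that best matches a query.

    Toy scorer: token overlap between query and sentence. The page text
    and query below are invented examples.
    """
    query_tokens = set(query.lower().split())
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())

    def score(sentence):
        tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        return len(tokens & query_tokens)

    return max(sentences, key=score)

page = ("We conducted a trial. Compound X inhibited pathogen Y at 16-fold "
        "lower concentrations than the reference drug. Funding came from "
        "the usual sources.")

# The quantified middle sentence wins the extraction.
print(best_extraction(page, "does compound x inhibit pathogen y"))
```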

That is the entire mechanism. Three failure modes matter to a biomedical paper:

  1. The fetch fails. The publisher server returns 403 regardless of what robots.txt declares, or robots.txt disallows the crawler by name. The system moves on to the next candidate. For 45 of 50 top biomedical journals, this is what currently happens to compliant AI bots.
  2. The fetch succeeds but the readable text is thin. A paywalled journal article page typically returns an abstract and bibliographic metadata. The model has less to extract from than if it had the full paper. An open-access preprint, a PubMed Central deposit, or a well-written plain-language summary on the author's institutional profile page all give it more.
  3. The text is readable but not quotable. The abstract is written in dense, field-specific language. There is no single sentence in it that directly answers a non-specialist query. The system scores the page lower and cites a competing page that does have such a sentence.
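The three failure modes amount to a triage over each fetched candidate. The sketch below is a simplification: the status code, the word-count threshold, and the quotable flag are all stand-ins for what a real retrieval layer measures.

```python
def triage(status_code, readable_text, has_quotable_sentence):
    """Classify one fetched candidate page against the three failure modes.

    All three inputs are assumptions about what a retrieval layer sees;
    real systems measure far more, but the decision structure is the same.
    """
    if status_code != 200:
        return "failure 1: fetch blocked, move to the next candidate"
    if len(readable_text.split()) < 150:  # e.g. an abstract-only paywall page
        return "failure 2: thin text, little to extract"
    if not has_quotable_sentence:
        return "failure 3: readable but not quotable, a competing page wins"
    return "extractable: eligible to be cited in the answer"
```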

Aggarwal et al. does not address failure mode 1 (that is an access-layer problem outside GEO's scope). It addresses failure modes 2 and 3 directly, by asking: given that a retrieval system has fetched your page and extracted it, what changes to the page cause the extraction to win?

What the evidence shows works

The paper's headline finding is often quoted as “up to 40% improvement in visibility.” That phrase hides most of the useful detail. The actual finding is that of the nine tactics tested, four produced lifts of 28% or more, three produced modest lifts, one was essentially noise, and one actively hurt visibility. The spread is what matters.

[Figure: horizontal bar chart of visibility lift by GEO tactic relative to an unoptimised baseline (Aggarwal et al. 2024, Position-Adjusted Word Count on GEO-bench, n=10,000 queries, 25 domains): Quotation Addition +41%, Statistics Addition +33%, Cite Sources +28%, Fluency Optimization +28%, Technical Terms +18%, Easy-to-Understand +14%, Authoritative tone +12%, Unique Words +6%, Keyword Stuffing −9%. Chart drawn from Aggarwal et al., Table 1.]
Figure 2. The nine GEO tactics tested in Aggarwal et al. (KDD 2024), ranked by measured visibility lift on the Position-Adjusted Word Count metric across GEO-bench's 10,000 queries. The top four tactics all depend on what is written in the extractable text of the page; the bottom tactic (keyword stuffing) is the only tested intervention that reliably hurt visibility.

Four things in this chart matter for biomedical papers.

Quotation Addition lifts visibility by 41%, the largest effect observed. For a biomedical paper, the nearest equivalent is quoting a specific prior finding from a recognised authoritative source — a major reference work, a standards document, a well-known review — directly in your abstract or introduction, rather than paraphrasing it and burying the attribution in a numbered citation. Most biomedical abstracts paraphrase. That is, on this evidence, a visibility cost.

Statistics Addition lifts visibility by 33%. This is the “first sentence of the abstract” finding in a different form. An abstract that opens with “We tested whether compound X inhibits pathogen Y” gives the retrieval model one extractable sentence and no number. An abstract that opens with “Compound X inhibited pathogen Y at concentrations 16-fold lower than the standard reference, across n = 24 clinical isolates” gives it a quotable, numerically anchored sentence that a retrieval system will preferentially extract. The first version is generic; the second is specific and comparable.

Cite Sources and Fluency Optimization are tied at +28%. Citing authorities explicitly in prose (not only in a reference list) and writing fluent, readable prose both lift visibility by roughly the same amount. Most biomedical methods sections already have the first; most biomedical abstracts have room to improve on the second.

Keyword Stuffing hurts visibility by 9%. This is the strongest negative finding in the study and runs directly against most “AI SEO for academics” advice currently being marketed, which treats ChatGPT as if it were a 2010-era search engine. Aggarwal et al.'s data says it is not. Repetition is treated adversarially.

These findings are domain-agnostic — the benchmark covered arts, health, games and 22 other domains — but the pattern they suggest for biomedical abstracts is coherent: front-load a quotable declarative sentence with at least one quantified claim and at least one named authoritative reference, and do not repeat keywords for their own sake.

What does not work, despite being sold to researchers

The other side of the evidence is worth stating just as plainly.

llms.txt has no measured effect on AI citations

The llms.txt standard, proposed in 2024, is a machine-readable manifest file that sits at the root of a website and tells AI crawlers which content the site owner prefers them to ingest. It has been heavily promoted by SEO vendors throughout 2025 and 2026, and is sometimes recommended to academic institutions.

The available evidence says it does not change AI citation behaviour. ALM Corp's 2025 analysis tracked 10 websites for 90 days before and 90 days after llms.txt implementation, measuring three metrics: AI crawler request frequency in server logs, traffic from AI platforms (ChatGPT, Claude, Perplexity, Gemini), and changes in AI citations. Eight of the ten sites showed no measurable difference. The two that did grow had confounding variables — Bloomberg coverage, simultaneous content launches — that prevented attribution to llms.txt. A separate 300,000-domain analysis in the same study found, in the authors' words, “no measurable correlation between having llms.txt and receiving AI citations.”

None of ChatGPT, Claude, Perplexity, or Google AI Overviews has publicly confirmed that its retrieval layer parses llms.txt. Google has said it does not. Anthropic references the standard in its documentation, which is the strongest single endorsement from a frontier lab, but endorsement is not demonstrated citation uplift.

Practical consequence for biomedical researchers: adding an llms.txt to a lab website or institutional profile in April 2026 will not measurably change how often AI tools cite the papers on that page. The evidence to support any other claim does not currently exist. If a vendor is selling llms.txt implementation as an AI-citation service, ask them for the primary-source data.

Keyword-density and classical “AI SEO” tactics

Aggarwal et al. is explicit about this: keyword stuffing changed visibility by −9%, the only intervention in the nine-tactic set that was actively harmful. Any tool or consultant recommending target-keyword repetition in abstracts or landing pages is recommending a practice the primary evidence says hurts visibility.

Over-transferring classical SEO tactics

Ahrefs' separate cross-AI study of 15,000 long-tail queries measured the overlap between pages cited by AI tools (ChatGPT, Gemini, Copilot, Perplexity) and pages ranking in Google's top 10 for the same query. The overall overlap was 12%. Perplexity was highest at 28.6%. ChatGPT and Gemini hovered around 8%. In plain terms, ranking first on Google for a query is, at best, weakly predictive of being cited by an AI tool on the same query. “Unified SEO/GEO” consulting products that assume the two pipelines share most of their ranking signals are making a claim the data does not support.

Accuracy is not a solved problem, either

It is worth being honest about the state of AI citations in general. The Tow Center at Columbia Journalism Review (March 2025) tested eight AI search tools on 1,600 queries (20 publishers × 10 articles × 8 chatbots), giving each tool a direct excerpt and asking it to identify the headline, publisher, date, and URL. Perplexity got 37% of answers wrong. ChatGPT Search got 67% wrong (134 of 200). Grok-3 got 94% wrong. The premium tiers of these tools were often more confident in their wrong answers than the free tiers. These are not biomedical queries, but they establish a baseline for how unreliable citation-from-excerpt currently is on the general web. In a biomedical context, where terminology ambiguity and paper-of-record confusion are worse, the error rate on specific attribution questions is plausibly higher.

Three things that are specific to biomedical papers

The findings above are not biomedical-specific; GEO-bench covered 25 domains and Ahrefs tested general queries. Three things, however, apply more sharply to life-sciences research than to other content.

Open-access matters more for GEO than for traditional SEO. Classical search engines have ingested the abstracts of paywalled papers for years through publisher-Google agreements and through PubMed/MEDLINE. Retrieval-grounded AI systems often cannot. When a compliant AI crawler gets a 403 from an article page, it does not get the abstract either. This is why, on our measurement of the top 50 biomedical journals, 45 of 50 are effectively invisible to retrieval systems that obey their access policies. An open preprint on bioRxiv, medRxiv, or arXiv, or a PubMed Central deposit under NIH or UKRI open-access mandates, changes this. Not because the preprint is “better” than the journal version, but because the retrieval layer can actually read it.

The abstract is almost the entire game. Because the retrieval system's extraction is text-based and because most biomedical papers' findings are most densely expressed in the abstract, the abstract carries disproportionate weight in whether your paper is cited by an AI tool. Aggarwal et al.'s findings apply to the abstract directly: a quotation, a specific quantified claim, an explicit citation of an authoritative source, and fluent prose each independently lift visibility. Crucially, these are additive: an abstract can do all four without padding, and on the evidence, doing all four is strictly better than doing one.

The surface you control changes over the paper's lifetime. Your preprint is editable only until journal publication (see bioRxiv policy). Your journal version is frozen at acceptance. The surface that remains editable for the life of the paper is your institutional profile page — your ox.ac.uk, stanford.edu, or equivalent profile. A plain-language paragraph under each paper, on a university-domain page that the retrieval layer can read end-to-end, is the long-term hedge against every other access-layer problem this post has named. That matters because the one thing retrieval-grounded AI systems cannot do is invent a source that does not exist in their index: if there is no readable, authored description of your paper anywhere on the open web, the system will either paraphrase your abstract imperfectly or cite a competing paper that has one.

A minimal checklist, ranked by evidence strength

In priority order, based on the evidence above.

For a paper you are writing now

  1. Open the abstract with a quantified declarative sentence. State the finding and at least one number in the first sentence. (Aggarwal et al., Statistics Addition, +33%.)
  2. Quote a specific authoritative source in the introduction. Not “several studies have shown” with a numbered citation — an actual direct quote attributed to a named reference work. (Quotation Addition, +41%.)
  3. Cite your two or three most important prior results by name in the discussion text, not only in the reference list. (Cite Sources, +28%.)
  4. Do not repeat keywords in the abstract for their own sake. (Keyword Stuffing, −9%.)
  5. Prefer a venue that is open to AI crawlers. If not possible, ensure a compliant open-access route: preprint, PMC deposit, green OA.
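Items 1 to 4 above can be sketched as a rough self-check. The heuristics below are this post's assumptions, not anything measured in Aggarwal et al.; "fluency" in particular cannot be checked with a regex, so it is proxied by average sentence length.

```python
import re

def geo_abstract_check(abstract):
    """Rough heuristics for the four highest-lift GEO properties.

    Each check is a stand-in, not a validated measure: the citation
    pattern only catches parenthetical author-year references, and
    fluency is proxied by average sentence length.
    """
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    words = abstract.split()
    return {
        "quotation": '"' in abstract,                                  # Quotation Addition, +41%
        "statistic_first": bool(re.search(r"\d", sentences[0])),       # Statistics Addition, +33%
        "named_citation": bool(re.search(r"\([^)]*\b(19|20)\d{2}\)", abstract)),  # Cite Sources, +28%
        "fluency_proxy": len(words) / max(len(sentences), 1) <= 40,    # Fluency, +28%
    }
```

An abstract that opens with a quantified sentence, contains one attributed direct quote, and names a prior reference in parentheses passes all four checks without any keyword repetition.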

For papers you have already published

  1. Write one plain-language paragraph per paper on your institutional profile page. This is the only surface still fully under your editorial control for the life of the paper. State the question, the finding with one number, and why it matters, in the words your field actually searches.
  2. Link it to your ORCID and the canonical DOI. Metadata interoperability has measurable effects on indexing and retrieval (Al-Jundi et al., Scientometrics, 2025).
  3. Do not install llms.txt expecting a citation uplift. Current evidence does not support this.
  4. Verify the paper's canonical page is actually reachable by AI crawlers. If the publisher platform blocks them, ensure the open-access copy on PMC or a preprint server is indexable.
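Item 4 is checkable with the Python standard library. GPTBot, ClaudeBot, PerplexityBot, and Google-Extended are the real user-agent tokens those crawlers declare; the robots.txt content and URL below are illustrative examples of the pattern many publisher platforms now serve.

```python
from urllib.robotparser import RobotFileParser

# Real user-agent tokens declared by major AI crawlers.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

# Illustrative robots.txt of the kind many publisher platforms serve.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

def blocked_bots(robots_txt, url="https://example.org/article/doi123"):
    """Return the AI crawlers that this robots.txt disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]

print(blocked_bots(ROBOTS_TXT))  # GPTBot and ClaudeBot are disallowed here
```

Note that this only tests the robots.txt layer; a server that returns 403 to these user agents regardless of robots.txt has to be checked with an actual fetch.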

What we still do not know

Three honest limitations on the above.

First, Aggarwal et al. tested general-web content on general-purpose queries. The results are plausibly directionally correct for biomedical abstracts — the retrieval and extraction mechanisms are the same regardless of subject — but we do not yet have a controlled study replicating GEO-bench on biomedical-specific queries and papers. That experiment is worth running, and it has not been run.

Second, the 2026 citation-overlap numbers (Ahrefs, BrightEdge) are moving quickly. The drop from 76% to 38% top-10 overlap happened in roughly seven months. If Google's AI Overviews adjust their retrieval model again — a reasonable possibility, given the 27 January 2026 switch to Gemini 3 — these numbers will move again. Any post with a specific percentage in 2026 will be partially out of date in 2027. The direction (retrieval decoupling from ranking) looks structural; the specific magnitudes do not.

Third, the evidence on AI citation accuracy is worse than the evidence on AI citation mechanism. The Columbia Journalism Review study measured how often these tools get citations wrong, not how often they cite your paper in the first place. A paper that is cited 40% more often by ChatGPT but attributed with a 67% error rate on specific claims is not an unambiguous win. GEO is a visibility lever; it is not an accuracy guarantee.

FAQ

Is Generative Engine Optimisation the same as SEO?

No. Classical SEO optimises for ranking: the search engine returns a list of URLs and the user clicks one. GEO optimises for retrieval plus extraction: a language model fetches several sources, extracts sentences from them, and stitches those into a generated answer. The user's surface action is reading the generated paragraph, not clicking a URL. The optimisation targets are different, and Ahrefs' 12% top-10 overlap between AI-cited URLs and Google-top-10 URLs (15,000 long-tail queries) shows they do not overlap cleanly in practice either.

Does llms.txt improve how often AI tools cite my papers?

The available evidence says no. ALM Corp's 2025 analysis tracked 10 websites for 90 days before and 90 days after llms.txt implementation. Eight showed no measurable difference. A separate 300,000-domain analysis in the same study found no correlation between llms.txt presence and AI citation frequency. None of ChatGPT, Claude, Perplexity, or Google AI Overviews has confirmed parsing llms.txt; Google has said it does not.

What is the single most impactful change a biomedical researcher can make?

On the only controlled GEO study to date, the largest measured effect came from quotation addition: including a direct, attributed quote from a recognised authoritative source within the text (+41% visibility on GEO-bench's 10,000 queries). For a biomedical paper the equivalent is quoting a specific prior finding from an authoritative reference work in the abstract or introduction, not paraphrasing. Adding quantified statistics to prose (+33%) is the next largest and is also directly actionable for abstracts.

If my paper is in a journal that blocks AI crawlers, is GEO still worth doing?

Yes, because the paper's canonical page is not the only surface the retrieval system can reach. Even when the journal platform returns 403, the retrieval layer will typically find a PubMed Central deposit, a preprint, an author's institutional profile page, or a review article that summarises the finding. The text on those reachable pages is what determines whether your paper appears in the AI's answer. In our April 2026 measurement, 45 of the top 50 biomedical journals by h-index were blocked to compliant AI crawlers, so this situation is the norm, not the exception.

Sources

  1. Aggarwal, P., Murahari, V., Rajpurohit, T., Kalyan, A., Narasimhan, K., & Deshpande, A. (2024). GEO: Generative Engine Optimization. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. arXiv preprint: arXiv:2311.09735. Results drawn from Table 1 (Position-Adjusted Word Count metric) on GEO-bench, n=10,000 queries across 25 domains.
  2. Ahrefs Research (2 March 2026). Update: 38% of AI Overview Citations Pull From The Top 10. ahrefs.com/blog/ai-overview-citations-top-10. Dataset: 863,000 keyword SERPs and 4 million AI Overview URLs. Caveat: parsing methodology improved between the July 2025 and March 2026 studies; exact comparability is approximate.
  3. Ahrefs Research. Only 12% of AI Cited URLs Rank in Google's Top 10 for the Original Prompt. ahrefs.com/blog/ai-search-overlap. Dataset: 15,000 long-tail queries across ChatGPT, Gemini, Copilot, Perplexity.
  4. ALM Corp (2026). Google AI Overview Citations From Top-10 Pages Dropped From 76% to 38%. almcorp.com. Secondary source for BrightEdge's 12 February 2026 ~17% top-10 overlap figure and the 27 January 2026 switch of Google AI Overviews to Gemini 3.
  5. ALM Corp (2026). Google AI Overviews Surge 58% Across 9 Industries. almcorp.com. BrightEdge panel data: 48% AIO trigger rate in February 2026 (up from 31% in February 2025).
  6. ALM Corp (2025). Does llms.txt Actually Matter for AI Search? Expert Analysis. almcorp.com/blog/does-llms-txt-matter-data-analysis. Methodology: 10 websites tracked 90 days pre/post implementation; separate 300,000-domain correlation analysis.
  7. Jaźwińska, K., & Chandrasekar, A. (6 March 2025). AI Search Has A Citation Problem. Tow Center for Digital Journalism, Columbia Journalism Review. cjr.org. Methodology: 1,600 queries across 8 generative search tools and 20 publishers.
  8. Academic SEO (14 April 2026). We Measured 50 Top Biomedical Journals. Only 5 Are Open to AI Crawlers. academicseo.co.uk/blog/journal-platforms-blocking-ai-crawlers. Internal reproducible measurement of publisher access to compliant AI bots.
  9. Al-Jundi, M., et al. (2025). The effect of using ORCID iD on improving the visibility and retrieval of Arab University publications. Scientometrics, Springer. link.springer.com.

Audit your paper against the evidence above

If you want a per-paper GEO review — a concrete write-up of which Aggarwal et al. tactics your abstract currently uses, which it misses, and what changes would be compliant with your field's conventions — the 115-point paper visibility audit covers this directly.

Submit a paper for audit