Between 27 March and 8 April 2026, Google rolled out its first core update of the year. The public discussion has focused on the familiar story — another wave of template-spam sites losing 60 to 90 percent of their traffic, another round of SEO consultants shouting about schema markup. That is not why PIs should be paying attention.
The reason this update matters for research is quieter and more uncomfortable. In the months leading up to it, Search Engine Land and industry analysts documented a pattern that had already begun to reshape biomedical search: AI Overviews, launched in 2024, now sit on top of roughly half of all healthcare-related queries. And when those AI Overviews reach for a source to cite, they almost never reach for a research paper.
An analysis of more than 130,000 health queries found that Google AI Overviews appear in 51 percent of healthcare searches — roughly double the 16 percent all-industry average — yet only 0.48 percent of their cited sources are academic journals or research papers. The gap is not a prestige problem. It is a technical one, and it is fixable at the preprint level.
This post walks through what the March 2026 update actually changed, what the AI Overview ecosystem looks like for biomedical content today, and why a bioRxiv or medRxiv landing page in its default form is now structurally disadvantaged compared with a well-annotated secondary page. I finish with a practical checklist — the things I would do first if the preprint were mine.
What the March 2026 core update actually changed
Core updates are re-tunings of Google's ranking models. They do not target individual sites by name. What they do is shift the weighting of signals, so that the same page can rise or fall purely because the model now asks a slightly different question of it. The March 2026 update is the first core update to explicitly name a policy violation in its public documentation: scaled content abuse.
The definition is narrower than it sounds. Google's Search Central documentation describes three patterns that now earn a penalty: mass AI-generated pages without editorial review, pure template-with-variable-substitution at scale, and aggregator sites that copy public data without adding context. The update was a crackdown on content farms — the kind of sites that produce ten thousand near-identical "best cheap [thing] in [city]" pages and hope that sheer volume will catch long-tail queries.
The fallout was severe. Multiple analyses agreed on a rough shape: programmatic sites built around repetitive templates lost between 60 and 90 percent of their organic traffic over the 12-day rollout. Sites built around unique structured data — local directories with verified listings, comparison tools with live pricing, literature databases with real metadata — were largely untouched. A few actually gained, because the spam they had been competing with is gone.
On a plain reading, none of this has anything to do with research papers. The update is not after your preprint. But the indirect consequences are where it gets interesting, because Google re-tuned the model that decides which version of a piece of information to surface when multiple versions exist. And preprint landing pages, as they are currently rendered, lose that contest more often than they should.
How do AI Overviews fit into biomedical search?
The elephant in the room is that, for biomedical queries in particular, the search engine results page itself is no longer the whole answer. A year ago, a researcher typing "CRISPR base editing off-target effects" would see ten blue links. Today, the first thing they see is a paragraph of synthesised prose with three to eight inline citations, followed by the links.
For health-related queries this shift is especially aggressive. WebFX analysed more than 130,000 health queries in early 2026 and found AI Overviews on 51 percent of them — roughly double the baseline rate of 16 percent across all industries. In Germany, where Google appears to be testing a more permissive configuration, the rate reaches 82 percent. For any PI who assumes their work reaches readers through ranked search results, the base-rate change alone is worth pausing on.
• All-industry average: 16 percent of queries trigger an AI Overview.
• Healthcare queries (US): 51 percent trigger an AI Overview.
• Health queries in Germany: 82 percent.
• Citations to academic research or medical journals: 0.48 percent of all AI Overview source URLs.
• Citations to reliable medical sources more broadly: 34 percent — meaning two-thirds come from non-authoritative sources.
Sources: WebFX health AI Overview analysis, 2026; Guardian investigation and Google response reporting, January 2026.
The citation split is the part that should keep PIs up at night. When AI Overviews cite sources for a health question, the sources are more often YouTube videos, consumer health blogs, and wellness aggregators than the primary literature. A Guardian investigation in January 2026 documented the consequences in stark terms: an AI Overview recommended that pancreatic cancer patients avoid high-fat foods, advice that multiple oncologists publicly described as dangerous. Google responded by pulling AI Overviews from a narrower set of medical queries, but the default for the vast majority of biomedical searches remains an AI-generated answer stitched together from whatever the retrieval system found easiest to parse.
The takeaway is not that Google dislikes primary research. There is no evidence of that. The takeaway is that the retrieval layer underneath AI Overviews has a strong preference for sources that are easy to ingest, and a bare preprint landing page is harder to ingest than a YouTube transcript.
Why are preprints rarely cited in AI Overviews?
This is the question I care about most, and the answer is unsatisfying in a useful way: it is almost entirely technical. There is nothing wrong with the science on bioRxiv or medRxiv. There is quite a lot wrong with how that science is rendered to a non-human reader.
Walk through what a modern retrieval system sees when it lands on a preprint page. Most preprint servers expose a minimalist HTML page built around the abstract, a list of authors, a DOI, and a download link. That is enough for a human reader. It is not enough for a system that ranks candidate sources by how quickly and confidently it can extract discrete, attributable claims. Compared with a well-optimised secondary source, the preprint landing page is missing:
- Schema.org structured data. A well-constructed page exposes a `ScholarlyArticle` JSON-LD block with `author.url`, `datePublished`, `dateModified`, `publisher`, and a `description`. Most preprint landing pages expose a Highwire Press meta tag set and very little else. The result is that the retrieval system knows the title and the DOI but not what the paper claims.
- A Knowledge Graph entity anchor. In 2026, Google's retrieval layer leans heavily on entity recognition — matching a name on a page to a persistent identity in the Knowledge Graph. An ORCID on the page is a start. An ORCID linked to an `Organization` schema for the lab or institution is considerably better. Preprint landing pages rarely get this far.
- A FAQPage or Question/Answer block. Research by several 2025 teams converged on a median 22 percent lift in AI citation rates when a page added a genuine `FAQPage` schema. The lift is largest in Perplexity and Bing Copilot, smaller in ChatGPT, and measurable in Google AI Overviews. Preprint pages have none of this.
- Clear answer patterns in the first 200 words. Google's retrieval model — partially disclosed in patent WO2024064249A1 — references "information density" and "specificity signals" as factors in passage selection. A narrative-style abstract that buries its main finding in sentence six is at a structural disadvantage against a structured abstract that states the quantified result by sentence two.
- A fresh `dateModified`. Retrieval systems weight recent content for time-sensitive queries. Preprint landing pages show the deposition date and then go quiet, even when the paper has been revised or the community response has moved on. There is no technical reason a landing page cannot also carry a "last indexed" or "last summarised" date, but none of the major preprint servers expose one.
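To make the list above concrete, here is a minimal sketch of a `ScholarlyArticle` JSON-LD block of the kind a lab-site page could embed in a `<script type="application/ld+json">` tag. Every value here is a placeholder — the title, DOI, ORCID, and dates are invented for illustration, not taken from a real paper.

```json
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "headline": "Knockdown of gene X reduces tumour volume in the B16 model",
  "description": "Plain-language summary stating the quantified main finding.",
  "datePublished": "2026-01-15",
  "dateModified": "2026-03-28",
  "sameAs": "https://doi.org/10.0000/example.doi",
  "author": {
    "@type": "Person",
    "name": "Jane Doe",
    "url": "https://orcid.org/0000-0000-0000-0000"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Example University Lab"
  }
}
```

The point of the block is not any single field; it is that a retrieval system parsing it gets the claim (`description`), the people (`author.url` pointing at ORCID), and the currency (`dateModified`) without having to interpret the page's visible HTML.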
None of these gaps is the fault of the author of the paper. bioRxiv and medRxiv have been astonishingly successful on their original terms — they run a non-profit infrastructure that has indexed, at current rates, more than six thousand new manuscripts a month with eight million monthly page views — and their design choices predate the era when a search engine results page started summarising papers before anyone clicked on them. The point is not that the preprint servers are broken. The point is that the landscape around them has changed in ways that their default presentation does not yet reflect.
What signals actually matter to crawlers now?
The unifying theme across every piece of 2026 guidance from Google, from independent SEO research, and from the GEO (generative engine optimisation) community is the same: retrieval systems now reward pages that make the factual claim density obvious, and punish pages that hide it. A few specific signals deserve highlighting because they map directly to changes a PI can influence.
First-200-words answer patterns
AI retrieval systems read the opening of a document with disproportionate weight. A structured abstract that states, in sentence two, "Knockdown of gene X reduced tumour volume by 43 percent in the B16 model at day 21 (n = 12, p < 0.01)" is more extractable than an abstract that builds to the same finding after four sentences of context. This is not a style preference. It is a signal that maps onto how passage-selection models rank candidates.
Specificity and numbers
Retrieval systems have been explicitly trained to prefer sources that contain verifiable, quantified claims over sources that contain qualitative assertions. "Significantly improved survival" ranks lower than "a 2.3-fold reduction in median tumour burden (95% CI 1.8 to 3.0)". The second phrase anchors to an extractable number that can be cited, verified, and summarised. The first phrase does not.
Entity graph completeness
In 2026, the single highest-leverage technical addition to any page — scholarly or otherwise — is entity markup that ties content to persistent identifiers. For research, this means ORCID on every author, ROR identifiers for institutions, persistent DOIs for datasets, and a linked sameAs path from the paper to its authors' public profiles. When a retrieval system can say "this claim is made by Jane Doe at the Oxford Centre for Y, who has thirty other indexed papers in this area," its confidence in citing that page rises substantially.
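As a sketch of that entity chain — all identifiers below are placeholders, including the ORCID and ROR IDs — the author object in a page's JSON-LD can carry the `sameAs` links that let a retrieval system resolve the person and institution to Knowledge Graph entities:

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Doe",
  "sameAs": [
    "https://orcid.org/0000-0000-0000-0000",
    "https://scholar.google.com/citations?user=EXAMPLEID"
  ],
  "affiliation": {
    "@type": "Organization",
    "name": "Oxford Centre for Y",
    "sameAs": "https://ror.org/00example0"
  }
}
```

The `sameAs` array is doing the work: it turns a string name, which is ambiguous, into a set of persistent identifiers, which are not.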
Freshness signals
Most researchers never revise the metadata on a preprint after deposition. Yet freshness is a ranking factor for time-sensitive queries in both traditional search and AI Overviews. A dateModified that moves when the paper is versioned, when a corrigendum is published, or when supplementary data is added is a cheap, accurate signal that almost no one sends.
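A hypothetical fragment — dates invented — showing the distinction this paragraph describes. `datePublished` stays fixed at the original deposition; `dateModified` moves with each version, corrigendum, or supplementary-data addition:

```json
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "datePublished": "2025-11-03",
  "dateModified": "2026-03-28"
}
```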
How big is the gap between a default preprint page and an optimised one?
Honest answer: we do not have a peer-reviewed longitudinal study on this yet. What we do have are a number of converging experiments.
The Geneo 2025 schema markup analysis measured a median 22 percent lift in AI citation rates when a content page added FAQPage schema aligned to its genuine content. A 2026 Ahrefs study on organic click reduction found that pages cited in AI Overviews received roughly 3x the downstream referral traffic of pages that merely ranked for the same query without being cited. Crucially, the pages that earned citations were not always the most authoritative sources; they were the ones with the clearest structured data exposure.
The implication for PIs is that the citation-stack problem is addressable from the outside. You do not need to wait for bioRxiv to rebuild its landing pages. You can create a secondary page for your own paper — on your lab site, a personal page, or a third-party optimisation service — that exposes the structured data the retrieval systems are looking for, and you can link it bidirectionally to your preprint. That page will begin to absorb citation traffic that would otherwise go to a YouTube video.
What should a PI do this week?
Here is the checklist I would work through if I had an hour on a Tuesday afternoon and one preprint I wanted to protect.
The one-hour preprint hygiene checklist
- Rewrite the first two sentences of the abstract so they state the quantified finding. Move context to the back half.
- Add at least two specific numbers to the abstract. Not p-values — effect sizes, sample counts, or confidence intervals.
- Add ORCIDs for every author to the preprint page. If your co-authors do not have them, five minutes each on orcid.org solves it.
- Link your lab or institutional homepage to the preprint with descriptive anchor text (not "here" or "link").
- On your lab site, create a dedicated page for the paper that includes the abstract, a plain-language summary, the DOI, the author list with ORCIDs, and a `ScholarlyArticle` JSON-LD block.
- Add a genuine FAQ section on that page — three questions that reflect what a PI in your field would actually ask about the paper.
- When you update the preprint (revisions, corrigenda, new supplementary data), also update the `dateModified` on the lab-site page. This is the signal retrieval systems use to decide what is current.
- If you are preparing a new preprint, run the title through a plain-English test: would an educated reader outside your subfield guess, from the title alone, what question the paper answers?
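For the FAQ step on the checklist, a minimal `FAQPage` JSON-LD block looks like the sketch below. The questions and answers here are placeholders — write the three questions a PI in your field would actually ask, and keep the answers consistent with the visible FAQ text on the page.

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What model was the effect measured in?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The B16 mouse melanoma model, n = 12 per arm, measured at day 21."
      }
    },
    {
      "@type": "Question",
      "name": "Is the underlying dataset available?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes, under the dataset DOI listed on this page."
      }
    }
  ]
}
```

Note that each `acceptedAnswer.text` is itself written as an extractable, quantified claim — the same first-200-words discipline applied at FAQ scale.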
Half of that list is about the preprint itself and half is about a secondary page under your own control. The secondary page is the part most researchers skip, and it is also the part with the highest return. It is the difference between relying on a generic landing page that was not designed to be cited by a retrieval system, and controlling the source of truth that the retrieval system actually sees.
Is this the end of the preprint as a discovery primitive?
No. Preprints are more important now than they were a year ago, not less. The earlier literature on the preprint citation advantage found bioRxiv-deposited papers attracting 49 percent more attention and 36 percent more citations than matched non-deposited controls. More recent analyses — especially those tracking COVID-era preprints — reported even larger effects, with some journals seeing a fivefold citation lift for preprint-distributed papers compared with directly submitted ones. The underlying value proposition of posting a preprint early and publicly has not moved.
What has changed is how much of that value a researcher captures by default. Depositing a paper on bioRxiv in 2018 put it in front of every serious life-sciences reader, because serious life-sciences readers read bioRxiv directly. Depositing a paper on bioRxiv in 2026 still gets it indexed, but a growing fraction of the readers who would have found it via Google will now read an AI-generated summary that cites somewhere else. The preprint is still the source; it is just no longer the surface.
The practical response is to treat the preprint as the primary record and the discoverability layer as a separate, maintainable thing. Every large journal publisher has been quietly doing this for years — they wrap a press release, a plain-language summary, and a structured landing page around each paper specifically to control what retrieval systems cite. What has historically been a big-publisher capability is now implementable at the lab level with a few hours of work.
Where Academic SEO fits in — and where it does not
I want to be direct about this because there is a lot of bad advice in circulation. Academic SEO, the service I run, exists specifically to close the gap between "this is a good paper on bioRxiv" and "this is a good paper on bioRxiv that retrieval systems can cite confidently." We generate structured landing pages, run 115-point audits against the criteria described in this post, and maintain the GEO-readiness of the resulting pages as Google re-tunes the model. That is the paid service.
Most of what is in this post, though, a PI can do alone in an afternoon with no tools beyond a text editor and five minutes per co-author on orcid.org. If you do nothing else, do the first-200-words rewrite and the ORCID linking. Those two steps address the two largest 2026 signals and they cost nothing. The audit, the Review schema, the FAQPage block, the per-paper preview images — all of that is upside on top of a foundation you can lay yourself.
The uncomfortable framing that the March 2026 update forces on biomedical research is that presentation now affects discoverability for reasons that have nothing to do with the quality of the underlying science. That is not a moral judgement on Google or on retrieval systems. It is a fact about how the reading layer now works. Treating it as such — as an engineering problem with a known set of signals — is the fastest way to stop losing citations you have already earned.
Frequently asked questions
Did the March 2026 update penalise research papers?
No. The update targeted scaled content abuse — specifically, template-generated pages that add no unique value. Research papers, by definition, are original content. The indirect effect on preprints is that the bar for being seen has moved: a bare preprint landing page now sits below secondary pages with richer schema and clearer structured data, even when the preprint is the primary source. The science is untouched; the presentation layer is what has been re-weighted.
Are AI Overviews removing citations to academic sources entirely?
Not removing, but systematically under-weighting. Across 130,000 health queries analysed in early 2026, academic journals and research papers accounted for 0.48 percent of all AI Overview source citations, while YouTube was cited more often than any hospital website. The gap is technical rather than prestige-based: retrieval systems rank candidate sources by how cleanly they can extract and attribute claims, and most preprint landing pages expose very little extractable structure.
Is it worth optimising a paper I have already published?
Yes, if you care about its continued visibility. The highest-return interventions on an existing paper are (1) updating the preprint with ORCID links and a tightened abstract, and (2) creating a structured landing page for the paper on your lab site. Both take under an hour per paper. Neither requires a journal to reopen the manuscript. For papers more than two years old, the gain per hour is lower than for a current preprint, but the intervention is still net positive because the pool of competing optimised pages is still relatively small.
Will bioRxiv and medRxiv fix this at the platform level?
Probably yes over the medium term. The openRxiv non-profit that took over bioRxiv and medRxiv in March 2025 has signalled that platform-level metadata improvements are on the roadmap. In the meantime, there is no reason individual labs cannot layer their own optimised page on top. The preprint stays canonical. The lab-controlled page becomes the citation surface.
Does any of this matter for fields other than biomedical research?
The underlying mechanisms do, but the intensity varies. AI Overviews fire more aggressively on health queries than on any other category. Physics, computer science, and mathematics queries still return mostly traditional ten-blue-link pages. If you are a biomedical PI, the changes in this post are immediately relevant. If you work in a field where AI Overviews are still rare, the same changes are still useful insurance for the next two years, because the biomedical pattern is the leading indicator for the rest of scholarly search.
Want a 115-point audit of your preprint?
We run the checks in this post — plus a hundred more — against your title, abstract, metadata, and citation readiness, and we produce a short report you can act on before your next revision.
Submit your paper →