Between 27 March and 8 April 2026, Google rolled out its first core update of the year. The public discussion has focused on the familiar story — another wave of template-spam sites losing 60 to 90 percent of their traffic, another round of SEO consultants shouting about schema markup. That is not why PIs should be paying attention.
The reason this update matters for research is quieter and more uncomfortable. In the months leading up to it, Search Engine Land and industry analysts documented a pattern that had already begun to reshape biomedical search: AI Overviews, launched in 2024, now sit on top of roughly half of all healthcare-related queries. And when those AI Overviews reach for a source to cite, they almost never reach for a research paper.
An analysis of more than 130,000 health queries found that Google AI Overviews appear in 51 percent of healthcare searches — roughly double the 16 percent all-industry average — yet only 0.48 percent of their cited sources are academic journals or research papers. The gap is not a prestige problem. It is a technical one, and it is fixable at the preprint level.
This post walks through what the March 2026 update actually changed, what the AI Overview ecosystem looks like for biomedical content today, and why a bioRxiv or medRxiv landing page in its default form is now structurally disadvantaged compared with a well-annotated secondary page. I finish with a practical checklist — the things I would do first if the preprint were mine.
What the March 2026 core update actually changed
Core updates are re-tunings of Google's ranking models. They do not target individual sites by name. What they do is shift the weighting of signals, so that the same page can rise or fall purely because the model now asks a slightly different question of it. The March 2026 update is the first core update to explicitly name a policy violation in its public documentation: scaled content abuse.
The definition is narrower than it sounds. Google's Search Central documentation describes three patterns that now earn a penalty: mass AI-generated pages without editorial review, pure template-with-variable-substitution at scale, and aggregator sites that copy public data without adding context. The update was a crackdown on content farms — the kind of sites that produce ten thousand near-identical "best cheap [thing] in [city]" pages and hope that sheer volume will catch long-tail queries.
The fallout was severe. Multiple analyses agreed on a rough shape: programmatic sites built around repetitive templates lost between 60 and 90 percent of their organic traffic over the 12-day rollout. Sites built around unique structured data — local directories with verified listings, comparison tools with live pricing, literature databases with real metadata — were largely untouched. A few actually gained, because the spam they had been competing with is gone.
On a plain reading, none of this has anything to do with research papers. The update is not after your preprint. But the indirect consequences are where it gets interesting, because Google re-tuned the model that decides which version of a piece of information to surface when multiple versions exist. And preprint landing pages, as they are currently rendered, lose that contest more often than they should.
How do AI Overviews fit into biomedical search?
The elephant in the room is that, for biomedical queries in particular, the search engine results page itself is no longer the whole answer. A year ago, a researcher typing "CRISPR base editing off-target effects" would see ten blue links. Today, the first thing they see is a paragraph of synthesised prose with three to eight inline citations, followed by the links.
For health-related queries this shift is especially aggressive. WebFX analysed more than 130,000 health queries in early 2026 and found AI Overviews on 51 percent of them — roughly double the baseline rate of 16 percent across all industries. In Germany, where Google appears to be testing a more permissive configuration, the rate reaches 82 percent. For any PI who assumes their work reaches readers through ranked search results, the base-rate change alone is worth pausing on.
• All-industry average: 16 percent of queries trigger an AI Overview.
• Healthcare queries (US): 51 percent trigger an AI Overview.
• Health queries in Germany: 82 percent.
• Citations to academic research or medical journals: 0.48 percent of all AI Overview source URLs.
• Citations to reliable medical sources more broadly: 34 percent — meaning two-thirds come from non-authoritative sources.
Sources: WebFX health AI Overview analysis, 2026; Guardian investigation and Google response reporting, January 2026.
The citation split is the part that should keep PIs up at night. When AI Overviews cite sources for a health question, the sources are more often YouTube videos, consumer health blogs, and wellness aggregators than the primary literature. A Guardian investigation in January 2026 documented the consequences in stark terms: an AI Overview recommended that pancreatic cancer patients avoid high-fat foods, advice that multiple oncologists publicly described as dangerous. Google responded by pulling AI Overviews from a narrower set of medical queries, but the default for the vast majority of biomedical searches remains an AI-generated answer stitched together from whatever the retrieval system found easiest to parse.
The takeaway is not that Google dislikes primary research. There is no evidence of that. The takeaway is that the retrieval layer underneath AI Overviews has a strong preference for sources that are easy to ingest, and a bare preprint landing page is harder to ingest than a YouTube transcript.
Why are preprints rarely cited in AI Overviews?
This is the question I care about most, and the answer is unsatisfying in a useful way: it is almost entirely technical. There is nothing wrong with the science on bioRxiv or medRxiv. There is quite a lot wrong with how that science is rendered to a non-human reader.
Walk through what a modern retrieval system sees when it lands on a preprint page. Most preprint servers expose a minimalist HTML page built around the abstract, a list of authors, a DOI, and a download link. That is enough for a human reader. It is not enough for a system that ranks candidate sources by how quickly and confidently it can extract discrete, attributable claims. Compared with a well-optimised secondary source, the preprint landing page is missing:
- Schema.org structured data. A well-constructed page exposes a `ScholarlyArticle` JSON-LD block with `author.url`, `datePublished`, `dateModified`, `publisher`, and a `description`. Most preprint landing pages expose a Highwire Press meta tag set and very little else. The result is that the retrieval system knows the title and the DOI but not what the paper claims.
- A Knowledge Graph entity anchor. In 2026, Google's retrieval layer leans heavily on entity recognition — matching a name on a page to a persistent identity in the Knowledge Graph. An ORCID on the page is a start. An ORCID linked to an `Organization` schema for the lab or institution is considerably better. Preprint landing pages rarely get this far.
- A FAQPage or Question/Answer block. Research by several 2025 teams converged on a median 22 percent lift in AI citation rates when a page added a genuine `FAQPage` schema. The lift is largest in Perplexity and Bing Copilot, smaller in ChatGPT, and measurable in Google AI Overviews. Preprint pages have none of this.
- Clear answer patterns in the first 200 words. Google's retrieval model — partially disclosed in patent WO2024064249A1 — references "information density" and "specificity signals" as factors in passage selection. A narrative-style abstract that buries its main finding in sentence six is at a structural disadvantage against a structured abstract that states the quantified result by sentence two.
- A fresh `dateModified`. Retrieval systems weight recent content for time-sensitive queries. Preprint landing pages show the deposition date and then go quiet, even when the paper has been revised or the community response has moved on. There is no technical reason a landing page cannot also carry a "last indexed" or "last summarised" date, but none of the major preprint servers expose one.
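To make the list above concrete, here is a minimal sketch of a `ScholarlyArticle` JSON-LD block of the kind a lab-site page could embed in a `<script type="application/ld+json">` tag. Every value here is a placeholder — the title, DOI, ORCID, and dates are invented for illustration, not taken from a real paper.

```json
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "headline": "Knockdown of gene X reduces tumour volume in the B16 model",
  "description": "Plain-language summary stating the quantified main finding.",
  "datePublished": "2026-01-15",
  "dateModified": "2026-03-28",
  "sameAs": "https://doi.org/10.0000/example.doi",
  "author": {
    "@type": "Person",
    "name": "Jane Doe",
    "url": "https://orcid.org/0000-0000-0000-0000"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Example University Lab"
  }
}
```

The point of the block is not any single field; it is that a retrieval system parsing it gets the claim (`description`), the people (`author.url` pointing at ORCID), and the currency (`dateModified`) without having to interpret the page's visible HTML.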
None of these gaps is the fault of the author of the paper. bioRxiv and medRxiv have been astonishingly successful on their original terms — they run a non-profit infrastructure that has indexed, at current rates, more than six thousand new manuscripts a month with eight million monthly page views — and their design choices predate the era when a search engine results page started summarising papers before anyone clicked on them. The point is not that the preprint servers are broken. The point is that the landscape around them has changed in ways that their default presentation does not yet reflect.
What signals actually matter to crawlers now?
The unifying theme across every piece of 2026 guidance from Google, from independent SEO research, and from the GEO (generative engine optimisation) community is the same: retrieval systems now reward pages that make the factual claim density obvious, and punish pages that hide it. A few specific signals deserve highlighting because they map directly to changes a PI can influence.
First-200-words answer patterns
AI retrieval systems read the opening of a document with disproportionate weight. A structured abstract that states, in sentence two, "Knockdown of gene X reduced tumour volume by 43 percent in the B16 model at day 21 (n = 12, p < 0.01)" is more extractable than an abstract that builds to the same finding after four sentences of context. This is not a style preference. It is a signal that maps onto how passage-selection models rank candidates.
Specificity and numbers
Retrieval systems have been explicitly trained to prefer sources that contain verifiable, quantified claims over sources that contain qualitative assertions. "Significantly improved survival" ranks lower than "a 2.3-fold reduction in median tumour burden (95% CI 1.8 to 3.0)". The second phrase anchors to an extractable number that can be cited, verified, and summarised. The first phrase does not.
Entity graph completeness
In 2026, the single highest-leverage technical addition to any page — scholarly or otherwise — is entity markup that ties content to persistent identifiers. For research, this means ORCID on every author, ROR identifiers for institutions, persistent DOIs for datasets, and a linked sameAs path from the paper to its authors' public profiles. When a retrieval system can say "this claim is made by Jane Doe at the Oxford Centre for Y, who has thirty other indexed papers in this area," its confidence in citing that page rises substantially.
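As a sketch of that entity chain — all identifiers below are placeholders, including the ORCID and ROR IDs — the author object in a page's JSON-LD can carry the `sameAs` links that let a retrieval system resolve the person and institution to Knowledge Graph entities:

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Doe",
  "sameAs": [
    "https://orcid.org/0000-0000-0000-0000",
    "https://scholar.google.com/citations?user=EXAMPLEID"
  ],
  "affiliation": {
    "@type": "Organization",
    "name": "Oxford Centre for Y",
    "sameAs": "https://ror.org/00example0"
  }
}
```

The `sameAs` array is doing the work: it turns a string name, which is ambiguous, into a set of persistent identifiers, which are not.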
Freshness signals
Most researchers never revise the metadata on a preprint after deposition. Yet freshness is a ranking factor for time-sensitive queries in both traditional search and AI Overviews. A dateModified that moves when the paper is versioned, when a corrigendum is published, or when supplementary data is added is a cheap, accurate signal that almost no one sends.
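A hypothetical fragment — dates invented — showing the distinction this paragraph describes. `datePublished` stays fixed at the original deposition; `dateModified` moves with each version, corrigendum, or supplementary-data addition:

```json
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "datePublished": "2025-11-03",
  "dateModified": "2026-03-28"
}
```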
How big is the gap between a default preprint page and an optimised one?
Honest answer: we do not have a peer-reviewed longitudinal study on this yet. What we do have are a number of converging experiments.
The Geneo 2025 schema markup analysis measured a median 22 percent lift in AI citation rates when a content page added FAQPage schema aligned to its genuine content. A 2026 Ahrefs study on organic click reduction found that pages cited in AI Overviews received roughly 3x the downstream referral traffic of pages that merely ranked for the same query without being cited. Crucially, the pages that earned citations were not always the most authoritative sources; they were the ones with the clearest structured data exposure.
The implication for PIs is that the citation-stack problem is addressable from the outside. You do not need to wait for bioRxiv to rebuild its landing pages. You can create a secondary page for your own paper — on your lab site, a personal page, or a third-party optimisation service — that exposes the structured data the retrieval systems are looking for, and you can link it bidirectionally to your preprint. That page will begin to absorb citation traffic that would otherwise go to a YouTube video.
What should a PI do this week?
Here is the checklist I would work through if I had an hour on a Tuesday afternoon and one preprint I wanted to protect.
The one-hour preprint hygiene checklist
- Rewrite the first two sentences of the abstract so they state the quantified finding. Move context to the back half.
- Add at least two specific numbers to the abstract. Not p-values — effect sizes, sample counts, or confidence intervals.
- Add ORCIDs for every author to the preprint page. If your co-authors do not have them, five minutes each on orcid.org solves it.
- Link your lab or institutional homepage to the preprint with descriptive anchor text (not "here" or "link").
- On your lab site, create a dedicated page for the paper that includes the abstract, a plain-language summary, the DOI, the author list with ORCIDs, and a `ScholarlyArticle` JSON-LD block.
- Add a genuine FAQ section on that page — three questions that reflect what a PI in your field would actually ask about the paper.
- When you update the preprint (revisions, corrigenda, new supplementary data), also update the `dateModified` on the lab-site page. This is the signal retrieval systems use to decide what is current.
- If you are preparing a new preprint, run the title through a plain-English test: would an educated reader outside your subfield guess, from the title alone, what question the paper answers?
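For the FAQ step on the checklist, a minimal `FAQPage` JSON-LD block looks like the sketch below. The questions and answers here are placeholders — write the three questions a PI in your field would actually ask, and keep the answers consistent with the visible FAQ text on the page.

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What model was the effect measured in?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "The B16 mouse melanoma model, n = 12 per arm, measured at day 21."
      }
    },
    {
      "@type": "Question",
      "name": "Is the underlying dataset available?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Yes, under the dataset DOI listed on this page."
      }
    }
  ]
}
```

Note that each `acceptedAnswer.text` is itself written as an extractable, quantified claim — the same first-200-words discipline applied at FAQ scale.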
Half of that list is about the preprint itself and half is about a secondary page under your own control. The secondary page is the part most researchers skip, and it is also the part with the highest return. It is the difference between relying on a generic landing page that was not designed to be cited by a retrieval system, and controlling the source of truth that the retrieval system actually sees.
Is this the end of the preprint as a discovery primitive?
No. Preprints are more important now than they were a year ago, not less. The earlier literature on the preprint citation advantage found bioRxiv-deposited papers attracting 49 percent more attention and 36 percent more citations than matched non-deposited controls. More recent analyses — especially those tracking COVID-era preprints — reported even larger effects, with some journals seeing a fivefold citation lift for preprint-distributed papers compared with directly submitted ones. The underlying value proposition of posting a preprint early and publicly has not moved.
What has changed is how much of that value a researcher captures by default. Depositing a paper on bioRxiv in 2018 put it in front of every serious life-sciences reader, because serious life-sciences readers read bioRxiv directly. Depositing a paper on bioRxiv in 2026 still gets it indexed, but a growing fraction of the readers who would have found it via Google will now read an AI-generated summary that cites somewhere else. The preprint is still the source; it is just no longer the surface.
The practical response is to treat the preprint as the primary record and the discoverability layer as a separate, maintainable thing. Every large journal publisher has been quietly doing this for years — they wrap a press release, a plain-language summary, and a structured landing page around each paper specifically to control what retrieval systems cite. What has historically been a big-publisher capability is now implementable at the lab level with a few hours of work.
Where Academic SEO fits in — and where it does not
I want to be direct about this because there is a lot of bad advice in circulation. Academic SEO, the service I run, exists specifically to close the gap between "this is a good paper on bioRxiv" and "this is a good paper on bioRxiv that retrieval systems can cite confidently." We generate structured landing pages, run 115-point audits against the criteria described in this post, and maintain the GEO-readiness of the resulting pages as Google re-tunes the model. That is the paid service.
Most of what is in this post, though, a PI can do alone in an afternoon with no tools beyond a text editor and five minutes per co-author on orcid.org. If you do nothing else, do the first-200-words rewrite and the ORCID linking. Those two steps address the two largest 2026 signals and they cost nothing. The audit, the Review schema, the FAQPage block, the per-paper preview images — all of that is upside on top of a foundation you can lay yourself.
The uncomfortable framing that the March 2026 update forces on biomedical research is that presentation now affects discoverability for reasons that have nothing to do with the quality of the underlying science. That is not a moral judgement on Google or on retrieval systems. It is a fact about how the reading layer now works. Treating it as such — as an engineering problem with a known set of signals — is the fastest way to stop losing citations you have already earned.
Frequently asked questions
Did the March 2026 update penalise research papers?
No. The update targeted scaled content abuse — specifically, template-generated pages that add no unique value. Research papers, by definition, are original content. The indirect effect on preprints is that the bar for being seen has moved: a bare preprint landing page now sits below secondary pages with richer schema and clearer structured data, even when the preprint is the primary source. The science is untouched; the presentation layer is what has been re-weighted.
Are AI Overviews removing citations to academic sources entirely?
Not removing, but systematically under-weighting. Across 130,000 health queries analysed in early 2026, academic journals and research papers accounted for 0.48 percent of all AI Overview source citations, while YouTube was cited more often than any hospital website. The gap is technical rather than prestige-based: retrieval systems rank candidate sources by how cleanly they can extract and attribute claims, and most preprint landing pages expose very little extractable structure.
Is it worth optimising a paper I have already published?
Yes, if you care about its continued visibility. The highest-return interventions on an existing paper are (1) updating the preprint with ORCID links and a tightened abstract, and (2) creating a structured landing page for the paper on your lab site. Both take under an hour per paper. Neither requires a journal to reopen the manuscript. For papers more than two years old, the gain per hour is lower than for a current preprint, but the intervention is still net positive because the pool of competing optimised pages is still relatively small.
Will bioRxiv and medRxiv fix this at the platform level?
Probably yes over the medium term. The openRxiv non-profit that took over bioRxiv and medRxiv in March 2025 has signalled that platform-level metadata improvements are on the roadmap. In the meantime, there is no reason individual labs cannot layer their own optimised page on top. The preprint stays canonical. The lab-controlled page becomes the citation surface.
Does any of this matter for fields other than biomedical research?
The underlying mechanisms do, but the intensity varies. AI Overviews fire more aggressively on health queries than on any other category. Physics, computer science, and mathematics queries still return mostly traditional ten-blue-link pages. If you are a biomedical PI, the changes in this post are immediately relevant. If you work in a field where AI Overviews are still rare, the same changes are still useful insurance for the next two years, because the biomedical pattern is the leading indicator for the rest of scholarly search.
Want a 115-point audit of your preprint?
We run the checks in this post — plus a hundred more — against your title, abstract, metadata, and citation readiness, and we produce a short report you can act on before your next revision.
Submit your paper →