Research discoverability is the set of signals that make Google Search, AI retrieval tools (ChatGPT, Perplexity, Claude), Google Scholar, and PubMed surface your paper to the researchers who need it. Academic SEO (ASEO) is how you control those signals. This page is the hub: it maps the four surfaces where your paper must appear, the seven pillars you can actually control, and the deeper posts that document each step with data.
The problem it solves is discoverability failure. Published is not the same as findable. A DOI and a PDF on a publisher page are not enough. Most biomedical papers are functionally invisible: they sit behind a login or a subdomain no crawler reaches, with a title that answers no real search query and an abstract whose first sentence says "This study examines" instead of naming the finding. That is a solvable problem, not a prestige problem.
The 2025 shift in how search actually works makes this more urgent. Google's AI Overviews cut organic click-through on AI-Overview queries by roughly 61% (Ahrefs, 2026). AI-tool citations overlap with Google's top 10 only 17 to 38% of the time, down from around 75% a year earlier (ALM Corp, 2026). About 12% of ChatGPT, Claude, and Perplexity citations come from Google top-10 pages (Discovered Labs, 2025). A paper that ranks well in Google or Scholar is no longer guaranteed to surface in the places researchers actually look. You now need to optimise for four surfaces in parallel, not one.
Academic SEO is not about gaming algorithms. It is about removing friction between your research and the people who need to find it. Optimise the seven things you control (title, abstract, keywords, structure, metadata, preprint, AI-readiness) so your paper appears in the four surfaces researchers now use (Google Search, AI tools, Google Scholar, PubMed). The citations follow.
How this guide is structured
The diagram below is the whole argument in one figure. Four surfaces at the top, seven pillars at the bottom, arrows connecting what you control to where your paper appears. The rest of this guide unpacks each arrow, and every deeper post on academicseo.co.uk sits somewhere on this diagram.
Deeper reads, by surface
If you want to skip to the posts that go one level deeper on a specific surface or pillar, the map below is the fastest way in. Each card lists the two or three posts most directly relevant to getting your paper to appear on that surface.
- Google Search + AI Overviews. AI Overviews now appear on a large share of health-related queries and cite a mix of review articles, guidelines, and primary research. Winning a citation here depends on indexable HTML, a crawlable full-text page, and structure that the Overview model can quote.
- AI retrieval tools (ChatGPT, Perplexity, Claude). These tools answer direct questions by quoting sources. Your paper gets cited if its URL is reachable, its structure is clean enough for a model to parse, and a sentence inside it directly answers the question a researcher typed.
- Google Scholar. Still the largest single index of scholarly documents and the reflex tool for most PIs. Citation count dominates the ranking, but for a new paper with zero citations, title keyword match and full-text match decide whether anyone reaches you in the first place.
- PubMed. The biomedical reflex index. Coverage is gated by journal inclusion, so most of the work happens at the journal-selection and metadata-submission stage, not after publication. MeSH indexing and structured abstracts are the levers once you are in.
What is academic SEO?
Academic SEO is the application of search-optimisation principles to research papers and their metadata. The term was formalised in Beel and Gipp's 2009 paper for IEEE RCIS, which defined academic search engines as systems that index "scholarly documents (journal articles, books, conference papers, preprints, theses, dissertations, technical reports, etc.) and provide the possibility of searching them."
The thesis is simple: if researchers and AI systems cannot find your paper, they cannot cite it. Your paper's impact is bottlenecked by its discoverability.
The old assumption was that authors only needed to submit papers to journals and conferences. Peer review, journal reputation, and citation networks would handle visibility. That assumption is no longer safe. Four things changed:
- Researchers now use multiple discovery channels in parallel (Google Search, Google Scholar, PubMed, Semantic Scholar, ResearchGate, institutional repositories, and increasingly AI tools like ChatGPT and Perplexity).
- AI retrieval systems synthesise research and cite sources, but only if those papers are reachable and well-formatted. A paper behind an AI-hostile publisher platform is invisible to this entire surface (see which platforms block AI crawlers).
- Preprints on bioRxiv and medRxiv now rival journal articles in visibility and early citation impact (bioRxiv SEO guide).
- Full-text search, figure-caption indexing, and machine-readable metadata now matter as much as raw citation counts for new papers.
Academic SEO is the response: a deliberate approach to making your research reachable through every channel that now matters.
The four surfaces, in more detail
Researchers and AI systems discover papers through four primary surfaces. Optimising for one does not automatically optimise for the others, because each surface reads different signals from your paper.
1. Google Search + AI Overviews
Google's main search engine is now the front door for a large share of preprint and paper traffic, and the AI Overview that sits above organic results has become a meaningful citation target in its own right. AI Overviews trigger on a large share of biomedical queries, and when they appear, organic click-through drops by roughly 61% on those queries (Ahrefs, 2026). That means being cited inside the Overview matters as much as ranking in the blue links below it.
Ranking here depends on standard web-SEO signals: indexable HTML, working canonical URLs, a reachable full-text page, backlinks, domain authority, page speed, and mobile-friendliness. The specific levers that apply to papers are covered in the research-paper SEO checklist and GEO for biomedical papers.
2. AI retrieval tools (ChatGPT, Perplexity, Claude)
ChatGPT, Perplexity, Claude, and Gemini answer direct questions by quoting a small number of sources. They do not rank papers the way Scholar does. Broadly, they encode the query, retrieve candidate passages through live web search or retrieval APIs (or answer from what they absorbed in training), and cite the small set of passages that most directly answer the question.
Two failure modes matter here. First, the AI tool cannot cite your paper if it cannot reach the URL, so journal platforms that block AI crawlers remove your paper from this surface entirely (platform-by-platform breakdown). Second, the tool cannot cite your paper if its structure is too noisy for the model to extract a clean quote, so PDF-only papers without an HTML version, or HTML buried under login walls, are at a large disadvantage. The AI citation study maps which features actually predict being cited by these tools.
AI citation overlap with Google top-10 pages: 17 to 38% (ALM Corp, 2026), 12% (Discovered Labs, 2025)
AI Overview click-through impact: around 61% drop in organic CTR on AI-Overview queries (Ahrefs, 2026)
Sources: ALM Corp 2026 (Google AI Overview citations drop on top-ranking pages); Discovered Labs 2025 (how each AI platform cites sources); Ahrefs 2026 (AI Overviews reduce clicks).
3. Google Scholar
Google Scholar indexes over 400 million academic documents and is the reflex discovery tool for most biomedical researchers. It crawls journal websites, preprint servers, university repositories, institutional research databases, ResearchGate, and other scholarly sources. Ranking is driven by citation count (the dominant signal), full-text keyword relevance, metadata quality, author reputation, and venue reputation.
For a paper with zero citations, the main levers you control are title and full-text keyword match. That means Scholar visibility for a new paper depends largely on whether your title and text contain the phrases that people actually search for. If your paper is not appearing at all, the diagnostic post is why Google Scholar is not indexing your paper.
4. PubMed
PubMed is the biomedical reflex index. It is gated: coverage depends on journal inclusion, so unlike Google Scholar, you cannot get onto PubMed simply by publishing. Most of the PubMed-visibility work happens when you pick a journal. Once you are in, the levers are MeSH indexing (assigned by NLM, now largely through automated indexing guided by your title and abstract), structured abstract sections, and correct funding declarations so NIH-funded work is tagged.
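Preprints are a partial exception: under the NIH Preprint Pilot, preprints by NIH-funded researchers on servers such as bioRxiv and medRxiv are added to PubMed Central and can surface in PubMed before the journal version appears, which buys you weeks of biomedical-search visibility. Europe PMC also indexes preprints from these servers.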
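Once a paper is out, you can check whether PubMed has picked it up through NCBI's public E-utilities search endpoint. The sketch below only builds the query URL. The `[ti]` (title) and `[MeSH Terms]` field tags are standard PubMed search syntax; the function name, arguments, and example terms are illustrative assumptions, not part of any NCBI library.

```python
from typing import Optional
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (public; NCBI asks clients to keep request rates modest).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(title_phrase: str, mesh_term: Optional[str] = None) -> str:
    """Build an esearch URL for a phrase in PubMed titles, optionally ANDed with a MeSH heading.

    Hypothetical helper for illustration only.
    """
    term = f'"{title_phrase}"[ti]'
    if mesh_term:
        term += f' AND "{mesh_term}"[MeSH Terms]'
    # retmode=json asks esearch for a JSON response instead of the default XML.
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "retmode": "json"})


url = pubmed_search_url("ferroptosis", "Neoplasms")
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) returns JSON whose `esearchresult.idlist` holds matching PubMed IDs. An empty list for your exact title suggests the record is not indexed yet, or that the title in PubMed differs from your draft.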
Why most papers get zero citations: the discovery problem
Beel and Gipp's 2009 work established that citation count is the dominant ranking signal in Google Scholar. That creates a chicken-and-egg problem for new papers: they start with zero citations. How do they break through?
The answer is discoverability. A paper has to be found before it can be cited. Discovery happens through:
- Title and keyword match. Researchers search for specific phrases. If your title does not contain phrases they search for, they do not find you. The calibrated study on this site found that title_has_method_term was the single strongest individual lever (β = +0.31, p = 0.001) in a 983-paper regression.
- Abstract clarity. The abstract is the first thing a researcher reads after the title. A dense or hedge-opening abstract ("This study examines…") gives researchers a reason to stop reading. The same 983-paper regression found that abstract_first_sentence_descriptive_opener was significantly negatively associated with citations (β = −0.28, p = 0.01).
- Metadata richness. Keywords, affiliations, funding information, MeSH terms, and structured data tell search engines and AI systems what your work is about.
- Full-text accessibility. Paywalled papers are less discoverable on Google Search and invisible to several AI retrieval tools.
- Preprint strategy. Publishing a preprint before journal submission increases early discoverability and citation velocity.
- Figure and caption quality. Captions are indexed independently of body text on many platforms and surface in image search and multimodal AI retrieval.
Papers with generic titles, hedge-opening abstracts, thin metadata, no preprint, and no HTML version languish. This is the discovery gap. The "why your paper is not getting cited" diagnostic walks through each of these failure modes.
The 7 pillars of research discoverability
Academic SEO rests on seven pillars. Each contributes to visibility across all four surfaces, though in different proportions.
Pillar 1: Title optimisation
Your title is the first and loudest signal to every surface. A strong biomedical title:
- Puts the primary search phrase in the first 8 to 10 words.
- Names the specific method or intervention used (not just "a novel approach").
- Is specific enough to differentiate your work from similar studies.
- Uses the exact phrasing a working researcher would type into a search box.
Evidence: title optimisation evidence reviews the Paiva, Letchford, and Sagi studies (title-effect estimates 20 to 24%). The calibrated 2026 biomedicine study found a smaller but still significant β = +0.31 effect for naming a method term in the title. The common-terms versus declarative-titles study asks whether you are better off with a plain-language question-style title or a declarative finding-style title for AI-Overview pickup.
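Two of the title rules, phrase position and naming a method, are mechanical enough to check in code. A minimal sketch, assuming you have already chosen a primary search phrase and a short list of method terms for your field. The function name, example title, and term list are hypothetical, and the 10-word window mirrors the rule of thumb above rather than any platform's documented cutoff.

```python
import re


def title_report(title: str, primary_phrase: str, method_terms: list, window: int = 10) -> dict:
    """Check a draft title for phrase position and a named method (heuristic sketch, not a validator)."""
    words = re.findall(r"[\w-]+", title.lower())
    phrase = re.findall(r"[\w-]+", primary_phrase.lower())
    if not phrase:
        raise ValueError("primary_phrase must contain at least one word")

    # 0-based word index where the full phrase first starts, or None if it never appears.
    start = next(
        (i for i in range(len(words) - len(phrase) + 1) if words[i:i + len(phrase)] == phrase),
        None,
    )
    lowered = title.lower()
    return {
        "phrase_start_word": None if start is None else start + 1,  # 1-based for readability
        "phrase_in_window": start is not None and start + len(phrase) <= window,
        "method_terms_named": [t for t in method_terms if t.lower() in lowered],
    }


report = title_report(
    "Single-cell RNA sequencing reveals T-cell exhaustion in colorectal cancer",
    primary_phrase="T-cell exhaustion",
    method_terms=["single-cell RNA sequencing", "flow cytometry"],
)
print(report)
```

Substring matching is crude (it will not catch synonyms or abbreviations such as "scRNA-seq"), so treat the output as a prompt for a human read, not a score.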
Pillar 2: Abstract optimisation
Your abstract is written for three readers at once: the busy PI, the Google Scholar indexer, and the AI retrieval model. Best practice:
- Open with the finding, not a descriptive hedge. "We show that X increases Y by Z" beats "This study examines the relationship between X and Y" by a measurable margin.
- Include the primary search phrase in the first 100 words.
- Structure as problem, methods, results, implications (or the journal's required headings).
- Include quantifiable findings. Generative search engines tend to quote specific numbers over vague claims; Aggarwal et al. (2024) measured a +33% visibility lift from the statistics-addition tactic.
- Pair the technical abstract with a plain-language summary when the journal allows one. Funders increasingly require them, and retrieval systems index them as a second, cleaner signal.
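The opener, keyword-position, and numbers checks can be screened automatically before a human read. A minimal sketch: the hedge-opener list is a hypothetical starter set drawn from the examples on this page, not a validated lexicon, and the function name and sample abstract are illustrative.

```python
import re

# Hypothetical starter list of descriptive-hedge openers; extend it for your field.
HEDGE_OPENERS = (
    "this study examines",
    "this study investigates",
    "the present work aims to",
    "this paper explores",
)


def abstract_report(abstract: str, primary_phrase: str, word_limit: int = 100) -> dict:
    """Flag a hedge opener, check the primary phrase's position, and look for any numbers."""
    opening_words = " ".join(abstract.split()[:word_limit]).lower()
    return {
        # str.startswith accepts a tuple and returns True if any entry matches.
        "hedge_opener": abstract.strip().lower().startswith(HEDGE_OPENERS),
        "phrase_in_opening_words": primary_phrase.lower() in opening_words,
        "has_number": bool(re.search(r"\d", abstract)),
    }


print(abstract_report(
    "We show that drug X lowers systolic blood pressure by 12 mmHg in adults with "
    "resistant hypertension. Methods: a randomised trial in 240 participants.",
    primary_phrase="resistant hypertension",
))
```

The number check only confirms that some figure appears; whether it is the headline finding still needs a human eye.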
Pillar 3: Keywords and phrases
Academic databases and PubMed's MeSH indexing lean on metadata keywords. For biomedical papers:
- Pick 3 to 5 primary keywords (your main research area, named with the phrasing working researchers use).
- Add 2 to 3 supporting keywords (adjacent topics, methodologies, or reference compounds).
- Every keyword you list should also appear in your full text. Keywords that appear only in the metadata field get discounted.
- Include both narrow technical terms (gene symbols, specific assay names) and broader terms (disease names, method families) so the paper surfaces on both niche and general queries.
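The rule that every keyword must also appear in the full text is easy to enforce before you submit. A minimal sketch, assuming you have the manuscript as plain text; the function name and example keywords are hypothetical, and exact matching will miss plurals and abbreviations, so review the dropped list by hand.

```python
import re


def split_keywords(candidates: list, full_text: str) -> tuple:
    """Return (kept, dropped): keywords that do or do not appear verbatim in the full text."""
    text = " ".join(full_text.lower().split())  # collapse line breaks and repeated spaces
    kept, dropped = [], []
    for kw in candidates:
        phrase = re.escape(" ".join(kw.lower().split()))
        # (?<!\w) and (?!\w) act as word boundaries that also work for terms like "C++" or "IL-6".
        if re.search(rf"(?<!\w){phrase}(?!\w)", text):
            kept.append(kw)
        else:
            dropped.append(kw)
    return kept, dropped


kept, dropped = split_keywords(
    ["CRISPR screening", "ferroptosis", "GPX4", "lipid peroxidation", "cancer metabolism"],
    "We ran a genome-wide CRISPR screening campaign in cancer cells.\n"
    "GPX4 loss triggered ferroptosis through lipid\nperoxidation.",
)
print(kept)     # keywords that appear in the text
print(dropped)  # candidates to drop or work into the text
```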
Pillar 4: Readability and structure
Papers with clear structure are read more, quoted more by AI tools, and cited more often. Readability and citations reviews the evidence. Concrete practice:
- Use clear section headings (Introduction, Methods, Results, Discussion, Conclusion). AI tools use these as retrieval anchors.
- Keep paragraphs to 3 to 5 sentences.
- Prefer active voice when it is clear who acted.
- Use bullet points for parallel comparisons (a pattern that AI Overview models pick up and reproduce).
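The paragraph-length guideline is the easiest of these to audit mechanically. A minimal sketch; the splitting rules are naive assumptions (blank lines separate paragraphs, and `.`, `!` or `?` followed by whitespace ends a sentence), so abbreviations such as "e.g." will inflate the count. The function name is hypothetical.

```python
import re


def long_paragraphs(text: str, max_sentences: int = 5) -> list:
    """Return (paragraph_number, sentence_count) for every paragraph above max_sentences."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    flagged = []
    for number, para in enumerate(paragraphs, start=1):
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", para.strip()) if s]
        if len(sentences) > max_sentences:
            flagged.append((number, len(sentences)))
    return flagged


draft = "Short one. Two. Three.\n\nA. B. C. D. E. F."
print(long_paragraphs(draft))
```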
Pillar 5: Metadata and figure quality
Structured metadata helps every surface understand your work. For biomedical papers specifically:
- Full author names with correct institutional affiliations and ORCID IDs.
- Funding information and grant numbers (required for NIH-funded work in PubMed).
- Subject classification (MeSH, ACM Classification, arXiv category).
- Open access status and licence.
- Links to code, data, and supplementary materials.
- Descriptive figure captions. Figure-caption SEO covers how captions get indexed independently of body text on many platforms and in multimodal AI retrieval.
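Google Scholar's inclusion guidelines describe Highwire Press-style `citation_*` meta tags that an article landing page can carry in its HTML head. Most journal platforms emit these automatically, but a self-hosted lab or repository page may not. A minimal sketch of rendering a few of them; the paper details and function name are hypothetical, and the guidelines list further tags (volume, issue, pages) worth checking before you rely on this.

```python
from html import escape


def scholar_meta_tags(title, authors, publication_date, journal_title=None, pdf_url=None):
    """Render a few Highwire-style citation_* meta tags for an article landing page (sketch)."""
    fields = [("citation_title", title)]
    fields += [("citation_author", name) for name in authors]  # one tag per author, in byline order
    fields.append(("citation_publication_date", publication_date))  # e.g. "2024/03/15"
    if journal_title:
        fields.append(("citation_journal_title", journal_title))
    if pdf_url:
        fields.append(("citation_pdf_url", pdf_url))
    # escape(..., quote=True) keeps quotes and ampersands from breaking the attribute value.
    return "\n".join(f'<meta name="{n}" content="{escape(v, quote=True)}">' for n, v in fields)


html_head = scholar_meta_tags(
    "Diet & mood in adolescents: a cohort study",
    ["Doe, Jane", "Smith, Alex"],
    "2024/03/15",
    pdf_url="https://example.org/papers/diet-mood.pdf",
)
print(html_head)
```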
Pillar 6: Preprint publication
For STEM fields that allow preprints, posting on arXiv, bioRxiv, or medRxiv before journal submission is the single cheapest lever to pull. It:
- Buys 4 to 6 weeks of early discoverability.
- Generates citations before the journal version exists.
- Signals active research to the community (which feeds the virtuous cycle of reads leading to citations).
- Creates a versioning trail that shows the research evolving.
bioRxiv SEO covers the biomedical-specific practice.
Pillar 7: AI-readiness
As AI retrieval tools become primary discovery surfaces, making your paper AI-ready is the newest pillar. Concretely:
- Publish an HTML version (not just a PDF) that a crawler can parse.
- Provide a structured abstract (background, methods, results, conclusions).
- Make the full text open access when possible.
- State your contribution in a single explicit sentence inside the abstract.
- Publish code and data availability statements in machine-readable form.
- Check that your journal's platform is not blocking AI crawlers (platform-by-platform list).
For a deeper tactical breakdown, see GEO for biomedical papers, which catalogues nine specific tactics from Aggarwal et al. (KDD 2024) with measured visibility lifts.
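The crawler check in the AI-readiness list can be partly scripted with Python's standard-library robots.txt parser. A minimal sketch: the user-agent tokens are the ones vendors have published for their crawlers at the time of writing (verify them against current vendor documentation), the sample robots.txt and article URL are hypothetical, and robots.txt is only one mechanism. A platform can also block crawlers at its firewall or CDN, which this check will not see.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens as published by OpenAI, Anthropic, Perplexity and Common Crawl; check current docs.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "CCBot"]


def blocked_ai_crawlers(robots_txt: str, article_url: str) -> list:
    """Return the AI crawler tokens that this robots.txt disallows for article_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    parser.modified()  # record a load time; can_fetch() denies everything until one is set
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, article_url)]


# Hypothetical robots.txt: blocks two AI crawlers everywhere, everyone else only from /login.
sample = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /login
"""
print(blocked_ai_crawlers(sample, "https://journal.example.org/doi/full/10.1234/example"))
```

To check a live platform, `RobotFileParser("https://<journal-domain>/robots.txt")` followed by `.read()` fetches the real file; replace the placeholder with your journal's domain.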
How Google Scholar actually ranks papers
The Google Scholar algorithm is not fully public, but Beel and Gipp's research and the fifteen years of testing since have identified the main ranking factors.
#1 ranking factor: citation count. Papers with more citations rank higher. This is confirmed across academic-search-engine research.
Secondary factors: full-text keyword relevance, title keyword match, author reputation, venue reputation, recency, metadata quality.
Source: Beel & Gipp, 2009, "Academic Search Engine Optimization," IEEE RCIS. Fifteen years of practical SEO testing since. Full breakdown in Google Scholar ranking: 7 factors and why citations dominate.
Citation count creates a virtuous cycle: well-ranked papers get more visibility, which drives more citations, which improves ranking further. That is why the first citations are decisive. They break the zero-citation barrier.
How AI tools select papers to cite
LLM retrieval behaviour looks very different from Scholar ranking. Instead of sorting by citation count, AI tools:
- Retrieve via embeddings. The system encodes the query as a vector and finds papers whose embeddings sit closest in semantic space.
- Score by passage-level relevance. Within each paper, specific passages are scored for how well they answer the query. A well-written sentence in an obscure paper can beat a vague sentence in a famous one.
- Filter by availability. The tool can only cite papers it has ingested (training data, or reached via retrieval APIs like Semantic Scholar, OpenAlex, or bioRxiv).
- Parse structured data. Papers with clean HTML, structured abstracts, and machine-readable metadata get cited more often because the model can extract a clean quote.
The implication: for AI citation, clarity and structure matter as much as popularity. An obscure but well-structured paper with rich metadata can be cited by an AI tool before it is cited by human researchers. The AI citation study goes deeper on which specific features predict AI pickup.
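The retrieve, score, and filter steps can be illustrated with a toy model. This is a sketch under loud assumptions: real systems use learned neural embeddings and their own, largely unpublished pipelines, whereas here a bag-of-words count vector stands in for an embedding, and the papers, passages, and `reachable` flag are hypothetical.

```python
import math
import re
from collections import Counter


def vectorise(text: str) -> Counter:
    # Stand-in for a learned embedding: word counts (illustration only).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_passages(query: str, papers: dict, top_k: int = 2) -> list:
    """Score every passage of every reachable paper; return the top_k as (score, paper_id, passage)."""
    q = vectorise(query)
    scored = []
    for paper_id, paper in papers.items():
        if not paper["reachable"]:  # availability filter: an unreachable paper is never scored
            continue
        for passage in paper["passages"]:
            scored.append((cosine(q, vectorise(passage)), paper_id, passage))
    scored.sort(reverse=True)  # highest similarity first; citation counts play no part
    return scored[:top_k]


papers = {
    "famous": {"reachable": True, "citations": 900,
               "passages": ["Many factors may influence outcomes in various settings."]},
    "obscure": {"reachable": True, "citations": 3,
                "passages": ["Metformin reduced HbA1c by 0.8 percentage points in adults with type 2 diabetes."]},
    "paywalled": {"reachable": False, "citations": 120,
                  "passages": ["Metformin lowers HbA1c in type 2 diabetes."]},
}
for score, paper_id, passage in best_passages("does metformin lower HbA1c in type 2 diabetes", papers):
    print(f"{score:.2f}  {paper_id}: {passage}")
```

The `citations` field is never read. In this toy, the obscure paper wins because one of its sentences answers the question, and the paywalled paper never competes because it fails the availability filter.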
Evidence: how much does this actually move citations?
The honest answer is smaller than the marketing literature claims, and larger than zero.
Earlier title-optimisation studies (Paiva, Letchford, Sagi) estimated title effects in the 20 to 24% range. Those numbers are useful but averaged across all of science. The calibrated citation study on this site ran a fresh regression on 983 biomedical papers from 2015 to 2019 and found a smaller, biomedicine-specific effect:
Top-quartile vs bottom-quartile discoverability: 1.07× higher cumulative citations at year 5 (95% CI 1.04–1.11×).
Two individual levers reached significance: title_has_method_term (β = +0.31, p = 0.001) and abstract_first_sentence_descriptive_opener (β = −0.28, p = 0.01).
The effect was not significant in the matched-pair sensitivity check (t = 1.00, p = 0.32). The headline effect is modest, and the lower bound of the confidence interval is close to 1.0×. Caveat: this is one study on one biomedical sample; it does not override the broader prior from earlier work.
Source: Calibrated citation study, 2026, 983-paper OpenAlex biomedicine sample 2015–2019.
Two things to take from this. First, the effects are real but modest. Expect roughly a 7% lift in cumulative citations from a strong discoverability posture (95% CI 4 to 11%), not a 2×. Second, the two significant levers (naming a method in the title, and not opening the abstract with a hedge) are the two cheapest things you can do. Both are covered in depth in title optimisation and abstract optimisation.
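The 4 to 11% range is the reported citation ratio and its 95% confidence interval restated as a percentage lift:

```latex
\text{lift} = (R - 1) \times 100\%, \qquad
R = 1.07 \;\Rightarrow\; 7\%, \qquad
R \in [1.04,\ 1.11] \;\Rightarrow\; \text{lift} \in [4\%,\ 11\%]
```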
The common-terms versus declarative-titles study adds a second data point on the title-phrasing question.
Your 30-minute discoverability checklist
If you want the tactical version of this guide, the research-paper SEO checklist walks through every pre-publication and post-publication action in order. The short version:
Before submission
- Title: write 10 variations. Search each in Google Scholar and Google. Pick the one that surfaces the most similar prior work; that phrasing is what your audience is searching for.
- Abstract: write the first sentence last. It should name the finding in one line. Cut any opener that starts "This study examines" or "The present work aims to."
- Keywords: list 20 candidates. Drop any that do not also appear in your full text. Keep 5 to 8.
- Structure: clear section headings, 3 to 5 sentences per paragraph, active voice where it is natural.
- Metadata: full affiliations, ORCID IDs, funding numbers, data and code availability statements.
- Preprint: pick the right server (arXiv for CS and physics, bioRxiv for life sciences, medRxiv for medicine, ChemRxiv for chemistry). Post 1 to 3 months before journal submission.
- AI-readiness: state your contribution in one sentence. Make sure your preprint has an HTML version. Check whether your target journal's platform blocks AI crawlers.
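The title step involves a lot of manual searching. A small helper can at least generate the search links for each draft title. This sketch deliberately builds URLs to open by hand rather than scraping results, because Google Scholar has no public search API and discourages automated queries. The function name and draft titles are hypothetical.

```python
from urllib.parse import quote_plus


def search_links(title_variants: list) -> dict:
    """Map each draft title to Google and Google Scholar search URLs for manual comparison."""
    links = {}
    for title in title_variants:
        q = quote_plus(title)  # spaces become "+", other reserved characters are percent-encoded
        links[title] = {
            "google": f"https://www.google.com/search?q={q}",
            "scholar": f"https://scholar.google.com/scholar?q={q}",
        }
    return links


drafts = [
    "T-cell exhaustion in colorectal cancer",
    "Single-cell RNA sequencing of exhausted T cells in colorectal cancer",
]
for title, urls in search_links(drafts).items():
    print(title)
    print("  ", urls["scholar"])
```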
After publication
- Audit the title and first sentence of the abstract against a fresh set of Google searches. Fix anything that does not surface.
- If the paper is paywalled, deposit a preprint or institutional-repository version. Open-access papers are cited more and picked up by more AI retrieval tools.
- Update your Google Scholar, ResearchGate, and ORCID profiles with the new paper and full metadata.
- Share on relevant academic social channels (Bluesky, departmental newsletters, lab mailing lists), not for vanity but because early reads drive the first citations.
- Cite yourself where it is genuinely relevant in your next paper. Self-citation is a standard part of building a research programme.
The bottom line
Academic SEO is not a shortcut to impact. A badly designed study with excellent SEO will still fail to generate citations. But excellent research that is invisible to the four surfaces researchers now use might as well not exist.
Optimise the seven pillars: title, abstract, keywords, structure, metadata and figures, preprint, AI-readiness. Check your paper's visibility on the four surfaces: Google Search, AI retrieval tools, Google Scholar, PubMed. Fix the gaps. That is the whole of academic SEO.
In a research environment increasingly mediated by search engines and AI retrieval, getting this right is no longer optional. Every post linked from this page is one piece of how to do it.
Frequently Asked Questions
What is academic SEO?
Academic SEO (ASEO) is the practice of optimising research papers and metadata to increase discoverability across four surfaces: Google Search (including AI Overviews), AI retrieval tools (ChatGPT, Perplexity, Claude), Google Scholar, and PubMed. The goal is to increase citation impact by ensuring researchers and AI systems can find your work.
What are the four surfaces a paper needs to appear in?
Google Search (including AI Overviews), AI retrieval tools (ChatGPT, Perplexity, Claude, Gemini), Google Scholar, and PubMed. Each indexes papers differently and each is now a meaningful source of readers. Optimising for one does not automatically optimise for the others.
Is Google Scholar still the most important surface?
It is still the largest single index of scholarly documents (over 400 million), but it is no longer the only surface researchers use. AI retrieval tools cite sources that overlap with Google top-10 only 17 to 38% of the time (ALM Corp, 2026), so a Scholar-only strategy misses a growing share of readers.
Does academic SEO mean manipulating citations?
No. Academic SEO is about removing friction to discoverability: optimising your title, abstract, keywords, figures, and metadata so your work reaches researchers who would benefit from it. It does not involve gaming systems or artificial inflation.
What is the most important ranking factor on Google Scholar?
Citation count is the #1 ranking factor on Google Scholar, confirmed by Beel & Gipp (2009). Secondary factors include full-text keyword relevance, title keyword match, author reputation, and recency. For new papers with zero citations, title and abstract keyword match are the levers that actually move position.
Should I publish a preprint before submitting to a journal?
Yes, if your field allows it. Preprints give you 4 to 6 weeks of earlier discoverability, generate citations before formal publication, and signal active research to the community. Most STEM fields accept preprints; check your target journal's policy.
How much do these changes actually move citations?
In a 2026 regression on 983 biomedical papers from 2015–2019, top-quartile discoverability was associated with a 1.07× higher cumulative citation count at year 5 (95% CI 1.04–1.11×) compared to the bottom quartile, with two individual levers reaching statistical significance: naming a specific method in the title and avoiding a descriptive hedge in the first sentence of the abstract. The effect is modest but real, and it compounds over years.
Ready to optimise your paper before you publish?
We optimise your title, abstract, keywords, readability, and metadata for Google Search, AI retrieval tools, Google Scholar, and PubMed.