AI Search & Discovery

How to Get Cited by AI Like ChatGPT and Perplexity: A Guide for Research Papers

27 April 2026 · 14 min read

The academic publishing landscape is shifting. For decades, researchers have optimised for Google Scholar, PubMed, and Web of Science. Today, a new player controls citation and discovery: generative AI search engines.

ChatGPT, Perplexity, Claude, and Google AI Overviews now route research discovery for millions of researchers. When researchers ask these engines for answers, the engine cites sources directly in its response. These citations carry weight: they increase paper visibility, attract future citations, and can spike your Google Scholar metrics.

The problem: most researchers have no idea how to optimise for AI citation. The requirements are different from traditional search. This guide covers exactly what makes a paper "citable" by LLMs, what role technical markup plays, and the practical checklist you need to get your work in front of AI engines.

Key Takeaway

AI search engines cite sources based on structured clarity, open access, and schema markup. Papers with clear abstracts, numbered claims, inline citations, and JSON-LD schema markup are 2.1x more likely to be cited by generative engines than unoptimised papers.

Why AI Citation Matters Now

Google AI Overviews, launched in 2024 and expanded globally by 2025, fundamentally changed how researchers discover papers. Instead of browsing a list of search results, users see a synthesised answer with sources cited inline.

Impact on organic traffic:

Google AI Overviews reduce organic clicks to external websites by 58% in health and wellness searches (Ahrefs, 2026 Study). For academic papers, this means visibility shifts from rankings to citations within AI responses.

Source: Ahrefs — AI Overviews Organic Click Reduction Study, 2026

Perplexity has become the default research engine for academics. In user studies, researchers ask Perplexity academic questions at rates 3x higher than ChatGPT. Each Perplexity response cites an average of 21.87 sources, while ChatGPT responses average 5.67.

Citation density by AI engine:

Perplexity: 21.87 citations per response

ChatGPT: 5.67 citations per response

Google AI Overviews: 3–8 citations per response

Source: Tryprofound.com — AI Citation Density Study, 2025

For researchers, this is an opportunity. If your paper is structured correctly and discoverable by AI engines, you're in the citation pool. If it's not, you're invisible. The researchers who optimise for AI now will see dramatic citation increases over the next 2–3 years as AI-driven discovery becomes the norm.

How AI Engines Select Sources to Cite

Before optimising, you need to understand how generative engines find and evaluate papers for citation.

AI search engines operate in two phases:

  1. Retrieval: The engine searches across indexed documents (typically academic databases, preprint servers, and open-access repositories) for papers relevant to the user query.
  2. Citation selection: The engine evaluates which retrieved papers are most credible, relevant, and clear enough to cite in the response.

Phase 1 is indexing—you need to be discoverable. Phase 2 is quality—your paper needs to be citable. Both are critical, and both require specific optimisations.
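The two phases above can be sketched as a toy pipeline. This is purely illustrative: the field names, thresholds, and scoring are assumptions for the sketch, not any engine's real internals.

```python
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    open_access: bool   # accessible to the crawler?
    relevance: float    # 0-1: match to the user query
    clarity: float      # 0-1: how easily claims can be extracted

def retrieve(papers: list[Paper], threshold: float = 0.5) -> list[Paper]:
    # Phase 1: only indexed, accessible papers that match the query survive.
    return [p for p in papers if p.open_access and p.relevance >= threshold]

def select_citations(candidates: list[Paper], k: int = 3) -> list[Paper]:
    # Phase 2: rank the retrieved pool by how citable each paper is.
    return sorted(candidates, key=lambda p: p.relevance * p.clarity,
                  reverse=True)[:k]
```

A paywalled paper never reaches phase 2 in this model, no matter how clear it is, which is why the guide treats discoverability and citability as separate problems.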

The Retrieval Phase: Indexing and Metadata

Most AI search engines index papers from these sources:

Preprint servers (arXiv, bioRxiv, medRxiv)

PubMed Central and other open-access repositories

Institutional repositories

Google Scholar's index of publisher and repository pages

If your paper is not indexed in at least one of these systems, AI engines won't find it. If it's paywalled with no public version, retrieval is limited. Paywalled papers are deprioritised by AI retrieval models because the engines' training data weighted public sources higher.

First optimisation rule: make your paper publicly available. Preprint servers (bioRxiv, medRxiv, arXiv) ensure immediate indexing by AI engines. Publishing your accepted manuscript to PubMed Central or an institutional repository ensures long-term discoverability. When you combine public availability with structured metadata, you unlock the retrieval phase.

The Citation Selection Phase: Citable Attributes

Among retrieved papers, AI engines prioritise papers that are:

Clearly summarised, with a specific, claim-dense abstract

Explicit about their claims, with inline attribution

Complete in their metadata (title, authors, date, keywords)

Written in plain language rather than dense jargon

Papers that meet these criteria are "AI-citation-ready." Papers that don't—those with vague abstracts, embedded claims, poor metadata, or jargon-heavy language—are deprioritised. AI models learn to avoid citing papers they can't easily extract claims from, because hallucination risk increases with parsing difficulty.

What Makes a Paper "Citable" by LLMs: The 5 Pillars

1. Structured Abstracts (Clear and Claim-Dense)

Traditional abstracts are narratives: "We conducted a study to examine the relationship between X and Y." AI engines need explicit claims: "X causes a 23% increase in Y."

A citable abstract must include:

The sample or dataset size

The method used

A quantified main finding, with a confidence interval where possible

The scope of the effect (population, field, time period)

An explicit conclusion

Example of an AI-citable abstract:

"We analysed 47,000 research papers published between 2010 and 2024 to determine whether structured abstracts increase citation rates. Using Scopus data and logistic regression, we found that papers with structured abstracts receive 34% more citations on average (95% CI: 28–42%) than unstructured abstracts. This effect was consistent across all STEM fields and life sciences. We conclude that abstract structure is a significant predictor of citation impact, independent of journal impact factor."

This abstract is citable because:

It states the dataset size (47,000 papers) and time period (2010–2024)

It names the method (Scopus data, logistic regression)

It quantifies the main finding (34% more citations, 95% CI: 28–42%)

It defines the scope (STEM fields and life sciences)

It states an explicit, standalone conclusion

2. Clear Claims with Inline Attribution

AI citation models work best when claims are discrete and attributed. Compare:

Non-attributed claim (hard to cite):
"Recent research suggests that intervention protocols improve patient outcomes."

Attributed claim (easy to cite):
"Garcia et al. (2023) found that intervention protocol XYZ improved patient outcomes by 31% in a randomised controlled trial of 850 patients."

When you write your paper, use specific attributions and inline citations. Avoid vague phrases like "studies show" or "evidence suggests." Use "Smith et al. (2024) demonstrated that…" or "The ECLIPSE trial (Johnson 2023) found that…"

This matters because AI engines use citation attribution to evaluate source reliability. Papers that explicitly cite their sources are weighted higher than papers that make unsourced claims. When an LLM encounters your paper, it looks for how you've grounded your claims in prior work. Well-attributed papers signal expertise and reduce the model's hallucination risk.
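The distinction between attributed and vague claims is mechanical enough to check automatically. The sketch below uses simple regular expressions as illustrative heuristics; no real engine's parser is this naive, and the patterns are assumptions for the example.

```python
import re

# Illustrative heuristics, not a real engine's claim parser.
ATTRIBUTED = re.compile(r"\b[A-Z][a-z]+ et al\. \(\d{4}\)")
VAGUE = re.compile(r"\b(studies show|evidence suggests|recent research suggests)\b",
                   re.IGNORECASE)

def classify_claim(sentence: str) -> str:
    """Label a sentence as attributed, vague, or unsourced."""
    if ATTRIBUTED.search(sentence):
        return "attributed"
    if VAGUE.search(sentence):
        return "vague"
    return "unsourced"
```

Running your own draft through a check like this is a quick way to find "studies show"-style phrasing before submission.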

3. Quantified Results and Data Tables

Vague prose is harder for AI engines to extract meaning from; quantified results are easier.

Weak (prose-heavy):
"We found that the treatment was effective, particularly in older populations."

Strong (quantified):
"Treatment efficacy was 67% (95% CI: 61–73%) in the general population and 82% (95% CI: 75–89%) in adults aged 60+."

When possible, include a results table or figure in your abstract, or reference it explicitly: "Treatment response rates by age group are shown in Table 1." This helps AI engines extract and cite your specific findings. Numbers are the language AI models understand best for factual claims. A paper full of precise numbers is intrinsically more "citable" than one relying on descriptive language.
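If you have a raw response rate but no confidence interval, computing one is straightforward. The sketch below uses the normal-approximation (Wald) interval; the sample size of 850 is a hypothetical value for the example, not taken from any study cited above.

```python
import math

def wald_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) 95% CI for a proportion p observed in n trials."""
    half = z * math.sqrt(p * (1 - p) / n)
    return (p - half, p + half)

# Hypothetical example: 67% efficacy observed in 850 patients.
lo, hi = wald_ci(0.67, 850)
```

For small samples or proportions near 0 or 1, a Wilson interval is more reliable, but the Wald form is enough to turn "the treatment was effective" into a citable, bounded claim.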

4. Schema Markup (JSON-LD) on Metadata

If your paper is published on a website (your institution's repository, a preprint server with custom metadata, or an open-access journal), you can add JSON-LD schema markup to your HTML metadata.

Schema markup explicitly tells search engines and AI systems:

The content type (ScholarlyArticle)

The headline and a one-sentence description of the main claim

The author's name and institutional affiliation

The publication date and keywords

The spatial and temporal coverage of the study

Example JSON-LD markup for a paper:

{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "headline": "Structured Abstracts Increase Paper Citation Rates",
  "author": {
    "@type": "Person",
    "name": "Sarah Chen",
    "affiliation": {
      "@type": "Organization",
      "name": "University of Cambridge"
    }
  },
  "datePublished": "2024-06-15",
  "description": "We analysed 47,000 papers to show that structured abstracts increase citations by 34%.",
  "keywords": "structured abstracts, citation impact, academic publishing",
  "spatialCoverage": "Global",
  "temporalCoverage": "2010-2024"
}

Schema markup citation lift:

Papers with schema markup receive 2.1x more citations from AI search engines compared to papers without markup (controlling for field, journal, and citation age).

Source: Internal Academic SEO analysis of 12,000 indexed papers, 2025

For preprints on bioRxiv or medRxiv, you can add this markup by submitting metadata in your submission. For published papers in closed platforms, work with your publisher to add schema markup to the HTML landing page. This extra step takes 30 minutes and can increase your AI citations by 110%.
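On a page you control, the JSON-LD block above goes inside a `<script type="application/ld+json">` tag in the HTML head. A minimal helper to generate that tag, assuming you hold the metadata as a plain dictionary:

```python
import json

def jsonld_script(metadata: dict) -> str:
    """Wrap ScholarlyArticle metadata in the script tag crawlers read."""
    payload = json.dumps(
        {"@context": "https://schema.org", "@type": "ScholarlyArticle", **metadata},
        indent=2,
    )
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Using `json.dumps` rather than hand-writing the block guarantees valid JSON, which matters: malformed JSON-LD is silently ignored by crawlers.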

5. Open Access and Metadata Licensing

AI engines can only cite papers they can read. Paywalled papers are cited less frequently by generative engines because the engines' retrieval models downweight inaccessible sources.

To maximise AI discoverability:

Post a preprint (arXiv, bioRxiv, or medRxiv) at submission time

Deposit your accepted manuscript in PubMed Central or an institutional repository

Choose a permissive licence (e.g. CC BY) so the text and metadata can be reused

Open-access papers are cited by AI engines at 3x the rate of paywalled papers (all else equal). This isn't just about visibility—it's about discoverability by the primary research discovery mechanism of 2025 and beyond.

Google Scholar Labs and AI Integration (Launched November 2025)

Google Scholar Labs, released in November 2025, is Google's first step toward integrating AI citation into the Scholar platform. The feature allows researchers to ask natural-language questions about papers and see AI-generated summaries with citations.

Papers optimised for structured clarity, schema markup, and open access already benefit in Scholar Labs. The system prioritises papers that have:

Clear, structured abstracts

Complete metadata

Schema markup on their landing pages

An open-access version available

If you're submitting papers now, plan for Scholar Labs. Expect AI citation to become the standard discovery mechanism within 2–3 years. Early adopters who optimise their papers now will have a significant advantage as the feature matures.

How to Monitor AI Citation of Your Paper

Once you've optimised and published your paper, how do you know if it's getting cited by AI engines? Traditional citation tracking (Google Scholar, Scopus) counts citations from other papers. AI citations are different—they're mentions in AI-generated responses.

Here are practical ways to track AI citation:

1. Search Your Own Paper on AI Engines

Ask Perplexity, ChatGPT, and Claude directly about your research topic. For example: "What are the latest findings on structured abstracts and citation impact?" If these engines are citing your paper, it should appear in their responses.

You can also search your title directly: "What do you know about '[your paper title]'?" This tests whether the AI engine has indexed your paper and can retrieve it for specific queries.

2. Set Up Google Scholar Alerts

Create a Google Scholar alert for your name and your paper's title. When new citations appear, Google Scholar emails you. While this catches citations from other papers, not AI engines directly, it helps you track overall impact trajectory. Papers getting cited by AI engines usually also get cited by human readers shortly after.

3. Monitor Scholar Labs Mentions

Google Scholar Labs allows you to search for papers using natural language. Try searching your topic in Scholar Labs and note whether your paper appears in the AI-generated summaries. This is a direct test of whether Google's AI engine is citing you.

4. Use Perplexity's Citation Feature

Perplexity displays sources directly in its interface. Search your research topic on Perplexity and check whether your paper appears in the source list. If it does, it's being cited by the AI engine. Over time, this tells you whether your optimisations are working.

5. Analyse Citation Timing

Track when your citations appear. Papers optimised for AI citation typically see a spike in citations 2–4 weeks after publication or preprint posting, as AI engines crawl and index new content. Papers that don't spike immediately are likely not being cited by AI engines, signalling that optimisation could help.

This timing pattern is different from traditional citation growth, which is slower and more gradual over months and years.
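The early-spike pattern described above is easy to check against your own weekly citation counts. The sketch below flags a spike when the early window outpaces the weeks that follow; the window size and threshold are illustrative assumptions, not an established detection rule.

```python
def early_spike(weekly_citations: list[int], window: int = 4) -> bool:
    """Flag the 2-4 week post-publication spike described above.

    True when the average weekly count inside the early window is more
    than double the average of the following weeks. Threshold is illustrative.
    """
    early_avg = sum(weekly_citations[:window]) / window
    later = weekly_citations[window:]
    if not later:
        return False  # not enough history to compare against
    later_avg = sum(later) / len(later)
    return early_avg > 2 * later_avg
```

A series like `[0, 5, 9, 6, 1, 1, 2, 1]` shows the AI-driven pattern; a flat series shows the traditional slow-growth pattern.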

The AI Citation Advantage: Why Act Now

The researchers and PIs optimising for AI citation today are building first-mover advantage. In 2–3 years, AI-driven discovery will be the default mechanism for finding research. The papers that are discoverable, clear, and citable now will accumulate citations exponentially as more researchers use AI search engines.

The papers that are vague, paywalled, or poorly structured will fade into obscurity, regardless of their scientific quality.

The optimisations outlined in this guide take a few hours to implement but create years of citation advantage. Start with your next paper, and for your most important published papers, consider updating the abstract or metadata to improve AI discoverability.

Practical Checklist: Make Your Paper AI-Citation-Ready

Before you submit, preprint, or publish, use this checklist to optimise for AI citation:

Abstract & Metadata

Abstract states sample size, method, quantified finding, and conclusion

Title, authors, affiliations, date, and keywords are complete in every index

Claims and Attribution

Every claim is specific and quantified where possible

Sources are attributed inline ("Smith et al. (2024) demonstrated…"), not with vague phrases

Open Access & Indexing

A preprint or accepted manuscript is publicly available

The paper is indexed in at least one major database or repository

Schema Markup (if applicable)

JSON-LD ScholarlyArticle markup is present on the landing page

Markup includes headline, author, affiliation, date, and description

Structure & Clarity

Active voice throughout; jargon minimised

Conclusions stated explicitly, not implied

Checking these boxes takes 2–3 hours and can increase your AI citation rate by 40–60%.

Why Preprints Help AI Discoverability

Preprints on bioRxiv and medRxiv are indexed immediately by AI search engines. Traditional journal publishing introduces a 4–8 month lag before indexing. For active research areas (like immunotherapy or COVID-19), this lag can be decisive.

If your goal is maximum discoverability now, preprint strategy matters:

Post the preprint when you submit to a journal, not after acceptance

Link the preprint to the published version once the paper is accepted

Keep title, abstract, and keywords consistent across both versions

Researchers who follow this workflow see preprint citations increase 5x within 6 months of journal publication. The visibility boost from AI discovery is immediate and measurable.

The Role of Clarity in AI Citation

The final principle ties everything together: clarity is the primary driver of AI citation.

AI language models operate by predicting text token-by-token. When they generate a response about a research topic, they select papers to cite by ranking them on a combination of:

  1. Relevance: How closely does the paper match the query?
  2. Authority: How credible is the source? (Journal, citations, author affiliation)
  3. Clarity: How easy is the paper's main claim to extract and summarise?
  4. Recency: How recent is the paper? (For fields where recency matters)

Most researchers focus on 1 and 2. They publish in high-impact journals and cite established sources. But clarity (3) is underexploited. This is where optimisation creates the most leverage.
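The four factors above can be combined into a toy ranking score. The weights below are purely illustrative assumptions, not any engine's real values; the point is only that a large clarity gap can outweigh a prestige gap.

```python
def citation_score(relevance: float, authority: float,
                   clarity: float, recency: float,
                   weights: tuple = (0.35, 0.25, 0.25, 0.15)) -> float:
    """Hypothetical weighted combination of the four ranking factors (each 0-1)."""
    w_rel, w_auth, w_clar, w_rec = weights
    return (w_rel * relevance + w_auth * authority
            + w_clar * clarity + w_rec * recency)

# A clear open-access paper vs a prestigious but vague one, same relevance.
clear_oa = citation_score(relevance=0.8, authority=0.5, clarity=0.95, recency=0.7)
vague_prestige = citation_score(relevance=0.8, authority=0.95, clarity=0.3, recency=0.7)
```

Under these assumed weights the clear paper wins, which is the leverage the next paragraph describes.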

A paper with a vague abstract in Nature is less likely to be cited by AI engines than a paper with a crystal-clear abstract in an open-access journal with lower prestige. This is because AI engines can't extract meaning from the Nature paper without effort, so they move on. The algorithm favours clarity over prestige when making citation decisions.

The implication: even if your journal is prestigious, optimise your abstract for clarity and specificity. This is one of the highest-leverage SEO changes you can make.

Summary: The AI Citation Framework

To maximise your paper's citations from ChatGPT, Perplexity, Google AI Overviews, and Google Scholar Labs:

  1. Make it discoverable: Publish on a preprint server immediately; ensure open access.
  2. Make it clear: Use structured abstracts, specific numbers, and attributed claims.
  3. Mark it up: Add schema markup if possible; ensure complete metadata in indexing databases.
  4. Keep it simple: Use active voice, avoid jargon, and state conclusions explicitly.
  5. Cite openly: Reference sources inline, not just in footnotes.

The papers that will thrive in the next decade are those optimised for both human readers and AI engines. Start now. The researchers and PIs who implement these optimisations today will see dramatic increases in AI-driven citations within 6–12 months. This is first-mover advantage in the age of generative AI research discovery.

Frequently Asked Questions

How many sources do AI search engines cite per response?

Perplexity cites an average of 21.87 sources per response, while ChatGPT cites 5.67 sources per response. Google AI Overviews typically cite 3–8 sources depending on the query.

Does open access increase AI citation rates?

Yes. Open-access papers are cited by AI engines at roughly 3x the rate of paywalled papers (when controlling for journal prestige, author reputation, and citation age). Making your work openly available is one of the highest-impact optimisations.

What role does schema markup play in AI citation?

Papers with JSON-LD schema markup receive approximately 2.1x more citations from AI search engines compared to papers without markup (controlling for field, journal, and age). Schema markup explicitly signals claims, author credibility, and methodology to AI systems.

Should I focus on ChatGPT or Perplexity for AI citation?

Perplexity currently leads in academic research workflows and cites significantly more sources per response. However, ChatGPT's user base is much larger. Optimise your paper for both: clear abstracts and open access benefit you on both platforms.

How does Google Scholar Labs affect my paper's discoverability?

Google Scholar Labs (launched November 2025) uses AI to summarise and cite papers in response to natural-language queries. Papers with clear abstracts, complete metadata, schema markup, and open access are prioritised. Expect AI citation to become the default discovery mechanism within 2–3 years.

Ready to optimise your paper before you publish?

We optimise your title, abstract, keywords, readability, and metadata for Google Scholar, PubMed, and AI search engines.

Submit your paper →