Grant Writing

NIH Data Management and Sharing Plans: What Reviewers Actually Check

13 April 2026 · 9 min read

When PIs sketch a Data Management and Sharing Plan (DMSP) for an NIH application, most treat it like a compliance checkbox. Write something about depositing data in a repository, set a timeline, move on. That is a structural mistake that costs proposals.

Here is the asymmetry most PIs miss: at NIH, the DMSP is reviewed separately from your science by programme staff, not by peer reviewers. Programme staff are not asking "Is this an innovative solution?" They are asking "Does this violate policy?" That is a different test, with a different bar — but it is also a test where weak answers genuinely hurt you. A vague DMSP signals lack of planning and creates friction at the compliance stage, delaying award activation and raising questions about your project management. At NSF, the situation is inverted: data management is scored as part of merit review, meaning a weak plan directly damages your competitiveness score.

Key takeaway

The DMSP is programme-staff reviewed, not peer-reviewed. The test is compliance with the 2023 NIH DMS Policy, not persuasion. Weak answers here do not cost you points directly, but they do create post-award friction and leave you vulnerable if circumstances force a change later. Your DMSP also becomes a binding commitment that shows up annually in your RPPR — make a promise you can actually keep.

This post walks through the 2023 NIH Data Management and Sharing Policy, the six elements that define a strong DMSP, common failures, and how FAIR principles apply in practice. I finish with a checklist for writing one that will pass compliance and survive contact with reality.

The 2023 NIH Data Management and Sharing Policy: what it requires

The NIH Data Management and Sharing Policy took effect for applications due on or after 25 January 2023. The policy applies to all NIH-funded research that generates scientific data — with narrow exceptions for award types that do not generate scientific data, such as training, fellowship, and infrastructure awards.

The policy is built on a single principle: research data funded by the public should be deposited in a repository and made available for reuse in a timely manner, with metadata that lets people find it. That is not new. What changed in 2023 is the mechanics.

Before 2023, NIH required a data sharing plan only for applications requesting $500,000 or more in direct costs per year, submitted as part of the Resource Sharing Plan. The process was messy, the guidance was scattered across different IC websites, and compliance varied wildly. In 2023, NIH consolidated this into a single structured DMSP attachment, recommended at no more than two pages and required of every application that generates scientific data. The change forced clarity: the plan now has a defined format and expected elements, and programme staff read it with the same scrutiny they apply to everything else.

The 2023 policy also introduced a key shift: data sharing is no longer optional. Your choice is not "Will you share?" but "When and how will you share?" If you have legitimate reasons to restrict access — HIPAA constraints, patient privacy, materials agreements with collaborators — you describe those explicitly. The burden is on you to justify the restriction, not on NIH to justify the requirement.

How NIH evaluates the DMSP differently from NSF

This is worth a digression, because the two agencies are not equivalent.

At NIH: Programme staff review the DMSP for policy compliance and feasibility. They check whether your plan aligns with the 2023 policy, whether the repository you name is real and appropriate for your data type, whether your timeline is reasonable, and whether you appear to understand what you are committing to. This is a pass-fail gate: a DMSP that fails compliance creates a delay in award activation while programme staff ask for revisions. But compliant DMSPs are treated equally — a programme officer does not score a "good" DMSP above a "bare-minimum but compliant" DMSP. This has a tactical implication: once your DMSP clears compliance, writing more about data sharing beyond the two pages is wasted effort. Write enough to be clear and defensible, then stop.

At NSF: The data management plan (DMP) is a required supplementary document reviewed by the panel as part of merit review. A weak DMP directly damages your competitiveness, and a strong DMP can lift a borderline proposal. This is why NSF DMPs are often more elaborate — you are pitching reviewers, not just informing programme staff. If you apply to both agencies, build the NIH DMSP first to the compliance standard, then expand it for the NSF reviewers.

The six elements of a strong DMSP

The NIH DMSP checklist has six sections. Here is what each one should contain:

1. Data types and amount

Name the data types your project will generate: raw sequencing reads, processed gene expression matrices, microscopy images, patient survey responses, et cetera. Specify scale in concrete terms. Not "large datasets" but "approximately 2 TB of whole-genome sequencing reads" or "500 images at 4 GB each." This grounds your plan in reality and helps programme staff assess whether your proposed repository can handle the volume.

2. Tools, software, code

List any software, scripts, or custom tools that are essential for reusing the data. If someone downloads your dataset, what do they need to interpret it? If your analysis depends on a Python pipeline you built, say so. Commit to releasing it under an open licence (GPL, MIT, Apache 2.0) or explain why you cannot. Code that processes your data is part of reproducibility; pretending it does not exist is a compliance gap.

3. Standards for metadata and file format

This is where many plans fail. PIs write "data will be annotated with appropriate metadata" and leave it at that. Name the standard. If you are depositing genomics data in GEO, your metadata follows the Minimum Information About a Microarray Experiment (MIAME) standard. If you are depositing structural biology data in PDB, you use the PDBx/mmCIF format. If you have custom data, reference an established metadata schema — DataCite, Dublin Core, or a discipline-specific standard such as MIAPE for proteomics. Do not invent standards. Use existing ones and explain why you chose them.

4. Preservation, access, and timelines

When will data become available, and for how long will you preserve it? A concrete answer looks like: "Gene expression data will be deposited in GEO within 6 months of completion of the corresponding analysis, no later than the final budget period. Raw sequencing reads will be deposited in the Sequence Read Archive (SRA) immediately upon receipt. Both repositories commit to indefinite preservation. De-identified patient survey data will be embargoed for 12 months to allow publication, then released via the Inter-university Consortium for Political and Social Research (ICPSR) for 10 years." Note the timeline, the justification for embargo, the repository commitment to preservation, and the post-award period covered.

5. Access, distribution, and reuse considerations

Here is where you address constraints. If data must remain restricted due to privacy concerns, intellectual property, or materials agreements, describe the restriction and justify it. Do not be vague. "Some data will be restricted" fails. "Patient data will remain restricted due to HIPAA constraints; de-identified phenotype data and processed count matrices will be released" passes. If access requires a data use agreement, say so and commit to reviewing requests promptly. If data will be available on request only (rather than in a public repository), explain why — this is rare at NIH and requires strong justification.

6. Oversight and monitoring

Who in your lab is responsible for ensuring deposits happen on schedule? How will you track compliance? This section is often left blank because PIs assume "the PI will do it," but writing it down forces clarity. A concrete answer: "The postdoctoral fellow listed as key personnel will submit de-identified data to ICPSR by [date]. The PI will review the submission and confirm deposit within [timeframe]. A shared spreadsheet will track submission status and timelines."
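The "shared spreadsheet" idea above can be made mechanical. Here is a minimal sketch that checks a CSV of committed deposits against today's date and flags anything overdue; the dataset names, repositories, and dates are illustrative, not taken from any real plan:

```python
# Toy compliance tracker: flag datasets whose committed deposit date
# has passed without a deposit. All rows below are hypothetical.
import csv
import io
from datetime import date

TRACKER_CSV = """dataset,repository,due,deposited
RNA-seq counts,GEO,2026-06-30,yes
Raw reads,SRA,2026-03-31,no
Survey data,ICPSR,2027-01-15,no
"""

def overdue_deposits(csv_text: str, today: date) -> list[str]:
    """Return the names of datasets past their committed deposit date."""
    late = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        due = date.fromisoformat(row["due"])
        if row["deposited"].strip().lower() != "yes" and today > due:
            late.append(row["dataset"])
    return late

print(overdue_deposits(TRACKER_CSV, date(2026, 7, 1)))  # → ['Raw reads']
```

Running this monthly (or pasting the same logic into a spreadsheet formula) gives the named person in your oversight section something concrete to report against in each RPPR.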

FAIR principles in practice — what "findable" actually means

The acronym FAIR — Findable, Accessible, Interoperable, Reusable — appears everywhere in data policy now. Most PIs know the words but misunderstand what they mean operationally.

Findable does not mean "I put it on the internet somewhere." It means: (1) the dataset has a persistent, unique identifier (a DOI); (2) metadata about the dataset are indexed by search engines and data discovery tools; (3) the metadata follow a standard schema that machines can parse. A dataset in Zenodo without structured metadata is like a book in a library with no catalogue entry — it exists physically but is not findable except by accident. A deeper discussion of FAIR findability and its connection to AI discovery is available in a companion post.

Accessible means the data can be retrieved by humans and machines. For public data, this is straightforward — direct download or API access. For restricted data (e.g., due to privacy), it means a defined access mechanism exists, even if access is limited.

Interoperable means the data are in a format and with metadata that let someone else use them without heroic effort. CSV is interoperable. An image in a proprietary microscope format without documentation of the acquisition settings is not.

Reusable means the data come with sufficient documentation (methods, protocols, software) that someone can actually use them for a new analysis. Reusable is the hardest of the four.
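To make "metadata that machines can parse" concrete, here is a minimal sketch of a schema.org Dataset record, the kind of structured metadata Google Dataset Search harvests from repository landing pages. The title, DOI, and URL are hypothetical; in practice a proper repository (Zenodo, Dryad, GEO) generates this record for you on deposit:

```python
# Minimal schema.org "Dataset" record as JSON-LD. All specific values
# (name, DOI, contentUrl) are hypothetical placeholders.
import json

record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example RNA-seq processed counts",                # hypothetical
    "identifier": "https://doi.org/10.5281/zenodo.0000000",    # hypothetical DOI
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "description": "Gene-level count matrix with sample metadata.",
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/counts.csv",        # hypothetical
    },
}

# Embedded in a landing page as JSON-LD, this record is what makes a
# dataset findable by machines rather than only by word of mouth.
print(json.dumps(record, indent=2))
```

The persistent identifier, the open licence, and the machine-parseable format fields are exactly the three findability criteria listed above.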

Key takeaway

FAIR is not a marketing term. It is a checklist: Does the dataset have a DOI? Are the metadata machine-readable? Is the format standard? Is access defined and trackable? Your DMSP should commit to FAIR principles by name and then say which repository enforces them on your behalf.

Choosing the right repository

The repository you name in your DMSP matters. Programme staff will verify it is real, active, and suitable for your data type.

Domain-specific repositories are always preferred. If one exists for your field, use it. Genomics: GEO, SRA, dbGaP. Structural biology: PDB. Proteomics: ProteomeXchange. Metabolomics: MetaboLights. Patient-level genetic data with consent restrictions: dbGaP. These repositories enforce metadata standards, issue DOIs, and commit to long-term preservation. When programme staff see that you are depositing in the appropriate domain repository, they move on. When they see you are using Google Drive, they ask questions.

When no domain repository exists, use a generalist repository. Zenodo (CERN-backed, indefinite preservation), Dryad, Open Science Framework (OSF). Do not use general file storage as a substitute — it is not a repository.

Repository selection also affects discoverability. Data deposited in domain repositories are indexed by specialty search engines and aggregators (e.g., Google Dataset Search, BioPortal). Data in generic repositories reach fewer downstream discovery systems. This matters for your post-award goals: if data are hard to find, they are less likely to be reused, and reuse is how grants create research value. Choose repositories that are discoverable.

Common DMSP failures and how to avoid them

Vagueness

The most common failure is a plan that says "Data will be shared in an appropriate repository" and leaves it at that. Programme staff want names: which repository? Why that one? If you do not know yet (e.g., early-stage exploratory work), say so: "Data type and volume are still being determined. By Month 3, we will assess data characteristics and select a repository from the following candidates: [list]. The PI will document the selection rationale in the first RPPR." This is honest and defensible.

Mismatched repository

Genomics data in Zenodo. Patient survey data in GitHub. These are not violations, but they signal to programme staff that you may not understand what the target repository was designed for. Use domain repositories. If no domain fit exists, acknowledge the mismatch in your plan: "No established repository serves [data type]; we will deposit data in Zenodo with the following metadata standards: [list standards]."

Ignoring embargo periods

A DMSP that commits to immediate public release but does not account for the publication lag sets you up for a conflict later. NIH allows reasonable publication embargoes — typically 6 to 12 months. Build that into the plan: "Data will be embargoed for 12 months to allow peer review and publication, then released publicly. This timeline aligns with the expected publication date of [month/year]." Vague embargoes ("data will be released upon publication") fail because "upon publication" can stretch if the paper takes time to review. Give a hard date.

Underselling costs

De-identification, anonymisation, and metadata curation cost time and money. Some PIs hide this by saying "the PI will manage data sharing." Programme staff know this is unsustainable. If you need budget to pay a data manager or analyst for de-identification and deposit work, put it in. A salary line for "Data Management and Sharing" is explicitly allowed.

Not addressing constraints

If your data include identifiable human subjects, proprietary information, or material transfer agreements that limit sharing, do not pretend they do not exist. Name the constraint, describe the limitation it imposes, and commit to releasing what you can when you can. "Patient data will remain restricted due to HIPAA; processed, de-identified count matrices will be released." This is far better than "all data will be shared" followed by a post-award crisis when you realise you cannot actually share the original records.
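The "release what you can" commitment usually means stripping direct identifiers before deposit. Here is a toy sketch of that step, assuming a tabular export where the identifier columns are known by name; the column names are hypothetical, and real de-identification (e.g., HIPAA Safe Harbor, which covers eighteen identifier categories) needs expert review, not a five-line script:

```python
# Toy de-identification: drop known identifier columns from a CSV
# before deposit. Column names are hypothetical; this illustrates the
# workflow only and is not a substitute for HIPAA-compliant review.
import csv
import io

IDENTIFIER_COLUMNS = {"name", "mrn", "date_of_birth"}  # hypothetical

def deidentify(csv_text: str) -> str:
    """Return the CSV with identifier columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [c for c in reader.fieldnames if c not in IDENTIFIER_COLUMNS]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()

raw = "name,mrn,date_of_birth,phenotype,score\nJane,123,1980-01-01,A,0.7\n"
print(deidentify(raw))  # header of output: phenotype,score
```

Budgeting time for this step (and for expert review of it) is exactly the "underselling costs" point above.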

Missing the annual reporting requirement

This is the trap many PIs walk into. Your DMSP does not end when the grant is awarded. The commitments you make in the DMSP appear in your annual RPPR (Research Performance Progress Report). If you said "data will be released 6 months after completion," the RPPR asks "Has this happened? If not, why not?" Deviations require documentation. Plan for this reality when you write the original DMSP. Make commitments you can meet.

Your DMSP is a binding commitment with annual reporting

Here is the asymmetry that surprises PIs: the DMSP is not a vision statement. It is a contract between you and NIH. Every commitment you make — the repository you name, the timeline you set, the standards you commit to — becomes a line item in your post-award compliance obligations.

At the end of each budget period, you submit an RPPR. In the "Other Products" section, you report on data sharing progress. If your DMSP said "data will be deposited in GEO by Month 12," the RPPR will ask whether that happened. If it did, you report success. If it did not, you must explain why and provide a revised timeline. Repeated deviations can affect future funding decisions and create friction at no-cost extension time.

This is why the most important quality of a strong DMSP is honesty. Do not write a plan that looks impressive but assumes perfect execution and zero friction. Write a plan you can actually meet. If you are running a ten-person lab generating multi-terabyte datasets, budget for data management. If you are a single-PI project with modest data volumes, say so. NIH will approve reasonable plans from both labs. It will scrutinise over-confident plans from either.

Checklist: Writing a DMSP that passes compliance and survives reality

- Name a real repository, domain-specific where one exists, and confirm it accepts your data type and volume.
- State data types and amounts in concrete units, not "large datasets."
- Name the metadata standards and file formats you will use.
- Give hard dates for deposit, embargo, and preservation, covering the post-award period.
- Name every constraint (HIPAA, intellectual property, materials agreements) and say what you will release despite it.
- Commit to releasing analysis code under an open licence, or justify why you cannot.
- Budget for the de-identification, curation, and deposit labour the plan requires.
- Assign a named person to oversee deposits and track status.
- Make only commitments you can report against in every RPPR.

The FAQs

"Is the DMSP actually scored by peer reviewers?"

At NIH, no — programme staff evaluate it for policy compliance, not scientific merit. At NSF, yes — the data management plan is reviewed by the panel as part of merit review, so weaknesses there count directly against you. Know which agency you are applying to and adjust accordingly.

"What if I do not yet know what data I will generate?"

Be honest. Write: "The specific data types depend on [preliminary results / funding decisions / etc.]. We anticipate [categories]. By Month 3 of the project, we will have sufficient data to select a repository and finalise the metadata standards. We commit to documenting that decision in the first RPPR." This is far better than guessing and then deviating post-award.

"Can I change my DMSP after the award?"

Yes, with programme officer approval. Any significant changes — switching repositories, delaying access, restricting data further than planned — must be documented in your RPPR with justification. Programme officers are reasonable about legitimate changes in circumstances, but undisclosed deviations can affect future funding. Always communicate with your programme officer first.

"How long do I need to preserve data after the project ends?"

NIH expects data to remain available for the duration of the award plus a reasonable period afterward — typically a minimum of three to five years. Many repositories commit to long-term preservation (10+ years or indefinitely). Your DMSP should specify timelines explicitly and identify which repository will hold the data. The preservation obligation transfers to the repository once you deposit.

"What is the single most important thing to get right in a DMSP?"

Honesty. Do not write a plan that assumes perfect execution and zero friction. Write a plan you can actually meet. NIH will approve reasonable plans from all types of labs. It scrutinises over-confident plans. The most common failure is not a technical one — it is a PI who commits to something they cannot deliver and then creates friction at the compliance stage.

Strengthen your competitive edge in grant writing

A strong DMSP is part of demonstrating project maturity and an understanding of the current funding landscape. Read our guide to NIH grant writing for a broader view of how all sections of your application come together.

Read the NIH Grant Writing Guide