2026-07-11 –, P2P Portal
How modern science gets funded, conducted, published, and preserved was never engineered. It emerged, and what emerged no longer works. This working session takes a real paper, breaks it into atoms, and walks those atoms through a proposed stack of DIDs, ATProto Lexicons, and IPFS. On this walk, we forage for the gaps in a technical substrate meant to enable persistent, atomic, compositional science.
Why this matters for the dweb
The scientific record runs on a preservation substrate nobody designed. It emerged from grant cycles, journal economics, and decades of host migrations, and what emerged loses the record as a matter of course. Registries vanish on grant cycles. Repositories disappear when a host migrates servers: 191 of them since 2012, at a 12-year median operational age. A stolen laptop carries off the only copy: the data on six Iranian salt-lake populations was lost that way, and the lake they came from has since lost 88% of its surface area, so the data can never be recreated. A retraction collapses into an erasure rather than an audit trail. The underlying data of 73–93% of published research can no longer be produced on request.
Simultaneously, the existing substrate forces the scientific record to ship as a monolith: figure, method, claim, dataset, and software all welded into one PDF, addressable only at the level of the whole document and opaque to anything else.
On top of both, that record sits on someone else's server, with no recourse for the people who depend on it when the service stops, the host migrates, or the funder cuts the line.
This systemic failure is one of single-copy, outsourced, non-verifiable architecture, and there are now both regulatory and economic tailwinds behind solving it. Compliance regimes are hardening from self-attestation to verifiable proof: NIST SP 800-171 and CMMC are landing on DoD-grant institutions; the Gates Foundation built VeriXiv to check whether grantee data is actually deposited, not just stated; and the EU Data Act forces research data to stay in-jurisdiction with auditable controls. The federal mandate regime is moving from "did you write a data management plan?" to "did you do it, and can you prove it?" On the economic side, that 73–93% of irretrievable data means a representative R1 institution carries roughly $1.1 billion per year in unverifiable research output. That is a latent liability the False Claims Act's implied-certification doctrine is available to surface as funder verification goes programmatic. Meanwhile, the grant lines funding today's hosted repositories are themselves being cut, taking the infrastructure down with them. The status quo is getting more expensive to keep, and more expensive to lose.
The only durable answer is content-addressed payloads, distributed mirroring, append-only provenance, and resolvers that cannot 404. Sovereign infrastructure adds the question of control: who decides whether a service stays up, and what happens to the data when they decide otherwise? Local-first systems, self-hosted nodes, and decentralized preservation networks turn "someone else's server" into infrastructure the people who depend on it actually own, and the economics hold up: networks of institutions pooling their own hardware are a real counterweight to privatized data centers and offer unique business and sustainability opportunities.
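As a rough illustration of two of the primitives named above, content-addressed payloads and append-only provenance, here is a minimal sketch. It is not the session's proposed implementation: a plain SHA-256 hex digest stands in for a real IPFS CID, and the names `content_address`, `ProvenanceLog`, and `verify_chain` are hypothetical, introduced only for this example.

```python
import hashlib
import json


def content_address(payload: bytes) -> str:
    """Address a payload by its own bytes (SHA-256 hex; a stand-in for a real CID)."""
    return hashlib.sha256(payload).hexdigest()


def _entry_body(event: str, payload_address: str, prev):
    """Canonical body that each entry hash commits to."""
    return {"event": event, "payload_address": payload_address, "prev": prev}


def _hash_body(body: dict) -> str:
    # sort_keys gives a stable serialization so the same body always hashes the same
    return content_address(json.dumps(body, sort_keys=True).encode("utf-8"))


class ProvenanceLog:
    """Append-only log: every entry commits to the hash of the entry before it."""

    def __init__(self):
        self._entries = []

    def append(self, event: str, payload: bytes) -> dict:
        prev = self._entries[-1]["entry_hash"] if self._entries else None
        body = _entry_body(event, content_address(payload), prev)
        entry = {**body, "entry_hash": _hash_body(body)}
        self._entries.append(entry)
        return dict(entry)

    def entries(self) -> list:
        # Return copies so callers cannot mutate the log in place
        return [dict(e) for e in self._entries]


def verify_chain(entries: list) -> bool:
    """Recompute every hash and link; any edit or deletion breaks the chain."""
    prev = None
    for e in entries:
        body = _entry_body(e["event"], e["payload_address"], e["prev"])
        if e["prev"] != prev or _hash_body(body) != e["entry_hash"]:
            return False
        prev = e["entry_hash"]
    return True
```

In this sketch a retraction is just another appended entry, for example `log.append("retracted", notice_bytes)`, so the earlier "published" entry stays in the chain as an audit trail instead of being erased.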
Every layer of the architectural substrate necessary for sovereign, resilient data infrastructure — the kind that enables persistent, atomic, and compositional science — is being built at DWeb Camp. Self-sovereign identity. Append-only signed records. Content-addressed storage. Federated transport. The pieces of the next substrate for the scientific record already exist. The work ahead is to assemble them into a substrate scientific institutions can actually adopt.
What we'll do together
This is a working session. Participation starts immediately and runs all the way through. Three pathways are open during the hands-on segment so no one is gated out by background — architects, builders, and anyone curious about what this means for their own work or community all have a clear role.
90 minutes, structured for participation:
- 0:00–0:03 — Welcome + room setup. Quick read of who's in the room — researchers, builders, organizers, artists, librarians, students.
- 0:03–0:13 — Opening jam: what's broken? In pairs, each person shares one moment of trying to find a dataset that vanished, replicate a result that turned out to depend on unrecoverable code or data, follow a citation into a repository that's been shut down, or trust an audit trail that turned out to have gaps. We surface the patterns on a shared pad. The room's collective diagnosis becomes the working brief for the rest of the session.
- 0:13–0:21 — What's already pushing this forward. Eight minutes on the forcing functions that have made this architectural shift no longer optional:
  - NIST SP 800-171 and the CMMC enforcement regime landing on DoD-grant institutions this year, with NASA and NIH extending similar cybersecurity requirements into their own grants. Research data infrastructure has to satisfy mechanically verifiable compliance, third-party assessed rather than self-attested from November 2026.
  - The 2025 NIH and NSF grant terminations (over 4,000 grants cut), taking their hosted data infrastructure down with them as the funding lines that kept them up disappear.
  - The Gates Foundation's January 2025 open-access policy shift (ending its article-processing-charge program, requiring grantees to post preprints, and building VeriXiv to check whether the underlying data is actually deposited in an approved repository rather than trust a self-reported statement), pulling the philanthropic side of the funding ecosystem onto the same verify-don't-attest trajectory.
  - Parallel sovereignty mandates in the EU and the US sovereign-cloud push, forcing institutions to keep research data in-jurisdiction with verifiable controls.
  - The Dana-Farber December 2025 settlement ($15M under the False Claims Act for misrepresented data in NIH-funded publications), a signal that federal enforcement of research integrity is sharpening.
None of this is hypothetical anymore. The architecture this room builds primitives for is the architecture that satisfies what's already coming due.
- 0:21–0:33 — The proposal, shown not told. Live walkthrough of a real artifact: rdf.scios.tech, a recent paper of mine decomposed into a discourse graph where every claim, every piece of evidence, every method is independently addressable. I'll use its Generate a Narrative feature to compose, live, a fresh reading path through the graph and a dated view that regenerates as evidence accumulates. That's the experience layer: compositional science happening in real time. Then the proposed substrate underneath, a four-layer stack: Decentralized Identifiers for sovereign identity, Discourse Graphs for the semantic grammar of scientific claims, ATProto Lexicons (carried by Personal Data Servers and AT-URIs) for federated transport, and IPFS for content-addressed permanent archival. We take one atom from that graph, a specific claim, and trace it through the proposed substrate on the diagram: signed under a DID, minted with an AT-URI and CID in a PDS, broadcast over a research-scoped relay, unpacked block-level onto IPFS. That's the proposal on the table. The rest of the session is the room deciding whether it holds.
- 0:33–0:48 — Five failure points, room-driven. A stack like this introduces real engineering risk. I name five primary failure points — relay centralization, lexicon isolation, edge rot, opaque archives, and retractions/audit-trails. The room votes on which two are most worth digging into, and we spend the rest of this block stress-testing those two with proposed mitigations.
- 0:48–1:18 — Hands-on jam, three pathways. Pick whichever fits your work and curiosity:
  - Architects: small groups attack and extend a proposed mitigation. Outputs go to the shared pad as a meta-graph of the substrate's architecture, critique included.
  - Builders: for people who actually run ATProto, IPFS, DIDs, or IPLD. Tear down the proposed stack. Does it compose? Where does it break? What's missing? Output is a ranked gap list.
  - Use-cases & questions: for everyone whose curiosity is "what would this mean for my work / my community / my organization / the systems I live with." Map use cases, surface unanswered questions, and contribute them back to the meta-graph. Artists, organizers, librarians, scientists, students, and citizen-science communities all have a clear role here. So, explicitly, do the researchers the current scientific record locks out first: those at under-resourced institutions, independent researchers with no institutional backing, and scientists in the Global South priced out by article-processing charges and stranded when a single host goes down.
- 1:18–1:26 — Synthesis. Each pathway reports out in 90 seconds. The room reads the graph the session itself just produced.
- 1:26–1:30 — Open invitation. The future scientific record is antifragile or it isn't durable. The work continues at the Institute of Open Science Practices in Leiden this October, with the ATProto, IPFS, Akave, and DeSci folks already RSVP'd to keep building this stack. DWeb Camp attendees who want to continue have a direct on-ramp.
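For readers who want something concrete before the session, the claim-tracing step in the 0:21–0:33 walkthrough can be sketched roughly as follows. This is an illustration under stated assumptions, not the stack's actual code: the DID `did:plc:example`, the collection name `tech.scios.claim`, and the `ClaimAtom` type are hypothetical placeholders. The digest is a plain SHA-256 over canonical JSON, not a real CID (ATProto records are encoded and hashed differently), and signing under the DID is omitted entirely. The only part taken from the AT Protocol itself is the general AT-URI shape, `at://<authority>/<collection>/<record key>`.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimAtom:
    """One independently addressable claim (hypothetical shape for illustration)."""

    did: str         # identity that owns the record, e.g. a placeholder DID
    collection: str  # hypothetical lexicon NSID for claim records
    rkey: str        # record key within the collection
    text: str        # the claim itself

    def record(self) -> dict:
        """Record body as it might be stored in a PDS (fields are assumptions)."""
        return {"$type": self.collection, "text": self.text}

    def at_uri(self) -> str:
        """Location: where the record lives, independent of its content."""
        return f"at://{self.did}/{self.collection}/{self.rkey}"

    def digest(self) -> str:
        """Content fingerprint: SHA-256 of canonical JSON (stand-in for a CID)."""
        canonical = json.dumps(self.record(), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def reference(self) -> dict:
        """Pair location with content so a citation pins one exact version."""
        return {"uri": self.at_uri(), "digest": self.digest()}


if __name__ == "__main__":
    atom = ClaimAtom(
        did="did:plc:example",
        collection="tech.scios.claim",
        rkey="claim1",
        text="Example claim text.",
    )
    print(atom.reference())
```

The design point the sketch tries to show is that the URI and the digest answer different questions. Editing the claim text leaves the URI unchanged but produces a new digest, which is why a citation that carries both can tell when the record it points at has changed.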
The session is designed to adapt to room size: it works as well with 15 people (one group per pathway) as with 45 (multiple groups per pathway). No prerequisites. Strong familiarity with any of DIDs, ATProto, IPFS, or IPLD deepens the architects and builders pathways but is explicitly not required to participate. The use-cases pathway is the entry point for anyone who wants to be in the conversation without needing to know the primitives.
Jonathan Starr is the Executive Director of the Open Source Endowment, a 501(c)(3) building a community-managed permanent endowment for critical open source infrastructure. He also directs SciOS and the Institute of Open Science Practices, where he coordinates researchers and technologists building sustainable infrastructure for open science. His work spans funding mechanisms, coordination systems, and the shared technical substrate connecting diverse scientific systems.
