A Human Approach to Computational Science

Manifesto

March 2025

Participants: Anil Madhavapeddy, Nate Foster, Aurojit Panda, Michael Coblenz, Mark Elvers, Thomas Gazagnaire, Cyrus Omar, Jonathan Sterling, Ian Brown.

Computational science uses large-scale computation and data to gain insight into complex physical problems ranging from the microscopic to the planetary scale. Its legitimacy emerges from a continuous process of hypothesising, experimenting and recombining code across these datasets. We have asked ourselves how we might recentre science on the many people involved in this process, and we propose the following first set of principles to start the conversation. We invite interested people to join our exploration.

1. Rehumanising science for all

Science is a human endeavour shaped by contributions from many individuals, and scientific insights stem from and flow through people. All stakeholders, such as academics, curators, traditional knowledge holders, journalists, reviewers, policymakers and citizens, should be included in and benefit from the process. Scientific digital infrastructure therefore exists both to support inquisitive exploration and to distribute the benefits of evidence-driven action across society.

2. Melding institutions and crowds

Opening the door to public participation in scientific discourse means broadening access to data and to the skills required to analyse it. Today, large institutions play the important role of bringing researchers together under one roof, but emerging digital infrastructure promises new opportunities for participation across institutional and geographic boundaries. We aim to decouple access to the tools of science from institutional affiliation, and thus lower the barriers to meaningful participation.

3. Supporting positive scientific feedback loops

Investment in science yields the greatest return when academics, journalists and politicians collaborate effectively across their specialisms. We aim to empower dataset creators to disseminate their data so that reliable evidence accumulates, while reducing the ongoing cost of maintaining it. By tracking positive outcomes, we seek to incentivise the responsible, ongoing curation of datasets.

Modern computational science should draw on both traditional and community-centred sources of insight and data, and enable the communities behind them to take full advantage of digital resources. We want to develop this infrastructure through an open and collaborative process that follows the principles above. We invite you to work with us!

"Our vision is a world where everybody can access tools to explore, participate in and benefit from computational science."

Reinforcing Use Cases

We are identifying use cases to guide the design and iteration of systems that implement our manifesto. The use cases are intended to represent the needs of different kinds of scientists. Many other use cases will bring to light concerns that are not represented here. Please consider contributing yours!

1. Conservation and sensing

Some large-scale research projects require acquiring, aggregating, manipulating, analysing and reporting on a multitude of datasets. The outputs may even feed real-time systems, such as data dashboards or sensor networks, which must be configured as a result of the analysis. For example, the IUCN Red List of Threatened Species is derived from a multitude of data on species sightings, habitat loss and climate predictions. Among these data is output from the Centre for Earth Observation, which itself constructs complex models that assimilate data. As another example, a network of sensing buoys may receive commands from a control system that issues them on the basis of weather predictions and mission inputs.

Some inputs must be protected. For example, locations of threatened species may be useful in the analysis but need to be kept private lest poachers destroy those populations. Thus, although this scenario benefits from openness and transparency in general, not everything can be made public.

Investigators: Michael Coblenz, Anil Madhavapeddy, Cyrus Omar

2. Breaking Specialist Training out of the University

Meaningful participation in important scientific discourses requires specialist knowledge. For example, to evaluate the safety of a plan to deploy machine learning in public infrastructure, some understanding of current techniques and their range of applicability is required; likewise, climate data cannot be meaningfully analysed by someone who lacks prior experience in statistics. Access to specialist training of this kind is presently centralised in the universities and gated behind tuition fees and time barriers.

We will answer the erosion of public trust in the scientific process with a credible invitation: join us, learn our methods, and contribute to the discourse. To that end, we aim to build sustainable infrastructure for a decentralised network of publicly accessible online courseware, textbooks, tutorials, and social media that will serve as the roots of a new and serious partnership between the public and the scientific community.

Investigators: Jonathan Sterling, Anil Madhavapeddy

3. Collaboration in the Small

Many research projects need inputs from other researchers, including those in different areas. For example:

Research on schedulers depends on the availability of traces (e.g., from large ML clusters or from servers running databases behind web services) that are not readily available to the average computer systems researcher. At the same time, these traces are considered sensitive (because they reveal information about cluster size and machine specifications) and are rarely posted online.
Research on verification and testing requires system specifications, which must be validated by the programmers who built the software, who are typically not experts in verification. Few groups have sufficient expertise both to write specifications for real-world software and to use them for verification tasks, so collaboration is key to these efforts.
Research that extends prior work which constructed data or analysis artifacts would benefit from starting from those artifacts.

However, finding other researchers' digital artifacts, and then using or extending them, is challenging today. We have relatively few mechanisms to discover existing digital artifacts (those we have usually depend on serendipitously encountering a research paper or website that describes the artifact); no mechanisms to ensure that contributions from publicly available artifacts are recognised; and almost no good mechanisms to share artifacts among a small set of research groups, each of which might have an evolving set of individual participants. We will address these challenges by developing systems that make digital artifacts easier to discover, make it easier to track the contributions of each artifact's provider, and reduce the friction of sharing artifacts between research groups.

Investigators: Michael Coblenz

Timeline

Upcoming Events

October 2025

Programming for the Planet (PROPL) will bring us together again to refine the manifesto and continue designing our systems for decentralised science.

Past Events

March 2025

We held the Bellairs research summit on planetary computing, where we launched the first version of this manifesto for a human approach to computational science.

March 2025

Jon Sterling publishes the Forester 5.0 design for global identity.

February 2025

Ian Brown contrasts Mastodon and Bluesky in a deep dive into their architectures.

January 2025

Thomas Gazagnaire launches SpaceOS to perform scientific computation in orbit.

September 2024

Mark Elvers operates the first capability-based distributed build platform for open source software.

August 2024

Aurojit Panda rethinks the architecture of edge Internet services to bring back end-to-end simplicity.

April 2024

Michael Coblenz designs a theory of scientific programming efficacy.

March 2024

Cyrus Omar talks about building live programming interfaces for planetary computing at PROPL 2024.

February 2024

Anil Madhavapeddy makes the case for planetary computing to support data-driven environmental policymaking, handling the ingestion, transformation, analysis and publication of environmental data products.

August 2023

Nate Foster considers how programming languages might help to capture property conveyancing, sparking an interest in the legal applications of ownership.