GNU Guix: Source code archiving in Guix: new publication
We are glad to announce the publication of a new research paper entitled
"Source Code Archiving to the Rescue of Reproducible Deployment" for the
ACM Conference on Reproducibility and Replicability.
The paper presents work that has been done since we started connecting
Guix with the Software Heritage (SWH) archive five years ago:
The ability to verify research results and to experiment with
methodologies are core tenets of science. As research results are
increasingly the outcome of computational processes, software plays a
central role. GNU Guix is a software deployment tool that supports
reproducible software deployment, making it a foundation for
computational research workflows. To achieve reproducibility, we must
first ensure the source code of software packages Guix deploys remains
available.
We describe our work connecting Guix with Software Heritage, the
universal source code archive, making Guix the first free software
distribution and tool backed by a stable archive. Our contribution is
twofold: we explain the rationale and present the design and
implementation we came up with; second, we report on the archival
coverage for package source code with data collected over five years and
discuss remaining challenges.
The ability to retrieve package source code is important for researchers
who need to be able to replay scientific workflows, but it’s just as
important for engineers and developers alike, who may also have good
reasons to redeploy or to audit past package sets.
Support for source code archiving and recovery in Guix has improved a
lot over the past five years, in particular with:
- Support for recovering source code tarballs (tar.gz and similar
  files): this is made possible by Disarchive, written by Timothy
  Sample.
- The ability to look up data by nar hash in the SWH archive (“nar” is
  the normalized archive format used by Nix and Guix), thanks to fellow
  SWH hackers. This, in turn, allows Guix to look up any version control
  checkout by content hash: Git, Subversion, Mercurial, you name it!
- The monitoring of archival coverage with Timothy’s Preservation of
  Guix reports, which has allowed us to identify discrepancies in Guix,
  Disarchive, and/or SWH and to increase archival coverage.
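To make the idea of recovery by content hash concrete, here is a minimal sketch in Python. It is not how Guix, Disarchive, or SWH are implemented: the `fetch_source` function, the in-memory `upstream` and `archive` dictionaries, and the example URL are all hypothetical, and a plain SHA-256 over raw bytes stands in for a real nar hash. It only shows the general pattern of trying the upstream location first and falling back to an archive keyed by hash.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the hex SHA-256 digest of data (a stand-in for a nar hash)."""
    return hashlib.sha256(data).hexdigest()


def fetch_source(url, expected_hash, upstream, archive):
    """Return bytes whose hash equals expected_hash, or None if unrecoverable.

    upstream is a hypothetical mapping from URLs to bytes;
    archive is a hypothetical mapping from content hashes to bytes.
    """
    data = upstream.get(url)
    if data is not None and content_hash(data) == expected_hash:
        return data
    # Upstream is missing or was modified: fall back to the archive,
    # looking the content up by its hash rather than by its location.
    data = archive.get(expected_hash)
    if data is not None and content_hash(data) == expected_hash:
        return data
    return None


original = b"hello source tarball"
wanted = content_hash(original)
archive = {wanted: original}
upstream = {}  # simulate an upstream URL that has vanished
recovered = fetch_source("https://example.org/hello.tar.gz",
                         wanted, upstream, archive)
```

Because the lookup key is the hash of the content itself, the result can be verified regardless of where it came from, which is why a vanished or modified upstream does not prevent recovery in this sketch.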
94% of the packages in a January 2024 snapshot of Guix are known to have
their source code archived!
Check out the paper to learn more
about the machinery at play and the current status.
Source: Planet GNU