
GNU Guix: Source code archiving in Guix: new publication

We are glad to announce the publication of a new research paper entitled
“Source Code Archiving to the Rescue of Reproducible Deployment” for the
ACM Conference on Reproducibility and Replicability. The paper presents
work that has been done since we started connecting Guix with the
Software Heritage (SWH) archive five years ago:

The ability to verify research results and to experiment with
methodologies are core tenets of science. As research results are
increasingly the outcome of computational processes, software plays a
central role. GNU Guix is a software deployment tool that supports
reproducible software deployment, making it a foundation for
computational research workflows. To achieve reproducibility, we must
first ensure the source code of software packages Guix deploys remains
available.

We describe our work connecting Guix with Software Heritage, the
universal source code archive, making Guix the first free software
distribution and tool backed by a stable archive. Our contribution is
twofold: we explain the rationale and present the design and
implementation we came up with; second, we report on the archival
coverage for package source code with data collected over five years and
discuss remaining challenges.

The ability to retrieve package source code is important for researchers
who need to be able to replay scientific workflows, but it’s just as
important for engineers and developers alike, who may also have good
reasons to redeploy or to audit past package sets.

Support for source code archiving and recovery in Guix has improved a
lot over the past five years, in particular with:

  • Support for recovering source code tarballs (tar.gz and similar
    files): this is made possible by Disarchive, written by Timothy
    Sample.

Diagram taken from the paper showing Disarchive tarball “disassembly” and “assembly”.

  • The ability to look up data by nar hash in the SWH archive (“nar” is
    the normalized archive format used by Nix and Guix), thanks to
    fellow SWH hackers. This, in turn, allows Guix to look up any
    version control checkout by content hash: Git, Subversion,
    Mercurial, you name it!
  • The monitoring of archival coverage with Timothy’s Preservation of
    Guix reports has allowed us to identify discrepancies in Guix,
    Disarchive, and/or SWH and to increase archival coverage.
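
To give a feel for the tarball “disassembly” and “assembly” idea from
the first item above, here is a toy sketch in Python. It is not
Disarchive’s actual format, algorithm, or API: the function names
(make_example_tarball, disassemble, assemble) are made up for this
illustration, and it assumes an uncompressed ustar archive containing
only regular files. The point is only that a tarball can be split into
a small metadata description plus file contents addressed by hash, so
that an archive holding the contents is enough to rebuild the original
bytes.

```python
import hashlib
import io
import tarfile


def make_example_tarball():
    """Build a small uncompressed ustar tarball in memory (illustration only)."""
    files = [("hello-1.0/README", b"Hello!\n"),
             ("hello-1.0/hello.c", b"int main(void) { return 0; }\n")]
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 1700000000  # fixed timestamp so the bytes are stable
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def disassemble(tar_bytes):
    """Split a tarball into per-file metadata and a content-addressed store."""
    metadata, store = [], {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # toy sketch: regular files only
            data = tar.extractfile(member).read()
            digest = hashlib.sha256(data).hexdigest()
            store[digest] = data  # stands in for the source code archive
            metadata.append({"name": member.name, "mode": member.mode,
                             "mtime": member.mtime, "sha256": digest})
    return metadata, store


def assemble(metadata, store):
    """Rebuild a tarball from metadata, fetching contents by hash."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for entry in metadata:
            data = store[entry["sha256"]]
            info = tarfile.TarInfo(name=entry["name"])
            info.size = len(data)
            info.mode = entry["mode"]
            info.mtime = entry["mtime"]
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


original = make_example_tarball()
metadata, store = disassemble(original)
rebuilt = assemble(metadata, store)
print("byte-for-byte identical:", rebuilt == original)
```

A real tool has more to deal with than this sketch does, for instance
compression, directories, symbolic links, and ownership fields; none of
that is modeled here.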

Graph taken from the paper showing package source code archival coverage over time.

94% of the packages in a January 2024 snapshot of Guix are known to have
their source code archived!

Check out the paper to learn more about the machinery at play and the
current status.


Source: Planet GNU
