
Amin Bandali: Mirroring Protesilaos’ videos to Internet Archive

I enjoy reading the writings and watching the videos that
Protesilaos publishes on his website, with his work ranging from
philosophy and various life issues to GNU Emacs and programming.
Currently, Prot uploads his videos to YouTube and embeds them on his
website. YouTube, diligently working their way down the spiral of
enshittification, have been making it increasingly difficult to
watch the videos without using their nonfree JavaScript interface
or their nonfree mobile applications. This got me thinking about
mirroring Prot’s videos to the Internet Archive to make them more
easily accessible in freedom.

Mirroring all of Prot’s videos to the Internet Archive is a nontrivial
task: at the time of this writing, there are a total of 298 videos
uploaded to Prot’s YouTube channel. Thankfully, Prot makes publicly
available the git repository containing the sources used to build his
website, and we have several excellent tools at our disposal to help
extract the information we need and carry out the task.

Note: Prot publishes his works under free/libre copyleft
licenses like CC BY-SA 4.0 and GPLv3+, so we do not violate his
copyright by sharing or redistributing his work so long as we do it
with proper credit, following the terms of the licenses.

The idea is to write a program that would walk through the set of
markdown files in the source repository for Prot’s website and for
each file that has a mediaid metadata field, download the video
with that ID from YouTube using yt-dlp, and upload it along with
accompanying metadata to the Internet Archive using the
internetarchive Python module. Given that these two key tools are
written in Python, I opted to use Python for my own implementation
as well. (I started the implementation as a POSIX shell script, but
then decided that I would like the convenience of a ‘proper
programming language’ and of being able to interact with these
tools through their respective APIs, so I ported what I had to
Python and continued there.)
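The walk described above can be sketched with the standard library alone. This is a minimal sketch, not the code from protesilaos_videos_archive.py: it assumes each post begins with a Jekyll-style front-matter block delimited by --- lines holding simple key: value pairs, and the names parse_front_matter and find_video_posts are hypothetical. A real YAML parser would be needed for more complex front matter.

```python
import pathlib


def parse_front_matter(text):
    """Split a post into (fields, body).

    Assumes a front-matter block delimited by '---' lines containing
    simple 'key: value' pairs; lines without a colon are ignored.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != '---':
        return {}, text
    fields = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() in ('---', '...'):
            # End of front matter: everything after it is the post body.
            return fields, '\n'.join(lines[i + 1:])
        key, sep, value = line.partition(':')
        if sep:
            # Keep everything after the first colon; drop surrounding quotes.
            fields[key.strip().lower()] = value.strip().strip('"\'')
    # No closing delimiter found: treat the file as having no front matter.
    return {}, text


def find_video_posts(root):
    """Return (path, mediaid) pairs for every markdown file under root
    whose front matter has a non-empty mediaid field."""
    posts = []
    for path in sorted(pathlib.Path(root).rglob('*.md')):
        fields, _body = parse_front_matter(path.read_text(encoding='utf-8'))
        mediaid = fields.get('mediaid')
        if mediaid:
            posts.append((path, mediaid))
    return posts
```

Splitting on the first colon only keeps titles such as "Emacs: tips" intact, and sorting the paths makes the processing order stable between runs.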

The full implementation is available at
protesilaos_videos_archive.py. Note that some of the required
modules are not part of Python’s standard library, namely markdown,
yt-dlp, and internetarchive. You can install these using your
distribution’s package manager or using pip, the Python package
manager.
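Once those modules are installed, the per-video download and upload steps might look roughly like the following. This is a hedged sketch rather than the script’s actual code: build_ydl_opts, download_video and upload_video are hypothetical names, the output filename template and the Archive item identifier are assumptions, and it relies only on standard yt_dlp.YoutubeDL options (outtmpl, quiet, cookiefile) and on internetarchive.upload. The third-party imports sit inside the functions so the option-building helper can be tried without them.

```python
import os


def build_ydl_opts(working_dir, cookie_file=None):
    """Assemble options for yt_dlp.YoutubeDL (standard option keys)."""
    opts = {
        # Name each file after its YouTube ID inside the working directory.
        'outtmpl': os.path.join(working_dir, '%(id)s.%(ext)s'),
        'quiet': True,
    }
    if cookie_file:
        # Same effect as yt-dlp's --cookies command-line option.
        opts['cookiefile'] = cookie_file
    return opts


def download_video(mediaid, opts):
    """Download one video and return the filename yt-dlp reports for it."""
    from yt_dlp import YoutubeDL  # third-party, imported only when needed

    url = f'https://www.youtube.com/watch?v={mediaid}'
    with YoutubeDL(opts) as ydl:
        info = ydl.extract_info(url, download=True)
        # Note: if formats get merged, the final extension may differ.
        return ydl.prepare_filename(info)


def upload_video(identifier, path, metadata):
    """Upload one file to a (hypothetical) Internet Archive item."""
    import internetarchive  # third-party, imported only when needed

    # Credentials come from the config file written by `ia configure`.
    responses = internetarchive.upload(identifier, files=[path], metadata=metadata)
    return all(r.status_code == 200 for r in responses)
```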

The script takes several command line arguments. There is a required
positional argument for specifying the directory to search through
(recursively) for markdown files. Normally, this would be the path
to your local copy of the source repository for Prot’s website.
There are also two optional arguments, --cookie-file and
--working-dir, for specifying the path to a cookie file for use with
yt-dlp and the working directory for storing the downloaded videos
and the progress file, respectively. Given YouTube’s somewhat
aggressive rate-limiting of IPs, if you will be downloading a
nontrivial number of videos, you will probably want to use
--cookie-file to specify a file containing the cookies from a
YouTube session. (You would log into YouTube using your account,
then use an add-on like cookies.txt to extract and save your
session’s cookies into a text file.)
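The command line interface described above maps naturally onto argparse. This is a minimal sketch: parse_args is a hypothetical name, and the default of the current directory for --working-dir is an assumption, since the post does not say what the default is.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line (argv=None means sys.argv[1:])."""
    parser = argparse.ArgumentParser(
        description='Mirror videos referenced by markdown posts to the Internet Archive.')
    parser.add_argument(
        'directory',
        help='directory to search (recursively) for markdown files')
    parser.add_argument(
        '--cookie-file',
        help='cookies file (Netscape cookies.txt format) to pass to yt-dlp')
    parser.add_argument(
        '--working-dir', default='.',
        help='directory for downloaded videos and the progress file '
             '(default: current directory; an assumption)')
    return parser.parse_args(argv)
```

For example, parse_args(['--cookie-file=cf.txt', 'site']) yields directory='site', cookie_file='cf.txt' and working_dir='.', since argparse turns the dashes in option names into underscores.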

Example invocation of the program:

./protesilaos_videos_archive.py --cookie-file=cf.txt ~/src/protesilaos.gitlab.io

Also, since downloading and uploading this many videos makes for a
long-running task, I thought it would be helpful to be able to
interrupt the work partway through by pressing Ctrl-c in the
terminal to send the program a SIGINT. Upon receiving a SIGINT, the
program stops after the current download or upload finishes and
writes its progress to a progress file, .pva-progress.jsonl, which
it uses on the next run to resume where it left off.
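One way to get this stop-after-the-current-step behaviour is to replace the default SIGINT handler with one that merely sets a flag, and to check that flag between units of work. The sketch below is an illustration under stated assumptions, not the script’s code: it appends one JSON line per finished video to .pva-progress.jsonl as it goes (rather than writing progress only on interrupt), it checks the flag between videos rather than between the download and upload steps, and the record layout {"mediaid": ...} and all function names are hypothetical.

```python
import json
import os
import signal

PROGRESS_FILE = '.pva-progress.jsonl'
stop_requested = False


def request_stop(signum, frame):
    """SIGINT handler: set a flag instead of raising KeyboardInterrupt."""
    global stop_requested
    stop_requested = True
    print('Interrupt received; stopping after the current video.')


def load_done(working_dir):
    """Read IDs of already-finished videos from the JSON Lines file."""
    path = os.path.join(working_dir, PROGRESS_FILE)
    done = set()
    if os.path.exists(path):
        with open(path, encoding='utf-8') as f:
            for line in f:
                if line.strip():
                    done.add(json.loads(line)['mediaid'])
    return done


def record_done(working_dir, mediaid):
    """Append one JSON object per finished video."""
    path = os.path.join(working_dir, PROGRESS_FILE)
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps({'mediaid': mediaid}) + '\n')


def run(mediaids, working_dir, process):
    """Call process(mediaid) for each unfinished ID until asked to stop."""
    previous = signal.signal(signal.SIGINT, request_stop)
    try:
        done = load_done(working_dir)
        for mediaid in mediaids:
            if stop_requested:
                break
            if mediaid in done:
                continue
            process(mediaid)  # a SIGINT during this call only sets the flag
            record_done(working_dir, mediaid)
    finally:
        # Restore whatever handler was active before (normally Python's default).
        if previous is not None:
            signal.signal(signal.SIGINT, previous)
```

Because progress is appended after every video, a later run skips everything already recorded, even if the previous run ended unexpectedly rather than through Ctrl-c.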

As of the time of this writing, all of the videos published by Prot
on his YouTube channel have been mirrored to the Internet Archive, and
are available from the Video Publications by Protesilaos Stavrou
collection.

I’ll wrap up by thanking Prot for clarifying the license of his
video publications and for his blessing for me to mirror them on
the Internet Archive. Thanks, Prot. 🙂

Take care, and so long for now.

P.S. yt-dlp has a --write-description option, which makes it write
a .description file containing the video’s YouTube description
alongside the downloaded video. I still opted to use each post’s
body text as the ‘description’, in part because the markdown source
file for each video post contains other metadata fields that I was
planning on uploading to the Archive anyway.
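Using the post itself as the source of metadata might look something like this. It is a sketch under stated assumptions: the post has already been split into a dict of front-matter fields and a body string, the title and date field names are guesses about the post format, and the collection identifier and license URL are left to the caller because the post does not give the collection’s Archive identifier. "movies" is the Archive’s media type for video items; build_ia_metadata is a hypothetical name.

```python
def build_ia_metadata(fields, body, collection, licenseurl):
    """Map a post's front-matter fields and body to Internet Archive metadata.

    collection and licenseurl are supplied by the caller; the field names
    read from the post ('title', 'date') are assumptions about its format.
    """
    metadata = {
        'mediatype': 'movies',  # the Archive's media type for video items
        'collection': collection,
        'creator': 'Protesilaos Stavrou',
        'title': fields.get('title', ''),
        # The post's body stands in for yt-dlp's .description file.
        'description': body.strip(),
        'licenseurl': licenseurl,
    }
    if fields.get('date'):
        metadata['date'] = fields['date']
    return metadata
```

If an HTML description is preferred, Python-Markdown’s markdown.markdown(body) could render the body before it is passed in.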


Source: Planet GNU