Parallel shells with xargs: Utilize all your CPU cores on UNIX and Windows
One particular frustration with the UNIX shell is the inability to easily schedule multiple, concurrent tasks that fully utilize the CPU cores present on modern systems. The example of focus in this article is file compression, but the problem arises with many computationally intensive tasks, such as image/audio/media processing, password cracking and hash analysis, database Extract, Transform, and Load, and backup activities. It is understandably frustrating to wait for gzip * running on a single CPU core, while most of a machine's processing power lies idle.
This can be understood as a weakness of the first decade of Research UNIX, which was not developed on machines with symmetric multiprocessing (SMP). The Bourne shell did not emerge from the 7th edition with any native syntax or controls for cohesively managing the resource consumption of background processes.
Utilities have haphazardly evolved to perform some of these functions. The GNU version of xargs is able to exercise some primitive control in allocating background processes, which is discussed at some length in the documentation. While the GNU extensions to xargs have proliferated to many other implementations (notably BusyBox, including its release for Microsoft Windows, shown in an example below), they are not POSIX.2-compliant, and likely will not be found on commercial UNIX.
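As a rough illustration of that control, the sketch below assumes an xargs that accepts the non-POSIX -P (maximum processes) and -0 options, as the GNU version does, along with a find that supports -print0. The scratch directory, the sample file names, and the limit of four processes are all invented for the example and are not taken from any particular system.

```shell
# Hypothetical scratch area with eight small sample files.
workdir=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do
    printf 'sample data %s\n' "$i" > "$workdir/file$i.txt"
done

# -print0 and -0 pass file names NUL-terminated, so spaces survive.
# -n 1 hands a single file to each gzip invocation.
# -P 4 lets xargs keep up to four gzip processes running at once
# (an assumed limit; pick a value that suits the core count).
find "$workdir" -type f -name '*.txt' -print0 | xargs -0 -n 1 -P 4 gzip

# Count the results; tr strips the padding some wc versions add.
compressed=$(find "$workdir" -type f -name '*.gz' | wc -l | tr -d ' ')
echo "compressed files: $compressed"

rm -rf "$workdir"
```

Without -P, the same pipeline would run one gzip at a time, which is the single-core behavior described earlier.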
Historic users of xargs will remember it as a useful tool for directories that contained too many files for echo * or other wildcards to be used; in this situation xargs is called to repeatedly run a single command on batches of files. As xargs has evolved beyond POSIX, it has assumed a new relevance that is worth exploring.
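The traditional batching behavior can be seen on a small scale with the POSIX -n option, which caps the number of arguments given to each invocation. This is only a toy sketch: the five single-letter names stand in for a long file list, and the batch size of two is chosen to make the grouping visible.

```shell
# Five names arrive on standard input; -n 2 limits each echo
# to two arguments, so xargs runs echo three times.
batches=$(printf '%s\n' a b c d e | xargs -n 2 echo)
printf '%s\n' "$batches"
# prints:
# a b
# c d
# e
```

In real use the batch size is normally left to xargs, which fills each command line up to the system's argument-length limit.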
Why is POSIX.2 this bad?
A clear understanding of the lack of cohesive job scheduling in UNIX requires some history of the evolution of these utilities.