From 47c1de22df30aa6c4a2f0d8249be93c3c7bc0022 Mon Sep 17 00:00:00 2001 From: Ludovic Courtès Date: Thu, 5 Jan 2023 12:39:06 +0100 Subject: doc: cookbook: Add "Installing Guix on a Cluster" chapter. This is derived from the article at , with clarifications and updates. * doc/guix-cookbook.texi (Installing Guix on a Cluster): New chapter. --- doc/guix-cookbook.texi | 433 ++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 414 insertions(+), 19 deletions(-) diff --git a/doc/guix-cookbook.texi b/doc/guix-cookbook.texi index bbbd1cde67..b9fb916f4a 100644 --- a/doc/guix-cookbook.texi +++ b/doc/guix-cookbook.texi @@ -21,7 +21,8 @@ Copyright @copyright{} 2020 Brice Waegeneire@* Copyright @copyright{} 2020 André Batista@* Copyright @copyright{} 2020 Christine Lemmer-Webber@* Copyright @copyright{} 2021 Joshua Branson@* -Copyright @copyright{} 2022 Maxim Cournoyer* +Copyright @copyright{} 2022 Maxim Cournoyer@* +Copyright @copyright{} 2023 Ludovic Courtès Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or @@ -73,8 +74,9 @@ Weblate} (@pxref{Translating Guix,,, guix, GNU Guix reference manual}). * Packaging:: Packaging tutorials * System Configuration:: Customizing the GNU System * Containers:: Isolated environments and nested systems -* Advanced package management:: Power to the users! +* Advanced package management:: Power to the users! * Environment management:: Control environment +* Installing Guix on a Cluster:: High-performance computing. * Acknowledgments:: Thanks! * GNU Free Documentation License:: The license of this document. @@ -83,28 +85,45 @@ Weblate} (@pxref{Translating Guix,,, guix, GNU Guix reference manual}). @detailmenu --- The Detailed Node Listing --- -Scheme tutorials - -* A Scheme Crash Course:: Learn the basics of Scheme - Packaging -* Packaging Tutorial:: Let's add a package to Guix! +* Packaging Tutorial:: A tutorial on how to add packages to Guix. System Configuration -* Auto-Login to a Specific TTY:: Automatically Login a User to a Specific TTY -* Customizing the Kernel:: Creating and using a custom Linux kernel on Guix System. -* Guix System Image API:: Customizing images to target specific platforms. -* Using security keys:: How to use security keys with Guix System. -* Connecting to Wireguard VPN:: Connecting to a Wireguard VPN. -* Customizing a Window Manager:: Handle customization of a Window manager on Guix System. -* Running Guix on a Linode Server:: Running Guix on a Linode Server. Running Guix on a Linode Server -* Setting up a bind mount:: Setting up a bind mount in the file-systems definition. -* Getting substitutes from Tor:: Configuring Guix daemon to get substitutes through Tor. -* Setting up NGINX with Lua:: Configuring NGINX web-server to load Lua modules. +* Auto-Login to a Specific TTY:: Automatically Login a User to a Specific TTY +* Customizing the Kernel:: Creating and using a custom Linux kernel on Guix System. +* Guix System Image API:: Customizing images to target specific platforms. +* Using security keys:: How to use security keys with Guix System. +* Connecting to Wireguard VPN:: Connecting to a Wireguard VPN. +* Customizing a Window Manager:: Handle customization of a Window manager on Guix System. +* Running Guix on a Linode Server:: Running Guix on a Linode Server +* Setting up a bind mount:: Setting up a bind mount in the file-systems definition. +* Getting substitutes from Tor:: Configuring Guix daemon to get substitutes through Tor. +* Setting up NGINX with Lua:: Configuring NGINX web-server to load Lua modules. * Music Server with Bluetooth Audio:: Headless music player with Bluetooth output. +Containers + +* Guix Containers:: Perfectly isolated environments +* Guix System Containers:: A system inside your system + +Advanced package management + +* Guix Profiles in Practice:: Strategies for multiple profiles and manifests. + +Environment management + +* Guix environment via direnv:: Setup Guix environment with direnv + +Installing Guix on a Cluster + +* Setting Up a Head Node:: The node that runs the daemon. +* Setting Up Compute Nodes:: Client nodes. +* Cluster Network Access:: Dealing with network access restrictions. +* Cluster Disk Usage:: Disk usage considerations. +* Cluster Security Considerations:: Keeping the cluster secure. + @end detailmenu @end menu @@ -3635,6 +3654,380 @@ will have predefined environment variables and procedures. Run @command{direnv allow} to setup the environment for the first time. + +@c ********************************************************************* +@node Installing Guix on a Cluster +@chapter Installing Guix on a Cluster + +@cindex cluster installation +@cindex high-performance computing, HPC +@cindex HPC, high-performance computing +Guix is appealing to scientists and @acronym{HPC, high-performance +computing} practitioners: it makes it easy to deploy potentially complex +software stacks, and it lets you do so in a reproducible fashion---you +can redeploy the exact same software on different machines and at +different points in time. + +In this chapter we look at how a cluster sysadmin can install Guix for +system-wide use, such that it can be used on all the cluster nodes, and +discuss the various tradeoffs@footnote{This chapter is adapted from a +@uref{https://hpc.guix.info/blog/2017/11/installing-guix-on-a-cluster/, +blog post published on the Guix-HPC web site in 2017}.}. + +@quotation Note +Here we assume that the cluster is running a GNU/Linux distro other than +Guix System and that we are going to install Guix on top of it. +@end quotation + +@menu +* Setting Up a Head Node:: The node that runs the daemon. +* Setting Up Compute Nodes:: Client nodes. +* Cluster Network Access:: Dealing with network access restrictions. +* Cluster Disk Usage:: Disk usage considerations. +* Cluster Security Considerations:: Keeping the cluster secure. +@end menu + +@node Setting Up a Head Node +@section Setting Up a Head Node + +The recommended approach is to set up one @emph{head node} running +@command{guix-daemon} and exporting @file{/gnu/store} over NFS to +compute nodes. + +Remember that @command{guix-daemon} is responsible for spawning build +processes and downloads on behalf of clients (@pxref{Invoking +guix-daemon,,, guix, GNU Guix Reference Manual}), and more generally +accessing @file{/gnu/store}, which contains all the package binaries +built by all the users (@pxref{The Store,,, guix, GNU Guix Reference +Manual}). ``Client'' here refers to all the Guix commands that users +see, such as @code{guix install}. On a cluster, these commands may be +running on the compute nodes and we'll want them to talk to the head +node's @code{guix-daemon} instance. + +To begin with, the head node can be installed following the usual binary +installation instructions (@pxref{Binary Installation,,, guix, GNU Guix +Reference Manual}). Thanks to the installation script, this should be +quick. Once installation is complete, we need to make some adjustments. + +Since we want @code{guix-daemon} to be reachable not just from the head +node but also from the compute nodes, we need to arrange so that it +listens for connections over TCP/IP. To do that, we'll edit the systemd +startup file for @command{guix-daemon}, +@file{/etc/systemd/system/guix-daemon.service}, and add a +@code{--listen} argument to the @code{ExecStart} line so that it looks +something like this: + +@example +ExecStart=/var/guix/profiles/per-user/root/current-guix/bin/guix-daemon --build-users-group=guixbuild --listen=/var/guix/daemon-socket/socket --listen=0.0.0.0 +@end example + +For these changes to take effect, the service needs to be restarted: + +@example +systemctl daemon-reload +systemctl restart guix-daemon +@end example + +@quotation Note +The @code{--listen=0.0.0.0} bit means that @code{guix-daemon} will +process @emph{all} incoming TCP connections on port 44146 +(@pxref{Invoking guix-daemon,,, guix, GNU Guix Reference Manual}). This +is usually fine in a cluster setup where the head node is reachable +exclusively from the cluster's local area network---you don't want that +to be exposed to the Internet! +@end quotation + +The next step is to define our NFS exports in +@uref{https://linux.die.net/man/5/exports,@file{/etc/exports}} by adding +something along these lines: + +@example +/gnu/store *(ro) +/var/guix *(rw, async) +/var/log/guix *(ro) +@end example + +The @file{/gnu/store} directory can be exported read-only since only +@command{guix-daemon} on the master node will ever modify it. +@file{/var/guix} contains @emph{user profiles} as managed by @code{guix +package}; thus, to allow users to install packages with @code{guix +package}, this must be read-write. + +Users can create as many profiles as they like in addition to the +default profile, @file{~/.guix-profile}. For instance, @code{guix +package -p ~/dev/python-dev -i python} installs Python in a profile +reachable from the @code{~/dev/python-dev} symlink. To make sure that +this profile is protected from garbage collection---i.e., that Python +will not be removed from @file{/gnu/store} while this profile exists---, +@emph{home directories should be mounted on the head node} as well so +that @code{guix-daemon} knows about these non-standard profiles and +avoids collecting software they refer to. + +It may be a good idea to periodically remove unused bits from +@file{/gnu/store} by running @command{guix gc} (@pxref{Invoking guix +gc,,, guix, GNU Guix Reference Manual}). This can be done by adding a +crontab entry on the head node: + +@example +root@@master# crontab -e +@end example + +@noindent +... with something like this: + +@example +# Every day at 5AM, run the garbage collector to make sure +# at least 10 GB are free on /gnu/store. +0 5 * * 1 /usr/local/bin/guix gc -F10G +@end example + +We're done with the head node! Let's look at compute nodes now. + +@node Setting Up Compute Nodes +@section Setting Up Compute Nodes + +First of all, we need compute nodes to mount those NFS directories that +the head node exports. This can be done by adding the following lines +to @uref{https://linux.die.net/man/5/fstab,@file{/etc/fstab}}: + +@example +@var{head-node}:/gnu/store /gnu/store nfs defaults,_netdev,vers=3 0 0 +@var{head-node}:/var/guix /var/guix nfs defaults,_netdev,vers=3 0 0 +@var{head-node}:/var/log/guix /var/log/guix nfs defaults,_netdev,vers=3 0 0 +@end example + +@noindent +... where @var{head-node} is the name or IP address of your head node. +From there on, assuming the mount points exist, you should be able to +mount each of these on the compute nodes. + +Next, we need to provide a default @command{guix} command that users can +run when they first connect to the cluster (eventually they will invoke +@command{guix pull}, which will provide them with their ``own'' +@command{guix} command). Similar to what the binary installation script +did on the head node, we'll store that in @file{/usr/local/bin}: + +@example +mkdir -p /usr/local/bin +ln -s /var/guix/profiles/per-user/root/current-guix/bin/guix \ + /usr/local/bin/guix +@end example + +We then need to tell @code{guix} to talk to the daemon running on our +master node, by adding these lines to @code{/etc/profile}: + +@example +GUIX_DAEMON_SOCKET="guix://@var{head-node}" +export GUIX_DAEMON_SOCKET +@end example + +To avoid warnings and make sure @code{guix} uses the right locale, we +need to tell it to use locale data provided by Guix (@pxref{Application +Setup,,, guix, GNU Guix Reference Manual}): + +@example +GUIX_LOCPATH=/var/guix/profiles/per-user/root/guix-profile/lib/locale +export GUIX_LOCPATH + +# Here we must use a valid locale name. Try "ls $GUIX_LOCPATH/*" +# to see what names can be used. +LC_ALL=fr_FR.utf8 +export LC_ALL +@end example + +For convenience, @code{guix package} automatically generates +@file{~/.guix-profile/etc/profile}, which defines all the environment +variables necessary to use the packages---@code{PATH}, +@code{C_INCLUDE_PATH}, @code{PYTHONPATH}, etc. Thus it's a good idea to +source it from @code{/etc/profile}: + +@example +GUIX_PROFILE="$HOME/.guix-profile" +if [ -f "$GUIX_PROFILE/etc/profile" ]; then + . "$GUIX_PROFILE/etc/profile" +fi +@end example + +Last but not least, Guix provides command-line completion notably for +Bash and zsh. In @code{/etc/bashrc}, consider adding this line: + +@verbatim +. /var/guix/profiles/per-user/root/current-guix/etc/bash_completion.d/guix +@end verbatim + +Voilà! + +You can check that everything's in place by logging in on a compute node +and running: + +@example +guix install hello +@end example + +The daemon on the head node should download pre-built binaries on your +behalf and unpack them in @file{/gnu/store}, and @command{guix install} +should create @file{~/.guix-profile} containing the +@file{~/.guix-profile/bin/hello} command. + +@node Cluster Network Access +@section Network Access + +Guix requires network access to download source code and pre-built +binaries. The good news is that only the head node needs that since +compute nodes simply delegate to it. + +It is customary for cluster nodes to have access at best to a +@emph{white list} of hosts. Our head node needs at least +@code{ci.guix.gnu.org} in this white list since this is where it gets +pre-built binaries from by default, for all the packages that are in +Guix proper. + +Incidentally, @code{ci.guix.gnu.org} also serves as a +@emph{content-addressed mirror} of the source code of those packages. +Consequently, it is sufficient to have @emph{only} +@code{ci.guix.gnu.org} in that white list. + +Software packages maintained in a separate repository such as one of the +various @uref{https://hpc.guix.info/channels, HPC channels} are of +course unavailable from @code{ci.guix.gnu.org}. For these packages, you +may want to extend the white list such that source and pre-built +binaries (assuming this-party servers provide binaries for these +packages) can be downloaded. As a last resort, users can always +download source on their workstation and add it to the cluster's +@file{/gnu/store}, like this: + +@verbatim +GUIX_DAEMON_SOCKET=ssh://compute-node.example.org \ + guix download http://starpu.gforge.inria.fr/files/starpu-1.2.3/starpu-1.2.3.tar.gz +@end verbatim + +The above command downloads @code{starpu-1.2.3.tar.gz} @emph{and} sends +it to the cluster's @code{guix-daemon} instance over SSH. + +Air-gapped clusters require more work. At the moment, our suggestion +would be to download all the necessary source code on a workstation +running Guix. For instance, using the @option{--sources} option of +@command{guix build} (@pxref{Invoking guix build,,, guix, GNU Guix +Reference Manual}), the example below downloads all the source code the +@code{openmpi} package depends on: + +@example +$ guix build --sources=transitive openmpi + +@dots{} + +/gnu/store/xc17sm60fb8nxadc4qy0c7rqph499z8s-openmpi-1.10.7.tar.bz2 +/gnu/store/s67jx92lpipy2nfj5cz818xv430n4b7w-gcc-5.4.0.tar.xz +/gnu/store/npw9qh8a46lrxiwh9xwk0wpi3jlzmjnh-gmp-6.0.0a.tar.xz +/gnu/store/hcz0f4wkdbsvsdky3c0vdvcawhdkyldb-mpfr-3.1.5.tar.xz +/gnu/store/y9akh452n3p4w2v631nj0injx7y0d68x-mpc-1.0.3.tar.gz +/gnu/store/6g5c35q8avfnzs3v14dzl54cmrvddjm2-glibc-2.25.tar.xz +/gnu/store/p9k48dk3dvvk7gads7fk30xc2pxsd66z-hwloc-1.11.8.tar.bz2 +/gnu/store/cry9lqidwfrfmgl0x389cs3syr15p13q-gcc-5.4.0.tar.xz +/gnu/store/7ak0v3rzpqm2c5q1mp3v7cj0rxz0qakf-libfabric-1.4.1.tar.bz2 +/gnu/store/vh8syjrsilnbfcf582qhmvpg1v3rampf-rdma-core-14.tar.gz +… +@end example + +(In case you're wondering, that's more than 320@ MiB of +@emph{compressed} source code.) + +We can then make a big archive containing all of this (@pxref{Invoking +guix archive,,, guix, GNU Guix Reference Manual}): + +@verbatim +$ guix archive --export \ + `guix build --sources=transitive openmpi` \ + > openmpi-source-code.nar +@end verbatim + +@dots{} and we can eventually transfer that archive to the cluster on +removable storage and unpack it there: + +@verbatim +$ guix archive --import < openmpi-source-code.nar +@end verbatim + +This process has to be repeated every time new source code needs to be +brought to the cluster. + +As we write this, the research institutes involved in Guix-HPC do not +have air-gapped clusters though. If you have experience with such +setups, we would like to hear feedback and suggestions. + +@node Cluster Disk Usage +@section Disk Usage + +@cindex disk usage, on a cluster +A common concern of sysadmins' is whether this is all going to eat a lot +of disk space. If anything, if something is going to exhaust disk +space, it's going to be scientific data sets rather than compiled +software---that's our experience with almost ten years of Guix usage on +HPC clusters. Nevertheless, it's worth taking a look at how Guix +contributes to disk usage. + +First, having several versions or variants of a given package in +@file{/gnu/store} does not necessarily cost much, because +@command{guix-daemon} implements deduplication of identical files, and +package variants are likely to have a number of common files. + +As mentioned above, we recommend having a cron job to run @code{guix gc} +periodically, which removes @emph{unused} software from +@file{/gnu/store}. However, there's always a possibility that users will +keep lots of software in their profiles, or lots of old generations of +their profiles, which is ``live'' and cannot be deleted from the +viewpoint of @command{guix gc}. + +The solution to this is for users to regularly remove old generations of +their profile. For instance, the following command removes generations +that are more than two-month old: + +@example +guix package --delete-generations=2m +@end example + +Likewise, it's a good idea to invite users to regularly upgrade their +profile, which can reduce the number of variants of a given piece of +software stored in @file{/gnu/store}: + +@example +guix pull +guix upgrade +@end example + +As a last resort, it is always possible for sysadmins to do some of this +on behalf of their users. Nevertheless, one of the strengths of Guix is +the freedom and control users get on their software environment, so we +strongly recommend leaving users in control. + +@node Cluster Security Considerations +@section Security Considerations + +@cindex security, on a cluster +On an HPC cluster, Guix is typically used to manage scientific software. +Security-critical software such as the operating system kernel and +system services such as @code{sshd} and the batch scheduler remain under +control of sysadmins. + +The Guix project has a good track record delivering security updates in +a timely fashion (@pxref{Security Updates,,, guix, GNU Guix Reference +Manual}). To get security updates, users have to run @code{guix pull && +guix upgrade}. + +Because Guix uniquely identifies software variants, it is easy to see if +a vulnerable piece of software is in use. For instance, to check whether +the glibc@ 2.25 variant without the mitigation patch against +``@uref{https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt,Stack +Clash}'', one can check whether user profiles refer to it at all: + +@example +guix gc --referrers /gnu/store/…-glibc-2.25 +@end example + +This will report whether profiles exist that refer to this specific +glibc variant. + + @c ********************************************************************* @node Acknowledgments @chapter Acknowledgments @@ -3656,8 +4049,10 @@ information on these fine people. The @file{THANKS} file lists people who have helped by reporting bugs, taking care of the infrastructure, providing artwork and themes, making suggestions, and more---thank you! -This document includes adapted sections from articles that have previously -been published on the Guix blog at @uref{https://guix.gnu.org/blog}. +This document includes adapted sections from articles that have +previously been published on the Guix blog at +@uref{https://guix.gnu.org/blog} and on the Guix-HPC blog at +@uref{https://hpc.guix.info/blog}. @c ********************************************************************* -- cgit v1.2.3