-*- mode: org -*-

This is a utility for managing a collection of nars (normalized
archives, in the context of Guix) along with the corresponding narinfo
files, which contain some signed metadata.

While tasks like publishing local store items as nars are easy with
tools like =guix publish=, =nar-herder= is aimed at serving the same
collection of nars from multiple machines at once, including moving
the nars from machine to machine according to different criteria.

A reverse proxy (like nginx) should be used for the actual serving of
the nars, as well as for proxying requests to =nar-herder=.

* Design

This utility was designed to help manage a collection of nars from a
substitute server. It can help move the nars between machines, as well
as assist in setting up machines to serve the nars (mirrors).

Both these tasks can be accomplished without any specialised
tooling. For example, rsync can be used to move nars between machines,
and there are many tools for setting up reverse proxies which function
as mirrors.

Even so, there are a few reasons why I think the nar-herder can add
some value.

Firstly, storing the narinfo information in a SQLite database can
facilitate things like tagging the narinfos and performing tasks
similar to garbage collection. Plus, because the narinfos are quite
small, I believe storing them in a database is actually more
performant and efficient than storing them as files on the filesystem,
even with the duplication that comes with the database schema being
used. It's also easier to copy all the narinfos between machines when
you can download a single "database", rather than copying the files
individually.

Secondly, while tools like nginx work great as a reverse proxy for the
nar files, proxying the requests for the narinfo files can be
problematic, particularly when caching is involved. In cases such as
running =guix weather=, lots of narinfo requests are made in quick
succession. Having the nar-herder respond to narinfo requests directly
from its database can improve performance.
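To make this access pattern concrete, a Guix client looks up a store item by requesting =<hash>.narinfo= from the substitute server. Here =<hash>= is the part of the store file name before the first dash. The sketch below builds that URL. The helper name =narinfo_url= and the store path are hypothetical, and the hash is a placeholder. The sketch also assumes the server serves narinfos at its root, as =guix publish= does.

#+BEGIN_SRC sh
# Hypothetical helper: print the narinfo URL a Guix client would request
# for a store item ($2) from a substitute server ($1).
narinfo_url() {
  server=$1      # e.g. https://foo.example.com
  store_path=$2  # e.g. /gnu/store/<hash>-<name>-<version>
  # The hash is the part of the store file name before the first "-".
  hash=$(basename "$store_path" | cut -d- -f1)
  # Strip any trailing "/" from the server URL before joining.
  printf '%s/%s.narinfo\n' "${server%/}" "$hash"
}

# Example usage (placeholder hash, not run here):
#   curl -s "$(narinfo_url https://foo.example.com /gnu/store/4jkbn6s1wjn2g9zwi0a2bj9rrqrvmb4m-hello-2.12.1)"
#+END_SRC

A command such as =guix weather --substitute-urls=https://foo.example.com= issues a request like this for each store item it checks. This is why narinfo lookups tend to arrive in bursts.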

** Example uses

*** Mirroring

In this example, foo.example.com is a substitute server using the
nar-herder. We want to set up a mirror, mirror.example.com. To do
this, nginx will be used as a caching reverse proxy, and the
nar-herder will be used to serve the narinfos.

Run the nar-herder as follows:

#+BEGIN_SRC sh
  nar-herder run-server --mirror=https://foo.example.com
#+END_SRC

When run for the first time, the nar-herder will download the database
from foo.example.com and then apply any recent changes (new or removed
nars). Once this has happened, it will periodically check for changes
and apply them.

Then, configure nginx to reverse proxy the =*.narinfo= requests to
the nar-herder (by default it listens on port 8080), and the =/nar/*=
requests to foo.example.com. By adding caching, you can improve
performance for frequently requested files.
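The sketch below writes one possible minimal nginx configuration for the mirror. It makes several assumptions:

- nginx and the nar-herder run on the same machine.
- nginx includes files from =/etc/nginx/conf.d/=.
- Plain HTTP on port 80 is acceptable, so TLS is omitted.

The cache path, the zone name =nars=, the sizes and the expiry times are illustrative choices, and =write_mirror_conf= is a hypothetical helper. In this sketch only the nar requests are cached; narinfo requests always go to the nar-herder.

#+BEGIN_SRC sh
# Hypothetical helper: write an nginx config for mirror.example.com to $1.
write_mirror_conf() {
  # Destination file; the default is an assumed conf.d location.
  conf_file=${1:-/etc/nginx/conf.d/mirror.example.com.conf}

  # Quoted 'EOF' so the shell does not expand "$" in the nginx config.
  cat > "$conf_file" <<'EOF'
# Cache for nar files only; path, zone name and sizes are illustrative.
proxy_cache_path /var/cache/nginx/nars levels=1:2 keys_zone=nars:10m
                 max_size=10g inactive=30d use_temp_path=off;

server {
    listen 80;
    server_name mirror.example.com;

    # narinfo requests: answered by the local nar-herder (default port 8080).
    location ~ \.narinfo$ {
        proxy_pass http://127.0.0.1:8080;
    }

    # nar requests: fetched from foo.example.com and cached locally.
    location /nar/ {
        proxy_pass https://foo.example.com;
        proxy_ssl_server_name on;
        proxy_cache nars;
        proxy_cache_valid 200 30d;
    }
}
EOF
}
#+END_SRC

After writing the file, =nginx -t= checks the configuration and =nginx -s reload= applies it.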

*** Moving nars between machines

As in the previous example, foo.example.com is a substitute server
using the nar-herder. This time, though, we only want it to store some
of the nars. All of them will be stored on storage.example.com, and
foo.example.com will reverse proxy any requests for nars it doesn't
have to storage.example.com.

Looking first at the nar-herder configuration for foo.example.com, the
important options are the storage limit and the storage nar removal
criteria. The storage limit is the number of bytes that the storage
directory should not exceed; setting it to 0 says that the storage
directory should be empty. However, a nar is only deleted if the
storage nar removal criteria are met. In this case, the criteria say
that the nar must be stored on storage.example.com. So when looking
for nars to delete, the nar-herder on foo.example.com will query
storage.example.com to check whether the nar-herder there is storing
the nar.

#+BEGIN_SRC sh
  nar-herder run-server \
    --storage=/var/lib/nars \
    --storage-limit=0 \
    --storage-nar-removal-criteria=stored-on=https://storage.example.com
#+END_SRC
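To compare a machine's storage directory with a limit like this, you can sum the sizes of the files in it from the shell. This is only an approximation for illustration. The helper names are hypothetical, the sketch assumes GNU find (for =-printf=), and the nar-herder's own accounting may differ.

#+BEGIN_SRC sh
# Hypothetical helpers, for illustration only.

# Print the total apparent size, in bytes, of the regular files under $1.
# Uses GNU find's -printf; "%.0f" keeps large totals as plain integers.
storage_bytes() {
  find "$1" -type f -printf '%s\n' | awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Succeed if the files under $1 use at most $2 bytes; print a summary line.
storage_within_limit() {
  used=$(storage_bytes "$1")
  echo "$1: $used bytes used, limit $2 bytes"
  [ "$used" -le "$2" ]
}

# Example usage (not run here):
#   storage_within_limit /var/lib/nars 0
#+END_SRC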

On the storage.example.com side, the setup is similar to the previous
mirror example. However, because we want the nar-herder to actually
download and store the nars from foo.example.com, we also set a
storage directory. Note that the nar-herder will currently keep
downloading nars until either all of them have been downloaded or
there's no more disk space. As on foo.example.com, you can set
=--storage-limit= to prevent this.

#+BEGIN_SRC sh
  nar-herder run-server --mirror=https://foo.example.com --storage=/var/lib/nars
#+END_SRC
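Because the limit is given in bytes, it can help to compute it from a larger unit. The sketch below assumes that =--storage-limit= accepts a plain integer byte count, as the foo.example.com description suggests. =gib_to_bytes= is a hypothetical helper, and 500 GiB is an arbitrary example value.

#+BEGIN_SRC sh
# Hypothetical helper: convert a whole number of GiB to bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Example (not executed here): cap storage.example.com at 500 GiB of nars.
#   nar-herder run-server --mirror=https://foo.example.com \
#     --storage=/var/lib/nars --storage-limit="$(gib_to_bytes 500)"
#+END_SRC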