[[!meta title="Software Glue" ]]

Take any software project: on its own, it's probably not very useful. First of
all, you probably need a compiler or interpreter, something to run the software
directly, or to convert the source form (the preferred form for editing) into a
form which can be run by the computer.

In addition to this compiler or interpreter, it's very unusual to have software
which does not use other software projects. This might require the availability
of these other projects when compiling, or just at runtime.

Say you write some software. The other bits of software that your users must
have to build it (that is, to generate the useful form of the software from the
source form) are called build dependencies. Any bits of software that are
required when your software is run are called runtime dependencies.

This complexity can make trying to use software a bit difficult. You find
some software on the web, it sounds good, so you download it. First of all, you
need to satisfy all the build dependencies, and their dependencies, and so
on. If you manage to make it this far, you can then actually build the
software. After this, you need to install all the runtime dependencies,
and their dependencies, and so on, before you can run the software.

This is a rather off-putting situation. Making modular software is good
practice, but even adding one direct dependency can add many more indirect
dependencies.
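
To make the "and their dependencies, and so on" part concrete, here is a
minimal sketch in Python. The project names and the `DIRECT_DEPS` graph are
made up for illustration; real package managers also have to deal with
versions and conflicts, which this ignores.

```python
from collections import deque

# Hypothetical dependency graph: each project maps to its direct dependencies.
DIRECT_DEPS = {
    "my-app": ["web-framework"],
    "web-framework": ["http-parser", "template-engine"],
    "template-engine": ["string-utils"],
    "http-parser": [],
    "string-utils": [],
}


def all_dependencies(project, direct_deps):
    """Return every direct and indirect dependency of a project."""
    seen = set()
    queue = deque(direct_deps.get(project, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already visited, also guards against cycles
        seen.add(dep)
        queue.extend(direct_deps.get(dep, []))
    return seen


# One direct dependency, but everything below it comes along too.
print(sorted(all_dependencies("my-app", DIRECT_DEPS)))
```

In this made-up graph, `my-app` declares a single direct dependency, yet the
walk pulls in every project underneath it as well.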

Now there are systems to help with this, but unfortunately I don't think there
is yet a perfect, or even a good, approach. The above description may make the
problem seem easy to manage, but many of the systems around fall short.

## Software Packages

A software package, or just package for short, is some software (normally a
single software project) in some form (source, binary, or perhaps both), along
with some metadata (information about the software, e.g. its version or
contributors).
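
As a rough sketch of that definition, a package could be modelled in Python
like this. The `Package` and `Form` names, the fields, and the example values
are all assumptions for illustration, not the format of any real packaging
system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Tuple


class Form(Enum):
    """The form in which the software is shipped."""
    SOURCE = "source"
    BINARY = "binary"


@dataclass(frozen=True)
class Package:
    """Some software, in some form, plus metadata about it."""
    name: str
    version: str
    forms: FrozenSet[Form]
    contributors: Tuple[str, ...] = ()


# Hypothetical package that ships both source and binary forms.
example = Package(
    name="example-lib",
    version="1.2.0",
    forms=frozenset({Form.SOURCE, Form.BINARY}),
    contributors=("A. Contributor",),
)
```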

Packages are the key component of the (poor) solutions discussed below to the
problem of distributing and using software.

## Debian Packages

[Debian](https://www.debian.org/), "The universal operating system", uses
packages (*.deb files). Debian packages are written as source packages, which
can be built to create binary packages (one source package can make many binary
packages). Debian packages are primarily distributed as binary packages (which
means that the user does not have to install the build dependencies, or spend
time building the package).

Packaging the operating system from the bottom up has its advantages. This
means that Debian can attempt to solve complex issues like bootstrapping
(building all packages from scratch) and reproducible builds (making sure
the build process produces exactly the same result when the time, system name,
or other irrelevant things are different).
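
As a toy illustration of the reproducible builds idea, here is a sketch in
Python. The `build` function is hypothetical and only stands in for a real
build process; the point is just that embedding the time or system name in the
output changes its hash, while leaving them out does not.

```python
import hashlib


def build(source: bytes, build_time: str, hostname: str,
          embed_environment: bool) -> bytes:
    """Toy stand-in for a build: turn source into an output artifact."""
    output = b"BINARY:" + source
    if embed_environment:
        # Leaking irrelevant details into the output makes it unreproducible.
        output += f"|built {build_time} on {hostname}".encode()
    return output


def digest(artifact: bytes) -> str:
    """Hash of the artifact, used to compare two builds."""
    return hashlib.sha256(artifact).hexdigest()


source = b"print('hello')"

# Two builds at different times, on differently named systems.
leaky_a = build(source, "2015-01-01", "alpha", embed_environment=True)
leaky_b = build(source, "2015-06-01", "beta", embed_environment=True)
clean_a = build(source, "2015-01-01", "alpha", embed_environment=False)
clean_b = build(source, "2015-06-01", "beta", embed_environment=False)

print(digest(leaky_a) == digest(leaky_b))  # differs between builds
print(digest(clean_a) == digest(clean_b))  # identical between builds
```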

Using Debian's packages does have some disadvantages. They only work if you are
installing the package into the operating system. This is quite a big deal,
especially if you are not the owner of the system which you are using. You can
also only install one version of a Debian package on your system. This means
that for some software projects, there are different packages for different
versions (normally different major versions) of the software.

## npm Packages

On the other end of the spectrum, you have package managers like
[npm](https://www.npmjs.com/). This is a language-specific package manager for
the JavaScript language. It allows any user to install packages, and you can
install one package several times on your system.
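
The difference between the two models can be sketched with a pair of toy
classes in Python. Both classes and the package names are made up for
illustration; neither reflects how dpkg or npm actually store their data.

```python
class SystemWideStore:
    """One installed version per package name, shared by the whole system."""

    def __init__(self):
        self.installed = {}

    def install(self, name, version):
        # Installing another version replaces the one already there.
        self.installed[name] = version


class PerProjectStore:
    """Each project keeps its own copies, so versions can coexist."""

    def __init__(self):
        self.installed = {}

    def install(self, project, name, version):
        self.installed.setdefault(project, {})[name] = version


system = SystemWideStore()
system.install("libfoo", "1.0")
system.install("libfoo", "2.0")  # libfoo 1.0 is gone now

projects = PerProjectStore()
projects.install("blog", "libfoo", "1.0")
projects.install("shop", "libfoo", "2.0")  # both versions are kept
```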

However, npm has no concept of source packages, which means it's difficult to
ensure that the software you are using is secure, and that it does what it says
it will. It is also of limited scope (although this is not necessarily bad).

## Something better...

I feel that there must be some middle ground between these two situations.
Maybe it involves one, two, or more separate or interconnected bits of software
that together can provide all the desirable properties.

I think that language-specific package managers are currently only good for
development. When it comes to deployment, you often need something that can
manage more of the system.

Also, language-specific package managers do not account for dependencies that
cross language boundaries. This means that you cannot really reason about
reproducible builds or bootstrapping with a language-specific package manager.

On the other end of the scale, Debian binary packages are effectively just
archives that you unpack into the root directory. They assume absolute and
relative paths, which makes them unsuitable for installing elsewhere (e.g. in a
user's home directory). This means that it is not possible to use them if you
do not have root access on the system.
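
Here is a small Python sketch of why assumed paths get in the way. The archive
contents and the hardcoded path are hypothetical; the idea is only that a path
baked into the software stops matching once the archive is unpacked anywhere
other than the root directory.

```python
from pathlib import PurePosixPath

# Hypothetical package contents, relative to wherever the archive is unpacked.
MEMBERS = ["usr/bin/tool", "usr/share/tool/data.txt"]

# Hypothetical absolute path that usr/bin/tool expects to find its data at.
HARDCODED_DATA_PATH = PurePosixPath("/usr/share/tool/data.txt")


def unpack_destination(member: str, root: str) -> PurePosixPath:
    """Where an archive member lands when unpacked under root."""
    return PurePosixPath(root) / member


def data_file_found(root: str) -> bool:
    """True if the hardcoded path matches where the data actually landed."""
    unpacked = {unpack_destination(member, root) for member in MEMBERS}
    return HARDCODED_DATA_PATH in unpacked


print(data_file_found("/"))                   # unpacked into the root directory
print(data_file_found("/home/alice/.local"))  # unpacked into a home directory
```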

## All is not yet lost...

There are some signs of light in the darkness. [Debian's reproducible builds
initiative](https://wiki.debian.org/ReproducibleBuilds) is progressing well. In
the Debian way, this has ramifications for everyone, as an effort will be made
to include any changes made in Debian, in the software projects themselves.

I am also hearing more and more about package managers that seem to be in
roughly the right spot. Although I have used neither,
[Nix](https://nixos.org/nix/) and [Guix](https://www.gnu.org/software/guix/)
both sound enticing, promising "atomic upgrades and rollbacks, side-by-side
installation of multiple versions of a package, multi-user package management
and easy setup of build environments" (from the Nix homepage). However, with
great power comes great responsibility: performing security updates in Debian
would probably be more complex if there could be multiple installations,
perhaps of different versions, of an insecure piece of software on a system.

Perhaps some semantic web technologies can play a part. URIs could prove
useful as unique identifiers for software, and software versions. Basic package
descriptions could be written in RDF, using URIs, allowing these to be used by
multiple packaging systems (the ability to have sameAs properties in RDF might
be useful).
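
To give a feel for this, here is a rough Python sketch that models RDF
statements as plain (subject, predicate, object) tuples rather than using a
real RDF library. All of the example.org URIs and the version property are
made up; only the owl:sameAs URI is the real one.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical statements about one piece of software known under three URIs.
TRIPLES = [
    ("https://example.org/software/foo",
     "https://example.org/vocab#version", "1.2.0"),
    ("https://example.org/software/foo", OWL_SAME_AS,
     "https://example.org/debian/foo"),
    ("https://example.org/debian/foo", OWL_SAME_AS,
     "https://example.org/npm/foo"),
]


def same_as(uri, triples):
    """Return all URIs linked to uri through sameAs, in either direction."""
    found = {uri}
    changed = True
    while changed:  # repeat until no new URIs turn up
        changed = False
        for subject, predicate, obj in triples:
            if predicate != OWL_SAME_AS:
                continue
            if subject in found and obj not in found:
                found.add(obj)
                changed = True
            elif obj in found and subject not in found:
                found.add(subject)
                changed = True
    return found


# Starting from the npm URI, the other two identifiers are discovered.
print(sorted(same_as("https://example.org/npm/foo", TRIPLES)))
```

With links like these, a packaging system that only knows its own identifier
for a piece of software could still find the descriptions published under the
others.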

At the moment, I am working on Debian packages. I depend on these for most of
my computers. Unfortunately, for some of the software projects I write, it is
not really possible to just depend on Debian packages. For some I have managed
to get by with git submodules, for others I have entered the insane world of
shell scripts which just download the dependencies off the web, sometimes also
using Bower and Grunt.

Needless to say, I am always on the lookout for ways to improve this situation.