aboutsummaryrefslogtreecommitdiff
path: root/doc/todo/git-annex_support.mdwn
blob: f1a4e787c59b912aa2f64b596a7d2630adcacb81 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
A dear [[wishlist]] which would resolve [[this question|forum/ikiwiki_and_big_files]]: ikiwiki should support git-annex repositories.

I am not sure how this would work, but from my POV, it should do a `git annex get` when new commits are pushed to its bare repo. This would assume, of course, that there's another repo somewhere that ikiwiki has access to, which works for HTTP-style remotes, but could be more problematic for SSH remotes that require a key.

Another solution would be to make ikiwiki a remote itself and allow users to push big files to it. The only problem I see with this is those files would end up in the bare repository and not necessarily show up in the web rendering. Ideally, a big file pushed would be hardlinked between the two repos, but it seems [git-annex doesn't support that yet](http://git-annex.branchable.com/todo/wishlist:_use_hardlinks_for_local_clones). --[[anarcat]]

> One technical problem with this is that ikiwiki doesn't allow symlinks
> for [[security]], but git-annex relies on symlinks (unless you're in
> direct mode, but I'm not sure that's really desirable here).
> I'd like to make symlinks possible without compromising security,
> but it'll be necessary to be quite careful. --[[smcv]]

First implementation
====================

So as the [[discussion]] shows, it seems it's perfectly possible to actually do this! There's this [gallery site](http://stockholm.kalleswork.net) which uses the [[plugins/contrib/album]] plugin and git-annex to manage its files.

The crucial steps are:

 1. setup a git annex remote in `$srcdir`

 2. configure direct mode because ikiwiki ignores symlinks for [[security]] reasons:

        cd $srcdir
        git annex init
        git annex direct

 3. configure files to be considered by git-annex (those will be not committed into git directly):

        git config annex.largefiles 'largerthan=100kb and not (include=*.mdwn or include=*.txt)'

 4. make the bare repository (the remote of `$srcdir`) ignored by git-annex:

        cd $srcdir
        git config remote.origin.annex-ignore true
        git config remote.origin.annex-sync false

    (!) This needs to be done on *ANY* clone of the repository, which is annoying, but it's important because we don't want to see git-annex stuff in the bare repo. (why?)

 5. deploy the following crappy plugin to make commits work again and make sure the right files are added in git-annex:

[[!format perl """
#!/usr/bin/perl
package IkiWiki::Plugin::gitannex;

use warnings;
use strict;
use IkiWiki 3.00;

sub import {
        hook(type => "getsetup", id => "gitannex", call => \&getsetup);
	hook(type => "savestate", id => "gitannex", call => \&rcs_commit);
        # we need to handle all rcs commands maybe?
}

sub getsetup () {
        return
                plugin => {
                        safe => 1, # rcs plugin
                        rebuild => undef,
                        section => "misc",
                },
}

# XXX: we want to copy or reuse safe_git

sub rcs_commit (@) {
    chdir $config{srcdir};
    `git annex add --auto`;
    `git annex sync`;
}

sub rcs_commit_staged (@) {
    rcs_commit($@);
}

1
"""]]
This assumes you know what `srcdir`, `repository` and so on mean, if you forgot (like me), see this reference: [[rcs/git/]].


What doesn't work
-----------------

 * the above plugin is kind of flaky and ugly.
 * it's not an RCS plugin, but probably should be, replacing the git plugin, because really: git doesn't work at all anymore at this point

What remains to be clarified
----------------------------

 * how do files get pushed to the `$srcdir`? Only through the web interface?
 * why do we ignore the bare repository?

See the [[discussion]] for a followup on that. --[[anarcat]]

Alternative implementation
==========================

An alternative implementation, which remains to be detailed but is mentionned in [[forum/ikiwiki_and_big_files]], is to use the [[underlay]] feature combined with the `hardlink` option to deploy the git-annex'd files. Then git-annex is separate from the base ikiwiki git repo. See also [[tips/Ikiwiki_with_git-annex__44___the_album_and_the_underlay_plugins]] for an example.

Also note that ikiwiki-hosting has a [patch waiting](https://ikiwiki-hosting.branchable.com/todo/git-annex_support) to allow pushes to work with git-annex. This could potentially be expanded to sync content to the final checkout properly, avoiding some of the problems above (esp. wrt to non-annex bare repos).

Combined with the [[underlay]] feature, this could work very nicely indeed... --[[anarcat]]

Here's an attempt:

<pre>
cd /home/user
git clone source.git source.annex
cd source.annex
git annex direct
cd ../source.git
git annex group . transfer
git remote add annex ../source.annex
git annex sync annex
</pre>

Make sure the `hardlink` setting is enabled, and add the annex as an underlay, in `ikiwiki.setup`:

<pre>
hardlink: 1
add_underlays:
- /home/w-anarcat/source.annex
</pre>

Then moving files to the underlay is as simple as running this command in the bare repo:

<pre>
#!/bin/sh

echo "moving big files to annex repository..."
git annex move --to annex
</pre>

I have added this as a hook in `$HOME/source.git/hooks/post-receive` (don't forget to `chmod +x`).

The problem with the above is that the underlay wouldn't work: for some reason it wouldn't copy those files in place properly. Maybe it's freaking out because it's a full copy of the repo... My solution was to make the source repository itself a direct repo, and then add it as a remote to the bare repo. --[[anarcat]]

Back from the top
=================

Obviously, the final approach of making the `source` repository direct mode will fail because ikiwiki will try to commit files there from the web interface which will fail (at best) and (at worst) add big files into git-annex (or vice-versa, not sure what's worse actually).

Also, I don't know how others here made the underlay work, but it didn't work for me. I think it's because in the "source" repository, there are (dead) symlinks for the annexed files. This overrides the underlay, because of [[security]] - although I am unclear as to why this is discarded so early. So in order to make the original idea above work properly (ie. having a separate git-annex repo in direct mode) work, we must coerce ikiwiki into tolerating symlinks in the srcdir a little more:

<pre>
diff --git a/IkiWiki.pm b/IkiWiki.pm
index 1043ef4..949273c 100644
--- a/IkiWiki.pm
+++ b/IkiWiki.pm
@@ -916,11 +916,10 @@ sub srcfile_stat {
        my $file=shift;
        my $nothrow=shift;

-       return "$config{srcdir}/$file", stat(_) if -e "$config{srcdir}/$file";
-       foreach my $dir (@{$config{underlaydirs}}, $config{underlaydir}) {
-               return "$dir/$file", stat(_) if -e "$dir/$file";
+       foreach my $dir ($config{srcdir}, @{$config{underlaydirs}}, $config{underlaydir}) {
+               return "$dir/$file", stat(_) if (-e "$dir/$file" && ! -l "$dir/$file");
        }
-       error("internal error: $file cannot be found in $config{srcdir} or underlay") unless $nothrow;
+       error("internal error: $file cannot be found in $config{srcdir} or underlays @{$config{underlaydirs}} $config{underlaydir}") unless $nothrow;
        return;
 }

diff --git a/IkiWiki/Render.pm b/IkiWiki/Render.pm
index 9d6f636..e0b4cf8 100644
--- a/IkiWiki/Render.pm
+++ b/IkiWiki/Render.pm
@@ -337,7 +337,7 @@ sub find_src_files (;$$$) {

                if ($underlay) {
                        # avoid underlaydir override attacks; see security.mdwn
-                       if (! -l "$abssrcdir/$f" && ! -e _) {
+                       if (1 || ! -l "$abssrcdir/$f" && ! -e _) {
                                if (! $pages{$page}) {
                                        push @files, $f;
                                        push @IkiWiki::underlayfiles, $f;
</pre>

<del>Now obviously this patch is incomplete: I am not sure we actually avoid the attack, ie. i am not sure the check in `srcdir()` is sufficient to remove completely the check in `find_src_files()`.</del>

After reviewing the code further, it seems that `find_src_files` in three places in ikiwiki:

<pre>
../IkiWiki/Render.pm:421:	find_src_files(1, \@files, \%pages);
../IkiWiki/Render.pm:846:		($files, $pages)=find_src_files();
../po/po2wiki:18:my ($files, $pages)=IkiWiki::find_src_files();
</pre>

The first occurence is in `IkiWiki::Render::process_changed_files`, where it is used mostly for populating `@IkiWiki::underlayfiles`, the only side effect of 
`find_src_files`. The second occurence is in `IkiWiki::Render::refresh`. There things are a little more complicated (to say the least) and a lot of stuff happens. To put it in broad terms, first it does a `IkiWiki::Render::scan` and then a `IkiWiki::Render::render`. The last two call `srcfile()` appropriately (where i put an extra symlink check), except for  `will_render()` in `scan`, which I can't figure out right now and that seems to have a lot of global side effects. It still looks fairly safe at first glance. The `rcs_get_current_rev`, `refresh`, `scan` and `rendered` hooks are also called in there, but I assume those to be safe, since they are called with sanitized values already.

The patch does work: the files get picked up from the underlay and properly hardlinked into the target `public_html` directory! So with the above patch, then the following hook in `source.git/hooks/post-receive`:

<pre>
#!/bin/sh

OLD_GIT_DIR="$GIT_DIR"
unset GIT_DIR
echo "moving big files to annex repository..."
git annex copy --to annex
git annex sync annex
</pre>

(I am not sure anymore why GIT_DIR is necessary, but I remember it destroyed all files in my repo because git-annex synced against the `setup` branch in the parent directory. fun times.)

Then the `annex` repo is just a direct clone of the source.git:

<pre>
cd /home/user
git clone --shared source.git annex
cd annex
git annex direct
cd ../source.git
git remote add annex ../annex
</pre>

And we need the following config:

<pre>
hardlink: 1
add_underlays:
- /home/w-anarcat/annex
add_plugins:
- underlay
</pre>

... and the `ikiwiki-hosting` patch mentionned earlier to allow git-annex-shell to run at all. Also, the `--shared` option will [make git-annex use hardlinks itself between the two repos](https://git-annex.branchable.com/todo/wishlist:_use_hardlinks_for_local_clones/), so the files will be available for download as well. --[[anarcat]]

> <del>...aaaand this doesn't work anymore. :( i could have sworn this was working minutes ago, but for some reason the annexed files get skipped again now. :(</del> Sorry for the noise, the annex repo wasn't in direct mode - the above works! --[[anarcat]]

This [[!taglink patch]] still applies - anything else I should be doing here to try to get this fixed? A summary maybe? --[[anarcat]]

> Sorry, I don't have the mental bandwidth at the moment to work through the
> implications of this change. I know you want this feature, I know it's an
> attractive solution to several use cases, and git annex support is in the
> queue, but at right now I'm still trying to deal with mitigating
> CVE-2016-3714, and the last thing I want to do is merge new security
> risks. --[[smcv]]

> > No problem at all, glad that you still have that in the queue, and I hope
> > my work was somewhat useful in pushing this forward! Thanks for taking
> > care of the Imagetragick situation... :/ --[[anarcat]]