doc/todo/multi-thread_ikiwiki.mdwn


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89

[[!tag wishlist]]

[My ikiwiki instance](http://www.ipol.im/) is quite heavy. 674M of data in the source repo, 1.1G in its .git folder.
Lots of \[[!img ]] (~2200), lots of \[[!teximg ]] (~2700). A complete rebuild takes 10 minutes.

We could use a big machine, with plenty of CPUs. Could some multi-threading support be added to ikiwiki, by forking out all the external heavy  plugins (imagemagick, tex, ...) and/or by processing pages in parallel?

Disclaimer: I know nothing of the Perl approach to parallel processing.

> I agree that it would be lovely to be able to use multiple processors to speed up rebuilds on big sites (I have a big site myself), but, taking a quick look at what Perl threads entails, and taking into acount what I've seen of the code of IkiWiki, it would take a massive rewrite to make IkiWiki thread-safe - the API would have to be completely rewritten - and then more work again to introduce threading itself.  So my unofficial humble opinion is that it's unlikely to be done.
> Which is a pity, and I hope I'm mistaken about it.
> --[[KathrynAndersen]]

> > I have much less experience with the internals of Ikiwiki, much
> > less Multi-threading perl, but I agree that to make Ikiwiki thread
> > safe and to make the modifications to really take advantage of the
> > threads is probably beyond the realm of reasonable
> > expectations. Having said that, I wonder if there aren't ways to
> > make Ikiwiki perform better for these big cases where the only
> > option is to wait for it to grind through everything. Something
> > along the lines of doing all of the aggregation and dependency
> > heavy stuff early on, and then doing all of the page rendering
> > stuff at the end quasi-asynchronously? Or am I way off in the deep
> > end.
> >
> > From a practical perspective, it seems like these massive rebuild
> > situations represent a really small subset of ikiwiki builds. Most
> > sites are pretty small, and most sites need full rebuilds very
> > very infrequently. In that scope, 10 minute rebuilds aren't that
> > bad seeming. In terms of performance challenges, it's the one page
> > with 3-5 dependency that takes 10 seconds (say) to rebuild that's
> > a larger challenge for Ikiwiki as a whole. At the same time, I'd
> > be willing to bet that performance benefits for these really big
> > repositories for using fast disks (i.e. SSDs) could probably just
> > about meet the benefit of most of the threading/async work.
> >
> > --[[tychoish]]

>>> It's at this point that doing profiling for a particular site would come
>>> in, because it would depend on the site content and how exactly IkiWiki is
>>> being used as to what the performance bottlenecks would be.  For the
>>> original poster, it would be image processing.  For me, it tends to be
>>> PageSpecs, because I have a lot of maps and reports.

>>> But I sincerely don't think that Disk I/O is the main bottleneck, not when
>>> the original poster mentions CPU usage, and also in my experience, I see
>>> IkiWiki chewing up 100% CPU usage one CPU, while the others remain idle.  I
>>> haven't noticed slowdowns due to waiting for disk I/O, whether that be a
>>> system with HD or SSD storage.

>>> I agree that large sites are probably not the most common use-case, but it
>>> can be a chicken-and-egg situation with large sites and complete rebuilds,
>>> since it can often be the case with a large site that rebuilding based on
>>> dependencies takes *longer* than rebuilding the site from scratch, simply
>>> because there are so many pages that are interdependent.  It's not always
>>> the number of pages itself, but how the site is being used.  If IkiWiki is
>>> used with the absolute minimum number of page-dependencies - that is, no
>>> maps, no sitemaps, no trails, no tags, no backlinks, no albums - then one
>>> can have a very large number of pages without having performance problems.
>>> But when you have a change in PageA affecting PageB which affects PageC,
>>> PageD, PageE and PageF, then performance can drop off horribly.  And it's a
>>> trade-off, because having features that interlink pages automatically is
>>> really nifty ad useful - but they have a price.

>>> I'm not really sure what the best solution is.  Me, I profile my IkiWiki builds and try to tweak performance for them... but there's only so much I can do.
>>> --[[KathrynAndersen]]

>>>> IMHO, the best way to get a multithreaded ikiwiki is to rewrite it
>>>> in haskell, using as much pure code as possible. Many avenues
>>>> then would open up to taking advantage of haskell's ability to
>>>> parallize pure code.
>>>>
>>>> With that said, we already have some nice invariants that could be
>>>> used to parallelize page builds. In particular, we know that
>>>> page A never needs state built up while building page B, for any
>>>> pages A and B that don't have a dependency relationship -- and ikiwiki
>>>> tracks such dependency relationships, although not currently in a form
>>>> that makes it very easy (or fast..) to pick out such groups of
>>>> unrelated pages.
>>>> 
>>>> OTOH, there are problems.. building page A can result in changes to
>>>> ikiwiki's state; building page B can result in other changes. All
>>>> such changes would have to be made thread-safely. And would the
>>>> resulting lock contention result in a program that ran any faster
>>>> once parallelized?
>>>> 
>>>> Which is why [[rewrite_ikiwiki_in_haskell]], while pretty insane, is
>>>> something I keep thinking about. If only I had a spare year..
>>>> --[[Joey]]