substantially expand the mediawiki tip with some of the steps. More to come

author: Jon Dowland <jon@alcopop.org> 2009-10-16 10:44:14 +0100
committer: Jon Dowland <jon@alcopop.org> 2009-10-16 10:44:14 +0100
commit: 0a2e4e167dc0a6b9fc0b038c4174694117b74628 (patch)
tree: 02fd2d9039ccadd737a156505dbdcb8a63d504ef /doc/tips/convert_mediawiki_to_ikiwiki.mdwn
parent: ab68f96494409e4ce8f689b5c27ef9ea3a73172c (diff)
download: ikiwiki-0a2e4e167dc0a6b9fc0b038c4174694117b74628.tar
ikiwiki-0a2e4e167dc0a6b9fc0b038c4174694117b74628.tar.gz
1 files changed, 97 insertions, 4 deletions
diff --git a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn
index f03703b46..c522eaec3 100644
--- a/doc/tips/convert_mediawiki_to_ikiwiki.mdwn
+++ b/doc/tips/convert_mediawiki_to_ikiwiki.mdwn
@@ -1,4 +1,97 @@
-[[sabr]] explains how to [import MediaWiki content into
-git](http://u32.net/Mediawiki_Conversion/index.html?updated), including
-full edit hostory. The [[plugins/contrib/mediawiki]] plugin can then be
-used by ikiwiki to build the wiki.
+Mediawiki is a dynamically-generated wiki which stores its data in a
+relational database. Pages are marked up using Mediawiki's own wiki markup
+language. It is possible to import the contents of a Mediawiki site into
+ikiwiki, converting some of the Mediawiki conventions into ikiwiki ones.
+
+The following instructions describe ways of obtaining the current version of
+the wiki. We do not yet cover importing the history of edits.
+
+## Step 1: Getting a list of pages
+
+The first bit of information you require is a list of pages in the Mediawiki.
+There are several ways of obtaining it.
+
+### Parsing the output of `Special:Allpages`
+
+Every Mediawiki site has a special page called `Special:Allpages`, which
+lists all the pages in a given namespace on the wiki.
+
+If you fetch the output of this page to a local file with something like
+
+    wget -q -O tmpfile 'http://your-mediawiki/wiki/Special:Allpages'
+
+you can extract the list of page names using the following Python script. Note
+that this script is sensitive to the specific markup used on the page, so if
+you have customised your Mediawiki skin heavily, you will need to adjust the
+script too:
+
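+    import sys
+    from xml.dom.minidom import parse
+    
+    # Parse the saved Special:Allpages output (it must be well-formed XHTML).
+    dom = parse(sys.argv[1])
+    # Assume the list of pages is in the last <table> on the page.
+    tables = dom.getElementsByTagName("table")
+    pagetable = tables[-1]
+    anchors = pagetable.getElementsByTagName("a")
+    for a in anchors:
+        # toxml() re-escapes the link text; undo that, handling &amp; last
+        # so that a title containing a literal "&lt;" is not mangled.
+        print(a.firstChild.toxml()
+              .replace('&lt;', '<')
+              .replace('&gt;', '>')
+              .replace('&quot;', '"')
+              .replace('&amp;', '&'))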
+
+If some of your page titles contain other characters that are escaped in the
+page's HTML, you may need to add further processing to the final `print`
+statement.
+
+### Querying the database
+
+If you have access to the relational database in which your Mediawiki data is
+stored, you can derive a list of page names from it.
+
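As a minimal sketch, assuming the standard Mediawiki schema (a `page` table with `page_namespace`, `page_title` and `page_is_redirect` columns, and no table prefix), the query below lists the non-redirect pages in the main namespace (namespace 0). Mediawiki stores titles with underscores in place of spaces, so the sketch turns them back into spaces to match the output of the `Special:Allpages` script. The function name `list_page_titles` is hypothetical; it accepts any DB-API connection, and the in-memory `sqlite3` database used to try it out is only a stand-in for the real (usually MySQL) database.

```python
import sqlite3

# Assumption: standard Mediawiki `page` table, no table prefix configured.
QUERY = """
    SELECT page_title FROM page
    WHERE page_namespace = 0 AND page_is_redirect = 0
    ORDER BY page_title
"""


def list_page_titles(conn):
    """Return main-namespace, non-redirect page titles with spaces restored.

    `conn` is any DB-API connection (hypothetical helper, not part of Mediawiki).
    """
    cur = conn.cursor()
    cur.execute(QUERY)
    titles = []
    for (title,) in cur.fetchall():
        # Some Mediawiki versions store titles as binary strings.
        if isinstance(title, bytes):
            title = title.decode("utf-8")
        titles.append(title.replace("_", " "))
    return titles


if __name__ == "__main__":
    # Stand-in database with a few rows, only so the example runs on its own.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (page_namespace INTEGER, "
                 "page_title TEXT, page_is_redirect INTEGER)")
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
        (0, "Main_Page", 0),
        (0, "Old_Name", 1),      # a redirect: skipped
        (14, "Examples", 0),     # Category namespace: skipped
        (0, "Rock_&_Roll", 0),
    ])
    for t in list_page_titles(conn):
        print(t)
```

Against a live wiki you would pass in a connection from whichever DB-API driver you use for MySQL instead of `sqlite3`.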
+## Step 2: Fetching the page data
+
+Once you have a list of page names, you can fetch the data for each page.
+
+### Method 1: via HTTP and `action=raw`
+
+For each page title you need to derive two strings: the destination path for
+the page and the source URL. Assuming `$pagename` contains a page name
+obtained above, and `$wiki` contains the URL of your Mediawiki's `index.php`
+file:
+
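+    # spaces become underscores; '&' must be percent-encoded in a URL
+    src=$(echo "$pagename" | tr ' ' _ | sed 's,&,%26,g')
+    # ikiwiki file names encode '&' as __38__
+    dest=$(echo "$pagename" | tr ' ' _ | sed 's,&,__38__,g')
+    
+    mkdir -p "$(dirname "$dest")"
+    wget -q "$wiki?title=$src&action=raw" -O "$dest"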
+
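The same derivation can be sketched in Python, where the standard library percent-encodes every special character in the title (`&`, `?`, `#`, `+` and so on) rather than only `&`. This is a minimal sketch under assumptions: `raw_url`, `dest_path` and `fetch_page` are hypothetical helper names, `wiki` is the URL of `index.php` as above, and, like the shell version, only spaces and `&` are translated in the destination path (other characters may need the same `__NN__` treatment, where NN is the decimal character code).

```python
import os
from urllib.parse import urlencode
from urllib.request import urlopen


def raw_url(wiki, pagename):
    """Build the action=raw URL for a page title (hypothetical helper)."""
    # urlencode percent-encodes '&', '?', '#', '+' etc. inside the title.
    query = urlencode({"title": pagename.replace(" ", "_"), "action": "raw"})
    return wiki + "?" + query


def dest_path(pagename):
    """Local file name, mirroring the shell example: ' ' -> '_', '&' -> '__38__'."""
    return pagename.replace(" ", "_").replace("&", "__38__")


def fetch_page(wiki, pagename):
    """Download one page's wikitext and save it under dest_path(pagename)."""
    dest = dest_path(pagename)
    parent = os.path.dirname(dest)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with urlopen(raw_url(wiki, pagename)) as resp, open(dest, "wb") as out:
        out.write(resp.read())
```

For example, `fetch_page("http://your-mediawiki/index.php", "Rock & Roll")` would request `...?title=Rock_%26_Roll&action=raw` and write the result to `Rock___38___Roll`.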
+### Method 2: via HTTP and `Special:Export`
+
+Mediawiki also has a special page `Special:Export` which can be used to obtain
+the source of a page together with other metadata, such as the last
+contributor or the full edit history.
+
+You need to send a `POST` request to the `Special:Export` page; look at the
+form in the page returned by a plain `GET` to determine the correct arguments.
+
+You will then need to parse the resulting XML to extract the data you need.
+
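A minimal sketch of that parsing step, using Python's standard `xml.etree.ElementTree` and assuming the usual export layout (`<page>` elements containing a `<title>` and one or more `<revision>` elements, each with a `<text>`). The export namespace URI differs between Mediawiki versions, so the sketch ignores namespaces when matching tags. The function name `pages_from_export` is hypothetical, and sending the `POST` request itself is not shown.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def pages_from_export(xml_source):
    """Yield (title, wikitext) for each <page>, using its last <revision>.

    `xml_source` is a file name or binary file object holding the
    Special:Export output (hypothetical helper).
    """
    for _, elem in ET.iterparse(xml_source):
        if local(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem:
            if local(child.tag) == "title":
                title = child.text
            elif local(child.tag) == "revision":
                # Later revisions overwrite earlier ones, so the last one wins.
                for field in child:
                    if local(field.tag) == "text":
                        text = field.text or ""
        yield title, text
        elem.clear()  # free memory when processing large exports
```

If you requested the full history, this keeps only the newest revision of each page; collecting every `<revision>` instead would be the starting point for importing history.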
+### Method 3: via the database
+
+It is possible to extract the page data from the database with some
+well-crafted queries.
+
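For example, in the schema used by Mediawiki releases of this period, the current wikitext of a page is found by joining `page.page_latest` to `revision.rev_id`, then `revision.rev_text_id` to `text.old_id`; the markup itself is in `text.old_text`. This sketch assumes that schema with no table prefix, and that `text.old_flags` does not mark the text as compressed (`gzip`) or externally stored (`external`); such rows would need extra handling. As before, an in-memory `sqlite3` database can stand in for the real one when trying it out, and `current_wikitext` is a hypothetical name.

```python
import sqlite3

# Assumed Mediawiki schema (no table prefix): page -> revision -> text.
QUERY = """
    SELECT p.page_title, t.old_text, t.old_flags
    FROM page p
    JOIN revision r ON r.rev_id = p.page_latest
    JOIN text t ON t.old_id = r.rev_text_id
    WHERE p.page_namespace = 0
"""


def _to_str(value):
    """Decode binary columns (common with MySQL) to str; map NULL to ''."""
    if value is None:
        return ""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def current_wikitext(conn):
    """Return {title: wikitext} for the latest revision of each main-namespace page."""
    cur = conn.cursor()
    cur.execute(QUERY)
    pages = {}
    for title, text, flags in cur.fetchall():
        title, text, flags = _to_str(title), _to_str(text), _to_str(flags)
        # Compressed or externally stored text is not handled in this sketch.
        if {"gzip", "external"} & set(flags.split(",")):
            raise ValueError("unsupported text storage for %r: %s" % (title, flags))
        pages[title.replace("_", " ")] = text
    return pages
```

Each value in the returned dictionary can then be written to a file named as in Method 1.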
+## Step 3: Format conversion
+
+The next step is to convert Mediawiki conventions into ikiwiki ones. These
+include:
+
+ * convert Categories into tags
+ * ...
+
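As one example of such a conversion, the sketch below rewrites Mediawiki category links such as `[[Category:Foo Bar|sort key]]` into ikiwiki `[[!tag ...]]` directives. Assumptions: the sort key after `|` is simply dropped, spaces in category names become underscores (because the `tag` directive separates tags with whitespace), links written with a leading colon (`[[:Category:Foo]]`, which link to a category page rather than add the page to it) are left alone, and the function name `categories_to_tags` is hypothetical.

```python
import re

# Matches [[Category:Name]] or [[Category:Name|sort key]], case-insensitively.
# A leading colon ([[:Category:...]]) makes it an ordinary link, so it is not matched.
CATEGORY_RE = re.compile(
    r"\[\[\s*category\s*:\s*([^|\]]+?)\s*(?:\|[^\]]*)?\]\]", re.IGNORECASE)


def categories_to_tags(wikitext):
    """Replace Mediawiki category links with ikiwiki tag directives."""
    def to_tag(match):
        # Tag names cannot contain spaces, so use underscores instead.
        name = match.group(1).replace(" ", "_")
        return "[[!tag %s]]" % name
    return CATEGORY_RE.sub(to_tag, wikitext)
```

For instance, `categories_to_tags("[[Category:Foo Bar|x]]")` returns `[[!tag Foo_Bar]]`.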
+## External links
+
+[[sabr]] used to explain how to [import MediaWiki content into
+git](http://u32.net/Mediawiki_Conversion/index.html?updated), including full
+edit history, but as of 2009/10/16 that site is not available.
+
+Once the pages have been imported, the [[plugins/contrib/mediawiki]] plugin
+can be used by ikiwiki to render pages that are still in Mediawiki markup.