Data dumps/xml2sql
This page is kept for historical interest. Any policies mentioned may be obsolete. If you want to revive the topic, you can use the talk page or start a discussion on the community forum.
NOTE: This is not the recommended method of importing XML dumps.
See mw:Manual:Importing XML dumps for an overview.
xml2sql is a tool that converts the XML dumps available at http://download.wikimedia.org/ into SQL dumps that can be imported with mysql, mysqlimport or psql.
The tool is written in ANSI C and needs expat and zlib to compile. It was developed on Linux and also works on FreeBSD, NetBSD, Mac OS X and Windows. Feel free to use it. :)
Download
- xml2sql-0.5.tar.gz (source code) 2006-02-08
- MD5SUM: 8a1d905636900e3ea07055dd645276f8
- SHA1SUM: ad4ccb37ccbef1a682a86e4b929b43ac0f578744
- xml2sql-0.5-win32.zip (win32 executable) 2006-02-08
- MD5SUM: 9665424dc6d6f5abf6241298e727a5a3
- SHA1SUM: 403bc96a1f679259bcd904f7c9c9bae92252a266
- GitHub: mediawiki-xml2sql
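To check a download against the sums listed above, the following is a minimal sketch that wraps md5sum and sha1sum. It assumes GNU coreutils (on BSD or Mac OS X the equivalent tools are md5 and shasum), and verify_download is a hypothetical helper name, not part of xml2sql.

```shell
#!/bin/sh
# verify_download FILE MD5 SHA1
# Hypothetical helper: returns 0 only if FILE matches both checksums.
# Assumes GNU coreutils: with "-c -", md5sum/sha1sum read "HASH  FILE"
# lines (two spaces) from standard input; --status suppresses output.
verify_download() {
    file=$1 md5=$2 sha1=$3
    printf '%s  %s\n' "$md5" "$file" | md5sum -c --status - || return 1
    printf '%s  %s\n' "$sha1" "$file" | sha1sum -c --status - || return 1
}

# Example:
#   verify_download xml2sql-0.5.tar.gz \
#       8a1d905636900e3ea07055dd645276f8 \
#       ad4ccb37ccbef1a682a86e4b929b43ac0f578744 && echo OK
```

Checking both sums costs little, and a mismatch in either one stops the function with a non-zero status.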
Patch for recent versions of MediaWiki (>= 1.10)
Since MediaWiki 1.10 the revision table has two new columns, rev_len and rev_parent_id, so the revision output of xml2sql 0.5 no longer matches the table layout. Apply this patch, which writes NULL for both new columns, to make it work again:
--- xml2sql-0.5/xml2sql.c 2008-01-16 15:32:28.000000000 +0100
+++ xml2sql-0.5 (2)/xml2sql.c 2008-02-17 15:06:34.000000000 +0100
@@ -741,6 +741,10 @@
putcolumnf(&rev_tbl, "%d", revision.minor);
/* rev_deleted */
putcolumn(&rev_tbl, "0", 0);
+
+ putcolumn(&rev_tbl, "NULL", 0);
+ putcolumn(&rev_tbl, "NULL", 0);
+
finrecord(&rev_tbl);
if(page.lastts == 0 || strcmp(page.lastts, revision.timestamp) < 0) {
Install
*nix, MacOS
The source package contains a standard configure script: unpack the archive, run configure, then run make. (On *BSD, you may need to add the --with-expat=/usr/local option to configure.)
(On Debian etch you need the packages gcc, libc6-dev, expat and libexpat1-dev.)
$ ./configure
$ make
# make install
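The *BSD hint above can be folded into a small sketch that adds --with-expat=/usr/local only when uname reports a system name ending in "BSD". The helper name configure_args is hypothetical, and /usr/local is simply the path this page suggests; adjust it if expat is installed elsewhere.

```shell
#!/bin/sh
# configure_args: hypothetical helper that prints extra configure options.
# /usr/local is the expat location suggested on this page for *BSD;
# it is an assumption and may differ on your system.
configure_args() {
    case $(uname -s) in
        *BSD) echo "--with-expat=/usr/local" ;;
        *)    : ;;  # other systems: no extra options
    esac
}

# Usage, from the unpacked xml2sql-0.5 directory:
#   ./configure $(configure_args) && make
```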
Windows
A Win32 executable is available. Download it and unzip it.
Easy to use
$ wget http://download.wikimedia.org/enwiki/20080103/enwiki-20080103-pages-meta-current.xml.bz2
$ bunzip2 -c enwiki-20080103-pages-meta-current.xml.bz2 | xml2sql
$ mysqlimport -u root -p --local dbname `pwd`/{page,revision,text}.txt
Note: The last command only works if the database already contains the MediaWiki tables (page, revision and text). The easiest way to create them is to install MediaWiki before running the import.
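The conversion and import steps can be wrapped in one function that first checks, as the note says, that the MediaWiki tables exist. This is a sketch under stated assumptions: import_dump is a hypothetical name, the wiki uses no table prefix, and MySQL credentials come from an option file such as ~/.my.cnf, so the -u/-p flags from the example are left out.

```shell
#!/bin/sh
# import_dump DUMP.xml.bz2 DB
# Hypothetical helper. Assumes DB is an existing MediaWiki database with
# no table prefix and that mysql/mysqlimport find credentials on their own.
import_dump() {
    dump=$1 db=$2
    # Refuse to import if MediaWiki has not created the tables yet.
    for t in page revision text; do
        if [ -z "$(mysql -N -e "SHOW TABLES LIKE '$t'" "$db")" ]; then
            echo "table '$t' missing in $db; install MediaWiki first" >&2
            return 1
        fi
    done
    # Default (mysqlimport) format: writes page.txt, revision.txt and
    # text.txt to the current directory.
    bunzip2 -c "$dump" | xml2sql || return 1
    # mysqlimport takes each table name from the file name (page.txt -> page).
    mysqlimport --local "$db" "$PWD/page.txt" "$PWD/revision.txt" "$PWD/text.txt"
}

# Example:
#   import_dump enwiki-20080103-pages-meta-current.xml.bz2 wikidb
```

Checking the tables first means a missing schema is reported before xml2sql spends time converting a large dump.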
Windows
The GUI frontend can decompress gzip, bzip2 and 7-zip archives. Run xml2sql-fe.exe, choose the XML file and the options, optionally choose an output directory, and then press the "START!!" button.
Reference
Usage: xml2sql [options]... [XMLFILE]
Reads a MediaWiki XML dump from XMLFILE (or from standard input) and writes an SQL dump for MediaWiki 1.5 or later.
Options
Option | Description |
---|---|
-i, --import | mysqlimport format (default). Output files are page.txt, revision.txt and text.txt; import them with the mysqlimport program. |
-m, --mysql | MySQL INSERT format. Output files are page.sql, revision.sql and text.sql; import them with the mysql program. |
-p, --postgresql[=version] | PostgreSQL COPY format. Output files are page.sql, revision.sql and text.sql. If version is omitted, PostgreSQL 8.0 or earlier is assumed. Import them with the psql program. |
-c, --compress[={old,full}] | Compress the text table with deflate (default: old). Ignored for PostgreSQL output, because PostgreSQL compresses table data itself. |
-r, --renumber | Renumber page IDs and revision IDs. |
-N, --namespace=ns,ns,... | Output only the specified namespaces. Namespaces can be given by number or by name. |
-t, --no-text | Do not output the text table. |
-o, --output-dir=OUTDIR | Output directory (default: current directory). |
-t, --tmpdir=TMPDIR | Directory for temporary files (default: OUTDIR). A temporary file is used only with --compress=old. |
-v, --verbose | Show progress. |
-h, --help | Display help and exit. |
--version | Display version information and exit. |
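With -m or -p, the three .sql files are loaded with the mysql or psql client instead of mysqlimport. The sketch below loads them in the order page, revision, text. load_sql_files is a hypothetical helper; it assumes the database already exists with the MediaWiki tables, and skipping a missing file (for example after --no-text) assumes that option simply leaves text.sql out.

```shell
#!/bin/sh
# load_sql_files FORMAT DB [DIR]
# Hypothetical helper. FORMAT is "mysql" (output of xml2sql -m) or
# "postgresql" (output of xml2sql -p); DIR is the -o directory (default: .).
load_sql_files() {
    format=$1 db=$2 dir=${3:-.}
    for t in page revision text; do
        # Skip files that were not written (assumed: text.sql with --no-text).
        [ -f "$dir/$t.sql" ] || continue
        case $format in
            mysql)      mysql "$db" < "$dir/$t.sql" || return 1 ;;
            postgresql) psql -q -d "$db" -f "$dir/$t.sql" || return 1 ;;
            *) echo "unknown format: $format" >&2; return 1 ;;
        esac
    done
}

# Example: main (0) and Template (10) namespaces only, PostgreSQL output.
#   mkdir -p out
#   xml2sql -p -N 0,10 -o out dump.xml
#   load_sql_files postgresql wikidb out
```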
COPYRIGHT
xml2sql, MediaWiki XML to SQL converter.
Copyright © Tietew.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
See also
- Data dumps - database dump download and import.