anonymous@got.thinkberg.com:thinkberg/wiki2text.git
13 years ago 1c74bb2758 Matthias L. Jugel
cleanup of libraries and dependencies (master)
13 years ago f9873d7354 Matthias L. Jugel
handling of random length text lines from wikipedia dumps
13 years ago f942969bc9 Matthias L. Jugel
initial commit
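This simple project converts Wikipedia dumps to plain text.

Usage:

  java -Xmx2G -Dfile.encoding=UTF-8 -jar wiki2text-1.0-jar-with-dependencies.jar nlwiki-20120203-pages-articles.xml.bz2 > nl.txt

The command takes the compressed dump file as its argument and writes the converted text to standard output, redirected here to nl.txt. -Xmx2G raises the maximum JVM heap to 2 GB, and -Dfile.encoding=UTF-8 sets the JVM default character encoding to UTF-8.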
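To give a rough idea of what reading such a dump involves, here is a minimal sketch in Java. It is not this project's implementation: the class name DumpTextSketch and the method forEachText are hypothetical, it expects an already decompressed XML dump on standard input, and it only pulls out the raw content of each <text> element using the JDK's StAX parser. The output is still wiki markup, so a real converter would need a further step to strip that markup.

```java
import java.io.InputStream;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch, not the wiki2text code.
public class DumpTextSketch {

    /**
     * Streams through a decompressed MediaWiki XML dump and passes the raw
     * content of every <text> element to the given sink, one page at a time,
     * so the whole dump is never held in memory.
     */
    static void forEachText(InputStream in, Consumer<String> sink) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                // getLocalName() ignores the export namespace used by real dumps
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "text".equals(reader.getLocalName())) {
                    // Reads the character content up to the matching end tag
                    sink.accept(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Prints each page's wiki markup to standard output
        forEachText(System.in, System.out::println);
    }
}
```

Assuming a Java 11 or later runtime and a bzcat binary on the path, the sketch could be tried with something like: bzcat nlwiki-20120203-pages-articles.xml.bz2 | java -Dfile.encoding=UTF-8 DumpTextSketch.java > nl-raw.txt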