Commit Briefs

1c74bb2758 Matthias L. Jugel

cleanup of libraries and dependencies (master)


f9873d7354 Matthias L. Jugel

handling of random length text lines from wikipedia dumps


f942969bc9 Matthias L. Jugel

initial commit


Branches

Tags

This repository contains no tags

Tree

README
pom.xml
src/

README

This simple project converts Wikipedia dumps to plain text.

Usage:

    java -Xmx2G -Dfile.encoding=UTF-8 -jar wiki2text-1.0-jar-with-dependencies.jar nlwiki-20120203-pages-articles.xml.bz2 > nl.txt
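
To illustrate the kind of processing such a converter involves, here is a minimal sketch of pulling the raw article text out of a MediaWiki XML export using the JDK's StAX parser. It is not this project's actual implementation: the class name DumpTextSketch and the method extractTexts are hypothetical, the input is assumed to be already decompressed (reading .bz2 directly would need a third-party library, which is not shown), and the extracted text is still wiki markup that would need further stripping to become plain text.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch, not the project's code: streams over a MediaWiki
// export and collects the content of every <text> element.
public class DumpTextSketch {

    static List<String> extractTexts(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> texts = new ArrayList<>();
        StringBuilder current = null; // non-null only while inside a <text> element

        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && "text".equals(reader.getLocalName())) {
                current = new StringBuilder();
            } else if (current != null
                    && (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA
                        || event == XMLStreamConstants.SPACE)) {
                // Character data may arrive in several chunks, so append each one.
                current.append(reader.getText());
            } else if (current != null
                    && event == XMLStreamConstants.END_ELEMENT
                    && "text".equals(reader.getLocalName())) {
                texts.add(current.toString());
                current = null;
            }
        }
        reader.close();
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Tiny made-up sample standing in for a real dump.
        String xml = "<mediawiki><page><title>A</title><revision>"
                + "<text>Hello '''world'''</text></revision></page></mediawiki>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (String text : extractTexts(in)) {
            System.out.println(text);
        }
    }
}
```

Streaming with StAX rather than building a DOM keeps memory use roughly proportional to the largest single article, which matters for multi-gigabyte dumps.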