Tree: 67fab2e2b4ecea81f965f83be61e6646b6da677d
Message: handling of random length text lines from wikipedia dumps
Files: README, pom.xml, src/
README
This simple project converts Wikipedia dumps to plain text.

Usage:

    java -Xmx2G -Dfile.encoding=UTF-8 -jar wiki2text-1.0-jar-with-dependencies.jar nlwiki-20120203-pages-articles.xml.bz2 > nl.txt
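
The README does not describe how the conversion works internally. As a rough illustration only, the sketch below shows one way the raw article text could be pulled out of a dump using the JDK's StAX parser. It assumes the dump follows the MediaWiki XML export layout, where each page's wikitext sits inside a <text> element. The class name DumpTextSketch and the method extractTexts are hypothetical and not taken from this project. Decompressing the .bz2 file and stripping wiki markup are left out.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch, not this project's actual code.
class DumpTextSketch {

    // Streams through the XML and collects the content of every <text> element.
    // Assumption: input uses the MediaWiki export layout
    // (<page><revision><text>...</text></revision></page>).
    // A real converter would likely write each text out immediately
    // instead of holding all of them in memory.
    static List<String> extractTexts(Reader in) {
        List<String> texts = new ArrayList<>();
        try {
            XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                while (xml.hasNext()) {
                    if (xml.next() == XMLStreamConstants.START_ELEMENT
                            && "text".equals(xml.getLocalName())) {
                        // getElementText() reads up to the matching end tag.
                        texts.add(xml.getElementText());
                    }
                }
            } finally {
                xml.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed dump XML", e);
        }
        return texts;
    }

    public static void main(String[] args) {
        String sample = "<mediawiki><page><title>Voorbeeld</title>"
                + "<revision><text>Hallo [[wereld]]</text></revision>"
                + "</page></mediawiki>";
        for (String text : extractTexts(new StringReader(sample))) {
            System.out.println(text); // still contains wiki markup such as [[...]]
        }
    }
}
```

Streaming matters here because full dumps are far larger than the 2 GB heap given in the usage line, so a DOM-style parse of the whole file would not fit in memory.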