Tree: 67fab2e2b4ecea81f965f83be61e6646b6da677d
Message: handling of random length text lines from wikipedia dumps
Files:
- README
- pom.xml
- src/
README
This simple project converts Wikipedia dumps to plain text.

Usage:

    java -Xmx2G -Dfile.encoding=UTF-8 -jar wiki2text-1.0-jar-with-dependencies.jar nlwiki-20120203-pages-articles.xml.bz2 > nl.txt
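
The project's source is not shown here, so the following is only a rough sketch of how a dump-to-text conversion can be approached, not the code behind the jar. It assumes the standard MediaWiki export layout, in which article bodies sit inside <text> elements, and streams the XML with the JDK's StAX parser so that a large dump never has to fit in memory. The class name DumpTextSketch and the method extractText are made up for illustration. Decompressing the .bz2 file and stripping wiki markup are both left out.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the project: collects the raw content
// of every <text> element found in a MediaWiki-style XML stream.
public class DumpTextSketch {

    static List<String> extractText(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to merge adjacent character events where it can.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(in);

        List<String> texts = new ArrayList<>();
        StringBuilder current = null; // non-null only while inside <text>

        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && "text".equals(reader.getLocalName())) {
                current = new StringBuilder();
            } else if (current != null
                    && (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA)) {
                // Append rather than assign: text may still arrive in pieces.
                current.append(reader.getText());
            } else if (current != null
                    && event == XMLStreamConstants.END_ELEMENT
                    && "text".equals(reader.getLocalName())) {
                texts.add(current.toString());
                current = null;
            }
        }
        reader.close();
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory stand-in for a real (decompressed) dump.
        String xml = "<mediawiki><page><title>Voorbeeld</title>"
                + "<revision><text>Eerste regel.\nTweede regel.</text></revision>"
                + "</page></mediawiki>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (String text : extractText(in)) {
            System.out.println(text);
        }
    }
}
```

A real converter would print each article as it is encountered instead of collecting everything into a list; the list is used here only to keep the sketch easy to check.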
