ai.en.txt: the text extracted from the Wikipedia article
ai.en.txt.json: the text annotated with dependency trees (in JSON format)
We used WikiExtractor to extract the text ai.en.txt from the original MediaWiki article (in XML format). The part-of-speech tags and dependency trees were annotated by Stanford CoreNLP. This is a rough explanation of how to obtain the files:
# Extract text from ai.en.xml (the article in XML format).
$ python WikiExtractor.py ai.en.xml
# Prepare ai.en.txt by editing the output from the tool, e.g., text/AA/wiki_00
# Apply Stanford CoreNLP to ai.en.txt
$ ./corenlp.sh -annotators tokenize,ssplit,pos,lemma,depparse -outputFormat json -file ai.en.txt
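Once ai.en.txt.json exists, the annotations can be read with any JSON library. The following is a minimal sketch in Python, not part of the original tooling. It assumes the usual layout of CoreNLP's JSON output (a top-level "sentences" list whose items carry "tokens" and "basicDependencies"); the exact keys may differ between CoreNLP versions, and the function names are hypothetical.

```python
import json


def load_annotations(path="ai.en.txt.json"):
    """Load the CoreNLP JSON annotations from disk (path is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_tokens(doc):
    """Yield (word, lemma, pos) for every token in every sentence."""
    for sentence in doc.get("sentences", []):
        for token in sentence.get("tokens", []):
            yield token["word"], token["lemma"], token["pos"]


def iter_dependencies(doc, key="basicDependencies"):
    """Yield (relation, head_word, dependent_word) for every dependency edge.

    `key` selects which dependency representation to read; the default
    name is assumed from CoreNLP's JSON output and may vary by version.
    """
    for sentence in doc.get("sentences", []):
        for edge in sentence.get(key, []):
            yield edge["dep"], edge["governorGloss"], edge["dependentGloss"]
```

For example, list(iter_dependencies(load_annotations())) collects every dependency edge in the article as (relation, head, dependent) triples, and iter_tokens gives the part-of-speech tags in reading order.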
These files are distributed under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported license.