The zip archive contains the article “Artificial intelligence” from English Wikipedia.

We used WikiExtractor to extract the plain text (ai.en.txt) from the original MediaWiki article (ai.en.xml, in XML format). The part-of-speech tags and dependency trees in ai.en.txt.json were annotated with Stanford CoreNLP. The steps below roughly describe how to obtain ai.en.txt and ai.en.txt.json.

# Extract text from ai.en.xml (the article in XML format).
$ python ai.en.xml
# Prepare ai.en.txt by editing the output from the tool, e.g., text/AA/wiki_00
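
The manual editing step is not documented in detail. As a minimal sketch, assuming the extractor output wraps each article in <doc ...> and </doc> lines (the usual WikiExtractor layout), the wrappers could be removed as follows. The function name strip_doc_wrappers is hypothetical, and any further cleanup applied to the distributed ai.en.txt is not reproduced here.

```python
import re
from pathlib import Path

# Assumption: the extractor writes one '<doc id=... url=... title=...>' line
# before each article and one '</doc>' line after it.
DOC_TAG = re.compile(r"^</?doc\b[^>]*>$")


def strip_doc_wrappers(raw):
    """Return the text with the <doc ...> and </doc> wrapper lines removed."""
    lines = [line for line in raw.splitlines() if not DOC_TAG.match(line.strip())]
    return "\n".join(lines).strip() + "\n"


if __name__ == "__main__":
    source = Path("text/AA/wiki_00")  # example output path from the step above
    if source.exists():
        cleaned = strip_doc_wrappers(source.read_text(encoding="utf-8"))
        Path("ai.en.txt").write_text(cleaned, encoding="utf-8")
```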

# Apply Stanford CoreNLP to ai.en.txt
$ ./ -annotators tokenize,ssplit,pos,lemma,depparse -outputFormat json -file ai.en.txt
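
With -outputFormat json, CoreNLP writes the annotations next to the input as ai.en.txt.json. The following is a rough sketch of reading that file in Python, not part of the original pipeline. The function names are hypothetical, and the dependency key "basicDependencies" is an assumption: it differs between CoreNLP versions (older releases use "basic-dependencies"), so check the keys in your copy of the file.

```python
import json
from pathlib import Path


def load_annotations(path):
    """Read a CoreNLP JSON output file and return the parsed document."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_tokens(doc):
    """Yield (sentence_index, word, lemma, pos) for every token."""
    for sentence in doc["sentences"]:
        for token in sentence["tokens"]:
            yield sentence["index"], token["word"], token["lemma"], token["pos"]


def iter_dependencies(doc, key="basicDependencies"):
    """Yield (sentence_index, relation, governor_word, dependent_word).

    The key name is an assumption; pass another one if your file differs.
    """
    for sentence in doc["sentences"]:
        for edge in sentence.get(key, []):
            yield (sentence["index"], edge["dep"],
                   edge["governorGloss"], edge["dependentGloss"])


if __name__ == "__main__":
    path = Path("ai.en.txt.json")
    if path.exists():
        doc = load_annotations(path)
        for sent_idx, word, lemma, pos in iter_tokens(doc):
            print(sent_idx, word, lemma, pos, sep="\t")
```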

These files are distributed under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported license.