Eliciting Disease Data from Wikipedia Articles
Traditional disease surveillance systems suffer from several disadvantages, including reporting lags and antiquated technology, which have prompted a shift toward internet-based disease surveillance systems. This study examines the use of Wikipedia article content for this purpose. We demonstrate how a named-entity recognizer can be trained to tag case, death, and hospitalization counts in article text. We also show that the articles contain detailed, consistently updated time series data that closely align with ground truth data. We argue that Wikipedia can be used to create the first community-driven, open-source emerging disease detection, monitoring, and repository system.