conText: experimenting with Lucene 'MoreLikeThis' term extraction and Wikipedia as a URI-based controlled vocabulary

Wikipedia-powered content tagger

Add a paragraph or more of text to the box below, then hit submit to get a list of Concepts (each represented unambiguously by a Wikipedia URI) related to that text. (BTW, there's a bug with double and single quotes, so it's best to strip quotes out of your text until that gets sorted.)
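If you're preparing text outside the browser, the quote workaround can be sketched in a few lines of Python. This is only a rough sketch, not part of the tagger itself: the `strip_quotes` name is hypothetical, and including curly "smart" quotes alongside the straight ones is an assumption (the bug is only known to involve double and single quotes).

```python
# Characters to remove before submitting text. The straight quotes come from
# the bug note above; the curly variants are an assumption added for text
# pasted from word processors.
QUOTE_CHARS = "\"'\u201c\u201d\u2018\u2019"

# Translation table that maps every quote character to None (deletes it).
_STRIP_QUOTES = str.maketrans("", "", QUOTE_CHARS)


def strip_quotes(text: str) -> str:
    """Return text with all double and single quote characters removed."""
    return text.translate(_STRIP_QUOTES)


if __name__ == "__main__":
    sample = 'The "BBC" said it\'s a \u2018test\u2019.'
    print(strip_quotes(sample))
```

Note that this removes apostrophes too, so "it's" becomes "its". That loses a little information, but it keeps every single-quote character out of the submitted text.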

optional -- Use the text above to disambiguate a specific concept:
add term extraction heuristic (to find more proper names): no / yes
Find plausibly canonical URIs: no / yes
Sort Concepts by: Relevance / Alphabetical Order
Return XML doc?: No / Yes

The version of the Wikipedia dump in use here is from circa September 2007 -- I'll try to update to a more recent dump soon.

chris sizemore made this, with advice, input, and inspiration from many -- including rachel lovinger, silver oliver, & matt wood

questions, comments, constructive criticism? email me at onpauseATaolDOTcom