Journal of Management Information Systems

Volume 19 Number 4 2003 pp. 191-212

Generating and Browsing Multiple Taxonomies Over a Document Collection

Spangler, Scott; Kreulen, Jeffrey T.; and Lessler, Justin

ABSTRACT: We present a novel system and methodology for generating and then browsing multiple taxonomies over a document collection. Taxonomies are generated using a broad set of capabilities, including metadata, keyword queries, and automated clustering techniques that produce a seed taxonomy. The taxonomy editor, eClassifier, provides powerful tools to visualize and edit each taxonomy so that it reflects the desired theme. Cluster validation tools allow the editor to verify that documents received in the future can be automatically classified into each taxonomy with sufficiently high accuracy. In general, those seeking knowledge from a document collection may have only a vague notion of what they are attempting to understand, and would like to explore related topics and concepts rather than simply being given a set of documents. For this purpose, we have developed MindMap, an interface that uses multiple taxonomies to let users interact with a document collection.

Key words and phrases: data mining, document classification, document clustering techniques, knowledge management, navigation, taxonomy, text mining, visualization