Journal of Management Information Systems

Volume 18 Number 4 2002 pp. 87-100

Contents Matching Defined by Prototypes: Methodology Verification with Books of the Bible

Visa, Ari, Toivonen, Jarmo, Vanharanta, Hannu, and Back, Barbro

ABSTRACT: Text documents are commonly characterized and classified by key words, index terms, or headings. We have developed a new methodology based on prototype matching. The prototype is a document of interest or an extracted passage of interesting text. This prototype is matched against an existing document database or a monitored document flow. The claim is that the new methodology is capable of extracting the contents of the document. To verify this hypothesis, a test with the Bible was designed. Different translations in English, Latin, Greek, and Finnish were selected as test materials. Verification tests, which included a search for the ten nearest books to every book of the Bible, were performed with a prototype version of the software application. The test results are reported in this paper.

Key words and phrases: Bible, document classification, knowledge discovery, methodology, prototype matching, text mining, verification
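
Illustrative sketch: the abstract describes matching a prototype text against a document collection and retrieving the ten nearest books, but it does not specify the document representation or the distance measure. The Python sketch below is therefore only a stand-in, assuming relative word-frequency histograms compared by cosine distance; it should not be read as the authors' method. The function names (`word_histogram`, `cosine_distance`, `nearest_documents`) and the toy texts are hypothetical.

```python
import math
import re
from collections import Counter


def word_histogram(text):
    """Relative word frequencies of a text (an assumed, simplified representation)."""
    # [^\W\d_]+ matches runs of Unicode letters, so Greek or Finnish text also works.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}


def cosine_distance(hist_a, hist_b):
    """1 minus the cosine similarity of two histograms; 1.0 if either is empty."""
    dot = sum(value * hist_b.get(word, 0.0) for word, value in hist_a.items())
    norm_a = math.sqrt(sum(v * v for v in hist_a.values()))
    norm_b = math.sqrt(sum(v * v for v in hist_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def nearest_documents(prototype, documents, k=10):
    """Return the k documents closest to the prototype as (name, distance) pairs."""
    proto_hist = word_histogram(prototype)
    scored = [
        (cosine_distance(proto_hist, word_histogram(text)), name)
        for name, text in documents.items()
    ]
    scored.sort()  # smallest distance first; ties broken by name
    return [(name, distance) for distance, name in scored[:k]]


# Toy collection (hypothetical strings, not actual test material).
toy_documents = {
    "a": "light and darkness light",
    "b": "light and darkness",
    "c": "bread wine fish",
}
ranking = nearest_documents("light darkness light", toy_documents, k=2)
```

In a setting like the one the abstract describes, each book would in turn serve as the prototype and the remaining books of the same translation would form the collection, with `k=10`. Whether the prototype itself is excluded from the collection is left to the caller in this sketch.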