Abstract
Though naive Bayes text classifiers are widely used because of their simplicity, techniques for improving their performance have rarely been studied. In this paper, we propose and evaluate some general and effective techniques for improving the performance of the naive Bayes text classifier. We suggest document-model-based parameter estimation and document length normalization to alleviate the problems of the traditional multinomial approach to text classification. In addition, a mutual-information-weighted naive Bayes text classifier is proposed to increase the effect of highly informative words. Our techniques are evaluated on the Reuters-21578 and 20 Newsgroups collections, and significant improvements are obtained over the existing multinomial naive Bayes approach.
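For orientation, the multinomial naive Bayes baseline that the abstract compares against can be sketched roughly as follows. This is a minimal illustration of the standard textbook formulation with Laplace smoothing, not the authors' implementation. The function names, the smoothing constant `alpha`, and the toy documents are assumptions made for this example. The document-model-based estimation, length normalization, and mutual-information weighting proposed in the paper are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log word likelihoods.

    docs: list of token lists; labels: list of class labels.
    alpha is the smoothing constant (an assumption, not from the paper).
    """
    vocab = {w for doc in docs for w in doc}
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        # Pool all word occurrences of a class, as the multinomial model does.
        word_counts[label].update(doc)

    log_prior = {c: math.log(n / len(docs)) for c, n in class_doc_counts.items()}
    log_likelihood = {}
    for c in class_doc_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_likelihood


def predict(doc, log_prior, log_likelihood):
    """Return the class with the highest posterior log score."""
    scores = {}
    for c in log_prior:
        # Words outside the training vocabulary are skipped.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in doc if w in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Toy data (hypothetical, for illustration only).
docs = [["wheat", "grain", "export"], ["stock", "shares", "profit"],
        ["grain", "harvest"], ["profit", "dividend", "shares"]]
labels = ["grain", "earn", "grain", "earn"]
log_prior, log_likelihood = train_multinomial_nb(docs, labels)
print(predict(["grain", "export", "wheat"], log_prior, log_likelihood))
```

Because every word occurrence contributes one log-likelihood term to the score, longer documents produce more extreme scores in this baseline, which is the kind of behavior that length normalization and word weighting are meant to address.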
Original language | English |
---|---|
Title of host publication | PRICAI 2002 |
Subtitle of host publication | Trends in Artificial Intelligence - 7th Pacific Rim International Conference on Artificial Intelligence, Proceedings |
Editors | Mitsuru Ishizuka, Abdul Sattar |
Publisher | Springer Verlag |
Pages | 414-423 |
Number of pages | 10 |
ISBN (Print) | 3540440380, 9783540440383 |
Publication status | Published - 2002 |
Event | 7th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2002 - Tokyo, Japan. Duration: 2002 Aug 18 → 2002 Aug 22 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 2417 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Other
Other | 7th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2002 |
---|---|
Country/Territory | Japan |
City | Tokyo |
Period | 02/8/18 → 02/8/22 |
Bibliographical note
Publisher Copyright: © Springer-Verlag Berlin Heidelberg 2002.
ASJC Scopus subject areas
- Theoretical Computer Science
- Computer Science (all)