Effective methods for improving naive Bayes text classifiers

Sang Bum Kim, Hae-Chang Rim, Dong Suk Yook, Heui Seok Lim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

65 Citations (Scopus)

Abstract

Although naive Bayes text classifiers are widely used because of their simplicity, techniques for improving their performance have rarely been studied. In this paper, we propose and evaluate several general and effective techniques for improving the performance of the naive Bayes text classifier. We suggest document-model-based parameter estimation and document length normalization to alleviate problems in the traditional multinomial approach to text classification. In addition, a mutual-information-weighted naive Bayes text classifier is proposed to increase the effect of highly informative words. Our techniques are evaluated on the Reuters-21578 and 20 Newsgroups collections, and significant improvements are obtained over the existing multinomial naive Bayes approach.
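For orientation, the multinomial naive Bayes baseline that the abstract compares against can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `train_multinomial_nb` and `classify`, the Laplace smoothing constant `alpha`, and the toy documents are all assumptions. The optional `normalize_length` flag only averages per-word log-likelihoods over document length as a generic stand-in; it is not the paper's length normalization, and the paper's document-model-based estimation and mutual-information weighting are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log word likelihoods.

    docs is a list of token lists; labels is a parallel list of class names.
    alpha is an assumed smoothing constant (1.0 gives Laplace smoothing).
    """
    vocab = {w for doc in docs for w in doc}
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_like


def classify(doc, log_prior, log_like, normalize_length=False):
    """Return the class with the highest log posterior score for a token list."""
    vocab = next(iter(log_like.values()))
    known = [w for w in doc if w in vocab]  # ignore out-of-vocabulary words
    # Generic illustration only: average the word log-likelihoods so that
    # long documents do not dominate through sheer word count.
    scale = 1.0 / len(known) if normalize_length and known else 1.0
    scores = {
        c: log_prior[c] + scale * sum(log_like[c][w] for w in known)
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Toy data, invented for this sketch.
train_docs = [
    ["wheat", "grain", "export"],
    ["grain", "harvest"],
    ["stock", "shares", "profit"],
    ["profit", "bank", "shares"],
]
train_labels = ["grain", "grain", "finance", "finance"]
log_prior, log_like = train_multinomial_nb(train_docs, train_labels)
print(classify(["grain", "wheat"], log_prior, log_like))
print(classify(["shares", "bank"], log_prior, log_like, normalize_length=True))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, which is the usual reason for this design in multinomial naive Bayes implementations.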

Original language: English
Title of host publication: PRICAI 2002
Subtitle of host publication: Trends in Artificial Intelligence - 7th Pacific Rim International Conference on Artificial Intelligence, Proceedings
Editors: Mitsuru Ishizuka, Abdul Sattar
Publisher: Springer Verlag
Pages: 414-423
Number of pages: 10
ISBN (Print): 3540440380, 9783540440383
DOIs
Publication status: Published - 2002
Event: 7th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2002 - Tokyo, Japan
Duration: 2002 Aug 18 → 2002 Aug 22

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2417
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 7th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2002
Country/Territory: Japan
City: Tokyo
Period: 02/8/18 → 02/8/22

Bibliographical note

Publisher Copyright:
© Springer-Verlag Berlin Heidelberg 2002.

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
