Some effective techniques for naive Bayes text classification

Sang Bum Kim, Kyoung Soo Han, Hae Chang Rim

Research output: Contribution to journal › Article › peer-review

399 Citations (Scopus)

Abstract

While naive Bayes is quite effective in various data mining tasks, it shows disappointing results on the automatic text classification problem. By observing how naive Bayes behaves on natural language text, we found a serious problem in the parameter estimation process that causes poor results in the text classification domain. In this paper, we propose two empirical heuristics: per-document text normalization and a feature weighting method. While these are somewhat ad hoc methods, our proposed naive Bayes text classifier performs very well on the standard benchmark collections, competing with state-of-the-art text classifiers based on highly complex learning methods such as SVM.
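To make the idea of per-document normalization concrete, the following is a minimal sketch, not the formulation from the paper: a plain multinomial naive Bayes classifier in which each training document's term counts are divided by the document length before they are accumulated, so that long documents do not dominate the parameter estimates. The function names (`normalized_counts`, `train`, `predict`), the Laplace smoothing constant `alpha`, and the toy data are all assumptions introduced for illustration; the feature weighting heuristic is not shown.

```python
import math
from collections import Counter, defaultdict


def normalized_counts(tokens):
    """Return term frequencies divided by document length.

    Every non-empty document then contributes a total mass of 1.0,
    regardless of how many tokens it contains.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: count / total for term, count in counts.items()}


def train(docs, labels, alpha=1.0):
    """Estimate per-class log priors and smoothed log likelihoods.

    `alpha` is an assumed Laplace smoothing constant, not a value
    taken from the paper.
    """
    vocab = {term for tokens in docs for term in tokens}
    class_docs = Counter(labels)
    term_mass = defaultdict(lambda: defaultdict(float))

    # Accumulate length-normalized term mass per class.
    for tokens, label in zip(docs, labels):
        for term, weight in normalized_counts(tokens).items():
            term_mass[label][term] += weight

    model = {}
    for label, n_docs in class_docs.items():
        denom = sum(term_mass[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(docs))
        log_like = {
            term: math.log((term_mass[label].get(term, 0.0) + alpha) / denom)
            for term in vocab
        }
        model[label] = (log_prior, log_like)
    return model


def predict(model, tokens):
    """Return the class with the highest log posterior score.

    Terms that never appeared in training are ignored.
    """
    def score(label):
        log_prior, log_like = model[label]
        return log_prior + sum(log_like[t] for t in tokens if t in log_like)

    return max(model, key=score)


# Toy data, invented purely for illustration.
docs = [
    ["ball", "goal", "team"],
    ["vote", "law"],
    ["goal", "team", "team", "win"],
    ["law", "court", "vote", "vote"],
]
labels = ["sport", "politics", "sport", "politics"]
model = train(docs, labels)
```

Calling `predict(model, ["team", "goal"])` on this toy model scores each class by its log prior plus the summed log likelihoods of the known terms. Normalizing by document length is only one of several possible per-document normalizations, and it is used here because it is the simplest to state.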

Original language: English
Pages (from-to): 1457-1466
Number of pages: 10
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 18
Issue number: 11
Publication status: Published - 2006 Nov

Bibliographical note

Funding Information:
This work was partly supported by the JSPS Postdoctoral Fellowship Program and the Okumura Group at Tokyo Institute of Technology. H.-C. Rim was the corresponding author.

Keywords

  • Poisson model
  • Text classification
  • feature weighting
  • naive Bayes classifier

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
