Nonnegative matrix factorization for interactive topic modeling and document clustering

D. da Kuang, Jaegul Choo, Haesun Park

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

80 Citations (Scopus)

Abstract

Nonnegative matrix factorization (NMF) approximates a nonnegative matrix by the product of two low-rank nonnegative matrices. Because it gives semantically meaningful results that are easily interpretable in clustering applications, NMF has been widely used as a clustering method, especially for document data, and as a topic modeling method. We describe several fundamental facts about NMF and introduce its optimization framework, called block coordinate descent. In the context of clustering, our framework provides a flexible way to extend NMF, for example to sparse NMF and weakly-supervised NMF. The former provides succinct representations for better interpretation, while the latter flexibly incorporates extra information and user feedback into NMF, which effectively works as the basis for the visual analytic topic modeling system that we present. Using real-world text data sets, we present quantitative experimental results showing the superiority of our framework in the following aspects: fast convergence, high clustering accuracy, sparse representation, consistent output, and user interactivity. In addition, we present a visual analytic system called UTOPIAN (User-driven Topic modeling based on Interactive NMF) and show several usage scenarios. Overall, our book chapter covers the broad spectrum of NMF in the context of clustering and topic modeling, from fundamental algorithmic behaviors to practical visual analytics systems.
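As a rough illustration of the block coordinate descent idea mentioned in the abstract, the sketch below factors a small nonnegative term-document matrix A into nonnegative factors W and H by alternately updating one row of H or one column of W while everything else is held fixed. This particular scheme is usually called hierarchical alternating least squares (HALS); it is one instance of block coordinate descent and is not necessarily the variant used in the chapter. The function names (`nmf_hals`, `update_rows`), the toy matrix, and the choice of plain Python lists are assumptions made for this example.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def frob_error(A, W, H):
    """Squared Frobenius norm of A - W H."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb))

def update_rows(H, WtA, WtW, eps=1e-12):
    """One sweep over the rows of H with W fixed (modifies H in place).

    Each row is set to the exact nonnegative minimizer of ||A - W H||_F^2
    with all other rows held fixed.
    """
    k, n = len(H), len(H[0])
    for j in range(k):
        d = WtW[j][j]
        if d <= eps:
            continue  # column j of W is (numerically) zero; leave row j alone
        for c in range(n):
            grad = WtA[j][c] - sum(WtW[j][l] * H[l][c] for l in range(k))
            H[j][c] = max(0.0, H[j][c] + grad / d)

def nmf_hals(A, k, iters=200, seed=0):
    """Hypothetical helper: rank-k NMF of A by HALS-style block updates."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() for _ in range(k)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(k)]
    errors = [frob_error(A, W, H)]
    for _ in range(iters):
        # Update the rows of H with W fixed.
        Wt = transpose(W)
        update_rows(H, matmul(Wt, A), matmul(Wt, W))
        # Update the columns of W with H fixed, via the transposed problem
        # A^T ~ H^T W^T, in which W^T plays the role of H.
        Wt = transpose(W)
        update_rows(Wt, matmul(H, transpose(A)), matmul(H, transpose(H)))
        W = transpose(Wt)
        errors.append(frob_error(A, W, H))
    return W, H, errors

# Toy term-document matrix (rows: terms, columns: documents); values are made up.
A = [[2.0, 3.0, 0.0, 0.0],
     [1.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 1.0],
     [0.0, 0.0, 2.0, 2.0]]

W, H, errors = nmf_hals(A, k=2)

# A common convention: assign each document to the topic with the largest weight.
labels = [max(range(len(H)), key=lambda j: H[j][c]) for c in range(len(A[0]))]
```

In this reading, the columns of W act as topics (weights over terms) and the columns of H give each document's topic weights, so taking the largest entry per column of H is one simple way to turn the factorization into a clustering. Because each block update is an exact minimization over that block, the recorded error should not increase from one iteration to the next, although the method may still stop at a local minimum.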

Original language: English
Title of host publication: Partitional Clustering Algorithms
Publisher: Springer International Publishing
Pages: 215-243
Number of pages: 29
ISBN (Electronic): 9783319092591
ISBN (Print): 9783319092584
DOIs
Publication status: Published - 2015 Jan 1

Keywords

  • Block coordinate descent
  • Document clustering
  • Interactive visual analytics
  • Nonnegative matrix factorization
  • Topic modeling

ASJC Scopus subject areas

  • Engineering (all)
