Short-text topic modeling via non-negative matrix factorization enriched with local word-context correlations

Tian Shi, Kyeongpil Kang, Jaegul Choo, Chandan K. Reddy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

110 Citations (Scopus)

Abstract

Short texts are a prevalent form of social communication on the Internet, and billions of them are generated every day. Discovering knowledge from them has attracted considerable interest from both industry and academia. Short texts carry limited contextual information and are sparse, noisy, and ambiguous; hence, automatically learning topics from them remains an important challenge. To tackle this problem, we propose a semantics-assisted non-negative matrix factorization (SeaNMF) model to discover topics in short texts. It incorporates word-context semantic correlations into the model, where the semantic relationships between words and their contexts are learned from the skip-gram view of the corpus. The SeaNMF model is solved using a block coordinate descent algorithm. We also develop a sparse variant of the SeaNMF model that can achieve better model interpretability. Extensive quantitative evaluations on various real-world short-text datasets demonstrate the superior performance of the proposed models over several other state-of-the-art methods in terms of topic coherence and classification accuracy. A qualitative semantic analysis demonstrates the interpretability of our models, which discover meaningful and consistent topics. With its simple formulation and superior performance, SeaNMF can be an effective standard topic model for short texts.
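The abstract does not spell out the objective, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a joint objective of the form ||A - W Hᵀ||² + α ||S - W W_cᵀ||², where A is a term-document count matrix and S is a word-context correlation matrix. Here S is approximated by positive PMI over words that co-occur in the same short text, which is an assumption standing in for the skip-gram view. The names `ppmi_matrix`, `seanmf_sketch`, and `alpha` are hypothetical, and the sparse variant is not covered.

```python
import numpy as np


def ppmi_matrix(docs, vocab):
    """Positive PMI between words that co-occur in the same short text.

    Assumption: each whole document is treated as one context window.
    """
    idx = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    co = np.zeros((n, n))
    for doc in docs:
        ids = sorted({idx[w] for w in doc if w in idx})
        for i in ids:
            for j in ids:
                if i != j:
                    co[i, j] += 1.0
    total = co.sum()
    if total == 0:
        return co
    row = co.sum(axis=1, keepdims=True)
    col = co.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(co * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts give -inf or nan
    return np.maximum(pmi, 0.0)


def seanmf_sketch(A, S, n_topics, alpha=1.0, n_iter=50, seed=0, eps=1e-10):
    """Column-wise block coordinate descent on the assumed joint objective.

    A: (n_words, n_docs) term-document matrix
    S: (n_words, n_words) word-context correlation matrix
    Returns W (word-topic), Wc (context-topic), H (doc-topic), loss history.
    """
    rng = np.random.default_rng(seed)
    n_words, n_docs = A.shape
    W = rng.random((n_words, n_topics))
    Wc = rng.random((n_words, n_topics))
    H = rng.random((n_docs, n_topics))

    def loss():
        return (np.linalg.norm(A - W @ H.T) ** 2
                + alpha * np.linalg.norm(S - W @ Wc.T) ** 2)

    history = [loss()]
    for _ in range(n_iter):
        # Block 1: W appears in both terms, so both contribute to its update.
        AH, SWc = A @ H, S @ Wc
        HtH, WctWc = H.T @ H, Wc.T @ Wc
        for k in range(n_topics):
            neg_grad = (AH[:, k] + alpha * SWc[:, k]
                        - W @ (HtH[:, k] + alpha * WctWc[:, k]))
            denom = max(HtH[k, k] + alpha * WctWc[k, k], eps)
            W[:, k] = np.maximum(W[:, k] + neg_grad / denom, 0.0)

        # Blocks 2 and 3: H and Wc each depend on W only, so they decouple.
        WtW = W.T @ W
        AtW, StW = A.T @ W, S.T @ W
        for k in range(n_topics):
            denom = max(WtW[k, k], eps)
            H[:, k] = np.maximum(
                H[:, k] + (AtW[:, k] - H @ WtW[:, k]) / denom, 0.0)
            Wc[:, k] = np.maximum(
                Wc[:, k] + (StW[:, k] - Wc @ WtW[:, k]) / denom, 0.0)
        history.append(loss())
    return W, Wc, H, history
```

To try it, build `A` from tokenized short texts (one column per document) and pass `ppmi_matrix(docs, vocab)` as `S`. The largest entries in each column of `W` then suggest the top words of a topic. Each column update is a projected exact minimization with the other blocks held fixed, so the recorded loss should not increase between iterations.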

Original language: English
Title of host publication: The Web Conference 2018 - Proceedings of the World Wide Web Conference, WWW 2018
Publisher: Association for Computing Machinery, Inc
Pages: 1105-1114
Number of pages: 10
ISBN (Electronic): 9781450356398
Publication status: Published - 2018 Apr 10
Event: 27th International World Wide Web, WWW 2018 - Lyon, France
Duration: 2018 Apr 23 → 2018 Apr 27

Publication series

Name: The Web Conference 2018 - Proceedings of the World Wide Web Conference, WWW 2018

Conference

Conference: 27th International World Wide Web, WWW 2018
Country/Territory: France
City: Lyon
Period: 18/4/23 → 18/4/27

Keywords

  • Non-negative matrix factorization
  • Short texts
  • Topic modeling
  • Word embedding

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
