Abstract
Central to many text analysis methods is the notion of a concept: a set of semantically related keywords characterizing a specific object, phenomenon, or theme. Advances in word embedding allow building a concept from a small set of seed terms. However, naive application of such techniques may result in false-positive errors because of the polysemy of natural language. To mitigate this problem, we present a visual analytics system called ConceptVector that guides a user in building such concepts and then using them to analyze documents. Document-analysis case studies with real-world datasets demonstrate the fine-grained analysis provided by ConceptVector. To support the elaborate modeling of concepts, we introduce a bipolar concept model and support for specifying irrelevant words. We validate the interactive lexicon building interface by a user study and expert reviews. Quantitative evaluation shows that the bipolar lexicon generated with our methods is comparable to human-generated ones.
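
As a rough illustration of the idea summarized above (seed terms, a bipolar concept, and user-specified irrelevant words), the following minimal Python sketch scores words against two seed sets. It is not the ConceptVector implementation: the toy 3-dimensional vectors, the function name `bipolar_scores`, and the rule that zeroes out words closest to the irrelevant set are all assumptions made for this example.

```python
import math

# Toy embeddings invented for illustration; a real system would load
# pretrained word vectors instead.
EMBEDDINGS = {
    "happy":  (0.9, 0.1, 0.0),
    "joyful": (0.8, 0.2, 0.1),
    "sad":    (-0.9, 0.1, 0.0),
    "gloomy": (-0.8, 0.2, 0.1),
    "blue":   (-0.5, 0.1, 0.8),   # polysemous: a mood or a colour
    "red":    (0.0, 0.1, 0.95),
    "table":  (0.0, 0.9, 0.1),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(words):
    """Component-wise mean of the vectors for the given words."""
    vecs = [EMBEDDINGS[w] for w in words]
    return tuple(sum(component) / len(vecs) for component in zip(*vecs))


def bipolar_scores(positive_seeds, negative_seeds, irrelevant=()):
    """Score every vocabulary word between two poles (hypothetical scheme).

    A score above zero leans toward the positive seeds, below zero toward
    the negative seeds. Words that are closer to the irrelevant set than
    to either pole are forced to 0.0 (an assumed rule, for illustration).
    """
    pos_c = centroid(positive_seeds)
    neg_c = centroid(negative_seeds)
    irr_c = centroid(irrelevant) if irrelevant else None

    scores = {}
    for word, vec in EMBEDDINGS.items():
        sim_pos = cosine(vec, pos_c)
        sim_neg = cosine(vec, neg_c)
        score = sim_pos - sim_neg
        if irr_c is not None and cosine(vec, irr_c) > max(sim_pos, sim_neg):
            score = 0.0
        scores[word] = score
    return scores


if __name__ == "__main__":
    naive = bipolar_scores(["happy"], ["sad"])
    refined = bipolar_scores(["happy"], ["sad"], irrelevant=["red"])
    for word in EMBEDDINGS:
        print(f"{word:8s} naive={naive[word]:+.3f} refined={refined[word]:+.3f}")
```

With these toy vectors, the naive call places "blue" on the negative pole purely because of its mood sense, which is the kind of polysemy-driven false positive the abstract mentions. Marking "red" as irrelevant pushes "blue" to a neutral score in this sketch.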
Original language | English |
---|---|
Article number | 8023823 |
Pages (from-to) | 361-370 |
Number of pages | 10 |
Journal | IEEE Transactions on Visualization and Computer Graphics |
Volume | 24 |
Issue number | 1 |
Publication status | Published - 2018 Jan |
Bibliographical note
Funding Information: Research reported in this publication was partially supported by NIH grant R01GM114267 and the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIP) (No. NRF-2016R1C1B2015924). Any opinions, findings, and conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of the funding agencies.
Publisher Copyright:
© 1995-2012 IEEE.
Keywords
- Text analytics
- concepts
- text classification
- text summarization
- visual analytics
- word embedding
ASJC Scopus subject areas
- Software
- Signal Processing
- Computer Vision and Pattern Recognition
- Computer Graphics and Computer-Aided Design