TY - JOUR
T1 - TopicLens
T2 - Efficient Multi-Level Visual Topic Exploration of Large-Scale Document Collections
AU - Kim, Minjeong
AU - Kang, Kyeongpil
AU - Park, Deokgun
AU - Choo, Jaegul
AU - Elmqvist, Niklas
N1 - Funding Information:
Research reported in this publication was partially supported by NIH grant R01GM114267 and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2016R1C1B2015924). Any opinions, findings, and conclusions or recommendations expressed in this article are those of the authors and do not necessarily reflect the views of the funding agencies.
Publisher Copyright:
© 2016 IEEE.
PY - 2017/1
Y1 - 2017/1
AB - Topic modeling, which reveals underlying topics of a document corpus, has been actively adopted in visual analytics for large-scale document collections. However, due to its significant processing time and non-interactive nature, topic modeling has so far not been tightly integrated into a visual analytics workflow. Instead, most such systems are limited to utilizing a fixed, initial set of topics. Motivated by this gap in the literature, we propose a novel interaction technique called TopicLens that allows a user to dynamically explore data through a lens interface where topic modeling and the corresponding 2D embedding are efficiently computed on the fly. To support this interaction in real time while maintaining view consistency, we propose a novel efficient topic modeling method and a semi-supervised 2D embedding algorithm. Our work is based on improving state-of-the-art methods such as nonnegative matrix factorization and t-distributed stochastic neighbor embedding. Furthermore, we have built a web-based visual analytics system integrated with TopicLens. We use this system to measure the performance and the visualization quality of our proposed methods. We provide several scenarios showcasing the capability of TopicLens using real-world datasets.
KW - magic lens
KW - nonnegative matrix factorization
KW - t-distributed stochastic neighbor embedding
KW - text analytics
KW - topic modeling
UR - http://www.scopus.com/inward/record.url?scp=84999233615&partnerID=8YFLogxK
U2 - 10.1109/TVCG.2016.2598445
DO - 10.1109/TVCG.2016.2598445
M3 - Article
C2 - 27875138
AN - SCOPUS:84999233615
SN - 1077-2626
VL - 23
SP - 151
EP - 160
JO - IEEE Transactions on Visualization and Computer Graphics
JF - IEEE Transactions on Visualization and Computer Graphics
IS - 1
M1 - 7539597
ER -