TY - JOUR
T1 - Density-based geodesic distance for identifying the noisy and nonlinear clusters
AU - Yu, Jaehong
AU - Kim, Seoung Bum
N1 - Funding Information:
We thank the editor and referees for their constructive comments and suggestions, which greatly improved the quality of the paper. This work was supported by Brain Korea PLUS and Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Science, ICT and Future Planning (2013007724). This work was conducted during Seoung Bum Kim’s visit to DIMACS partially enabled through support from the National Science Foundation under grant number CCF-1144502.
PY - 2016/9/10
Y1 - 2016/9/10
N2 - Clustering analysis can facilitate the extraction of implicit patterns in a dataset and elicit its natural groupings without requiring prior classification information. For superior clustering analysis results, a number of distance measures have been proposed. Recently, geodesic distance has been widely applied to clustering algorithms for nonlinear groupings. However, geodesic distance is sensitive to noise; hence, geodesic distance-based clustering may fail to discover nonlinear clusters in noisy regions. In this study, we propose a density-based geodesic distance that can identify clusters in nonlinear and noisy situations. Experiments on various simulation and benchmark datasets are conducted to examine the properties of the proposed geodesic distance and to compare its performance with that of existing distance measures. The experimental results confirm that a clustering algorithm with the proposed distance measure demonstrated superior performance compared to the competitors, especially when the cluster structures in the data were inherently noisy and nonlinearly patterned.
AB - Clustering analysis can facilitate the extraction of implicit patterns in a dataset and elicit its natural groupings without requiring prior classification information. For superior clustering analysis results, a number of distance measures have been proposed. Recently, geodesic distance has been widely applied to clustering algorithms for nonlinear groupings. However, geodesic distance is sensitive to noise; hence, geodesic distance-based clustering may fail to discover nonlinear clusters in noisy regions. In this study, we propose a density-based geodesic distance that can identify clusters in nonlinear and noisy situations. Experiments on various simulation and benchmark datasets are conducted to examine the properties of the proposed geodesic distance and to compare its performance with that of existing distance measures. The experimental results confirm that a clustering algorithm with the proposed distance measure demonstrated superior performance compared to the competitors, especially when the cluster structures in the data were inherently noisy and nonlinearly patterned.
KW - Geodesic distance
KW - Mutual neighborhood-based density coefficient
KW - Noisy data clustering
KW - Nonlinearity
UR - http://www.scopus.com/inward/record.url?scp=84969833537&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84969833537&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2016.04.032
DO - 10.1016/j.ins.2016.04.032
M3 - Article
AN - SCOPUS:84969833537
SN - 0020-0255
VL - 360
SP - 231
EP - 243
JO - Information Sciences
JF - Information Sciences
ER -