Clustering analysis can facilitate the extraction of implicit patterns in a dataset and elicit its natural groupings without requiring prior classification information. For superior clustering analysis results, a number of distance measures have been proposed. Recently, geodesic distance has been widely applied to clustering algorithms for nonlinear groupings. However, geodesic distance is sensitive to noise and hence, geodesic distance-based clustering may fail to discover nonlinear clusters in the region of the noise. In this study, we propose a density-based geodesic distance that can identify clusters in nonlinear and noisy situations. Experiments on various simulation and benchmark datasets are conducted to examine the properties of the proposed geodesic distance and to compare its performance with that of existing distance measures. The experimental results confirm that a clustering algorithm with the proposed distance measure demonstrated superior performance compared to the competitors; this was especially true when the cluster structures in the data were inherently noisy and nonlinearly patterned.
Bibliographical noteFunding Information:
We thank the editor and referees for their constructive comments and suggestions, which greatly improved the quality of the paper. This work was supported by Brain Korea PLUS and Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Science, ICT and Future Planning ( 2013007724 ). This work was conducted during Seoung Bum Kim’s visit to DIMACS partially enabled through support from the National Science Foundation under grant number CCF-1144502.
- Geodesic distance
- Mutual neighborhood-based density coefficient
- Noisy data clustering
ASJC Scopus subject areas
- Control and Systems Engineering
- Theoretical Computer Science
- Computer Science Applications
- Information Systems and Management
- Artificial Intelligence