Abstract
Because clustering is an unsupervised learning task, many validity indices have been proposed to measure the quality of clustering results. However, no single validity measure is best for all types of clustering tasks, because each index has its own advantages and shortcomings. Since each validity index has proven effective in particular cases, it is reasonable to expect that a more general clustering validity index can be developed by appropriately integrating individually effective indices. In this paper, we propose a new cluster validity index, named Charnes, Cooper & Rhodes - cluster validity (CCR-CV), which integrates eight internal clustering efficiency measures using data envelopment analysis (DEA). CCR-CV is more broadly applicable than any single validity index because it adaptively adjusts the weights used to combine the individual indices for each dataset. In experiments on 12 artificial and 30 real datasets, CCR-CV identified optimal and plausible cluster structures better than the benchmark individual validity indices.
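The abstract does not give the linear program behind CCR-CV, so the following is only a minimal sketch under stated assumptions, not the authors' formulation. It assumes that each candidate clustering (for example, the same data partitioned with different numbers of clusters) is a decision-making unit (DMU). Each unit's internal index scores are treated as outputs, rescaled to be nonnegative and larger-is-better, and every unit receives one identical constant input. Indices where smaller is better, such as Davies-Bouldin, would need to be inverted first. Under these assumptions the CCR multiplier model for unit o reduces to maximizing u·y_o subject to u·y_j ≤ 1 for every unit j and u ≥ 0. The optimal u is then a per-candidate weight vector, which is one way to read "adaptively adjusting the combining weights." The helpers `simplex_max` and `ccr_efficiency` are hypothetical names. A small Bland's-rule simplex keeps the sketch dependent only on the standard library.

```python
from typing import List, Sequence, Tuple


def simplex_max(c: Sequence[float], A: Sequence[Sequence[float]], b: Sequence[float],
                eps: float = 1e-12) -> Tuple[float, List[float]]:
    """Maximize c.u subject to A u <= b, u >= 0, assuming b >= 0 (origin is feasible).

    Dense tableau simplex using Bland's rule to avoid cycling.
    """
    m, n = len(A), len(c)
    # Each tableau row: [A_i | slack identity column | b_i]
    T = [[float(a) for a in A[i]] + [1.0 if k == i else 0.0 for k in range(m)] + [float(b[i])]
         for i in range(m)]
    z = [-float(x) for x in c] + [0.0] * (m + 1)  # objective row; z[-1] tracks c.u
    basis = [n + i for i in range(m)]  # slack variables start in the basis
    while True:
        # Bland's rule: entering variable is the lowest index with a negative reduced cost
        col = next((j for j in range(n + m) if z[j] < -eps), None)
        if col is None:
            break  # optimal
        candidates = [(T[i][-1] / T[i][col], basis[i], i) for i in range(m) if T[i][col] > eps]
        if not candidates:
            raise ValueError("LP is unbounded")
        _, _, row = min(candidates)  # minimum ratio, ties broken by lowest basic index
        piv = T[row][col]
        T[row] = [v / piv for v in T[row]]
        for i in range(m):
            if i != row:
                f = T[i][col]
                T[i] = [a - f * p for a, p in zip(T[i], T[row])]
        f = z[col]
        z = [a - f * p for a, p in zip(z, T[row])]
        basis[row] = col
    # Read off the primal solution: basic decision variables take their RHS value
    u = [0.0] * n
    for i, var in enumerate(basis):
        if var < n:
            u[var] = T[i][-1]
    return z[-1], u


def ccr_efficiency(Y: Sequence[Sequence[float]], o: int) -> Tuple[float, List[float]]:
    """Hypothetical helper: CCR multiplier model with a single constant unit input.

    max u . y_o  s.t.  u . y_j <= 1 for every DMU j,  u >= 0.
    Y[j] holds the nonnegative, larger-is-better index scores of candidate j.
    """
    return simplex_max(Y[o], Y, [1.0] * len(Y))


if __name__ == "__main__":
    # Made-up scores for three candidate clusterings on two illustrative indices
    scores = [[1.0, 0.2], [0.5, 0.5], [0.4, 0.4]]
    for o in range(len(scores)):
        eff, weights = ccr_efficiency(scores, o)
        print(o, eff, weights)
```

With these made-up scores, the first two candidates lie on the efficient frontier (efficiency 1). The third is dominated by the second and scores about 0.8. Ranking candidates by this efficiency, and breaking ties among efficient units in some further way, is the kind of selection rule such an index could support. How CCR-CV actually resolves ties is not stated in the abstract.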
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 94-108 |
Number of pages | 15 |
Journal | Applied Soft Computing Journal |
Volume | 64 |
DOIs | |
Publication status | Published - 2018 Mar |
Bibliographical note
Funding Information: This research was supported by (1) the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2016R1D1A1B03930729), (2) a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT; Ministry of Science and ICT) (NRF-2015R1A2A2A04007359), and (3) an Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2017-0-00349, Development of Media Streaming system with Machine Learning using QoE (Quality of Experience)).
Publisher Copyright:
© 2017 Elsevier B.V.
Keywords
- Clustering validity
- Data envelopment analysis
- Internal measure
- Linear programming
ASJC Scopus subject areas
- Software