TY - GEN
T1 - Going metric
T2 - 16th Annual Neural Information Processing Systems Conference, NIPS 2002
AU - Roth, Volker
AU - Laub, Julian
AU - Buhmann, Joachim M.
AU - Müller, Klaus-Robert
PY - 2003
Y1 - 2003
AB - Pairwise data in empirical sciences typically violate metricity, either due to noise or due to fallible estimates, and are therefore hard to analyze with conventional machine learning technology. In this paper we study ways to work around this problem. First, we present an alternative embedding to multidimensional scaling (MDS) that allows us to apply a variety of classical machine learning and signal processing algorithms. The class of pairwise grouping algorithms that share the shift-invariance property is statistically invariant under this embedding procedure, leading to identical assignments of objects to clusters. Based on this new vectorial representation, denoising methods are applied in a second step. Both steps provide a theoretically well-controlled setup to translate from pairwise data to the respective denoised metric representation. We demonstrate the practical usefulness of our theoretical reasoning by discovering structure in protein sequence databases, visibly improving performance over existing automatic methods.
UR - http://www.scopus.com/inward/record.url?scp=84898938392&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84898938392
SN - 0262025507
SN - 9780262025508
T3 - Advances in Neural Information Processing Systems
BT - Advances in Neural Information Processing Systems 15 - Proceedings of the 2002 Conference, NIPS 2002
PB - Neural Information Processing Systems Foundation
Y2 - 9 December 2002 through 14 December 2002
ER -