Feature discovery in non-metric pairwise data

Julian Laub, Klaus-Robert Müller

    Research output: Contribution to journal › Article › peer-review

    57 Citations (Scopus)

    Abstract

    Pairwise proximity data, given as a similarity or dissimilarity matrix, can violate metricity. This occurs either because of noise and fallible estimates or because of intrinsic non-metric features, such as those that arise from human judgments. So far the problem of non-metric pairwise data has been tackled by essentially omitting the negative eigenvalues or shifting the spectrum of the associated (pseudo-)covariance matrix for a subsequent embedding. However, little attention has been paid to the negative part of the spectrum itself. In particular, it has remained unanswered whether the directions associated with the negative eigenvalues code any variance other than noise. We show by a simple, exploratory analysis that the negative eigenvalues can code for relevant structure in the data, thus leading to the discovery of new features that were lost by conventional data analysis techniques. The information hidden in the negative eigenvalue part of the spectrum is illustrated and discussed for three data sets, namely USPS handwritten digits, text mining, and data from cognitive psychology.
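
The idea of inspecting the negative part of the spectrum can be illustrated with a minimal sketch. It is not the authors' code: it assumes the dissimilarity matrix holds squared dissimilarities, uses the standard double-centering from classical scaling to form the (pseudo-)covariance matrix, and the function name `split_spectrum`, the tolerance `tol`, and the two toy matrices are hypothetical choices made for illustration only.

```python
import numpy as np


def split_spectrum(D, tol=1e-10):
    """Split a centered (pseudo-)covariance matrix into positive and negative parts.

    D is assumed to be a symmetric matrix of squared dissimilarities
    (an assumption of this sketch, not something stated in the abstract).
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    Q = np.eye(n) - np.ones((n, n)) / n      # centering projection
    C = -0.5 * Q @ D @ Q                     # (pseudo-)covariance matrix
    C = (C + C.T) / 2                        # remove round-off asymmetry
    evals, evecs = np.linalg.eigh(C)         # eigenvalues in ascending order
    cutoff = tol * np.max(np.abs(evals))     # treat tiny eigenvalues as zero
    pos = evals > cutoff
    neg = evals < -cutoff
    # Coordinates for each part of the spectrum, scaled by sqrt(|eigenvalue|).
    X_pos = evecs[:, pos] * np.sqrt(evals[pos])
    X_neg = evecs[:, neg] * np.sqrt(-evals[neg])
    return evals, X_pos, X_neg


# Toy data (hypothetical): three points on a line at 0, 1, 2 (metric) ...
D_metric = np.array([[0.0, 1.0, 4.0],
                     [1.0, 0.0, 1.0],
                     [4.0, 1.0, 0.0]])
# ... and distances 1, 1, 5, which violate the triangle inequality.
D_nonmetric = np.array([[0.0, 1.0, 25.0],
                        [1.0, 0.0, 1.0],
                        [25.0, 1.0, 0.0]])

evals_m, Xp_m, Xn_m = split_spectrum(D_metric)
evals_n, Xp_n, Xn_n = split_spectrum(D_nonmetric)
print("metric eigenvalues:    ", np.round(evals_m, 6))
print("non-metric eigenvalues:", np.round(evals_n, 6))
```

In this sketch the metric toy matrix yields no negative eigenvalues, while the non-metric one yields exactly one. Conventional embeddings would keep only `X_pos`; the columns of `X_neg` are the directions an exploratory analysis of the negative spectrum would plot and inspect for structure.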

    Original language: English
    Pages (from-to): 801-818
    Number of pages: 18
    Journal: Journal of Machine Learning Research
    Volume: 5
    Publication status: Published - 2004 Jul 1

    Bibliographical note

    Publisher Copyright:
    © 2004 Julian Laub and Klaus-Robert Müller.

    Keywords

    • Embedding
    • Exploratory data analysis
    • Feature discovery
    • Non-metric
    • Pairwise data
    • Unsupervised learning

    ASJC Scopus subject areas

    • Software
    • Control and Systems Engineering
    • Statistics and Probability
    • Artificial Intelligence
