TY - JOUR
T1 - ML2Motif - Reliable extraction of discriminative sequence motifs from learning machines
AU - Vidovic, Marina M.C.
AU - Kloft, Marius
AU - Müller, Klaus-Robert
AU - Görnitz, Nico
N1 - Funding Information:
We thank Raphael Pelessof for stimulating discussions. MMCV and NG were supported by BMBF ALICE II grant 01IB15001B. We also acknowledge support from the German Research Foundation through grants DFG KL2698/2-1, MU 987/6-1, and RA 1894/1-1. KRM acknowledges partial funding from the National Research Foundation of Korea, funded by the Ministry of Education, Science, and Technology in the BK21 program. MK and KRM were supported by the German Ministry for Education and Research through awards 031L0023A and 031B0187B and the Berlin Big Data Center BBDC (01IS14013A).
Publisher Copyright:
Copyright © 2017 Vidovic et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2017/3
Y1 - 2017/3
N2 - High prediction accuracies are not the only objective to consider when solving problems using machine learning. Instead, particular scientific applications require some explanation of the learned prediction function. For computational biology, positional oligomer importance matrices (POIMs) have been successfully applied to explain the decision of support vector machines (SVMs) using weighted-degree (WD) kernels. To extract relevant biological motifs from POIMs, the motifPOIM method has been devised and showed promising results on real-world data. Our contribution in this paper is twofold: as an extension to POIMs, we propose gPOIM, a general measure of feature importance for arbitrary learning machines and feature sets (including, but not limited to, SVMs and CNNs) and devise a sampling strategy for efficient computation. As a second contribution, we derive a convex formulation of motifPOIMs that leads to more reliable motif extraction from gPOIMs. Empirical evaluations confirm the usefulness of our approach on artificially generated data as well as on real-world datasets.
AB - High prediction accuracies are not the only objective to consider when solving problems using machine learning. Instead, particular scientific applications require some explanation of the learned prediction function. For computational biology, positional oligomer importance matrices (POIMs) have been successfully applied to explain the decision of support vector machines (SVMs) using weighted-degree (WD) kernels. To extract relevant biological motifs from POIMs, the motifPOIM method has been devised and showed promising results on real-world data. Our contribution in this paper is twofold: as an extension to POIMs, we propose gPOIM, a general measure of feature importance for arbitrary learning machines and feature sets (including, but not limited to, SVMs and CNNs) and devise a sampling strategy for efficient computation. As a second contribution, we derive a convex formulation of motifPOIMs that leads to more reliable motif extraction from gPOIMs. Empirical evaluations confirm the usefulness of our approach on artificially generated data as well as on real-world datasets.
UR - http://www.scopus.com/inward/record.url?scp=85016431205&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0174392
DO - 10.1371/journal.pone.0174392
M3 - Article
C2 - 28346487
AN - SCOPUS:85016431205
SN - 1932-6203
VL - 12
JO - PLoS One
JF - PLoS One
IS - 3
M1 - e0174392
ER -