Ensemble machine learning on gene expression data for cancer classification.

Aik Choon Tan, David Gilbert

Research output: Contribution to journal › Article › peer-review

305 Citations (Scopus)


Whole-genome RNA expression studies permit systematic approaches to understanding the correlation between gene expression profiles and disease states or different developmental stages of a cell. Microarray analysis provides quantitative information about the complete transcription profile of cells, which facilitates drug and therapeutics development, disease diagnosis, and the understanding of basic cell biology. One of the challenges in microarray analysis, especially of cancer gene expression profiles, is to identify genes or groups of genes that are highly expressed in tumour cells but not in normal cells, and vice versa. We have previously shown that ensemble machine learning consistently performs well in classifying biological data. In this paper, we focus on three supervised machine learning techniques for cancer classification, namely C4.5 decision trees, bagged decision trees, and boosted decision trees. We performed classification tasks on seven publicly available cancer microarray datasets and compared the classification/prediction performance of these methods. We observed that ensemble learning (bagged and boosted decision trees) often performs better than single decision trees in this classification task.
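To make the single-tree versus bagging versus boosting comparison concrete, here is a minimal, stdlib-only sketch. It is not the authors' implementation: it assumes one-level decision trees (stumps) as a simplified stand-in for C4.5, discrete AdaBoost as the boosting scheme, and a small synthetic expression matrix in which two hypothetical "marker" genes are over-expressed in tumour samples (label +1) relative to normal samples (label -1). All function names (`fit_stump`, `bagging`, `adaboost`, `make_data`, and so on) are illustrative.

```python
import math
import random


def fit_stump(X, y, w):
    """Fit a one-level tree: choose the gene, threshold and polarity with the
    lowest weighted misclassification error. Returns (err, gene, t, polarity)."""
    total = sum(w)
    best = None
    for g in range(len(X[0])):
        for t in sorted(set(row[g] for row in X)):
            # Error when predicting +1 (tumour) for expression above t.
            err_pos = sum(wi for row, label, wi in zip(X, y, w)
                          if (1 if row[g] > t else -1) != label)
            # Flipping the polarity flips every prediction, so its error is the remainder.
            for err, polarity in ((err_pos, 1), (total - err_pos, -1)):
                if best is None or err < best[0]:
                    best = (err, g, t, polarity)
    return best


def predict_stump(stump, row):
    _, g, t, polarity = stump
    return polarity if row[g] > t else -polarity


def bagging(X, y, n_models=15, seed=0):
    """Bagging: fit each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], [1.0 / n] * n))
    return models


def predict_bagging(models, row):
    # Unweighted majority vote; ties go to +1.
    return 1 if sum(predict_stump(m, row) for m in models) >= 0 else -1


def adaboost(X, y, n_rounds=15):
    """Discrete AdaBoost: reweight samples so later stumps focus on earlier mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:
            break  # no better than chance on the weighted sample
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        if stump[0] == 0:
            break  # training set already separated perfectly
        # Misclassified samples (label * prediction == -1) gain weight.
        w = [wi * math.exp(-alpha * label * predict_stump(stump, row))
             for row, label, wi in zip(X, y, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble


def predict_boost(ensemble, row):
    # Alpha-weighted vote of the stumps; ties go to +1.
    return 1 if sum(a * predict_stump(s, row) for a, s in ensemble) >= 0 else -1


def accuracy(predict, model, X, y):
    return sum(predict(model, row) == label for row, label in zip(X, y)) / len(y)


def make_data(n=60, n_genes=10, seed=1):
    """Hypothetical toy expression matrix: genes 3 and 7 are raised in tumours (+1)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        if label == 1:
            row[3] += 1.5
            row[7] += 1.5
        X.append(row)
        y.append(label)
    return X, y


X, y = make_data()
X_train, y_train, X_test, y_test = X[:40], y[:40], X[40:], y[40:]
single = fit_stump(X_train, y_train, [1.0 / 40] * 40)
bagged = bagging(X_train, y_train)
boosted = adaboost(X_train, y_train)
print("single stump:  ", accuracy(predict_stump, single, X_test, y_test))
print("bagged stumps: ", accuracy(predict_bagging, bagged, X_test, y_test))
print("boosted stumps:", accuracy(predict_boost, boosted, X_test, y_test))
```

Stumps keep the sketch short; a full C4.5 tree would instead grow deeper trees using gain-ratio splits and pruning. The two ensembles differ in how they diversify the base learners: bagging varies the training sample, while boosting varies the sample weights in response to earlier errors. The printed accuracies depend entirely on the synthetic data and are not comparable to the paper's results.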

Original language: English
Pages (from-to): S75-83
Journal: Applied Bioinformatics
Issue number: 3 Suppl
Publication status: Published - 2003

ASJC Scopus subject areas

  • Information Systems
  • General Agricultural and Biological Sciences
  • Computer Science Applications

