Robust Bayesian clustering for replicated gene expression data

Jianyong Sun, Jonathan M. Garibaldi, Kim Kenobi

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Experimental scientific data sets, especially biological data, usually contain replicated measurements. The replicated measurements for the same object are correlated, and this correlation must be handled carefully in scientific analysis. In this paper, we propose a robust Bayesian mixture model for clustering data sets with replicated measurements. The model aims not only to cluster the data points accurately, taking the replicated measurements into consideration, but also to find the outliers (i.e., scattered objects) that may warrant further study. A tree-structured variational Bayes (VB) algorithm is developed to carry out model fitting. Experimental studies showed that our model compares favorably with the infinite Gaussian mixture model, while maintaining computational simplicity. We demonstrate the benefits of including the replicated measurements in the model, in terms of improved outlier detection rates under varying measurement uncertainty conditions. Finally, we apply the approach to clustering biological transcriptomics mRNA expression data sets with replicated measurements.
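To illustrate why replicated measurements of the same object are correlated, here is a minimal Python sketch. It rests on a toy assumption that is not taken from the paper: each object has a latent true value, and each replicate adds independent Gaussian noise to it. The function names `simulate_replicates` and `pearson` and all parameter values are hypothetical, and this is not the authors' model or fitting algorithm.

```python
import random


def simulate_replicates(n_objects=500, n_replicates=2,
                        object_sd=1.0, noise_sd=0.5, seed=0):
    """Return one row per object; each row holds that object's replicates.

    Toy assumption: replicate = latent object value + independent noise.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_objects):
        true_value = rng.gauss(0.0, object_sd)  # shared by all replicates
        row = [true_value + rng.gauss(0.0, noise_sd)
               for _ in range(n_replicates)]
        data.append(row)
    return data


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


data = simulate_replicates()
rep1 = [row[0] for row in data]  # first replicate of every object
rep2 = [row[1] for row in data]  # second replicate of every object
r = pearson(rep1, rep2)
print(f"correlation between replicates: {r:.2f}")
```

Under this toy model the expected correlation between two replicates is object_sd² / (object_sd² + noise_sd²), which is 0.8 for the defaults used here. Treating replicates as independent data points would ignore this shared component.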
Original language: English
Pages (from-to): 1504-1514
Number of pages: 11
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 9
Issue number: 5
DOI: 10.1109/TCBB.2012.85
Publication status: Published - 2012
Externally published: Yes

Fingerprint

Gene Expression Data
Gene expression
Cluster Analysis
Clustering
Gene Expression
Uncertainty
Variational Bayes
Messenger RNA
Measurement Uncertainty
Outlier Detection
Data Clustering
Model Fitting
Gaussian Mixture Model
Bayesian Model
Mixture Model
Outlier
Biology
Datasets
Experimental Study
Simplicity

Cite this

Sun, Jianyong ; Garibaldi, Jonathan M. ; Kenobi, Kim. / Robust Bayesian clustering for replicated gene expression data. In: IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2012 ; Vol. 9, No. 5. pp. 1504 - 1514.
@article{7b3db53ac05f4b48a12edf147e1ad8c5,
title = "Robust Bayesian clustering for replicated gene expression data",
author = "Jianyong Sun and Garibaldi, {Jonathan M.} and Kim Kenobi",
year = "2012",
doi = "10.1109/TCBB.2012.85",
language = "English",
volume = "9",
pages = "1504 -- 1514",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "5",

}
