Canonical correlation analysis on data with censoring and error information

Jianyong Sun, Simeon Keates

Research output: Contribution to journalArticle

11 Citations (Scopus)

Abstract

We developed a probabilistic model for canonical correlation analysis in the case when the associated datasets are incomplete. This case can arise where data entries either contain measurement errors or are censored (i.e., nonignorable missing) due to uncertainties in instrument calibration and physical limitations of devices and experimental conditions. The aim of our model is to estimate the true correlation coefficients, through eliminating the effects of measurement errors and abstracting helpful information from censored data. As exact inference is not possible for the proposed model, a modified variational Expectation-Maximization (EM) algorithm was developed. In the algorithm developed, we approximated the posteriors of the latent variables as normal distributions. In the experiment, the modified E-step approximation accuracy is first empirically demonstrated by being compared to hybrid Monte Carlo (HMC) sampling. The following experiments were carried out on synthetic datasets with different numbers of censored data and different correlation coefficient settings to compare the proposed algorithm with a maximum a posteriori (MAP) solution and a Markov Chain-EM solution. Experimental results showed that the variational EM solution compares favorably against the MAP solution, approaching the accuracy of the Markov Chain-EM, while maintaining computational simplicity. We finally applied the proposed algorithm to finding the mostly correlated properties of galaxy group with the X-ray luminosity.
Original languageEnglish
Pages (from-to)1909-1919
Number of pages11
JournalIEEE Transactions on Neural Networks and Learning Systems
Volume24
Issue number12
DOIs
Publication statusPublished - Dec 2013

Fingerprint

Measurement errors
Markov processes
Galaxies
Normal distribution
Luminance
Data acquisition
Experiments
Calibration
Sampling
X rays
Statistical Models
Uncertainty

Cite this

@article{eff1380267914d95ac5fd26fad61d012,
title = "Canonical correlation analysis on data with censoring and error information",
abstract = "We developed a probabilistic model for canonical correlation analysis in the case when the associated datasets are incomplete. This case can arise where data entries either contain measurement errors or are censored (i.e., nonignorable missing) due to uncertainties in instrument calibration and physical limitations of devices and experimental conditions. The aim of our model is to estimate the true correlation coefficients, through eliminating the effects of measurement errors and abstracting helpful information from censored data. As exact inference is not possible for the proposed model, a modified variational Expectation-Maximization (EM) algorithm was developed. In the algorithm developed, we approximated the posteriors of the latent variables as normal distributions. In the experiment, the modified E-step approximation accuracy is first empirically demonstrated by being compared to hybrid Monte Carlo (HMC) sampling. The following experiments were carried out on synthetic datasets with different numbers of censored data and different correlation coefficient settings to compare the proposed algorithm with a maximum a posteriori (MAP) solution and a Markov Chain-EM solution. Experimental results showed that the variational EM solution compares favorably against the MAP solution, approaching the accuracy of the Markov Chain-EM, while maintaining computational simplicity. We finally applied the proposed algorithm to finding the mostly correlated properties of galaxy group with the X-ray luminosity.",
author = "Jianyong Sun and Simeon Keates",
year = "2013",
month = "12",
doi = "10.1109/TNNLS.2013.2262949",
language = "English",
volume = "24",
pages = "1909--1919",
journal = "IEEE Transactions on Neural Networks and Learning Systems",
issn = "2162-237X",
publisher = "IEEE Computational Intelligence Society",
number = "12",

}

Canonical correlation analysis on data with censoring and error information. / Sun, Jianyong; Keates, Simeon.

In: IEEE Transactions on Neural Networks and Learning Systems, Vol. 24, No. 12, 12.2013, p. 1909-1919.

Research output: Contribution to journalArticle

TY - JOUR

T1 - Canonical correlation analysis on data with censoring and error information

AU - Sun, Jianyong

AU - Keates, Simeon

PY - 2013/12

Y1 - 2013/12

N2 - We developed a probabilistic model for canonical correlation analysis in the case when the associated datasets are incomplete. This case can arise where data entries either contain measurement errors or are censored (i.e., nonignorable missing) due to uncertainties in instrument calibration and physical limitations of devices and experimental conditions. The aim of our model is to estimate the true correlation coefficients, through eliminating the effects of measurement errors and abstracting helpful information from censored data. As exact inference is not possible for the proposed model, a modified variational Expectation-Maximization (EM) algorithm was developed. In the algorithm developed, we approximated the posteriors of the latent variables as normal distributions. In the experiment, the modified E-step approximation accuracy is first empirically demonstrated by being compared to hybrid Monte Carlo (HMC) sampling. The following experiments were carried out on synthetic datasets with different numbers of censored data and different correlation coefficient settings to compare the proposed algorithm with a maximum a posteriori (MAP) solution and a Markov Chain-EM solution. Experimental results showed that the variational EM solution compares favorably against the MAP solution, approaching the accuracy of the Markov Chain-EM, while maintaining computational simplicity. We finally applied the proposed algorithm to finding the mostly correlated properties of galaxy group with the X-ray luminosity.

AB - We developed a probabilistic model for canonical correlation analysis in the case when the associated datasets are incomplete. This case can arise where data entries either contain measurement errors or are censored (i.e., nonignorable missing) due to uncertainties in instrument calibration and physical limitations of devices and experimental conditions. The aim of our model is to estimate the true correlation coefficients, through eliminating the effects of measurement errors and abstracting helpful information from censored data. As exact inference is not possible for the proposed model, a modified variational Expectation-Maximization (EM) algorithm was developed. In the algorithm developed, we approximated the posteriors of the latent variables as normal distributions. In the experiment, the modified E-step approximation accuracy is first empirically demonstrated by being compared to hybrid Monte Carlo (HMC) sampling. The following experiments were carried out on synthetic datasets with different numbers of censored data and different correlation coefficient settings to compare the proposed algorithm with a maximum a posteriori (MAP) solution and a Markov Chain-EM solution. Experimental results showed that the variational EM solution compares favorably against the MAP solution, approaching the accuracy of the Markov Chain-EM, while maintaining computational simplicity. We finally applied the proposed algorithm to finding the mostly correlated properties of galaxy group with the X-ray luminosity.

U2 - 10.1109/TNNLS.2013.2262949

DO - 10.1109/TNNLS.2013.2262949

M3 - Article

VL - 24

SP - 1909

EP - 1919

JO - IEEE Transactions on Neural Networks and Learning Systems

JF - IEEE Transactions on Neural Networks and Learning Systems

SN - 2162-237X

IS - 12

ER -