Statistical modelling of performance data for molecular amplification methods in diagnostic virology

  • Llenalia Garcia-Fernandez

    Student thesis: Doctoral Thesis


    Nucleic Acid Technology (NAT), introduced in the late 1990s, is a molecular amplification method that can be used for the diagnosis and management of patients with infectious diseases. NAT test results are obtained more quickly and are quantified, providing more information than the positive/negative results available from traditional techniques. However, NATs are technically demanding and susceptible to contamination, so results from the associated diagnostic tests may be inaccurate. External Quality Assessment (EQA) services are programmes developed to assess and advance the quality of performance of laboratories that use NAT kits to diagnose, manage and control human diseases. Quality Control for Molecular Diagnostics (QCMD), an organisation that provides EQA, uses proficiency panels designed with samples containing no, weak, medium and strong microbial loads. The panels are distributed to participating laboratories, which analyse them knowing the pathogen but blind to the microbial load.

    In this thesis, factors which are significantly associated with EQA participants’ performance are identified. In particular, rigorous statistical methods are used and developed to interrogate, for the first time, the large reservoir of QCMD data and model participants’ performance over time for different pathogens. Furthermore, new scoring schemes are developed to assess individual participants’ performance on individual panels.

    Existing scoring schemes do not take into account known prior information about the sample viral load. We propose, using Bayesian techniques, to score participants with respect to a ‘Bayesian mean’ value obtained from prior information available to QCMD and the values from ‘reference’ laboratories with high reputation. For qualitative (presence/absence) diagnosis, logistic regression models from a Bayesian perspective are developed to fit historical and current data in order to identify factors which are significantly associated with participant performance. For quantitative (estimate of sample microbial load) diagnosis, Generalised Linear Models (GLM) from a Bayesian perspective are developed to fit the data and find significant factors associated with participants’ estimates of the sample microbial load. A more natural parameter inference is made from a Bayesian perspective using the distributions of the parameters given the data. Model validation and robustness are also investigated. Some responses in the quantitative diagnosis are given as censored data, so a GLM which allows the inclusion of the censored observations is introduced and developed in order to obtain a more accurate model to fit these data. Also, a variation of an existing model comparison tool, the Deviance Information Criterion (DIC), is developed in order to discriminate between different suggested models. Extensive use is made of Markov Chain Monte Carlo (MCMC) methods using R statistical software to obtain model estimates.
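
    As an illustration only, the idea of scoring against a 'Bayesian mean' can be sketched with a normal-normal conjugate update. This is a minimal sketch under stated assumptions and not the model developed in the thesis: the function names, the normal prior on the log10 load, the known measurement standard deviation and all numerical values below are hypothetical.

```python
import math


def posterior_target(prior_mean, prior_sd, ref_values, ref_sd):
    """Combine a normal prior on the sample load (log10 scale) with
    reference-laboratory values, assuming a known measurement SD.

    Hypothetical helper: standard normal-normal conjugate update,
    used here only to illustrate a 'Bayesian mean' target value.
    """
    n = len(ref_values)
    prior_prec = 1.0 / prior_sd ** 2      # precision of the prior
    data_prec = n / ref_sd ** 2           # precision contributed by the data
    post_prec = prior_prec + data_prec
    ref_mean = sum(ref_values) / n
    post_mean = (prior_prec * prior_mean + data_prec * ref_mean) / post_prec
    post_sd = math.sqrt(1.0 / post_prec)
    return post_mean, post_sd


def score(reported, post_mean, post_sd, ref_sd):
    """Standardised distance of a participant's reported value from the
    target, using the posterior predictive SD for a single new measurement."""
    pred_sd = math.sqrt(post_sd ** 2 + ref_sd ** 2)
    return (reported - post_mean) / pred_sd


# Hypothetical numbers: prior belief about the panel sample and three
# reference-laboratory results, all in log10 copies/ml.
target_mean, target_sd = posterior_target(4.0, 0.5, [4.2, 4.4, 4.3], 0.5)
participant_score = score(5.0, target_mean, target_sd, 0.5)
```

    In this sketch the target value is pulled towards the reference results as more of them become available, and a participant's score is its distance from that target in predictive standard deviations.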

    The benefits of adopting this approach are the full use of data from panels for the same pathogen over time, the inclusion of data above or below the limit of detection, and a more accurate target value. These provide a better measure of participant performance, which in turn improves the advice given to participants about the best technology to use. The techniques developed in this thesis can be applied to other research areas, especially those where GLMs for censored observations are used, such as survival analysis in medical research and industrial experiments on reliability.
    Date of Award: Apr 2012
    Original language: English
    Supervisor: Nathaniel Jack
