SSAI National Seminar: Removing unwanted variation: from principal components to random effects

Host Institution:

RMIT (Note: this has changed from the originally advertised AGR)

Title of Seminar:

SSAI National Seminar: Removing unwanted variation: from principal components to random effects

Speaker's Name:

Professor Terry Speed

Speaker's Institution:

Bioinformatics Division, Walter & Eliza Hall Institute of Medical Research, and
Department of Statistics, University of California at Berkeley

Time and Date:

Wednesday 26 June, 4.00pm AEST (2.00pm AWST)

Seminar Abstract:

Ordinary least squares is a venerable tool for the analysis of scientific data, originating in the work of A.-M. Legendre and C. F. Gauss around 1800. Gauss used the method extensively in astronomy and geodesy. Generalized least squares is more recent, originating with A. C. Aitken in 1934, though weighted least squares was widely used long before that. At around the same time (1933), H. Hotelling introduced principal components analysis to psychology. Its modern form is the singular value decomposition. In 1907, motivated by social science, G. U. Yule presented a new notation and derived some identities for linear regression and correlation. Random effects models date back to astronomical work in the mid-19th century, but it was through the work of C. R. Henderson and others in animal science in the 1950s that their connexion with generalized least squares was firmly made.

These are the diverse origins of our story, which concerns the removal of unwanted variation in high-dimensional genomic and other "omic" data using negative controls. We start with a linear model that Gauss would recognize, with ordinary least squares in mind, but we add unobserved terms to deal with unwanted variation. A singular value decomposition, one of Yule's identities, and negative control measurements (here genes) permit the identification of our model. In a surprising twist, our initial solution turns out to be equivalent to a form of generalized least squares. This is the starting point for much of our recent work. In this talk I will try to explain how a rather eclectic mix of familiar statistical ideas can combine with equally familiar notions from biology (negative and positive controls) to give a useful new set of tools for omic data analysis. Other statisticians have come close to the same endpoint from different perspectives, including Bayesian, sparse linear and random effects models.

About the Speaker:

Terry Speed completed a BSc (Hons) in mathematics and statistics at the University of Melbourne (1965), and a PhD in mathematics at Monash University (1969). He held appointments at the University of Sheffield, U.K. (1969-73) and the University of Western Australia in Perth (1974-82), and he was with Australia's CSIRO between 1983 and 1987. In 1987 he moved to the Department of Statistics at the University of California at Berkeley (UCB), and has remained with them ever since. In 1997 he took an appointment with the Walter and Eliza Hall Institute of Medical Research (WEHI) in Melbourne, Australia, and was 50:50 UCB:WEHI until 2009, when he became emeritus professor at UCB and full-time at WEHI, where he heads the Bioinformatics Division. His research interests lie in the application of statistics to genetics and genomics, and to related fields such as proteomics, metabolomics and epigenomics.


Please note your name, email address, and the location of the AGR that you are participating from.

AGR IT support: