Somatic mutation-calling based on DNA from matched tumor-normal patient samples is one of the key tasks carried
out by many cancer genome projects. One such large-scale project is The Cancer Genome Atlas (TCGA), which is now
routinely compiling catalogs of somatic mutations from hundreds of paired tumor-normal DNA exome‑sequence
datasets. Several mutation-callers are publicly available and more are likely to appear. Nonetheless,
mutation‑calling is still challenging and there is unlikely to be one established caller that systematically
outperforms all others. Evaluating the mutation callers, or understanding the sources of their discrepancies, is
not straightforward: for most tumor studies, validation data based on independent whole-exome DNA sequencing are
not available, and only partial validation data exist for a selected (ascertained) subset of sites.
We have analyzed several sets of mutation calling data from TCGA benchmark studies and their partial validation
data. To assess the performance of multiple callers, we introduce approaches that use external sequence data
to varying degrees: independent DNA-seq pairs, RNA-seq for tumor samples only, the original
exome-seq pairs only, or none of these.
Utilizing multiple callers can be a powerful way to construct a list of final calls for one’s research. Using a set of
mutations from multiple callers that are impartially validated, we present a statistical approach for building a
combined caller, which can be applied to combine calls in a wider dataset generated using a similar protocol. The
approach allows us to build a combined caller across the full range of stringency levels, which outperforms all of
the individual callers.
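The abstract does not say which statistical model underlies the combined caller. Purely as an illustration, the idea can be sketched as a logistic regression fitted on validated sites, where each site is described by which callers reported it. Everything in the sketch below is an assumption made for the example: the choice of logistic regression, the function names, the three hypothetical callers, and the toy data. Varying the score threshold corresponds to moving across stringency levels.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score_site(row, weights, bias):
    # Estimated probability that a site is a true somatic mutation,
    # given the 0/1 call flags of the individual callers.
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def fit_combined_caller(calls, validated, lr=0.5, epochs=2000):
    # Fit logistic regression by batch gradient descent.
    # calls: one row per validated site, one 0/1 flag per caller.
    # validated: 0/1 labels (1 = mutation confirmed by validation).
    n_sites = len(calls)
    n_callers = len(calls[0])
    weights = [0.0] * n_callers
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_callers
        grad_b = 0.0
        for row, label in zip(calls, validated):
            err = score_site(row, weights, bias) - label
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n_sites for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_sites
    return weights, bias


def combined_calls(rows, weights, bias, threshold):
    # A higher threshold gives a more stringent combined caller.
    return [score_site(r, weights, bias) >= threshold for r in rows]


# Toy, made-up validation data for three hypothetical callers.
training_calls = [
    [1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1],
    [1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1],
]
training_labels = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]

weights, bias = fit_combined_caller(training_calls, training_labels)

# Apply the fitted model to unvalidated sites from a similar protocol.
new_sites = [[1, 1, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
lenient = combined_calls(new_sites, weights, bias, threshold=0.3)
strict = combined_calls(new_sites, weights, bias, threshold=0.8)
```

Because the combined caller outputs a score rather than a yes/no decision, a single fitted model yields a whole family of call sets, one per threshold, which is what makes a comparison against individual callers possible at matched stringency.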
This is joint work with Su Yeon Kim and Laurent Jacob.
About the 2014 AMSI-SSAI Lecture Tour:
Between the months of August and November this year, the 2013 Prime Minister's Science Prize winner and one of Australia’s most eminent statisticians, Professor Terry Speed, will be touring the country as the 2014 AMSI-SSAI Lecturer. This AGR Seminar is part of the Lecture Tour.
This annual event gives the research community and the general public an opportunity to hear top academics in the fields of both pure and applied mathematics speak about their research.