The charter of the Transformational Bioinformatics team is to develop novel bioinformatics solutions for research and industry using the latest cloud-computing and BigData infrastructure. We focus on two high-impact life science areas: population-scale ‘omics analysis (genomics, transcriptomics, methylomics) and genome engineering applications.
Our science focuses on developing sophisticated machine learning technology tailored to handle high-dimensional genomic data and to provide the high precision required for editing the genomes of living cells.
Impact on the Health System
We partner with the Melbourne Genomics Health Alliance and several NHMRC projects, contributing our capability to analyse large-scale genomics data. We also partner with the Australian CRISPR facilities (Peter MacCallum Cancer Centre and The John Curtin School of Medical Research) and with international institutes such as the Whitehead Institute for Biomedical Research at the Massachusetts Institute of Technology, US. Genomic and other ‘omics information, together with genome engineering, are game-changers for the health system because they enable quantifiable, evidence-based and highly personalised treatment choices and preventative interventions.
We developed VariantSpark, a Hadoop/Spark-based data analytics framework for population-scale ‘omics data. It can cluster patients by their genomic profile or identify disease-associated genes in whole-genome cohorts (thousands of samples with millions of variants each) in just 30 minutes. This allows, for the first time, personalised genomic insights at the point of care, e.g. by finding "patients like mine" based on genomic similarity to other patients in international studies or the health care system.
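The "patients like mine" idea can be illustrated with a minimal sketch. It is not VariantSpark's API and does not use Spark. It assumes genotypes are encoded as alternate-allele counts (0, 1 or 2) per variant and uses a plain Manhattan distance as the similarity measure. The function names, patient IDs and toy cohort are all hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed encoding: alternate-allele count (0, 1 or 2) at each variant site.
Genotype = List[int]


def genotype_distance(a: Genotype, b: Genotype) -> int:
    """Manhattan distance between two genotype vectors over the same variants."""
    if len(a) != len(b):
        raise ValueError("genotype vectors must cover the same variants")
    return sum(abs(x - y) for x, y in zip(a, b))


def patients_like_mine(
    query: Genotype, cohort: Dict[str, Genotype], k: int = 2
) -> List[Tuple[str, int]]:
    """Return the k cohort members closest to the query, nearest first."""
    scored = [(pid, genotype_distance(query, g)) for pid, g in cohort.items()]
    # Sort by distance, then by patient ID so ties are resolved deterministically.
    scored.sort(key=lambda item: (item[1], item[0]))
    return scored[:k]


# Toy cohort of four hypothetical patients genotyped at five variants.
cohort = {
    "P1": [0, 1, 2, 0, 1],
    "P2": [2, 2, 0, 1, 0],
    "P3": [0, 1, 2, 1, 1],
    "P4": [1, 0, 0, 2, 2],
}
query = [0, 1, 2, 0, 0]
nearest = patients_like_mine(query, cohort, k=2)
print(nearest)
```

At population scale the same comparison would have to be distributed across a cluster, which is the role the Hadoop/Spark infrastructure plays in the description above.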
Finding the optimal spot to edit is a computationally challenging optimisation problem that balances efficient incorporation rate against location specificity. We therefore developed GT-Scan, a cloud-based framework that recommends the optimal target site for genome editing applications to researchers. It can identify all potential sites and process them efficiently using serverless functions, which can massively parallelise tasks in a web-service environment. This keeps query runtime roughly constant (~1 min) even though the underlying complexity varies drastically (e.g. 100 to 100K binding site candidates), which suits time-critical clinical workflows.
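The fan-out pattern described here can be sketched as follows. This is an illustration under stated assumptions, not GT-Scan's implementation: candidates are taken to be 20-nt sequences followed by an NGG motif on the forward strand only, the two scoring functions are toy placeholders rather than real efficiency or specificity models, and a local thread pool stands in for serverless function invocations. All names are hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import List, NamedTuple, Tuple


class Site(NamedTuple):
    position: int
    protospacer: str
    score: float


# 20-nt candidate followed by an NGG motif; the lookahead allows overlapping hits.
SITE_PATTERN = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def find_candidates(sequence: str) -> List[Tuple[int, str]]:
    """Enumerate every forward-strand candidate as (start position, 20-mer)."""
    return [(m.start(), m.group(1)) for m in SITE_PATTERN.finditer(sequence)]


def toy_efficiency(protospacer: str) -> float:
    """Placeholder efficiency score: 1.0 at 50% GC, falling to 0.0 at 0% or 100%."""
    gc = sum(base in "GC" for base in protospacer) / len(protospacer)
    return 1.0 - abs(gc - 0.5) * 2


def toy_specificity(protospacer: str, sequence: str) -> float:
    """Placeholder specificity score: 1 / number of exact copies in the sequence."""
    return 1.0 / max(1, sequence.count(protospacer))


def rank_sites(sequence: str, weight: float = 0.5, workers: int = 8) -> List[Site]:
    """Score all candidates concurrently and return them best first."""
    sequence = sequence.upper()

    def score(candidate: Tuple[int, str]) -> Site:
        position, protospacer = candidate
        combined = (weight * toy_efficiency(protospacer)
                    + (1 - weight) * toy_specificity(protospacer, sequence))
        return Site(position, protospacer, combined)

    # Each candidate is scored independently, so the work can be fanned out.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sites = list(pool.map(score, find_candidates(sequence)))
    return sorted(sites, key=lambda s: s.score, reverse=True)


demo = "ACGTACGTACGTACGTACGTAGG" + "TTTTTTTTTTTTTTTTTTTTTGG"
ranked = rank_sites(demo)
for site in ranked:
    print(site.position, site.protospacer, site.score)
```

Because every candidate is scored independently, adding workers in proportion to the number of candidates is what would keep wall-clock time roughly flat as the candidate count grows. The fixed-size thread pool here only illustrates the structure.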
July 31, 2018: Bioinformatics Case Study, Case study, ALS, Amyotrophic Lateral Sclerosis, Bioinformatics, case-control, Genome Analysis, GWAS, machine learning, Macquarie University, motorneuron, Project MinE, VariantSpark
CSIRO is a partner in the Dementia Team Grant led by Prof Ian Blair at Macquarie University – one of only six funded applications. The Challenge: Uncovering the molecular mechanisms of Amyotrophic Lateral Sclerosis (ALS)....
March 20, 2018: Bioinformatics, Case study, ALS, Arash Bayat, Dementia, machine learning, Natalie Twine, Project MinE, Transformational Bioinformatics, VariantSpark
Advanced bioinformatics tools help sift through millions of genomic mutations to discover the origins of dementia and related neurodegenerative diseases as part of a network of national and international experts. The Challenge:...
November 23, 2017: Bioinformatics, Bioinformatics Case Study, Software
NGSANE: We developed NGSANE, a Linux-based, HPC-enabled framework that minimises overhead for set-up and processing of new projects yet maintains full flexibility of...
Senior Research Scientist, building personalised health and computational genome engineering applications using cloud computing.
Postdoctoral Researcher, focusing on machine learning methods to identify high-dimensional genome wide disease associations.
Research Scientist, focusing on applying advanced machine learning algorithms to decipher human genome function for better health.
PhD Student, Cloud computing expert now focusing on computationally guiding gene-insertion with Australia’s most prestigious Genome Engineering Facilities.
PhD Student, Analysing liquid biopsy to detect cancer recurrence through high-throughput ‘omics data integration.
Postdoctoral Researcher, focusing on whole-genome sequencing data of ALS as part of the international Project MinE consortium.
Postdoctoral Researcher, researching governing factors of CRISPR/Cas9 activity for gene knock-out applications.
- O’Brien AR, Saunders NFW, Guo Y, Buske FA, Scott RJ, Bauer DC (2015). Population Scale Clustering of Genotype Information. BMC Genomics (IF=3.99) 2015, 16:1052. Citations: 15.
VariantSpark is one of the first published software frameworks capable of applying machine learning to population-size genomics datasets. It outperforms tools by the Global Alliance for Genomics and Health (US Precision Medicine Initiative) and in 2017 was the subject of invited presentations at SparkSummit East, Boston, and Strata Hadoop, London, two of the world's preeminent BigData conferences.
- Zhang ZH, Jhaveri DJ, Marshall VM, Bauer DC, Edson J, Narayanan RK, Robinson GJ, Lundberg AE, Bartlett PF, Wray NR, Zhao Q (2014). A comparative study of techniques for differential expression analysis on RNA-Seq data. PLoS One (IF=3.730), 13;9(8):e103207. Citations: 62.
This paper describes the analysis pipeline used by the sequencing facility of the Queensland Brain Institute for transcriptome quantification. It represents one of the first empirical comparisons of different RNAseq library and analysis protocols highlighting the variability originating from in-silico versus laboratory differences.
- Bauer DC, Zadoorian A, Wilson LW, Melbourne Genomics Health Alliance, and Thorne NP (2016) Evaluation of computational programs to predict HLA genotypes from genomic sequencing data. Briefings in Bioinformatics (IF=10) Nov 1. pii: bbw097. Citations: 1.
In-silico tests are tipped to replace laboratory testing because of their faster turnaround times and greater economic benefits. However, this paper demonstrates that for one of the primary application areas, organ transplantation and drug choice, in-silico HLA genotyping does not yet deliver clinical accuracy.
- Hortle E, Nijagal B, Bauer DC, Cockburn IA, Lampkin S, Tull D, McConville MJ, McMorran BJ, Foote SJ, and Burgio G (2016) AMPD3 Activation Shortens Erythrocyte half-life and Provides Malaria Resistance in Mice. Blood (IF=12). 2016 Sep 1;128(9):1290-301. Citations: 3
This paper establishes that AMPD3 activation causes malaria resistance through increased red blood cell turnover and hence production of new un-infected cells. This represents a novel mechanism of clearing malaria infection and hence a putative human treatment that is not susceptible to drug resistance.
- Bauer DC, Gaff C, Dinger ME, Caramins M, Buske FA, Fenech M, Hansen D, Cobiac L (2014) Genomics and personalised whole-of-life healthcare Trends Mol Med (IF=10). 2014 May 4. pii: S1471-4914(14)00062-8. Citations: 7.
This review paper describes CSIRO’s vision of harnessing genomic data, health records and information from personal sensing devices to improve medical practice. It brings together the three main bodies of clinical genomics in Australia: the Melbourne Genomics Health Alliance, the Garvan Institute of Medical Research and the Royal College of Pathologists.
- Kerr C, Grice D, Tran C, Bauer DC, Hendry P, Li D, Hannan G (2014). Early life events influence whole-of-life metabolic health via gut microflora and gut permeability. Crit Rev Microbiol (IF=5) Mar 19. Citations: 30.
This review paper discusses the impact of the gut microbiome on health throughout the lifespan of an individual. It focuses specifically on how a diverse gut microbiome can prevent disease by providing a level of ‘resilience’ and promoting healthy ageing.
- Greenfield P, Duesing K, Papanicolaou A, Bauer DC (2014). Blue: correcting sequencing errors using consensus and context. Bioinformatics (IF=6). Jun 11. Citations: 25.
This paper describes the need for error correction to reduce processing time and increase accuracy when working with genomic sequencing data. It includes a comparison with previously published error correction software, which finds our tool to be the most efficient in both runtime and memory consumption, making it the first tool applicable to human whole-genome data.
- Taberlay PC, Achinger-Kawecka J, Lun ATL, Buske FA, Sabir K, Gould CM, Zotenko E, Bert SA, Giles KA, Bauer DC, Smyth GK, Stirzaker C, O’Donoghue SI, Clark SJ (2016). Three-dimensional disorganization of the cancer genome occurs coincident with long-range genetic and epigenetic alterations. Genome Res (IF=15) Jun;26(6):719-31. Citations: 9.
One of the first systematic analyses of genomic 3D structure and its involvement in cancer. It interrogates the ability of cancer cells to maintain long-range interactions between distant chromosome loci, dispelling the assumption that cancer genomes lose the ability to organize higher-order genomic structures.
- Li X, Luo OJ, Wang P, Zheng M, Wang D, Piecuch E, Zhu JJ, Tian SZ, Tang Z, Li G, Ruan Y (2017). Long-read ChIA-PET for base-pair resolution mapping of haplotype-specific chromatin interactions. Nature Protocols (IF=11) (In press). Citations: 0.
The paper describes an improved experimental and computational methodology to produce chromatin interaction data for inferring 3D genome organization. This is the first method to be able to identify allele-specific chromatin interactions at nucleotide resolution.
- RL McLaughlin, D Schijven, W Van Rheenen, KR Van Eijk, M O’Brien, … Genetic correlation between amyotrophic lateral sclerosis and schizophrenia. Nature Communications 8, 14774.