Transformational bioinformatics

Denis Bauer
Team Leader
Transformational Bioinformatics

The charter of the Transformational Bioinformatics team is to develop novel bioinformatics solutions for research and industry using the latest cloud-computing and BigData infrastructure. We focus specifically on two high-impact life-science areas: population-scale ‘omics analysis (genomics, transcriptomics, methylomics) and genome engineering applications.

Our research focuses on developing machine learning technology tailored both to handle high-dimensional genomic data and to provide the high precision required for editing the genomes of living cells.

Impact on the Health System
We partner with the Melbourne Genomics Health Alliance and several NHMRC projects, contributing our capability to analyse large-scale genomic data. We also partner with the Australian CRISPR facilities (Peter MacCallum Cancer Centre and The John Curtin School of Medical Research) and with international institutes such as the Whitehead Institute for Biomedical Research at the Massachusetts Institute of Technology, US. Genomic and other ‘omics information, together with genome engineering, are a game-changer for the health system because they enable quantifiable, evidence-based and highly personalised treatment choices and preventive interventions.

Our Solutions
We developed VariantSpark, a Hadoop/Spark-based analytics framework for population-scale ‘omics data. It can cluster patients by their genomic profiles or identify disease-associated genes in whole-genome cohorts (thousands of samples with millions of variants each) in just 30 minutes. This allows, for the first time, personalised genomic insights at the point of care, for example by finding ‘patients like mine’ based on genomic similarity to other patients in international studies or in the health care system.
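
To make "clustering patients by genomic profile" concrete, the single-machine sketch below shows the kind of computation involved. It is not VariantSpark's API or implementation, which distributes the work over a Spark cluster. It assumes each patient is encoded as a vector of allele counts (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate) and uses plain k-means (Lloyd's algorithm) with a deterministic farthest-point seeding; the function names and the toy cohort are illustrative assumptions.

```python
from typing import List, Sequence

# Assumed encoding: one allele count per variant
# (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate).
Genotypes = Sequence[int]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two genotype (or centroid) vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(samples: List[Genotypes], k: int) -> List[List[float]]:
    """Deterministic seeding: start from the first sample, then repeatedly add
    the sample farthest from its nearest already-chosen centroid."""
    centroids = [[float(v) for v in samples[0]]]
    while len(centroids) < k:
        nxt = max(samples, key=lambda s: min(squared_distance(s, c) for c in centroids))
        centroids.append([float(v) for v in nxt])
    return centroids


def kmeans(samples: List[Genotypes], k: int, max_iter: int = 50) -> List[int]:
    """Cluster samples into k groups with Lloyd's algorithm; returns one label per sample."""
    centroids = farthest_point_init(samples, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(s, centroids[c]))
            for s in samples
        ]
        if new_labels == labels:
            break  # converged: no sample changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean genotype of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy cohort: 4 patients x 5 variants (synthetic, not real data).
cohort = [
    [0, 0, 1, 0, 2],
    [0, 0, 0, 0, 2],
    [2, 2, 1, 2, 0],
    [2, 1, 2, 2, 0],
]
labels = kmeans(cohort, k=2)
print(labels)  # → [0, 0, 1, 1]
```

In a whole-genome cohort each vector has millions of entries, so the assignment and update steps would be distributed across many machines rather than run in a local loop like this.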

Finding the optimal site to edit is a computationally challenging optimisation problem that balances efficient incorporation with location specificity. We therefore developed GT-Scan, a cloud-based framework that recommends to researchers the optimal target site for genome engineering applications. It identifies all potential sites and processes them efficiently using serverless functions, which can massively parallelise tasks in a web-service environment. As a result, query runtime remains roughly constant (~1 min) even though the underlying complexity varies drastically (e.g. from 100 to 100K candidate binding sites), which suits time-critical clinical workflows.
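
The sketch below illustrates the two ingredients described above: enumerating every candidate site, then scoring each one independently so the work can be fanned out in parallel. It is a toy under stated assumptions, not GT-Scan's scoring model. It assumes SpCas9 (a 20-nt protospacer followed by an NGG PAM), scans only the forward strand, counts PAM-adjacent near-matches with up to three mismatches as off-targets, and uses a hypothetical GC-content heuristic as a stand-in for on-target efficiency. A local thread pool stands in for serverless functions; all names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

GUIDE_LEN = 20      # assumed SpCas9 protospacer length
MAX_MISMATCHES = 3  # assumed off-target tolerance (illustrative)


def find_candidates(seq: str) -> List[Tuple[int, str]]:
    """Return (position, protospacer) for every 20-nt site followed by an
    NGG PAM, scanning the forward strand only."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - GUIDE_LEN - 2):
        pam = seq[i + GUIDE_LEN : i + GUIDE_LEN + 3]
        if pam[1:] == "GG":
            hits.append((i, seq[i : i + GUIDE_LEN]))
    return hits


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def off_target_count(guide: str, genome: str) -> int:
    """Count PAM-adjacent genome sites differing from the guide by
    1..MAX_MISMATCHES bases (exact matches are treated as the on-target site)."""
    return sum(
        1 for _, site in find_candidates(genome)
        if 0 < hamming(guide, site) <= MAX_MISMATCHES
    )


def gc_efficiency(guide: str) -> float:
    """Hypothetical efficiency proxy: 1.0 at 50% GC, falling linearly to 0.0 at 0% or 100%."""
    gc = (guide.count("G") + guide.count("C")) / len(guide)
    return 1.0 - abs(gc - 0.5) * 2


def score(candidate: Tuple[int, str], genome: str) -> Tuple[int, str, float]:
    """Combine the efficiency proxy with a specificity term that shrinks as off-targets grow."""
    pos, guide = candidate
    specificity = 1.0 / (1 + off_target_count(guide, genome))
    return pos, guide, gc_efficiency(guide) * specificity


def rank_sites(target: str, genome: str, workers: int = 8) -> List[Tuple[int, str, float]]:
    """Score every candidate independently (fan-out), then rank best-first."""
    candidates = find_candidates(target)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(lambda c: score(c, genome), candidates))
    return sorted(scored, key=lambda r: r[2], reverse=True)


# Toy example (synthetic sequences, not real data).
target = "ACGTACGTACGTACGTACGT" + "AGG" + "TTTT"
genome = target + "CCCC" + "ACGTACGTACGTACGTACGA" + "TGG"
print(rank_sites(target, genome))  # → [(0, 'ACGTACGTACGTACGTACGT', 0.5)]
```

Because each candidate is scored independently, wall-clock time is governed mainly by the slowest single task once there are enough workers, which is the property that lets runtime stay roughly flat as the number of candidates grows. In CPython, threads do not speed up pure-Python CPU-bound work because of the global interpreter lock, so a real deployment would use separate processes or separate function invocations.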



March 9, 2018

GT-Scan Suite

Bioinformatics Software, Case study

Computational tools to improve genome engineering applications
The GT-Scan suite contains computational tools for efficiently solving the optimisation problem of identifying the most suitable target for genome engineering...

March 9, 2018


Bioinformatics Software, Case study

Machine learning for population-scale whole-genome data
Genomic information is increasingly being used for medical research, giving rise to the need for efficient analysis methodologies able to cope with thousands of individuals and...

Case Studies


Team Members:

Denis Bauer
Senior Research Scientist, building personalised health and computational genome engineering applications using cloud computing.

Arash Bayat
Postdoctoral Researcher, focusing on machine learning methods to identify high-dimensional genome-wide disease associations.

Kaitao Lai
Postdoctoral Researcher, developing predictive models for CRISPR/Cas12a activity with a specific focus on enabling genome engineering in the agriculture industry.

Oscar Luo
Research Scientist, focusing on applying advanced machine learning algorithms to decipher human genome function for better health.

Aidan O’Brien
PhD Student, cloud-computing expert now focusing on computationally guiding gene insertion in collaboration with Australia’s most prestigious genome engineering facilities.

Anita Sathyanarayanan
PhD Student, analysing liquid biopsies to detect cancer recurrence through integration of high-throughput ‘omics data.

Natalie Twine
Postdoctoral Researcher, focusing on whole-genome sequencing data from ALS patients as part of the international Project MinE consortium.

Laurence Wilson
Postdoctoral Researcher, researching the factors governing CRISPR/Cas9 activity in gene knock-out applications.


  1. O’Brien AR, Saunders NFW, Guo Y, Buske FA, Scott RJ, Bauer DC (2015). Population scale clustering of genotype information. BMC Genomics (IF=3.99), 16:1052. Citations: 15.
    VariantSpark is one of the first published software frameworks capable of applying machine learning to population-scale genomic datasets. It outperforms tools from the Global Alliance for Genomics and Health (US Precision Medicine Initiative), and in 2017 it was the subject of invited presentations at Spark Summit East, Boston, and Strata Hadoop, London, two of the world’s preeminent BigData conferences.
  2. Zhang ZH, Jhaveri DJ, Marshall VM, Bauer DC, Edson J, Narayanan RK, Robinson GJ, Lundberg AE, Bartlett PF, Wray NR, Zhao Q (2014). A comparative study of techniques for differential expression analysis on RNA-Seq data. PLoS One (IF=3.730), 13;9(8):e103207. Citations: 62.
    This paper describes the analysis pipeline used by the sequencing facility of the Queensland Brain Institute for transcriptome quantification. It represents one of the first empirical comparisons of different RNA-Seq library and analysis protocols, highlighting how much variability originates from in-silico versus laboratory differences.
  3. Bauer DC, Zadoorian A, Wilson LW, Melbourne Genomics Health Alliance, Thorne NP (2016). Evaluation of computational programs to predict HLA genotypes from genomic sequencing data. Briefings in Bioinformatics (IF=10), Nov 1, pii: bbw097. Citations: 1.
    In-silico tests are tipped to replace laboratory testing because of faster turnaround times and greater economic benefits. However, this paper demonstrates that for one of the primary application areas, organ transplantation and drug choice, in-silico HLA genotyping does not yet deliver clinical accuracy.
  4. Hortle E, Nijagal B, Bauer DC, Cockburn IA, Lampkin S, Tull D, McConville MJ, McMorran BJ, Foote SJ, Burgio G (2016). AMPD3 activation shortens erythrocyte half-life and provides malaria resistance in mice. Blood (IF=12), 2016 Sep 1;128(9):1290-301. Citations: 3.
    This paper establishes that AMPD3 activation causes malaria resistance through increased red blood cell turnover and hence production of new uninfected cells. This represents a novel mechanism for clearing malaria infection and hence a putative human treatment that is not susceptible to drug resistance.
  5. Bauer DC, Gaff C, Dinger ME, Caramins M, Buske FA, Fenech M, Hansen D, Cobiac L (2014). Genomics and personalised whole-of-life healthcare. Trends Mol Med (IF=10), 2014 May 4, pii: S1471-4914(14)00062-8. Citations: 7.
    This review paper describes CSIRO’s vision of harnessing genomic data, health records and information from personal sensing devices to improve medical practice. It brings together the three main bodies of clinical genomics in Australia: the Melbourne Genomics Health Alliance, the Garvan Institute of Medical Research and the Royal College of Pathologists.
  6. Kerr C, Grice D, Tran C, Bauer DC, Hendry P, Li D, Hannan G (2014). Early life events influence whole-of-life metabolic health via gut microflora and gut permeability. Crit Rev Microbiol (IF=5), Mar 19. Citations: 30.
    Review paper discussing the impact of the gut microbiome on health across an individual’s lifespan. It focuses specifically on how a diverse gut microbiome can prevent disease by providing a level of ‘resilience’ and promoting healthy ageing.
  7. Greenfield P, Duesing K, Papanicolaou A, Bauer DC (2014). Blue: correcting sequencing errors using consensus and context. Bioinformatics (IF=6), Jun 11. Citations: 25.
    This paper motivates error correction as a way to reduce processing time and increase the accuracy of genomic sequencing data. A comparison with previously published error-correction software finds our tool to be the most efficient in both runtime and memory consumption, making it the first such tool applicable to human whole-genome data.
  8. Taberlay PC, Achinger-Kawecka J, Lun ATL, Buske FA, Sabir K, Gould CM, Zotenko E, Bert SA, Giles KA, Bauer DC, Smyth GK, Stirzaker C, O’Donoghue SI, Clark SJ (2016). Three-dimensional disorganization of the cancer genome occurs coincident with long-range genetic and epigenetic alterations. Genome Res (IF=15), Jun;26(6):719-31. Citations: 9.
    One of the first systematic analyses of 3D genome structure and its involvement in cancer. It interrogates the ability of cancer cells to maintain long-range interactions between distant chromosomal loci, dispelling the assumption that cancer genomes lose the ability to organise higher-order genomic structures.
  9. Li X, Luo OJ, Wang P, Zheng M, Wang D, Piecuch E, Zhu JJ, Tian SZ, Tang Z, Li G, Ruan Y (2017). Long-read ChIA-PET for base-pair resolution mapping of haplotype-specific chromatin interactions. Nature Protocols (IF=11), in press. Citations: 0.
    The paper describes an improved experimental and computational methodology for producing chromatin interaction data from which 3D genome organisation can be inferred. It is the first method able to identify allele-specific chromatin interactions at nucleotide resolution.
  10. McLaughlin RL, Schijven D, Van Rheenen W, Van Eijk KR, O’Brien M, … (2017). Genetic correlation between amyotrophic lateral sclerosis and schizophrenia. Nature Communications 8, 14774.