Genomics Virtual Laboratory

Modern genome research is a data-intensive form of discovery: vast amounts of data are generated, analysed and interpreted against catalogues of public genomic knowledge in complex multi-stage workflows. Algorithm and tool development continues at a rapid pace to keep up with new sequencing technologies. Genomic data and public genomic catalogues can be visualised through a variety of mature genome browsers (UCSC Genome Browser, GBrowse, IGV and others), while analysis platforms such as Galaxy, Bioflow and GenePattern, to name a few, allow biologists with little training in programming to develop analysis workflows and launch tasks on HPC clusters.

In practice, however, the tools, platforms and data services needed for best-practice genomics are complicated to install and customise, require dedicated servers and massive data stores, and typically need a high level of ongoing maintenance to keep the software, data and hardware up to date. This demands significant expertise in software development, system administration, hardware and networking, as well as access to hardware resources, all of which are beyond the means of all but the largest research groups.

Aims and objectives

Genome Research Computing at the University of Queensland and the Victorian Life Sciences Computation Initiative have proposed the establishment of a Genomics Virtual Laboratory (GVL) to provide rolling best-practice genomics tools and data to genome researchers nationwide. Physically, the GVL will consist of local instances of centralised, scalable genomic informatics platforms, data repositories and support services in research precincts housing high-throughput genomics technologies. It will provide a vehicle for collaboration, training, support and outreach, and for ongoing strategic planning and coordination, including the development of informed and timely applications to national agencies to upgrade and expand the facilities.

The specific objectives of the GVL will be to:

  • Provide infrastructure tailored to the unique data-intensive demands of genomics.
  • Provide a forum for researchers to collaborate and share data and workflows.
  • Coordinate with genomics groups Australia-wide, promote understanding of the unique needs of genomics, and coordinate and participate in the grant applications needed to secure ongoing state, federal and international funding.
  • Provide a platform for outreach, learning and dissemination of new tools and techniques.
  • Build computational skills in existing and potential genome research groups (which include biologists, clinicians, and others who may have little or no formal training in computer programming or the use of HPC systems).
  • Involve the genome research community in defining future computational needs to help sustain and promote genomics in Australia.

Working with the LSCC (Life Sciences Computation Centre), VeRSI will contribute to the GVL by assisting with the implementation and customisation of a genomics workflow platform on the NeCTAR research cloud.

Outcomes

Researchers and bioinformaticians would benefit from a GVL in several equally important and complementary ways:

  • ‘Reduced entry’ best practice genomics: currently best practice typically requires significant expertise in programming, scripting and data management, and investment in understanding state-of-the-art analysis techniques. The GVL is intended to provide a central resource of hardware, software and human expertise to allow researchers to focus on the biological interpretation of genomic data rather than the details of technical analysis.
  • Integration of analysis tools, public datasets and visualisation platforms, streamlining research and reducing time from experiment to publication.
  • Scalability through infrastructure: implementation of the GVL on scalable infrastructure such as the NeCTAR Research Cloud simplifies the scaling of analysis in response to rapidly rising numbers of genomics datasets of increasing size.
  • Enhanced collaboration between researchers and across the community through shared datasets, workflows and customised toolsets.
  • Reproducibility and research provenance: workflow platforms record all aspects of an experiment, allowing for confidence in repeatability and for the publication of workflows along with the resulting data.

VeRSI thanks Clare Sloggett of VLSCI for her contribution to this project summary.


Project details

ID number  UOB/P/010

Project title  VLSCI Life Science Computation Centre

Start date  February 2011

End date  June 2012

Lead institute  VLSCI

Principal investigator  Prof Andrew Lonie, Head, LSCC

Partner PIs and/or participating institutions  Dr. Nathan Hall – Bioinformatician

Prof Justin Zobel – High Throughput Genomics Theme Leader

Partner sponsor  The University of Melbourne

VeRSI executive sponsor  Dr Ann Borda, VeRSI Executive Director

VeRSI project management  Jared Winton

    

Keywords: VLSCI | Galaxy | VeRSI | LSCC | Bioinformatics | NeCTAR | GVL | Genomics | Virtual Laboratory | Research | Data | Life Science | Genome | Informatics | Collaboration | Training | Repositories