Cloud based computing technologies for genomic medicine

Yang, Andrian (2019) Cloud based computing technologies for genomic medicine. PhD thesis, Victor Chang Cardiac Research Institute & St Vincent's Clinical School, Faculty of Medicine, UNSW.

Abstract

Recent advances in single-cell RNA-sequencing (scRNA-seq) methods have enabled the study of cellular heterogeneity at single-cell resolution. However, current tools for processing and analysing RNA-seq data are not equipped to handle the large amount of data generated in single-cell studies. With the exponential growth in the number of gene expression profiles generated by scRNA-seq methods, there is a need to develop scalable tools for large-scale data analysis and interpretation. In this thesis, I report several new scalable bioinformatics methods that I have developed for the analysis of scRNA-seq data:

1. Falco - a new cloud-based framework for processing large-scale scRNA-seq data. Falco utilises standard Big Data frameworks such as Apache Hadoop and Apache Spark to enable scalable data analysis. The Falco framework is designed to perform read processing, alignment, gene expression quantification, and transcript reconstruction, all in a parallel and distributed manner. We demonstrated Falco's scalability using real data sets, with Falco achieving a speed-up of 1.7x to 145x compared to single-node execution. Falco also allows for cost-efficient analysis, providing savings of up to 65%.

2. Scavenger - a new pipeline to recover false-negative, non-aligned reads in RNA-seq data. Scavenger utilises a novel mechanism for the recovery of such reads based on their similarity to aligned reads. Using real data, we demonstrated that Scavenger is able to recover a good portion of non-aligned reads and that the recovered reads have more variance than aligned reads. Genes with a substantial increase in expression after recovery are typically lowly expressed genes and are enriched for pseudogenes, suggesting that the expression of pseudogenes may be under-reported.

3. Starmap - a new tool for visualisation of scRNA-seq data to help with the exploration and interpretation of the large amount of data. Starmap combines two visual paradigms, the 3D scatter plot and the star plot, to allow visualisation of both the high-level structure of the data and the cell-level features. Starmap is designed to be cross-platform and supports an immersive mode which allows for visualisation using low-cost VR headsets.

Item Type: Thesis (PhD)
Additional Information: SUPERVISORS: Ho, Joshua, Victor Chang Cardiac Research Institute, Faculty of Medicine, UNSW; Suter, Catherine, Victor Chang Cardiac Research Institute, Faculty of Medicine, UNSW
Subjects: R Medicine > R Medicine (General)
Depositing User: Repository Administrator
Date Deposited: 16 Apr 2019 23:27
Last Modified: 16 Apr 2019 23:27
URI: https://eprints.victorchang.edu.au/id/eprint/827
