Please use this identifier to cite or link to this item:
Title: Collective analysis of multiple high-throughput gene expression datasets
Authors: Abu Jamous, Basel
Advisors: Nandi, A
Abbad, M
Keywords: Bioinformatics;Computational biology;Information engineering;Machine learning;Consensus clustering
Issue Date: 2015
Publisher: Brunel University London
Abstract: Modern technologies have resulted in the production of numerous high-throughput biological datasets. However, the pace of development of capable computational methods does not cope with the pace of generation of new high-throughput datasets. Amongst the most popular biological high-throughput datasets are gene expression datasets (e.g. microarray datasets). This work targets this aspect by proposing a suite of computational methods which can analyse multiple gene expression datasets collectively. The focal method in this suite is the unification of clustering results from multiple datasets using external specifications (UNCLES). This method applies clustering to multiple heterogeneous datasets which measure the expression of the same set of genes separately and then combines the resulting partitions in accordance to one of two types of external specifications; type A identifies the subsets of genes that are consistently co-expressed in all of the given datasets while type B identifies the subsets of genes that are consistently co-expressed in a subset of datasets while being poorly co-expressed in another subset of datasets. This contributes to the types of questions which can addressed by computational methods because existing clustering, consensus clustering, and biclustering methods are inapplicable to address the aforementioned objectives. Moreover, in order to assist in setting some of the parameters required by UNCLES, the M-N scatter plots technique is proposed. These methods, and less mature versions of them, have been validated and applied to numerous real datasets from the biological contexts of budding yeast, bacteria, human red blood cells, and malaria. While collaborating with biologists, these applications have led to various biological insights. In yeast, the role of the poorly-understood gene CMR1 in the yeast cell-cycle has been further elucidated. Also, a novel subset of poorly understood yeast genes has been discovered with an expression profile consistently negatively correlated with the well-known ribosome biogenesis genes. Bacterial data analysis has identified two clusters of negatively correlated genes. Analysis of data from human red blood cells has produced some hypotheses regarding the regulation of the pathways producing such cells. On the other hand, malarial data analysis is still at a preliminary stage. Taken together, this thesis provides an original integrative suite of computational methods which scrutinise multiple gene expression datasets collectively to address previously unresolved questions, and provides the results and findings of many applications of these methods to real biological datasets from multiple contexts.
Description: This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London
Appears in Collections:Electronic and Computer Engineering
Dept of Electronic and Computer Engineering Theses

Files in This Item:
File Description SizeFormat 
FulltextThesis.pdf8.1 MBAdobe PDFView/Open

Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.