
ISCB2014_abstract_book

ISCB 2014 Vienna, Austria • Abstracts - Oral Presentations
Wednesday, 27th August 2014 • 14:00-15:30

C42 Poly-omics studies & Systems Biology

C42.1
A stratified boosting approach for combining gene expression measurements from different platforms to identify prognostic markers
J Mazur1, I Zwiener1,2, H Binder1
1 University Medical Center Mainz, IMBEI, Mainz, Germany, 2 Merck KGaA, Darmstadt, Germany

Development of gene expression risk prediction signatures in a survival setting is typically severely constrained by the number of samples. A natural approach for analyzing several data sets simultaneously is a pooled analysis of samples. However, gene expression studies are often performed on different platforms, such as RNA-Seq and microarrays, so that direct pooling of individual patient data is no longer possible. To still be able to combine gene expression studies, we propose a stratified boosting approach for regularized estimation of Cox regression models. For every study, i.e. every stratum, a componentwise likelihood-based boosting algorithm is performed, where the variable updated in each step is the one whose score statistic is the largest across studies. For evaluation, the prediction performance of our stratified boosting approach is compared to that of the pooled analysis on simulated data. Additionally, for simulated data, we quantify the performance with respect to identifying important genes for our stratified boosting approach, for the pooled analysis, and for a setting where only gene lists, but not the data themselves, are available for the gene expression studies. Finally, we apply our approach to RNA-Seq and gene expression microarray data from kidney clear cell carcinoma patients.
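The selection rule in the abstract above (update the variable whose score statistic is largest across studies) can be sketched as follows. This is only an illustration of the selection step, not the authors' implementation: it assumes the per-study score statistics have already been computed, and it assumes that "largest across studies" means the largest sum over strata, which the abstract does not state. The function name `select_gene` and the example values are hypothetical.

```python
from typing import Dict


def select_gene(scores_by_study: Dict[str, Dict[str, float]]) -> str:
    """Pick the gene to update in one boosting step.

    scores_by_study maps a study (stratum) name to a dict of
    gene -> score statistic for that study. Assumption: the statistics
    are combined by summing over studies; other rules are conceivable.
    """
    # Only genes measured in every study can be compared across strata.
    common = set.intersection(*(set(s) for s in scores_by_study.values()))
    totals = {
        gene: sum(study[gene] for study in scores_by_study.values())
        for gene in common
    }
    # Iterating over sorted gene names makes ties deterministic.
    return max(sorted(totals), key=totals.get)


# Hypothetical score statistics for two platforms.
example_scores = {
    "rnaseq": {"A": 2.0, "B": 5.0, "C": 1.0},
    "microarray": {"A": 4.5, "B": 1.0, "C": 0.5},
}
selected = select_gene(example_scores)
```

In this toy input, gene B has the largest statistic in the RNA-Seq study alone, but gene A is chosen because its combined evidence over both studies is larger. That is the intended benefit of selecting across strata rather than within a single study.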
The results indicate that our newly proposed stratified boosting approach performs close to the pooled analysis where the latter is feasible, and in addition makes it possible to combine gene expression studies from different molecular platforms.

C42.2
Weighted penalized canonical correlation analysis to integrate multiple omics-data
A Zwinderman1
1 Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands

To integrate omics-data from multiple platforms, and to integrate these with phenotypic data, we suggested canonical correlation analysis. Since omics-data are usually high dimensional, we suggested a penalized version (PCCA). We used the elastic net because this method can perform variable selection and also groups correlated variables (which may represent biological pathways) (Waaijenborg & Zwinderman 2009, 2010, 2011). To associate the multiple platforms and clinical data with each other, we maximized the sum of the multiple correlations between the canonical variates of each platform. Optimal penalty parameters were estimated by k-fold cross-validation using a grid search, and we optimized the absolute mean difference between the canonical correlations of the training and test sets. We have now extended PCCA with a weighting scheme to account for (causal) direction in the association analysis. Such a causal pathway is useful, for instance, when integrating genome-wide SNP/DNA sequence data with genome-wide methylation and expression data or with proteome/metabolome data.
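As a rough illustration of the objective described above, the following sketch evaluates a weighted sum of correlations between the canonical variates of several platforms. It is not the author's method: the penalized estimation of the canonical weights is omitted, and the way the weighting scheme enters (one multiplier per platform pair, with 0 removing a direct link) is an assumption made here for illustration. All names and numbers are hypothetical.

```python
from itertools import combinations
from math import sqrt


def variate(X, w):
    """Canonical variate: linear combination of the columns of X."""
    return [sum(x * wi for x, wi in zip(row, w)) for row in X]


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def weighted_correlation_sum(blocks, weights, pair_weights=None):
    """Sum of pairwise correlations between the platforms' variates.

    pair_weights (assumed form) maps an alphabetically ordered pair of
    platform names to a multiplier; missing pairs default to 1.0.
    """
    variates = {name: variate(X, weights[name]) for name, X in blocks.items()}
    total = 0.0
    for a, b in combinations(sorted(variates), 2):
        pw = 1.0 if pair_weights is None else pair_weights.get((a, b), 1.0)
        total += pw * pearson(variates[a], variates[b])
    return total


# Toy data: four samples measured on three hypothetical platforms.
blocks = {
    "expr": [[1, 0], [2, 0], [3, 0], [4, 0]],
    "snp": [[2, 1], [4, 1], [6, 1], [8, 1]],
    "pheno": [[4], [3], [2], [1]],
}
weights = {"expr": [1, 0], "snp": [1, 0], "pheno": [1]}

unweighted = weighted_correlation_sum(blocks, weights)
# Down-weight the direct SNP-phenotype link to zero.
no_direct = weighted_correlation_sum(
    blocks, weights, pair_weights={("pheno", "snp"): 0.0}
)
```

Setting the multiplier of a pair to zero removes that pair from the objective. This is one simple way to express the idea that one platform should relate to the phenotype only through another platform.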
In addition, the weighting schemes may also be used to search specifically for cis-regulatory elements, either located physically-close to a particular gene or located metaphysically-close to a particular protein in a biological/metabolic pathway. We illustrate the weighted PCCA approach by analyzing the associations between 700K SNPs, 200K CNVs, beta-methylation values of 450K CpG-sites, 20K gene expression values and 100 phenotypes measured in 237 patients with Marfan syndrome.
We used a weighting scheme to test the expectation that the phenotypic variation is influenced by SNPs, CNVs and methylation data only through the gene-expression values.

C42.3
Prediction performance as a measure for optimal mapping of methylation and RNA-Seq data
A Gerhold-Ay1, J Mazur1, H Binder1
1 IMBEI, Universitätsmedizin Mainz, Mainz, Germany

Next-generation sequencing and microarray data are becoming more and more important for medical research. They enable us to develop gene signatures for prediction of clinical endpoints like death, via the integration of the information present in RNA-Seq data on gene expression and methylation data on CpG sites. A remaining challenge is deciding which CpG sites should be considered as related to a specific gene. Our aim is to investigate how prediction performance can be used as an optimality criterion for finding the mapping of CpG sites to their related genes.
To find the optimal mapping of methylation to gene information, we define a length of nucleotides around all genes, which we call a window around these genes. In a two-step approach, we first use a likelihood-based componentwise boosting approach to estimate a gene signature with RNA-Seq data only. In the following step, the methylation data of the CpG sites that fall within this window are used to estimate a new signature. For finding prognostic signatures, RNA-Seq and methylation data of kidney tumor patients are used.
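The window-based mapping of CpG sites to genes described above can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the window extends the gene body by the same number of nucleotides on both sides and that a site must lie on the same chromosome. The gene and CpG identifiers and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, int, int]  # (chromosome, start, end)
CpG = Tuple[str, int]        # (chromosome, position)


def map_cpgs_to_genes(
    genes: Dict[str, Gene], cpgs: Dict[str, CpG], window: int
) -> Dict[str, List[str]]:
    """Assign each CpG site to every gene whose extended region contains it.

    The extended region is [start - window, end + window] on the gene's
    chromosome (assumed symmetric window).
    """
    mapping: Dict[str, List[str]] = {name: [] for name in genes}
    for gene, (chrom, start, end) in genes.items():
        for cpg_id, (cpg_chrom, pos) in sorted(cpgs.items()):
            if cpg_chrom == chrom and start - window <= pos <= end + window:
                mapping[gene].append(cpg_id)
    return mapping


# Hypothetical coordinates.
genes = {"G1": ("chr1", 1000, 2000), "G2": ("chr1", 5000, 6000)}
cpgs = {
    "cg01": ("chr1", 900),
    "cg02": ("chr1", 1500),
    "cg03": ("chr1", 4000),
    "cg04": ("chr2", 1500),
}

# Larger windows pull in more sites per gene.
mappings = {w: map_cpgs_to_genes(genes, cpgs, w) for w in (0, 200, 1000)}
```

Each candidate window size yields a different set of CpG sites per gene. In the approach described, a signature would then be estimated for each mapping and the window sizes compared by the resulting prediction performance.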
We analyze different window sizes for the mapping and show that they have an effect on the prediction performance with respect to the clinical endpoint.
Prognostic gene signatures can be a powerful tool for the classification of cancer patients. To underpin this tool, we propose the prediction performance measure as a criterion to find the optimal mapping window for RNA-Seq and methylation data and show its usefulness.

C42.4
Integration of somatic mutation, gene expression and functional data in predicting human breast cancer survival
C Suo1, D Lee2, D Saputra1, H Joshi1, S Pramana1,3, S Calza1,4, Y Pawitan1
1 Dept of Medical Epi and Biostatistics, Karolinska Institutet, Stockholm, Sweden, 2 Dept of Statistics, Ewha Womans University, Seoul, Republic of Korea, 3 Institute of Statistics, Jakarta, Indonesia, 4 University of Brescia, Brescia, Italy

Whole-genome and transcriptome sequencing experiments can be used to explore human cancers comprehensively. The Cancer Genome Atlas breast cancer consortium provides a unique data structure by sequencing sixty matched tumor and normal samples, each pair from the same female patient diagnosed with breast invasive carcinoma, allowing us to accurately infer somatic mutations and isoform-level expression. However, it is not immediately obvious how to subsequently construct and integrate the complex network of the diverse signatures discovered, owing to a lack of mature statistical tools. The fundamental challenges also lie in identifying patient-specific mutational events contributing to the heterogeneity pattern between tumors and translating the findings into clinically relevant aspects.
We propose a novel method to integrate genomic and transcriptomic profiles based on network enrichment analyses, revealing statistical evidence of the functional implications of the biomarkers found between and within patients.
We develop a weighted driver gene score summarizing the mutated driver genes that are common across patients and those that are patient-specific. To contribute to the driver gene score, a gene has to
