
ISCB2014_abstract_book

ISCB 2014 Vienna, Austria • Abstracts - Oral Presentations
Monday, 25th August 2014 • 16:00-17:30

the gene sequence. Generalization methods based on resampling, such as bootstrap and cross-validation, were considered. The regression methods used are Support Vector Regression (SVR) and Decision Tree Regression (DTR). In simulation studies, the actual performance of the regression techniques on gene data was approximated using bootstrap and cross-validation. Overall, when the results are examined for each simulation scenario, the bootstrap method appears to yield a lower estimation error than cross-validation.

C16.2 An application of sequential meta-analysis to gene expression studies
PW Novianti1, VL Jong1,2, I van der Tweel1, KCB Roes1, MJC Eijkemans1
1 University Medical Center Utrecht, Utrecht, The Netherlands, 2 Erasmus Medical Center, Rotterdam, The Netherlands

Most discoveries from gene expression data are driven by a single study that claims an optimal subset of genes playing a key role in a specific disease. Results from an analysis of differentially expressed genes (DEGs) may be used in drug development. An optimal new drug based on the results of a DEG analysis is potentially hard to achieve because of false-positive findings. Meta-analyzing the available datasets potentially helps in obtaining concordant results, so that a real-life application may be more successful.

Sequential meta-analysis (SMA) is an approach for combining studies in chronological order while preserving the type I error and pre-specifying the statistical power to detect a given effect size. This study focuses on the application of SMA (following Whitehead’s triangular test boundaries approach) to find gene expression signatures across microarray experiments in acute myeloid leukemia (AML).
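
As a rough illustration of combining studies in chronological order, the sketch below pools one gene's effect estimates cumulatively under a random-effects model. It is not the authors' implementation: the function names, the toy inputs, and the bisection search are assumptions made for illustration, and the triangular stopping boundaries are deliberately left out. The between-study variance is estimated in the Paule-Mandel manner, by solving Q(τ²) = k − 1 for τ².

```python
import math

def generalized_q(effects, variances, tau2):
    """Generalized Q statistic for a candidate between-study variance tau2."""
    weights = [1.0 / (v + tau2) for v in variances]
    mu = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - mu) ** 2 for w, y in zip(weights, effects))

def paule_mandel_tau2(effects, variances, n_iter=200):
    """Solve Q(tau2) = k - 1 by bisection; truncate at 0 when Q(0) <= k - 1."""
    k = len(effects)
    if k < 2 or generalized_q(effects, variances, 0.0) <= k - 1:
        return 0.0
    lo, hi = 0.0, 1.0
    # Q decreases as tau2 grows, so widen the bracket until Q(hi) <= k - 1.
    while generalized_q(effects, variances, hi) > k - 1:
        hi *= 2.0
    for _ in range(n_iter):
        mid = (lo + hi) / 2.0
        if generalized_q(effects, variances, mid) > k - 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def cumulative_pooling(effects, variances):
    """Pool studies 1..k for k = 1, 2, ...; inputs must be in chronological order."""
    results = []
    for k in range(1, len(effects) + 1):
        y, v = effects[:k], variances[:k]
        tau2 = paule_mandel_tau2(y, v)
        weights = [1.0 / (vi + tau2) for vi in v]
        pooled = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
        se = math.sqrt(1.0 / sum(weights))
        results.append((k, tau2, pooled, pooled / se))
    return results

# Hypothetical per-study effect estimates and variances for a single gene.
toy_effects = [0.9, 0.4, 1.1, 0.7]
toy_variances = [0.10, 0.08, 0.12, 0.09]
for k, tau2, pooled, z in cumulative_pooling(toy_effects, toy_variances):
    print(f"studies 1..{k}: tau2={tau2:.3f} pooled={pooled:.3f} z={z:.2f}")
```

In a sequential analysis, the cumulative statistic would be compared against pre-specified boundaries after each added study; that comparison is omitted here.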

Seven raw datasets on AML patients versus healthy controls fulfilled our predefined search criteria and were downloaded from the ArrayExpress repository. The boundaries in the triangular test were constructed for a pre-specified effect size θR=0.8, a type I error α=0.5% and a power 1-β=80%. The between-study variance was estimated by the Paule-Mandel method. We found 169 DEGs, based on the cumulative information of the seven experiments. By comparison, a Bonferroni correction from α=5% to α=0.0007% yielded 24 DEGs. This study shows whether there is enough evidence at a certain time point to draw a conclusion for a particular gene or whether the conclusion should be held until the evidence is adequate.

C16.3 Ensemble classifiers in the high-dimensional setting with class-imbalanced data
R Blagus1, L Lusa1
1 University of Ljubljana, Ljubljana, Slovenia

The goal of biomedical studies is often to develop a rule (classifier) to predict the class membership of new samples based on the values of some measured variables. Boosted classifiers combine the votes of a base classifier trained on modified versions of the training data; typically, boosting improves the accuracy of the base classifier and reduces its variance. However, the usefulness of boosting remains questionable when data are high-dimensional, that is, when the number of variables greatly exceeds the number of samples.

We consider AdaBoost.M1, gradient boosting and LogitBoost and use classification trees as base classifiers. On simulated and real high-dimensional data, boosting algorithms often do not improve upon their base classifier; the best performance is achieved by stochastic gradient boosting, while AdaBoost.M1 and gradient boosting can perform very poorly with small samples. We propose a straightforward, yet efficient, modification of the AdaBoost.M1 algorithm that can perform well also in these settings.
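
For orientation, the standard two-class AdaBoost.M1 procedure can be sketched as follows. This is the textbook algorithm, not the modification proposed in the abstract, which is not described here. Depth-one classification trees (stumps) stand in for the base classifier, and all function names and the toy data are assumptions made for illustration.

```python
import math

def fit_stump(X, y, w):
    """Weighted depth-one tree for labels in {0, 1}.

    Returns (feature, threshold, left_label). Assumes at least one
    feature takes two distinct values in X.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for t in ((a + b) / 2.0 for a, b in zip(values, values[1:])):
            for left in (0, 1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (left if row[j] <= t else 1 - left) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, left)
    return best[1:]

def predict_stump(stump, row):
    j, t, left = stump
    return left if row[j] <= t else 1 - left

def adaboost_m1(X, y, n_rounds=10):
    """Return a list of (vote weight, stump) pairs."""
    n = len(y)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        miss = [predict_stump(stump, row) != yi for row, yi in zip(X, y)]
        err = sum(wi for wi, m in zip(w, miss) if m)  # weights sum to 1
        if err >= 0.5:
            break  # base classifier no better than chance: stop
        err = max(err, 1e-10)  # avoid an infinite vote weight when err == 0
        alpha = math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        if not any(miss):
            break  # training data already fit perfectly
        # Up-weight misclassified samples, then renormalize.
        w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted majority vote; ties (and an empty ensemble) go to class 0."""
    votes = [0.0, 0.0]
    for alpha, stump in ensemble:
        votes[predict_stump(stump, row)] += alpha
    return 0 if votes[0] >= votes[1] else 1

# Hypothetical toy data: four samples, two variables.
X_toy = [[0.0, 5.0], [1.0, 3.0], [2.0, 4.0], [3.0, 1.0]]
y_toy = [0, 0, 1, 1]
model = adaboost_m1(X_toy, y_toy)
print([predict(model, row) for row in X_toy])
```

With very few samples, the first tree often fits the training data perfectly, so the weighted error is zero and the reweighting step has nothing left to do.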

It is known that high-dimensionality exacerbates the class-imbalance bias, where most samples are assigned to the majority class unless the differences between the classes are large; so far, the performance of boosting on imbalanced high-dimensional data has not been investigated. Our results show that boosting can increase the class-imbalance bias of its base classifier. We show that this problem can be avoided by using boosting on a previously down-sized training set, or by using more complex ensembles that combine boosting with bootstrap aggregating.

C16.4 Combining techniques for screening and evaluating interaction terms on high-dimensional time-to-event data
I Hoffmann1, M Sariyar2, H Binder1
1 Universitätsmedizin Mainz, Johannes-Gutenberg Universität, Mainz, Germany, 2 Universitätsmedizin Berlin, Charité, Berlin, Germany

When linking high-dimensional molecular covariates to some clinical endpoint, e.g., when using gene expression measurements for prognosis, sparse regression techniques are destined to provide a short list of marginal or main effects. While interactions are highly likely to be present in molecular applications, it is still very challenging to identify interaction terms that should be considered together with potential main effects for predicting a clinical outcome. Additionally, it is well known that gene expression data are highly correlated.

To address this, we present a strategy based on the combination of a regularized regression approach for fitting prognostic models and different approaches for interaction screening. We specifically consider componentwise likelihood-based boosting to select main effects for a prognostic model in a time-to-event setting. Random survival forests and logic regression are considered for preselecting the potential interaction terms. Specifically, the screening step considers permutation accuracy importance and pairwise inclusion frequencies.
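
The abstract does not define pairwise inclusion frequency. One plausible reading, sketched below, is the fraction of trees (or logic-regression models) in which two covariates are used together. The function names, the input format and the toy "forest" are hypothetical, and no survival forest is actually fitted here.

```python
from collections import Counter
from itertools import combinations

def pairwise_inclusion_frequencies(trees):
    """Fraction of trees in which each pair of covariates occurs together.

    `trees` is a list of iterables, each holding the names of the
    covariates that one fitted tree uses for splitting.
    """
    counts = Counter()
    for variables in trees:
        # Count each unordered pair at most once per tree.
        for pair in combinations(sorted(set(variables)), 2):
            counts[pair] += 1
    n_trees = len(trees)
    return {pair: c / n_trees for pair, c in counts.items()}

def top_pairs(freqs, n_top):
    """Candidate interaction terms, most frequently co-included first."""
    return sorted(freqs, key=lambda pair: (-freqs[pair], pair))[:n_top]

# Hypothetical split variables from four trees.
toy_forest = [["g1", "g2", "g7"], ["g1", "g2"], ["g3", "g7"], ["g1", "g2", "g3"]]
freqs = pairwise_inclusion_frequencies(toy_forest)
print(top_pairs(freqs, 3))
```

The top-ranked pairs would then be offered as candidate interaction terms to the regularized regression step alongside the main effects.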

The benefits and limits of the different interaction screening approaches are evaluated in a simulation study with respect to prediction performance and sensitivity concerning main effects and interactions. We consider scenarios with different relative main effect and interaction effect sizes, and with different correlation structures. The proposed strategy for interaction screening and prognostic model building is further illustrated with gene expression data from patients with diffuse large B-cell lymphoma.

C16.5 Comparing models of location and scale for genome-wide DNA methylation data
S Wahl1, N Fenske2, S Zeilinger1, K Suhre1,3, C Gieger1, A Peters1, M Waldenberger1, H Grallert1, M Schmid2,4
1 Helmholtz Zentrum München, Neuherberg, Germany, 2 Ludwig-Maximilians-Universität München, Munich, Germany, 3 Weill Cornell Medical College in Qatar, Doha, Qatar, 4 Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany

With the help of methylome-wide association studies, increasing knowledge on the role of DNA methylation in disease processes is obtained. In terms of statistical analysis, specific challenges arise from the characteristics of methylation data. First, they represent proportions with skewed and heteroscedastic distributions. Traditional strategies assuming a normally distributed response might therefore be inappropriate. Second, recent evidence suggests that not only mean differences but also variability in site-specific DNA methylation
