ISCB 2014 Abstract Book

ISCB 2014 Vienna, Austria • Abstracts - Oral Presentations

Wednesday, 27th August 2014 • 9:00-10:48
Contributed sessions

C31 Variable selection in high-dimensional models

C31.1 Student Conference Award
An extension of the lasso penalization to reduce false positive selection in high-dimensional Cox models
N Ternès1,2, F Rotolo1, S Michiels1,2
1 Institut Gustave Roussy, Department of Biostatistics and Epidemiology, Villejuif, France, 2 University Paris Sud, Le Kremlin-Bicêtre, France

Introduction: Increasing interest is being devoted to selecting the right prognostic biomarkers among multiple candidates. Regression with lasso penalization is a popular variable selection method, but it depends strongly on the penalization parameter λ. Usually, λ is chosen via maximum cross-validated log-likelihood (max-cvl). Yet this choice often detects too many false positives.
Methods: We propose an AIC-like penalized cvl (pcvl), a function of the number of non-null regression parameters in the model, trading off the goodness of fit (small λ) against the parsimony of the model (large λ). Under this extension, the optimal λ is greater than or equal to the max-cvl choice and selects fewer biomarkers. We evaluate the false discovery rate (FDR) and false negative rate (FNR) in a simulation study, varying the sample size, the number and prevalence of binary biomarkers, the number of active biomarkers, correlation and censoring. Finally, we apply these methods to two publicly available mutation and gene expression data sets in non-small cell lung cancer from The Cancer Genome Atlas database. (A toy sketch of the pcvl criterion is given after abstract C31.2 below.)
Results: In null scenarios (i.e. no active biomarker), no difference was observed between the two methods in terms of FDR; however, pcvl selected fewer biomarkers on average. In alternative scenarios, the FDR was systematically lower for pcvl. The FNR was low and comparable for both methods, although slightly higher for pcvl with a small sample size and a high number of active and non-active biomarkers.
Conclusion: Maximum pcvl yields far fewer false positive biomarkers than max-cvl with lasso penalization in high-dimensional Cox regression models.

C31.2 Biomarker discovery: controlling false discoveries in high dimensional situations
B Hofner1
1 FAU Erlangen-Nürnberg, Erlangen, Germany

Modern biotechnologies often result in high-dimensional data sets with many more variables than observations (n ≪ p). These data sets pose new challenges to statistical analysis: variable selection becomes one of the most important tasks in this setting. Recently, Meinshausen and Bühlmann (JRSSB, 2010) proposed a flexible framework for variable selection called stability selection, which was refined by Shah and Samworth (JRSSB, 2013). Through resampling, stability selection adds finite-sample error control to high-dimensional variable selection procedures such as the lasso or boosting. We consider the combination of boosting and stability selection and present results from a detailed simulation study offering insights into the usefulness of this combination (a minimal sketch is given below). Limitations will be discussed and guidance on the specification and tuning of stability selection will be given. The results will then be used for the detection of metabolic biomarkers for autism. All methods are implemented in the R package mboost (http://cran.r-project.org/package=mboost).
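
The pcvl criterion of C31.1 can be illustrated in a few lines of R. The sketch below is a toy version built on assumptions: it runs cv.glmnet on simulated data and penalizes the cross-validated partial log-likelihood by the number of non-zero coefficients (one unit per selected biomarker, in the spirit of AIC); the authors' exact penalty may differ.

    ## Toy sketch of an AIC-like penalized cross-validated log-likelihood
    ## (pcvl) for choosing the lasso penalty in a Cox model.
    ## Data and the exact penalty form are illustrative assumptions.
    library(glmnet)
    library(survival)

    set.seed(1)
    n <- 200; p <- 50
    x <- matrix(rnorm(n * p), n, p)
    time <- rexp(n, rate = exp(0.5 * x[, 1] - 0.5 * x[, 2]))
    status <- rbinom(n, 1, 0.7)

    cv <- cv.glmnet(x, Surv(time, status), family = "cox")

    ## cv$cvm is the partial-likelihood deviance; convert to a
    ## cross-validated log-likelihood scale (up to constants)
    cvl <- -cv$cvm / 2
    df  <- cv$nzero                 # non-zero coefficients per lambda

    pcvl <- cvl - df                # AIC-like penalty on model size
    lambda.pcvl <- cv$lambda[which.max(pcvl)]

    ## pcvl picks a lambda at least as large as the max-cvl choice,
    ## hence no more selected biomarkers (cf. the abstract)
    c(max.cvl = cv$lambda.min, pcvl = lambda.pcvl)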
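
For C31.2, here is a minimal sketch of stability selection on top of component-wise boosting, using the mboost package named in the abstract together with the stabs package, which implements the Meinshausen-Bühlmann and Shah-Samworth procedures. The data and the q and PFER values below are placeholders for illustration, not recommendations:

    ## Boosting + stability selection with finite-sample error control.
    ## q (variables per subsample run) and PFER (per-family error rate)
    ## are illustrative choices.
    library(mboost)
    library(stabs)

    set.seed(2)
    n <- 100; p <- 200
    x <- matrix(rnorm(n * p), n, p)
    colnames(x) <- paste0("V", seq_len(p))
    y <- x[, 1] - x[, 2] + rnorm(n)

    mod <- glmboost(x, y)            # component-wise linear boosting

    ## Stability selection: resamples the data, refits the boosting
    ## model, and keeps variables selected in a high fraction of runs
    sel <- stabsel(mod, q = 10, PFER = 1)
    print(sel$selected)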
C31.3 Deviance residuals based sparse PLS and sparse kernel PLS regression for censored data
P Bastien1, F Bertrand2, N Meyer3, M Maumy-Bertrand2
1 L’Oreal R&I, Aulnay, France, 2 CNRS, Université de Strasbourg, Strasbourg, France, 3 INSERM, Faculté de Médecine, Strasbourg, France

A vast literature of the last decade has been devoted to relating gene expression profiles to subject survival or to time to cancer recurrence. The proportional hazards regression model suggested by Cox (1972) to study the relationship between the time to event and a set of covariates in the presence of censoring is the model most commonly used for the analysis of survival data. However, like multivariate regression, it supposes that there are more observations than variables, complete data, and variables that are not strongly correlated with one another. In practice, when dealing with high-dimensional data, these constraints are crippling. Collinearity gives rise to over-fitting and model mis-identification. Variable selection can improve estimation accuracy by effectively identifying the subset of relevant predictors and enhance model interpretability through a parsimonious representation. To deal with both the collinearity and the variable selection issues, many methods based on lasso-penalized Cox proportional hazards regression have been proposed since the seminal paper of Tibshirani (1997). Regularization can also be achieved by dimension reduction, as in PLS regression. We propose two original algorithms, sPLSDR and its nonlinear kernel counterpart DKsPLSDR, which apply sparse PLS regression (sPLS) to deviance residuals (a toy sketch of this device is given after abstract C31.4 below). We compared their predictive performance with that of state-of-the-art algorithms on reference benchmarks and simulated datasets.
Results: sPLSDR and DKsPLSDR compare favorably with other methods in computational time, prediction and selectivity. The R package plsRcox is available on CRAN and is maintained by Frédéric Bertrand.

C31.4 Weibull regression with Bayesian variable selection to identify prognostic biomarkers of breast cancer survival
PJ Newcombe1, H Raza Ali2,3,4, FM Blows5, E Provenzano6, PD Pharoah4,5,7, C Caldas2,4,5, S Richardson1
1 MRC Biostatistics Unit, Cambridge, United Kingdom, 2 Cancer Research UK, Cambridge, United Kingdom, 3 Department of Pathology, University of Cambridge, Cambridge, United Kingdom, 4 Cambridge Experimental Cancer Medicine Centre, Cambridge, United Kingdom, 5 Department of Oncology, University of Cambridge, Cambridge, United Kingdom, 6 NIH Cambridge Biomedical Research Centre, Cambridge, United Kingdom, 7 Strangeways Research Laboratory, Cambridge, United Kingdom

As large, data-rich medical datasets become routinely collected, there is a growing demand for regression methodology that facilitates feature selection over a large number of predictors. Bayesian variable selection algorithms offer an attractive solution, whereby a sparsity-inducing prior allows inclusion of sets of predictors simultaneously and inference on those which are most important. Since predictors are included simultaneously, effect estimates are adjusted for one another and issues around multiple testing are avoided. Furthermore, uncertainty in the subset of important predictors and in their effect estimates is naturally captured. (A sketch of the censored Weibull likelihood underlying such a model is given below.)
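
The deviance-residual device at the heart of sPLSDR (C31.3) replaces the censored outcome by deviance residuals from a null Cox model, which can then be fed to any (sparse) PLS routine. The sketch below substitutes ordinary PLS from the pls package for the authors' sparse PLS and runs on simulated data; the actual sPLSDR and DKsPLSDR algorithms are implemented in plsRcox.

    ## Toy version of PLS on deviance residuals for censored data.
    library(survival)
    library(pls)

    set.seed(3)
    n <- 150; p <- 30
    x <- matrix(rnorm(n * p), n, p)
    time <- rexp(n, rate = exp(0.7 * x[, 1]))
    status <- rbinom(n, 1, 0.8)

    ## Deviance residuals of the null (intercept-only) Cox model act
    ## as a continuous pseudo-response
    dres <- residuals(coxph(Surv(time, status) ~ 1), type = "deviance")

    ## Ordinary PLS regression on the residuals (sPLSDR would use
    ## sparse PLS here instead)
    fit <- plsr(dres ~ x, ncomp = 3, validation = "CV")
    summary(fit)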
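
As a building block for C31.4, the right-censored Weibull log-likelihood under a proportional-hazards parametrisation can be coded directly. This sketch only maximizes that likelihood for one fixed subset of predictors; the Bayesian machinery of the abstract (sparsity prior and sampling over inclusion indicators) is not shown, and all data below are simulated assumptions.

    ## Censored Weibull log-likelihood, PH parametrisation:
    ## h(t) = shape * t^(shape-1) * exp(x'beta),  H(t) = t^shape * exp(x'beta)
    weibull.loglik <- function(par, x, time, status) {
      shape <- exp(par[1])                        # log-shape for positivity
      lp    <- drop(x %*% par[-1])                # linear predictor
      haz   <- shape * time^(shape - 1) * exp(lp) # hazard at observed times
      cumh  <- time^shape * exp(lp)               # cumulative hazard
      sum(status * log(haz)) - sum(cumh)          # censored log-likelihood
    }

    set.seed(4)
    n <- 100
    x <- cbind(1, matrix(rnorm(n * 3), n, 3))     # intercept + 3 covariates
    time <- rweibull(n, shape = 1.5,
                     scale = exp(-drop(x %*% c(0, 1, 0, 0)) / 1.5))
    status <- rbinom(n, 1, 0.7)                   # 1 = event, 0 = censored

    ## Maximum likelihood for this one candidate subset; a Bayesian
    ## variable-selection sampler would compare many subsets instead
    opt <- optim(rep(0, ncol(x) + 1), weibull.loglik, x = x,
                 time = time, status = status, control = list(fnscale = -1))
    opt$par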
