
ISCB 2014 Abstract Book

ISCB 2014 Vienna, Austria • Abstracts - Oral Presentations
Tuesday, 26th August 2014 • 9:00-10:30

Contributed sessions

C19 Development of prediction models

C19.1 Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): the TRIPOD statement
GS Collins1, JB Reitsma2, DG Altman1, KG Moons2
1 University of Oxford, Oxford, United Kingdom, 2 UMC Utrecht, Utrecht, The Netherlands

Prediction models are developed to aid healthcare providers in estimating the probability that a specific outcome or disease is present (diagnostic models) or will occur in the future (prognostic models), and so to inform their decision-making. Clinical prediction models are abundant in the medical literature, and some disease areas show an overwhelming number of competing prediction models (sometimes even >100) for the same outcome or target population. Only when full information on all aspects of a prediction model study is clearly reported can the risk of bias and the potential usefulness of the prediction model be adequately assessed. Many reviews have shown that the quality of published reports on the development, validation and updating of prediction models is very poor. The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Initiative therefore developed a set of consensus-based recommendations for the reporting of studies developing, validating or updating a prediction model, whether for diagnostic or prognostic purposes. The development was based on systematic reviews of the literature, web-based surveys and a 3-day expert meeting among methodologists, healthcare professionals and journal editors. The TRIPOD checklist includes 22 items deemed essential for transparent reporting of a prediction model study.
The development and contents of the TRIPOD checklist will be presented and illustrated, along with the empirical evidence and rationale for their inclusion. The TRIPOD statement intends to improve the transparency and completeness of reporting of studies that report the development, validation, or updating of a diagnostic or prognostic prediction model.

C19.2 The multi-split testing approach for choosing between 2 prediction strategies
P Blanche1, M van de Wiel2, TA Gerds1
1 University of Copenhagen, Department of Biostatistics, Copenhagen, Denmark, 2 VU University, Dept. of Epidemiology & Biostatistics, Amsterdam, The Netherlands

Due to a growing interest in personalized medicine, the demand for new prediction tools is increasing rapidly. While numerous works have proposed promising statistical models and strategies for developing prognostic tools, in practice it remains challenging to choose among them. For choosing between 2 prediction strategies, a commonly applied approach is to split the data once into two sets: a “learning sample”, used to train the 2 prediction tools, and a “validation sample”, used to compare them. Unfortunately, the results usually depend strongly on how the data were split. Recently, van de Wiel et al. (Biostatistics, 2009) proposed a test based on multiple splits of the data. The key idea of the method is to aggregate the p-values obtained from several different random splits, to obtain a conclusion that does not depend on the choice of any specific split.
From a practical point of view, the strengths of the approach are its computational ease and its universality, enabling one to compare arbitrary prediction strategies. It is also general with respect to the prediction accuracy criterion, so extensions to right-censored data and to situations with competing risks are readily available, as is shown in this talk. We provide new insights regarding type one error control and power of the original testing procedure, and also discuss how to test alternative hypotheses. The ideas are motivated and illustrated by a real data analysis of cardiovascular risk prediction models.
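To make the multi-split idea concrete, the sketch below compares two prediction strategies in Python: in each random split, both strategies are trained on the learning half and compared on the validation half with a paired Wilcoxon test on individual squared prediction errors, and the split-specific p-values are then aggregated with a twice-the-median rule. This is an illustration of the general principle only, not the exact procedure of van de Wiel et al. (2009); the choice of strategies, the per-split test, the aggregation rule and the function name multi_split_test are all illustrative assumptions.

# Illustrative multi-split comparison of two prediction strategies
# (general idea only; not the exact test of van de Wiel et al., 2009).
import numpy as np
from scipy.stats import wilcoxon
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def multi_split_test(X, y, n_splits=50, test_size=0.5, seed=1):
    y = np.asarray(y)
    pvals = []
    for b in range(n_splits):
        X_tr, X_va, y_tr, y_va = train_test_split(
            X, y, test_size=test_size, random_state=seed + b, stratify=y)
        # Strategy A and strategy B: any two prediction strategies can be plugged in here.
        p_a = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_va)[:, 1]
        p_b = RandomForestClassifier(n_estimators=200, random_state=0).fit(
            X_tr, y_tr).predict_proba(X_va)[:, 1]
        # Individual squared prediction errors (Brier-score contributions) on the validation half.
        err_a = (y_va - p_a) ** 2
        err_b = (y_va - p_b) ** 2
        # Paired test of the two error vectors for this particular split.
        pvals.append(wilcoxon(err_a, err_b).pvalue)
    # Aggregate the split-specific p-values so the conclusion does not
    # hinge on any single random split (twice-the-median rule, capped at 1).
    return min(1.0, 2.0 * float(np.median(pvals)))

Because the returned p-value is an aggregate over all splits, rerunning the comparison with a different seed changes it far less than the p-value of any single split would.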
C19.3 (Cancelled) The impact of events per variable on the predictive performance of the Cox model
EO Ogundimu1, DG Altman1, GS Collins1
1 University of Oxford, Oxford, United Kingdom

Sample size requirements for developing multivariable prediction models using Cox regression are routinely based on rules of thumb derived from simulation studies, recommending a minimum of 5 to 20 events per variable (EPV). However, a common design feature of these simulation studies is the small sample size and limited range of scenarios, and the fact that only one binary predictor was included in the models. The effects of multiple binary predictors with varying degrees of prevalence, reflecting clinical practice, have not been investigated. Furthermore, the emphasis in these studies has been on the accuracy and precision of regression coefficients, and not on the predictive accuracy of the fitted model, which ultimately characterises the predictive ability of the model.
We therefore conducted extended simulation studies using a large general practice dataset (THIN), comprising over 2 million anonymised patient records, to examine the sample size requirements for prediction models developed using Cox regression. Investigating both fully specified models and models derived using variable selection, we examine the stability and precision of regression coefficients and their impact on apparent model performance (e.g. c-index, D-statistic, R2) as well as on subsequent performance in an external validation dataset. We also present results examining models containing low-prevalence binary predictors and the impact, in terms of sample size, on the predictive accuracy of the model. We will demonstrate that more events are needed to achieve precise measures of predictive accuracy in situations where 'many' low-prevalence binary predictors are included in the model.
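As a concrete illustration of the EPV concept for the Cox model, the sketch below simulates survival data containing low-prevalence binary predictors, computes EPV as the number of observed events divided by the number of candidate predictors, and contrasts the apparent c-index with the c-index in a large simulated external dataset as the training size grows. The exponential data-generating model, the chosen prevalences and effect sizes, and the use of the lifelines package are illustrative assumptions; this is a sketch of the idea, not the authors' THIN-based simulation design.

# Illustrative sketch: how EPV relates to apparent vs. external predictive
# accuracy of a Cox model (assumed data-generating model, not the study design).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

rng = np.random.default_rng(2014)
prevalences = [0.5, 0.2, 0.1, 0.05]        # binary predictors, including low-prevalence ones
betas = np.array([0.5, 0.5, 0.7, 0.7])     # log hazard ratios

def simulate(n):
    X = np.column_stack([rng.binomial(1, p, n) for p in prevalences])
    event_time = rng.exponential(scale=np.exp(-X @ betas))   # exponential baseline hazard
    censor_time = rng.exponential(scale=2.0, size=n)
    df = pd.DataFrame(X, columns=[f"x{j}" for j in range(len(betas))])
    df["time"] = np.minimum(event_time, censor_time)
    df["event"] = (event_time <= censor_time).astype(int)
    return df

external = simulate(20000)                  # large external validation sample
for n_train in (150, 500, 2000):
    train = simulate(n_train)
    epv = train["event"].sum() / len(betas)            # events per variable
    cph = CoxPHFitter().fit(train, duration_col="time", event_col="event")
    c_apparent = cph.concordance_index_
    c_external = concordance_index(external["time"],
                                   -cph.predict_partial_hazard(external),
                                   external["event"])
    print(f"n={n_train:4d}  EPV={epv:6.1f}  apparent c={c_apparent:.3f}  external c={c_external:.3f}")

With small training sets (low EPV) the apparent c-index is an optimistic and unstable estimate of external performance; as EPV grows the apparent and external values converge.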
C19.4 The number of events per variable needed to build logistic prediction models in clustered data: a simulation study
L Wynants1,2, W Bouwmeester3, S Van Huffel1,2, B Van Calster4, Y Vergouwe5
1 KU Leuven Dept of Electrical Engineering / ESAT-STADIUS, Leuven, Belgium, 2 iMinds Medical Information Technologies, Leuven, Belgium, 3 MediQuest, Utrecht, The Netherlands, 4 KU Leuven Dept of Development & Regeneration, Leuven, Belgium, 5 Public Health Dept, Erasmus MC, Rotterdam, The Netherlands

Researchers increasingly combine data from several centers to develop clinical prediction models for diagnosis or prognosis. Guidelines for the required sample size of such multicenter studies are lacking. We studied the impact of the number of events per variable (EPV) on the estimation of regression coefficients and on the performance of the resulting prediction model.
We performed a simulation study to investigate the influence of the amount of clustering (the intraclass correlation, or ICC), backward variable selection, the number of centers, center size, and the total sample size. A high EPV increased the accuracy of the regression estimates and the performance of the prediction model, while the ICC did not meaningfully influence estimation or performance. In addition to EPV, also the total

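As a concrete illustration of EPV in clustered (multicenter) data, the sketch below generates binary outcomes from a random-intercept logistic model with a specified latent-scale ICC, computes EPV, and fits an ordinary logistic prediction model. The data-generating mechanism, parameter values, and function names are illustrative assumptions and do not reproduce the authors' simulation design.

# Illustrative sketch of EPV in clustered binary data: outcomes are generated
# from a random-intercept logistic model, where the latent-scale ICC equals
# tau2 / (tau2 + pi^2/3) and EPV = (number of events) / (number of predictors).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

def simulate_clustered(n_centers=20, center_size=100, icc=0.05,
                       betas=(0.5, 0.5, 0.8), intercept=-1.5):
    tau2 = icc * (np.pi ** 2 / 3) / (1 - icc)      # between-center variance
    u = rng.normal(0, np.sqrt(tau2), n_centers)    # random center intercepts
    rows_X, rows_y = [], []
    for c in range(n_centers):
        X = rng.normal(size=(center_size, len(betas)))
        eta = intercept + u[c] + X @ np.asarray(betas)
        y = rng.binomial(1, 1 / (1 + np.exp(-eta)))
        rows_X.append(X)
        rows_y.append(y)
    return np.vstack(rows_X), np.concatenate(rows_y)

X, y = simulate_clustered()
epv = y.sum() / X.shape[1]                          # events per variable

model = LogisticRegression(max_iter=1000).fit(X, y)
X_val, y_val = simulate_clustered()                 # fresh data from the same mechanism
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"EPV = {epv:.1f}, validation c-statistic (AUC) = {auc:.3f}")

Varying n_centers, center_size, and icc in the call mimics the kind of scenarios the abstract describes: EPV changes with the number of events, while the ICC controls how strongly outcomes cluster within centers.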