ISCB2014_abstract_book

ISCB 2014 Vienna, Austria • Abstracts - Oral Presentations
Wednesday, 27th August 2014 • 16:00-17:30

C46.3 Comparison of methods for imputing limited-range variables: a simulation study
L Rodwell1,2, KJ Lee1,2, H Romaniuk1,2, JB Carlin1,2
1 Murdoch Children’s Research Institute, Melbourne, Australia, 2 The University of Melbourne, Melbourne, Australia

Multiple imputation (MI) was developed to enable valid inferences in the presence of missing data rather than to re-create the missing values. Within the applied setting, it remains unclear how important it is that imputed values should be plausible. One variable for which MI may lead to implausible values is a limited-range variable, where imputed values may fall outside the observable range. The aim of this work was to compare methods for imputing limited-range variables.

We consider three variables, based on different scoring methods of the General Health Questionnaire (GHQ). These variables resulted in three continuous distributions with mild, moderate and severe positive skewness. In an otherwise complete dataset, we set 33% of the GHQ observations to missing at random, creating 1000 datasets with incomplete data. We imputed values on the raw scale and following transformation using: regression with no rounding; post-imputation rounding; truncated normal regression; and predictive mean matching. We estimated the marginal mean of the GHQ and the association between the GHQ and a fully observed binary outcome, comparing the results with complete data statistics.

Imputation with no rounding performed well when applied to the raw scale data. Post-imputation rounding and truncated normal regression produced higher marginal means for data with a moderate or severe skew. Predictive mean matching produced under-coverage of the complete data estimate.
For the association, all methods produced similar estimates. For highly skewed limited-range data, MI techniques that restrict the range of imputed values can result in biased estimates for the marginal mean.

C46.4 Validation of prediction models based on lasso regression with multiply imputed data
JZ Musoro1, AH Zwinderman1, MA Puhan2, G ter Riet1, RB Geskus1
1 Academic Medical Center of Amsterdam, Amsterdam, The Netherlands, 2 Institute for Social and Preventive Medicine, Zurich, Switzerland

Background: In prognostic studies, the lasso technique is attractive since it improves the quality of predictions by shrinking regression coefficients, compared to predictions based on a model fitted via unpenalized maximum likelihood. Since some coefficients are set to zero, parsimony is achieved as well. It is unclear whether the performance of a model fitted using the lasso still shows some optimism. Bootstrap methods have been advocated to quantify optimism and generalize model performance to new subjects. It is unclear how resampling should be performed in the presence of multiply imputed data.

Method: The study data were based on a cohort of Chronic Obstructive Pulmonary Disease (COPD) patients. We constructed models to predict Chronic Respiratory Questionnaire (CRQ) dyspnea 6 months ahead. We investigated optimism of the lasso model, and compared three approaches of handling multiply imputed data in the bootstrap procedure, using the study data and simulated data sets.

Results: The discriminative model performance of the lasso was optimistic. There was suboptimal calibration due to over-shrinkage. The estimate of optimism was sensitive to the choice of handling imputed data in the bootstrap resampling procedure.

Conclusion: Performance of prognostic models constructed using the lasso technique can be optimistic as well.
Resampling in the presence of multiply imputed data should be performed such that a bootstrap sample selects the same subjects across the imputed data sets, which should differ solely by the imputed values, not by the individuals.

C46.5 Impact of incomplete follow-up when exploring associations between baseline characteristics and outcome in a longitudinal study
S Crichton1, C Wolfe1, J Peacock1
1 King’s College London, London, United Kingdom

Aim: To assess the impact of missing data when identifying predictors of poor outcome after stroke.

Methods: Data were extracted from the South London Stroke Register (N=3617), which collects data at onset, 3 months and annually after stroke. Outcomes are assessed using the Barthel index (categorised as independent, mildly, moderately or severely disabled), Frenchay Activities Index (active, slightly active, inactive) and the Hospital Anxiety and Depression scale. Follow-up rates are typically 60-70%. Models, with varying missing data assumptions, were applied to explore relationships between baseline characteristics and outcomes up to 5 years after stroke. These were Generalised Estimating Equations (GEEs) (assuming missing completely at random data), weighted GEEs (WGEE), GEE combined with multiple imputation (MI-GEE), and multi-level mixed-effects models (all assume missing at random). GEE and mixed-effect estimates were compared to appropriate shared parameter and pattern mixture models, which allow for missing not at random data. All models for binary outcomes were logistic; proportional odds models were used for activity level and multinomial models for disability level (as proportional odds assumptions were violated).

Results: In univariable and multivariable models for anxiety and depression the same factors were consistently identified as significant. Population averaged effect sizes were comparable across models estimated using GEEs, as were subject specific effect sizes from mixed-effects models.
GEE, WGEE and MI-GEE models for disability and activity level produced similar results. Findings from other disability and activity models will also be compared.

Conclusions: Missing data appears to have limited impact when looking at associations between baseline and post-stroke outcomes.

C47 Special types of censored data

C47.1 Analysing disease recurrence with missing at risk information
M Pohar Perme1, T Štupnik2
1 University of Ljubljana, Ljubljana, Slovenia, 2 University Clinical Center, Ljubljana, Slovenia

When analysing time to disease recurrence, we sometimes stumble over data where we are certain that we have all information on recurrence, but do not know whether the studied patients are still alive. This may happen with diseases of benign nature where patients are only seen at recurrences or in poorly designed national registries with insufficient patient identifiers to obtain their dead/alive status. When the average time to disease recurrence is long enough in comparison to the expected survival of the patients, the statistical analysis of such data may be significantly biased. Under the assumption that the expected survival of an individual is not influenced by the disease itself, we try to reduce this bias by using the general population mortality tables.

We show why the intuitive solution of simply censoring the patients with their expected survival time does not give unbiased estimates and provide an alternative framework that allows for unbiased estimation of the usual quantities of interest in survival analysis. Our results are supported by simulations and real data examples.
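
The bias described in C47.1 can be illustrated with a small simulation. The sketch below is not the authors' framework; it only shows, under assumed conditions, what happens when unrecorded deaths leave patients in the risk set. The exponential recurrence and death times, the rates, the 15-year follow-up, and the names `kaplan_meier` and `simulate` are all illustrative assumptions, and death is taken to be independent of recurrence, in line with the abstract's assumption that survival is not influenced by the disease.

```python
import math
import random


def kaplan_meier(records, t_eval):
    """Kaplan-Meier estimate of P(no recurrence by t_eval).

    records: list of (time, event) pairs; event is True for an observed
    recurrence and False for a censored observation.
    """
    # Sort by time, handling recurrences before censorings at tied times.
    ordered = sorted(records, key=lambda rec: (rec[0], not rec[1]))
    at_risk = len(ordered)
    surv = 1.0
    for time, event in ordered:
        if time > t_eval:
            break
        if event:
            surv *= 1.0 - 1.0 / at_risk
        at_risk -= 1
    return surv


def simulate(n=20000, rec_rate=0.05, death_rate=0.04, tau=15.0, seed=1):
    """Simulate recurrence data with and without known death times.

    All distributions and parameter values are assumptions for illustration.
    """
    rng = random.Random(seed)
    full, naive = [], []
    for _ in range(n):
        recurrence = rng.expovariate(rec_rate)
        death = rng.expovariate(death_rate)
        # A recurrence is recorded only if it precedes death and study end.
        seen = recurrence <= min(death, tau)
        # Full information: censor at death or at the end of follow-up.
        full.append((min(recurrence, death, tau), seen))
        # Missing at-risk information: deaths are unknown, so patients
        # without a recorded recurrence appear at risk until study end.
        naive.append((recurrence if seen else tau, seen))
    return full, naive


full, naive = simulate()
t = 10.0
km_full = kaplan_meier(full, t)
km_naive = kaplan_meier(naive, t)
truth = math.exp(-0.05 * t)  # recurrence-free probability under rec_rate=0.05

print(f"target {truth:.3f}, deaths known {km_full:.3f}, deaths unknown {km_naive:.3f}")
```

With these assumed rates, the estimate that uses known death times should sit close to the target of about 0.61, while the estimate with unknown deaths drifts towards roughly 0.67, because patients who have died are still counted as recurrence-free and at risk.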