
ISCB2014_abstract_book

112 ISCB 2014 Vienna, Austria • Abstracts - Poster Presentations • Tuesday, 26th August 2014 • 10:30-11:00

To our knowledge, there is no specific topology-based pathway analysis method for RNA-Seq data. Here, we present simple adaptations of topology-based methods for RNA-Seq data and compare their ability to identify differentially expressed pathways in real data. As a model we chose colorectal cancer (CRC), comparing microsatellite instable (MSI) and microsatellite stable (MSS) tumors, which have distinct prognoses and specific transcriptional activity. We also compare the performance of our adapted methods on RNA-Seq data with that of the original methods on microarray data. To this end, we use four publicly available datasets from The Cancer Genome Atlas (TCGA) database and the Gene Expression Omnibus (GEO), and discuss the number of identified pathways, the ranks of the pathways, and the overlaps between individual methods.

P2.5.116 Contribution of alternative splicing variants to gene expression variation
W Ouwerkerk1, AH Zwinderman1
1 Amsterdam Medical Center, Amsterdam, The Netherlands

Alternative splicing of messenger RNAs allows cells to create protein isoforms with a multitude of functions from a single gene by including or excluding exons during post-transcriptional processing. Reconstructing the contribution of these splicing variants to the total amount of gene expression remains difficult.

We introduce a probabilistic model of the alternative splicing reconstruction problem using a finite mixture model, and provide a solution based on the maximum likelihood principle. Our model assumes that the expected expression level is the same for all exons in a particular splicing variant, while allowing for measurement error.
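A minimal sketch of such a mixture likelihood, maximised with a Nelder-Mead simplex search (the abstract used an optimiser in R; here a basic simplex is hand-written so the example needs only the Python standard library). Assumptions not taken from the abstract: each component is a diagonal Gaussian (independent exon errors with a common standard deviation), included exons of variant k share one expected level mu_k while excluded exons have expected level 0, the variant patterns are known, and the 4-exon, 2-variant dataset is hypothetical toy data. Function names such as `neg_log_lik` are illustrative only.

```python
import math
import random


def nelder_mead(f, x0, step=0.5, max_iter=5000, tol=1e-10):
    """Minimise f(x) with a basic Nelder-Mead simplex search."""
    n = len(x0)
    simplex = [list(x0)] + [[x0[j] + (step if j == i else 0.0) for j in range(n)]
                            for i in range(n)]
    values = [f(x) for x in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(x[j] for x in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        fr = f(reflected)
        if fr < values[0]:
            # Reflection is the new best point: also try expanding further.
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            fe = f(expanded)
            simplex[-1], values[-1] = (expanded, fe) if fe < fr else (reflected, fr)
        elif fr < values[-2]:
            simplex[-1], values[-1] = reflected, fr
        else:
            # Contract towards the better of the reflected and the worst point.
            target = reflected if fr < values[-1] else worst
            contracted = [c + 0.5 * (t - c) for c, t in zip(centroid, target)]
            fc = f(contracted)
            if fc < min(fr, values[-1]):
                simplex[-1], values[-1] = contracted, fc
            else:
                # Shrink all vertices towards the current best vertex.
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (v - b) for b, v in zip(best, x)]
                                    for x in simplex[1:]]
                values = [values[0]] + [f(x) for x in simplex[1:]]
    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]


def neg_log_lik(params, Y, Z):
    """Negative log-likelihood of a K-component mixture over patients' exon profiles.
    params = [log_sigma, mu_1..mu_K, logit_2..logit_K]; logit_1 is fixed at 0."""
    K = len(Z)
    log_sigma, mu = params[0], params[1:1 + K]
    sigma = math.exp(log_sigma)
    logits = [0.0] + list(params[1 + K:])
    top = max(logits)
    norm = top + math.log(sum(math.exp(l - top) for l in logits))
    log_p = [l - norm for l in logits]                # log mixing weights
    total = 0.0
    for y in Y:
        comps = []
        for k in range(K):
            # Diagonal Gaussian: included exons have mean mu_k, excluded exons mean 0.
            ll = log_p[k]
            for j, yj in enumerate(y):
                z = (yj - mu[k] * Z[k][j]) / sigma
                ll -= 0.5 * math.log(2 * math.pi) + log_sigma + 0.5 * z * z
            comps.append(ll)
        m = max(comps)
        total += m + math.log(sum(math.exp(c - m) for c in comps))  # log-sum-exp
    return -total


# Hypothetical toy data: 4 exons, 2 variants (all exons vs. exons 1 and 3 only).
Z = [[1, 1, 1, 1], [1, 0, 1, 0]]
rng = random.Random(1)
true_mu, true_sigma = [5.0, 3.0], 0.5
labels = [0] * 24 + [1] * 36                          # 40% / 60% of 60 patients
Y = [[true_mu[k] * zj + rng.gauss(0.0, true_sigma) for zj in Z[k]] for k in labels]

x_hat, nll = nelder_mead(lambda p: neg_log_lik(p, Y, Z), [0.0, 4.0, 4.0, 0.0])
fit_sigma, fit_mu = math.exp(x_hat[0]), x_hat[1:3]
fit_p1 = 1.0 / (1.0 + math.exp(x_hat[3]))            # mixing weight of variant 1
print(f"mu={[round(m, 2) for m in fit_mu]}, sigma={fit_sigma:.2f}, P1={fit_p1:.2f}")
```

Parameterising sigma on the log scale and the weights through logits keeps the search unconstrained, which suits a derivative-free simplex method.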
In this model, the exon expression profile $Y_i$ of patient $i$ is written as a weighted sum of $K$ multivariate Gaussian densities, one per splicing variant, with mixing weights $P_k$:

$$f(Y_i) = \sum_{k=1}^{K} P_k \, g_k(Y_i \mid \theta_k).$$

The $k$th variant is described by the indicators $Z_{jk}$, where $Z_{jk} = 1$ if exon $j$ is included in variant $k$ and $Z_{jk} = 0$ if it is excluded. We estimated the parameters $\theta_k$ of the Gaussian mixture densities by maximizing the total likelihood with the Nelder-Mead optimization algorithm in R. We applied this model to three genes associated with Marfan's syndrome (SLC2A10, TGFβR2 and FBN1), using gene/exon expression data of 63 patients with Marfan's syndrome. We compared the likelihood, AIC and BIC of five scenarios: normal mixture modeling estimated by Mclust, known splicing variants, no splicing variation, and all possible variants, which comprised $2^5$, $2^9$ and $2^{65}$ possible splicing variants for SLC2A10, TGFβR2 and FBN1, respectively.

P2.5.120 Multi-purpose SNP selection method in genetic association study
M Park1
1 Eulji University, Daejeon, Republic of Korea

Recent developments in high-throughput technologies in biology have resulted in the production of huge amounts of data. In genetic association studies, such data are characterized by thousands of SNPs measured on a small number of samples, which can cause the "large p, small n" problem. For this reason, single-marker analysis is commonly adopted in many studies despite the various merits of jointly analyzing multiple markers. The presence of redundant SNPs may also cause problems in further analysis. It is therefore necessary to eliminate near-redundant SNPs and hence to determine the subset of SNPs that should be included in the joint analysis.

In this study, we propose an unsupervised SNP selection algorithm based on the principal variable method. The selection criterion is the minimum trace of the partial variances of the unselected SNPs given the selected SNPs, that is, the variance of the unselected SNPs left unexplained by the selected ones. The resulting subset of SNPs can be used in further analyses for multiple purposes.
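The trace criterion just described can be sketched as follows. Assumptions not stated in the abstract: subsets are built by a greedy forward search, genotypes are coded 0/1/2, and the dataset is a hypothetical toy example in which SNPs come in perfectly correlated pairs. Conditioning the covariance matrix S on one selected SNP s at a time yields the partial covariance of the remaining SNPs, so adding s lowers the trace by the sum over j of S[j][s]**2 / S[s][s]; the helper names are illustrative.

```python
import random


def covariance(X):
    """Sample covariance matrix of the columns (SNPs) of X."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in X) / (n - 1)
             for b in range(p)] for a in range(p)]


def select_principal_snps(X, n_select, eps=1e-12):
    """Greedy forward selection: each step adds the SNP that most reduces the
    trace of the partial covariance of the SNPs given those already selected."""
    S = covariance(X)
    p = len(S)
    selected, traces = [], []
    for _ in range(n_select):
        best, best_gain = None, -1.0
        for s in range(p):
            if s in selected or S[s][s] <= eps:
                continue  # already chosen, or (near-)redundant given the selection
            gain = sum(S[j][s] ** 2 for j in range(p)) / S[s][s]
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:
            break
        # Condition on SNP `best`: S <- S - S[:, best] S[best, :] / S[best, best]
        col, d = [S[j][best] for j in range(p)], S[best][best]
        S = [[S[a][b] - col[a] * col[b] / d for b in range(p)] for a in range(p)]
        selected.append(best)
        # Selected SNPs now have (numerically) zero partial variance, so this is
        # the trace over the unselected SNPs.
        traces.append(sum(S[j][j] for j in range(p)))
    return selected, traces


# Hypothetical toy genotypes (0/1/2): SNPs 0 and 1 are identical, as are SNPs 2 and 3.
rng = random.Random(0)
base = [(rng.choice([0, 1, 2]), rng.choice([0, 1, 2])) for _ in range(200)]
X = [[a, a, b, b] for a, b in base]

selected, traces = select_principal_snps(X, n_select=2)
print("selected SNPs:", selected, "remaining trace:", traces)
```

In this toy case one SNP from each redundant pair is kept and the remaining trace drops to essentially zero, which is the behaviour the criterion is meant to reward.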
This method is illustrated with real genotype datasets.

P2.5.138 Measurement Error in GWAS: what have we missed?
RCA Rippe1
1 Institute of Education and Child Studies, Leiden University, Leiden, The Netherlands

Genetic associations with behaviors and diseases are commonly identified using genome-wide association studies (GWAS), among other methods. Biology predicts that single-locus as well as multi-locus effects exist but are generally very small (Davis et al., 2010; Vinkhuyzen et al., 2012). While GWAS was introduced as a promising methodology, the number of significant empirical results has been less impressive. A possible explanation is measurement error in both the outcome and the (co)variates. Imperfections can arise from the use of (self-report) questionnaires (Hofstee, 1994; Spain, Eaton & Funder, 2000) for the outcome and for determinants such as age, weight and height, as well as from genotype determination (Rabbee & Speed, 2006; Rippe, Meulman & Eilers, 2012; Ziegler, König & Thompson, 2008) and reported ethnicity (Price et al., 2006).

Measurement error can distort results either through error in the determinants, which dilutes estimates of the association toward zero, or through error in the outcome, which inflates standard errors (Hutcheon et al., 2010). The current study illustrates to what extent the expected effects have remained undetected due to these errors. A large-scale simulation study was set up using the highly efficient GWAS implementation of Sikorska et al. (2013) to evaluate genetic effect detection for different error levels in the variables involved. We observe that up to 20% more, and 10% stronger, genetic associations could be detected under smaller measurement error, suggesting that the underlying biological effects may be stronger than those currently reported.
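The two mechanisms cited from Hutcheon et al. (2010) can be illustrated with a toy simulation. This is not the Sikorska et al. (2013) GWAS implementation; it assumes a standardized continuous determinant X, classical additive measurement error, and a simple linear outcome model, with hypothetical values for the true slope, sample size and error variances. With error variance equal to the variance of X, the reliability ratio is 1/2, so the slope should be roughly halved; adding unit-variance error to the outcome leaves the slope roughly unbiased but inflates its standard error by about the square root of 2.

```python
import math
import random


def ols(x, y):
    """Slope and its standard error from a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    rss = sum((b - my - beta * (a - mx)) ** 2 for a, b in zip(x, y))
    return beta, math.sqrt(rss / (n - 2) / sxx)


# Hypothetical settings: true slope 0.5, standardised determinant, unit-variance noise.
rng = random.Random(2014)
N, BETA = 20000, 0.5
x_true = [rng.gauss(0.0, 1.0) for _ in range(N)]
y_true = [BETA * x + rng.gauss(0.0, 1.0) for x in x_true]

# Classical additive measurement error with variance 1 in the determinant or the outcome.
x_obs = [x + rng.gauss(0.0, 1.0) for x in x_true]
y_obs = [y + rng.gauss(0.0, 1.0) for y in y_true]

b_clean, se_clean = ols(x_true, y_true)
b_xerr, se_xerr = ols(x_obs, y_true)   # slope attenuated towards BETA * 1 / (1 + 1)
b_yerr, se_yerr = ols(x_true, y_obs)   # slope near BETA, SE inflated by about sqrt(2)

for label, (b, se) in [("no error", (b_clean, se_clean)),
                       ("error in determinant", (b_xerr, se_xerr)),
                       ("error in outcome", (b_yerr, se_yerr))]:
    print(f"{label:22s} slope={b:.3f}  SE={se:.4f}  z={b / se:.1f}")
```

Both distortions lower the test statistic b / SE, which is why weak genetic effects measured with error are more likely to fall below a genome-wide significance threshold.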
P2.5.144 Classification in high-dimensional feature spaces
F Schroeder1
1 Austrian Institute of Technology, Vienna, Austria

A characteristic property of many data sets in modern scientific fields, such as genomics, is the high dimensionality of their feature space. This poses a significant challenge for statistical classification methods and has therefore been the object of intensive research.

This work studies the different approaches by which standard classification methods, such as Discriminant Analysis, Support Vector Machines and Logistic Regression, have been modified to account for high dimensionality, and compares their performance in different simulation experiments. Both prediction and model selection performance are examined under different parameters, including sample size, signal-to-noise ratio, and dependence structure. The results are intended to guide applied researchers in one of the trickiest questions: choosing the most suitable method for a given research question and data set.
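One common way to adapt Discriminant Analysis to high dimensionality is diagonal linear discriminant analysis (DLDA), which replaces the pooled covariance matrix (singular when the number of features p exceeds the sample size n) by its diagonal. The abstract does not say which variants were studied, so this is only a hedged illustration: the dimensions, number of informative features, mean shift, equal class priors and independent Gaussian features are all hypothetical choices, and the helper names are illustrative.

```python
import random


def fit_dlda(X, y):
    """Diagonal LDA: class means and pooled per-feature variances; the p x p
    covariance matrix (singular when p > n) is never formed."""
    p = len(X[0])
    groups = {c: [x for x, lab in zip(X, y) if lab == c] for c in (0, 1)}
    means = {c: [sum(x[j] for x in g) / len(g) for j in range(p)] for c, g in groups.items()}
    dof = len(X) - 2
    var = [sum((x[j] - means[c][j]) ** 2 for c, g in groups.items() for x in g) / dof
           for j in range(p)]
    return means, var


def predict_dlda(model, x):
    """Assign class 1 if the diagonal discriminant score is positive (equal priors)."""
    means, var = model
    score = sum((m1 - m0) / v * (xj - 0.5 * (m0 + m1))
                for xj, m0, m1, v in zip(x, means[0], means[1], var))
    return 1 if score > 0 else 0


# Hypothetical simulation: p = 200 features, only the first 10 are informative.
rng = random.Random(7)
P, N_INFORMATIVE, SHIFT = 200, 10, 1.5


def draw(label, n):
    """n samples of class `label`; informative features are shifted by SHIFT * label."""
    return [[rng.gauss(SHIFT * label if j < N_INFORMATIVE else 0.0, 1.0) for j in range(P)]
            for _ in range(n)]


X_train, y_train = draw(0, 20) + draw(1, 20), [0] * 20 + [1] * 20
X_test, y_test = draw(0, 100) + draw(1, 100), [0] * 100 + [1] * 100

model = fit_dlda(X_train, y_train)
accuracy = sum(predict_dlda(model, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print(f"p={P}, n_train={len(y_train)}, test accuracy={accuracy:.2f}")
```

Ignoring correlations trades some bias for a large reduction in variance; with dependent features, as in the study's different dependence structures, that trade-off may no longer pay off.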
