
dc.contributor.advisor: Faraone, Stephen
dc.contributor.author: Barnett, Eric J.
dc.description.abstract: Despite heritability estimates that suggest a high ceiling for the classification of many complex genetic disorders, current models have been only moderately successful at classifying cases and controls of these disorders. The knowledge base about the human genome is large and continuously growing, but disorder classification models rarely use any of that information beyond genetic associations. We use three granularities of genomic context data, four machine learning models, and datasets of mood disorders, ADHD, and type 2 diabetes to test whether including genomic context can improve modelling of disorder risk. When predicting whether subjects had been diagnosed with any mood disorder, we found that using polygenic risk scores from other psychiatric disorders in logistic regression models improved classification performance as measured by the area under the receiver operating characteristic curve (AUC). In another study classifying cases of ADHD and controls, we found that adding summations of risk based on the genetic variants' inclusion in gene sets associated with ADHD improved AUCs in random forest modelling. The random forest importance scores of those gene set polygenic risk scores showed biological relevance: importance scores correlated with relative gene set expression in the brain. In the final study, classifying type 2 diabetes cases and controls, we attached several types of functional genomic annotations to the genotype data for each genetic variant. These genomic context-informed genotype data were used in convolutional neural networks and significantly improved AUC compared to polygenic risk score models, while a within-model adversarial ancestry task adjusted for potential confounding due to ancestry.
In these models, we found that some risk features developed from context-informed data overlapped with features developed from standard genotype input, while other risk features were unique to the input type. Together, these studies provide evidence that context matters when looking at the disorder risk conferred by genetic variants in complex genetic disorders. [en_US]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.subject: Genomic Context [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Genetic Risk [en_US]
dc.title: Context Matters: Using Genomic Knowledge to Improve Disorder Classification Models [en_US]
dc.description.institution: Upstate Medical University [en_US]
dc.description.degreelevel: PhD [en_US]
2023 [en_US]


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International