Context Matters: Using Genomic Knowledge to Improve Disorder Classification Models
Author: Barnett, Eric J.
Term and Year: Spring 2023
Abstract: Despite heritability estimates that suggest a high ceiling for the classification of many complex genetic disorders, current models have been only moderately successful at classifying cases and controls of these disorders. The knowledge base about the human genome is large and continuously growing, but disorder classification models rarely use any of that information beyond genetic associations. We use three granularities of genomic context data, four machine learning models, and datasets of mood disorders, ADHD, and type 2 diabetes to test whether including genomic context can improve modelling of disorder risk. When predicting whether subjects had been diagnosed with any mood disorder, we found that using polygenic risk scores from other psychiatric disorders in logistic regression models improved classification performance as measured by the area under the receiver operating characteristic curve (AUC). In another study classifying cases of ADHD and controls, we found that adding gene set polygenic risk scores, which sum risk over the genetic variants that fall within gene sets associated with ADHD, improved AUCs in random forest modelling. The random forest importance scores of those gene set polygenic risk scores showed biological relevance: the importance scores correlated with relative gene set expression in the brain. In the final study classifying type 2 diabetes cases and controls, we attached several types of functional genomic annotations to each genetic variant in the genotype data. These genomic-context-informed genotype data were used in convolutional neural networks and significantly improved AUC compared to polygenic risk score models while using a within-model adversarial ancestry task to adjust for potential confounding due to ancestry.
In these models, we found that some risk features developed from context-informed data overlapped with features developed from standard genotype input, while other risk features were unique to the input type. Together, these studies provide evidence that context matters when looking at the disorder risk conferred by genetic variants in complex genetic disorders.
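As a rough illustration of two quantities named in the abstract, the sketch below computes a polygenic risk score as a weighted sum of risk-allele counts, restricts that sum to a gene set, and evaluates scores with AUC. It is a minimal sketch and not the author's pipeline: the variant names, effect weights, gene set membership, and subject data are all hypothetical, and the function names are chosen here for illustration only.

```python
from typing import Dict, Optional, Sequence, Set


def polygenic_risk_score(
    genotype: Dict[str, int],
    weights: Dict[str, float],
    variants: Optional[Set[str]] = None,
) -> float:
    """Weighted sum of risk-allele counts (0, 1, or 2 per variant).

    If `variants` is given, only variants in that set contribute,
    which yields a gene set polygenic risk score.
    """
    total = 0.0
    for variant, allele_count in genotype.items():
        if variants is not None and variant not in variants:
            continue
        # Variants without a known effect weight contribute nothing.
        total += allele_count * weights.get(variant, 0.0)
    return total


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case (label 1)
    scores higher than a randomly chosen control (label 0),
    counting ties as one half.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical effect weights and gene set membership (illustration only).
WEIGHTS = {"var_a": 0.1, "var_b": 0.2, "var_c": 0.5}
GENE_SET = {"var_b", "var_c"}

# Hypothetical subjects: risk-allele counts per variant, plus case status.
SUBJECTS = [
    ({"var_a": 0, "var_b": 0, "var_c": 0}, 0),
    ({"var_a": 2, "var_b": 1, "var_c": 0}, 0),
    ({"var_a": 1, "var_b": 1, "var_c": 1}, 1),
    ({"var_a": 2, "var_b": 2, "var_c": 2}, 1),
]

labels = [status for _, status in SUBJECTS]
full_scores = [polygenic_risk_score(g, WEIGHTS) for g, _ in SUBJECTS]
set_scores = [polygenic_risk_score(g, WEIGHTS, GENE_SET) for g, _ in SUBJECTS]

print("AUC, genome-wide score:", auc(full_scores, labels))
print("AUC, gene set score:", auc(set_scores, labels))
```

In a setting like the ADHD study described above, scores of this kind for several gene sets would be supplied as additional input features to a classifier; the toy data here are far too small to say anything about whether that helps.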
The following license files are associated with this item:
- Creative Commons
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International