Publication

Deep learning on graphs: context-enriched biological representations for disorder classification and network extraction

Saxena, Ankita
Publication Date
2026-03
Abstract
Most disease classification models treat biological features as independent, discarding relational information among genes or brain regions that may carry predictive signal. Graph Neural Networks (GNNs) can operate on explicitly structured inputs, which makes them a natural framework for leveraging such context. However, it is not well understood when they confer practical advantages or what methodological standards are required to evaluate them. Across three studies, we investigated whether providing biological context at multiple scales (at the gene level, among brain regions, and across modalities) can improve classification and interpretability for complex disorders. A systematic review of 75 published GNN models applied to disease gene and network identification revealed that data leakage was present in nearly a third of models and was significantly associated with inflated performance, while inconsistent reporting of model design limited independent evaluation of GNN-based approaches. For Alzheimer's Disease classification using individual-level GWAS data from 7,358 participants, a Graph Attention Network (GAT) operating on curated biological pathway graphs did not outperform polygenic risk scores (PRS) alone but captured complementary signal: ensemble models combining GNN and PRS predictions significantly outperformed PRS alone (AUC 0.82 vs. 0.80). This benefit emerged only after intergenic risk and graph-level context were incorporated via transfer learning. Architectural modifications that forced the GNN to engage with graph topology were required for it to extract value from the graph, and curated pathway graphs supported productive learning, whereas empirically derived co-expression graphs introduced noise. For childhood ADHD classification using structural MRI data from the ENIGMA consortium (n = 2,331), a GAT using a tractography-derived connectivity graph achieved statistically significant classification performance, whereas a topology-agnostic baseline did not.
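The abstract does not state how GNN and PRS predictions were combined. As a minimal sketch, assuming a simple weighted average of each model's predicted probabilities (a hypothetical combination rule; stacking with a logistic meta-model is another common choice), comparing the ensemble against PRS alone reduces to computing ROC AUC for each score vector. The function names `roc_auc` and `average_ensemble` and all numeric values below are hypothetical toy illustrations, not data or code from the study.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_ensemble(
    probs_a: Sequence[float], probs_b: Sequence[float], weight_a: float = 0.5
) -> List[float]:
    """Weighted average of two models' predicted probabilities (hypothetical combination rule)."""
    if len(probs_a) != len(probs_b):
        raise ValueError("prediction vectors must have equal length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(probs_a, probs_b)]


# Toy values for six individuals (1 = case, 0 = control); not study data.
labels = [0, 0, 0, 1, 1, 1]
prs_probs = [0.2, 0.6, 0.3, 0.5, 0.7, 0.4]
gnn_probs = [0.1, 0.3, 0.5, 0.6, 0.4, 0.8]
ensemble_probs = average_ensemble(gnn_probs, prs_probs)

for name, probs in [("PRS", prs_probs), ("GNN", gnn_probs), ("ensemble", ensemble_probs)]:
    print(name, round(roc_auc(labels, probs), 3))
```

With these toy values, each model alone misranks at least one case-control pair, while their average ranks every case above every control. This is a deliberately exaggerated picture of complementary signal. In practice the ensemble weight would need to be chosen on data held out from the evaluation set, to avoid the kind of leakage the systematic review links to inflated performance.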
In addition, GSEA-style enrichment analysis of model attributions revealed biologically coherent circuit-level signals, including convergence on fronto-striatal circuitry after covariate adjustment. Collectively, these findings support the conclusion that biological context, when properly structured, can improve both prediction and interpretability in complex disease classification, even when graph-based models do not outperform simpler approaches in isolation.
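GSEA-style enrichment of attributions can be sketched as follows. This assumes that features (here, brain regions) are ranked by their signed attribution scores and that each circuit is a set of regions. The function computes the weighted running-sum enrichment score used in standard GSEA. The region names, circuit sets, and scores are hypothetical, and the permutation test that would normally assess significance is omitted. Whether the thesis ranked by signed or absolute attributions is not stated, so the signed ranking here is an assumption.

```python
from typing import Dict, Iterable


def enrichment_score(
    attributions: Dict[str, float], feature_set: Iterable[str], p: float = 1.0
) -> float:
    """Signed GSEA-style enrichment score for one feature set.

    Features are ranked by attribution, highest first. Walking down the ranking,
    the running sum rises by |attribution|**p / (sum over set members) at each
    set member and falls by 1 / (number of non-members) otherwise. The score is
    the running-sum value with the largest absolute deviation from zero.
    """
    ranked = sorted(attributions.items(), key=lambda kv: kv[1], reverse=True)
    members = set(feature_set)
    n_total = len(ranked)
    n_hits = sum(1 for name, _ in ranked if name in members)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("feature set must cover some, but not all, ranked features")
    hit_norm = sum(abs(score) ** p for name, score in ranked if name in members)
    if hit_norm == 0:
        raise ValueError("set members have zero total weight")
    miss_step = 1.0 / (n_total - n_hits)

    running, best = 0.0, 0.0
    for name, score in ranked:
        if name in members:
            running += abs(score) ** p / hit_norm
        else:
            running -= miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical attributions for five regions and two hypothetical "circuits".
attributions = {"r1": 0.9, "r2": 0.8, "r3": 0.1, "r4": 0.05, "r5": -0.2}
top_circuit = enrichment_score(attributions, {"r1", "r2"})
bottom_circuit = enrichment_score(attributions, {"r4", "r5"})
print(round(top_circuit, 3), round(bottom_circuit, 3))
```

A circuit whose regions cluster at the top of the attribution ranking gets a strongly positive score, and one clustered at the bottom gets a strongly negative score. Aggregating attributions at the level of region sets like this is what allows circuit-level, rather than single-region, interpretation.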