Interpreting the molecular mechanisms of genomic variants and their causal relationships with diseases and traits is an important and challenging problem in human genetics. Such interpretation also facilitates the characterization of individual genomic alterations for personalized diagnosis and therapy. The platform presented here is intended to help researchers interrogate the biological functions of genomic variants.
Despite the great progress international projects have made in generating, processing and distributing large amounts of genome/epigenome sequencing data and functional annotations, biologists and clinicians still face great difficulty in curating, collecting and comparing variant information from different resources. They may even need to download huge pre-computed files or calculate prediction scores manually. Moreover, the rapid growth of tissue/cell type-specific and disease/trait-specific variant annotations makes evidence-driven prioritization of causal/pathogenic variants under particular conditions possible. Unfortunately, existing databases rarely incorporate such context-dependent functional annotations, which are needed to draw biologically meaningful conclusions about the variants under investigation. To provide biologists and clinicians with comprehensive, context-specific variant annotations, we developed VannoPortal, a database that systematically integrates large-scale genomic/epigenomic profiles and frequently used annotation databases from various biological domains.
VannoPortal has the following merits: 1) it systematically incorporates many new genome-scale, context-dependent variant annotation resources from various biological domains, particularly for noncoding variants; 2) rather than merely enumerating information, it emphasizes the interpretability of variant annotations through intuitive visualizations and interactive web components; 3) it enables direct comparison of several types of functional evidence between a query variant and its linked variants without multiple rounds of queries.
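To illustrate merit 3), the following is a minimal sketch of comparing a query variant with its linked variants in a single call. It assumes a hypothetical in-memory LD table (query variant mapped to linked variants with their r² values) and a hypothetical per-tissue functional score for each variant; the names `LD_TABLE`, `SCORES`, `compare_with_linked` and the `min_r2` cutoff are illustrative only and do not describe VannoPortal's actual data model or interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class LinkedVariant:
    variant_id: str
    r2: float  # LD r^2 between this variant and the query variant


# Hypothetical toy data, NOT VannoPortal's real tables or scores.
LD_TABLE: Dict[str, List[LinkedVariant]] = {
    "rsQ": [LinkedVariant("rsA", 0.95), LinkedVariant("rsB", 0.62), LinkedVariant("rsC", 0.30)],
}
SCORES: Dict[str, Dict[str, float]] = {  # hypothetical score per variant per tissue
    "rsQ": {"liver": 0.20, "brain": 0.10},
    "rsA": {"liver": 0.85, "brain": 0.05},
    "rsB": {"liver": 0.40, "brain": 0.70},
    "rsC": {"liver": 0.90, "brain": 0.90},
}

Row = Tuple[str, float, Optional[float]]  # (variant_id, r2 with query, tissue score)


def compare_with_linked(query: str, tissue: str, min_r2: float = 0.5) -> List[Row]:
    """Collect the query and its linked variants (r2 >= min_r2) and rank them
    by their score in one tissue, highest first; unscored variants go last."""
    rows: List[Row] = [(query, 1.0, SCORES.get(query, {}).get(tissue))]
    for lv in LD_TABLE.get(query, []):
        if lv.r2 >= min_r2:
            rows.append((lv.variant_id, lv.r2, SCORES.get(lv.variant_id, {}).get(tissue)))
    rows.sort(key=lambda row: (row[2] is None, -(row[2] or 0.0)))
    return rows
```

For example, `compare_with_linked("rsQ", "liver")` ranks rsA, then rsB, then the query rsQ itself, while rsC is dropped because its r² falls below the cutoff. Returning one table that contains the query and its linked variants side by side is what avoids a separate lookup per variant.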
Equipped with our recently developed index system and parallel random-sweep searching algorithms (Huang et al. Genome Res. 2020;30(12):1789-1801), VannoPortal performs information extraction and searching efficiently. In addition to enabling more flexible random access, the indexed data structure and fast algorithms greatly facilitate expanding a query to its associated variants in the LD region, as well as tissue/cell type-specific scoring and prioritization among these variants.
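The distinction between random access and sweeping can be illustrated with a generic position index. The sketch below is an assumption-laden toy, not the index format or the parallel random-sweep algorithm of Huang et al. (2020): it keeps hypothetical annotation records sorted by position per chromosome, answers single-position and region (for example, LD-region) queries by binary search, and answers a batch of positions with one forward sweep over the sorted records. It is single-threaded, and the class and method names (`PositionIndex`, `lookup`, `region`, `sweep`) are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

Record = Tuple[int, str]  # (1-based position, annotation payload); hypothetical layout


class PositionIndex:
    """Toy per-chromosome index over position-sorted annotation records."""

    def __init__(self, records: Dict[str, List[Record]]):
        self._records = {chrom: sorted(recs) for chrom, recs in records.items()}
        self._positions = {c: [pos for pos, _ in recs] for c, recs in self._records.items()}

    def lookup(self, chrom: str, pos: int) -> List[str]:
        """Random access: binary-search all records at a single position."""
        return self.region(chrom, pos, pos)

    def region(self, chrom: str, start: int, end: int) -> List[str]:
        """Random access over a closed interval, e.g. an LD region."""
        positions = self._positions.get(chrom, [])
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_right(positions, end)
        return [payload for _, payload in self._records[chrom][lo:hi]] if lo < hi else []

    def sweep(self, chrom: str, queries: List[int]) -> Dict[int, List[str]]:
        """Batch access: sort the queries, then make one forward pass over the records."""
        recs = self._records.get(chrom, [])
        hits: Dict[int, List[str]] = {}
        i = 0
        for q in sorted(set(queries)):
            while i < len(recs) and recs[i][0] < q:  # skip records before q
                i += 1
            j = i
            while j < len(recs) and recs[j][0] == q:  # collect records at q
                j += 1
            hits[q] = [payload for _, payload in recs[i:j]]
            i = j  # later queries are larger, so never move backwards
        return hits
```

Binary search suits a few scattered queries, whereas the sweep touches each record at most once and therefore suits large sorted batches; combining the two is the general idea the text attributes to the random-sweep design, although the published algorithm may differ in its details.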