1 Citation

Genome Medicine 2016;8:129 (DOI: 10.1186/s13073-016-0384-y). The recommendation can be accessed on F1000Prime.

2 Summary

We introduce XGR (eXploring Genomic Relations), released as an R package (http://cran.r-project.org/package=XGR) and a web-app (http://galahad.well.ox.ac.uk/XGR), which enables downstream knowledge discovery from genomic summary data. Biological interpretation of genomic summary data, such as those resulting from GWAS and eQTL mapping, is a major bottleneck in human disease genomics and calls for efficient, integrative tools. This kind of research has been under-appreciated in the past, but it is becoming increasingly important with the growing availability of ontology, annotation, and network data essential for precise interpretation.

XGR is designed to make a user-defined list of genes or SNPs (or genomic regions) more interpretable by comprehensively utilising ontology and network information to generate more informative results than conventional methods. XGR is unique in supporting a broad range of ontologies (including knowledge of biological and molecular functions, pathways, diseases and phenotypes - in both human and mouse) and different types of networks (including functional, physical and pathway interactions).

This user manual guides you through the steps necessary to use the tool. After working through the manual, and in particular the Showcases section, which includes several demos with published data, you will be able to: 1) perform enrichment analysis using either built-in or custom ontologies, 2) calculate semantic similarity between genes (or between SNPs) based on their ontology annotation profiles, 3) identify a gene subnetwork given your query list of (significant) genes, SNPs or genomic regions, and 4) interpret genomic regions using co-localised functional genomic annotations and using ontology annotations of nearby genes. End-users who are unfamiliar with R can instead use our user-friendly web app [1].

3 Web-app

The web-app is available at http://galahad.well.ox.ac.uk/XGR.

4 Installation

4.1 R

R (http://www.r-project.org) is a language and environment for statistical computing and graphics. The latest version can be installed on Windows, Mac OS X, and Linux (see below).

4.2 Packages

For installation of the XGR package, please follow the instructions below:

4.3 BugReports

We welcome your feedback, particularly reports of bugs. To help streamline bug reports and fixes, please file an issue here.

5 Source data

All source data are represented uniformly as well-documented RData-formatted files, taking advantage of the R software environment and its infrastructure packages such as igraph (Csardi and Nepusz 2006) and GenomicRanges (Lawrence et al. 2013). These data are subject to regular updates, and are also regularly supplemented to keep pace with the explosive nature of big data in modern genome biology.

5.1 Ontologies and annotations at the gene level

Ontologies and their identifier codes used in XGR are summarised below:

Category | Ontology | Identifier Code
Disease | Disease Ontology | DO
Function | Gene Ontology Molecular Function | GOMF
Function | Gene Ontology Biological Process | GOBP
Function | Gene Ontology Cellular Component | GOCC
Phenotype | Human Phenotype Phenotypic Abnormality | HPPA
Phenotype | Human Phenotype Mode of Inheritance | HPMI
Phenotype | Human Phenotype Clinical Modifier | HPCM
Phenotype | Human Phenotype Mortality Aging | HPMA
Phenotype | Mammalian/Mouse Phenotype | MP
Trait | Experimental Factor Ontology | EF
Druggability | DGI druggable gene categories | DGIdb
Domain | SCOP domain superfamilies | SF
Domain | Pfam domain families | Pfam
MsigDB | Hallmark gene sets | MsigdbH
MsigDB | Chromosome and cytogenetic band positional gene sets | MsigdbC1
MsigDB | Chemical and genetic perturbation gene sets | MsigdbC2CGP
MsigDB | All pathway gene sets | MsigdbC2CPall
MsigDB | Canonical pathway gene sets | MsigdbC2CP
MsigDB | KEGG pathway gene sets | MsigdbC2KEGG
MsigDB | Reactome pathway gene sets | MsigdbC2REACTOME
MsigDB | BioCarta pathway gene sets | MsigdbC2BIOCARTA
MsigDB | Transcription factor target gene sets | MsigdbC3TFT
MsigDB | microRNA target gene sets | MsigdbC3MIR
MsigDB | Cancer gene neighborhood gene sets | MsigdbC4CGN
MsigDB | Cancer module gene sets | MsigdbC4CM
MsigDB | GO biological process gene sets | MsigdbC5BP
MsigDB | GO molecular function gene sets | MsigdbC5MF
MsigDB | GO cellular component gene sets | MsigdbC5CC
MsigDB | Oncogenic signature gene sets | MsigdbC6
MsigDB | Immunologic signature gene sets | MsigdbC7

5.2 Annotations at the genomic region level

Data types, sources, and identifier codes used in XGR are summarised below:

Category | Source | Genomic Annotations | Identifier Code
TFBS | ENCODE | Cell-type-specific TFBS uniformly identified | Uniform_TFBS
TFBS | ENCODE | Clustered TFBS | ENCODE_TFBS_ClusteredV3
TFBS | ENCODE | Cell-type-specific clustered TFBS | ENCODE_TFBS_ClusteredV3_CellTypes
DHS | ENCODE | Cell-type-specific DHS uniformly identified | Uniform_DNaseI_HS
DHS | ENCODE | Clustered DHS | ENCODE_DNaseI_ClusteredV3
DHS | ENCODE | Cell-type-specific clustered DHS | ENCODE_DNaseI_ClusteredV3_CellTypes
Histone Modifications | ENCODE | Cell-type-specific histone modifications | Broad_Histone
Histone Modifications | ENCODE | Cell-type-specific histone modifications | SYDH_Histone
Histone Modifications | ENCODE | Cell-type-specific histone modifications | UW_Histone
Expressed Enhancers | FANTOM5 | Cell-type-specific expressed enhancers | FANTOM5_Enhancer_Cell
Expressed Enhancers | FANTOM5 | Tissue-specific expressed enhancers | FANTOM5_Enhancer_Tissue
Expressed Enhancers | FANTOM5 | Extensive enhancers | FANTOM5_Enhancer_Extensive
Expressed Enhancers | FANTOM5 | Full collections of enhancers | FANTOM5_Enhancer
Genome Segmentations | ENCODE | Combined genome segmentation for GM12878 | Segment_Combined_Gm12878
Genome Segmentations | ENCODE | Combined genome segmentation for H1-hESC | Segment_Combined_H1hesc
Genome Segmentations | ENCODE | Combined genome segmentation for HeLa S3 | Segment_Combined_Helas3
Genome Segmentations | ENCODE | Combined genome segmentation for HepG2 | Segment_Combined_Hepg2
Genome Segmentations | ENCODE | Combined genome segmentation for HUVEC | Segment_Combined_Huvec
Genome Segmentations | ENCODE | Combined genome segmentation for K562 | Segment_Combined_K562
Conserved TFBS | TRANSFAC | PWM human/mouse/rat conserved TFBS | TFBS_Conserved
miRNA regulatory sites | TargetScan | miRNA regulatory sites | TS_miRNA
Cancer mutations | TCGA | Tumor-type-specific exome mutations | TCGA

5.3 Ontology annotations at the SNP level

SNP annotations are based on the Experimental Factor Ontology (EFO). EFO standardises GWAS traits from the NHGRI GWAS Catalog using well-defined terms (Welter et al. 2014). Knowledge of co-inherited variants is also used to include additional SNPs that are in Linkage Disequilibrium (LD) with GWAS lead SNPs. LD SNPs are calculated based on the 1000 Genomes Project data (1000 Genomes Project Consortium 2012) and are defined as any SNPs having R² > 0.8 with GWAS lead SNPs.
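As a rough illustration of the LD threshold of 0.8, the sketch below computes the usual squared-correlation LD measure between two biallelic SNPs from haplotype and allele frequencies. It assumes that the threshold refers to this standard measure; the function name and the example frequencies are hypothetical and are not taken from XGR or from 1000 Genomes data.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Squared correlation between alleles at two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical frequencies for a lead SNP and a candidate LD SNP.
r2 = ld_r_squared(p_ab=0.28, p_a=0.30, p_b=0.30)
is_ld_snp = r2 > 0.8  # same cutoff as stated in the text
print(round(r2, 3), is_ld_snp)
```

With these made-up frequencies the pair passes the cutoff, so the candidate would be kept as an LD SNP.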

List of populations used to calculate LD SNPs:

Identifier Code | Population | Project
AFR | African | 1000 Genomes Project
AMR | Admixed American | 1000 Genomes Project
EAS | East Asian | 1000 Genomes Project
EUR | European | 1000 Genomes Project
SAS | South Asian | 1000 Genomes Project

5.4 Interaction networks at the gene level

XGR supports networks of different interaction types (functional, physical, and pathway-derived), of varying interaction quality (highest, high, and medium) and of two interaction directions (directed versus undirected). These are mainly sourced from the STRING database (Szklarczyk et al. 2015) and the Pathway Commons database (Cerami et al. 2011). STRING is a meta-integration of undirected interactions from a functional aspect, while Pathway Commons contains both undirected and directed interactions, mainly from a physical/pathway aspect. In addition to the interaction type, users can choose interactions of varying quality.

Databases, interaction types and quality, and identifier codes used in XGR are summarised below:

Identifier Code | Interaction (type and quality) | Database
STRING_high | Functional interactions (with high confidence scores >=700) | STRING
STRING_medium | Functional interactions (with medium confidence scores >=400) | STRING
PCommonsUN_high | Physical/undirected interactions (with references and >=2 sources) | Pathway Commons
PCommonsUN_medium | Physical/undirected interactions (with references and >=1 sources) | Pathway Commons
PCommonsDN_high | Pathway/directed interactions (with references and >=2 sources) | Pathway Commons
PCommonsDN_medium | Pathway/directed interactions (with references and >=1 sources) | Pathway Commons

For the merged pathway-derived directed interactions, networks from individual sources are also supported:

Identifier Code | Interaction (source) | Database
PCommonsDN_Reactome | Pathway/directed interactions (only from Reactome) | Pathway Commons
PCommonsDN_KEGG | Pathway/directed interactions (only from KEGG) | Pathway Commons
PCommonsDN_HumanCyc | Pathway/directed interactions (only from HumanCyc) | Pathway Commons
PCommonsDN_PID | Pathway/directed interactions (only from PID) | Pathway Commons
PCommonsDN_PANTHER | Pathway/directed interactions (only from PANTHER) | Pathway Commons
PCommonsDN_ReconX | Pathway/directed interactions (only from ReconX) | Pathway Commons
PCommonsDN_PhosphoSite | Pathway/directed interactions (only from PhosphoSite) | Pathway Commons
PCommonsDN_CTD | Pathway/directed interactions (only from CTD) | Pathway Commons

6 Functionality

The functions in the XGR package are categorised into six groups according to the tasks they perform. They are summarised below.

6.1 Enrichment functions

Enrichment functions perform enrichment analysis using one of several statistical tests (Fisher's exact test, the hypergeometric test, or the binomial test). The test estimates the significance of the overlap between, for example, an input group of genes and the group of genes annotated by an ontology term. By default, all annotatable genes are used as the test background, but the background can also be specified by the user. If ontology terms are organised in a tree-like structure, this structure can also be taken into account to produce more informative results. For non-structured ontologies (eg a collection of pathways), a filtering procedure is provided to generate non-redundant but informative results.
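To make the overlap test concrete, here is a minimal sketch of an upper-tail hypergeometric test, which gives the same p-value as a one-sided Fisher's exact test on the corresponding 2x2 table. It is a standalone illustration using only the Python standard library, not XGR's own code; the function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_input, n_overlap):
    """P(overlap >= n_overlap) when n_input genes are drawn without
    replacement from n_background genes, n_term of which carry the term."""
    upper = min(n_term, n_input)
    total = comb(n_background, n_input)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_input - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20000 annotatable genes, 200 annotated to the term,
# 100 input genes, 10 of which are annotated to the term.
p_value = hypergeom_enrichment_p(20000, 200, 100, 10)
print(f"p = {p_value:.3g}")
```

Only one overlapping gene would be expected by chance here, so ten overlapping genes give a very small p-value. Changing `n_background` shows how strongly the choice of test background affects the result.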

6.1.1 xEnricherGenes

xEnricherGenes: conducts gene-based enrichment analysis given a list of genes and the ontology in query. It supports two types of ontologies: 1) structured ontologies including Gene Ontology (Ashburner et al. 2000), Disease Ontology (Schriml et al. 2012), and Phenotype Ontologies in human and mouse (Köhler et al. 2013; Smith and Eppig 2009), and 2) non-structured ontologies/categories, for example, a collection of pathways, gene expression signatures, transcription factor targets, and druggable gene categories.

6.1.2 xEnricherSNPs

xEnricherSNPs: conducts SNP-based enrichment analysis using GWAS Catalog traits mapped to Experimental Factor Ontology (Welter et al. 2014). Additional SNPs that are in linkage disequilibrium (LD) with input SNPs can also be included in the enrichment analysis.

6.1.3 xEnricherYours

xEnricherYours: conducts custom enrichment analysis given an entity file and an annotation file.

6.1.4 xEnricher

xEnricher: acts as a template for enrichment analysis. It is an internal function upon which high-level functions (ie xEnricherGenes, xEnricherSNPs and xEnricherYours) rely.

6.1.5 xEnrichViewer

xEnrichViewer: views enrichment results as a data frame, which is also convenient for subsequently saving the results to a file.

6.1.6 xEnrichConciser

xEnrichConciser: makes enrichment results much clearer by removing redundant terms. A term is considered redundant if its overlap with a more significant term meets both criteria: the overlap covers more than 95% of the redundant term and more than 50% of the more significant term. In doing so, only non-redundant but informative terms are left.
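The two criteria can be sketched as a simple filter. This is an illustrative reading of the rule rather than the package's implementation: it assumes that each term is represented by a set of genes, and that a term is compared only against more significant terms that have themselves been kept. The function name and the example terms are hypothetical.

```python
def remove_redundant_terms(term_hits, redundant_frac=0.95, significant_frac=0.50):
    """term_hits: list of (term_name, p_value, set_of_genes).

    Returns the names of the terms kept, most significant first.
    """
    ranked = sorted(term_hits, key=lambda item: item[1])
    kept = []
    for name, p_value, genes in ranked:
        is_redundant = False
        for _, _, better_genes in kept:
            if not genes or not better_genes:
                continue
            shared = len(genes & better_genes)
            # Both criteria must hold for the term to be dropped.
            if (shared / len(genes) > redundant_frac
                    and shared / len(better_genes) > significant_frac):
                is_redundant = True
                break
        if not is_redundant:
            kept.append((name, p_value, genes))
    return [name for name, _, _ in kept]


# Hypothetical enrichment results.
hits = [
    ("A", 1e-6, {f"g{i}" for i in range(1, 11)}),
    ("B", 1e-4, {f"g{i}" for i in range(1, 7)}),
    ("C", 1e-3, {"g1", "g2", "g11"}),
]
kept = remove_redundant_terms(hits)
print(kept)
```

Term B is fully contained in A and covers more than half of A, so it is dropped; term C shares too little with A and is kept.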

6.1.7 xEnrichBarplot

xEnrichBarplot: visualises enrichment results using a barplot.

6.1.8 xEnrichDAGplot

xEnrichDAGplot: visualises enrichment results using a DAG plot. This function is only useful for tree-like structured ontologies. Significant terms (of interest) are highlighted by box-shaped nodes, and the others by ellipse nodes.

6.1.9 xEnrichCompare

xEnrichCompare: compares enrichment results using side-by-side barplots. This function is useful when comparing enrichment results for different inputs but based on the same ontology.

6.1.10 xEnrichDAGplotAdv

xEnrichDAGplotAdv: visualises comparative enrichment results using a DAG plot. This function takes as input the output of the function xEnrichCompare to further illustrate differences and commonalities of comparative enrichment results in the context of the ontology tree.

6.2 Similarity functions

Similarity functions calculate semantic similarity, a type of comparison that assesses the degree of relatedness between two entities (eg genes or SNPs) based on their annotation profiles (by ontology terms) (Pesquita et al. 2009). To do so, the information content (IC) of a term is first defined to measure how informative the term is when used for annotating genes: -log10(frequency of genes annotated to this term). Similarity between two terms is then measured based on IC, usually at the most informative common ancestor (MICA). Finally, similarity between two entities (eg genes) is derived from pairwise term similarity using best-matching based methods: average, maximum, and complete.
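The three steps (IC, term similarity at the MICA, and best-matching between entities) can be sketched on a toy ontology as follows. The ontology, the annotations and the function names are hypothetical; term similarity is taken here to be the IC of the MICA, and only one assumed form of the best-matching average is shown, which may differ from the variants offered by the package.

```python
from math import log10

# Hypothetical toy ontology (term -> parent terms) and direct gene annotations.
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}
DIRECT = {"g1": {"A1"}, "g2": {"A2"}, "g3": {"B"}, "g4": {"A1", "B"}}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(term) = -log10(fraction of genes annotated to the term,
    counting annotations inherited from descendant terms)."""
    counts = {term: 0 for term in PARENTS}
    for terms in DIRECT.values():
        for term in set().union(*(ancestors(t) for t in terms)):
            counts[term] += 1
    return {t: -log10(c / len(DIRECT)) for t, c in counts.items() if c > 0}


IC = information_content()


def term_similarity(term1, term2):
    """IC of the most informative common ancestor (MICA)."""
    return max(IC[t] for t in ancestors(term1) & ancestors(term2))


def gene_similarity(gene1, gene2):
    """Best-matching average over the direct terms of two genes (assumed form)."""
    terms1, terms2 = DIRECT[gene1], DIRECT[gene2]
    best1 = [max(term_similarity(a, b) for b in terms2) for a in terms1]
    best2 = [max(term_similarity(a, b) for a in terms1) for b in terms2]
    return (sum(best1) / len(best1) + sum(best2) / len(best2)) / 2


print(round(gene_similarity("g1", "g2"), 4))
```

Genes g1 and g2 are annotated to sibling terms, so their similarity equals the IC of the shared parent term A; rarer terms receive higher IC and therefore contribute more.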

6.2.1 xSocialiserGenes

xSocialiserGenes: conducts gene-based similarity analysis given a list of genes and the ontology in query. It supports several structured ontologies including Gene Ontology, Disease Ontology, and Phenotype Ontologies (in human and mouse), and returns socialised genes represented as a network with nodes for input genes and edges for pair-wise semantic similarity between them.

6.2.2 xSocialiserSNPs

xSocialiserSNPs: conducts SNP-based similarity analysis using GWAS Catalog traits mapped to Experimental Factor Ontology. Additional SNPs that are in linkage disequilibrium (LD) with input SNPs can also be included in the similarity analysis. It returns socialised SNPs represented as a network with nodes for input SNPs and edges for pair-wise semantic similarity between them.

6.2.3 xSocialiser

xSocialiser: acts as a template for similarity analysis. It is an internal function upon which high-level functions (ie xSocialiserGenes and xSocialiserSNPs) rely.

6.2.4 xCircos

xCircos: visualises the similarity results using a circos plot. The degree of similarity between SNPs (or genes) is visualised by the colour of links. This function can be used either to visualise the most similar links or to plot links involving an input SNP (or gene).

6.2.5 xSocialiserDAGplot

xSocialiserDAGplot: visualises terms used to annotate an input SNP (or gene) using a DAG plot. Terms used for direct/original annotations are shown as box-shaped nodes, and terms for indirect/inherited annotations as ellipse nodes. This function is one of the utilities for understanding calculated similarity.

6.2.6 xSocialiserDAGplotAdv

xSocialiserDAGplotAdv: uses a DAG plot to visualise and compare two sets of terms used to annotate two input SNPs (or genes) that are predicted to be similar. This function is one of the utilities for understanding calculated similarity.

6.3 Network functions

Network functions identify a gene subnetwork from a gene interaction network in which nodes/genes carry significance information. This information can be provided directly (eg user-defined genes with a significance level such as p-values or FDR); see the function xSubneterGenes. It can also be provided indirectly, for example, via nearby genes of user-defined SNPs with a significance level (eg GWAS reported p-values; see the function xSubneterSNPs), or more generally, via nearby genes of user-defined genomic regions with a significance level (eg differentially methylated regions together with FDR; see the function xSubneterGR). Given a gene interaction network with nodes labelled with this information, the algorithm that searches for a maximum-scoring gene subnetwork has been reported in our previous publication (Fang and Gough 2014) and is summarised as follows:

  1. score transformation: given a threshold of tolerable p-value, nodes with p-values below this threshold (nodes of interest) are scored positively, and nodes with p-values above it (intolerable) are scored negatively;

  2. subnetwork identification: an interconnected gene subnetwork is found that is enriched with positive-score nodes, while allowing for a few negative-score nodes as linkers;

  3. controlling the subnetwork size: an iterative procedure is provided to fine-tune the tolerable threshold so that the identified gene subnetwork has a desired number of nodes.
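The first step can be sketched with a simple log-ratio transform that has the sign behaviour described above. This is only one possible transform, chosen for illustration; the scoring actually used by the algorithm of Fang and Gough (2014) is not specified in this manual and may differ. The function name, gene names and p-values are hypothetical.

```python
from math import log10


def transform_scores(p_values, threshold):
    """Map gene -> p-value to gene -> score.

    Scores are positive for p-values below the threshold and negative
    for p-values above it. All p-values must be greater than zero.
    """
    return {gene: log10(threshold) - log10(p) for gene, p in p_values.items()}


# Hypothetical gene-level p-values and a tolerable threshold of 0.05.
scores = transform_scores({"GENE1": 1e-6, "GENE2": 0.03, "GENE3": 0.5}, 0.05)
for gene, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{gene}\t{score:+.2f}")
```

Raising or lowering the threshold shifts every score by the same amount, which changes how many nodes are positive. This is why tuning the threshold, as in step 3, can control the size of the resulting subnetwork.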

6.3.1 xSubneterGenes

xSubneterGenes: takes as input a list of user-defined genes with the significance level (p-values), superposes these genes onto a gene interaction network, and outputs a maximum-scoring gene subnetwork that contains as many of the most significant (highly scored) genes as possible, together with a few less significant (lower-scored) genes as linkers.

6.3.2 xSNP2GeneScores

xSNP2GeneScores: takes as input a list of user-defined SNPs with the significance level (eg GWAS reported p-values), and defines and scores nearby genes, taking into account both the distance to and the significance of input SNPs.

6.3.3 xSubneterSNPs

xSubneterSNPs: identifies a gene subnetwork that is likely modulated by input SNPs and/or their Linkage Disequilibrium (LD) SNPs, in two major steps. The first step uses xSNP2GeneScores to define and score nearby genes located within a distance window of input and/or LD SNPs. The second step uses xSubneterGenes to identify a maximum-scoring gene subnetwork.

6.3.4 xGR2GeneScores

xGR2GeneScores: takes as input a list of user-defined genomic regions (GR) with the significance level (eg p-values), and defines and scores nearby genes, taking into account both the distance to and the significance of input GR.

6.3.5 xSubneterGR

xSubneterGR: identifies a gene subnetwork that is likely modulated by input genomic regions (GR), in two major steps. The first step uses xGR2GeneScores to define and score nearby genes located within a distance window of input genomic regions. The second step uses xSubneterGenes to identify a maximum-scoring gene subnetwork.

6.4 Annotation functions

Annotation functions interpret a user-defined list of genomic regions, either via ontology annotations of nearby genes or via co-localised functional genomic annotations.

6.4.1 xGRviaGeneAnno

xGRviaGeneAnno: conducts region-based enrichment analysis using nearby gene annotations, in two major steps. The first step defines nearby genes within the maximum distance gap allowed between genomic regions and gene locations. The second step uses xEnricherGenes for enrichment analysis to identify enriched terms.

6.4.2 xGRviaGenomicAnno

xGRviaGenomicAnno: conducts region-based enrichment analysis using functional genomic annotations. Enrichment analysis uses a binomial test to estimate the significance of overlaps at one of three levels of resolution. Genomic annotations cover a broad spectrum of genetic and epigenetic knowledge, including functional genomic data experimentally generated by consortia such as ENCODE (Bernstein et al. 2012), FANTOM5 (Forrest et al. 2014), BLUEPRINT Epigenome (Adams et al. 2012), TCGA (Kandoth et al. 2013) and Roadmap Epigenomics (Kundaje et al. 2015), and comparative genomic data predicted by computational methods.

Notably, the resolution of overlaps being tested can be:

  1. bases at the base resolution (by default),

  2. regions at the region resolution,

  3. hybrid at the base-region hybrid resolution (that is, data at the region resolution but annotation/background at the base resolution).

Generally speaking, if genomic annotations are mutually exclusive, the resolution can be either regions or hybrid. If genomic annotations overlap or are nested within one another, it is better to choose hybrid (or bases). If the regions being analysed are SNPs, the results are the same irrespective of the specified resolution.
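As an illustration of a binomial overlap test at the base resolution, the sketch below treats every base in the input regions as a trial and every base that falls within the annotation as a success, with the success probability set to the fraction of background bases covered by the annotation. This mapping of counts onto the test is an assumption made for illustration, and the function name and all numbers are hypothetical.

```python
from math import exp, lgamma, log


def binomial_upper_tail(n_trials, n_success, prob):
    """P(X >= n_success) for X ~ Binomial(n_trials, prob), with 0 < prob < 1.

    Terms are computed in log space so that large counts do not overflow.
    """
    def log_pmf(k):
        return (lgamma(n_trials + 1) - lgamma(k + 1) - lgamma(n_trials - k + 1)
                + k * log(prob) + (n_trials - k) * log(1 - prob))

    tail = sum(exp(log_pmf(k)) for k in range(n_success, n_trials + 1))
    return min(1.0, tail)


# Hypothetical counts at the base resolution.
data_bases = 5000            # bases covered by the input regions
overlap_bases = 400          # input bases that also fall within the annotation
annotation_bases = 2_000_000
background_bases = 100_000_000

prob = annotation_bases / background_bases
p_value = binomial_upper_tail(data_bases, overlap_bases, prob)
print(f"expected overlap = {data_bases * prob:.0f} bases, p = {p_value:.3g}")
```

At the region resolution the same function would be applied to counts of regions rather than bases.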

6.4.3 xGRviaGenomicAnnoAdv

xGRviaGenomicAnnoAdv: conducts region-based enrichment analysis using functional genomic annotations. Enrichment is assessed by comparing the observed overlaps against the expected overlaps, which are estimated from a null distribution. The null distribution is generated via sampling, that is, by randomly drawing samples that mimic the data genomic regions from the background genomic regions. Background genomic regions can be provided by the user; by default, the annotatable genomic regions are used. Since sampling is time-consuming, parallel computation is also supported on Unix-like computers.
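The sampling idea can be sketched on a single toy chromosome as follows. This is a simplified illustration, not the package's procedure: regions are half-open (start, end) intervals, each sampled region keeps the width of the original, background intervals are chosen uniformly rather than in proportion to their length, and the empirical p-value uses a common add-one correction. All names and coordinates are hypothetical.

```python
import random


def overlap_bases(regions, annotation):
    """Total bases shared between two lists of half-open (start, end) intervals.
    Regions that overlap each other are counted separately."""
    return sum(
        max(0, min(end1, end2) - max(start1, start2))
        for start1, end1 in regions
        for start2, end2 in annotation
    )


def sample_regions(regions, background, rng):
    """Re-place each region, keeping its width, at a random position
    inside a background interval large enough to hold it."""
    sampled = []
    for start, end in regions:
        width = end - start
        candidates = [(s, e) for s, e in background if e - s >= width]
        bg_start, bg_end = rng.choice(candidates)
        new_start = rng.randint(bg_start, bg_end - width)
        sampled.append((new_start, new_start + width))
    return sampled


def sampling_enrichment(regions, annotation, background, n_samples=1000, seed=1):
    """Return (observed overlap, expected overlap, empirical p-value)."""
    rng = random.Random(seed)
    observed = overlap_bases(regions, annotation)
    null = [
        overlap_bases(sample_regions(regions, background, rng), annotation)
        for _ in range(n_samples)
    ]
    expected = sum(null) / n_samples
    p_value = (1 + sum(value >= observed for value in null)) / (1 + n_samples)
    return observed, expected, p_value


# Hypothetical data regions, annotation and background.
data = [(1100, 1200), (1300, 1400), (5000, 5100)]
annotation = [(1000, 1500)]
background = [(0, 10000)]
observed, expected, p_value = sampling_enrichment(data, annotation, background)
print(observed, round(expected, 1), round(p_value, 4))
```

Because each sample is drawn independently, the loop over samples is the natural place to parallelise.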

6.5 Infrastructure functions

Infrastructure functions are essential as they deal with infrastructure tasks, including loading built-in data, propagating ontology annotations, calculating term-term semantic similarity, converting and visualising graphs, and defining and scoring genes likely modulated by SNPs.

  • xRDataLoader: serves as a hub for loading built-in data about genes, SNPs, ontologies and annotations.
  • xDAGanno: induces annotations to the ontology root according to the true-path rule.
  • xDAGpropagate: propagates annotations (together with numeric information such as p-values) to the ontology root according to the true-path rule.
  • xDAGsim: calculates semantic similarity between terms, and returns a network with nodes for terms and edges for pair-wise semantic similarity between terms.
  • xConverter: converts an object between graph classes.
  • xVisNet: visualises the graph in different layouts.
  • xVisKernels: visualises distance kernel functions.
  • xSNPscores: calculates scores for lead or LD SNPs.
  • xSNPlocations: extracts genomic locations for SNPs.
  • xSNP2nGenes: defines nearby genes from SNPs.
  • xSparseMatrix: creates a sparse matrix for an input file.
  • xSM2DF: creates a data frame (with three columns) from a (sparse) matrix.
  • xLiftOver: lifts genomic intervals from one genome build to another.
  • xGRsampling: generates random samples for data genomic regions from background genomic regions.
  • xColormap: defines a colormap (color palette).
  • xGRscores: calculates scores for input genomic regions (GR).
  • xGR2nGenes: defines nearby genes from input genomic regions (GR).
  • xGR: creates a GRanges object given a list of genomic regions (GR).
  • xCheckParallel: checks whether parallel computing should be used and how.
  • xSymbol2GeneID: converts gene symbols to Entrez GeneIDs.
  • xGeneID2Symbol: converts Entrez GeneIDs to gene symbols.
  • xDefineNet: defines a gene network sourced from the STRING database or the Pathway Commons database.
  • xSimplifyNet: simplifies a network by keeping root-tip shortest paths only.
  • xHeatmap: draws a heatmap.
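The true-path rule mentioned for xDAGanno and xDAGpropagate states that a gene annotated to a term is implicitly annotated to all ancestors of that term, up to the ontology root. A minimal sketch of this propagation on a hypothetical toy ontology is given below; the function name and data are illustrative and do not reflect the package's data structures.

```python
def propagate_annotations(parents, direct):
    """Apply the true-path rule.

    parents: term -> list of parent terms
    direct:  term -> set of genes directly annotated to the term
    Returns term -> set of genes annotated directly or via descendants.
    """
    propagated = {term: set(genes) for term, genes in direct.items()}
    for term, genes in direct.items():
        stack = list(parents.get(term, []))
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            propagated.setdefault(ancestor, set()).update(genes)
            stack.extend(parents.get(ancestor, []))
    return propagated


# Hypothetical ontology and direct annotations.
parents = {"A1": ["A"], "A": ["root"], "B": ["root"], "root": []}
direct = {"A1": {"g1"}, "B": {"g2"}}
full = propagate_annotations(parents, direct)
for term in sorted(full):
    print(term, sorted(full[term]))
```

After propagation the root term carries every annotated gene, which is why terms near the root are the least informative.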

6.6 Auxiliary functions

Auxiliary functions provide supplementary support during package development, such as code debugging and documentation generation.

  • xAuxFunArgs: assigns arguments with default values for a given function, useful for code debugging.
  • xAuxRdWrap: wraps long texts onto the next line for Rd files.
  • xAuxRd2HTML: converts Rd files to HTML files.
  • xAuxEmbed: embeds a file (encoded as a base64 string) into the R markdown output html file.

7 Showcases

An essential step of data analysis is making sense of a gene (or SNP) list in a biologically meaningful way. Genes (and/or SNPs) may be identified from differential expression studies, GWAS and eQTL mapping. We showcase the applications using several published datasets. Users are encouraged to adapt the provided code to analyse their own datasets.

7.1 Interpreting differential genes

Demo Series 1: using the XGR package to interpret summary data resulting from differential expression studies

7.2 Interpreting GWAS SNPs

Demo Series 2: using the XGR package to interpret GWAS summary data

7.3 Interpreting eQTL SNPs

Demo Series 3: using the XGR package to interpret eQTL summary data

8 References

XGR builds on the following references:

1000 Genomes Project Consortium. 2012. “An integrated map of genetic variation from 1,092 human genomes.” Nature 491 (7422): 56–65. https://doi.org/10.1038/nature11632.

Adams, D, L Altucci, Se Antonarakis, J Ballesteros, S Beck, A Bird, C Bock, et al. 2012. “BLUEPRINT to decode the epigenetic signature written in blood.” Nature Biotechnology 30 (3): 224–26. https://doi.org/10.1038/nbt.2153.

Ashburner, M, C A Ball, J A Blake, D Botstein, H Butler, J M Cherry, A P Davis, et al. 2000. “Gene ontology: tool for the unification of biology.” Nat Genet 25 (1): 25–29. https://doi.org/10.1038/75556.

Bernstein, Bradley E, Ewan Birney, Ian Dunham, Eric D Green, Chris Gunter, and Michael Snyder. 2012. “An integrated encyclopedia of DNA elements in the human genome.” Nature 489 (7414): 57–74. https://doi.org/10.1038/nature11247.

Cerami, E. G., B. E. Gross, E. Demir, I. Rodchenkov, O. Babur, N. Anwar, N. Schultz, G. D. Bader, and C. Sander. 2011. “Pathway Commons, a web resource for biological pathway data.” Nucleic Acids Research 39 (Database): D685–D690. https://doi.org/10.1093/nar/gkq1039.

Csardi, G, and T Nepusz. 2006. “The igraph software package for complex network research.” InterJournal Complex Systems 1695: 1695.

Fang, Hai, and Julian Gough. 2014. “The ’dnet’ approach promotes emerging research on cancer patient survival.” Genome Medicine 6 (8): 64. https://doi.org/10.1186/s13073-014-0064-8.

Forrest, Alistair R. R., Hideya Kawaji, Michael Rehli, J. Kenneth Baillie, Michiel J. L. de Hoon, Vanja Haberle, Timo Lassmann, et al. 2014. “A promoter-level mammalian expression atlas.” Nature 507 (7493): 462–70. https://doi.org/10.1038/nature13182.

Kandoth, Cyriac, Michael D McLellan, Fabio Vandin, Kai Ye, Beifang Niu, Charles Lu, Mingchao Xie, et al. 2013. “Mutational landscape and significance across 12 major cancer types.” Nature 502 (7471): 333–9. https://doi.org/10.1038/nature12634.

Köhler, Sebastian, Sandra C Doelken, Christopher J Mungall, Sebastian Bauer, Helen V Firth, Isabelle Bailleul-Forestier, Graeme C M Black, et al. 2013. “The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data.” Nucleic Acids Research 42 (Database issue): D966–74. https://doi.org/10.1093/nar/gkt1026.

Kundaje, Anshul, Wouter Meuleman, Jason Ernst, Misha Bilenky, Angela Yen, Alireza Heravi-Moussavi, Pouya Kheradpour, et al. 2015. “Integrative analysis of 111 reference human epigenomes.” Nature 518: 317–30. https://doi.org/10.1038/nature14248.

Lawrence, Michael, Wolfgang Huber, Hervé Pagès, Patrick Aboyoun, Marc Carlson, Robert Gentleman, Martin T Morgan, and Vincent J Carey. 2013. “Software for Computing and Annotating Genomic Ranges.” PLoS Computational Biology 9 (8): 1–10. https://doi.org/10.1371/journal.pcbi.1003118.

Pesquita, C, D Faria, A O Falcao, P Lord, and F M Couto. 2009. “Semantic similarity in biomedical ontologies.” PLoS Comput Biol 5 (7): e1000443. https://doi.org/10.1371/journal.pcbi.1000443.

Schriml, L M, C Arze, S Nadendla, Y W Chang, M Mazaitis, V Felix, G Feng, and W A Kibbe. 2012. “Disease Ontology: a backbone for disease semantic integration.” Nucleic Acids Res 40 (Database issue): D940–6. https://doi.org/10.1093/nar/gkr972.

Smith, C L, and J T Eppig. 2009. “The Mammalian Phenotype Ontology: enabling robust annotation and comparative analysis.” Wiley Interdiscip Rev Syst Biol Med 1 (3): 390–99. https://doi.org/10.1002/wsbm.44.

Szklarczyk, Damian, Andrea Franceschini, Stefan Wyder, Kristoffer Forslund, Davide Heller, Jaime Huerta-Cepas, Milan Simonovic, et al. 2015. “STRING v10: protein-protein interaction networks, integrated over the tree of life.” Nucleic Acids Res 43 (Database): D447–D452. https://doi.org/10.1093/nar/gku1003.

Welter, Danielle, Jacqueline MacArthur, Joannella Morales, Tony Burdett, Peggy Hall, Heather Junkins, Alan Klemm, et al. 2014. “The NHGRI GWAS Catalog, a curated resource of SNP-trait associations.” Nucleic Acids Research 42 (D1): D1001–6. https://doi.org/10.1093/nar/gkt1229.


  1. http://galahad.well.ox.ac.uk/XGR