1 Web-app (linkto)

2 Summary

This demo showcases the analytic power of using XGR to make sense of GWAS SNPs, including network analysis, enrichment analysis, and annotation analysis. Network analysis supported in XGR is unique in its power to identify SNP-modulated gene networks (via SNP scoring, gene scoring and network scoring) and to perform cross-disease analysis. When coupled with annotation analysis (via looking at nearby gene annotations by ontologies or via looking at co-localised functional genomic annotations), XGR is able to perform in-depth interpretations of the underlying genetic landscape of immunological diseases based on GWAS summary data.

3 Packages

library(XGR)

# Specify the locations of built-in data
RData.location <- "http://galahad.well.ox.ac.uk/bigdata"

4 Data

The dataset ImmunoBase is GWAS lead SNPs associated with immunologically related human diseases, obtained from ImmunoBase.

ImmunoBase <- xRDataLoader(RData.customised='ImmunoBase', RData.location=RData.location)
# get info about diseases
disease_list <- lapply(ImmunoBase, function(x) x$disease)
# get the number of disease associated variants/SNPs
variants_list <- lapply(ImmunoBase, function(x) length(unique(names(x$variants))))
# get the number of genes that are located within 50kb distance window of SNPs
genes_list <- lapply(ImmunoBase, function(x) length(x$genes_variants<=50000))
# create a data frame
df_ib <- data.frame(Code=names(ImmunoBase), Disease=unlist(disease_list), num_SNPs=unlist(variants_list), num_nearby_genes=unlist(genes_list), stringsAsFactors=F)

We restrict common diseases and each having at least 30 GWAS lead SNPs. As such we have 11 common diseases for cross-disease analysis below.

ind <- match(c('IBD','CRO','UC','PSO','AS','RA','CEL','T1D','MS','SLE','ATD'), names(ImmunoBase))
IB <- ImmunoBase[ind]

A summary of 11 diseases (along with GWAS lead SNPs and their nearby genes):

Code Disease num_SNPs num_nearby_genes
IBD Inflammatory Bowel Disease (IBD) 148 2280
CRO Crohn’s Disease (CRO) 174 1945
UC Ulcerative Colitis (UC) 120 1651
PSO Psoriasis (PSO) 68 624
AS Ankylosing Spondylitis (AS) 32 437
RA Rheumatoid Arthritis (RA) 165 1070
CEL Celiac Disease (CEL) 144 734
T1D Type 1 Diabetes (T1D) 180 1046
MS Multiple Sclerosis (MS) 237 1725
SLE Systemic Lupus Erythematosus (SLE) 61 742
ATD Autoimmune Thyroid Disease (ATD) 33 233

5 Showcase

5.1 Scoring from SNPs to genes

Do SNP scoring and gene scoring for each disease

disease_ls <- lapply(IB, function(x){
    gr <- x$variant
    data <- GenomicRanges::mcols(gr)[, c('Variant','Pvalue')]
    xSNP2GeneScores(data=data, include.LD='EUR', LD.r2=0.8, significance.threshold=5e-8, distance.max=50000, decay.kernel="rapid", decay.exponent=2, GR.SNP="dbSNP_GWAS", GR.Gene="UCSC_knownCanonical", scoring.scheme="max", verbose=T, RData.location=RData.location)
})

Get all diseases and all genes

## all diseases
allDiseases <- names(IB)
## all Genes
res_Gene <- lapply(disease_ls, function(x) x$Gene$Gene)
allGenes <- unique(unlist(res_Gene))

A matrix of Genes X Diseases:

mat_Gene <- 0
for(i in 1:length(disease_ls)){
    df <- disease_ls[[i]]$Gene
    df$Disease <- rep(allDiseases[i], nrow(df))
    mat <- xSparseMatrix(df[,c("Gene","Disease","Score")], rows=allGenes, columns=allDiseases)
    mat_Gene <- mat_Gene + mat
}
mat_Gene <- as.matrix(mat_Gene)

A neighbor-joining tree (with bootstrap confidence values) based on the matrix of Genes X Diseases. The distance between two diseases is defined as the sum of their difference in gene scores (that is, manhattan/cityblock distance).

data <- mat_Gene
tree_bs <- visTreeBootstrap(data=t(data), algorithm="nj", metric="manhattan", num.bootstrap=10000, reroot="min.bootstrap", visTree=T, nodelabels.arg=list(cex=0.6))

The corresponding consensus tree of the above bootstrapped tree is (based on the majority voting):

data <- mat_Gene
tree_cons <- visTreeBootstrap(data=t(data), algorithm="nj", metric="manhattan", num.bootstrap=10000, consensus=T, reroot="min.bootstrap", visTree=T)
g <- igraph::as.igraph(tree_cons)
vertex.label <- V(g)$name
vertex.label <- gsub("Node.*", "", vertex.label)
vertex.size <- rep(12, vcount(g))
vertex.size[vertex.label==''] <- 2
xVisNet(g=g, glayout=layout_with_kk, vertex.shape="sphere", vertex.size=vertex.size, vertex.color="white", vertex.label=vertex.label, vertex.label.dist=0.9, vertex.label.font=2, , newpage=F)

Heatmap of the 1311 Genes X 11 Diseases matrix:

tree_bs_ind <- match(tree_bs$tip.label, colnames(mat_Gene))
data <- mat_Gene[, rev(tree_bs_ind)]
visHeatmapAdv(data=data, Rowv=T, Colv=F, dendrogram="none", dist.metric="manhattan", colormap="lightblue-blue", labRow=NA, KeyValueName="Gene scores", zlim=c(0,8), add.expr=abline(v=(1:(ncol(data)+1))-0.5,col="white"), lmat=rbind(c(4,3), c(2,1)), lhei=c(1,4), lwid=c(1,3), margins=c(12,2), srtCol=30, cexCol=0.8)

5.2 SNP-modulated gene networks underlying disease categories

Neighbour-joining tree of above 11 common diseases (based on scored genes under the genetic influence of GWAS SNPs) nicely captures the current disease classification/taxonomy according to the genetic and cellular basis fo autoinflammation and autoimmunity, against the traditional classification by organ or system affected.

The tree shows the analysable diseases span the autoinflammatory-autoimmne continuum (a classification system reflecting the relative role of innate immune responses versus adaptive immune responses played in the disease development), including three categories:

  1. Classic polygenic autoinflammatory diseases, that is, polygenic diseases with a prominent autoinflammatory component, including IBD, CRO and UC;

  2. classic polygenic autoimmune diseases, that is, polygenic diseases with a prominent autoimmune component, including SLE, ATD, CEL, MS, T1D and RA;

  3. mixed diseases having both components, including PSO and AS.

Our analysis also shows that mixed diseases have more the autoinflammatory component than the autoimmune component. Polygenic autoinflammatory diseases may be subdivided into two subtypes, the one for SLE and ATD, the other for CEL, MS, T1D and RA.

To understand the molecular basis of these three disease categories, we next aim to identify the top SNP-modulated gene networks based on pooled GWAS SNPs for each category.

5.2.1 Polygenic autoinflammatory diseases

# pool together GWAS SNPs
autoinfl <- c('IBD','CRO','UC')
ind <- match(autoinfl, names(ImmunoBase))
ls_gr <- lapply(ImmunoBase[ind], function(x){
    gr <- x$variant
})
GRL <- GenomicRanges::GRangesList(ls_gr)
inGR <- GenomicRanges::unlist(GRL)

# find maximum-scoring gene subnetwork with the desired node number=23
data <- GenomicRanges::mcols(inGR)[, c('Variant','Pvalue')]
subnet_in <- xSubneterSNPs(data=data, significance.threshold=5e-8, distance.max=50000, network="STRING_high", decay.kernel="rapid", decay.exponent=2, GR.SNP="dbSNP_GWAS", GR.Gene="UCSC_knownCanonical", scoring.scheme="max", seed.genes=T, subnet.size=23, RData.location=RData.location)

The identified gene network with 23 nodes colored according to gene scores (the higher the more signficant) is shown below. Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.

subnet <- subnet_in
pattern <- -log10(as.numeric(V(subnet)$significance))
xVisNet(g=subnet, pattern=pattern, glayout=layout_with_kk, vertex.shape="sphere", colormap="white-yellow-orange", zlim=c(0,8), newpage=F, vertex.size=10, vertex.label.color="blue", vertex.label.dist=0.45, vertex.label.font=3)

5.2.2 Mixed diseases

# pool together GWAS SNPs
mix <- c('AS','PSO')
ind <- match(mix, names(ImmunoBase))
ls_gr <- lapply(ImmunoBase[ind], function(x){
    gr <- x$variant
})
GRL <- GenomicRanges::GRangesList(ls_gr)
mxGR <- GenomicRanges::unlist(GRL)

# find maximum-scoring gene subnetwork with the desired node number=23
data <- GenomicRanges::mcols(mxGR)[, c('Variant','Pvalue')]
subnet_mx <- xSubneterSNPs(data=data, significance.threshold=5e-8, distance.max=50000, network="STRING_high", decay.kernel="rapid", decay.exponent=2, GR.SNP="dbSNP_GWAS", GR.Gene="UCSC_knownCanonical", scoring.scheme="max", seed.genes=T, subnet.size=23, RData.location=RData.location)

The identified gene network with 23 nodes colored according to gene scores (the higher the more signficant) is shown below. Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.

subnet <- subnet_mx
pattern <- -log10(as.numeric(V(subnet)$significance))
xVisNet(g=subnet, pattern=pattern, glayout=layout_with_kk, vertex.shape="sphere", colormap="white-lightcyan-cyan", zlim=c(0,8), newpage=F, vertex.size=10, vertex.label.color="blue", vertex.label.dist=0.45, vertex.label.font=3)

5.2.3 Polygenic autoimmune diseases

# pool together GWAS SNPs
autoimmu <- c('SLE','ATD','T1D','RA','MS','CEL')
ind <- match(autoimmu, names(ImmunoBase))
ls_gr <- lapply(ImmunoBase[ind], function(x){
    gr <- x$variant
})
GRL <- GenomicRanges::GRangesList(ls_gr)
imGR <- GenomicRanges::unlist(GRL)

# find maximum-scoring gene subnetwork with the desired node number=23
data <- GenomicRanges::mcols(imGR)[, c('Variant','Pvalue')]
subnet_im <- xSubneterSNPs(data=data, significance.threshold=5e-8, distance.max=50000, network="STRING_high", decay.kernel="rapid", decay.exponent=2, GR.SNP="dbSNP_GWAS", GR.Gene="UCSC_knownCanonical", scoring.scheme="max", seed.genes=T, subnet.size=23, RData.location=RData.location)

The identified gene network with 23 nodes colored according to gene scores (the higher the more signficant) is shown below. Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.

subnet <- subnet_im
pattern <- -log10(as.numeric(V(subnet)$significance))
xVisNet(g=subnet, pattern=pattern, glayout=layout_with_kk, vertex.shape="sphere", colormap="white-red-darkred", zlim=c(0,8), newpage=F, vertex.size=10, vertex.label.color="blue", vertex.label.dist=0.45, vertex.label.font=3)

5.2.4 Comparing network genes across disease categories

# Combine into an igraph object called "net_merged"
list_igraph <- list(subnet_in, subnet_mx, subnet_im)
## edges
ls_edges <- lapply(list_igraph, function(x){
    igraph::get.data.frame(x, what="edges")
})
relations <- do.call(rbind, ls_edges)
relations <- relations[!duplicated(relations), ]
## nodes
ls_nodes <- lapply(list_igraph, function(x){
    igraph::get.data.frame(x, what="vertices")
})
nodes <- do.call(rbind, ls_nodes)
nodes <- nodes[,c('name','description')]
nodes <- nodes[!duplicated(nodes), ]
## net_merged
net_merged <- igraph::graph.data.frame(d=relations, directed=T, vertices=nodes)

# combined into a data frame called 'df_merged'
## indicator matrix
df_nodes <- igraph::get.data.frame(net_merged, what="vertices")
mat_merged <- matrix(0, nrow=nrow(df_nodes), ncol=3)
colnames(mat_merged) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
rownames(mat_merged) <- rownames(df_nodes)
mat_merged[V(subnet_in)$name, 1] <- 1
mat_merged[V(subnet_mx)$name, 2] <- 1
mat_merged[V(subnet_im)$name, 3] <- 1
## number of genes being shared
nSharings <- apply(mat_merged, 1, sum)
df_merged <- data.frame(df_nodes, mat_merged, nSharings=nSharings, stringsAsFactors=F)

# reorder
ind <- with(df_merged, order(-nSharings, -Autoinflammatory, -Mixed, -Autoimmune, rownames(df_merged)))
df_merged <- df_merged[ind, ]
#write.table(df_merged, file="network_genes_merged.txt", sep="\t", row.names=F)

A table listing network genes across 3 disease categories is shown below. The 0-1 codings indicate whether a gene is present or absent.

name description Autoinflammatory Mixed Autoimmune nSharings
STAT3 signal transducer and activator of transcription 3 (acute-phase response factor) 1 1 1 3
IL23R interleukin 23 receptor 1 1 0 2
IL2RA interleukin 2 receptor, alpha 1 0 1 2
TNFRSF1A tumor necrosis factor receptor superfamily, member 1A 0 1 1 2
TNIP1 TNFAIP3 interacting protein 1 0 1 1 2
ATG16L1 autophagy related 16-like 1 1 0 0 1
CCL2 chemokine (C-C motif) ligand 2 1 0 0 1
CCL7 chemokine (C-C motif) ligand 7 1 0 0 1
CUL2 cullin 2 1 0 0 1
CYLD cylindromatosis (turban tumor syndrome) 1 0 0 1
FCGR2A Fc fragment of IgG, low affinity IIa, receptor (CD32) 1 0 0 1
IFNG interferon, gamma 1 0 0 1
IFNGR2 interferon gamma receptor 2 (interferon gamma transducer 1) 1 0 0 1
IKZF3 IKAROS family zinc finger 3 (Aiolos) 1 0 0 1
IL10 interleukin 10 1 0 0 1
IL18RAP interleukin 18 receptor accessory protein 1 0 0 1
IL27 interleukin 27 1 0 0 1
JAK2 Janus kinase 2 1 0 0 1
NOD2 nucleotide-binding oligomerization domain containing 2 1 0 0 1
PDGFB platelet-derived growth factor beta polypeptide 1 0 0 1
PTPN2 protein tyrosine phosphatase, non-receptor type 2 1 0 0 1
SMAD3 SMAD family member 3 1 0 0 1
SOCS1 suppressor of cytokine signaling 1 1 0 0 1
TNFSF15 tumor necrosis factor (ligand) superfamily, member 15 1 0 0 1
ZMIZ1 zinc finger, MIZ-type containing 1 1 0 0 1
DDX58 DEAD (Asp-Glu-Ala-Asp) box polypeptide 58 0 1 0 1
ERAP1 endoplasmic reticulum aminopeptidase 1 0 1 0 1
ERAP2 endoplasmic reticulum aminopeptidase 2 0 1 0 1
ETS1 v-ets avian erythroblastosis virus E26 oncogene homolog 1 0 1 0 1
HLA-C major histocompatibility complex, class I, C 0 1 0 1
ICAM3 intercellular adhesion molecule 3 0 1 0 1
IFNLR1 interferon, lambda receptor 1 0 1 0 1
IL12B interleukin 12B 0 1 0 1
IL13 interleukin 13 0 1 0 1
IL23A interleukin 23, alpha subunit p19 0 1 0 1
IL4 interleukin 4 0 1 0 1
IL6R interleukin 6 receptor 0 1 0 1
NOS2 nitric oxide synthase 2, inducible 0 1 0 1
REL v-rel avian reticuloendotheliosis viral oncogene homolog 0 1 0 1
RUNX3 runt-related transcription factor 3 0 1 0 1
SPATA2 spermatogenesis associated 2 0 1 0 1
STAT2 signal transducer and activator of transcription 2, 113kDa 0 1 0 1
TNFAIP3 tumor necrosis factor, alpha-induced protein 3 0 1 0 1
TYK2 tyrosine kinase 2 0 1 0 1
CIITA class II, major histocompatibility complex, transactivator 0 0 1 1
CLEC16A C-type lectin domain family 16, member A 0 0 1 1
ERBB3 erb-b2 receptor tyrosine kinase 3 0 0 1 1
HLA-DQA1 major histocompatibility complex, class II, DQ alpha 1 0 0 1 1
HLA-DQB1 major histocompatibility complex, class II, DQ beta 1 0 0 1 1
IFI30 interferon, gamma-inducible protein 30 0 0 1 1
IL21 interleukin 21 0 0 1 1
IRF5 interferon regulatory factor 5 0 0 1 1
ITGAM integrin, alpha M (complement component 3 receptor 3 subunit) 0 0 1 1
MAPK1 mitogen-activated protein kinase 1 0 0 1 1
NCF2 neutrophil cytosolic factor 2 0 0 1 1
NFKB1 nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 0 0 1 1
PTPN22 protein tyrosine phosphatase, non-receptor type 22 (lymphoid) 0 0 1 1
SH2B3 SH2B adaptor protein 3 0 0 1 1
STAT4 signal transducer and activator of transcription 4 0 0 1 1
TH tyrosine hydroxylase 0 0 1 1
TSHR thyroid stimulating hormone receptor 0 0 1 1
VCAM1 vascular cell adhesion molecule 1 0 0 1 1
VMP1 vacuole membrane protein 1 0 0 1 1

Heatmap of the indicator matrix of Genes X Disease Categories:

data <- df_merged[,3:5]
visHeatmapAdv(t(data), Rowv=F, Colv=F, dendrogram="none", key=F, zlim=c(0,1), colormap="grey-steelblue", add.expr=abline(v=(1:(nrow(data)+1))-0.5,h=(1:(ncol(data)+1))-0.5,,col="white"), lhei=c(1,4), lwid=c(1,5), margins=c(4,10), cexCol=0.8, cexRow=1)

5.2.5 Pathway analysis of network genes across disease categories

Pathway enrichment analysis for network genes is using Reactome pathways:

ontology <- "MsigdbC2CPall"

Pathway enrichments for genes in autoimmune network

data <- V(subnet_im)$name
eTerm <- xEnricherGenes(data=data, ontology=ontology, test="fisher", RData.location=RData.location)
eTerm_im <- xEnrichConciser(eTerm)
xEnrichViewer(eTerm_im, 10)

Barplot of enriched terms

bp <- xEnrichBarplot(eTerm_im, top_num='auto', displayBy="fdr")
bp

name nAnno nOverlap fc zscore pvalue adjp members
Leishmania infection 72 6 38.50 14.90 0.0e+00 2.0e-07 NFKB1, MAPK1, NCF2, ITGAM, HLA-DQB1, HLA-DQA1
Cytokine Signaling in Immune system 268 7 12.10 8.57 8.0e-07 1.5e-05 MAPK1, STAT3, IL2RA, VCAM1, HLA-DQA1, CIITA, IRF5
Ceramide Signaling Pathway 22 3 62.90 13.60 1.3e-05 7.8e-05 NFKB1, MAPK1, TNFRSF1A
Role of ERBB2 in Signal Transduction and Oncology 22 3 62.90 13.60 1.3e-05 7.8e-05 MAPK1, STAT3, ERBB3
IL12-mediated signaling events 63 4 29.30 10.50 8.6e-06 7.8e-05 NFKB1, STAT3, STAT4, IL2RA
Antigen processing and presentation 88 4 21.00 8.78 3.3e-05 1.7e-04 HLA-DQB1, HLA-DQA1, IFI30, CIITA
Bioactive Peptide Induced Signaling Pathway 43 3 32.20 9.56 1.0e-04 3.2e-04 MAPK1, STAT3, STAT4
Cell adhesion molecules (CAMs) 133 4 13.90 6.98 1.6e-04 3.9e-04 ITGAM, VCAM1, HLA-DQB1, HLA-DQA1
Autoimmune thyroid disease 52 3 26.60 8.64 1.8e-04 4.0e-04 HLA-DQB1, HLA-DQA1, TSHR
IL2-mediated signaling events 55 3 25.20 8.38 2.1e-04 4.5e-04 MAPK1, STAT3, IL2RA
Acute myeloid leukemia 57 3 24.30 8.22 2.3e-04 4.7e-04 NFKB1, MAPK1, STAT3
Jak-STAT signaling pathway 155 4 11.90 6.39 3.0e-04 5.6e-04 STAT3, STAT4, IL2RA, IL21
Downstream signaling in naive CD8+ T cells 65 3 21.30 7.66 3.5e-04 5.7e-04 MAPK1, STAT4, IL2RA
Signaling events mediated by HDAC Class I 66 3 21.00 7.59 3.6e-04 5.7e-04 NFKB1, STAT3, TNFRSF1A
Immune System 912 8 4.05 4.53 3.5e-04 5.7e-04 MAPK1, STAT3, NCF2, IL2RA, VCAM1, HLA-DQA1, CIITA, IRF5
Signaling by SCF-KIT 75 3 18.50 7.08 5.3e-04 7.3e-04 MAPK1, STAT3, SH2B3
Toll-like receptor signaling pathway 102 3 13.60 5.95 1.3e-03 1.7e-03 NFKB1, MAPK1, IRF5
Leukocyte transendothelial migration 117 3 11.80 5.50 1.9e-03 2.4e-03 NCF2, ITGAM, VCAM1
Neurotrophin signaling pathway 126 3 11.00 5.26 2.4e-03 2.8e-03 NFKB1, MAPK1, SH2B3
Cytokine-cytokine receptor interaction 266 3 5.21 3.25 1.9e-02 2.1e-02 TNFRSF1A, IL2RA, IL21

Pathway enrichments for genes in mixed network

data <- V(subnet_mx)$name
eTerm <- xEnricherGenes(data=data, ontology=ontology, test="fisher", RData.location=RData.location)
eTerm_mx <- xEnrichConciser(eTerm)
xEnrichViewer(eTerm_mx, 10)

Barplot of enriched terms

bp <- xEnrichBarplot(eTerm_mx, top_num='auto', displayBy="fdr")
bp

name nAnno nOverlap fc zscore pvalue adjp members
Jak-STAT signaling pathway 155 10 28.30 16.40 0.0e+00 0.0e+00 IL4, STAT2, STAT3, IL12B, IL13, IL6R, TYK2, IFNLR1, IL23A, IL23R
IL23-mediated signaling events 37 6 71.10 20.40 0.0e+00 0.0e+00 STAT3, IL12B, TYK2, NOS2, IL23A, IL23R
Cytokine-cytokine receptor interaction 266 8 13.20 9.65 1.0e-07 5.0e-07 IL4, TNFRSF1A, IL12B, IL13, IL6R, IFNLR1, IL23A, IL23R
IL27-mediated signaling events 26 4 67.50 16.20 3.0e-07 1.4e-06 STAT2, STAT3, IL12B, TYK2
IL12-mediated signaling events 63 5 34.80 12.90 2.0e-07 1.4e-06 IL4, STAT3, IL12B, TYK2, NOS2
Interleukin-6 signaling 10 3 132.00 19.70 1.2e-06 5.0e-06 STAT3, IL6R, TYK2
Immune System 912 10 4.81 5.81 9.8e-06 2.7e-05 STAT2, STAT3, TNFAIP3, IL6R, TYK2, HLA-C, ICAM3, DDX58, REL, ERAP1
Cytokine Network 21 3 62.60 13.50 1.3e-05 3.2e-05 IL4, IL12B, IL13
Allograft rejection 37 3 35.60 10.10 7.5e-05 1.3e-04 IL4, IL12B, HLA-C
Adaptive Immune System 524 4 3.35 2.65 2.8e-02 3.2e-02 HLA-C, ICAM3, REL, ERAP1
Pathways in cancer 325 3 4.05 2.68 3.6e-02 3.9e-02 STAT3, ETS1, NOS2

Pathway enrichments for genes in autoinflammatory network

data <- V(subnet_in)$name
eTerm <- xEnricherGenes(data=data, ontology=ontology, test="fisher", RData.location=RData.location)
eTerm_in <- xEnrichConciser(eTerm)
xEnrichViewer(eTerm_in, 10)

Barplot of enriched terms

bp <- xEnrichBarplot(eTerm_in, top_num='auto', displayBy="fdr")
bp

name nAnno nOverlap fc zscore pvalue adjp members
Regulation of IFNG signaling 14 5 142.00 26.50 0.0e+00 0.0e+00 IFNG, JAK2, SOCS1, IFNGR2, PTPN2
IL23-mediated signaling events 37 6 64.60 19.50 0.0e+00 0.0e+00 IFNG, JAK2, STAT3, CCL2, IL18RAP, IL23R
Cytokine-cytokine receptor interaction 266 10 15.00 11.60 0.0e+00 0.0e+00 IFNG, IL10, CCL2, IL2RA, IFNGR2, CCL7, TNFSF15, IL18RAP, IL23R, PDGFB
Jak-STAT signaling pathway 155 8 20.60 12.30 0.0e+00 0.0e+00 IFNG, IL10, JAK2, STAT3, SOCS1, IL2RA, IFNGR2, IL23R
IL12-mediated signaling events 63 6 38.00 14.80 0.0e+00 1.0e-07 IFNG, JAK2, STAT3, SOCS1, IL2RA, IL18RAP
IFN-gamma pathway 40 5 49.80 15.50 0.0e+00 2.0e-07 IFNG, JAK2, STAT3, SOCS1, PTPN2
Cytokine Signaling in Immune system 268 8 11.90 9.09 2.0e-07 8.0e-07 IFNG, JAK2, STAT3, SOCS1, IL2RA, IFNGR2, NOD2, PTPN2
IL2-mediated signaling events 55 5 36.20 13.10 2.0e-07 9.0e-07 IFNG, STAT3, SOCS1, IL2RA, IKZF3
IL27-mediated signaling events 26 4 61.30 15.40 4.0e-07 1.5e-06 IFNG, JAK2, STAT3, IL27
Leishmania infection 72 5 27.70 11.40 8.0e-07 2.5e-06 IFNG, IL10, JAK2, IFNGR2, FCGR2A
Signaling events mediated by PTP1B 52 4 30.70 10.80 7.4e-06 2.1e-05 JAK2, STAT3, PDGFB, FCGR2A
Th1/Th2 Differentiation 19 3 62.90 13.60 1.3e-05 2.9e-05 IFNG, IL2RA, IFNGR2
SHP2 signaling 58 4 27.50 10.20 1.2e-05 2.9e-05 IFNG, JAK2, IL2RA, PDGFB
Role of Tob in T-cell activation 21 3 56.90 12.90 1.8e-05 3.7e-05 IFNG, SMAD3, IL2RA
Immune System 912 10 4.37 5.39 2.8e-05 5.4e-05 IFNG, JAK2, STAT3, SOCS1, IL2RA, IFNGR2, NOD2, CUL2, CYLD, PTPN2
Signaling events mediated by TCPTP 43 3 27.80 8.84 1.6e-04 2.3e-04 STAT3, PDGFB, PTPN2
PDGFR-beta signaling pathway 129 4 12.40 6.52 2.7e-04 3.7e-04 JAK2, STAT3, PDGFB, PTPN2
NOD-like receptor signaling pathway 62 3 19.30 7.25 4.7e-04 6.1e-04 CCL2, CCL7, NOD2
IL4-mediated signaling events 65 3 18.40 7.06 5.4e-04 6.7e-04 IL10, JAK2, SOCS1
Chemokine signaling pathway 190 4 8.39 5.17 1.1e-03 1.3e-03 JAK2, STAT3, CCL2, CCL7
Pathways in cancer 325 4 4.91 3.60 8.0e-03 8.4e-03 STAT3, SMAD3, PDGFB, CUL2

Comparing pathway enrichments across networks

list_eTerm <- list(eTerm_in, eTerm_mx, eTerm_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_Pathway <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=1e-3, wrap.width=50, bar.label=F)
bp_Pathway + theme(axis.text.y=element_text(size=11,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

5.3 Annotation analysis of SNPs across disease categories

# pool together GWAS SNPs for polygenic autoimmune diseases
autoimmu <- c('SLE','ATD','T1D','RA','MS','CEL')
ind <- match(autoimmu, names(ImmunoBase))
ls_gr <- lapply(ImmunoBase[ind], function(x){
    gr <- x$variant
})
GRL <- GenomicRanges::GRangesList(ls_gr)
imGR <- GenomicRanges::unlist(GRL)

# pool together GWAS SNPs for mixed diseases
mix <- c('AS','PSO')
ind <- match(mix, names(ImmunoBase))
ls_gr <- lapply(ImmunoBase[ind], function(x){
    gr <- x$variant
})
GRL <- GenomicRanges::GRangesList(ls_gr)
mxGR <- GenomicRanges::unlist(GRL)

# pool together GWAS SNPs for polygenic autoinflammatory diseases
autoinfl <- c('IBD','CRO','UC')
ind <- match(autoinfl, names(ImmunoBase))
ls_gr <- lapply(ImmunoBase[ind], function(x){
    gr <- x$variant
})
GRL <- GenomicRanges::GRangesList(ls_gr)
inGR <- GenomicRanges::unlist(GRL)

# pool together GWAS SNPs for all diseases
ind <- match(c('SLE','ATD','T1D','RA','MS','CEL','PSO','AS','IBD','CRO','UC'), names(ImmunoBase))
ls_gr <- lapply(ImmunoBase[ind], function(x){
    gr <- x$variant
})
GRL <- GenomicRanges::GRangesList(ls_gr)
allGR <- GenomicRanges::unlist(GRL)

GWAS SNPs are used as the test background

background.file <- xRDataLoader(RData.customised='dbSNP_GWAS', RData.location=RData.location)

5.3.1 Via gene annotations

We use gene annotations to interpret pooled GWAS SNPs for each of the three categories by looking directly at genes harbouring these SNPs, that is, functional and phenotypic annotation of genes harbouring GWAS SNPs for each of three disease categories.

5.3.1.1 Using Gene Ontology Molecular Function (GOMF)

ontology <- "GOMF"

GOMF enrichments for SNPs in autoimmune diseases

eTerm_gomf_im <- xGRviaGeneAnno(data.file=imGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_gomf_im, 10)

GOMF enrichments for SNPs in autoinflammatory diseases

eTerm_gomf_in <- xGRviaGeneAnno(data.file=inGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_gomf_in, 10)

GOMF enrichments for SNPs in mixed diseases

eTerm_gomf_mx <- xGRviaGeneAnno(data.file=mxGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_gomf_mx, 10)

Comparing GOMF enrichments across disease categories

list_eTerm <- list(eTerm_gomf_in, eTerm_gomf_mx, eTerm_gomf_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_gomf <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=5e-2, wrap.width=100, bar.label=F, sharings=c(2,3))
bp_gomf + theme(axis.text.y=element_text(size=9,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

DAGplot of terms enriched in any analyses above, nodes/terms colored according to how many times being called significant enrichment.

xEnrichDAGplotAdv(bp_gomf, displayBy="nSig", colormap="white-lightgreen-green", zlim=c(1,3), layout.orientation="left_right", path.mode=c("all_paths","shortest_paths","all_shortest_paths")[1], node.info=c("term_name"), graph.node.attrs=list(fontsize=25), newpage=F)

5.3.1.2 Using Gene Ontology Biological Process (GOBP)

ontology <- "GOBP"

GOBP enrichments for SNPs in autoimmune diseases

eTerm_GOBP_im <- xGRviaGeneAnno(data.file=imGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_GOBP_im, 10)

GOBP enrichments for SNPs in autoinflammatory diseases

eTerm_GOBP_in <- xGRviaGeneAnno(data.file=inGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_GOBP_in, 10)

GOBP enrichments for SNPs in mixed diseases

eTerm_GOBP_mx <- xGRviaGeneAnno(data.file=mxGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_GOBP_mx, 10)

Comparing GOBP enrichments across disease categories

list_eTerm <- list(eTerm_GOBP_in, eTerm_GOBP_mx, eTerm_GOBP_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_GOBP <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=1e-2, wrap.width=100, bar.label=F, sharings=c(2,3))
bp_GOBP + theme(axis.text.y=element_text(size=9,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

5.3.1.3 Using Human Phenotype Phenotypic Abnormality (HPPA)

ontology <- "HPPA"

HPPA enrichments for SNPs in autoimmune diseases

eTerm_hppa_im <- xGRviaGeneAnno(data.file=imGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_hppa_im, 10)

HPPA enrichments for SNPs in autoinflammatory diseases

eTerm_hppa_in <- xGRviaGeneAnno(data.file=inGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_hppa_in, 10)

HPPA enrichments for SNPs in mixed diseases

eTerm_hppa_mx <- xGRviaGeneAnno(data.file=mxGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_hppa_mx, 10)

Comparing HPPA enrichments across disease categories

list_eTerm <- list(eTerm_hppa_in, eTerm_hppa_mx, eTerm_hppa_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_hppa <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=1e-2, wrap.width=50, bar.label=F, sharings=c(2,3))
bp_hppa + theme(axis.text.y=element_text(size=9,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

5.3.1.4 Using Mammalian Phenotypes (MP)

ontology <- "MP"

MP enrichments for SNPs in autoimmune diseases

eTerm_MP_im <- xGRviaGeneAnno(data.file=imGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_MP_im, 10)

MP enrichments for SNPs in autoinflammatory diseases

eTerm_MP_in <- xGRviaGeneAnno(data.file=inGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_MP_in, 10)

MP enrichments for SNPs in mixed diseases

eTerm_MP_mx <- xGRviaGeneAnno(data.file=mxGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_MP_mx, 10)

Comparing MP enrichments across disease categories

list_eTerm <- list(eTerm_MP_in, eTerm_MP_mx, eTerm_MP_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_MP <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=1e-6, wrap.width=100, bar.label=F, sharings=c(2,3))
bp_MP + theme(axis.text.y=element_text(size=9,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

5.3.2 Via genomic annotations

Using functional genomic annotations supported in XGR, we are also able to compare and define genetic and epigenetic characteristics underlying each of the three categories.

5.3.2.1 FANTOM5 enhancer analysis

Enhancer enrichment analysis is using cell-type-specific expressed enhancers:

GR.annotation <- "FANTOM5_Enhancer_Cell"

Enhancer enrichments for SNPs in autoimmune diseases

eTerm_enhancer_im <- xGRviaGenomicAnno(data.file=imGR, background.file=background.file, format.file="GRanges", GR.annotation=GR.annotation, RData.location=RData.location)
xEnrichViewer(eTerm_enhancer_im, 10)

Enhancer enrichments for SNPs in autoinflammatory diseases

eTerm_enhancer_in <- xGRviaGenomicAnno(data.file=inGR, background.file=background.file, format.file="GRanges", GR.annotation=GR.annotation, RData.location=RData.location)
xEnrichViewer(eTerm_enhancer_in, 10)

Enhancer enrichments for SNPs in mixed diseases

eTerm_enhancer_mx <- xGRviaGenomicAnno(data.file=mxGR, background.file=background.file, format.file="GRanges", GR.annotation=GR.annotation, RData.location=RData.location)
xEnrichViewer(eTerm_enhancer_mx, 10)

Comparing enhancer enrichments across disease categories

list_eTerm <- list(eTerm_enhancer_in, eTerm_enhancer_mx, eTerm_enhancer_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_enhancer <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=5e-2, wrap.width=50, bar.label=F)
bp_enhancer + theme(axis.text.y=element_text(size=9,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

5.3.2.2 ENCODE TFBS analysis

ChIP-seq TFBS enrichment analysis is using uniformly identified TFBSs assayed in the cell line Gm12878 (human lymphoblastoid cell line):

# load all cell-type-specific TFBSs
all_gr <- xRDataLoader(RData.customised="Uniform_TFBS", RData.location=RData.location)
# extract TFBSs in Gm12878, stored in a list of GR objects called 'anno_gr'
ind <- grep('Gm12878', names(all_gr))
anno_gr <- all_gr[ind]

TFBS enrichments for SNPs in autoimmune diseases

eTerm_tfbs_im <- xGRviaGenomicAnno(data.file=imGR, annotation.file=anno_gr, background.file=background.file, format.file="GRanges", RData.location=RData.location)
xEnrichViewer(eTerm_tfbs_im, 10)

TFBS enrichments for SNPs in autoinflammatory diseases

eTerm_tfbs_in <- xGRviaGenomicAnno(data.file=inGR, annotation.file=anno_gr, background.file=background.file, format.file="GRanges", RData.location=RData.location)
xEnrichViewer(eTerm_tfbs_in, 10)

TFBS enrichments for SNPs in mixed diseases

eTerm_tfbs_mx <- xGRviaGenomicAnno(data.file=mxGR, annotation.file=anno_gr, background.file=background.file, format.file="GRanges", RData.location=RData.location)
xEnrichViewer(eTerm_tfbs_mx, 10)

Comparing TFBS enrichments across disease categories

list_eTerm <- list(eTerm_tfbs_in, eTerm_tfbs_mx, eTerm_tfbs_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_tfbs <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=5e-2, wrap.width=50, bar.label=F, sharings=c(2,3))
bp_tfbs + theme(axis.text.y=element_text(size=9,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

5.3.2.3 ENCODE histone analysis

Histone enrichment analysis is using uniformly identified histone marks assayed in the cell line Gm12878 (human lymphoblastoid cell line):

# load all cell-type-specific histone marks
all_gr <- xRDataLoader(RData.customised="Broad_Histone", RData.location=RData.location)
# extract histone marks in Gm12878, stored in a list of GR objects called 'anno_gr'
ind <- grep('Gm12878', names(all_gr))
anno_gr <- all_gr[ind]

histone enrichments for SNPs in autoimmune diseases

eTerm_histone_im <- xGRviaGenomicAnno(data.file=imGR, annotation.file=anno_gr, background.file=background.file, format.file="GRanges", RData.location=RData.location)
xEnrichViewer(eTerm_histone_im, 10)

histone enrichments for SNPs in autoinflammatory diseases

eTerm_histone_in <- xGRviaGenomicAnno(data.file=inGR, annotation.file=anno_gr, background.file=background.file, format.file="GRanges", RData.location=RData.location)
xEnrichViewer(eTerm_histone_in, 10)

histone enrichments for SNPs in mixed diseases

eTerm_histone_mx <- xGRviaGenomicAnno(data.file=mxGR, annotation.file=anno_gr, background.file=background.file, format.file="GRanges", RData.location=RData.location)
xEnrichViewer(eTerm_histone_mx, 10)

Comparing histone enrichments across disease categories

list_eTerm <- list(eTerm_histone_in, eTerm_histone_mx, eTerm_histone_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_histone <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=1e-2, wrap.width=50, bar.label=F, sharings=c(2,3))
bp_histone + theme(axis.text.y=element_text(size=9,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

5.3.2.4 ENCODE segment analysis

Genome segmentation enrichment analysis is using 7 category segments in the cell line GM12878:

GR.annotation <- "Segment_Combined_Gm12878"

segment enrichments for SNPs in autoimmune diseases

eTerm_segment_im <- xGRviaGenomicAnno(data.file=imGR, background.file=background.file, format.file="GRanges", GR.annotation=GR.annotation, RData.location=RData.location)
xEnrichViewer(eTerm_segment_im, 10)

segment enrichments for SNPs in autoinflammatory diseases

eTerm_segment_in <- xGRviaGenomicAnno(data.file=inGR, background.file=background.file, format.file="GRanges", GR.annotation=GR.annotation, RData.location=RData.location)
xEnrichViewer(eTerm_segment_in, 10)

segment enrichments for SNPs in mixed diseases

eTerm_segment_mx <- xGRviaGenomicAnno(data.file=mxGR, background.file=background.file, format.file="GRanges", GR.annotation=GR.annotation, RData.location=RData.location)
xEnrichViewer(eTerm_segment_mx, 10)

Comparing segment enrichments across disease categories

list_eTerm <- list(eTerm_segment_in, eTerm_segment_mx, eTerm_segment_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_segment <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=5e-2, wrap.width=50, bar.label=F, sharings=c(2,3))
bp_segment + theme(axis.text.y=element_text(size=9,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

6 Session Info

Here is the output of sessionInfo() on the system on which this user manual was built:

> R version 3.2.4 (2016-03-10)
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
> Running under: OS X 10.11.4 (El Capitan)
> 
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
> 
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods  
> [8] base     
> 
> other attached packages:
> [1] VennDiagram_1.6.16  futile.logger_1.4.1 XGR_1.0.2          
> [4] ggplot2_2.1.0       dnet_1.0.9          supraHex_1.11.1    
> [7] hexbin_1.27.1       igraph_1.0.1        rmarkdown_0.9.5    
> 
> loaded via a namespace (and not attached):
>  [1] Rcpp_0.12.5             formatR_1.3            
>  [3] highr_0.5.1             plyr_1.8.3             
>  [5] GenomeInfoDb_1.4.3      XVector_0.8.0          
>  [7] futile.options_1.0.0    bitops_1.0-6           
>  [9] zlibbioc_1.14.0         tools_3.2.4            
> [11] digest_0.6.9            evaluate_0.8.3         
> [13] nlme_3.1-125            gtable_0.2.0           
> [15] lattice_0.20-33         Matrix_1.2-4           
> [17] graph_1.46.0            Rgraphviz_2.12.0       
> [19] yaml_2.1.13             parallel_3.2.4         
> [21] rtracklayer_1.28.10     stringr_1.0.0          
> [23] knitr_1.12.3            Biostrings_2.36.4      
> [25] RCircos_1.1.3           S4Vectors_0.6.6        
> [27] IRanges_2.2.9           stats4_3.2.4           
> [29] Biobase_2.28.0          BiocParallel_1.2.22    
> [31] XML_3.98-1.4            reshape2_1.4.1         
> [33] lambda.r_1.1.7          magrittr_1.5           
> [35] GenomicAlignments_1.4.2 Rsamtools_1.20.5       
> [37] scales_0.4.0            htmltools_0.3          
> [39] BiocGenerics_0.14.0     GenomicRanges_1.20.8   
> [41] ape_3.5                 colorspace_1.2-6       
> [43] labeling_0.3            stringi_1.1.1          
> [45] RCurl_1.95-4.8          munsell_0.4.3