1 Web-app (linkto)

2 Summary

This demo showcases the analytic power of using XGR to make sense of GWAS SNPs, including network analysis, enrichment analysis, and annotation analysis. Network analysis supported in XGR is unique in its power to identify SNP-modulated gene networks (via SNP scoring, gene scoring and network scoring) and to perform cross-disease analysis. When coupled with annotation analysis (via looking at nearby gene annotations by ontologies or via looking at co-localised functional genomic annotations), XGR is able to perform in-depth interpretations of the underlying genetic landscape of immunological diseases based on GWAS summary data.

3 Packages

library(XGR)

# Specify the locations of built-in data
RData.location <- "http://galahad.well.ox.ac.uk/bigdata"

4 Data

The dataset ImmunoBase is GWAS lead SNPs associated with immunologically related human diseases, obtained from ImmunoBase.

ImmunoBase <- xRDataLoader(RData.customised='ImmunoBase', RData.location=RData.location)
# get info about diseases
disease_list <- lapply(ImmunoBase, function(x) x$disease)
# get the number of disease associated variants/SNPs
variants_list <- lapply(ImmunoBase, function(x) length(unique(names(x$variants))))
# get the number of genes that are located within 50kb distance window of SNPs
genes_list <- lapply(ImmunoBase, function(x) length(x$genes_variants<=50000))
# create a data frame
df_ib <- data.frame(Code=names(ImmunoBase), Disease=unlist(disease_list), num_SNPs=unlist(variants_list), num_nearby_genes=unlist(genes_list), stringsAsFactors=F)

We restrict common diseases and each having at least 30 GWAS lead SNPs. As such we have 11 common diseases for cross-disease analysis below.

ind <- match(c('IBD','CRO','UC','PSO','AS','RA','CEL','T1D','MS','SLE','ATD'), names(ImmunoBase))
IB <- ImmunoBase[ind]

A summary of 11 diseases (along with GWAS lead SNPs and their nearby genes):

Code	Disease	num_SNPs	num_nearby_genes
IBD	Inflammatory Bowel Disease (IBD)	148	2280
CRO	Crohn’s Disease (CRO)	174	1945
UC	Ulcerative Colitis (UC)	120	1651
PSO	Psoriasis (PSO)	68	624
AS	Ankylosing Spondylitis (AS)	32	437
RA	Rheumatoid Arthritis (RA)	165	1070
CEL	Celiac Disease (CEL)	144	734
T1D	Type 1 Diabetes (T1D)	180	1046
MS	Multiple Sclerosis (MS)	237	1725
SLE	Systemic Lupus Erythematosus (SLE)	61	742
ATD	Autoimmune Thyroid Disease (ATD)	33	233

5 Showcase

5.1 Scoring from SNPs to genes

Do SNP scoring and gene scoring for each disease

disease_ls <- lapply(IB, function(x){
    gr <- x$variant
    data <- GenomicRanges::mcols(gr)[, c('Variant','Pvalue')]
    xSNP2GeneScores(data=data, include.LD='EUR', LD.r2=0.8, significance.threshold=5e-8, distance.max=50000, decay.kernel="rapid", decay.exponent=2, GR.SNP="dbSNP_GWAS", GR.Gene="UCSC_knownCanonical", scoring.scheme="max", verbose=T, RData.location=RData.location)
})

Get all diseases and all genes

## all diseases
allDiseases <- names(IB)
## all Genes
res_Gene <- lapply(disease_ls, function(x) x$Gene$Gene)
allGenes <- unique(unlist(res_Gene))

A matrix of Genes X Diseases:

mat_Gene <- 0
for(i in 1:length(disease_ls)){
    df <- disease_ls[[i]]$Gene
    df$Disease <- rep(allDiseases[i], nrow(df))
    mat <- xSparseMatrix(df[,c("Gene","Disease","Score")], rows=allGenes, columns=allDiseases)
    mat_Gene <- mat_Gene + mat
}
mat_Gene <- as.matrix(mat_Gene)

A neighbor-joining tree (with bootstrap confidence values) based on the matrix of Genes X Diseases. The distance between two diseases is defined as the sum of their difference in gene scores (that is, manhattan/cityblock distance).

data <- mat_Gene
tree_bs <- visTreeBootstrap(data=t(data), algorithm="nj", metric="manhattan", num.bootstrap=10000, reroot="min.bootstrap", visTree=T, nodelabels.arg=list(cex=0.6))

The corresponding consensus tree of the above bootstrapped tree is (based on the majority voting):

data <- mat_Gene
tree_cons <- visTreeBootstrap(data=t(data), algorithm="nj", metric="manhattan", num.bootstrap=10000, consensus=T, reroot="min.bootstrap", visTree=T)
g <- igraph::as.igraph(tree_cons)
vertex.label <- V(g)$name
vertex.label <- gsub("Node.*", "", vertex.label)
vertex.size <- rep(12, vcount(g))
vertex.size[vertex.label==''] <- 2
xVisNet(g=g, glayout=layout_with_kk, vertex.shape="sphere", vertex.size=vertex.size, vertex.color="white", vertex.label=vertex.label, vertex.label.dist=0.9, vertex.label.font=2, , newpage=F)

Heatmap of the 1311 Genes X 11 Diseases matrix:

tree_bs_ind <- match(tree_bs$tip.label, colnames(mat_Gene))
data <- mat_Gene[, rev(tree_bs_ind)]
visHeatmapAdv(data=data, Rowv=T, Colv=F, dendrogram="none", dist.metric="manhattan", colormap="lightblue-blue", labRow=NA, KeyValueName="Gene scores", zlim=c(0,8), add.expr=abline(v=(1:(ncol(data)+1))-0.5,col="white"), lmat=rbind(c(4,3), c(2,1)), lhei=c(1,4), lwid=c(1,3), margins=c(12,2), srtCol=30, cexCol=0.8)

5.2 SNP-modulated gene networks underlying disease categories

Neighbour-joining tree of above 11 common diseases (based on scored genes under the genetic influence of GWAS SNPs) nicely captures the current disease classification/taxonomy according to the genetic and cellular basis fo autoinflammation and autoimmunity, against the traditional classification by organ or system affected.

The tree shows the analysable diseases span the autoinflammatory-autoimmne continuum (a classification system reflecting the relative role of innate immune responses versus adaptive immune responses played in the disease development), including three categories:

Classic polygenic autoinflammatory diseases, that is, polygenic diseases with a prominent autoinflammatory component, including IBD, CRO and UC;
classic polygenic autoimmune diseases, that is, polygenic diseases with a prominent autoimmune component, including SLE, ATD, CEL, MS, T1D and RA;
mixed diseases having both components, including PSO and AS.

Our analysis also shows that mixed diseases have more the autoinflammatory component than the autoimmune component. Polygenic autoinflammatory diseases may be subdivided into two subtypes, the one for SLE and ATD, the other for CEL, MS, T1D and RA.

To understand the molecular basis of these three disease categories, we next aim to identify the top SNP-modulated gene networks based on pooled GWAS SNPs for each category.

5.2.1 Polygenic autoinflammatory diseases

# pool together GWAS SNPs
autoinfl <- c('IBD','CRO','UC')
ind <- match(autoinfl, names(ImmunoBase))
ls_gr <- lapply(ImmunoBase[ind], function(x){
    gr <- x$variant
})
GRL <- GenomicRanges::GRangesList(ls_gr)
inGR <- GenomicRanges::unlist(GRL)

# find maximum-scoring gene subnetwork with the desired node number=23
data <- GenomicRanges::mcols(inGR)[, c('Variant','Pvalue')]
subnet_in <- xSubneterSNPs(data=data, significance.threshold=5e-8, distance.max=50000, network="STRING_high", decay.kernel="rapid", decay.exponent=2, GR.SNP="dbSNP_GWAS", GR.Gene="UCSC_knownCanonical", scoring.scheme="max", seed.genes=T, subnet.size=23, RData.location=RData.location)

The identified gene network with 23 nodes colored according to gene scores (the higher the more signficant) is shown below. Notably, if nodes appear abnormally, please remove vertex.shape="sphere" when running the function xVisNet.

subnet <- subnet_in
pattern <- -log10(as.numeric(V(subnet)$significance))
xVisNet(g=subnet, pattern=pattern, glayout=layout_with_kk, vertex.shape="sphere", colormap="white-yellow-orange", zlim=c(0,8), newpage=F, vertex.size=10, vertex.label.color="blue", vertex.label.dist=0.45, vertex.label.font=3)

5.2.2 Mixed diseases

# pool together GWAS SNPs
mix <- c('AS','PSO')
ind <- match(mix, names(ImmunoBase))
ls_gr <- lapply(ImmunoBase[ind], function(x){
    gr <- x$variant
})
GRL <- GenomicRanges::GRangesList(ls_gr)
mxGR <- GenomicRanges::unlist(GRL)

# find maximum-scoring gene subnetwork with the desired node number=23
data <- GenomicRanges::mcols(mxGR)[, c('Variant','Pvalue')]
subnet_mx <- xSubneterSNPs(data=data, significance.threshold=5e-8, distance.max=50000, network="STRING_high", decay.kernel="rapid", decay.exponent=2, GR.SNP="dbSNP_GWAS", GR.Gene="UCSC_knownCanonical", scoring.scheme="max", seed.genes=T, subnet.size=23, RData.location=RData.location)

subnet <- subnet_mx
pattern <- -log10(as.numeric(V(subnet)$significance))
xVisNet(g=subnet, pattern=pattern, glayout=layout_with_kk, vertex.shape="sphere", colormap="white-lightcyan-cyan", zlim=c(0,8), newpage=F, vertex.size=10, vertex.label.color="blue", vertex.label.dist=0.45, vertex.label.font=3)

5.2.3 Polygenic autoimmune diseases

# pool together GWAS SNPs
autoimmu <- c('SLE','ATD','T1D','RA','MS','CEL')
ind <- match(autoimmu, names(ImmunoBase))
ls_gr <- lapply(ImmunoBase[ind], function(x){
    gr <- x$variant
})
GRL <- GenomicRanges::GRangesList(ls_gr)
imGR <- GenomicRanges::unlist(GRL)

# find maximum-scoring gene subnetwork with the desired node number=23
data <- GenomicRanges::mcols(imGR)[, c('Variant','Pvalue')]
subnet_im <- xSubneterSNPs(data=data, significance.threshold=5e-8, distance.max=50000, network="STRING_high", decay.kernel="rapid", decay.exponent=2, GR.SNP="dbSNP_GWAS", GR.Gene="UCSC_knownCanonical", scoring.scheme="max", seed.genes=T, subnet.size=23, RData.location=RData.location)

subnet <- subnet_im
pattern <- -log10(as.numeric(V(subnet)$significance))
xVisNet(g=subnet, pattern=pattern, glayout=layout_with_kk, vertex.shape="sphere", colormap="white-red-darkred", zlim=c(0,8), newpage=F, vertex.size=10, vertex.label.color="blue", vertex.label.dist=0.45, vertex.label.font=3)

5.2.4 Comparing network genes across disease categories

# Combine into an igraph object called "net_merged"
list_igraph <- list(subnet_in, subnet_mx, subnet_im)
## edges
ls_edges <- lapply(list_igraph, function(x){
    igraph::get.data.frame(x, what="edges")
})
relations <- do.call(rbind, ls_edges)
relations <- relations[!duplicated(relations), ]
## nodes
ls_nodes <- lapply(list_igraph, function(x){
    igraph::get.data.frame(x, what="vertices")
})
nodes <- do.call(rbind, ls_nodes)
nodes <- nodes[,c('name','description')]
nodes <- nodes[!duplicated(nodes), ]
## net_merged
net_merged <- igraph::graph.data.frame(d=relations, directed=T, vertices=nodes)

# combined into a data frame called 'df_merged'
## indicator matrix
df_nodes <- igraph::get.data.frame(net_merged, what="vertices")
mat_merged <- matrix(0, nrow=nrow(df_nodes), ncol=3)
colnames(mat_merged) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
rownames(mat_merged) <- rownames(df_nodes)
mat_merged[V(subnet_in)$name, 1] <- 1
mat_merged[V(subnet_mx)$name, 2] <- 1
mat_merged[V(subnet_im)$name, 3] <- 1
## number of genes being shared
nSharings <- apply(mat_merged, 1, sum)
df_merged <- data.frame(df_nodes, mat_merged, nSharings=nSharings, stringsAsFactors=F)

# reorder
ind <- with(df_merged, order(-nSharings, -Autoinflammatory, -Mixed, -Autoimmune, rownames(df_merged)))
df_merged <- df_merged[ind, ]
#write.table(df_merged, file="network_genes_merged.txt", sep="\t", row.names=F)

A table listing network genes across 3 disease categories is shown below. The 0-1 codings indicate whether a gene is present or absent.

name	description	Autoinflammatory	Mixed	Autoimmune	nSharings
STAT3	signal transducer and activator of transcription 3 (acute-phase response factor)	1	1	1	3
IL23R	interleukin 23 receptor	1	1	0	2
IL2RA	interleukin 2 receptor, alpha	1	0	1	2
TNFRSF1A	tumor necrosis factor receptor superfamily, member 1A	0	1	1	2
TNIP1	TNFAIP3 interacting protein 1	0	1	1	2
ATG16L1	autophagy related 16-like 1	1	0	0	1
CCL2	chemokine (C-C motif) ligand 2	1	0	0	1
CCL7	chemokine (C-C motif) ligand 7	1	0	0	1
CUL2	cullin 2	1	0	0	1
CYLD	cylindromatosis (turban tumor syndrome)	1	0	0	1
FCGR2A	Fc fragment of IgG, low affinity IIa, receptor (CD32)	1	0	0	1
IFNG	interferon, gamma	1	0	0	1
IFNGR2	interferon gamma receptor 2 (interferon gamma transducer 1)	1	0	0	1
IKZF3	IKAROS family zinc finger 3 (Aiolos)	1	0	0	1
IL10	interleukin 10	1	0	0	1
IL18RAP	interleukin 18 receptor accessory protein	1	0	0	1
IL27	interleukin 27	1	0	0	1
JAK2	Janus kinase 2	1	0	0	1
NOD2	nucleotide-binding oligomerization domain containing 2	1	0	0	1
PDGFB	platelet-derived growth factor beta polypeptide	1	0	0	1
PTPN2	protein tyrosine phosphatase, non-receptor type 2	1	0	0	1
SMAD3	SMAD family member 3	1	0	0	1
SOCS1	suppressor of cytokine signaling 1	1	0	0	1
TNFSF15	tumor necrosis factor (ligand) superfamily, member 15	1	0	0	1
ZMIZ1	zinc finger, MIZ-type containing 1	1	0	0	1
DDX58	DEAD (Asp-Glu-Ala-Asp) box polypeptide 58	0	1	0	1
ERAP1	endoplasmic reticulum aminopeptidase 1	0	1	0	1
ERAP2	endoplasmic reticulum aminopeptidase 2	0	1	0	1
ETS1	v-ets avian erythroblastosis virus E26 oncogene homolog 1	0	1	0	1
HLA-C	major histocompatibility complex, class I, C	0	1	0	1
ICAM3	intercellular adhesion molecule 3	0	1	0	1
IFNLR1	interferon, lambda receptor 1	0	1	0	1
IL12B	interleukin 12B	0	1	0	1
IL13	interleukin 13	0	1	0	1
IL23A	interleukin 23, alpha subunit p19	0	1	0	1
IL4	interleukin 4	0	1	0	1
IL6R	interleukin 6 receptor	0	1	0	1
NOS2	nitric oxide synthase 2, inducible	0	1	0	1
REL	v-rel avian reticuloendotheliosis viral oncogene homolog	0	1	0	1
RUNX3	runt-related transcription factor 3	0	1	0	1
SPATA2	spermatogenesis associated 2	0	1	0	1
STAT2	signal transducer and activator of transcription 2, 113kDa	0	1	0	1
TNFAIP3	tumor necrosis factor, alpha-induced protein 3	0	1	0	1
TYK2	tyrosine kinase 2	0	1	0	1
CIITA	class II, major histocompatibility complex, transactivator	0	0	1	1
CLEC16A	C-type lectin domain family 16, member A	0	0	1	1
ERBB3	erb-b2 receptor tyrosine kinase 3	0	0	1	1
HLA-DQA1	major histocompatibility complex, class II, DQ alpha 1	0	0	1	1
HLA-DQB1	major histocompatibility complex, class II, DQ beta 1	0	0	1	1
IFI30	interferon, gamma-inducible protein 30	0	0	1	1
IL21	interleukin 21	0	0	1	1
IRF5	interferon regulatory factor 5	0	0	1	1
ITGAM	integrin, alpha M (complement component 3 receptor 3 subunit)	0	0	1	1
MAPK1	mitogen-activated protein kinase 1	0	0	1	1
NCF2	neutrophil cytosolic factor 2	0	0	1	1
NFKB1	nuclear factor of kappa light polypeptide gene enhancer in B-cells 1	0	0	1	1
PTPN22	protein tyrosine phosphatase, non-receptor type 22 (lymphoid)	0	0	1	1
SH2B3	SH2B adaptor protein 3	0	0	1	1
STAT4	signal transducer and activator of transcription 4	0	0	1	1
TH	tyrosine hydroxylase	0	0	1	1
TSHR	thyroid stimulating hormone receptor	0	0	1	1
VCAM1	vascular cell adhesion molecule 1	0	0	1	1
VMP1	vacuole membrane protein 1	0	0	1	1

Heatmap of the indicator matrix of Genes X Disease Categories:

data <- df_merged[,3:5]
visHeatmapAdv(t(data), Rowv=F, Colv=F, dendrogram="none", key=F, zlim=c(0,1), colormap="grey-steelblue", add.expr=abline(v=(1:(nrow(data)+1))-0.5,h=(1:(ncol(data)+1))-0.5,,col="white"), lhei=c(1,4), lwid=c(1,5), margins=c(4,10), cexCol=0.8, cexRow=1)

5.2.5 Pathway analysis of network genes across disease categories

Pathway enrichment analysis for network genes is using Reactome pathways:

ontology <- "MsigdbC2CPall"

Pathway enrichments for genes in autoimmune network

data <- V(subnet_im)$name
eTerm <- xEnricherGenes(data=data, ontology=ontology, test="fisher", RData.location=RData.location)
eTerm_im <- xEnrichConciser(eTerm)
xEnrichViewer(eTerm_im, 10)

Barplot of enriched terms

bp <- xEnrichBarplot(eTerm_im, top_num='auto', displayBy="fdr")
bp

name	nAnno	nOverlap	fc	zscore	pvalue	adjp	members
Leishmania infection	72	6	38.50	14.90	0.0e+00	2.0e-07	NFKB1, MAPK1, NCF2, ITGAM, HLA-DQB1, HLA-DQA1
Cytokine Signaling in Immune system	268	7	12.10	8.57	8.0e-07	1.5e-05	MAPK1, STAT3, IL2RA, VCAM1, HLA-DQA1, CIITA, IRF5
Ceramide Signaling Pathway	22	3	62.90	13.60	1.3e-05	7.8e-05	NFKB1, MAPK1, TNFRSF1A
Role of ERBB2 in Signal Transduction and Oncology	22	3	62.90	13.60	1.3e-05	7.8e-05	MAPK1, STAT3, ERBB3
IL12-mediated signaling events	63	4	29.30	10.50	8.6e-06	7.8e-05	NFKB1, STAT3, STAT4, IL2RA
Antigen processing and presentation	88	4	21.00	8.78	3.3e-05	1.7e-04	HLA-DQB1, HLA-DQA1, IFI30, CIITA
Bioactive Peptide Induced Signaling Pathway	43	3	32.20	9.56	1.0e-04	3.2e-04	MAPK1, STAT3, STAT4
Cell adhesion molecules (CAMs)	133	4	13.90	6.98	1.6e-04	3.9e-04	ITGAM, VCAM1, HLA-DQB1, HLA-DQA1
Autoimmune thyroid disease	52	3	26.60	8.64	1.8e-04	4.0e-04	HLA-DQB1, HLA-DQA1, TSHR
IL2-mediated signaling events	55	3	25.20	8.38	2.1e-04	4.5e-04	MAPK1, STAT3, IL2RA
Acute myeloid leukemia	57	3	24.30	8.22	2.3e-04	4.7e-04	NFKB1, MAPK1, STAT3
Jak-STAT signaling pathway	155	4	11.90	6.39	3.0e-04	5.6e-04	STAT3, STAT4, IL2RA, IL21
Downstream signaling in naive CD8+ T cells	65	3	21.30	7.66	3.5e-04	5.7e-04	MAPK1, STAT4, IL2RA
Signaling events mediated by HDAC Class I	66	3	21.00	7.59	3.6e-04	5.7e-04	NFKB1, STAT3, TNFRSF1A
Immune System	912	8	4.05	4.53	3.5e-04	5.7e-04	MAPK1, STAT3, NCF2, IL2RA, VCAM1, HLA-DQA1, CIITA, IRF5
Signaling by SCF-KIT	75	3	18.50	7.08	5.3e-04	7.3e-04	MAPK1, STAT3, SH2B3
Toll-like receptor signaling pathway	102	3	13.60	5.95	1.3e-03	1.7e-03	NFKB1, MAPK1, IRF5
Leukocyte transendothelial migration	117	3	11.80	5.50	1.9e-03	2.4e-03	NCF2, ITGAM, VCAM1
Neurotrophin signaling pathway	126	3	11.00	5.26	2.4e-03	2.8e-03	NFKB1, MAPK1, SH2B3
Cytokine-cytokine receptor interaction	266	3	5.21	3.25	1.9e-02	2.1e-02	TNFRSF1A, IL2RA, IL21

Pathway enrichments for genes in mixed network

data <- V(subnet_mx)$name
eTerm <- xEnricherGenes(data=data, ontology=ontology, test="fisher", RData.location=RData.location)
eTerm_mx <- xEnrichConciser(eTerm)
xEnrichViewer(eTerm_mx, 10)

Barplot of enriched terms

bp <- xEnrichBarplot(eTerm_mx, top_num='auto', displayBy="fdr")
bp

name	nAnno	nOverlap	fc	zscore	pvalue	adjp	members
Jak-STAT signaling pathway	155	10	28.30	16.40	0.0e+00	0.0e+00	IL4, STAT2, STAT3, IL12B, IL13, IL6R, TYK2, IFNLR1, IL23A, IL23R
IL23-mediated signaling events	37	6	71.10	20.40	0.0e+00	0.0e+00	STAT3, IL12B, TYK2, NOS2, IL23A, IL23R
Cytokine-cytokine receptor interaction	266	8	13.20	9.65	1.0e-07	5.0e-07	IL4, TNFRSF1A, IL12B, IL13, IL6R, IFNLR1, IL23A, IL23R
IL27-mediated signaling events	26	4	67.50	16.20	3.0e-07	1.4e-06	STAT2, STAT3, IL12B, TYK2
IL12-mediated signaling events	63	5	34.80	12.90	2.0e-07	1.4e-06	IL4, STAT3, IL12B, TYK2, NOS2
Interleukin-6 signaling	10	3	132.00	19.70	1.2e-06	5.0e-06	STAT3, IL6R, TYK2
Immune System	912	10	4.81	5.81	9.8e-06	2.7e-05	STAT2, STAT3, TNFAIP3, IL6R, TYK2, HLA-C, ICAM3, DDX58, REL, ERAP1
Cytokine Network	21	3	62.60	13.50	1.3e-05	3.2e-05	IL4, IL12B, IL13
Allograft rejection	37	3	35.60	10.10	7.5e-05	1.3e-04	IL4, IL12B, HLA-C
Adaptive Immune System	524	4	3.35	2.65	2.8e-02	3.2e-02	HLA-C, ICAM3, REL, ERAP1
Pathways in cancer	325	3	4.05	2.68	3.6e-02	3.9e-02	STAT3, ETS1, NOS2

Pathway enrichments for genes in autoinflammatory network

data <- V(subnet_in)$name
eTerm <- xEnricherGenes(data=data, ontology=ontology, test="fisher", RData.location=RData.location)
eTerm_in <- xEnrichConciser(eTerm)
xEnrichViewer(eTerm_in, 10)

Barplot of enriched terms

bp <- xEnrichBarplot(eTerm_in, top_num='auto', displayBy="fdr")
bp

name	nAnno	nOverlap	fc	zscore	pvalue	adjp	members
Regulation of IFNG signaling	14	5	142.00	26.50	0.0e+00	0.0e+00	IFNG, JAK2, SOCS1, IFNGR2, PTPN2
IL23-mediated signaling events	37	6	64.60	19.50	0.0e+00	0.0e+00	IFNG, JAK2, STAT3, CCL2, IL18RAP, IL23R
Cytokine-cytokine receptor interaction	266	10	15.00	11.60	0.0e+00	0.0e+00	IFNG, IL10, CCL2, IL2RA, IFNGR2, CCL7, TNFSF15, IL18RAP, IL23R, PDGFB
Jak-STAT signaling pathway	155	8	20.60	12.30	0.0e+00	0.0e+00	IFNG, IL10, JAK2, STAT3, SOCS1, IL2RA, IFNGR2, IL23R
IL12-mediated signaling events	63	6	38.00	14.80	0.0e+00	1.0e-07	IFNG, JAK2, STAT3, SOCS1, IL2RA, IL18RAP
IFN-gamma pathway	40	5	49.80	15.50	0.0e+00	2.0e-07	IFNG, JAK2, STAT3, SOCS1, PTPN2
Cytokine Signaling in Immune system	268	8	11.90	9.09	2.0e-07	8.0e-07	IFNG, JAK2, STAT3, SOCS1, IL2RA, IFNGR2, NOD2, PTPN2
IL2-mediated signaling events	55	5	36.20	13.10	2.0e-07	9.0e-07	IFNG, STAT3, SOCS1, IL2RA, IKZF3
IL27-mediated signaling events	26	4	61.30	15.40	4.0e-07	1.5e-06	IFNG, JAK2, STAT3, IL27
Leishmania infection	72	5	27.70	11.40	8.0e-07	2.5e-06	IFNG, IL10, JAK2, IFNGR2, FCGR2A
Signaling events mediated by PTP1B	52	4	30.70	10.80	7.4e-06	2.1e-05	JAK2, STAT3, PDGFB, FCGR2A
Th1/Th2 Differentiation	19	3	62.90	13.60	1.3e-05	2.9e-05	IFNG, IL2RA, IFNGR2
SHP2 signaling	58	4	27.50	10.20	1.2e-05	2.9e-05	IFNG, JAK2, IL2RA, PDGFB
Role of Tob in T-cell activation	21	3	56.90	12.90	1.8e-05	3.7e-05	IFNG, SMAD3, IL2RA
Immune System	912	10	4.37	5.39	2.8e-05	5.4e-05	IFNG, JAK2, STAT3, SOCS1, IL2RA, IFNGR2, NOD2, CUL2, CYLD, PTPN2
Signaling events mediated by TCPTP	43	3	27.80	8.84	1.6e-04	2.3e-04	STAT3, PDGFB, PTPN2
PDGFR-beta signaling pathway	129	4	12.40	6.52	2.7e-04	3.7e-04	JAK2, STAT3, PDGFB, PTPN2
NOD-like receptor signaling pathway	62	3	19.30	7.25	4.7e-04	6.1e-04	CCL2, CCL7, NOD2
IL4-mediated signaling events	65	3	18.40	7.06	5.4e-04	6.7e-04	IL10, JAK2, SOCS1
Chemokine signaling pathway	190	4	8.39	5.17	1.1e-03	1.3e-03	JAK2, STAT3, CCL2, CCL7
Pathways in cancer	325	4	4.91	3.60	8.0e-03	8.4e-03	STAT3, SMAD3, PDGFB, CUL2

Comparing pathway enrichments across networks

list_eTerm <- list(eTerm_in, eTerm_mx, eTerm_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_Pathway <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=1e-3, wrap.width=50, bar.label=F)
bp_Pathway + theme(axis.text.y=element_text(size=11,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

5.3 Annotation analysis of SNPs across disease categories

# pool together GWAS SNPs for polygenic autoimmune diseases
autoimmu <- c('SLE','ATD','T1D','RA','MS','CEL')
ind <- match(autoimmu, names(ImmunoBase))
ls_gr <- lapply(ImmunoBase[ind], function(x){
    gr <- x$variant
})
GRL <- GenomicRanges::GRangesList(ls_gr)
imGR <- GenomicRanges::unlist(GRL)

# pool together GWAS SNPs for mixed diseases
mix <- c('AS','PSO')
ind <- match(mix, names(ImmunoBase))
ls_gr <- lapply(ImmunoBase[ind], function(x){
    gr <- x$variant
})
GRL <- GenomicRanges::GRangesList(ls_gr)
mxGR <- GenomicRanges::unlist(GRL)

# pool together GWAS SNPs for polygenic autoinflammatory diseases
autoinfl <- c('IBD','CRO','UC')
ind <- match(autoinfl, names(ImmunoBase))
ls_gr <- lapply(ImmunoBase[ind], function(x){
    gr <- x$variant
})
GRL <- GenomicRanges::GRangesList(ls_gr)
inGR <- GenomicRanges::unlist(GRL)

# pool together GWAS SNPs for all diseases
ind <- match(c('SLE','ATD','T1D','RA','MS','CEL','PSO','AS','IBD','CRO','UC'), names(ImmunoBase))
ls_gr <- lapply(ImmunoBase[ind], function(x){
    gr <- x$variant
})
GRL <- GenomicRanges::GRangesList(ls_gr)
allGR <- GenomicRanges::unlist(GRL)

GWAS SNPs are used as the test background

background.file <- xRDataLoader(RData.customised='dbSNP_GWAS', RData.location=RData.location)

5.3.1 Via gene annotations

We use gene annotations to interpret pooled GWAS SNPs for each of the three categories by looking directly at genes harbouring these SNPs, that is, functional and phenotypic annotation of genes harbouring GWAS SNPs for each of three disease categories.

5.3.1.1 Using Gene Ontology Molecular Function (GOMF)

ontology <- "GOMF"

GOMF enrichments for SNPs in autoimmune diseases

eTerm_gomf_im <- xGRviaGeneAnno(data.file=imGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_gomf_im, 10)

GOMF enrichments for SNPs in autoinflammatory diseases

eTerm_gomf_in <- xGRviaGeneAnno(data.file=inGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_gomf_in, 10)

GOMF enrichments for SNPs in mixed diseases

eTerm_gomf_mx <- xGRviaGeneAnno(data.file=mxGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_gomf_mx, 10)

Comparing GOMF enrichments across disease categories

list_eTerm <- list(eTerm_gomf_in, eTerm_gomf_mx, eTerm_gomf_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_gomf <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=5e-2, wrap.width=100, bar.label=F, sharings=c(2,3))
bp_gomf + theme(axis.text.y=element_text(size=9,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

DAGplot of terms enriched in any analyses above, nodes/terms colored according to how many times being called significant enrichment.

xEnrichDAGplotAdv(bp_gomf, displayBy="nSig", colormap="white-lightgreen-green", zlim=c(1,3), layout.orientation="left_right", path.mode=c("all_paths","shortest_paths","all_shortest_paths")[1], node.info=c("term_name"), graph.node.attrs=list(fontsize=25), newpage=F)

5.3.1.2 Using Gene Ontology Biological Process (GOBP)

ontology <- "GOBP"

GOBP enrichments for SNPs in autoimmune diseases

eTerm_GOBP_im <- xGRviaGeneAnno(data.file=imGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_GOBP_im, 10)

GOBP enrichments for SNPs in autoinflammatory diseases

eTerm_GOBP_in <- xGRviaGeneAnno(data.file=inGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_GOBP_in, 10)

GOBP enrichments for SNPs in mixed diseases

eTerm_GOBP_mx <- xGRviaGeneAnno(data.file=mxGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_GOBP_mx, 10)

Comparing GOBP enrichments across disease categories

list_eTerm <- list(eTerm_GOBP_in, eTerm_GOBP_mx, eTerm_GOBP_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_GOBP <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=1e-2, wrap.width=100, bar.label=F, sharings=c(2,3))
bp_GOBP + theme(axis.text.y=element_text(size=9,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

5.3.1.3 Using Human Phenotype Phenotypic Abnormality (HPPA)

ontology <- "HPPA"

HPPA enrichments for SNPs in autoimmune diseases

eTerm_hppa_im <- xGRviaGeneAnno(data.file=imGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_hppa_im, 10)

HPPA enrichments for SNPs in autoinflammatory diseases

eTerm_hppa_in <- xGRviaGeneAnno(data.file=inGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_hppa_in, 10)

HPPA enrichments for SNPs in mixed diseases

eTerm_hppa_mx <- xGRviaGeneAnno(data.file=mxGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_hppa_mx, 10)

Comparing HPPA enrichments across disease categories

list_eTerm <- list(eTerm_hppa_in, eTerm_hppa_mx, eTerm_hppa_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_hppa <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=1e-2, wrap.width=50, bar.label=F, sharings=c(2,3))
bp_hppa + theme(axis.text.y=element_text(size=9,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

5.3.1.4 Using Mammalian Phenotypes (MP)

ontology <- "MP"

MP enrichments for SNPs in autoimmune diseases

eTerm_MP_im <- xGRviaGeneAnno(data.file=imGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_MP_im, 10)

MP enrichments for SNPs in autoinflammatory diseases

eTerm_MP_in <- xGRviaGeneAnno(data.file=inGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_MP_in, 10)

MP enrichments for SNPs in mixed diseases

eTerm_MP_mx <- xGRviaGeneAnno(data.file=mxGR, background.file=background.file, format.file="GRanges", ontology=ontology, ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_MP_mx, 10)

Comparing MP enrichments across disease categories

list_eTerm <- list(eTerm_MP_in, eTerm_MP_mx, eTerm_MP_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_MP <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=1e-6, wrap.width=100, bar.label=F, sharings=c(2,3))
bp_MP + theme(axis.text.y=element_text(size=9,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

5.3.2 Via genomic annotations

Using functional genomic annotations supported in XGR, we are also able to compare and define genetic and epigenetic characteristics underlying each of the three categories.

5.3.2.1 FANTOM5 enhancer analysis

Enhancer enrichment analysis is using cell-type-specific expressed enhancers:

GR.annotation <- "FANTOM5_Enhancer_Cell"

Enhancer enrichments for SNPs in autoimmune diseases

eTerm_enhancer_im <- xGRviaGenomicAnno(data.file=imGR, background.file=background.file, format.file="GRanges", GR.annotation=GR.annotation, RData.location=RData.location)
xEnrichViewer(eTerm_enhancer_im, 10)

Enhancer enrichments for SNPs in autoinflammatory diseases

eTerm_enhancer_in <- xGRviaGenomicAnno(data.file=inGR, background.file=background.file, format.file="GRanges", GR.annotation=GR.annotation, RData.location=RData.location)
xEnrichViewer(eTerm_enhancer_in, 10)

Enhancer enrichments for SNPs in mixed diseases

eTerm_enhancer_mx <- xGRviaGenomicAnno(data.file=mxGR, background.file=background.file, format.file="GRanges", GR.annotation=GR.annotation, RData.location=RData.location)
xEnrichViewer(eTerm_enhancer_mx, 10)

Comparing enhancer enrichments across disease categories

list_eTerm <- list(eTerm_enhancer_in, eTerm_enhancer_mx, eTerm_enhancer_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_enhancer <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=5e-2, wrap.width=50, bar.label=F)
bp_enhancer + theme(axis.text.y=element_text(size=9,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

5.3.2.2 ENCODE TFBS analysis

ChIP-seq TFBS enrichment analysis is using uniformly identified TFBSs assayed in the cell line Gm12878 (human lymphoblastoid cell line):

# load all cell-type-specific TFBSs
all_gr <- xRDataLoader(RData.customised="Uniform_TFBS", RData.location=RData.location)
# extract TFBSs in Gm12878, stored in a list of GR objects called 'anno_gr'
ind <- grep('Gm12878', names(all_gr))
anno_gr <- all_gr[ind]

TFBS enrichments for SNPs in autoimmune diseases

eTerm_tfbs_im <- xGRviaGenomicAnno(data.file=imGR, annotation.file=anno_gr, background.file=background.file, format.file="GRanges", RData.location=RData.location)
xEnrichViewer(eTerm_tfbs_im, 10)

TFBS enrichments for SNPs in autoinflammatory diseases

eTerm_tfbs_in <- xGRviaGenomicAnno(data.file=inGR, annotation.file=anno_gr, background.file=background.file, format.file="GRanges", RData.location=RData.location)
xEnrichViewer(eTerm_tfbs_in, 10)

TFBS enrichments for SNPs in mixed diseases

eTerm_tfbs_mx <- xGRviaGenomicAnno(data.file=mxGR, annotation.file=anno_gr, background.file=background.file, format.file="GRanges", RData.location=RData.location)
xEnrichViewer(eTerm_tfbs_mx, 10)

Comparing TFBS enrichments across disease categories

list_eTerm <- list(eTerm_tfbs_in, eTerm_tfbs_mx, eTerm_tfbs_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_tfbs <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=5e-2, wrap.width=50, bar.label=F, sharings=c(2,3))
bp_tfbs + theme(axis.text.y=element_text(size=9,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

5.3.2.3 ENCODE histone analysis

Histone enrichment analysis is using uniformly identified histone marks assayed in the cell line Gm12878 (human lymphoblastoid cell line):

# load all cell-type-specific histone marks
all_gr <- xRDataLoader(RData.customised="Broad_Histone", RData.location=RData.location)
# extract histone marks in Gm12878, stored in a list of GR objects called 'anno_gr'
ind <- grep('Gm12878', names(all_gr))
anno_gr <- all_gr[ind]

histone enrichments for SNPs in autoimmune diseases

eTerm_histone_im <- xGRviaGenomicAnno(data.file=imGR, annotation.file=anno_gr, background.file=background.file, format.file="GRanges", RData.location=RData.location)
xEnrichViewer(eTerm_histone_im, 10)

histone enrichments for SNPs in autoinflammatory diseases

eTerm_histone_in <- xGRviaGenomicAnno(data.file=inGR, annotation.file=anno_gr, background.file=background.file, format.file="GRanges", RData.location=RData.location)
xEnrichViewer(eTerm_histone_in, 10)

histone enrichments for SNPs in mixed diseases

eTerm_histone_mx <- xGRviaGenomicAnno(data.file=mxGR, annotation.file=anno_gr, background.file=background.file, format.file="GRanges", RData.location=RData.location)
xEnrichViewer(eTerm_histone_mx, 10)

Comparing histone enrichments across disease categories

list_eTerm <- list(eTerm_histone_in, eTerm_histone_mx, eTerm_histone_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_histone <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=1e-2, wrap.width=50, bar.label=F, sharings=c(2,3))
bp_histone + theme(axis.text.y=element_text(size=9,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

5.3.2.4 ENCODE segment analysis

Genome segmentation enrichment analysis is using 7 category segments in the cell line GM12878:

GR.annotation <- "Segment_Combined_Gm12878"

segment enrichments for SNPs in autoimmune diseases

eTerm_segment_im <- xGRviaGenomicAnno(data.file=imGR, background.file=background.file, format.file="GRanges", GR.annotation=GR.annotation, RData.location=RData.location)
xEnrichViewer(eTerm_segment_im, 10)

segment enrichments for SNPs in autoinflammatory diseases

eTerm_segment_in <- xGRviaGenomicAnno(data.file=inGR, background.file=background.file, format.file="GRanges", GR.annotation=GR.annotation, RData.location=RData.location)
xEnrichViewer(eTerm_segment_in, 10)

segment enrichments for SNPs in mixed diseases

eTerm_segment_mx <- xGRviaGenomicAnno(data.file=mxGR, background.file=background.file, format.file="GRanges", GR.annotation=GR.annotation, RData.location=RData.location)
xEnrichViewer(eTerm_segment_mx, 10)

Comparing segment enrichments across disease categories

list_eTerm <- list(eTerm_segment_in, eTerm_segment_mx, eTerm_segment_im)
names(list_eTerm) <- c('Autoinflammatory', 'Mixed', 'Autoimmune')
bp_segment <- xEnrichCompare(list_eTerm, displayBy="fdr", FDR.cutoff=5e-2, wrap.width=50, bar.label=F, sharings=c(2,3))
bp_segment + theme(axis.text.y=element_text(size=9,color="black"), axis.title.x=element_text(size=16,color="black")) + scale_fill_manual(values=c("orange","cyan","red"))

6 Session Info

Here is the output of sessionInfo() on the system on which this user manual was built:

> R version 3.2.4 (2016-03-10)
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
> Running under: OS X 10.11.4 (El Capitan)
> 
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
> 
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods  
> [8] base     
> 
> other attached packages:
> [1] VennDiagram_1.6.16  futile.logger_1.4.1 XGR_1.0.2          
> [4] ggplot2_2.1.0       dnet_1.0.9          supraHex_1.11.1    
> [7] hexbin_1.27.1       igraph_1.0.1        rmarkdown_0.9.5    
> 
> loaded via a namespace (and not attached):
>  [1] Rcpp_0.12.5             formatR_1.3            
>  [3] highr_0.5.1             plyr_1.8.3             
>  [5] GenomeInfoDb_1.4.3      XVector_0.8.0          
>  [7] futile.options_1.0.0    bitops_1.0-6           
>  [9] zlibbioc_1.14.0         tools_3.2.4            
> [11] digest_0.6.9            evaluate_0.8.3         
> [13] nlme_3.1-125            gtable_0.2.0           
> [15] lattice_0.20-33         Matrix_1.2-4           
> [17] graph_1.46.0            Rgraphviz_2.12.0       
> [19] yaml_2.1.13             parallel_3.2.4         
> [21] rtracklayer_1.28.10     stringr_1.0.0          
> [23] knitr_1.12.3            Biostrings_2.36.4      
> [25] RCircos_1.1.3           S4Vectors_0.6.6        
> [27] IRanges_2.2.9           stats4_3.2.4           
> [29] Biobase_2.28.0          BiocParallel_1.2.22    
> [31] XML_3.98-1.4            reshape2_1.4.1         
> [33] lambda.r_1.1.7          magrittr_1.5           
> [35] GenomicAlignments_1.4.2 Rsamtools_1.20.5       
> [37] scales_0.4.0            htmltools_0.3          
> [39] BiocGenerics_0.14.0     GenomicRanges_1.20.8   
> [41] ape_3.5                 colorspace_1.2-6       
> [43] labeling_0.3            stringi_1.1.1          
> [45] RCurl_1.95-4.8          munsell_0.4.3

Demo Series 2: using the XGR package to interpret GWAS summary data

Mon Aug 1 15:44:18 2016

1 Web-app (linkto)

2 Summary

3 Packages

4 Data

5 Showcase

5.1 Scoring from SNPs to genes

5.2 SNP-modulated gene networks underlying disease categories

5.2.1 Polygenic autoinflammatory diseases

5.2.2 Mixed diseases

5.2.3 Polygenic autoimmune diseases

5.2.4 Comparing network genes across disease categories

5.2.5 Pathway analysis of network genes across disease categories

5.3 Annotation analysis of SNPs across disease categories

5.3.1 Via gene annotations

5.3.1.1 Using Gene Ontology Molecular Function (GOMF)

5.3.1.2 Using Gene Ontology Biological Process (GOBP)

5.3.1.3 Using Human Phenotype Phenotypic Abnormality (HPPA)

5.3.1.4 Using Mammalian Phenotypes (MP)

5.3.2 Via genomic annotations

5.3.2.1 FANTOM5 enhancer analysis

5.3.2.2 ENCODE TFBS analysis

5.3.2.3 ENCODE histone analysis

5.3.2.4 ENCODE segment analysis

6 Session Info