This demo showcases the analytic power of using XGR to make sense of differential genes summarised from differential expression studies, including enrichment analysis and network analysis. Gene-level enrichment analysis in XGR is designed to produce more informative enrichment results. This is achieved either by taking into account the tree-like structure of a structured ontology, or by applying a filtering procedure to a non-structured ontology (eg a collection of pathways). Gene-level network analysis in XGR identifies a maximum-scoring gene subnetwork (with a desired number of nodes) by heuristically solving a prize-collecting Steiner tree problem, an approach that has been shown to outperform other state-of-the-art methods (Fang and Gough 2014).
First of all, load the XGR package and specify the location of built-in data.
library(XGR)
# Specify the location of built-in data
RData.location <- "http://galahad.well.ox.ac.uk/bigdata"
Here we illustrate how XGR can be used to interpret differentially expressed genes induced by innate immune stimuli (Fairfax et al. 2014), that is, genes differentially induced by 24-hour interferon gamma (IFN24), 24-hour LPS (LPS24) or 2-hour LPS (LPS2).
# Load differential expression analysis results
res <- xRDataLoader(RData.customised='JKscience_TS1A', RData.location=RData.location)
background <- res$Symbol
# Create a data frame for genes significantly induced by IFN24
flag <- res$logFC_INF24_Naive<0 & res$fdr_INF24_Naive<0.01
df_IFN24 <- res[flag, c('Symbol','logFC_INF24_Naive','fdr_INF24_Naive')]
# Create a data frame for genes significantly induced by LPS24
flag <- res$logFC_LPS24_Naive<0 & res$fdr_LPS24_Naive<0.01
df_LPS24 <- res[flag, c('Symbol','logFC_LPS24_Naive','fdr_LPS24_Naive')]
# Create a data frame for genes significantly induced by LPS2
flag <- res$logFC_LPS2_Naive<0 & res$fdr_LPS2_Naive<0.01
df_LPS2 <- res[flag, c('Symbol','logFC_LPS2_Naive','fdr_LPS2_Naive')]
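The three blocks above apply the same rule with different column names. As a hedged sketch, that rule can be wrapped in a small helper. The helper select_induced and the toy table are hypothetical and were made up for illustration. The thresholds mirror those used above (logFC < 0, FDR < 0.01).

```r
# Hypothetical helper (not part of XGR): select genes for one condition using
# the same rule as above, i.e. logFC < 0 and FDR below a cutoff.
select_induced <- function(res, condition, fdr.cutoff = 0.01) {
  lfc_col <- paste0("logFC_", condition, "_Naive")
  fdr_col <- paste0("fdr_", condition, "_Naive")
  flag <- res[[lfc_col]] < 0 & res[[fdr_col]] < fdr.cutoff
  # which() drops rows where flag is NA (e.g. missing FDR values)
  res[which(flag), c("Symbol", lfc_col, fdr_col)]
}

# Toy table with made-up values, using the column naming seen above
toy_res <- data.frame(
  Symbol = c("GENE1", "GENE2", "GENE3", "GENE4"),
  logFC_INF24_Naive = c(-2.1, 1.5, -0.8, -3.0),
  fdr_INF24_Naive = c(0.001, 0.002, 0.2, 0.0005),
  stringsAsFactors = FALSE
)
toy_df <- select_induced(toy_res, "INF24")
toy_df$Symbol
```

On the real table, select_induced(res, "LPS24") would be expected to pick the same genes as the LPS24 block above. The one difference is that rows with missing values, if any, are dropped rather than returned as rows of NA.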
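The benefit of taking the ontology tree structure into account is demonstrated by performing Disease Ontology (DO) enrichment analysis for differential genes induced by 24-hour interferon gamma, first without and then with the tree structure.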
DO enrichment analysis without considering tree structure
data <- df_IFN24$Symbol
eTerm_IFN24_DO_none <- xEnricherGenes(data=data, background=background, ontology="DO", ontology.algorithm="none", RData.location=RData.location)
xEnrichViewer(eTerm_IFN24_DO_none, 10)
Barplot of enriched terms
bp_IFN24_DO_none <- xEnrichBarplot(eTerm_IFN24_DO_none, top_num="auto", displayBy="fdr")
bp_IFN24_DO_none
DAGplot of enriched terms
xEnrichDAGplot(eTerm_IFN24_DO_none, top_num="auto", displayBy="fdr", node.info=c("both"), graph.node.attrs=list(fontsize=30), newpage=FALSE)
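Without the tree structure, each DO term is tested on its own for over-representation of the input genes among the term's annotated genes. Below is a minimal base-R sketch of one common way to do this: a one-sided hypergeometric test followed by Benjamini-Hochberg adjustment. Two things are assumptions here. First, it is not confirmed that xEnricherGenes uses exactly this test by default. Second, the gene sets and the helper name enrich_hypergeo are invented for illustration.

```r
# Hypothetical helper: one-sided hypergeometric test per term, then BH FDR
enrich_hypergeo <- function(input, background, annotation) {
  input <- intersect(input, background)
  stats <- t(sapply(annotation, function(term_genes) {
    term_genes <- intersect(term_genes, background)
    x <- length(intersect(input, term_genes))  # input genes in the term
    m <- length(term_genes)                     # term genes in the background
    n <- length(background) - m                 # background genes outside the term
    k <- length(input)                          # number of input genes
    # P(X >= x): chance of an overlap at least this large
    c(overlap = x, pvalue = phyper(x - 1, m, n, k, lower.tail = FALSE))
  }))
  out <- data.frame(term = names(annotation), stats, row.names = NULL,
                    stringsAsFactors = FALSE)
  out$fdr <- p.adjust(out$pvalue, method = "BH")
  out[order(out$pvalue), ]
}

# Invented data: 100 background genes, two terms, 10 input genes
toy_background <- paste0("G", 1:100)
toy_annotation <- list(termA = paste0("G", 1:10), termB = paste0("G", 51:70))
toy_input_genes <- paste0("G", c(1:8, 51, 90))
toy_enrich <- enrich_hypergeo(toy_input_genes, toy_background, toy_annotation)
toy_enrich
```

termA contains 8 of the 10 input genes and comes out strongly enriched. termB shares only one gene with the input and does not.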
DO enrichment analysis after considering tree structure
data <- df_IFN24$Symbol
eTerm_IFN24_DO_lea <- xEnricherGenes(data=data, background=background, ontology="DO", ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_IFN24_DO_lea, 10)
Barplot of enriched terms
bp_IFN24_DO_lea <- xEnrichBarplot(eTerm_IFN24_DO_lea, top_num="auto", displayBy="fdr")
bp_IFN24_DO_lea
DAGplot of enriched terms
xEnrichDAGplot(eTerm_IFN24_DO_lea, top_num="auto", displayBy="fdr", node.info=c("both"), graph.node.attrs=list(fontsize=30), newpage=FALSE)
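The demo does not describe how the "lea" algorithm uses the tree structure. Purely to convey the general idea, the sketch below swaps in a simpler, related elimination scheme. Terms are tested leaves first. Genes that already explain a significant child term are removed from the parent before the parent is tested. The DAG, gene sets, cutoff and helper names are all invented for illustration, and this is not XGR's implementation.

```r
# Toy DAG with invented data: 'parent' has children 'childA' and 'childB' and,
# following the true-path rule, is annotated with all of their genes.
toy_bg <- paste0("G", 1:200)
toy_terms <- list(                       # ordered leaves first
  childA = paste0("G", 1:10),
  childB = paste0("G", 11:20),
  parent = paste0("G", 1:20)
)
toy_parent_of <- list(childA = "parent", childB = "parent")
toy_input <- paste0("G", c(1:8, 15, 150))

# One-sided hypergeometric p-value for a single term
hyper_p <- function(input, term_genes, bg) {
  x <- length(intersect(input, term_genes))
  m <- length(term_genes)
  phyper(x - 1, m, length(bg) - m, length(input), lower.tail = FALSE)
}

# Simplified elimination: once a term is significant, its genes are removed
# from its parent's annotation before the parent is tested. (A full version
# would remove them from all ancestors, not just the direct parent.)
elim_sketch <- function(input, terms, parent_of, bg, cutoff = 0.01) {
  pvals <- setNames(numeric(length(terms)), names(terms))
  for (term in names(terms)) {
    pvals[term] <- hyper_p(input, terms[[term]], bg)
    parent <- parent_of[[term]]
    if (pvals[term] < cutoff && !is.null(parent)) {
      terms[[parent]] <- setdiff(terms[[parent]], terms[[term]])
    }
  }
  pvals
}

toy_naive <- sapply(toy_terms, hyper_p, input = toy_input, bg = toy_bg)
toy_elim <- elim_sketch(toy_input, toy_terms, toy_parent_of, toy_bg)
rbind(naive = toy_naive, elim = toy_elim)
```

Tested naively, the parent inherits childA's signal and looks significant. After elimination, it no longer does, because its remaining genes are barely enriched. This is the kind of redundancy that a tree-aware analysis aims to remove.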
Comparing DO-based enrichment results with/without considering tree structure
list_eTerm <- list(eTerm_IFN24_DO_none, eTerm_IFN24_DO_lea)
names(list_eTerm) <- c('DO Tree (-)', 'DO Tree (+)')
bp_IFN24_DO <- xEnrichCompare(list_eTerm, displayBy="fc")
bp_IFN24_DO + theme(axis.text.y=element_text(size=10))
DAGplot of terms enriched in either of the analyses above, with nodes/terms colored according to how many times they are called significantly enriched. Each significant term name is prefixed in the form ‘x1-x2’, where x1 refers to ‘DO Tree (-)’ and x2 to ‘DO Tree (+)’. The value of x1 (or x2) is ‘1’ or ‘0’, denoting whether or not the term is called significant in that analysis.
xEnrichDAGplotAdv(bp_IFN24_DO, displayBy="nSig", colormap="white-lightcyan-cyan", layout.orientation="top_bottom", node.info=c("term_name"), graph.node.attrs=list(fontsize=30), newpage=FALSE)
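The ‘x1-x2’ prefix and the nSig colouring can be reproduced from per-analysis significance calls. Below is a minimal sketch with invented FDR values. It assumes that a term counts as significant when its FDR is below 0.05; the cutoff actually used by xEnrichCompare/xEnrichDAGplotAdv is not stated here.

```r
# Invented FDRs for three terms under the two analyses compared above
toy_dag_fdr <- cbind(
  "DO Tree (-)" = c(termA = 0.001, termB = 0.03, termC = 0.4),
  "DO Tree (+)" = c(termA = 0.002, termB = 0.20, termC = 0.6)
)
toy_sig <- toy_dag_fdr < 0.05                      # TRUE if significant (assumed cutoff)
toy_nSig <- rowSums(toy_sig)                       # times called significant
toy_prefix <- apply(toy_sig * 1L, 1, paste, collapse = "-")  # e.g. "1-0"
data.frame(prefix = toy_prefix, nSig = toy_nSig)
```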
Pathway enrichment analysis without filtering
data <- df_IFN24$Symbol
eTerm_IFN24_MsigdbC2REACTOME <- xEnricherGenes(data=data, background=background, ontology="MsigdbC2REACTOME", RData.location=RData.location)
xEnrichViewer(eTerm_IFN24_MsigdbC2REACTOME, 10)
Barplot of enriched terms
bp_IFN24_MsigdbC2REACTOME <- xEnrichBarplot(eTerm_IFN24_MsigdbC2REACTOME, top_num=10, displayBy="fdr")
bp_IFN24_MsigdbC2REACTOME
Pathway enrichments after filtering
eTerm_IFN24_MsigdbC2REACTOME_filtering <- xEnrichConciser(eTerm_IFN24_MsigdbC2REACTOME)
xEnrichViewer(eTerm_IFN24_MsigdbC2REACTOME_filtering, 10)
Barplot of enriched terms
bp_IFN24_MsigdbC2REACTOME_filtering <- xEnrichBarplot(eTerm_IFN24_MsigdbC2REACTOME_filtering, top_num=10, displayBy="fdr")
bp_IFN24_MsigdbC2REACTOME_filtering
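The filtering rule applied by xEnrichConciser is not described in this demo. As a hedged illustration of the general idea, the sketch below uses a hypothetical rule. Terms are visited from most to least significant. A term is dropped when at least 90% of its overlapping genes (an arbitrary choice) are already covered by a term that was kept. The function name and data are invented.

```r
# Hypothetical redundancy filter (not xEnrichConciser's actual rule).
# 'overlaps' holds, per term, the input genes it contains; 'pvalue' orders terms.
concise_sketch <- function(overlaps, pvalue, max.shared = 0.9) {
  kept <- character(0)
  for (term in names(sort(pvalue))) {
    genes <- overlaps[[term]]
    # redundant if a kept term already covers most of this term's genes
    redundant <- any(vapply(kept, function(k) {
      mean(genes %in% overlaps[[k]]) >= max.shared
    }, logical(1)))
    if (!redundant) kept <- c(kept, term)
  }
  kept
}

toy_overlaps <- list(
  pathway1 = c("A", "B", "C", "D", "E"),
  pathway2 = c("A", "B", "C", "D"),      # fully contained in pathway1
  pathway3 = c("X", "Y", "Z", "A")
)
toy_pvalue <- c(pathway1 = 1e-6, pathway2 = 1e-5, pathway3 = 1e-3)
toy_kept <- concise_sketch(toy_overlaps, toy_pvalue)
toy_kept
```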
Enrichment results can also be compared across conditions. This is demonstrated by comparing differential genes induced by 24-hour interferon gamma against those induced by 24-hour LPS and 2-hour LPS. For each condition, redundant pathways are filtered out before comparison.
Choose Reactome pathways:
ontology <- "MsigdbC2REACTOME"
Pathway enrichments for differential genes induced by 24-hour interferon gamma
data <- df_IFN24$Symbol
eTerm <- xEnricherGenes(data=data, background=background, ontology=ontology, RData.location=RData.location)
eTerm_IFN24_MsigdbC2REACTOME <- xEnrichConciser(eTerm)
xEnrichViewer(eTerm_IFN24_MsigdbC2REACTOME, 10)
Barplot of enriched terms
bp_IFN24_MsigdbC2REACTOME <- xEnrichBarplot(eTerm_IFN24_MsigdbC2REACTOME, top_num=10, displayBy="fc")
bp_IFN24_MsigdbC2REACTOME
Pathway enrichments for differential genes induced by 24-hour LPS
data <- df_LPS24$Symbol
eTerm <- xEnricherGenes(data=data, background=background, ontology=ontology, RData.location=RData.location)
eTerm_LPS24_MsigdbC2REACTOME <- xEnrichConciser(eTerm)
xEnrichViewer(eTerm_LPS24_MsigdbC2REACTOME, 10)
Barplot of enriched terms
bp_LPS24_MsigdbC2REACTOME <- xEnrichBarplot(eTerm_LPS24_MsigdbC2REACTOME, top_num=10, displayBy="fc")
bp_LPS24_MsigdbC2REACTOME
Pathway enrichments for differential genes induced by 2-hour LPS
data <- df_LPS2$Symbol
eTerm <- xEnricherGenes(data=data, background=background, ontology=ontology, RData.location=RData.location)
eTerm_LPS2_MsigdbC2REACTOME <- xEnrichConciser(eTerm)
xEnrichViewer(eTerm_LPS2_MsigdbC2REACTOME, 10)
Barplot of enriched terms
bp_LPS2_MsigdbC2REACTOME <- xEnrichBarplot(eTerm_LPS2_MsigdbC2REACTOME, top_num=10, displayBy="fc")
bp_LPS2_MsigdbC2REACTOME
Comparing pathway-based enrichment results across conditions
Comparisons across 2 conditions: LPS2 vs LPS24
list_eTerm <- list(eTerm_LPS2_MsigdbC2REACTOME, eTerm_LPS24_MsigdbC2REACTOME)
names(list_eTerm) <- c('Pathways for LPS2', 'Pathways for LPS24')
bp_Pathway <- xEnrichCompare(list_eTerm, displayBy="fc", FDR.cutoff=1e-2, wrap.width=50)
bp_Pathway + theme(axis.text.y=element_text(size=10))
Comparisons across 3 conditions: LPS2 vs LPS24 vs IFN24
list_eTerm <- list(eTerm_LPS2_MsigdbC2REACTOME, eTerm_LPS24_MsigdbC2REACTOME, eTerm_IFN24_MsigdbC2REACTOME)
names(list_eTerm) <- c('Pathways for LPS2', 'Pathways for LPS24', 'Pathways for IFN24')
bp_Pathway <- xEnrichCompare(list_eTerm, displayBy="fc", FDR.cutoff=5e-3, wrap.width=50)
bp_Pathway + theme(axis.text.y=element_text(size=10))
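The sketch below assumes that FDR.cutoff keeps pathways passing it in at least one condition; the demo does not state the exact rule. Under that assumption, the selection can be sketched in base R with invented results. Rows are pathways and columns are conditions, with NA for pathways absent from a condition's results.

```r
# Invented per-condition results: named FDR vectors (pathway -> FDR)
toy_results <- list(
  LPS2  = c(pathwayA = 1e-4, pathwayB = 0.02),
  LPS24 = c(pathwayA = 0.03, pathwayC = 1e-3),
  IFN24 = c(pathwayC = 0.004, pathwayD = 0.5)
)
toy_terms_all <- sort(unique(unlist(lapply(toy_results, names))))
# Pathway-by-condition FDR matrix, NA where a pathway was not reported
toy_fdr_mat <- sapply(toy_results, function(x) unname(x[toy_terms_all]))
rownames(toy_fdr_mat) <- toy_terms_all
# Keep pathways passing the cutoff in at least one condition
toy_keep <- rowSums(toy_fdr_mat < 5e-3, na.rm = TRUE) > 0
toy_fdr_mat[toy_keep, , drop = FALSE]
```

A stricter cutoff shrinks the set of pathways shown. This is one reason to tighten it as more conditions are added to the comparison.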
Network analysis is demonstrated using differential genes induced by 24-hour interferon gamma. Subnetworks are compared across two network sources: functional interactions from the STRING database versus pathway interactions from the Pathway Commons database.
Using functional interactions (undirected)
# find maximum-scoring gene subnetwork with the desired node number=75
data <- df_IFN24[,c("Symbol","fdr_INF24_Naive")]
subg_func <- xSubneterGenes(data=data, network="STRING_high", subnet.size=75, RData.location=RData.location)
Visualise the identified subnetwork with nodes/genes colored according to FDR
subg <- subg_func
pattern <- -log10(as.numeric(V(subg)$significance))
xVisNet(g=subg, pattern=pattern, vertex.shape="sphere", vertex.label.font=2, newpage=FALSE)
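xSubneterGenes finds the subnetwork by heuristically solving a prize-collecting Steiner tree problem, which is beyond a short example. To convey only the underlying trade-off, the sketch below swaps in a much simpler greedy search. Each gene gets a hypothetical score, log10(threshold) - log10(FDR), which is positive for genes with FDR below the threshold. A connected subnetwork is then grown from the best-scoring gene by repeatedly adding the highest-scoring neighbour until a target size is reached. The toy network, FDRs, threshold and function names are all made up.

```r
# Invented FDRs and an undirected toy network given as an edge list
toy_gene_fdr <- c(A = 1e-6, B = 1e-4, C = 0.5, D = 1e-5, E = 0.9, F = 1e-3)
toy_edges <- rbind(c("A", "B"), c("B", "C"), c("C", "D"),
                   c("A", "E"), c("B", "F"))

# Hypothetical node score: positive when FDR < threshold, negative otherwise
score_nodes <- function(fdr, threshold = 0.01) log10(threshold) - log10(fdr)

# Greedy growth of a connected subnetwork (NOT a prize-collecting Steiner tree solver)
greedy_subnet <- function(scores, edges, size) {
  selected <- names(which.max(scores))
  while (length(selected) < size) {
    # neighbours of the current subnetwork that are not yet selected
    touching <- edges[, 1] %in% selected | edges[, 2] %in% selected
    nbrs <- setdiff(unique(c(edges[touching, ])), selected)
    if (length(nbrs) == 0) break
    selected <- c(selected, nbrs[which.max(scores[nbrs])])
  }
  selected
}

toy_scores <- score_nodes(toy_gene_fdr)
toy_subnet <- greedy_subnet(toy_scores, toy_edges, size = 5)
toy_subnet
```

The non-significant gene C is pulled in as a connector so that the high-scoring gene D can be reached. Balancing the cost of such connectors against the prizes of the genes they reach is the kind of trade-off the prize-collecting Steiner tree formulation is designed to handle. It does so far more carefully than this greedy loop.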
Using pathway interactions (directed)
# find maximum-scoring gene subnetwork with the desired node number=75
data <- df_IFN24[,c("Symbol","fdr_INF24_Naive")]
subg_path <- xSubneterGenes(data=data, network="PCommonsDN_medium", subnet.size=75, RData.location=RData.location)
Visualise the identified subnetwork with nodes/genes colored according to FDR
subg <- subg_path
pattern <- -log10(as.numeric(V(subg)$significance))
xVisNet(g=subg, pattern=pattern, vertex.shape="sphere", vertex.label.font=2, newpage=FALSE)
Comparing subnetworks identified using two different sources
# intersect the two subnetworks: keep all genes from both, and only edges present in both
net_func_path <- graph.intersection(subg_func, as.undirected(subg_path), keep.all.vertices=TRUE)
Visualise the network nodes according to how many of the two subnetworks they are found in. Only edges present in both subnetworks are shown.
allnodes <- V(net_func_path)$name
df_id <- cbind(path=match(allnodes, V(subg_path)$name), func=match(allnodes, V(subg_func)$name))
pattern <- apply(!is.na(df_id), 1, sum)
names(pattern) <- allnodes
g <- net_func_path
xVisNet(g, pattern=pattern, colormap="wyr", zlim=c(0,2), vertex.shape="sphere", vertex.label.font=2, newpage=FALSE)
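graph.intersection(..., keep.all.vertices=TRUE) keeps every gene from either subnetwork but only the edges present in both. The same logic can be shown on plain edge lists with invented genes and edges. Edges are treated as undirected, as as.undirected() does above. The helper edge_key is hypothetical.

```r
# Invented undirected edge lists for two subnetworks
toy_func_edges <- rbind(c("G1", "G2"), c("G2", "G3"), c("G3", "G4"))
toy_path_edges <- rbind(c("G2", "G1"), c("G3", "G5"))
# Sort the two endpoints so that G1-G2 and G2-G1 count as the same edge
edge_key <- function(e) apply(e, 1, function(x) paste(sort(x), collapse = "|"))
toy_common_edges <- intersect(edge_key(toy_func_edges), edge_key(toy_path_edges))
# Every gene from either subnetwork is kept, as with keep.all.vertices=TRUE
toy_all_nodes <- union(c(toy_func_edges), c(toy_path_edges))
toy_common_edges
toy_all_nodes
```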
Venn diagram of network genes
# if the package 'VennDiagram' is not installed, install it first (it is available from CRAN) via:
# install.packages('VennDiagram')
library(VennDiagram)
data <- list()
data$Functional <- V(subg_func)$name
data$Pathway <- V(subg_path)$name
vp <- venn.diagram(x=data, filename=NULL, fill=c("skyblue","pink1"), category.names=names(data))
grid.draw(vp)
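The three regions of a two-set Venn diagram are plain set operations, so the counts it displays can be checked directly. Below is a sketch with made-up gene lists. On the real data, data$Functional and data$Pathway would take their place.

```r
# Made-up gene lists standing in for the two subnetworks
toy_sets <- list(Functional = c("G1", "G2", "G3", "G4"),
                 Pathway = c("G3", "G4", "G5"))
toy_counts <- c(
  Functional_only = length(setdiff(toy_sets$Functional, toy_sets$Pathway)),
  both = length(intersect(toy_sets$Functional, toy_sets$Pathway)),
  Pathway_only = length(setdiff(toy_sets$Pathway, toy_sets$Functional))
)
toy_counts
```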
Here is the output of sessionInfo() on the system on which this user manual was built:
> R version 3.3.2 (2016-10-31)
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
> Running under: OS X El Capitan 10.11.6
>
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>
> attached base packages:
> [1] grid stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] VennDiagram_1.6.17 futile.logger_1.4.3 XGR_1.0.6
> [4] ggplot2_2.2.0 dnet_1.0.9 supraHex_1.11.2
> [7] hexbin_1.27.1 igraph_1.0.1 rmarkdown_1.2
>
> loaded via a namespace (and not attached):
> [1] Rcpp_0.12.8 XVector_0.12.1
> [3] GenomeInfoDb_1.8.7 plyr_1.8.4
> [5] futile.options_1.0.0 bitops_1.0-6
> [7] zlibbioc_1.18.0 tools_3.3.2
> [9] digest_0.6.10 evaluate_0.10
> [11] tibble_1.2 nlme_3.1-128
> [13] gtable_0.2.0 lattice_0.20-34
> [15] Matrix_1.2-7.1 graph_1.50.0
> [17] Rgraphviz_2.16.0 yaml_2.1.14
> [19] parallel_3.3.2 rtracklayer_1.32.2
> [21] stringr_1.1.0 knitr_1.15.1
> [23] Biostrings_2.40.2 RCircos_1.2.0
> [25] S4Vectors_0.10.3 IRanges_2.6.1
> [27] stats4_3.3.2 rprojroot_1.1
> [29] Biobase_2.32.0 BiocParallel_1.6.6
> [31] XML_3.98-1.5 lambda.r_1.1.9
> [33] reshape2_1.4.2 magrittr_1.5
> [35] codetools_0.2-15 GenomicAlignments_1.8.4
> [37] Rsamtools_1.24.0 backports_1.0.4
> [39] scales_0.4.1.9000 htmltools_0.3.5
> [41] BiocGenerics_0.18.0 GenomicRanges_1.24.3
> [43] SummarizedExperiment_1.2.3 assertthat_0.1
> [45] ape_3.5 colorspace_1.3-1
> [47] labeling_0.3 stringi_1.1.2
> [49] RCurl_1.95-4.8 lazyeval_0.2.0
> [51] munsell_0.4.3
Below is the list of references that this demo cites:
Fairfax, Benjamin P, Peter Humburg, Seiko Makino, Vivek Naranbhai, Daniel Wong, Evelyn Lau, Luke Jostins, et al. 2014. “Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression.” Science (New York, N.Y.) 343 (6175): 1246949. doi:10.1126/science.1246949.
Fang, Hai, and Julian Gough. 2014. “The ‘dnet’ approach promotes emerging research on cancer patient survival.” Genome Medicine 6 (8): 64. doi:10.1186/s13073-014-0064-8.