1 Web-app (linkto)

2 Summary

This demo showcases the analytic power of XGR in making sense of differential genes summarised from differential expression studies, covering both enrichment analysis and network analysis. Gene-level enrichment analysis in XGR is unique in aiming to produce much more informative enrichment results. This is achieved either by taking the tree-like structure of the ontology into account when using a structured ontology, or by applying a filtering procedure when using a non-structured ontology (e.g. a collection of pathways). Gene-level network analysis in XGR identifies a maximum-scoring gene subnetwork (with a desired number of nodes) by heuristically solving the prize-collecting Steiner tree problem, an algorithm that has been demonstrated to be superior to other state-of-the-art methods (Fang and Gough 2014).

3 Package

First of all, load the XGR package and specify the location of built-in data.

library(XGR)

# Specify the location of built-in data
RData.location <- "http://galahad.well.ox.ac.uk/bigdata"

4 Data

Here we illustrate how XGR can be used to interpret differentially expressed genes induced by innate immune stimuli (Fairfax et al. 2014), that is, genes differentially induced by 24-hour interferon gamma (IFN24), 24-hour LPS (LPS24) or 2-hour LPS (LPS2).

# Load differential expression analysis results
res <- xRDataLoader(RData.customised='JKscience_TS1A', RData.location=RData.location)
background <- res$Symbol
# Create a data frame for genes significantly induced by IFN24
flag <- res$logFC_INF24_Naive<0 & res$fdr_INF24_Naive<0.01
df_IFN24 <- res[flag, c('Symbol','logFC_INF24_Naive','fdr_INF24_Naive')]
# Create a data frame for genes significantly induced by LPS24
flag <- res$logFC_LPS24_Naive<0 & res$fdr_LPS24_Naive<0.01
df_LPS24 <- res[flag, c('Symbol','logFC_LPS24_Naive','fdr_LPS24_Naive')]
# Create a data frame for genes significantly induced by LPS2
flag <- res$logFC_LPS2_Naive<0 & res$fdr_LPS2_Naive<0.01
df_LPS2 <- res[flag, c('Symbol','logFC_LPS2_Naive','fdr_LPS2_Naive')]
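
The subsetting pattern used above (a direction filter on the log fold change combined with an FDR threshold) can be tried on a toy table without downloading any data. This is only a sketch: the table `toy`, its column names and its values are hypothetical and are not taken from the JKscience_TS1A dataset.

```r
# Toy results table; column names and values are hypothetical
toy <- data.frame(
  Symbol = c("G1", "G2", "G3", "G4"),
  logFC  = c(-2.1, 0.8, -1.5, -0.3),
  fdr    = c(0.001, 0.0005, 0.2, 0.004),
  stringsAsFactors = FALSE
)

# Same pattern as above: a sign filter on logFC combined with an FDR cutoff
flag <- toy$logFC < 0 & toy$fdr < 0.01

# Keep only the flagged rows and the columns of interest
df_toy <- toy[flag, c("Symbol", "logFC", "fdr")]
df_toy$Symbol
```

Only rows passing both conditions are retained, so a gene with a small FDR but the opposite sign of change (G2 here) is excluded.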

5 Showcase

5.1 Enrichment analysis:

5.1.1 Necessity of respecting ontology tree structure when using structured ontologies

This is demonstrated by performing Disease Ontology (DO) enrichment analysis for differential genes induced by 24-hour interferon gamma.

DO enrichment analysis without considering tree structure

data <- df_IFN24$Symbol
eTerm_IFN24_DO_none <- xEnricherGenes(data=data, background=background, ontology="DO", ontology.algorithm="none", RData.location=RData.location)
xEnrichViewer(eTerm_IFN24_DO_none, 10)
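
To give a feel for the numbers behind an enrichment table, the following base-R sketch computes an over-representation p-value, a fold change and an adjusted p-value for a single term. It assumes a hypergeometric test with Benjamini-Hochberg adjustment, which is a common choice for this kind of analysis; the test actually applied by `xEnricherGenes` is not specified in this demo, and all counts below are hypothetical.

```r
# Hypothetical counts, for illustration only
N <- 10000  # genes in the background
M <- 200    # background genes annotated to the term
n <- 500    # genes of interest (all assumed to be in the background)
k <- 30     # genes of interest annotated to the term

# P(X >= k) when drawing n genes without replacement from the background
pval <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# Fold change: observed overlap relative to the overlap expected by chance
expected <- n * M / N
fc <- k / expected

# Adjust for multiple testing across terms (Benjamini-Hochberg);
# the two other p-values are made up to stand in for further terms
pvals <- c(term_of_interest = pval, other_term_1 = 0.04, other_term_2 = 0.6)
adjp <- p.adjust(pvals, method = "BH")
```

The choice of background matters: both `N` and `M` are counted within the background, which is why the demo passes all tested genes as `background` rather than relying on a default.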

Barplot of enriched terms

bp_IFN24_DO_none <- xEnrichBarplot(eTerm_IFN24_DO_none, top_num="auto", displayBy="fdr")
bp_IFN24_DO_none

DAGplot of enriched terms

xEnrichDAGplot(eTerm_IFN24_DO_none, top_num="auto", displayBy="fdr", node.info=c("both"), graph.node.attrs=list(fontsize=30), newpage=FALSE)

DO enrichment analysis after considering tree structure

data <- df_IFN24$Symbol
eTerm_IFN24_DO_lea <- xEnricherGenes(data=data, background=background, ontology="DO", ontology.algorithm="lea", RData.location=RData.location)
xEnrichViewer(eTerm_IFN24_DO_lea, 10)

Barplot of enriched terms

bp_IFN24_DO_lea <- xEnrichBarplot(eTerm_IFN24_DO_lea, top_num="auto", displayBy="fdr")
bp_IFN24_DO_lea

DAGplot of enriched terms

xEnrichDAGplot(eTerm_IFN24_DO_lea, top_num="auto", displayBy="fdr", node.info=c("both"), graph.node.attrs=list(fontsize=30), newpage=FALSE)

Comparing DO-based enrichment results with/without considering tree structure

list_eTerm <- list(eTerm_IFN24_DO_none, eTerm_IFN24_DO_lea)
names(list_eTerm) <- c('DO Tree (-)', 'DO Tree (+)')
bp_IFN24_DO <- xEnrichCompare(list_eTerm, displayBy="fc")
bp_IFN24_DO + theme(axis.text.y=element_text(size=10))

DAGplot of terms enriched in any of the analyses above, with nodes/terms colored according to the number of analyses in which they are called significantly enriched. Each significant term name is shown with a prefix of the form ‘x1-x2’, where x1 refers to ‘DO Tree (-)’ and x2 to ‘DO Tree (+)’. The value of x1 (or x2) is ‘1’ if the term is called significant in that analysis and ‘0’ otherwise.

xEnrichDAGplotAdv(bp_IFN24_DO, displayBy="nSig", colormap="white-lightcyan-cyan", layout.orientation="top_bottom", node.info=c("term_name"), graph.node.attrs=list(fontsize=30), newpage=FALSE)
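
The ‘x1-x2’ prefix and the per-term count used for coloring can be reproduced on toy data. The sketch below is an illustration in base R rather than the code used internally by XGR; the terms and their significance calls are hypothetical.

```r
# Hypothetical significance calls for three terms in two analyses
sig <- cbind(
  "DO Tree (-)" = c(TRUE, TRUE, FALSE),
  "DO Tree (+)" = c(TRUE, FALSE, TRUE)
)
rownames(sig) <- c("term A", "term B", "term C")

# Number of analyses in which each term is called significant
nSig <- rowSums(sig)

# 'x1-x2' code: 1 if significant in that analysis, 0 otherwise
code <- apply(sig, 1, function(x) paste(as.integer(x), collapse = "-"))

# Combine code and term name as a node label
labels <- paste(code, rownames(sig))
```

A term with code ‘1-0’ is significant only when the tree structure is ignored, which is the pattern to look for when judging what the tree-aware analysis removes.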

5.1.2 Necessity of further filtering out redundant terms when using non-structured ontologies

Pathway enrichment analysis without filtering

data <- df_IFN24$Symbol
eTerm_IFN24_MsigdbC2REACTOME <- xEnricherGenes(data=data, background=background, ontology="MsigdbC2REACTOME", RData.location=RData.location)
xEnrichViewer(eTerm_IFN24_MsigdbC2REACTOME, 10)

Barplot of enriched terms

bp_IFN24_MsigdbC2REACTOME <- xEnrichBarplot(eTerm_IFN24_MsigdbC2REACTOME, top_num=10, displayBy="fdr")
bp_IFN24_MsigdbC2REACTOME

Pathway enrichments after filtering

eTerm_IFN24_MsigdbC2REACTOME_filtering <- xEnrichConciser(eTerm_IFN24_MsigdbC2REACTOME)
xEnrichViewer(eTerm_IFN24_MsigdbC2REACTOME_filtering, 10)
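
The idea behind filtering redundant terms can be sketched as a greedy pass over terms ranked by significance: a term B is dropped when most of its genes are already covered by a more significant, retained term A. The function `filter_redundant`, the overlap thresholds and the gene sets below are assumptions made for illustration; the exact criteria used by `xEnrichConciser` are not described in this demo.

```r
# Hypothetical gene sets, listed from most to least significant
sets <- list(
  INTERFERON_SIGNALING       = c("a", "b", "c", "d", "e", "f"),
  INTERFERON_GAMMA_SIGNALING = c("a", "b", "c", "d"),
  ANTIGEN_PRESENTATION       = c("g", "h", "i")
)

# Greedy filter: walk down the ranked list and drop a term B when a retained,
# more significant term A satisfies |A&B| > frac_B*|B| and |A&B| > frac_A*|A|
# (the threshold values are assumptions chosen for this illustration)
filter_redundant <- function(sets, frac_B = 0.9, frac_A = 0.5) {
  kept <- character(0)
  for (b in names(sets)) {
    redundant <- FALSE
    for (a in kept) {
      overlap <- length(intersect(sets[[a]], sets[[b]]))
      if (overlap > frac_B * length(sets[[b]]) &&
          overlap > frac_A * length(sets[[a]])) {
        redundant <- TRUE
        break
      }
    }
    if (!redundant) kept <- c(kept, b)
  }
  kept
}

kept <- filter_redundant(sets)
```

Here the second set is entirely contained in the first and is dropped, while the third shares no genes with it and is kept. Comparing each term only against retained terms is a design choice of this sketch.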

Barplot of enriched terms

bp_IFN24_MsigdbC2REACTOME_filtering <- xEnrichBarplot(eTerm_IFN24_MsigdbC2REACTOME_filtering, top_num=10, displayBy="fdr")
bp_IFN24_MsigdbC2REACTOME_filtering

5.2 Pathway analysis: comparisons across conditions

This is demonstrated by comparing differential genes induced by 24-hour interferon gamma with those induced by 24-hour LPS and 2-hour LPS. For each condition, redundant pathways are filtered out.

Choose Reactome pathways:

ontology <- "MsigdbC2REACTOME"

Pathway enrichments for differential genes induced by 24-hour interferon gamma

data <- df_IFN24$Symbol
eTerm <- xEnricherGenes(data=data, background=background, ontology=ontology, RData.location=RData.location)
eTerm_IFN24_MsigdbC2REACTOME <- xEnrichConciser(eTerm)
xEnrichViewer(eTerm_IFN24_MsigdbC2REACTOME, 10)

Barplot of enriched terms

bp_IFN24_MsigdbC2REACTOME <- xEnrichBarplot(eTerm_IFN24_MsigdbC2REACTOME, top_num=10, displayBy="fc")
bp_IFN24_MsigdbC2REACTOME

Pathway enrichments for differential genes induced by 24-hour LPS

data <- df_LPS24$Symbol
eTerm <- xEnricherGenes(data=data, background=background, ontology=ontology, RData.location=RData.location)
eTerm_LPS24_MsigdbC2REACTOME <- xEnrichConciser(eTerm)
xEnrichViewer(eTerm_LPS24_MsigdbC2REACTOME, 10)

Barplot of enriched terms

bp_LPS24_MsigdbC2REACTOME <- xEnrichBarplot(eTerm_LPS24_MsigdbC2REACTOME, top_num=10, displayBy="fc")
bp_LPS24_MsigdbC2REACTOME

Pathway enrichments for differential genes induced by 2-hour LPS

data <- df_LPS2$Symbol
eTerm <- xEnricherGenes(data=data, background=background, ontology=ontology, RData.location=RData.location)
eTerm_LPS2_MsigdbC2REACTOME <- xEnrichConciser(eTerm)
xEnrichViewer(eTerm_LPS2_MsigdbC2REACTOME, 10)

Barplot of enriched terms

bp_LPS2_MsigdbC2REACTOME <- xEnrichBarplot(eTerm_LPS2_MsigdbC2REACTOME, top_num=10, displayBy="fc")
bp_LPS2_MsigdbC2REACTOME

Comparing pathway-based enrichment results across conditions

Comparisons across 2 conditions: LPS2 vs LPS24

list_eTerm <- list(eTerm_LPS2_MsigdbC2REACTOME, eTerm_LPS24_MsigdbC2REACTOME)
names(list_eTerm) <- c('Pathways for LPS2', 'Pathways for LPS24')
bp_Pathway <- xEnrichCompare(list_eTerm, displayBy="fc", FDR.cutoff=1e-2, wrap.width=50)
bp_Pathway + theme(axis.text.y=element_text(size=10))

Comparisons across 3 conditions: LPS2 vs LPS24 vs IFN24

list_eTerm <- list(eTerm_LPS2_MsigdbC2REACTOME, eTerm_LPS24_MsigdbC2REACTOME, eTerm_IFN24_MsigdbC2REACTOME)
names(list_eTerm) <- c('Pathways for LPS2', 'Pathways for LPS24', 'Pathways for IFN24')
bp_Pathway <- xEnrichCompare(list_eTerm, displayBy="fc", FDR.cutoff=5e-3, wrap.width=50)
bp_Pathway + theme(axis.text.y=element_text(size=10))
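
A comparison across conditions amounts to lining up the per-condition results by pathway and applying a common FDR cutoff. The base-R sketch below shows one way to do this, assuming that a pathway is retained if it passes the cutoff in at least one condition; whether `xEnrichCompare` selects terms in exactly this way is an assumption, and the pathways and adjusted p-values are hypothetical.

```r
# Hypothetical adjusted p-values per pathway in each condition
adjp <- list(
  LPS2  = c(P1 = 1e-4, P2 = 0.2),
  LPS24 = c(P1 = 0.03, P3 = 1e-3)
)

# Union of pathways reported in any condition
terms <- sort(unique(unlist(lapply(adjp, names))))

# Pathway-by-condition matrix; NA where a pathway was not reported
mat <- sapply(adjp, function(x) x[terms])
rownames(mat) <- terms

# Retain pathways passing the cutoff in at least one condition
FDR.cutoff <- 1e-2
keep <- rowSums(mat < FDR.cutoff, na.rm = TRUE) > 0
mat_sig <- mat[keep, , drop = FALSE]
```

A stricter cutoff, as used above for the three-condition comparison, shortens the list of pathways shown and keeps the plot readable.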

5.3 Network analysis: comparisons using networks of different sources

This is demonstrated using differential genes induced by 24-hour interferon gamma. Comparisons are made using networks of different sources: functional interactions from the STRING database versus pathway interactions from the Pathway Commons database.

Using functional interactions (undirected)

# find maximum-scoring gene subnetwork with the desired node number=75
data <- df_IFN24[,c("Symbol","fdr_INF24_Naive")]
subg_func <- xSubneterGenes(data=data, network="STRING_high", subnet.size=75, RData.location=RData.location)

Visualise the identified subnetwork with nodes/genes colored according to FDR

subg <- subg_func
pattern <- -log10(as.numeric(V(subg)$significance))
xVisNet(g=subg, pattern=pattern, vertex.shape="sphere", vertex.label.font=2, newpage=FALSE)
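
Node colors are driven by `pattern`, which is the FDR on a negative log10 scale, so that more significant genes receive larger values. A minimal sketch with hypothetical FDR values:

```r
# Hypothetical FDR values for three genes
fdr <- c(G1 = 0.01, G2 = 1e-5, G3 = 1e-10)

# Smaller FDR gives a larger value on the -log10 scale
pattern <- -log10(fdr)

# Genes ordered from most to least significant
ranked <- names(sort(pattern, decreasing = TRUE))
```

An FDR of exactly zero would map to `Inf` on this scale, so such values need capping before plotting.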

Using pathway interactions (directed)

# find maximum-scoring gene subnetwork with the desired node number=75
data <- df_IFN24[,c("Symbol","fdr_INF24_Naive")]
subg_path <- xSubneterGenes(data=data, network="PCommonsDN_medium", subnet.size=75, RData.location=RData.location)

Visualise the identified subnetwork with nodes/genes colored according to FDR

subg <- subg_path
pattern <- -log10(as.numeric(V(subg)$significance))
xVisNet(g=subg, pattern=pattern, vertex.shape="sphere", vertex.label.font=2, newpage=FALSE)

Comparing subnetworks identified using two different sources

# intersect the two subnetworks: all nodes are kept, but only edges present in both
net_func_path <- graph.intersection(subg_func, as.undirected(subg_path), keep.all.vertices=TRUE)

Visualise the network with nodes colored according to the number of networks (one or two) in which they are found. Edges present in both networks are also shown.

allnodes <- V(net_func_path)$name
df_id <- cbind(path=match(allnodes, V(subg_path)$name), func=match(allnodes, V(subg_func)$name))
pattern <- apply(!is.na(df_id), 1, sum)
names(pattern) <- allnodes
g <- net_func_path
xVisNet(g, pattern=pattern, colormap="wyr", zlim=c(0,2), vertex.shape="sphere", vertex.label.font=2, newpage=FALSE)
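
The membership count used for coloring can be checked on plain character vectors, without igraph. The sketch below mirrors the `match`-based logic above; the gene names are hypothetical and chosen only for illustration.

```r
# Hypothetical node names of the two subnetworks
func_nodes <- c("STAT1", "IRF1", "GBP1")
path_nodes <- c("STAT1", "JAK2", "IRF1")

# All nodes found in either subnetwork
allnodes <- union(func_nodes, path_nodes)

# Position of each node in each subnetwork; NA if absent
df_id <- cbind(path = match(allnodes, path_nodes),
               func = match(allnodes, func_nodes))

# Number of subnetworks containing each node (1 or 2)
pattern <- rowSums(!is.na(df_id))
names(pattern) <- allnodes

# Nodes shared by both subnetworks
shared <- names(pattern)[pattern == 2]
```

The same counts underlie the Venn diagram of network genes: nodes with a value of 2 fall in the overlap, and the rest belong to one source only.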

Venn diagram of network genes

# if the package 'VennDiagram' is not installed, install it first via:
# install.packages('VennDiagram')

library(VennDiagram)

data <- list()
data$Functional <- V(subg_func)$name
data$Pathway <- V(subg_path)$name
vp <- venn.diagram(x=data, filename=NULL, fill=c("skyblue","pink1"), category.names=names(data))
grid.draw(vp)

6 Session Info

Here is the output of sessionInfo() on the system on which this user manual was built:

> R version 3.3.2 (2016-10-31)
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
> Running under: OS X El Capitan 10.11.6
> 
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
> 
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods  
> [8] base     
> 
> other attached packages:
> [1] VennDiagram_1.6.17  futile.logger_1.4.3 XGR_1.0.6          
> [4] ggplot2_2.2.0       dnet_1.0.9          supraHex_1.11.2    
> [7] hexbin_1.27.1       igraph_1.0.1        rmarkdown_1.2      
> 
> loaded via a namespace (and not attached):
>  [1] Rcpp_0.12.8                XVector_0.12.1            
>  [3] GenomeInfoDb_1.8.7         plyr_1.8.4                
>  [5] futile.options_1.0.0       bitops_1.0-6              
>  [7] zlibbioc_1.18.0            tools_3.3.2               
>  [9] digest_0.6.10              evaluate_0.10             
> [11] tibble_1.2                 nlme_3.1-128              
> [13] gtable_0.2.0               lattice_0.20-34           
> [15] Matrix_1.2-7.1             graph_1.50.0              
> [17] Rgraphviz_2.16.0           yaml_2.1.14               
> [19] parallel_3.3.2             rtracklayer_1.32.2        
> [21] stringr_1.1.0              knitr_1.15.1              
> [23] Biostrings_2.40.2          RCircos_1.2.0             
> [25] S4Vectors_0.10.3           IRanges_2.6.1             
> [27] stats4_3.3.2               rprojroot_1.1             
> [29] Biobase_2.32.0             BiocParallel_1.6.6        
> [31] XML_3.98-1.5               lambda.r_1.1.9            
> [33] reshape2_1.4.2             magrittr_1.5              
> [35] codetools_0.2-15           GenomicAlignments_1.8.4   
> [37] Rsamtools_1.24.0           backports_1.0.4           
> [39] scales_0.4.1.9000          htmltools_0.3.5           
> [41] BiocGenerics_0.18.0        GenomicRanges_1.24.3      
> [43] SummarizedExperiment_1.2.3 assertthat_0.1            
> [45] ape_3.5                    colorspace_1.3-1          
> [47] labeling_0.3               stringi_1.1.2             
> [49] RCurl_1.95-4.8             lazyeval_0.2.0            
> [51] munsell_0.4.3

7 References

Below is the list of references that this demo cites:

Fairfax, Benjamin P, Peter Humburg, Seiko Makino, Vivek Naranbhai, Daniel Wong, Evelyn Lau, Luke Jostins, et al. 2014. “Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression.” Science (New York, N.Y.) 343 (6175): 1246949. doi:10.1126/science.1246949.

Fang, Hai, and Julian Gough. 2014. “The ‘dnet’ approach promotes emerging research on cancer patient survival.” Genome Medicine 6 (8): 64. doi:10.1186/s13073-014-0064-8.