Here, we are evaluating the ‘druggability’ of latent variables.
First, import packages to process and plot the data.
library(dplyr)
library(tidyr)
library(purrr)
library(synapser)
synLogin()
## Welcome, Robert Allaway!
## NULL
library(clusterProfiler)
plier_model <- readr::read_rds(synGet("syn18689545")$path)
plier_loadings_df <- plier_model$Z %>% as.data.frame() %>% purrr::set_names(rownames(plier_model$B))
drug_targets <- feather::read_feather(synGet('syn20700199')$path)
drug_targets <- drug_targets %>%
filter(mean_pchembl > 7) %>%
mutate(gene= hugo_gene) %>%
select(gene, std_name)
First, create a list of all druggable genes. We’re using the drug-target explorer data, for any gene for which there is drug-target relationship with a mean_pchembl value >7, which corresponds to 1 uM. This gives us a rough approximation of druggable targets in the human genome but could be a bit under conservative.
Then, we’ll select the top 5% of genes by loading in each latent variable. This gives us genes that contribute most strongly to a given LV.
Then, we’ll perform a weighted Kolmogorov-Smirnov test using the GSEA
function from the clusterProfiler
package to assess whether any of the LVs are enriched in the universe of druggable genes. Here, we’re treating each LV as the ranked list of genes (for standard GSEA, this would be the differentially expressed genes, for example) and treating the list of all drug targets as the gene set we’re looking for enrichment in.
term2gene <- drug_targets %>%
mutate(term = 'all_drugs') %>%
select(term, gene) %>%
distinct()
tidy <- plier_loadings_df %>%
tibble::rownames_to_column('gene') %>%
gather(lv, loading, -gene) %>%
filter(loading > 0) %>%
group_by(lv) %>%
# filter(loading > quantile(loading, 0.95)) %>% ##I think preselecting for percentile weighting might mess up this test
filter(any(gene %in% term2gene$gene)) %>% ##remove LVs that have no overlap if any exist
nest() %>%
mutate(data = lapply(data, function(x){
geneList <- x$loading
names(geneList) <- as.character(x$gene)
geneList <- sort(geneList, decreasing = TRUE)
geneList
})) ##nest gene lists, looking at top 5% of genes in each LV
tidy_2 <- tidy %>% ###run GSEA on nested gene lists vs universe of "druggable genes"
mutate(data = sapply(data, function(x){
tryCatch(GSEA(geneList = x, TERM2GENE = term2gene), error=function(err) NA)
}))
tidy_3 <- tidy_2 %>% ##store plots and data frames for LVs with significant results
mutate(plot = lapply(data, function(x){
if(nrow(x)>0){
plot <- gseaplot(x, geneSetID = 1)
}else{
plot <- NA
}
return(plot)
})) %>%
mutate(res = lapply(data, function(x){
if(nrow(x)>0){
res <- x@result
}else{
res <- NA
}
return(res)
}))
tidy_filt <- tidy_3 %>%
filter(!is.na(plot)) %>%
select(-data)
Then we simply look for LVs where we got a significant (unadjusted) result, and then plot the results. There are 82 LVs significantly enriched for drug targets (again, when considering the 95th percentile of genes in the LV).
Here’s a table of all results. p.adjust shows nothing as being significantly enriched, but I don’t know enough about how the GSEA
function does the p-value calculation to know if this makes much sense in this context. E.g. it’s pretty clear that there are many druggable targets in each of the LVs in the table (see plots further down), but after adjustment nothing is signficant. One possibility is that the “gene set” of druggable targets is too large at about ~1500 genes?
apply(tidy_filt, 1 , function(x){
res <- x[['res']] %>% #we need to fix p.adj because of the way this was run
mutate(p.adjust = p.adjust(pvalue, n = nrow(tidy))) %>%
mutate(lv = x[['lv']]) %>%
select(lv,setSize, enrichmentScore, NES,pvalue,p.adjust,
leading_edge, core_enrichment)
res
}) %>% bind_rows() %>% DT::datatable()
Let’s plot these results.
apply(tidy_filt, 1, function(x){
plt <- x[['plot']]
res <- x[['res']] %>% #we need to fix p.adj because of the way this was run
mutate(p.adjust = p.adjust(pvalue, n = nrow(tidy))) %>%
mutate(lv = x[['lv']])
tab_1 <- select(res,lv,setSize, enrichmentScore, NES,pvalue,p.adjust) %>%
ggpubr::ggtexttable(theme = ggpubr::ttheme(base_size = 10))
tab_2 <- select(res,leading_edge, core_enrichment) %>%
mutate(core_enrichment = stringr::str_trunc(core_enrichment, width = 30)) %>%
ggpubr::ggtexttable(theme = ggpubr::ttheme(base_size = 10))
cowplot::plot_grid(plt, tab_1, tab_2, nrow=3,ncol=1, rel_heights = c(5,1,1))
})
## [[1]]
##
## [[2]]
##
## [[3]]
##
## [[4]]
##
## [[5]]
##
## [[6]]
##
## [[7]]
##
## [[8]]
##
## [[9]]
##
## [[10]]
##
## [[11]]
##
## [[12]]
##
## [[13]]
##
## [[14]]
##
## [[15]]
##
## [[16]]
##
## [[17]]
##
## [[18]]
##
## [[19]]
##
## [[20]]
##
## [[21]]
##
## [[22]]
##
## [[23]]
##
## [[24]]
##
## [[25]]
##
## [[26]]
##
## [[27]]
##
## [[28]]
##
## [[29]]
##
## [[30]]
##
## [[31]]
##
## [[32]]
##
## [[33]]
##
## [[34]]
##
## [[35]]
##
## [[36]]
##
## [[37]]
##
## [[38]]
##
## [[39]]
##
## [[40]]
##
## [[41]]
##
## [[42]]
##
## [[43]]
##
## [[44]]
##
## [[45]]
##
## [[46]]
##
## [[47]]
##
## [[48]]
##
## [[49]]
##
## [[50]]
##
## [[51]]
##
## [[52]]
##
## [[53]]
##
## [[54]]
##
## [[55]]
##
## [[56]]
##
## [[57]]
##
## [[58]]
##
## [[59]]
##
## [[60]]
##
## [[61]]
##
## [[62]]
##
## [[63]]
##
## [[64]]
##
## [[65]]
##
## [[66]]
##
## [[67]]
##
## [[68]]
##
## [[69]]
##
## [[70]]
##
## [[71]]
##
## [[72]]
##
## [[73]]
##
## [[74]]
##
## [[75]]
##
## [[76]]
##
## [[77]]
##
## [[78]]
##
## [[79]]
##
## [[80]]
##
## [[81]]
##
## [[82]]
sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] clusterProfiler_3.14.0 synapser_0.6.61 purrr_0.3.3
## [4] tidyr_1.0.0 dplyr_0.8.3
##
## loaded via a namespace (and not attached):
## [1] enrichplot_1.6.0 bit64_0.9-7 RColorBrewer_1.1-2
## [4] progress_1.2.2 httr_1.4.1 tools_3.6.1
## [7] backports_1.1.5 DT_0.9 R6_2.4.0
## [10] DBI_1.0.0 lazyeval_0.2.2 BiocGenerics_0.32.0
## [13] colorspace_1.4-1 tidyselect_0.2.5 gridExtra_2.3
## [16] prettyunits_1.0.2 bit_1.1-14 compiler_3.6.1
## [19] Biobase_2.46.0 xml2_1.2.2 labeling_0.3
## [22] triebeard_0.3.0 scales_1.0.0 readr_1.3.1
## [25] ggridges_0.5.1 stringr_1.4.0 digest_0.6.22
## [28] rmarkdown_1.16 DOSE_3.12.0 pkgconfig_2.0.3
## [31] htmltools_0.4.0 fastmap_1.0.1 htmlwidgets_1.5.1
## [34] rlang_0.4.1 RSQLite_2.1.2 shiny_1.4.0
## [37] gridGraphics_0.4-1 farver_1.1.0 jsonlite_1.6
## [40] crosstalk_1.0.0 BiocParallel_1.20.0 GOSemSim_2.12.0
## [43] magrittr_1.5 feather_0.3.5 ggplotify_0.0.4
## [46] GO.db_3.10.0 Matrix_1.2-17 Rcpp_1.0.2
## [49] munsell_0.5.0 S4Vectors_0.24.0 viridis_0.5.1
## [52] lifecycle_0.1.0 stringi_1.4.3 yaml_2.2.0
## [55] ggraph_2.0.0 MASS_7.3-51.4 plyr_1.8.4
## [58] qvalue_2.18.0 grid_3.6.1 blob_1.2.0
## [61] promises_1.1.0 parallel_3.6.1 ggrepel_0.8.1
## [64] DO.db_2.9 crayon_1.3.4 lattice_0.20-38
## [67] graphlayouts_0.5.0 cowplot_1.0.0 splines_3.6.1
## [70] hms_0.5.2 zeallot_0.1.0 knitr_1.25
## [73] pillar_1.4.2 ggpubr_0.2.3 fgsea_1.12.0
## [76] igraph_1.2.4.1 ggsignif_0.6.0 reshape2_1.4.3
## [79] codetools_0.2-16 PythonEmbedInR_0.3.37 stats4_3.6.1
## [82] fastmatch_1.1-0 glue_1.3.1 evaluate_0.14
## [85] BiocManager_1.30.9 data.table_1.12.6 httpuv_1.5.2
## [88] vctrs_0.2.0 tweenr_1.0.1 urltools_1.7.3
## [91] gtable_0.3.0 polyclip_1.10-0 assertthat_0.2.1
## [94] ggplot2_3.2.1 pack_0.1-1 xfun_0.10
## [97] ggforce_0.3.1 mime_0.7 europepmc_0.3
## [100] xtable_1.8-4 tidygraph_1.1.2 later_1.0.0
## [103] viridisLite_0.3.0 tibble_2.1.3 rvcheck_0.1.6
## [106] AnnotationDbi_1.48.0 memoise_1.1.0 IRanges_2.20.0
## [109] ellipsis_0.3.0