Here, we are evaluating the ‘druggability’ of latent variables.

Import packages

First, import packages to process and plot the data.

library(dplyr)
library(tidyr)
library(purrr)
library(synapser)
synLogin()

## Welcome, Robert Allaway!

## NULL

library(clusterProfiler)

plier_model <- readr::read_rds(synGet("syn18689545")$path)

plier_loadings_df <- plier_model$Z %>% as.data.frame() %>% purrr::set_names(rownames(plier_model$B))

drug_targets <- feather::read_feather(synGet('syn20700199')$path)

drug_targets <- drug_targets %>% 
  filter(mean_pchembl > 7) %>% 
  mutate(gene= hugo_gene) %>% 
  select(gene, std_name)

Significance testing

First, create a list of all druggable genes. We’re using the drug-target explorer data, for any gene for which there is drug-target relationship with a mean_pchembl value >7, which corresponds to 1 uM. This gives us a rough approximation of druggable targets in the human genome but could be a bit under conservative.

Then, we’ll select the top 5% of genes by loading in each latent variable. This gives us genes that contribute most strongly to a given LV.

Then, we’ll perform a weighted Kolmogorov-Smirnov test using the GSEA function from the clusterProfiler package to assess whether any of the LVs are enriched in the universe of druggable genes. Here, we’re treating each LV as the ranked list of genes (for standard GSEA, this would be the differentially expressed genes, for example) and treating the list of all drug targets as the gene set we’re looking for enrichment in.

term2gene <- drug_targets %>% 
  mutate(term = 'all_drugs') %>% 
  select(term, gene) %>% 
  distinct()

tidy <- plier_loadings_df %>% 
      tibble::rownames_to_column('gene') %>%
  gather(lv, loading, -gene) %>% 
  filter(loading > 0) %>% 
  group_by(lv) %>% 
  # filter(loading > quantile(loading, 0.95)) %>% ##I think preselecting for percentile weighting might mess up this test
  filter(any(gene %in% term2gene$gene)) %>% ##remove LVs that have no overlap if any exist
  nest() %>% 
  mutate(data = lapply(data, function(x){
    geneList <- x$loading 
    names(geneList) <- as.character(x$gene) 
    geneList <- sort(geneList, decreasing = TRUE)
    geneList
  })) ##nest gene lists, looking at top 5% of genes in each LV 

tidy_2 <- tidy %>% ###run GSEA on nested gene lists vs universe of "druggable genes" 
  mutate(data = sapply(data, function(x){
   tryCatch(GSEA(geneList = x, TERM2GENE = term2gene), error=function(err) NA)
 }))

tidy_3 <- tidy_2 %>%  ##store plots and data frames for LVs with significant results
  mutate(plot = lapply(data, function(x){
    if(nrow(x)>0){
      plot <- gseaplot(x, geneSetID = 1)
    }else{
        plot <- NA
    }
    return(plot)
    })) %>% 
    mutate(res = lapply(data, function(x){
    if(nrow(x)>0){
      res <- x@result
    }else{
        res <- NA
    }
    return(res)
    }))

tidy_filt <- tidy_3 %>% 
  filter(!is.na(plot)) %>% 
  select(-data)

Then we simply look for LVs where we got a significant (unadjusted) result, and then plot the results. There are 82 LVs significantly enriched for drug targets (again, when considering the 95th percentile of genes in the LV).

Here’s a table of all results. p.adjust shows nothing as being significantly enriched, but I don’t know enough about how the GSEA function does the p-value calculation to know if this makes much sense in this context. E.g. it’s pretty clear that there are many druggable targets in each of the LVs in the table (see plots further down), but after adjustment nothing is signficant. One possibility is that the “gene set” of druggable targets is too large at about ~1500 genes?

apply(tidy_filt, 1 , function(x){

  res <- x[['res']] %>%  #we need to fix p.adj because of the way this was run
    mutate(p.adjust = p.adjust(pvalue, n = nrow(tidy))) %>% 
    mutate(lv = x[['lv']]) %>% 
    select(lv,setSize, enrichmentScore, NES,pvalue,p.adjust,
           leading_edge, core_enrichment)
  
  res
  
}) %>% bind_rows() %>% DT::datatable()

Let’s plot these results.

apply(tidy_filt, 1, function(x){

  plt <- x[['plot']]
  
  res <- x[['res']] %>%  #we need to fix p.adj because of the way this was run
    mutate(p.adjust = p.adjust(pvalue, n = nrow(tidy))) %>% 
    mutate(lv = x[['lv']])
  
  tab_1 <- select(res,lv,setSize, enrichmentScore, NES,pvalue,p.adjust) %>% 
    ggpubr::ggtexttable(theme = ggpubr::ttheme(base_size = 10))
  
  tab_2 <- select(res,leading_edge, core_enrichment) %>% 
    mutate(core_enrichment = stringr::str_trunc(core_enrichment, width = 30)) %>% 
    ggpubr::ggtexttable(theme = ggpubr::ttheme(base_size = 10))

  cowplot::plot_grid(plt, tab_1, tab_2, nrow=3,ncol=1, rel_heights = c(5,1,1))
})

## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

## 
## [[7]]

## 
## [[8]]

## 
## [[9]]

## 
## [[10]]

## 
## [[11]]

## 
## [[12]]

## 
## [[13]]

## 
## [[14]]

## 
## [[15]]

## 
## [[16]]

## 
## [[17]]

## 
## [[18]]

## 
## [[19]]

## 
## [[20]]

## 
## [[21]]

## 
## [[22]]

## 
## [[23]]

## 
## [[24]]

## 
## [[25]]

## 
## [[26]]

## 
## [[27]]

## 
## [[28]]

## 
## [[29]]

## 
## [[30]]

## 
## [[31]]

## 
## [[32]]

## 
## [[33]]

## 
## [[34]]

## 
## [[35]]

## 
## [[36]]

## 
## [[37]]

## 
## [[38]]

## 
## [[39]]

## 
## [[40]]

## 
## [[41]]

## 
## [[42]]

## 
## [[43]]

## 
## [[44]]

## 
## [[45]]

## 
## [[46]]

## 
## [[47]]

## 
## [[48]]

## 
## [[49]]

## 
## [[50]]

## 
## [[51]]

## 
## [[52]]

## 
## [[53]]

## 
## [[54]]

## 
## [[55]]

## 
## [[56]]

## 
## [[57]]

## 
## [[58]]

## 
## [[59]]

## 
## [[60]]

## 
## [[61]]

## 
## [[62]]

## 
## [[63]]

## 
## [[64]]

## 
## [[65]]

## 
## [[66]]

## 
## [[67]]

## 
## [[68]]

## 
## [[69]]

## 
## [[70]]

## 
## [[71]]

## 
## [[72]]

## 
## [[73]]

## 
## [[74]]

## 
## [[75]]

## 
## [[76]]

## 
## [[77]]

## 
## [[78]]

## 
## [[79]]

## 
## [[80]]

## 
## [[81]]

## 
## [[82]]

sessionInfo()

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] clusterProfiler_3.14.0 synapser_0.6.61        purrr_0.3.3           
## [4] tidyr_1.0.0            dplyr_0.8.3           
## 
## loaded via a namespace (and not attached):
##   [1] enrichplot_1.6.0      bit64_0.9-7           RColorBrewer_1.1-2   
##   [4] progress_1.2.2        httr_1.4.1            tools_3.6.1          
##   [7] backports_1.1.5       DT_0.9                R6_2.4.0             
##  [10] DBI_1.0.0             lazyeval_0.2.2        BiocGenerics_0.32.0  
##  [13] colorspace_1.4-1      tidyselect_0.2.5      gridExtra_2.3        
##  [16] prettyunits_1.0.2     bit_1.1-14            compiler_3.6.1       
##  [19] Biobase_2.46.0        xml2_1.2.2            labeling_0.3         
##  [22] triebeard_0.3.0       scales_1.0.0          readr_1.3.1          
##  [25] ggridges_0.5.1        stringr_1.4.0         digest_0.6.22        
##  [28] rmarkdown_1.16        DOSE_3.12.0           pkgconfig_2.0.3      
##  [31] htmltools_0.4.0       fastmap_1.0.1         htmlwidgets_1.5.1    
##  [34] rlang_0.4.1           RSQLite_2.1.2         shiny_1.4.0          
##  [37] gridGraphics_0.4-1    farver_1.1.0          jsonlite_1.6         
##  [40] crosstalk_1.0.0       BiocParallel_1.20.0   GOSemSim_2.12.0      
##  [43] magrittr_1.5          feather_0.3.5         ggplotify_0.0.4      
##  [46] GO.db_3.10.0          Matrix_1.2-17         Rcpp_1.0.2           
##  [49] munsell_0.5.0         S4Vectors_0.24.0      viridis_0.5.1        
##  [52] lifecycle_0.1.0       stringi_1.4.3         yaml_2.2.0           
##  [55] ggraph_2.0.0          MASS_7.3-51.4         plyr_1.8.4           
##  [58] qvalue_2.18.0         grid_3.6.1            blob_1.2.0           
##  [61] promises_1.1.0        parallel_3.6.1        ggrepel_0.8.1        
##  [64] DO.db_2.9             crayon_1.3.4          lattice_0.20-38      
##  [67] graphlayouts_0.5.0    cowplot_1.0.0         splines_3.6.1        
##  [70] hms_0.5.2             zeallot_0.1.0         knitr_1.25           
##  [73] pillar_1.4.2          ggpubr_0.2.3          fgsea_1.12.0         
##  [76] igraph_1.2.4.1        ggsignif_0.6.0        reshape2_1.4.3       
##  [79] codetools_0.2-16      PythonEmbedInR_0.3.37 stats4_3.6.1         
##  [82] fastmatch_1.1-0       glue_1.3.1            evaluate_0.14        
##  [85] BiocManager_1.30.9    data.table_1.12.6     httpuv_1.5.2         
##  [88] vctrs_0.2.0           tweenr_1.0.1          urltools_1.7.3       
##  [91] gtable_0.3.0          polyclip_1.10-0       assertthat_0.2.1     
##  [94] ggplot2_3.2.1         pack_0.1-1            xfun_0.10            
##  [97] ggforce_0.3.1         mime_0.7              europepmc_0.3        
## [100] xtable_1.8-4          tidygraph_1.1.2       later_1.0.0          
## [103] viridisLite_0.3.0     tibble_2.1.3          rvcheck_0.1.6        
## [106] AnnotationDbi_1.48.0  memoise_1.1.0         IRanges_2.20.0       
## [109] ellipsis_0.3.0

Tumor type association with latent variable expression in NF tumors

Robert Allaway

2019-11-09

Import packages

Significance testing