Here, we evaluate whether the expression of latent variables in NF tumors differs by the sex of the patient. These data were generated as part of developing the braiNFood app.

Import packages

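First, import the packages used to process and plot the data; infer performs the significance testing. Then log in to Synapse, download the table of latent variable expression, and drop samples with no recorded sex.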

library(synapser)
synLogin()
## Welcome, Robert Allaway!
## NULL
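library(dplyr)
library(purrr)
library(tidyr)
library(infer)
library(stringr)
library(ggplot2)
library(ggbeeswarm)

# Query the Synapse table of latent variable expression, read the
# downloaded CSV, and drop samples with no recorded sex
mp_res <- synTableQuery("SELECT * FROM syn21046991")$filepath %>%
  readr::read_csv() %>%
  filter(!is.na(sex))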
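The analysis below assumes `mp_res` is in tidy (long) format, with one row per sample and latent variable and at least the columns `latent_var`, `tumorType`, `sex` and `value` that the code references. Since the downloaded table is not printed here, this sketch builds a small simulated stand-in with that layout. The sample IDs, latent variable names, tumor type labels and values are all hypothetical, and the real table may carry additional columns.

```r
# Hypothetical stand-in for mp_res: simulated values, made-up names
set.seed(1)

n_samples <- 12
samples <- data.frame(
  specimenID = sprintf("sample_%02d", seq_len(n_samples)),  # hypothetical IDs
  tumorType  = rep(c("tumor_A", "tumor_B"), each = 6),      # hypothetical labels
  sex        = rep(c("Female", "Male"), times = 6),
  stringsAsFactors = FALSE
)

# Hypothetical latent variable names
latent_vars <- data.frame(latent_var = c("LV_1", "LV_2", "LV_3"),
                          stringsAsFactors = FALSE)

# Long format: merge() with by = NULL forms every (sample, latent variable) pair
mock_mp_res <- merge(samples, latent_vars, by = NULL)
mock_mp_res$value <- rnorm(nrow(mock_mp_res))

# Mirror the filter applied to the real query result
mock_mp_res <- mock_mp_res[!is.na(mock_mp_res$sex), ]

str(mock_mp_res)
```

Each latent variable appears once per sample, so grouping by `latent_var` (and optionally `tumorType`) splits the table into one small data frame per test.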

Significance testing

We already have the expression of the latent variables in these tumors, and we’ve filtered out samples that have low variance across the cohort as part of that data generation process.

Next, we define a function that runs a t-test of a latent variable's expression against the sex of the patient each sample was taken from, and returns the adjusted p-value (or NA if the test cannot be run).
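For a single latent variable, the comparison is a two-sample t-test of `value` between female and male samples. To our understanding, `infer::t_test()` wraps `stats::t.test()`, which defaults to Welch's unequal-variance test. The base R sketch below uses simulated data (the group sizes and means are made up) and recomputes the Welch statistic, degrees of freedom and two-sided p-value by hand to show what the reported p-value is based on.

```r
# Welch two-sample t-test by sex for one latent variable (simulated data)
set.seed(42)
lv <- data.frame(
  sex   = rep(c("Female", "Male"), times = c(8, 10)),
  value = c(rnorm(8, mean = 0.5), rnorm(10, mean = 0))
)
# Level order matches order = c("Female", "Male"): Female minus Male
lv$sex <- factor(lv$sex, levels = c("Female", "Male"))

fit <- t.test(value ~ sex, data = lv)  # var.equal = FALSE by default (Welch)

# Same statistic by hand: difference in means over the unpooled standard error
f <- lv$value[lv$sex == "Female"]
m <- lv$value[lv$sex == "Male"]
se <- sqrt(var(f) / length(f) + var(m) / length(m))
t_manual <- (mean(f) - mean(m)) / se

# Welch-Satterthwaite degrees of freedom
df_manual <- se^4 / ((var(f) / length(f))^2 / (length(f) - 1) +
                     (var(m) / length(m))^2 / (length(m) - 1))

# Two-sided p-value
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

c(t = unname(fit$statistic), p = fit$p.value, p_manual = p_manual)
```

`t.test()` stops with an error when the grouping factor does not have exactly two levels, for example when a group contains samples from only one sex; that is the kind of failure the `tryCatch()` in `ttest()` turns into `NA`.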

We then group the tidy data frame of latent variable expression by latent variable and tumor type, nest each group into its own data frame, and compute the p-value for each nested data frame. Finally, we draw a box plot for every latent variable and tumor type combination whose BH-adjusted p-value is < 0.1 when comparing tumors from female and male patients.
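One detail of the adjustment is worth spelling out. `ttest()` calls `p.adjust()` on one p-value at a time, with `n` set to the number of latent variables. For a single p-value, the BH formula reduces to `min(1, n * p)`, which is the Bonferroni correction, whereas the usual Benjamini-Hochberg procedure ranks the whole vector of p-values from one family of tests. The base R sketch below contrasts the two on made-up p-values.

```r
# Made-up raw p-values standing in for one family of tests
p_raw <- c(0.001, 0.01, 0.02, 0.04, 0.3)
n_tests <- length(p_raw)

# Adjusting one value at a time with an explicit n, as ttest() does:
# with a single p-value, BH gives min(1, n * p), i.e. Bonferroni
one_at_a_time <- vapply(p_raw, p.adjust, numeric(1), method = "BH", n = n_tests)

# Benjamini-Hochberg across the whole family: p_(i) * n / i for the i-th
# smallest p-value, then a running minimum from the largest p-value down
family_bh <- p.adjust(p_raw, method = "BH")

rbind(raw = p_raw, one_at_a_time = one_at_a_time, family_bh = family_bh)
```

With a 0.1 cutoff, two of these five made-up tests pass under the one-at-a-time adjustment and four pass under family-wide BH. If family-wide BH is the intent, one option is to have the test function return raw p-values and call `p.adjust()` once on the full column after `ungroup()`.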

We also repeat the comparison between the sexes with all tumor types pooled, to see whether any differences hold regardless of tumor type.
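The group, nest and map pattern behind both analyses can be tried on a small simulated data set. This is a hypothetical illustration rather than the original pipeline: it uses `stats::t.test()` (Welch) in place of `infer::t_test()`, made-up latent variable and tumor type names, and applies BH once per family of tests after all p-values are collected. Grouping by `latent_var` and `tumorType` yields one test per combination, while grouping by `latent_var` alone pools every tumor type.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Simulated long-format data with hypothetical names (not the real mp_res)
set.seed(7)
toy <- expand.grid(
  sample     = 1:10,
  latent_var = c("LV_1", "LV_2", "LV_3"),
  stringsAsFactors = FALSE
) %>%
  mutate(
    tumorType = if_else(sample <= 5, "tumor_A", "tumor_B"),
    sex       = if_else(sample %% 2 == 0, "Female", "Male"),
    value     = rnorm(n())
  )

# Welch t-test p-value for one nested data frame; NA if the test fails
welch_p <- function(df) {
  tryCatch(t.test(value ~ sex, data = df)$p.value,
           error = function(e) NA_real_)
}

# One test per latent variable and tumor type; ungroup() so that
# p.adjust() sees the whole family rather than one row per group
by_type <- toy %>%
  group_by(latent_var, tumorType) %>%
  nest() %>%
  ungroup() %>%
  mutate(p_raw = map_dbl(data, welch_p),
         p_bh  = p.adjust(p_raw, method = "BH"))

# One test per latent variable, pooling all tumor types
pooled <- toy %>%
  group_by(latent_var) %>%
  nest() %>%
  ungroup() %>%
  mutate(p_raw = map_dbl(data, welch_p),
         p_bh  = p.adjust(p_raw, method = "BH"))

by_type
pooled
```

The tumor-specific family here has six tests (three latent variables times two tumor types) and the pooled family has three, so the same raw p-value can receive a different adjusted value in each analysis.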

# Run a t-test of latent variable expression (value) by sex on one nested
# data frame, then pass the p-value through p.adjust(method = "BH") with
# n = the number of latent variables. For a single p-value this returns
# min(1, n * p). Returns NA if the test fails, for example when a group
# contains samples from only one sex.
ttest <- function(x) {
  tryCatch(
    {
      res <- x %>% t_test(formula = value ~ sex, order = c("Female", "Male"))
      p.adjust(res$p_value, n = length(unique(mp_res$latent_var)), method = "BH")
    },
    error = function(cond) NA_real_
  )
}

# Box plot of expression by sex with every sample overlaid as a beeswarm
plot_by_sex <- function(title, df) {
  ggplot(data = df) +
    # hide box-plot outliers; the beeswarm layer already draws every point
    geom_boxplot(aes(x = sex, y = value, fill = sex), outlier.shape = NA) +
    geom_beeswarm(aes(x = sex, y = value)) +
    ggtitle(title) +
    theme_bw()
}

# One test per latent variable and tumor type combination;
# filter() also drops groups where the test failed (pval is NA)
res_model <- mp_res %>%
  group_by(latent_var, tumorType) %>%
  nest() %>%
  mutate(pval = map_dbl(data, ttest) %>% round(digits = 3)) %>%
  filter(pval < 0.1) %>%
  mutate(
    title = paste0(latent_var, " in ", tumorType, " BH p-value = ", pval) %>%
      str_wrap(width = 40),
    plots = map2(title, data, plot_by_sex)
  )

# One test per latent variable, pooling all tumor types
res_no_grouping <- mp_res %>%
  group_by(latent_var) %>%
  nest() %>%
  mutate(pval = map_dbl(data, ttest) %>% round(digits = 3)) %>%
  filter(pval < 0.1) %>%
  mutate(
    title = paste0(latent_var, " BH p-value = ", pval) %>%
      str_wrap(width = 40),
    plots = map2(title, data, plot_by_sex)
  )

Plots

Tumor-Specific

Here are the latent variables where BH p < 0.1 when grouping by tumor type.

res_model$plots
[21 figures: box plots of latent variable expression by sex with individual samples overlaid, one per latent variable and tumor type combination, each titled with the latent variable, tumor type, and BH p-value]

All Tumors at Once

Here are the latent variables where BH p < 0.1 when considering all tumors at once.

res_no_grouping$plots
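[1 figure: box plot of latent variable expression by sex with individual samples overlaid, titled with the latent variable and its BH p-value]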