Synapse tables with data

For this data we are using any data for which there are gene variants (cNFs, pNFs, MPNSTs): - Exome-Seq variants - WGS Variants

We also have Immune Data - Tumor Deconvolution Table

Let’s see if there are any correlations between specific variants and immune populations. First we collect all the data and find ‘recurrent’ mutations. Currently we’re focusing on mutations that occur in at least 3 samples.

wgs.vars=synTableQuery("SELECT Hugo_Symbol,Protein_position,specimenID,IMPACT,FILTER,ExAC_AF,gnomAD_AF FROM syn20551862")$asDataFrame()
## 
Downloading  [#-------------------]3.11%   2.0MB/64.3MB (2.4MB/s) Job-9963192151260824318551758.csv     
Downloading  [#-------------------]6.22%   4.0MB/64.3MB (3.2MB/s) Job-9963192151260824318551758.csv     
Downloading  [##------------------]9.33%   6.0MB/64.3MB (3.7MB/s) Job-9963192151260824318551758.csv     
Downloading  [##------------------]12.44%   8.0MB/64.3MB (4.2MB/s) Job-9963192151260824318551758.csv     
Downloading  [###-----------------]15.55%   10.0MB/64.3MB (4.6MB/s) Job-9963192151260824318551758.csv     
Downloading  [####----------------]18.66%   12.0MB/64.3MB (5.1MB/s) Job-9963192151260824318551758.csv     
Downloading  [####----------------]21.77%   14.0MB/64.3MB (5.3MB/s) Job-9963192151260824318551758.csv     
Downloading  [#####---------------]24.88%   16.0MB/64.3MB (5.6MB/s) Job-9963192151260824318551758.csv     
Downloading  [######--------------]27.99%   18.0MB/64.3MB (5.9MB/s) Job-9963192151260824318551758.csv     
Downloading  [######--------------]31.10%   20.0MB/64.3MB (6.3MB/s) Job-9963192151260824318551758.csv     
Downloading  [#######-------------]34.21%   22.0MB/64.3MB (6.5MB/s) Job-9963192151260824318551758.csv     
Downloading  [#######-------------]37.32%   24.0MB/64.3MB (6.9MB/s) Job-9963192151260824318551758.csv     
Downloading  [########------------]40.43%   26.0MB/64.3MB (7.2MB/s) Job-9963192151260824318551758.csv     
Downloading  [#########-----------]43.54%   28.0MB/64.3MB (7.5MB/s) Job-9963192151260824318551758.csv     
Downloading  [#########-----------]46.65%   30.0MB/64.3MB (7.8MB/s) Job-9963192151260824318551758.csv     
Downloading  [##########----------]49.76%   32.0MB/64.3MB (8.1MB/s) Job-9963192151260824318551758.csv     
Downloading  [###########---------]52.88%   34.0MB/64.3MB (8.3MB/s) Job-9963192151260824318551758.csv     
Downloading  [###########---------]55.99%   36.0MB/64.3MB (8.6MB/s) Job-9963192151260824318551758.csv     
Downloading  [############--------]59.10%   38.0MB/64.3MB (8.8MB/s) Job-9963192151260824318551758.csv     
Downloading  [############--------]62.21%   40.0MB/64.3MB (9.0MB/s) Job-9963192151260824318551758.csv     
Downloading  [#############-------]65.32%   42.0MB/64.3MB (9.2MB/s) Job-9963192151260824318551758.csv     
Downloading  [##############------]68.43%   44.0MB/64.3MB (9.4MB/s) Job-9963192151260824318551758.csv     
Downloading  [##############------]71.54%   46.0MB/64.3MB (9.6MB/s) Job-9963192151260824318551758.csv     
Downloading  [###############-----]74.65%   48.0MB/64.3MB (9.8MB/s) Job-9963192151260824318551758.csv     
Downloading  [################----]77.76%   50.0MB/64.3MB (10.0MB/s) Job-9963192151260824318551758.csv     
Downloading  [################----]80.87%   52.0MB/64.3MB (10.2MB/s) Job-9963192151260824318551758.csv     
Downloading  [#################---]83.98%   54.0MB/64.3MB (10.3MB/s) Job-9963192151260824318551758.csv     
Downloading  [#################---]87.09%   56.0MB/64.3MB (10.5MB/s) Job-9963192151260824318551758.csv     
Downloading  [##################--]90.20%   58.0MB/64.3MB (10.6MB/s) Job-9963192151260824318551758.csv     
Downloading  [###################-]93.31%   60.0MB/64.3MB (10.4MB/s) Job-9963192151260824318551758.csv     
Downloading  [###################-]96.42%   62.0MB/64.3MB (10.1MB/s) Job-9963192151260824318551758.csv     
Downloading  [####################]99.53%   64.0MB/64.3MB (10.0MB/s) Job-9963192151260824318551758.csv     
Downloading  [####################]100.00%   64.3MB/64.3MB (10.0MB/s) Job-9963192151260824318551758.csv Done...
exome.vars=synTableQuery("SELECT Hugo_Symbol,Protein_position,specimenID,IMPACT,FILTER,ExAC_AF,gnomAD_AF FROM syn20554939")$asDataFrame()
## 
Downloading  [--------------------]2.42%   2.0MB/82.5MB (8.9MB/s) Job-99631939200806938668386157.csv     
Downloading  [#-------------------]4.85%   4.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [#-------------------]7.27%   6.0MB/82.5MB (7.6MB/s) Job-99631939200806938668386157.csv     
Downloading  [##------------------]9.69%   8.0MB/82.5MB (7.6MB/s) Job-99631939200806938668386157.csv     
Downloading  [##------------------]12.11%   10.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [###-----------------]14.54%   12.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [###-----------------]16.96%   14.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [####----------------]19.38%   16.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv     
Downloading  [####----------------]21.81%   18.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv     
Downloading  [#####---------------]24.23%   20.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv     
Downloading  [#####---------------]26.65%   22.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv     
Downloading  [######--------------]29.07%   24.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv     
Downloading  [######--------------]31.50%   26.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv     
Downloading  [#######-------------]33.92%   28.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv     
Downloading  [#######-------------]36.34%   30.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv     
Downloading  [########------------]38.77%   32.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv     
Downloading  [########------------]41.19%   34.0MB/82.5MB (8.0MB/s) Job-99631939200806938668386157.csv     
Downloading  [#########-----------]43.61%   36.0MB/82.5MB (8.0MB/s) Job-99631939200806938668386157.csv     
Downloading  [#########-----------]46.03%   38.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv     
Downloading  [##########----------]48.46%   40.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv     
Downloading  [##########----------]50.88%   42.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv     
Downloading  [###########---------]53.30%   44.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv     
Downloading  [###########---------]55.73%   46.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv     
Downloading  [############--------]58.15%   48.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [############--------]60.57%   50.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [#############-------]62.99%   52.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [#############-------]65.42%   54.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [##############------]67.84%   56.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [##############------]70.26%   58.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [###############-----]72.69%   60.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [###############-----]75.11%   62.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [################----]77.53%   64.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [################----]79.95%   66.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [################----]82.38%   68.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [#################---]84.80%   70.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [#################---]87.22%   72.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [##################--]89.65%   74.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [##################--]92.07%   76.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv     
Downloading  [###################-]94.49%   78.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [###################-]96.91%   80.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [####################]99.34%   82.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv     
Downloading  [####################]100.00%   82.5MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv Done...
all.vars<-rbind(select(wgs.vars,'Hugo_Symbol','Protein_position','specimenID','IMPACT','gnomAD_AF'),
    select(exome.vars,'Hugo_Symbol','Protein_position','specimenID','IMPACT','gnomAD_AF'))%>%
  subset(gnomAD_AF<0.01)

imm.tab<-synapser::synTableQuery("SELECT * FROM syn20710536")$asDataFrame()%>%
  select(-c(ROW_ID,ROW_VERSION))
## 
Downloading  [####################]100.00%   1.1MB/1.1MB (8.9MB/s) Job-99631947456270340357717309.csv Done...
samps<-intersect(imm.tab$specimenID,all.vars$specimenID)

tab<-imm.tab%>%subset(specimenID%in%samps)%>%
    left_join(all.vars,by='specimenID')

tab<-subset(tab,!tumorType%in%c('Other','High Grade Glioma','Low Grade Glioma'))

top.genes=tab%>%group_by(tumorType)%>%
  mutate(numSamps=n_distinct(specimenID))%>%
      group_by(tumorType,Hugo_Symbol)%>%
    mutate(numMutated=n_distinct(specimenID))%>%
    ungroup()%>%
  subset(numMutated>1)%>%
      subset(numMutated<(numSamps-1))%>%
  select(tumorType,Hugo_Symbol,numSamps,numMutated)%>%distinct()

gene.count=top.genes%>%group_by(tumorType)%>%mutate(numGenes=n_distinct(Hugo_Symbol))%>%select(tumorType,numGenes)%>%distinct()
DT::datatable(gene.count)

Ok, we have some genes and counts that we can look at!

Test significance of each gene/immune population

Now we can loop through every tumor type and gene

red.genes<-c("NF1","SUZ12","CDKN2A","EED")##for testing

vals<-tab%>%#subset(Hugo_Symbol%in%red.genes)%>%
    mutate(mutated=ifelse(is.na(IMPACT),'WT','Mutated'))%>%
  select(cell_type,method,tumorType,score,Hugo_Symbol,specimenID,mutated)%>%
  distinct()%>%
  spread(key=Hugo_Symbol,value='mutated',fill='WT')

counts<-vals%>%
  gather(key=gene,value=status,-c(cell_type,method,tumorType,score,specimenID))%>% 
    select(cell_type,method,tumorType,score,gene,specimenID,status)%>%
    group_by(cell_type,method,tumorType,gene)%>%
    mutate(numVals=n_distinct(status))%>%
    subset(numVals==2)%>%ungroup()

#so now we have only 
with.sig<-counts%>%ungroup()%>%subset(gene%in%top.genes$Hugo_Symbol)%>%
  subset(method!='xcell')%>%
    group_by(method,cell_type,gene)%>%
  mutate(pval=t.test(score~status)$p.value)%>%ungroup()%>%
  group_by(cell_type,method)%>%
  mutate(corP=p.adjust(pval))%>%ungroup()%>%
  select(cell_type,method,gene,pval,corP)%>%distinct()

sig.vals<-subset(with.sig,corP<0.3)

DT::datatable(sig.vals)

Interesting! Some genes actually pass p-value correction. What do they look like? Here let’s write the messiest possible code to print.

##i am sooo not proud of this horrible code. apologies!!!
for(method in c('cibersort','mcp_counter')){
  ##first let's reduce the counts and signifiance measures
  rcounts<-subset(counts, method==method)
  nplot<-sig.vals[which(sig.vals$method==method),]##subset fails here,      

  for(ct in unique(nplot$cell_type)){
    tplot<-nplot[which(nplot$cell_type==ct),]
    if(nrow(tplot)==0)
      next
    print(tplot)
    p<-rcounts%>%
    subset(cell_type==ct)%>%
    subset(gene%in%tplot$gene)%>%
    ggplot(aes(x=gene,y=score,col=status))+
    geom_boxplot(outlier.shape=NA)+
    geom_point(position=position_jitterdodge(),aes(group=status))+
    theme(axis.text.x = element_text(angle = 90, hjust = 1))+
    ggtitle(paste(method,ct,'predictions'))
    if(method=='cibersort')
      p<-p+scale_y_log10()
    print(p)
  }
}
## # A tibble: 10 x 5
##    cell_type    method    gene              pval   corP
##    <chr>        <chr>     <chr>            <dbl>  <dbl>
##  1 B cell naive cibersort AC078925.1 0.000000587 0.0791
##  2 B cell naive cibersort C20orf26   0.000000496 0.0668
##  3 B cell naive cibersort CENPE      0.000000848 0.114 
##  4 B cell naive cibersort FAM160A2   0.000000681 0.0917
##  5 B cell naive cibersort IL21R      0.00000193  0.260 
##  6 B cell naive cibersort PLEKHH3    0.000000227 0.0306
##  7 B cell naive cibersort PRDM2      0.000000679 0.0915
##  8 B cell naive cibersort RALBP1     0.000000681 0.0917
##  9 B cell naive cibersort SERPINB6   0.000000205 0.0276
## 10 B cell naive cibersort ZDHHC11B   0.000000587 0.0791

## # A tibble: 21 x 5
##    cell_type       method    gene             pval    corP
##    <chr>           <chr>     <chr>           <dbl>   <dbl>
##  1 NK cell resting cibersort AMPD1    0.00000122   0.164  
##  2 NK cell resting cibersort ANKRD24  0.0000000169 0.00229
##  3 NK cell resting cibersort ATP8B4   0.00000122   0.164  
##  4 NK cell resting cibersort CCL25    0.00000122   0.164  
##  5 NK cell resting cibersort FAM21C   0.00000122   0.164  
##  6 NK cell resting cibersort GLMN     0.00000122   0.164  
##  7 NK cell resting cibersort GOLGA6A  0.00000122   0.164  
##  8 NK cell resting cibersort IGHV3-66 0.00000122   0.164  
##  9 NK cell resting cibersort LGALS9C  0.00000122   0.164  
## 10 NK cell resting cibersort NID1     0.00000122   0.164  
## # … with 11 more rows

## # A tibble: 14 x 5
##    cell_type                    method    gene          pval         corP
##    <chr>                        <chr>     <chr>        <dbl>        <dbl>
##  1 T cell CD4+ memory activated cibersort CEP104    3.94e- 7 0.0408      
##  2 T cell CD4+ memory activated cibersort INF2      3.94e- 7 0.0408      
##  3 T cell CD4+ memory activated cibersort KIAA0226L 1.76e-13 0.0000000183
##  4 T cell CD4+ memory activated cibersort LGR6      3.94e- 7 0.0408      
##  5 T cell CD4+ memory activated cibersort MYO3B     1.76e-13 0.0000000183
##  6 T cell CD4+ memory activated cibersort NOP56     3.94e- 7 0.0408      
##  7 T cell CD4+ memory activated cibersort RHBDF2    1.76e-13 0.0000000183
##  8 T cell CD4+ memory activated cibersort SH3BP4    1.76e-13 0.0000000183
##  9 T cell CD4+ memory activated cibersort SLC22A14  3.94e- 7 0.0408      
## 10 T cell CD4+ memory activated cibersort SLC27A3   1.76e-13 0.0000000183
## 11 T cell CD4+ memory activated cibersort SORL1     3.94e- 7 0.0408      
## 12 T cell CD4+ memory activated cibersort TAS2R43   3.94e- 7 0.0408      
## 13 T cell CD4+ memory activated cibersort ZNF48     3.94e- 7 0.0408      
## 14 T cell CD4+ memory activated cibersort ZNF687    3.94e- 7 0.0408

## # A tibble: 1 x 5
##   cell_type         method    gene           pval   corP
##   <chr>             <chr>     <chr>         <dbl>  <dbl>
## 1 NK cell activated cibersort FAM27E1 0.000000125 0.0169

## # A tibble: 1 x 5
##   cell_type     method    gene         pval   corP
##   <chr>         <chr>     <chr>       <dbl>  <dbl>
## 1 Macrophage M2 cibersort KRT18 0.000000125 0.0169

## # A tibble: 3 x 5
##   cell_type                method    gene            pval  corP
##   <chr>                    <chr>     <chr>          <dbl> <dbl>
## 1 T cell follicular helper cibersort KRTAP12-3 0.00000192 0.259
## 2 T cell follicular helper cibersort MCM9      0.00000192 0.259
## 3 T cell follicular helper cibersort MTERFD1   0.00000192 0.259

## # A tibble: 1 x 5
##   cell_type              method    gene              pval     corP
##   <chr>                  <chr>     <chr>            <dbl>    <dbl>
## 1 Dendritic cell resting cibersort KRTAP2-2 0.00000000315 0.000391

## # A tibble: 1 x 5
##   cell_type method    gene          pval  corP
##   <chr>     <chr>     <chr>        <dbl> <dbl>
## 1 Monocyte  cibersort PLEKHG2 0.00000159 0.214

## # A tibble: 1 x 5
##   cell_type     method    gene        pval  corP
##   <chr>         <chr>     <chr>      <dbl> <dbl>
## 1 Macrophage M1 cibersort RTCA  0.00000174 0.234

## # A tibble: 1 x 5
##   cell_type           method    gene         pval  corP
##   <chr>               <chr>     <chr>       <dbl> <dbl>
## 1 Mast cell activated cibersort TAS1R2 0.00000101 0.137

## # A tibble: 9 x 5
##   cell_type  method      gene          pval     corP
##   <chr>      <chr>       <chr>        <dbl>    <dbl>
## 1 Neutrophil mcp_counter C17orf80  1.06e- 6 0.142   
## 2 Neutrophil mcp_counter CNDP1     1.06e- 6 0.142   
## 3 Neutrophil mcp_counter KRTAP4-11 1.06e- 6 0.142   
## 4 Neutrophil mcp_counter MPPE1     1.06e- 6 0.142   
## 5 Neutrophil mcp_counter MRGPRX4   8.09e-10 0.000109
## 6 Neutrophil mcp_counter PRSS48    1.06e- 6 0.142   
## 7 Neutrophil mcp_counter RBM44     8.09e-10 0.000109
## 8 Neutrophil mcp_counter SNAPC2    1.06e- 6 0.142   
## 9 Neutrophil mcp_counter ZSWIM3    8.09e-10 0.000109

## # A tibble: 1 x 5
##   cell_type                    method      gene           pval     corP
##   <chr>                        <chr>       <chr>         <dbl>    <dbl>
## 1 Cancer associated fibroblast mcp_counter DDX49 0.00000000285 0.000384

Next steps

We should look into each individual tumor type, but I was having trouble executing the proper group by’s and gave up. Happy for someone else to give it a whirl

#this is a failed attempt to group by tumor type
#with.sig<-counts%>%ungroup()%>%subset(gene%in%top.genes$Hugo_Symbol)%>%
#    group_by(latent_var,tumorType,gene)%>%
#  mutate(pval=t.test(value~status)$p.value)%>%
#  ungroup()%>%
#  group_by(latent_var)%>%
#  mutate(corP=p.adjust(pval))%>%ungroup()%>%
#  select(latent_var,tumorType,gene,pval,corP)%>%distinct()

#sig.vals<-subset(with.sig,corP<0.05)

#DT::datatable(sig.vals)