For this data we are using any data for which there are gene variants (cNFs, pNFs, MPNSTs): - Exome-Seq variants - WGS Variants
We also have Immune Data - Tumor Deconvolution Table
Let’s see if there are any correlations between specific variants and immune populations. First we collect all the data and find ‘recurrent’ mutations. Currently we’re focusing on mutations that occur in at least 3 samples.
wgs.vars=synTableQuery("SELECT Hugo_Symbol,Protein_position,specimenID,IMPACT,FILTER,ExAC_AF,gnomAD_AF FROM syn20551862")$asDataFrame()
##
Downloading [#-------------------]3.11% 2.0MB/64.3MB (2.4MB/s) Job-9963192151260824318551758.csv
Downloading [#-------------------]6.22% 4.0MB/64.3MB (3.2MB/s) Job-9963192151260824318551758.csv
Downloading [##------------------]9.33% 6.0MB/64.3MB (3.7MB/s) Job-9963192151260824318551758.csv
Downloading [##------------------]12.44% 8.0MB/64.3MB (4.2MB/s) Job-9963192151260824318551758.csv
Downloading [###-----------------]15.55% 10.0MB/64.3MB (4.6MB/s) Job-9963192151260824318551758.csv
Downloading [####----------------]18.66% 12.0MB/64.3MB (5.1MB/s) Job-9963192151260824318551758.csv
Downloading [####----------------]21.77% 14.0MB/64.3MB (5.3MB/s) Job-9963192151260824318551758.csv
Downloading [#####---------------]24.88% 16.0MB/64.3MB (5.6MB/s) Job-9963192151260824318551758.csv
Downloading [######--------------]27.99% 18.0MB/64.3MB (5.9MB/s) Job-9963192151260824318551758.csv
Downloading [######--------------]31.10% 20.0MB/64.3MB (6.3MB/s) Job-9963192151260824318551758.csv
Downloading [#######-------------]34.21% 22.0MB/64.3MB (6.5MB/s) Job-9963192151260824318551758.csv
Downloading [#######-------------]37.32% 24.0MB/64.3MB (6.9MB/s) Job-9963192151260824318551758.csv
Downloading [########------------]40.43% 26.0MB/64.3MB (7.2MB/s) Job-9963192151260824318551758.csv
Downloading [#########-----------]43.54% 28.0MB/64.3MB (7.5MB/s) Job-9963192151260824318551758.csv
Downloading [#########-----------]46.65% 30.0MB/64.3MB (7.8MB/s) Job-9963192151260824318551758.csv
Downloading [##########----------]49.76% 32.0MB/64.3MB (8.1MB/s) Job-9963192151260824318551758.csv
Downloading [###########---------]52.88% 34.0MB/64.3MB (8.3MB/s) Job-9963192151260824318551758.csv
Downloading [###########---------]55.99% 36.0MB/64.3MB (8.6MB/s) Job-9963192151260824318551758.csv
Downloading [############--------]59.10% 38.0MB/64.3MB (8.8MB/s) Job-9963192151260824318551758.csv
Downloading [############--------]62.21% 40.0MB/64.3MB (9.0MB/s) Job-9963192151260824318551758.csv
Downloading [#############-------]65.32% 42.0MB/64.3MB (9.2MB/s) Job-9963192151260824318551758.csv
Downloading [##############------]68.43% 44.0MB/64.3MB (9.4MB/s) Job-9963192151260824318551758.csv
Downloading [##############------]71.54% 46.0MB/64.3MB (9.6MB/s) Job-9963192151260824318551758.csv
Downloading [###############-----]74.65% 48.0MB/64.3MB (9.8MB/s) Job-9963192151260824318551758.csv
Downloading [################----]77.76% 50.0MB/64.3MB (10.0MB/s) Job-9963192151260824318551758.csv
Downloading [################----]80.87% 52.0MB/64.3MB (10.2MB/s) Job-9963192151260824318551758.csv
Downloading [#################---]83.98% 54.0MB/64.3MB (10.3MB/s) Job-9963192151260824318551758.csv
Downloading [#################---]87.09% 56.0MB/64.3MB (10.5MB/s) Job-9963192151260824318551758.csv
Downloading [##################--]90.20% 58.0MB/64.3MB (10.6MB/s) Job-9963192151260824318551758.csv
Downloading [###################-]93.31% 60.0MB/64.3MB (10.4MB/s) Job-9963192151260824318551758.csv
Downloading [###################-]96.42% 62.0MB/64.3MB (10.1MB/s) Job-9963192151260824318551758.csv
Downloading [####################]99.53% 64.0MB/64.3MB (10.0MB/s) Job-9963192151260824318551758.csv
Downloading [####################]100.00% 64.3MB/64.3MB (10.0MB/s) Job-9963192151260824318551758.csv Done...
exome.vars=synTableQuery("SELECT Hugo_Symbol,Protein_position,specimenID,IMPACT,FILTER,ExAC_AF,gnomAD_AF FROM syn20554939")$asDataFrame()
##
Downloading [--------------------]2.42% 2.0MB/82.5MB (8.9MB/s) Job-99631939200806938668386157.csv
Downloading [#-------------------]4.85% 4.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [#-------------------]7.27% 6.0MB/82.5MB (7.6MB/s) Job-99631939200806938668386157.csv
Downloading [##------------------]9.69% 8.0MB/82.5MB (7.6MB/s) Job-99631939200806938668386157.csv
Downloading [##------------------]12.11% 10.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [###-----------------]14.54% 12.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [###-----------------]16.96% 14.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [####----------------]19.38% 16.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv
Downloading [####----------------]21.81% 18.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv
Downloading [#####---------------]24.23% 20.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv
Downloading [#####---------------]26.65% 22.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv
Downloading [######--------------]29.07% 24.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv
Downloading [######--------------]31.50% 26.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv
Downloading [#######-------------]33.92% 28.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv
Downloading [#######-------------]36.34% 30.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv
Downloading [########------------]38.77% 32.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv
Downloading [########------------]41.19% 34.0MB/82.5MB (8.0MB/s) Job-99631939200806938668386157.csv
Downloading [#########-----------]43.61% 36.0MB/82.5MB (8.0MB/s) Job-99631939200806938668386157.csv
Downloading [#########-----------]46.03% 38.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv
Downloading [##########----------]48.46% 40.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv
Downloading [##########----------]50.88% 42.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv
Downloading [###########---------]53.30% 44.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv
Downloading [###########---------]55.73% 46.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv
Downloading [############--------]58.15% 48.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [############--------]60.57% 50.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [#############-------]62.99% 52.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [#############-------]65.42% 54.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [##############------]67.84% 56.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [##############------]70.26% 58.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [###############-----]72.69% 60.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [###############-----]75.11% 62.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [################----]77.53% 64.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [################----]79.95% 66.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [################----]82.38% 68.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [#################---]84.80% 70.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [#################---]87.22% 72.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [##################--]89.65% 74.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [##################--]92.07% 76.0MB/82.5MB (7.9MB/s) Job-99631939200806938668386157.csv
Downloading [###################-]94.49% 78.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [###################-]96.91% 80.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [####################]99.34% 82.0MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv
Downloading [####################]100.00% 82.5MB/82.5MB (7.8MB/s) Job-99631939200806938668386157.csv Done...
all.vars<-rbind(select(wgs.vars,'Hugo_Symbol','Protein_position','specimenID','IMPACT','gnomAD_AF'),
select(exome.vars,'Hugo_Symbol','Protein_position','specimenID','IMPACT','gnomAD_AF'))%>%
subset(gnomAD_AF<0.01)
imm.tab<-synapser::synTableQuery("SELECT * FROM syn20710536")$asDataFrame()%>%
select(-c(ROW_ID,ROW_VERSION))
##
Downloading [####################]100.00% 1.1MB/1.1MB (8.9MB/s) Job-99631947456270340357717309.csv Done...
samps<-intersect(imm.tab$specimenID,all.vars$specimenID)
tab<-imm.tab%>%subset(specimenID%in%samps)%>%
left_join(all.vars,by='specimenID')
tab<-subset(tab,!tumorType%in%c('Other','High Grade Glioma','Low Grade Glioma'))
top.genes=tab%>%group_by(tumorType)%>%
mutate(numSamps=n_distinct(specimenID))%>%
group_by(tumorType,Hugo_Symbol)%>%
mutate(numMutated=n_distinct(specimenID))%>%
ungroup()%>%
subset(numMutated>1)%>%
subset(numMutated<(numSamps-1))%>%
select(tumorType,Hugo_Symbol,numSamps,numMutated)%>%distinct()
gene.count=top.genes%>%group_by(tumorType)%>%mutate(numGenes=n_distinct(Hugo_Symbol))%>%select(tumorType,numGenes)%>%distinct()
DT::datatable(gene.count)
Ok, we have some genes and counts that we can look at!
Now we can loop through every tumor type and gene
red.genes<-c("NF1","SUZ12","CDKN2A","EED")##for testing
vals<-tab%>%#subset(Hugo_Symbol%in%red.genes)%>%
mutate(mutated=ifelse(is.na(IMPACT),'WT','Mutated'))%>%
select(cell_type,method,tumorType,score,Hugo_Symbol,specimenID,mutated)%>%
distinct()%>%
spread(key=Hugo_Symbol,value='mutated',fill='WT')
counts<-vals%>%
gather(key=gene,value=status,-c(cell_type,method,tumorType,score,specimenID))%>%
select(cell_type,method,tumorType,score,gene,specimenID,status)%>%
group_by(cell_type,method,tumorType,gene)%>%
mutate(numVals=n_distinct(status))%>%
subset(numVals==2)%>%ungroup()
#so now we have only
with.sig<-counts%>%ungroup()%>%subset(gene%in%top.genes$Hugo_Symbol)%>%
subset(method!='xcell')%>%
group_by(method,cell_type,gene)%>%
mutate(pval=t.test(score~status)$p.value)%>%ungroup()%>%
group_by(cell_type,method)%>%
mutate(corP=p.adjust(pval))%>%ungroup()%>%
select(cell_type,method,gene,pval,corP)%>%distinct()
sig.vals<-subset(with.sig,corP<0.3)
DT::datatable(sig.vals)
Interesting! Some genes actually pass p-value correction. What do they look like? Here let’s write the messiest possible code to print.
##i am sooo not proud of this horrible code. apologies!!!
for(method in c('cibersort','mcp_counter')){
##first let's reduce the counts and signifiance measures
rcounts<-subset(counts, method==method)
nplot<-sig.vals[which(sig.vals$method==method),]##subset fails here,
for(ct in unique(nplot$cell_type)){
tplot<-nplot[which(nplot$cell_type==ct),]
if(nrow(tplot)==0)
next
print(tplot)
p<-rcounts%>%
subset(cell_type==ct)%>%
subset(gene%in%tplot$gene)%>%
ggplot(aes(x=gene,y=score,col=status))+
geom_boxplot(outlier.shape=NA)+
geom_point(position=position_jitterdodge(),aes(group=status))+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
ggtitle(paste(method,ct,'predictions'))
if(method=='cibersort')
p<-p+scale_y_log10()
print(p)
}
}
## # A tibble: 10 x 5
## cell_type method gene pval corP
## <chr> <chr> <chr> <dbl> <dbl>
## 1 B cell naive cibersort AC078925.1 0.000000587 0.0791
## 2 B cell naive cibersort C20orf26 0.000000496 0.0668
## 3 B cell naive cibersort CENPE 0.000000848 0.114
## 4 B cell naive cibersort FAM160A2 0.000000681 0.0917
## 5 B cell naive cibersort IL21R 0.00000193 0.260
## 6 B cell naive cibersort PLEKHH3 0.000000227 0.0306
## 7 B cell naive cibersort PRDM2 0.000000679 0.0915
## 8 B cell naive cibersort RALBP1 0.000000681 0.0917
## 9 B cell naive cibersort SERPINB6 0.000000205 0.0276
## 10 B cell naive cibersort ZDHHC11B 0.000000587 0.0791
## # A tibble: 21 x 5
## cell_type method gene pval corP
## <chr> <chr> <chr> <dbl> <dbl>
## 1 NK cell resting cibersort AMPD1 0.00000122 0.164
## 2 NK cell resting cibersort ANKRD24 0.0000000169 0.00229
## 3 NK cell resting cibersort ATP8B4 0.00000122 0.164
## 4 NK cell resting cibersort CCL25 0.00000122 0.164
## 5 NK cell resting cibersort FAM21C 0.00000122 0.164
## 6 NK cell resting cibersort GLMN 0.00000122 0.164
## 7 NK cell resting cibersort GOLGA6A 0.00000122 0.164
## 8 NK cell resting cibersort IGHV3-66 0.00000122 0.164
## 9 NK cell resting cibersort LGALS9C 0.00000122 0.164
## 10 NK cell resting cibersort NID1 0.00000122 0.164
## # … with 11 more rows
## # A tibble: 14 x 5
## cell_type method gene pval corP
## <chr> <chr> <chr> <dbl> <dbl>
## 1 T cell CD4+ memory activated cibersort CEP104 3.94e- 7 0.0408
## 2 T cell CD4+ memory activated cibersort INF2 3.94e- 7 0.0408
## 3 T cell CD4+ memory activated cibersort KIAA0226L 1.76e-13 0.0000000183
## 4 T cell CD4+ memory activated cibersort LGR6 3.94e- 7 0.0408
## 5 T cell CD4+ memory activated cibersort MYO3B 1.76e-13 0.0000000183
## 6 T cell CD4+ memory activated cibersort NOP56 3.94e- 7 0.0408
## 7 T cell CD4+ memory activated cibersort RHBDF2 1.76e-13 0.0000000183
## 8 T cell CD4+ memory activated cibersort SH3BP4 1.76e-13 0.0000000183
## 9 T cell CD4+ memory activated cibersort SLC22A14 3.94e- 7 0.0408
## 10 T cell CD4+ memory activated cibersort SLC27A3 1.76e-13 0.0000000183
## 11 T cell CD4+ memory activated cibersort SORL1 3.94e- 7 0.0408
## 12 T cell CD4+ memory activated cibersort TAS2R43 3.94e- 7 0.0408
## 13 T cell CD4+ memory activated cibersort ZNF48 3.94e- 7 0.0408
## 14 T cell CD4+ memory activated cibersort ZNF687 3.94e- 7 0.0408
## # A tibble: 1 x 5
## cell_type method gene pval corP
## <chr> <chr> <chr> <dbl> <dbl>
## 1 NK cell activated cibersort FAM27E1 0.000000125 0.0169
## # A tibble: 1 x 5
## cell_type method gene pval corP
## <chr> <chr> <chr> <dbl> <dbl>
## 1 Macrophage M2 cibersort KRT18 0.000000125 0.0169
## # A tibble: 3 x 5
## cell_type method gene pval corP
## <chr> <chr> <chr> <dbl> <dbl>
## 1 T cell follicular helper cibersort KRTAP12-3 0.00000192 0.259
## 2 T cell follicular helper cibersort MCM9 0.00000192 0.259
## 3 T cell follicular helper cibersort MTERFD1 0.00000192 0.259
## # A tibble: 1 x 5
## cell_type method gene pval corP
## <chr> <chr> <chr> <dbl> <dbl>
## 1 Dendritic cell resting cibersort KRTAP2-2 0.00000000315 0.000391
## # A tibble: 1 x 5
## cell_type method gene pval corP
## <chr> <chr> <chr> <dbl> <dbl>
## 1 Monocyte cibersort PLEKHG2 0.00000159 0.214
## # A tibble: 1 x 5
## cell_type method gene pval corP
## <chr> <chr> <chr> <dbl> <dbl>
## 1 Macrophage M1 cibersort RTCA 0.00000174 0.234
## # A tibble: 1 x 5
## cell_type method gene pval corP
## <chr> <chr> <chr> <dbl> <dbl>
## 1 Mast cell activated cibersort TAS1R2 0.00000101 0.137
## # A tibble: 9 x 5
## cell_type method gene pval corP
## <chr> <chr> <chr> <dbl> <dbl>
## 1 Neutrophil mcp_counter C17orf80 1.06e- 6 0.142
## 2 Neutrophil mcp_counter CNDP1 1.06e- 6 0.142
## 3 Neutrophil mcp_counter KRTAP4-11 1.06e- 6 0.142
## 4 Neutrophil mcp_counter MPPE1 1.06e- 6 0.142
## 5 Neutrophil mcp_counter MRGPRX4 8.09e-10 0.000109
## 6 Neutrophil mcp_counter PRSS48 1.06e- 6 0.142
## 7 Neutrophil mcp_counter RBM44 8.09e-10 0.000109
## 8 Neutrophil mcp_counter SNAPC2 1.06e- 6 0.142
## 9 Neutrophil mcp_counter ZSWIM3 8.09e-10 0.000109
## # A tibble: 1 x 5
## cell_type method gene pval corP
## <chr> <chr> <chr> <dbl> <dbl>
## 1 Cancer associated fibroblast mcp_counter DDX49 0.00000000285 0.000384
We should look into each individual tumor type, but I was having trouble executing the proper group by’s and gave up. Happy for someone else to give it a whirl
#this is a failed attempt to group by tumor type
#with.sig<-counts%>%ungroup()%>%subset(gene%in%top.genes$Hugo_Symbol)%>%
# group_by(latent_var,tumorType,gene)%>%
# mutate(pval=t.test(value~status)$p.value)%>%
# ungroup()%>%
# group_by(latent_var)%>%
# mutate(corP=p.adjust(pval))%>%ungroup()%>%
# select(latent_var,tumorType,gene,pval,corP)%>%distinct()
#sig.vals<-subset(with.sig,corP<0.05)
#DT::datatable(sig.vals)