We have two tables on Synapse that hold cNF patient data:
- Patient Table
- Sample Table
library(dplyr)
# Patient-level table: demographics and symptoms
pat.tab<-synapser::synTableQuery("SELECT Patient,Race,Gender,Age,Pain,Itching FROM syn7342635")$asDataFrame()%>%
  select(-c(ROW_ID,ROW_VERSION))
# Sample-level table: one row per measured tumor
samp.tab<-synapser::synTableQuery("SELECT Patient,TumorNumber,Length_in_mm FROM syn5556216")$asDataFrame()%>%
  select(-c(ROW_ID,ROW_VERSION))
pat.tab
## Patient Race Gender Age Pain Itching
## 1 1 White Female 48 FALSE TRUE
## 2 2 Asian Female 34 FALSE FALSE
## 3 3 Hispanic Female 48 FALSE FALSE
## 4 4 White Female 36 FALSE TRUE
## 5 5 White Male 65 FALSE FALSE
## 6 6 White Female 46 FALSE TRUE
## 7 8 Black Female 27 FALSE FALSE
## 8 9 White Male 35 FALSE FALSE
## 9 10 White Male 58 FALSE FALSE
## 10 11 Black Female 30 TRUE TRUE
## 11 13 White Female 41 TRUE TRUE
samp.tab
## Patient TumorNumber Length_in_mm
## 1 1 3 NA
## 2 1 4 14
## 3 1 6 14
## 4 1 9 15
## 5 1 1 15
## 6 1 2 13
## 7 2 4 7
## 8 2 6 6
## 9 2 7 5
## 10 2 10 7
## 11 2 1 5
## 12 2 2 7
## 13 2 8 9
## 14 2 9 7
## 15 3 1 11
## 16 3 2 16
## 17 3 3 20
## 18 3 4 15
## 19 4 1 8
## 20 4 10 7
## 21 4 4 5
## 22 4 9 10
## 23 5 11 10
## 24 5 12 9
## 25 5 13 11
## 26 5 15 9
## 27 5 5 8
## 28 5 8 6
## 29 6 4 15
## 30 6 5 10
## 31 6 6 15
## 32 6 7 13
## 33 8 1 10
## 34 8 5 13
## 35 8 7 12
## 36 8 9 5
## 37 8 4 10
## 38 8 6 10
## 39 9 1 12
## 40 9 6 5
## 41 9 7 5
## 42 9 10 8
## 43 10 1 11
## 44 10 2 20
## 45 10 3 10
## 46 10 5 20
## 47 10 4 5
## 48 11 1 13
## 49 11 7 10
## 50 11 8 10
## 51 11 14 20
## 52 11 2 5
## 53 11 3 3
## 54 13 1 18
## 55 13 2 7
## 56 13 3 12
## 57 13 7 6
## 58 1 NA NA
## 59 2 NA NA
## 60 3 NA NA
## 61 4 NA NA
## 62 5 NA NA
## 63 6 NA NA
## 64 8 NA NA
## 65 9 NA NA
## 66 10 NA NA
## 67 11 NA NA
## 68 9 3 13
For this analysis we mainly need the age, sex, tumor size, and a ‘reformed’ patient name (a specimenID built from the patient and tumor numbers), so we join the two tables and add that ID.
full.tab<-samp.tab%>%left_join(pat.tab,by='Patient')%>%
mutate(specimenID=paste0('patient',Patient,'tumor',TumorNumber))
full.tab
## Patient TumorNumber Length_in_mm Race Gender Age Pain Itching
## 1 1 3 NA White Female 48 FALSE TRUE
## 2 1 4 14 White Female 48 FALSE TRUE
## 3 1 6 14 White Female 48 FALSE TRUE
## 4 1 9 15 White Female 48 FALSE TRUE
## 5 1 1 15 White Female 48 FALSE TRUE
## 6 1 2 13 White Female 48 FALSE TRUE
## 7 2 4 7 Asian Female 34 FALSE FALSE
## 8 2 6 6 Asian Female 34 FALSE FALSE
## 9 2 7 5 Asian Female 34 FALSE FALSE
## 10 2 10 7 Asian Female 34 FALSE FALSE
## 11 2 1 5 Asian Female 34 FALSE FALSE
## 12 2 2 7 Asian Female 34 FALSE FALSE
## 13 2 8 9 Asian Female 34 FALSE FALSE
## 14 2 9 7 Asian Female 34 FALSE FALSE
## 15 3 1 11 Hispanic Female 48 FALSE FALSE
## 16 3 2 16 Hispanic Female 48 FALSE FALSE
## 17 3 3 20 Hispanic Female 48 FALSE FALSE
## 18 3 4 15 Hispanic Female 48 FALSE FALSE
## 19 4 1 8 White Female 36 FALSE TRUE
## 20 4 10 7 White Female 36 FALSE TRUE
## 21 4 4 5 White Female 36 FALSE TRUE
## 22 4 9 10 White Female 36 FALSE TRUE
## 23 5 11 10 White Male 65 FALSE FALSE
## 24 5 12 9 White Male 65 FALSE FALSE
## 25 5 13 11 White Male 65 FALSE FALSE
## 26 5 15 9 White Male 65 FALSE FALSE
## 27 5 5 8 White Male 65 FALSE FALSE
## 28 5 8 6 White Male 65 FALSE FALSE
## 29 6 4 15 White Female 46 FALSE TRUE
## 30 6 5 10 White Female 46 FALSE TRUE
## 31 6 6 15 White Female 46 FALSE TRUE
## 32 6 7 13 White Female 46 FALSE TRUE
## 33 8 1 10 Black Female 27 FALSE FALSE
## 34 8 5 13 Black Female 27 FALSE FALSE
## 35 8 7 12 Black Female 27 FALSE FALSE
## 36 8 9 5 Black Female 27 FALSE FALSE
## 37 8 4 10 Black Female 27 FALSE FALSE
## 38 8 6 10 Black Female 27 FALSE FALSE
## 39 9 1 12 White Male 35 FALSE FALSE
## 40 9 6 5 White Male 35 FALSE FALSE
## 41 9 7 5 White Male 35 FALSE FALSE
## 42 9 10 8 White Male 35 FALSE FALSE
## 43 10 1 11 White Male 58 FALSE FALSE
## 44 10 2 20 White Male 58 FALSE FALSE
## 45 10 3 10 White Male 58 FALSE FALSE
## 46 10 5 20 White Male 58 FALSE FALSE
## 47 10 4 5 White Male 58 FALSE FALSE
## 48 11 1 13 Black Female 30 TRUE TRUE
## 49 11 7 10 Black Female 30 TRUE TRUE
## 50 11 8 10 Black Female 30 TRUE TRUE
## 51 11 14 20 Black Female 30 TRUE TRUE
## 52 11 2 5 Black Female 30 TRUE TRUE
## 53 11 3 3 Black Female 30 TRUE TRUE
## 54 13 1 18 White Female 41 TRUE TRUE
## 55 13 2 7 White Female 41 TRUE TRUE
## 56 13 3 12 White Female 41 TRUE TRUE
## 57 13 7 6 White Female 41 TRUE TRUE
## 58 1 NA NA White Female 48 FALSE TRUE
## 59 2 NA NA Asian Female 34 FALSE FALSE
## 60 3 NA NA Hispanic Female 48 FALSE FALSE
## 61 4 NA NA White Female 36 FALSE TRUE
## 62 5 NA NA White Male 65 FALSE FALSE
## 63 6 NA NA White Female 46 FALSE TRUE
## 64 8 NA NA Black Female 27 FALSE FALSE
## 65 9 NA NA White Male 35 FALSE FALSE
## 66 10 NA NA White Male 58 FALSE FALSE
## 67 11 NA NA Black Female 30 TRUE TRUE
## 68 9 3 13 White Male 35 FALSE FALSE
## specimenID
## 1 patient1tumor3
## 2 patient1tumor4
## 3 patient1tumor6
## 4 patient1tumor9
## 5 patient1tumor1
## 6 patient1tumor2
## 7 patient2tumor4
## 8 patient2tumor6
## 9 patient2tumor7
## 10 patient2tumor10
## 11 patient2tumor1
## 12 patient2tumor2
## 13 patient2tumor8
## 14 patient2tumor9
## 15 patient3tumor1
## 16 patient3tumor2
## 17 patient3tumor3
## 18 patient3tumor4
## 19 patient4tumor1
## 20 patient4tumor10
## 21 patient4tumor4
## 22 patient4tumor9
## 23 patient5tumor11
## 24 patient5tumor12
## 25 patient5tumor13
## 26 patient5tumor15
## 27 patient5tumor5
## 28 patient5tumor8
## 29 patient6tumor4
## 30 patient6tumor5
## 31 patient6tumor6
## 32 patient6tumor7
## 33 patient8tumor1
## 34 patient8tumor5
## 35 patient8tumor7
## 36 patient8tumor9
## 37 patient8tumor4
## 38 patient8tumor6
## 39 patient9tumor1
## 40 patient9tumor6
## 41 patient9tumor7
## 42 patient9tumor10
## 43 patient10tumor1
## 44 patient10tumor2
## 45 patient10tumor3
## 46 patient10tumor5
## 47 patient10tumor4
## 48 patient11tumor1
## 49 patient11tumor7
## 50 patient11tumor8
## 51 patient11tumor14
## 52 patient11tumor2
## 53 patient11tumor3
## 54 patient13tumor1
## 55 patient13tumor2
## 56 patient13tumor3
## 57 patient13tumor7
## 58 patient1tumorNA
## 59 patient2tumorNA
## 60 patient3tumorNA
## 61 patient4tumorNA
## 62 patient5tumorNA
## 63 patient6tumorNA
## 64 patient8tumorNA
## 65 patient9tumorNA
## 66 patient10tumorNA
## 67 patient11tumorNA
## 68 patient9tumor3
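The joined table above keeps every column, and rows with a missing TumorNumber end up with IDs such as patient1tumorNA. The following is a minimal sketch, on a toy stand-in built from a few of the rows printed above, of how the table could be trimmed to just the four fields of interest. The names pat.toy, samp.toy and slim.tab are illustrative, and treating the NA-tumor rows as placeholders that can be dropped is an assumption, not something the source tables confirm.

```r
library(dplyr)

# Toy stand-ins for pat.tab and samp.tab (a few rows copied from the output above)
pat.toy <- data.frame(Patient = c(1, 2),
                      Gender  = c("Female", "Female"),
                      Age     = c(48, 34))
samp.toy <- data.frame(Patient      = c(1, 1, 2, 1),
                       TumorNumber  = c(3, 4, 4, NA),
                       Length_in_mm = c(NA, 14, 7, NA))

slim.tab <- samp.toy %>%
  filter(!is.na(TumorNumber)) %>%   # assumption: rows without a tumor number are placeholders
  left_join(pat.toy, by = "Patient") %>%
  mutate(specimenID = paste0("patient", Patient, "tumor", TumorNumber)) %>%
  select(specimenID, Age, Gender, Length_in_mm)

slim.tab
```

Rows with a tumor number but no measured length (patient1tumor3 here) are kept, since they may still match expression data later.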
Now we can look at how tumor size relates to age, sex, and symptoms.
##now what do we see on a tissue level?
library(ggplot2)
p<-ggplot(full.tab)+geom_point(aes(x=Age,y=Length_in_mm,color=Itching,shape=Gender))+ggtitle("Age by tumor size with itching")
print(p)
## Warning: Removed 11 rows containing missing values (geom_point).
p<-ggplot(full.tab)+geom_point(aes(x=Age,y=Length_in_mm,color=Itching,shape=Pain))+ggtitle("Age by tumor size with pain")
print(p)
## Warning: Removed 11 rows containing missing values (geom_point).
The pain and itching variables are recorded per patient, not per tumor, so we can’t tell which individual samples are painful or itchy.
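Because the symptoms are patient-level, one option is to compare them against a per-patient summary of tumor size rather than against individual tumors. This is only a sketch on a toy stand-in for full.tab (rows copied from the output above). The names full.toy and per.patient are illustrative, and using the mean as the summary is an assumption.

```r
library(dplyr)

# Toy stand-in for full.tab (values copied from a few rows above)
full.toy <- data.frame(
  Patient      = c(1, 1, 1, 2, 2, 11, 11),
  Length_in_mm = c(NA, 14, 15, 7, 6, 13, 20),
  Itching      = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE),
  Pain         = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

# Symptoms are constant within a patient, so grouping by them
# still yields exactly one row per patient
per.patient <- full.toy %>%
  group_by(Patient, Itching, Pain) %>%
  summarize(n_measured  = sum(!is.na(Length_in_mm)),
            mean_length = mean(Length_in_mm, na.rm = TRUE),
            .groups = "drop")

per.patient
```

With only 11 patients, any comparison at this level has little statistical power and is best treated as descriptive.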
exp.data<-synTableQuery('select * from syn20449214 where tumorType=\'Cutaneous Neurofibroma\'')$asDataFrame()
data.with.var<-exp.data%>%left_join(full.tab,by='specimenID')
#now compute the correlation with size for each transcript...?
gene.cors=data.with.var%>%group_by(Symbol)%>%mutate(corVal=cor(zScore,Length_in_mm))
top.genes=select(gene.cors,corVal)%>%distinct()%>%arrange(desc(corVal))%>%select(Symbol)
## Adding missing grouping variables: `Symbol`
bottom.genes=select(gene.cors,corVal)%>%distinct()%>%arrange(corVal)%>%select(Symbol)
## Adding missing grouping variables: `Symbol`
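By default cor() returns NA as soon as either input contains a missing value, and Length_in_mm does contain NAs. Whether any of those specimens appear in the expression table is not shown here, so the sketch below is a guard rather than a confirmed fix. It uses a toy stand-in with two hypothetical genes (GENE_A, GENE_B) and summarize(), which gives one row per gene directly.

```r
library(dplyr)

# Toy stand-in for data.with.var: two hypothetical genes, four specimens each,
# one specimen with a missing tumor length
toy <- data.frame(
  Symbol       = rep(c("GENE_A", "GENE_B"), each = 4),
  zScore       = c(1, 2, 3, 4,   4, 3, 2, 1),
  Length_in_mm = c(5, 10, NA, 20, 5, 10, NA, 20)
)

gene.cors.toy <- toy %>%
  group_by(Symbol) %>%
  # use = "pairwise.complete.obs" drops specimen pairs with a missing value
  summarize(corVal = cor(zScore, Length_in_mm, use = "pairwise.complete.obs"),
            .groups = "drop") %>%
  arrange(desc(corVal))

gene.cors.toy
```

Since the result already has one row per gene, the most and least correlated genes can be read off with head() and tail() without a separate distinct() step.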
Now we have the genes most positively correlated with tumor size
top.genes
## # A tibble: 19,098 x 1
## # Groups: Symbol [19,098]
## Symbol
## <chr>
## 1 AKAP12
## 2 SOCS2
## 3 DAPK2
## 4 SVEP1
## 5 RGL1
## 6 DTX4
## 7 HOXB3
## 8 ADAMTS5
## 9 VPS35L
## 10 ABI3BP
## # … with 19,088 more rows
and the genes most anti-correlated (most negatively correlated) with tumor size
bottom.genes
## # A tibble: 19,098 x 1
## # Groups: Symbol [19,098]
## Symbol
## <chr>
## 1 DAPL1
## 2 BPNT1
## 3 GSKIP
## 4 RPUSD3
## 5 ZNF707
## 6 IGHMBP2
## 7 WDR12
## 8 IPO13
## 9 MARCH9
## 10 SP6
## # … with 19,088 more rows
Now what do these look like?
ggplot(subset(data.with.var,Symbol%in%c(top.genes$Symbol[1:10])))+geom_point(aes(x=Length_in_mm,y=zScore,col=Symbol,shape=sex))+ggtitle('10 most correlated genes')
ggplot(subset(data.with.var,Symbol%in%c(bottom.genes$Symbol[1:10])))+geom_point(aes(x=Length_in_mm,y=zScore,col=Symbol,shape=sex))+ggtitle('10 most anti-correlated genes')
We should run pathway enrichment on these genes to see whether they share a common function. The fact that HOXB3 is there is already interesting. What else could there be?
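The pathway-enrichment idea could start from a simple over-representation test: given a list of size-correlated genes, is a gene set hit more often than chance would predict? The sketch below uses base R's hypergeometric distribution. Every input is hypothetical (made-up gene names G1 to G100 and a made-up gene set), so it shows only the mechanics, not a result.

```r
# Hypothetical inputs: all measured genes, a "hit" list, and one gene set
background <- paste0("G", 1:100)
hits       <- paste0("G", 1:10)            # e.g. the top size-correlated genes
gene.set   <- paste0("G", c(1:5, 50:54))   # hypothetical pathway, not a real annotation

overlap <- length(intersect(hits, gene.set))

# P(overlap >= observed) when drawing length(hits) genes from the background
p.enrich <- phyper(overlap - 1,
                   m = length(gene.set),
                   n = length(background) - length(gene.set),
                   k = length(hits),
                   lower.tail = FALSE)

p.enrich
```

In a real analysis the background would be the genes measured in the expression table, the gene sets would come from a curated pathway database, and the p-values would need correcting for the number of gene sets tested.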
I think this is actually pretty cool: we can try to identify which genes are ‘driving’ cNF growth.
* What pathways are enriched in these genes?
* Are these correlations statistically significant?
* Are there differences in correlated genes between male and female patients?
* What are these genes doing in pNFs and MPNSTs?
* Are any of these correlated with immune response (conversely, are immune signatures correlated with this size variable)?
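The significance question could be approached with a per-gene correlation test followed by a multiple-testing correction. This is a sketch on simulated data only: the three gene names are hypothetical, and the choice of Pearson correlation with Benjamini-Hochberg adjustment is an assumption rather than the method used above.

```r
set.seed(1)

# Toy stand-in for data.with.var: 3 hypothetical genes, 12 specimens each
len <- c(5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 18, 20)
toy <- data.frame(
  Symbol       = rep(c("GENE_A", "GENE_B", "GENE_C"), each = 12),
  Length_in_mm = rep(len, times = 3),
  zScore       = c(len / 5 + rnorm(12, sd = 0.2),    # simulated to track size
                   rnorm(12),                         # simulated with no size effect
                   -len / 5 + rnorm(12, sd = 0.2))    # simulated to oppose size
)

# One correlation test per gene
res <- do.call(rbind, lapply(split(toy, toy$Symbol), function(d) {
  ct <- cor.test(d$zScore, d$Length_in_mm)  # Pearson by default; incomplete pairs are dropped
  data.frame(Symbol = d$Symbol[1], corVal = unname(ct$estimate), p = ct$p.value)
}))

# Benjamini-Hochberg correction across genes
res$p.adj <- p.adjust(res$p, method = "BH")
res[order(res$p.adj), ]
```

The same loop could be run separately for male and female patients by splitting on both gene and sex, though each subgroup would then have fewer specimens per gene.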
Are there other questions we can answer?