Synapse tables with data in them

We have two tables on Synapse that hold cNF patient data:

- Patient Table
- Sample Table

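library(dplyr)  # provides %>%, select(), left_join(), mutate()
# Patient-level table; drop the Synapse bookkeeping columns
pat.tab<-synapser::synTableQuery("SELECT Patient,Race,Gender,Age,Pain,Itching FROM syn7342635")$asDataFrame()%>%
  select(-c(ROW_ID,ROW_VERSION))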

samp.tab<-synapser::synTableQuery("SELECT Patient,TumorNumber,Length_in_mm FROM syn5556216")$asDataFrame()%>%
  select(-c(ROW_ID,ROW_VERSION))
pat.tab
##    Patient     Race Gender Age  Pain Itching
## 1        1    White Female  48 FALSE    TRUE
## 2        2    Asian Female  34 FALSE   FALSE
## 3        3 Hispanic Female  48 FALSE   FALSE
## 4        4    White Female  36 FALSE    TRUE
## 5        5    White   Male  65 FALSE   FALSE
## 6        6    White Female  46 FALSE    TRUE
## 7        8    Black Female  27 FALSE   FALSE
## 8        9    White   Male  35 FALSE   FALSE
## 9       10    White   Male  58 FALSE   FALSE
## 10      11    Black Female  30  TRUE    TRUE
## 11      13    White Female  41  TRUE    TRUE
samp.tab
##    Patient TumorNumber Length_in_mm
## 1        1           3           NA
## 2        1           4           14
## 3        1           6           14
## 4        1           9           15
## 5        1           1           15
## 6        1           2           13
## 7        2           4            7
## 8        2           6            6
## 9        2           7            5
## 10       2          10            7
## 11       2           1            5
## 12       2           2            7
## 13       2           8            9
## 14       2           9            7
## 15       3           1           11
## 16       3           2           16
## 17       3           3           20
## 18       3           4           15
## 19       4           1            8
## 20       4          10            7
## 21       4           4            5
## 22       4           9           10
## 23       5          11           10
## 24       5          12            9
## 25       5          13           11
## 26       5          15            9
## 27       5           5            8
## 28       5           8            6
## 29       6           4           15
## 30       6           5           10
## 31       6           6           15
## 32       6           7           13
## 33       8           1           10
## 34       8           5           13
## 35       8           7           12
## 36       8           9            5
## 37       8           4           10
## 38       8           6           10
## 39       9           1           12
## 40       9           6            5
## 41       9           7            5
## 42       9          10            8
## 43      10           1           11
## 44      10           2           20
## 45      10           3           10
## 46      10           5           20
## 47      10           4            5
## 48      11           1           13
## 49      11           7           10
## 50      11           8           10
## 51      11          14           20
## 52      11           2            5
## 53      11           3            3
## 54      13           1           18
## 55      13           2            7
## 56      13           3           12
## 57      13           7            6
## 58       1          NA           NA
## 59       2          NA           NA
## 60       3          NA           NA
## 61       4          NA           NA
## 62       5          NA           NA
## 63       6          NA           NA
## 64       8          NA           NA
## 65       9          NA           NA
## 66      10          NA           NA
## 67      11          NA           NA
## 68       9           3           13

For the purposes of this analysis we want just the age, sex, tumor size, and a ‘reformed’ patient name (the specimenID, built from the patient and tumor numbers).

full.tab<-samp.tab%>%left_join(pat.tab,by='Patient')%>%
  mutate(specimenID=paste0('patient',Patient,'tumor',TumorNumber))
full.tab
##    Patient TumorNumber Length_in_mm     Race Gender Age  Pain Itching
## 1        1           3           NA    White Female  48 FALSE    TRUE
## 2        1           4           14    White Female  48 FALSE    TRUE
## 3        1           6           14    White Female  48 FALSE    TRUE
## 4        1           9           15    White Female  48 FALSE    TRUE
## 5        1           1           15    White Female  48 FALSE    TRUE
## 6        1           2           13    White Female  48 FALSE    TRUE
## 7        2           4            7    Asian Female  34 FALSE   FALSE
## 8        2           6            6    Asian Female  34 FALSE   FALSE
## 9        2           7            5    Asian Female  34 FALSE   FALSE
## 10       2          10            7    Asian Female  34 FALSE   FALSE
## 11       2           1            5    Asian Female  34 FALSE   FALSE
## 12       2           2            7    Asian Female  34 FALSE   FALSE
## 13       2           8            9    Asian Female  34 FALSE   FALSE
## 14       2           9            7    Asian Female  34 FALSE   FALSE
## 15       3           1           11 Hispanic Female  48 FALSE   FALSE
## 16       3           2           16 Hispanic Female  48 FALSE   FALSE
## 17       3           3           20 Hispanic Female  48 FALSE   FALSE
## 18       3           4           15 Hispanic Female  48 FALSE   FALSE
## 19       4           1            8    White Female  36 FALSE    TRUE
## 20       4          10            7    White Female  36 FALSE    TRUE
## 21       4           4            5    White Female  36 FALSE    TRUE
## 22       4           9           10    White Female  36 FALSE    TRUE
## 23       5          11           10    White   Male  65 FALSE   FALSE
## 24       5          12            9    White   Male  65 FALSE   FALSE
## 25       5          13           11    White   Male  65 FALSE   FALSE
## 26       5          15            9    White   Male  65 FALSE   FALSE
## 27       5           5            8    White   Male  65 FALSE   FALSE
## 28       5           8            6    White   Male  65 FALSE   FALSE
## 29       6           4           15    White Female  46 FALSE    TRUE
## 30       6           5           10    White Female  46 FALSE    TRUE
## 31       6           6           15    White Female  46 FALSE    TRUE
## 32       6           7           13    White Female  46 FALSE    TRUE
## 33       8           1           10    Black Female  27 FALSE   FALSE
## 34       8           5           13    Black Female  27 FALSE   FALSE
## 35       8           7           12    Black Female  27 FALSE   FALSE
## 36       8           9            5    Black Female  27 FALSE   FALSE
## 37       8           4           10    Black Female  27 FALSE   FALSE
## 38       8           6           10    Black Female  27 FALSE   FALSE
## 39       9           1           12    White   Male  35 FALSE   FALSE
## 40       9           6            5    White   Male  35 FALSE   FALSE
## 41       9           7            5    White   Male  35 FALSE   FALSE
## 42       9          10            8    White   Male  35 FALSE   FALSE
## 43      10           1           11    White   Male  58 FALSE   FALSE
## 44      10           2           20    White   Male  58 FALSE   FALSE
## 45      10           3           10    White   Male  58 FALSE   FALSE
## 46      10           5           20    White   Male  58 FALSE   FALSE
## 47      10           4            5    White   Male  58 FALSE   FALSE
## 48      11           1           13    Black Female  30  TRUE    TRUE
## 49      11           7           10    Black Female  30  TRUE    TRUE
## 50      11           8           10    Black Female  30  TRUE    TRUE
## 51      11          14           20    Black Female  30  TRUE    TRUE
## 52      11           2            5    Black Female  30  TRUE    TRUE
## 53      11           3            3    Black Female  30  TRUE    TRUE
## 54      13           1           18    White Female  41  TRUE    TRUE
## 55      13           2            7    White Female  41  TRUE    TRUE
## 56      13           3           12    White Female  41  TRUE    TRUE
## 57      13           7            6    White Female  41  TRUE    TRUE
## 58       1          NA           NA    White Female  48 FALSE    TRUE
## 59       2          NA           NA    Asian Female  34 FALSE   FALSE
## 60       3          NA           NA Hispanic Female  48 FALSE   FALSE
## 61       4          NA           NA    White Female  36 FALSE    TRUE
## 62       5          NA           NA    White   Male  65 FALSE   FALSE
## 63       6          NA           NA    White Female  46 FALSE    TRUE
## 64       8          NA           NA    Black Female  27 FALSE   FALSE
## 65       9          NA           NA    White   Male  35 FALSE   FALSE
## 66      10          NA           NA    White   Male  58 FALSE   FALSE
## 67      11          NA           NA    Black Female  30  TRUE    TRUE
## 68       9           3           13    White   Male  35 FALSE   FALSE
##          specimenID
## 1    patient1tumor3
## 2    patient1tumor4
## 3    patient1tumor6
## 4    patient1tumor9
## 5    patient1tumor1
## 6    patient1tumor2
## 7    patient2tumor4
## 8    patient2tumor6
## 9    patient2tumor7
## 10  patient2tumor10
## 11   patient2tumor1
## 12   patient2tumor2
## 13   patient2tumor8
## 14   patient2tumor9
## 15   patient3tumor1
## 16   patient3tumor2
## 17   patient3tumor3
## 18   patient3tumor4
## 19   patient4tumor1
## 20  patient4tumor10
## 21   patient4tumor4
## 22   patient4tumor9
## 23  patient5tumor11
## 24  patient5tumor12
## 25  patient5tumor13
## 26  patient5tumor15
## 27   patient5tumor5
## 28   patient5tumor8
## 29   patient6tumor4
## 30   patient6tumor5
## 31   patient6tumor6
## 32   patient6tumor7
## 33   patient8tumor1
## 34   patient8tumor5
## 35   patient8tumor7
## 36   patient8tumor9
## 37   patient8tumor4
## 38   patient8tumor6
## 39   patient9tumor1
## 40   patient9tumor6
## 41   patient9tumor7
## 42  patient9tumor10
## 43  patient10tumor1
## 44  patient10tumor2
## 45  patient10tumor3
## 46  patient10tumor5
## 47  patient10tumor4
## 48  patient11tumor1
## 49  patient11tumor7
## 50  patient11tumor8
## 51 patient11tumor14
## 52  patient11tumor2
## 53  patient11tumor3
## 54  patient13tumor1
## 55  patient13tumor2
## 56  patient13tumor3
## 57  patient13tumor7
## 58  patient1tumorNA
## 59  patient2tumorNA
## 60  patient3tumorNA
## 61  patient4tumorNA
## 62  patient5tumorNA
## 63  patient6tumorNA
## 64  patient8tumorNA
## 65  patient9tumorNA
## 66 patient10tumorNA
## 67 patient11tumorNA
## 68   patient9tumor3
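
The rows with a missing TumorNumber end up with IDs such as patient1tumorNA, which will not match a real specimen later on. If those rows are not needed, they could be dropped before the ID is built. Here is a minimal sketch on a small made-up table; `toy.samp`, `toy.pat` and their values are illustrative stand-ins, not data from Synapse.

```r
library(dplyr)

# Toy stand-ins for samp.tab and pat.tab (values are illustrative only)
toy.samp <- data.frame(Patient = c(1, 1, 2, 2),
                       TumorNumber = c(3, NA, 4, NA),
                       Length_in_mm = c(14, NA, 7, NA))
toy.pat <- data.frame(Patient = c(1, 2),
                      Gender = c("Female", "Male"),
                      Age = c(48, 34))

# Drop rows without a tumor number, then join and build the ID as above
toy.full <- toy.samp %>%
  filter(!is.na(TumorNumber)) %>%
  left_join(toy.pat, by = "Patient") %>%
  mutate(specimenID = paste0("patient", Patient, "tumor", TumorNumber))

toy.full$specimenID
```

Whether to drop these rows depends on what the NA entries represent in the Sample Table, which is not stated here.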

With the two tables joined, we can look at how tumor size relates to the patient-level variables.

Plot the data

# Now what do we see at the tissue level?
library(ggplot2)

p<-ggplot(full.tab)+geom_point(aes(x=Age,y=Length_in_mm,color=Itching,shape=Gender))+ggtitle("Age by tumor size with itching")

print(p)
## Warning: Removed 11 rows containing missing values (geom_point).

p<-ggplot(full.tab)+geom_point(aes(x=Age,y=Length_in_mm,color=Itching,shape=Pain))+ggtitle("Age by tumor size with pain")
print(p)
## Warning: Removed 11 rows containing missing values (geom_point).

The pain and itching variables are recorded per patient, not per tumor, so we can’t tell which individual samples are painful or itchy.
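
Because the symptoms are only known per patient, one option is to compare them against a per-patient summary of tumor size instead of individual tumors. The sketch below uses made-up data and assumes the mean length is a reasonable summary; that choice is an assumption, not something established in this analysis.

```r
library(dplyr)

# Toy stand-in for full.tab (values are illustrative only)
toy.full <- data.frame(Patient = c(1, 1, 2, 2, 3, 3),
                       Length_in_mm = c(14, 16, 5, 7, NA, 10),
                       Itching = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE))

# One row per patient: mean tumor length and number of measured tumors
per.patient <- toy.full %>%
  group_by(Patient, Itching) %>%
  summarise(meanLength = mean(Length_in_mm, na.rm = TRUE),
            nTumors = sum(!is.na(Length_in_mm))) %>%
  ungroup()

per.patient
```

With only a handful of patients, any comparison between the itching and non-itching groups would be descriptive at best.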

Get the expression data and see what correlates

exp.data<-synapser::synTableQuery("select * from syn20449214 where tumorType='Cutaneous Neurofibroma'")$asDataFrame()

data.with.var<-exp.data%>%left_join(full.tab,by='specimenID')

# Now compute the correlation between expression (zScore) and tumor size for each gene
gene.cors=data.with.var%>%group_by(Symbol)%>%mutate(corVal=cor(zScore,Length_in_mm))

top.genes=select(gene.cors,corVal)%>%distinct()%>%arrange(desc(corVal))%>%select(Symbol)
## Adding missing grouping variables: `Symbol`
bottom.genes=select(gene.cors,corVal)%>%distinct()%>%arrange(corVal)%>%select(Symbol)
## Adding missing grouping variables: `Symbol`
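
The per-gene correlation can also be written with `summarise()`, which returns one row per gene directly and avoids the `distinct()` step. By default `cor()` returns NA as soon as either vector contains a missing value, so the sketch below passes `use = "complete.obs"`. The gene names and values are made up for illustration.

```r
library(dplyr)

# Toy stand-in for data.with.var (values are illustrative only)
toy.expr <- data.frame(
  Symbol = rep(c("GENE_A", "GENE_B"), each = 4),
  zScore = c(1, 2, 3, 4,   4, 3, 2, 1),
  Length_in_mm = c(5, 10, 15, NA,   5, 10, 15, NA)
)

# One correlation per gene, ignoring samples without a measured length
toy.cors <- toy.expr %>%
  group_by(Symbol) %>%
  summarise(corVal = cor(zScore, Length_in_mm, use = "complete.obs")) %>%
  arrange(desc(corVal))

toy.cors
```

Sorting this single table in descending order puts the most correlated genes at the top and the most anti-correlated genes at the bottom.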

Now we have the genes most correlated with tumor size

top.genes
## # A tibble: 19,098 x 1
## # Groups:   Symbol [19,098]
##    Symbol 
##    <chr>  
##  1 AKAP12 
##  2 SOCS2  
##  3 DAPK2  
##  4 SVEP1  
##  5 RGL1   
##  6 DTX4   
##  7 HOXB3  
##  8 ADAMTS5
##  9 VPS35L 
## 10 ABI3BP 
## # … with 19,088 more rows

and the genes most anti-correlated (most negatively correlated) with tumor size

bottom.genes
## # A tibble: 19,098 x 1
## # Groups:   Symbol [19,098]
##    Symbol 
##    <chr>  
##  1 DAPL1  
##  2 BPNT1  
##  3 GSKIP  
##  4 RPUSD3 
##  5 ZNF707 
##  6 IGHMBP2
##  7 WDR12  
##  8 IPO13  
##  9 MARCH9 
## 10 SP6    
## # … with 19,088 more rows

Now what do these look like?

ggplot(subset(data.with.var,Symbol%in%c(top.genes$Symbol[1:10])))+geom_point(aes(x=Length_in_mm,y=zScore,col=Symbol,shape=sex))+ggtitle('10 most correlated genes')

ggplot(subset(data.with.var,Symbol%in%c(bottom.genes$Symbol[1:10])))+geom_point(aes(x=Length_in_mm,y=zScore,col=Symbol,shape=sex))+ggtitle('10 most anti-correlated genes')

We should run pathway enrichment on these genes to see whether they share any function. The fact that HOXB3 is there is already interesting. What else could there be?

Next steps

I think this is actually pretty cool: we can try to identify which genes are ‘driving’ cNF growth.

* What pathways are enriched in these genes?
* Are these correlations statistically significant?
* Are there differences in correlated genes between male and female patients?
* What are these genes doing in pNFs and MPNSTs?
* Are any of these correlated with immune response (and, conversely, are immune signatures correlated with tumor size)?
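
As a starting point for the significance question, each gene could be tested with `cor.test()` and the p-values adjusted for multiple testing with `p.adjust()`. The sketch below runs on made-up data for three hypothetical genes; Pearson correlation and Benjamini-Hochberg correction are assumptions here, not choices made in the analysis above.

```r
# Toy stand-in for data.with.var (values are illustrative only)
toy.expr <- data.frame(
  Symbol = rep(c("GENE_A", "GENE_B", "GENE_C"), each = 6),
  zScore = c(1.0, 2.1, 2.9, 4.2, 5.0, 6.1,
             6.0, 5.2, 3.9, 3.1, 2.0, 0.8,
             0.3, -0.5, 0.4, -0.2, 0.1, -0.4),
  Length_in_mm = rep(c(5, 8, 10, 13, 15, 20), times = 3)
)

# Run one Pearson correlation test per gene
by.gene <- split(toy.expr, toy.expr$Symbol)
tests <- lapply(by.gene, function(d) cor.test(d$zScore, d$Length_in_mm))

toy.tests <- data.frame(
  Symbol = names(tests),
  corVal = unname(sapply(tests, function(t) t$estimate)),
  pValue = unname(sapply(tests, function(t) t$p.value))
)

# Adjust for testing many genes at once (Benjamini-Hochberg)
toy.tests$pAdj <- p.adjust(toy.tests$pValue, method = "BH")

toy.tests
```

With roughly 19,000 genes and only a few dozen tumors, the adjustment step matters: many raw p-values will look small by chance alone.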

Are there other questions we can answer?