Tables

Synapse Tables enable storage of tabular data in Synapse in a form that can be queried using a SQL-like query language.

A table has a Schema and holds a set of rows conforming to that schema.

A Schema defines a series of Columns of the following types: STRING, DOUBLE, INTEGER, BOOLEAN, DATE, ENTITYID, FILEHANDLEID, LINK, LARGETEXT, USERID.

Preliminaries:
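Before working with tables you need to load synapser, log in, and have a project to hold the table. A minimal setup sketch (the project name is a placeholder; synLogin() assumes cached credentials or a .synapseConfig file):

```r
library(synapser)

# Authenticate against Synapse (uses cached credentials or .synapseConfig)
synLogin()

# A project to hold the table; the name here is a placeholder
project <- synStore(Project("My Tables Project"))
```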

To create a Table, you first need to create a Table Schema. This defines the columns of the table:

cols <- list(
    Column(name = "Name", columnType = "STRING", maximumSize = 20),
    Column(name = "Chromosome", columnType = "STRING", maximumSize = 20),
    Column(name = "Start", columnType = "INTEGER"),
    Column(name = "End", columnType = "INTEGER"),
    Column(name = "Strand", columnType = "STRING", enumValues = list("+", "-"), maximumSize = 1),
    Column(name = "TranscriptionFactor", columnType = "BOOLEAN"))

schema <- Schema(name = "My Favorite Genes", columns = cols, parent = project)

Next, let’s load some data. Let’s say we had a file, genes.csv:
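The original file contents are not shown here; the following sketch creates a comparable genes.csv matching the schema above (all values are made up):

```r
# Illustrative contents for genes.csv; every value is invented
genes <- data.frame(
    Name = c("foo", "arg", "zap", "bax", "bla", "xyz"),
    Chromosome = c("1", "2", "2", "3", "1", "1"),
    Start = c(12345, 20001, 30033, 40444, 51234, 61234),
    End = c(12600, 20200, 30999, 41444, 54567, 68686),
    Strand = c("+", "+", "-", "-", "+", "+"),
    TranscriptionFactor = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE))
write.csv(genes, "genes.csv", row.names = FALSE)
```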

Let’s store that in Synapse:
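Assuming the schema defined above and a genes.csv in the working directory, a single synStore() call creates the table and uploads the rows:

```r
# Store the schema together with the CSV data as a Synapse Table
table <- synStore(Table(schema, "genes.csv"))
```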

The Table() function takes two arguments, a schema object and data in some form, which can be:

  • a path to a CSV file
  • a data frame
  • a RowSet object
  • a list of lists where each of the inner lists is a row
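For instance, the same table could be built from a data frame instead of a file path:

```r
# Read the CSV into a data frame, then pass the data frame to Table()
genes <- read.csv("genes.csv", stringsAsFactors = FALSE)
table <- synStore(Table(schema, genes))
```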

We now have a table populated with data. Let’s try to query:
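A query uses SQL-like text that references the table by its Synapse ID; here tableId is a placeholder for that ID:

```r
# tableId is a placeholder for the table's Synapse ID (e.g. "syn123456")
results <- synTableQuery(sprintf(
    "SELECT * FROM %s WHERE Chromosome = '1'", tableId))
```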

synTableQuery() downloads the data and saves it to a CSV file on disk; the local path is available as results$filepath.

To load the data into an R data.frame:
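The query result converts directly with as.data.frame():

```r
# Convert the downloaded query result into an R data.frame
data <- as.data.frame(results)
```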

Changing Data

Once the schema is settled, changes come in two flavors: appending new rows and updating existing ones.

Changing data in a table requires row IDs and version numbers for each row to be modified (called ROW_ID and ROW_VERSION). We get those by querying before updating. Minimizing change sets to contain only rows that actually change will make processing faster. Appending new rows can be accomplished by leaving values for ROW_ID and ROW_VERSION blank.

Appending new rows is fairly straightforward. To continue the previous example, we might add some new genes:
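A sketch of an append, with invented gene values (tableId is a placeholder for the table's Synapse ID). Because these rows carry no ROW_ID or ROW_VERSION, they are appended rather than updated:

```r
# New rows: no ROW_ID/ROW_VERSION columns, so Synapse appends them
moreGenes <- data.frame(
    Name = c("abc", "def"),
    Chromosome = c("2", "2"),
    Start = c(70123, 80456),
    End = c(70999, 81000),
    Strand = c("+", "-"),
    TranscriptionFactor = c(FALSE, FALSE))
synStore(Table(tableId, moreGenes))
```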

For example, let’s update the names of some of our favorite genes:
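A sketch of an update (the gene names are illustrative): query for the rows to change, then edit the resulting data frame in place. The ROW_ID and ROW_VERSION columns come along from the query:

```r
# Query brings back ROW_ID/ROW_VERSION, which mark these as existing rows
results <- synTableQuery(sprintf(
    "SELECT * FROM %s WHERE Name = 'foo'", tableId))
data <- as.data.frame(results)
data["Name"] <- "foo2"  # edit in place
```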

Let’s save that:
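Assuming the query results were modified in a data frame called data (a placeholder name), storing it applies the update, since the rows still carry ROW_ID and ROW_VERSION:

```r
# Rows with ROW_ID/ROW_VERSION are updated rather than appended
synStore(Table(tableId, data))
```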

Now, query the table again to see your changes:
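For example:

```r
results <- synTableQuery(sprintf("SELECT * FROM %s", tableId))
as.data.frame(results)
```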

One other piece of information required when changing a table is the etag, which the Synapse server uses to prevent concurrent users from making conflicting changes, a technique called optimistic concurrency. The etag is returned by synTableQuery; in the example above you could see it via results$etag, but you should never have to use it directly. In case of a conflict, your update may be rejected, and you then have to query again and retry the update.

Updating Column Type

The column "Notes" has type STRING with maximumSize set to 20, so we cannot add a new row whose "Notes" value is "a very looooooooong note", which has more than 20 characters. Let's change the column type to STRING with maximumSize set to 100:

# getting the existing table metadata and data
originalSchema <- synGet(tableId)
oldQueryResults <- synTableQuery(sprintf("SELECT * FROM %s", tableId))
oldData <- as.data.frame(oldQueryResults)

# remove the existing "Notes" column
# (notesColumn is the Column object for that column, e.g. obtained via synGetColumns)
originalSchema$removeColumn(notesColumn)
## NULL
newSchema <- synStore(originalSchema)

# create a new Column
newCol <- Column(name = "Notes", columnType = "STRING", maximumSize = 100)

# add the new column to the new table
newSchema$addColumn(newCol)
## NULL
newSchema <- synStore(newSchema)

# copy the data over to the new column
newQueryResults <- synTableQuery(sprintf("SELECT * FROM %s", newSchema$properties$id))
newData <- as.data.frame(newQueryResults)
newData["Notes"] <- oldData["Notes"]

# save the change
synStore(Table(tableId, newData))

# add the new data
moreGenes <- data.frame(
    Name = c("not_sure"),
    Chromosome = c(2),
    Start = c(12345),
    End = c(126000),
    Strand = c("+"),
    TranscriptionFactor = c(FALSE),
    Notes = c("a very looooooooong note"))
synStore(Table(tableId, moreGenes))

To access a column that you do not have a reference to, please see:

Table Attached Files

Synapse tables support a special column type, FILEHANDLEID, which contains a file handle: an identifier of a file stored in Synapse. Here's an example of how to upload files into Synapse, associate them with a table, and read them back later:
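A sketch of the upload-and-associate step; the table name, column names, and file path are all placeholders, and the file handle ID is read from the stored File entity's properties:

```r
# A schema with a FILEHANDLEID column named "covers" (names are illustrative)
cols <- list(
    Column(name = "artist", columnType = "STRING", maximumSize = 50),
    Column(name = "covers", columnType = "FILEHANDLEID"))
schema <- Schema(name = "Album Covers", columns = cols, parent = project)

# Upload a file, then reference its file handle ID in the table
# ("cover.jpg" is a placeholder path)
coverFile <- synStore(File(path = "cover.jpg", parent = project))
data <- data.frame(
    artist = "Some Artist",
    covers = coverFile$properties$dataFileHandleId)
newTable <- synStore(Table(schema, data))
```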

To download attached files in a table:

result <- synTableQuery(sprintf("select * from %s", newTable$schema$properties$id))
data <- synDownloadTableColumns(result, columns = c("covers"))

Set Annotations

A table schema is a Synapse entity, and annotations on tables work the same way as annotations on any other entity type.

To set annotations on a table, use synSetAnnotations() on the schema:
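For example, with invented annotation keys and values:

```r
# Annotations are arbitrary key-value pairs; these are placeholders
synSetAnnotations(schema, annotations = list(
    study = "gene regulation",
    organism = "human"))
```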

To view annotations on a table, retrieve the schema:
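For example (tableId is a placeholder for the table's Synapse ID):

```r
# Fetch the current annotations for the schema entity
synGetAnnotations(tableId)
```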

Please visit synapser vignettes to read more about annotations.

Deleting Table

Deleting the schema deletes the whole table and all rows:
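For example:

```r
# Deleting the schema entity removes the table and all of its rows
synDelete(schema)
```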

Queries

The query language is quite similar to SQL select statements, except that joins are not supported. The documentation for the Synapse API has lots of query examples.
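An illustrative query against the genes table from earlier (tableId is a placeholder; Chromosome is a STRING column, so its value is quoted):

```r
# Select, filter, and limit, much like a SQL SELECT without joins
results <- synTableQuery(sprintf(
    "SELECT Name, Chromosome FROM %s WHERE Chromosome = '2' LIMIT 5",
    tableId))
as.data.frame(results)
```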

For more details see the native reference documentation, e.g.: