Overview

The synapser package provides an interface to Synapse, a collaborative workspace for reproducible, data-intensive research projects that supports:

  • integrated presentation of data, code and text
  • fine-grained access control
  • provenance tracking

The synapser package lets you communicate with the Synapse platform to create collaborative data analysis projects and access data using the R programming language. Other Synapse clients exist for Python, Java, and the web browser.

If you’re just getting started with Synapse, have a look at the Getting Started guides for Synapse.

Good example projects are:

Installation

synapser is available as a ready-built package for Microsoft Windows and Mac OSX. For Linux systems, it is available to install from source. It can be installed or upgraded using the standard install.packages() command, adding the Sage Bionetworks R Archive Network (RAN) to the repository list, e.g.:

install.packages("synapser", repos = c("http://ran.synapse.org", "http://cran.fhcrc.org"))

Alternatively, edit your ~/.Rprofile and configure your default repositories:

options(repos = c("http://ran.synapse.org", "http://cran.fhcrc.org"))

after which you may run install.packages without specifying the repositories:

install.packages("synapser")

For a detailed installation guide, see the installation vignette. If you run into problems, refer to the Troubleshooting guide.
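
As a rough sketch, a setup script can install synapser only when it is missing. The helper name ensure_installed and its installer argument are illustrative and not part of synapser; the repository URLs are the ones given above.

```r
# Repositories from the installation instructions above.
synapse_repos <- c("http://ran.synapse.org", "http://cran.fhcrc.org")

# Hypothetical helper: install `pkg` only if it is not already available.
# `installer` is injectable so the logic can be exercised without a network.
ensure_installed <- function(pkg, repos = synapse_repos,
                             installer = utils::install.packages) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  installer(pkg, repos = repos)
  invisible(TRUE)             # an installation was attempted
}

# Usage (contacts the repositories only if synapser is missing):
# ensure_installed("synapser")
```

The function returns FALSE when the package was already present and TRUE when an installation was attempted, so a script can report what happened.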

Connecting to Synapse

To use Synapse, you’ll need to register for an account. The Synapse website can authenticate using a Google account. If you authenticate using a Google account, you’ll need to create a personal access token to log in to Synapse through the programmatic clients. See the Manage Synapse Credentials vignette for more information.

Once that’s done, you’ll be able to load the library and log in:

library(synapser)
## 
## TERMS OF USE NOTICE:
##   When using Synapse, remember that the terms and conditions of use require that you:
##   1) Attribute data contributors when discussing these data or results from these data.
##   2) Not discriminate, identify, or recontact individuals or groups represented by the data.
##   3) Use and contribute only data de-identified to HIPAA standards.
##   4) Redistribute data only under these same terms of use.
synLogin()
## NULL

For more ways to manage your Synapse credentials, please see the Manage Synapse Credentials vignette, and the native reference documentation:

?synLogin
?synLogout
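
If you log in with a personal access token, one option is to keep the token out of your scripts and read it from an environment variable. The sketch below is only an illustration: the helper login_from_env is hypothetical, and both the variable name SYNAPSE_AUTH_TOKEN and the authToken argument of synLogin() are assumptions you should check against ?synLogin for your installed version.

```r
# Hypothetical helper: read a personal access token from an environment
# variable and pass it to a login function.
# Assumptions: the variable name SYNAPSE_AUTH_TOKEN and the `authToken`
# argument of synLogin(); check ?synLogin for your installed version.
login_from_env <- function(var = "SYNAPSE_AUTH_TOKEN",
                           login = function(...) synapser::synLogin(...)) {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop(sprintf("Environment variable %s is not set", var))
  }
  login(authToken = token)
  invisible(TRUE)
}

# Usage (requires synapser and a valid token in the environment):
# login_from_env()
```

Failing early with a clear message when the variable is unset avoids a less obvious authentication error later on.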

Accessing Data

To make the example below print useful information, we prepare a file:

# use hex_digits to generate random string
hex_digits <- c(as.character(0:9), letters[1:6])
projectName <- sprintf("My unique project %s", paste0(sample(hex_digits, 32, replace = TRUE), collapse = ""))
project <- Project(projectName)
project <- synStore(project)

# Create some files
filePath <- tempfile()
connection <- file(filePath)
writeChar("a \t b \t c \n d \t e \t f \n", connection, eos = NULL)
close(connection)
file <- File(path = filePath, parent = project)
# Add a version comment
file$properties$versionComment <- 'Some sort of comment about the new version of the file.'
file <- synStore(file)
synId <- file$properties$id

Synapse identifiers are used to refer to projects and data which are represented by entity objects. For example, the entity above represents a tab-delimited file containing a 2 by 3 matrix. Getting the entity retrieves an object that holds metadata describing the matrix, and also downloads the file to a local cache:

fileEntity <- synGet(synId)

View the entity’s metadata in the R console:

print(fileEntity)
## File(id='syn53468047', etag='92ba143c-226b-421a-9ac3-3992742831b1', modifiedBy='3324230', versionLabel='1', isLatestVersion=True, name='file8d8159001777', versionComment='Some sort of comment about the new version of the file.', synapseStore=True, _file_handle={'id': '132873000', 'etag': '276042e9-0ec1-4bd7-8dfe-3edc5d00b7a9', 'createdBy': '3324230', 'createdOn': '2024-02-04T19:31:46.000Z', 'modifiedOn': '2024-02-04T19:31:46.000Z', 'concreteType': 'org.sagebionetworks.repo.model.file.S3FileHandle', 'contentType': 'application/octet-stream', 'contentMd5': '8465d33d9f407ef250ce519e92f300fb', 'fileName': 'file8d8159001777', 'storageLocationId': 1, 'contentSize': 23, 'status': 'AVAILABLE', 'bucketName': 'proddata.sagebase.org', 'key': '3324230/2e56b45e-7227-4624-ba7b-8c1f043b8bbd/file8d8159001777', 'isPreview': False, 'externalURL': None}, versionNumber=1, cacheDir='/var/folders/67/ghxb0p_j4r5gjj95l9z3502c0000gq/T/RtmpxF8ZnS', path='/var/folders/67/ghxb0p_j4r5gjj95l9z3502c0000gq/T/RtmpxF8ZnS/file8d8159001777', createdBy='3324230', createdOn='2024-02-04T19:31:46.346Z', parentId='syn53468046', dataFileHandleId='132873000', files=['file8d8159001777'], concreteType='org.sagebionetworks.repo.model.FileEntity', modifiedOn='2024-02-04T19:31:46.346Z')

This is one simple way to read in a small matrix (we load just the first few rows):

read.table(fileEntity$path, nrows = 2)
##   V1 V2 V3
## 1  a  b  c
## 2  d  e  f
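
If you want an actual matrix rather than a data frame, a minimal sketch is shown below. It recreates the same file content locally so that it runs without a Synapse connection; with a real entity you would pass fileEntity$path instead of the temporary path.

```r
# Recreate the example file locally (same content as above).
path <- tempfile()
cat("a \t b \t c \n d \t e \t f \n", file = path)

# Split on tabs only and strip the padding spaces around each field.
df <- read.table(path, sep = "\t", header = FALSE,
                 strip.white = TRUE, stringsAsFactors = FALSE)

# Convert the data frame to a character matrix and check its shape.
m <- as.matrix(df)
dim(m)
```

Setting sep explicitly means that fields containing spaces would not be split, which the default whitespace separator cannot guarantee.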

View the entity in the browser:

synOnweb(synId)

Download Location

By default, files are downloaded to the Synapse cache. To download a file somewhere else, specify the downloadLocation parameter:

entity <- synGet("syn00123", downloadLocation = "/path/to/folder")
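
A small wrapper can make sure the target folder exists before downloading. This is only a sketch: download_to is a hypothetical helper, and its get argument exists purely so the logic can be tried without a Synapse session.

```r
# Hypothetical helper: make sure the target folder exists, then download.
# `get` defaults to synapser::synGet and is injectable for offline testing.
download_to <- function(syn_id, dir,
                        get = function(...) synapser::synGet(...)) {
  if (!dir.exists(dir)) {
    dir.create(dir, recursive = TRUE)
  }
  get(syn_id, downloadLocation = dir)
}

# Usage (requires a logged-in session; "syn00123" is the placeholder ID above):
# entity <- download_to("syn00123", file.path(tempdir(), "synapse-downloads"))
# entity$path
```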

For more details see the native reference documentation, e.g.:

?synGet
?synOnweb

Organizing Data in a Project

You can create your own projects and upload your own data sets. Synapse stores entities in a hierarchical or tree structure. Projects are at the top level and must be uniquely named:

project <- Project(projectName)
project <- synStore(project)

Creating a folder:

dataFolder <- Folder("Data", parent = project)
dataFolder <- synStore(dataFolder)

Adding files to the project:

filePath <- tempfile()
connection <- file(filePath)
writeChar("this is the content of the file", connection, eos = NULL)
close(connection)
file <- File(path = filePath, parent = dataFolder)
file <- synStore(file)

You can print the properties of an entity (such as the file we just created):

file$properties
## Dict (13 items)

Most properties are immutable, but you can change an entity’s name:

file$properties$name <- "different name"

Update Synapse with the change:

file <- synStore(file)
file$properties
## Dict (13 items)

You can list all children of an entity:

children <- synGetChildren(project$properties$id)
as.list(children)
## [[1]]
## [[1]]$name
## [1] "Data"
## 
## [[1]]$id
## [1] "syn53468048"
## 
## [[1]]$type
## [1] "org.sagebionetworks.repo.model.Folder"
## 
## [[1]]$versionNumber
## [1] 1
## 
## [[1]]$versionLabel
## [1] "1"
## 
## [[1]]$isLatestVersion
## [1] TRUE
## 
## [[1]]$benefactorId
## [1] 53468046
## 
## [[1]]$createdOn
## [1] "2024-02-04T19:31:47.353Z"
## 
## [[1]]$modifiedOn
## [1] "2024-02-04T19:31:47.353Z"
## 
## [[1]]$createdBy
## [1] "3324230"
## 
## [[1]]$modifiedBy
## [1] "3324230"
## 
## 
## [[2]]
## [[2]]$name
## [1] "file8d8159001777"
## 
## [[2]]$id
## [1] "syn53468047"
## 
## [[2]]$type
## [1] "org.sagebionetworks.repo.model.FileEntity"
## 
## [[2]]$versionNumber
## [1] 1
## 
## [[2]]$versionLabel
## [1] "1"
## 
## [[2]]$isLatestVersion
## [1] TRUE
## 
## [[2]]$benefactorId
## [1] 53468046
## 
## [[2]]$createdOn
## [1] "2024-02-04T19:31:46.346Z"
## 
## [[2]]$modifiedOn
## [1] "2024-02-04T19:31:46.346Z"
## 
## [[2]]$createdBy
## [1] "3324230"
## 
## [[2]]$modifiedBy
## [1] "3324230"

You can also filter by type:

filesAndFolders <- synGetChildren(project$properties$id, includeTypes = c("file", "folder"))
as.list(filesAndFolders)
## [[1]]
## [[1]]$name
## [1] "Data"
## 
## [[1]]$id
## [1] "syn53468048"
## 
## [[1]]$type
## [1] "org.sagebionetworks.repo.model.Folder"
## 
## [[1]]$versionNumber
## [1] 1
## 
## [[1]]$versionLabel
## [1] "1"
## 
## [[1]]$isLatestVersion
## [1] TRUE
## 
## [[1]]$benefactorId
## [1] 53468046
## 
## [[1]]$createdOn
## [1] "2024-02-04T19:31:47.353Z"
## 
## [[1]]$modifiedOn
## [1] "2024-02-04T19:31:47.353Z"
## 
## [[1]]$createdBy
## [1] "3324230"
## 
## [[1]]$modifiedBy
## [1] "3324230"
## 
## 
## [[2]]
## [[2]]$name
## [1] "file8d8159001777"
## 
## [[2]]$id
## [1] "syn53468047"
## 
## [[2]]$type
## [1] "org.sagebionetworks.repo.model.FileEntity"
## 
## [[2]]$versionNumber
## [1] 1
## 
## [[2]]$versionLabel
## [1] "1"
## 
## [[2]]$isLatestVersion
## [1] TRUE
## 
## [[2]]$benefactorId
## [1] 53468046
## 
## [[2]]$createdOn
## [1] "2024-02-04T19:31:46.346Z"
## 
## [[2]]$modifiedOn
## [1] "2024-02-04T19:31:46.346Z"
## 
## [[2]]$createdBy
## [1] "3324230"
## 
## [[2]]$modifiedBy
## [1] "3324230"
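
The nested list output is easier to scan as a table. The sketch below flattens child records into a data frame; children_to_df is a hypothetical helper, and the mock records simply mirror the printed output above so that the example runs offline. With real data you would pass as.list(children) instead.

```r
# Hypothetical helper: flatten a list of child records into a data frame.
# Each record is assumed to be a named list with name, id and type fields,
# as in the printed output above.
children_to_df <- function(children) {
  rows <- lapply(children, function(child) {
    data.frame(name = child$name, id = child$id, type = child$type,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Mock records shaped like the output above (no Synapse connection needed).
mock_children <- list(
  list(name = "Data", id = "syn53468048",
       type = "org.sagebionetworks.repo.model.Folder"),
  list(name = "file8d8159001777", id = "syn53468047",
       type = "org.sagebionetworks.repo.model.FileEntity")
)

df <- children_to_df(mock_children)

# Select the IDs of the file entities only.
file_ids <- df[grepl("FileEntity$", df$type), "id"]
file_ids
```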

You can avoid reading all children into memory at once by iterating through one at a time:

children <- synGetChildren(project$properties$id)
tryCatch({
  while (TRUE) {
    child <- nextElem(children)
    print(child)
  }
}, error = function(e) {
    print("Reached end of list.")
})
## $name
## [1] "Data"
## 
## $id
## [1] "syn53468048"
## 
## $type
## [1] "org.sagebionetworks.repo.model.Folder"
## 
## $versionNumber
## [1] 1
## 
## $versionLabel
## [1] "1"
## 
## $isLatestVersion
## [1] TRUE
## 
## $benefactorId
## [1] 53468046
## 
## $createdOn
## [1] "2024-02-04T19:31:47.353Z"
## 
## $modifiedOn
## [1] "2024-02-04T19:31:47.353Z"
## 
## $createdBy
## [1] "3324230"
## 
## $modifiedBy
## [1] "3324230"
## 
## $name
## [1] "file8d8159001777"
## 
## $id
## [1] "syn53468047"
## 
## $type
## [1] "org.sagebionetworks.repo.model.FileEntity"
## 
## $versionNumber
## [1] 1
## 
## $versionLabel
## [1] "1"
## 
## $isLatestVersion
## [1] TRUE
## 
## $benefactorId
## [1] 53468046
## 
## $createdOn
## [1] "2024-02-04T19:31:46.346Z"
## 
## $modifiedOn
## [1] "2024-02-04T19:31:46.346Z"
## 
## $createdBy
## [1] "3324230"
## 
## $modifiedBy
## [1] "3324230"
## 
## [1] "Reached end of list."

You can move files to a different parent:

newFolder <- Folder("New Parent", parent = project)
newFolder <- synStore(newFolder)

file <- synMove(file, newFolder)

Content can be deleted:

synDelete(file)
## NULL

Deletion of a project will also delete its contents, in this case the folder:

folderId <- dataFolder$properties$id
synDelete(project)
## NULL
tryCatch(
  synGet(folderId),
  error = function(e) {
    message(sprintf("Retrieving a deleted folder causes: %s", as.character(e)))
  }
)
## Retrieving a deleted folder causes: Error in value[[3L]](cond): 404 Client Error: 
## Entity syn53468048 is in trash can.

In addition to simple data storage, Synapse entities can be annotated with key/value metadata, described in markdown documents (wikis), and linked together in provenance graphs to create a reproducible record of a data analysis pipeline.

For more details see the native reference documentation, e.g.:

?Project
?Folder
?File
?Link
?synStore

Annotating Synapse Entities

# (We use a time stamp just to help ensure uniqueness.)
projectName <- sprintf("My unique project created on %s", format(Sys.time(), "%a %b %d %H%M%OS4 %Y"))
project <- Project(projectName)
# This will erase all existing annotations
project$annotations <- list(annotationName = "annotationValue")
project <- synStore(project)
project <- synGet(project$properties$id)
project$annotations
## {
##   "annotationName": [
##     "annotationValue"
##   ]
## }
synGetAnnotations(project)
## $annotationName
## [1] "annotationValue"

Provenance

Synapse provides tools for tracking ‘provenance’, or the transformation of raw data into processed results, by linking derived data objects to source data and the code used to perform the transformation.

The Activity object represents the source of a data set or the data processing steps used to produce it. Using W3C provenance ontology terms, a result is generated by a combination of data and code which are either used or executed.

Creating an activity object:

act <- Activity(
  name = "clustering",
  description = "whizzy clustering",
  used = c("syn1234", "syn1235"),
  executed = "syn4567")

Here, syn1234 and syn1235 might be two types of measurements on a common set of samples. Some whizzy clustering code might be referred to by syn4567.

Alternatively, you can build an activity up piecemeal:

act <- Activity(name = "clustering", description = "whizzy clustering")
act$used(c("syn1234", "syn1235"))
act$executed("syn4567")

The used and executed arguments can reference either Synapse entities or URLs.

Entity examples:

  act$used("syn12345")
  act$used(project)
  act$used(target = "syn12345", targetVersion = 2)

URL examples:

  act$used("http://mydomain.com/my/awesome/data.RData")
  act$used(url = "http://mydomain.com/my/awesome/data.RData", name = "Awesome Data")
  act$used(url = "https://github.com/joe_hacker/code_repo", name = "Gnarly hacks", wasExecuted = TRUE)
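
When you record provenance for many inputs at once, it can help to check beforehand which references look like Synapse IDs and which look like URLs. The sketch below is an illustration only: classify_reference is a hypothetical helper and its patterns are assumptions, not the validation the client itself performs.

```r
# Hypothetical helper: classify provenance references as Synapse IDs or URLs.
# The patterns are illustrative assumptions, not the client's own validation.
classify_reference <- function(x) {
  ifelse(grepl("^syn[0-9]+$", x), "entity",
         ifelse(grepl("^https?://", x), "url", "unknown"))
}

refs <- c("syn12345",
          "http://mydomain.com/my/awesome/data.RData",
          "not-a-reference")
kinds <- classify_reference(refs)
kinds
```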

Storing entities with provenance

The activity can be passed in when storing an Entity to set the Entity’s provenance:

project <- synGet(project$properties$id)
project <- synStore(project, activity = act)

We’ve now recorded that ‘project’ is the output of syn4567 applied to the data stored in syn1234 and syn1235.

Recording data source

The synStore() function has shortcuts for specifying the used and executed lists directly. For example, when storing a data entity, it’s a good idea to record its source:

project <- synStore(
  project,
  activityName = "data-r-us",
  activityDescription = "downloaded from data-r-us",
  used = "http://data-r-us.com/excellent/data.xyz")

For more information:

?Activity
?synDeleteProvenance

Tables

Tables can be built up by adding sets of rows that follow a user-defined schema and queried using a SQL-like syntax. Please visit the Table vignettes for more information.

Wikis

Wiki pages can be attached to a Synapse entity (e.g. a project, folder, or file). Text and graphics can be composed in markdown and rendered in the web view of the object.

Creating a Wiki

project <- synGet(project$properties$id)
content <- "
# My Wiki Page

Here is a description of my **fantastic** project!
"
# attachment
filePath <- tempfile()
connection <- file(filePath)
writeChar("this is the content of the file", connection, eos = NULL)
close(connection)

wiki <- Wiki(owner = project,
             title = "My Wiki Page",
             markdown = content,
             attachments = list(filePath))
wiki <- synStore(wiki)

Updating a Wiki

project <- synGet(project$properties$id)
wiki <- synGetWiki(project)
wiki$markdown <- "
# My Wiki Page

Here is a description of my **fantastic** project! Let's
*emphasize* the important stuff.
"

wiki <- synStore(wiki)

For more information:

?Wiki
?synGetWiki

Evaluations

An Evaluation is a Synapse construct useful for building processing pipelines and for scoring predictive modeling and data analysis challenges.

Creating an Evaluation:

eval <- Evaluation(
  name = sprintf("My unique evaluation created on %s", format(Sys.time(), "%a %b %d %H%M%OS4 %Y")),
  description = "testing",
  contentSource = project$properties$id,
  submissionReceiptMessage = "Thank you for your submission!",
  submissionInstructionsMessage = "This evaluation only accepts files.")
eval <- synStore(eval)

Retrieving the created Evaluation:

eval <- synGetEvaluation(eval$id)
eval
## {
##   "contentSource": "syn53468051",
##   "createdOn": "2024-02-04T19:32:00.421Z",
##   "description": "testing",
##   "etag": "771e6dce-6c45-43c0-b3eb-4adc29621a54",
##   "id": "9615529",
##   "name": "My unique evaluation created on Sun Feb 04 113200.2679 2024",
##   "ownerId": "3324230",
##   "submissionInstructionsMessage": "This evaluation only accepts files.",
##   "submissionReceiptMessage": "Thank you for your submission!"
## }

Submitting a file to an existing Evaluation:

# first create a file to submit
filePath <- tempfile()
connection <- file(filePath)
writeChar("this is my first submission", connection, eos = NULL)
close(connection)
file <- File(path = filePath, parent = project)
file <- synStore(file)
# submit the created file
submission <- synSubmit(eval, file)

List submissions:

submissions <- synGetSubmissionBundles(eval)
as.list(submissions)
## [[1]]
## [[1]][[1]]
## {
##   "contributors": [
##     {
##       "createdOn": "2024-02-04T19:32:02.879Z",
##       "principalId": "3324230"
##     }
##   ],
##   "createdOn": "2024-02-04T19:32:02.879Z",
##   "entityBundleJSON": "{\"entity\":{\"name\":\"file8d815491b24e\",\"id\":\"syn53468052\",\"etag\":\"05ef3cbb-ca67-424b-b3b4-0499c4f02ec1\",\"createdOn\":\"2024-02-04T19:32:02.287Z\",\"modifiedOn\":\"2024-02-04T19:32:02.287Z\",\"createdBy\":\"3324230\",\"modifiedBy\":\"3324230\",\"parentId\":\"syn53468051\",\"concreteType\":\"org.sagebionetworks.repo.model.FileEntity\",\"versionNumber\":1,\"versionLabel\":\"1\",\"isLatestVersion\":true,\"dataFileHandleId\":\"132873005\"},\"entityType\":\"file\",\"annotations\":{\"id\":\"syn53468052\",\"etag\":\"00000000-0000-0000-0000-000000000000\",\"annotations\":{}},\"fileHandles\":[{\"id\":\"132873005\",\"etag\":\"0b2f433b-cee8-4f56-8715-9b58f0cd90ec\",\"createdBy\":\"3324230\",\"createdOn\":\"2024-02-04T19:32:02.000Z\",\"modifiedOn\":\"2024-02-04T19:32:02.000Z\",\"concreteType\":\"org.sagebionetworks.repo.model.file.S3FileHandle\",\"contentType\":\"application/octet-stream\",\"contentMd5\":\"3f466b7f85d184292a68cea1c4f7cfc2\",\"fileName\":\"file8d815491b24e\",\"storageLocationId\":1,\"contentSize\":27,\"status\":\"AVAILABLE\",\"bucketName\":\"proddata.sagebase.org\",\"key\":\"3324230/a5ce1154-9c2a-4302-a8d3-a54da0f28cb3/file8d815491b24e\",\"isPreview\":false}]}",
##   "entityId": "syn53468052",
##   "evaluationId": "9615529",
##   "id": "9742381",
##   "name": "file8d815491b24e",
##   "userId": "3324230",
##   "versionNumber": 1
## }
## 
## [[1]][[2]]
## {
##   "entityId": "syn53468052",
##   "etag": "cebe7e14-1704-4230-89c2-1b2fea5815d2",
##   "id": "9742381",
##   "modifiedOn": "2024-02-04T19:32:02.879Z",
##   "status": "RECEIVED",
##   "statusVersion": 0,
##   "submissionAnnotations": {},
##   "versionNumber": 1
## }

Retrieving submission by id:

# Not evaluating this section because of SYNPY-235
submission <- synGetSubmission(submission$id)
submission

Retrieving the submission status:

submissionStatus <- synGetSubmissionStatus(submission)
submissionStatus
## {
##   "entityId": "syn53468052",
##   "etag": "cebe7e14-1704-4230-89c2-1b2fea5815d2",
##   "id": "9742381",
##   "modifiedOn": "2024-02-04T19:32:02.879Z",
##   "status": "RECEIVED",
##   "statusVersion": 0,
##   "submissionAnnotations": {},
##   "versionNumber": 1
## }

To view the annotations:

submissionStatus$submissionAnnotations
## {}

To update an annotation:

submissionStatus$annotations["doubleAnnos"] <- list(c("rank" = 3))
synStore(submissionStatus)

Query an evaluation:

queryString <- sprintf("query=select * from evaluation_%s LIMIT %s OFFSET %s", eval$id, 10, 0)
synRestGET(paste0("/evaluation/submission/query?", URLencode(queryString)))
## $headers
## list()
## 
## $rows
## list()
## 
## $totalNumberOfResults
## [1] 0
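
If you run several such queries, the request path can be assembled by a small helper. This is a sketch under the assumption that the endpoint and query syntax are exactly those used above; build_eval_query is a hypothetical name, and the evaluation ID is the one from the example output.

```r
# Hypothetical helper: build the request path for an evaluation query.
build_eval_query <- function(eval_id, limit = 10, offset = 0) {
  query <- sprintf("select * from evaluation_%s LIMIT %d OFFSET %d",
                   eval_id, as.integer(limit), as.integer(offset))
  # URLencode() escapes the spaces and leaves "*" as it is.
  paste0("/evaluation/submission/query?query=", URLencode(query))
}

path <- build_eval_query("9615529")
path

# With a logged-in session: synRestGET(path)
```

Keeping limit and offset as arguments makes it straightforward to page through results by increasing offset in steps of limit.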

To learn more about writing an evaluation query, please see: http://docs.synapse.org/rest/GET/evaluation/submission/query.html

For more information, please see:

?synGetEvaluation
?synSubmit
?synGetSubmissionBundles
?synGetSubmission
?synGetSubmissionStatus

Sharing Access to Content

By default, data sets in Synapse are private to your user account, but they can easily be shared with specific users, groups, or the public.

Retrieve the sharing setting on an entity:

synGetAcl(project, principal_id = "273950")
## list()

The first time an entity is shared, an ACL object is created for that entity. Let’s make project public:

acl <- synSetPermissions(project, principalId = 273949, accessType = list("READ"))
acl
## $id
## [1] "syn53468051"
## 
## $creationDate
## [1] "2024-02-04T19:31:53.162Z"
## 
## $etag
## [1] "fcd01110-b0b7-4f7b-9fa2-630b4fa3b072"
## 
## $resourceAccess
## $resourceAccess[[1]]
## $resourceAccess[[1]]$principalId
## [1] 273949
## 
## $resourceAccess[[1]]$accessType
## [1] "READ"
## 
## 
## $resourceAccess[[2]]
## $resourceAccess[[2]]$principalId
## [1] 3324230
## 
## $resourceAccess[[2]]$accessType
## [1] "CHANGE_SETTINGS"    "CREATE"             "MODERATE"          
## [4] "DELETE"             "UPDATE"             "CHANGE_PERMISSIONS"
## [7] "READ"               "DOWNLOAD"
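
The nested ACL can be summarized with one row per principal. The sketch below uses a mock list that mirrors the printed result above, so it runs without a Synapse connection; acl_to_df is a hypothetical helper and the canDownload column is just one example of a derived flag.

```r
# Mock ACL shaped like the printed synSetPermissions() result above.
acl <- list(
  id = "syn53468051",
  resourceAccess = list(
    list(principalId = 273949, accessType = "READ"),
    list(principalId = 3324230,
         accessType = c("CHANGE_SETTINGS", "CREATE", "MODERATE", "DELETE",
                        "UPDATE", "CHANGE_PERMISSIONS", "READ", "DOWNLOAD"))
  )
)

# Hypothetical helper: one row per principal, access types joined by commas.
acl_to_df <- function(acl) {
  rows <- lapply(acl$resourceAccess, function(ra) {
    data.frame(principalId = ra$principalId,
               accessType = paste(ra$accessType, collapse = ", "),
               canDownload = "DOWNLOAD" %in% ra$accessType,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

acl_df <- acl_to_df(acl)
acl_df
```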

Now public can read:

synGetAcl(project, principal_id = 273950)
## [1] "READ"

synGetPermissions returns a more human-readable view of an entity’s permissions:

permissions <- synGetPermissions(project)
permissions$can_view
## [1] TRUE

For more information:

?synGetAcl
?synSetPermissions
?synGetPermissions

Finally, delete the project to clean up:

synDelete(project)
## NULL

File Views

A file view is defined by its scope. It lets you query the FileEntity objects within that scope using a SQL-like syntax. Please visit the Views vignettes for more information.

Accessing the API Directly

The following functions provide direct access to the Synapse REST(ish) API, taking care of details like endpoints and authentication. See the REST API documentation.

?synRestGET
?synRestPOST
?synRestPUT
?synRestDELETE

Synapse Utilities

We provide some utility functions in the synapserutils package:

  • Copy Files, Folders, Tables, Links, Projects, and Wiki Pages.
  • Upload data to Synapse in bulk.
  • Download data from Synapse in bulk.

Please visit the synapserutils GitHub repository for instructions on how to download it.

More information

For more information see the Synapse User Guide.