Download Data Programmatically from Portals | Synapse Documentation

Download Data Programmatically from Portals

Portals allow you to explore data stored in Synapse, a technology platform that allows researchers to aggregate, organize, analyze and share scientific data, code and insights. Synapse is designed to integrate with your analytical workflow, so you can download data with the R client, the command line client or the Python client.

All entities in Synapse are automatically assigned a globally unique identifier used for reference with the format syn12345678. Often abbreviated to “synID”, the ID of an object never changes, even if the name does. You will use a synID to locate the files you wish to download.
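When synIDs are copied by hand from a portal page into a script, a quick format check can catch typos before any download starts. The sketch below assumes only what is stated above, namely that a synID is the prefix syn followed by digits; the helper names is_syn_id and extract_syn_ids are hypothetical and are not part of any Synapse client.

```python
import re
from typing import List

# Assumed pattern: the literal prefix "syn" followed by one or more digits.
SYN_ID_PATTERN = re.compile(r"syn\d+")


def is_syn_id(value: str) -> bool:
    """Return True if value looks like a Synapse ID such as syn12345678."""
    return SYN_ID_PATTERN.fullmatch(value.strip()) is not None


def extract_syn_ids(text: str) -> List[str]:
    """Return every synID-like token found in free text, in order of appearance."""
    return re.findall(r"\bsyn\d+\b", text)
```

A check like this only confirms the shape of the identifier. It cannot tell whether the entity exists or whether you have access to it.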

Find Files using Explore

Search the available files via Explore Data or Explore Files in the navigation bar. The Explore section presents several ways to select data files of interest. The top of the page displays pie charts that summarize the number of files by annotation, including Assay and Tissue, among others. Selecting a chart segment filters the table below to the matching subset of files. Alternatively, apply filters using the facet selection boxes to the left of the table. For this example, you will download the processed data and metadata from the MC-CAA study in the Alzheimer’s Disease (AD) Knowledge Portal.

Download Files

Command Line

The Synapse command line client can be used to download all data and file annotations with a single command.

The command line client is installed with the Synapse Python client, so Python 3 is required. Log in to Synapse. If you are working on your personal computer, you may store your credentials locally by including the --rememberMe argument, which allows automatic authentication in future Synapse interactions. Storing credentials this way helps you avoid accidentally sharing your password when you share analytical code. In almost all cases, your Synapse API key is more secure than your password, so logging in with the API key is recommended.

synapse login -u <Synapse username> -p <API key> --rememberMe

From Explore Data in the portal, select the Download Options icon and then Programmatic Options to view the command that downloads the data subset.

The command synapse get with the -q argument downloads every file in the portal that meets the specified condition. In this example, all processed and metadata files from the MC-CAA study will be downloaded. Execute the following command from the directory where you would like to store the files.

synapse get -q "SELECT * FROM syn11346063 WHERE ( ( \"study\" = 'MC-CAA' ) AND ( \"dataSubtype\" = 'processed' OR \"dataSubtype\" = 'metadata' ) )"
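If you need the same kind of query for a different study or set of data subtypes, the string passed to -q can be assembled in a script rather than edited by hand. The following is a minimal sketch that reproduces the shape of the query shown above. The function name build_portal_query is hypothetical, and the sketch simply rejects values containing a single quote instead of assuming an escaping rule.

```python
def build_portal_query(table_id, study, data_subtypes):
    """Build a query string shaped like the one the portal generates."""
    if not data_subtypes:
        raise ValueError("at least one dataSubtype is required")
    for value in [study, *data_subtypes]:
        # No escaping convention is assumed here, so quoted values are refused.
        if "'" in value:
            raise ValueError("values must not contain single quotes: " + value)
    subtype_clause = " OR ".join(
        '"dataSubtype" = ' + "'" + subtype + "'" for subtype in data_subtypes
    )
    return (
        "SELECT * FROM " + table_id
        + ' WHERE ( ( "study" = ' + "'" + study + "'" + " ) AND ( "
        + subtype_clause + " ) )"
    )
```

Calling build_portal_query("syn11346063", "MC-CAA", ["processed", "metadata"]) yields the query text used in this example, which you can then quote for your shell and pass to synapse get -q.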

After the download completes, your working directory also contains a SYNAPSE_TABLE_QUERY_###.csv file that lists the annotations associated with each downloaded file. These annotations include experimental details relevant to how the data was processed, as well as details about the file itself, such as the file version number.
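To inspect these annotations in a script, the CSV can be read with Python's standard library. The sketch below assumes only the file naming described above (SYNAPSE_TABLE_QUERY_ followed by a number and .csv). The function name read_query_manifests is hypothetical, and the available columns depend on the annotations in the portal you queried.

```python
import csv
import glob
import os


def read_query_manifests(directory="."):
    """Return one dict per row from every SYNAPSE_TABLE_QUERY_*.csv in directory."""
    rows = []
    pattern = os.path.join(directory, "SYNAPSE_TABLE_QUERY_*.csv")
    # Sort the matches so the result order is stable across runs.
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows
```

Each returned row is a dictionary keyed by column header, so you can look up whichever annotation columns are present in your file.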

R

In order to download data programmatically with R, you need a list of synIDs that correspond to the files. For downloading a large set of files, we recommend using the Synapse Python client. The Python client has been optimized for multi-threaded download and will provide you with faster download speeds.

Once you have identified the files you want to download from Explore Data, select Export Table from Download Options. The exported table includes the annotations associated with each file.

You may choose to download the file as a .csv or .tsv. Files are named Job-#### (where # is a long set of numbers). Move this file to your working directory to proceed with the following steps.

Install the Synapse R client, synapser, to download data from Synapse. Log in to Synapse with your API key.

library(synapser)
synLogin("my_username", "api_key")

Read the exported table into R, replacing Job-#### with the complete filename of the downloaded table. Create a directory to store the files, then download the data using synGet. If downloadLocation is not specified, the files are downloaded to a hidden directory called ~/.synapseCache.

exported_table <- read.csv("Job-####.csv")
dir.create("files")
lapply(exported_table$id, synGet, downloadLocation = "./files")

The annotations in exported_table include experimental details relevant to how the data was processed.

Python

To download data programmatically with Python, you need a list of synIDs that correspond to the files.

Once you have identified the files you want to download from Explore Data, select Export Table from Download Options. The exported table includes the annotations associated with each file.

You may choose to download the file as a .csv or .tsv. Files are named Job-#### (where #### is a long set of numbers). Move this file to your working directory to proceed with the following steps.

Install the Synapse Python client, synapseclient, to download data from Synapse, and the pandas library to read a csv file. The os module, used here to make a directory, is part of the Python standard library and needs no installation. Log in to Synapse with your API key.

import synapseclient, pandas, os
syn = synapseclient.Synapse()

syn.login('my_username', 'api_key')
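Because a key typed directly into a script is easy to share by accident along with the code, one option is to read the credentials from environment variables instead. This is a minimal sketch, not a feature of the Synapse client: the variable names SYNAPSE_USERNAME and SYNAPSE_API_KEY and the helper credentials_from_env are hypothetical names chosen for illustration.

```python
import os


def credentials_from_env(env=None):
    """Return (username, api_key) read from environment variables.

    SYNAPSE_USERNAME and SYNAPSE_API_KEY are hypothetical names chosen for
    this sketch; they are not defined by the Synapse clients.
    """
    env = os.environ if env is None else env
    username = env.get("SYNAPSE_USERNAME")
    api_key = env.get("SYNAPSE_API_KEY")
    if not username or not api_key:
        raise RuntimeError("Set SYNAPSE_USERNAME and SYNAPSE_API_KEY before running")
    return username, api_key


# Usage, mirroring the login call above:
#   syn.login(*credentials_from_env())
```

The optional env argument exists only so the function can be exercised with a plain dictionary instead of the real environment.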

Read the exported table into Python, replacing Job-#### with the complete filename of the downloaded table. Create a directory to store the files, then download the data using syn.get. If downloadLocation is not specified, the files are downloaded to a hidden directory called ~/.synapseCache.

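exported_table = pandas.read_csv("Job-####.csv")
# Create the target directory; do nothing if it already exists
os.makedirs("files", exist_ok=True)
# Download each file listed in the exported table
for syn_id in exported_table.id:
    syn.get(syn_id, downloadLocation="./files")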

The annotations in exported_table include experimental details relevant to how the data was processed.
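If you repeat this download step for several exported tables, it may help to wrap it in a small function. The sketch below assumes a logged-in client object exposing get with a downloadLocation argument, as used above. The function name download_files is hypothetical, and skipping repeated synIDs is a design choice of this sketch rather than client behavior.

```python
import os


def download_files(syn, syn_ids, download_location="./files"):
    """Download each synID with syn.get and return a dict of synID -> entity."""
    # Create the target directory if needed; an existing one is left untouched.
    os.makedirs(download_location, exist_ok=True)
    downloaded = {}
    for syn_id in syn_ids:
        if syn_id in downloaded:
            continue  # skip IDs that appear more than once in the table
        downloaded[syn_id] = syn.get(syn_id, downloadLocation=download_location)
    return downloaded
```

With the variables from the steps above, a call such as download_files(syn, exported_table.id) would fetch every file listed in the exported table into ./files.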

Need More Help?

Try posting a question to our Forum.

Let us know what was unclear or what has not been covered. Reader feedback is key to making the documentation better, so please post in the Forum or open an issue in our GitHub repository (Sage-Bionetworks/synapseDocs).