Using AWS Security Token Service (STS), Synapse can securely grant you temporary AWS credentials to access data directly in S3. This can be useful if you want to:
All of which you can now do with minimal overhead from Synapse.
There are a few important considerations when deciding whether to enable STS on Synapse managed storage or on external storage in your own S3 bucket. With Synapse managed storage, STS access is read-only: once STS is enabled, you can download the data or compute on it directly, but you cannot modify it in S3. With external storage, you can grant either read-only or read-write permissions, so the data can be manipulated directly, for example with the AWS command line interface. You can then update the connections in Synapse if the data changes. In some cases, this workflow is preferable.
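The permission difference above decides which `permission` value you can pass when requesting credentials (the request path appears in the credentials example later in this article). The sketch below encodes that rule in a hypothetical helper, `sts_request_path`, so a script fails early on an unsupported combination. The `"synapse"` and `"external"` labels are illustrative only and are not part of any Synapse API; Synapse enforces permissions on its own side regardless.

```python
# Hypothetical table of STS permissions per storage type, following the
# read-only (Synapse managed) vs read-only/read-write (external) rule.
ALLOWED_STS_PERMISSIONS = {
    "synapse": {"read_only"},
    "external": {"read_only", "read_write"},
}


def sts_request_path(folder_id: str, storage: str, permission: str) -> str:
    """Build the REST path for requesting STS credentials on a Folder.

    Raises ValueError if the storage type is unknown or the permission
    is not available for it.
    """
    allowed = ALLOWED_STS_PERMISSIONS.get(storage)
    if allowed is None:
        raise ValueError(f"unknown storage type: {storage!r}")
    if permission not in allowed:
        raise ValueError(f"{permission!r} is not available on {storage} storage")
    return f"/entity/{folder_id}/sts?permission={permission}"
```

For example, `sts_request_path("syn12345", "external", "read_write")` returns a path you could pass to `syn.restGET`, while asking for `read_write` on Synapse managed storage raises immediately.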
You can create an STS storage location using Synapse storage. Temporary S3 credentials will grant you access to the Files and Folders scoped to your STS storage location.
To set up the STS storage location using Synapse storage, first make sure you have an empty Synapse Folder. Note that you will need write access to that Folder. Then run the following code:
```python
# Set storage location
import json

import synapseclient

syn = synapseclient.login()
FOLDER = 'syn12345'  # your empty Synapse Folder

# Create an STS-enabled storage location in Synapse managed storage
destination = {
    'uploadType': 'S3',
    'stsEnabled': True,
    'concreteType': 'org.sagebionetworks.repo.model.project.S3StorageLocationSetting',
}
destination = syn.restPOST('/storageLocation', body=json.dumps(destination))

# Make that storage location the Folder's upload destination
project_destination = {
    'concreteType': 'org.sagebionetworks.repo.model.project.UploadDestinationListSetting',
    'settingsType': 'upload',
    'locations': [destination['storageLocationId']],
    'projectId': FOLDER,  # the Folder ID goes in 'projectId'
}
project_destination = syn.restPOST('/projectSettings', body=json.dumps(project_destination))
```
```r
# Set storage location
library(synapser)
library(rjson)

synLogin()
folderId <- 'syn12345'  # your empty Synapse Folder

# Create an STS-enabled storage location in Synapse managed storage
destination <- list(uploadType = 'S3',
                    stsEnabled = TRUE,
                    concreteType = 'org.sagebionetworks.repo.model.project.S3StorageLocationSetting')
destination <- synRestPOST('/storageLocation', body = toJSON(destination))

# Make that storage location the Folder's upload destination
projectDestination <- list(concreteType = 'org.sagebionetworks.repo.model.project.UploadDestinationListSetting',
                           settingsType = 'upload')
projectDestination$locations <- list(destination$storageLocationId)
projectDestination$projectId <- folderId
projectDestination <- synRestPOST('/projectSettings', body = toJSON(projectDestination))
```
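The setup above always follows the same two steps: create an STS-enabled storage location, then make it the Folder's upload destination. As a sketch, the request bodies can be produced by small helpers that cover both the Synapse managed case and the external-bucket case described next. The function names are hypothetical, and treating `baseKey` as optional is an assumption; the `syn` argument only needs a `restPOST(uri, body=...)` method like the one used above.

```python
import json
from typing import Optional

S3_SETTING = "org.sagebionetworks.repo.model.project.S3StorageLocationSetting"
EXTERNAL_S3_SETTING = "org.sagebionetworks.repo.model.project.ExternalS3StorageLocationSetting"
UPLOAD_LIST_SETTING = "org.sagebionetworks.repo.model.project.UploadDestinationListSetting"


def sts_storage_location_body(bucket: Optional[str] = None,
                              base_key: Optional[str] = None) -> dict:
    """Body for POST /storageLocation: Synapse managed if no bucket is given."""
    body = {"uploadType": "S3", "stsEnabled": True}
    if bucket is None:
        body["concreteType"] = S3_SETTING
    else:
        body["concreteType"] = EXTERNAL_S3_SETTING
        body["bucket"] = bucket
        if base_key:  # assumption: baseKey may be omitted
            body["baseKey"] = base_key
    return body


def upload_destination_body(folder_id: str, storage_location_id: int) -> dict:
    """Body for POST /projectSettings binding the location to the Folder."""
    return {
        "concreteType": UPLOAD_LIST_SETTING,
        "settingsType": "upload",
        "locations": [storage_location_id],
        "projectId": folder_id,  # the Folder ID goes in "projectId"
    }


def create_sts_folder_setting(syn, folder_id: str,
                              bucket: Optional[str] = None,
                              base_key: Optional[str] = None) -> dict:
    """Run both REST calls and return the resulting project setting."""
    location = syn.restPOST(
        "/storageLocation",
        body=json.dumps(sts_storage_location_body(bucket, base_key)))
    return syn.restPOST(
        "/projectSettings",
        body=json.dumps(upload_destination_body(folder_id, location["storageLocationId"])))
```

With a logged-in client, `create_sts_folder_setting(syn, 'syn12345')` mirrors the Synapse managed example, and passing `bucket` and `base_key` mirrors the external-bucket example.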
Once the Synapse managed STS storage location is set up, you can upload files through the Synapse website or the Synapse client of your choice.
You can also create an STS storage location in an external AWS S3 bucket.
Creating connections to Synapse from an external bucket has several benefits. If you already have data stored in S3, or if you have large amounts of data that you want to transfer with the AWS command line interface, you can avoid uploading data to Synapse managed storage by creating connections directly to the S3 bucket. Enabling an STS storage location on the external bucket also lets you access the data in S3 directly for future computing.
Follow the steps in the Custom Storage Locations article to set read-write or read-only permissions on your external S3 bucket and to enable cross-origin resource sharing (CORS). You may use AWS CloudFormation for the setup.
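If you prefer to script the CORS step rather than use the AWS console or CloudFormation, boto3's `put_bucket_cors` can apply the rule. The rule values below are an assumption modeled on a permissive browser-upload setup; check the Custom Storage Locations article for the exact values Synapse currently recommends. The helper name `apply_cors` is hypothetical.

```python
# Assumed CORS rule: verify against the Custom Storage Locations article.
SYNAPSE_CORS_CONFIGURATION = {
    "CORSRules": [
        {
            "AllowedOrigins": ["*"],
            "AllowedMethods": ["GET", "POST", "PUT", "HEAD"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }
    ]
}


def apply_cors(s3_client, bucket: str) -> None:
    """Attach the CORS configuration to the bucket.

    The caller's AWS identity needs the s3:PutBucketCORS permission.
    """
    s3_client.put_bucket_cors(Bucket=bucket,
                              CORSConfiguration=SYNAPSE_CORS_CONFIGURATION)
```

Usage, with your own long-lived AWS credentials rather than STS credentials: `apply_cors(boto3.client("s3"), "nameofyourbucket")`.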
Again, you will need an empty Synapse Folder, and you will need write access to the Synapse Folder.
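Because the STS setup requires an empty Folder, a script can check this before making any changes. A minimal sketch, assuming `syn` is a logged-in `synapseclient.Synapse` object whose `getChildren` method yields one dict per child entity; the helper name `assert_empty_folder` is hypothetical, and it only checks the child types that `getChildren` returns by default.

```python
def assert_empty_folder(syn, folder_id: str) -> None:
    """Raise ValueError if the Folder already contains child entities."""
    # Only the first child is needed to know the Folder is not empty.
    first = next(iter(syn.getChildren(folder_id)), None)
    if first is not None:
        raise ValueError(
            f"{folder_id} is not empty (found {first.get('name')!r})")
```

Call `assert_empty_folder(syn, FOLDER)` before running the setup code below.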
Instead of setting the S3 bucket as the upload location as described in that article, complete the setup by running the following code on your Synapse Folder:
```python
# Set storage location
import json

import synapseclient

syn = synapseclient.login()
FOLDER = 'syn12345'  # your empty Synapse Folder

# Create an STS-enabled storage location that points at your own bucket
destination = {
    'uploadType': 'S3',
    'stsEnabled': True,
    'bucket': 'nameofyourbucket',
    'baseKey': 'nameofyourbasekey',
    'concreteType': 'org.sagebionetworks.repo.model.project.ExternalS3StorageLocationSetting',
}
destination = syn.restPOST('/storageLocation', body=json.dumps(destination))

# Make that storage location the Folder's upload destination
project_destination = {
    'concreteType': 'org.sagebionetworks.repo.model.project.UploadDestinationListSetting',
    'settingsType': 'upload',
    'locations': [destination['storageLocationId']],
    'projectId': FOLDER,  # the Folder ID goes in 'projectId'
}
project_destination = syn.restPOST('/projectSettings', body=json.dumps(project_destination))
```
```r
# Set storage location
library(synapser)
library(rjson)

synLogin()
folderId <- 'syn12345'  # your empty Synapse Folder

# Create an STS-enabled storage location that points at your own bucket
destination <- list(uploadType = 'S3',
                    stsEnabled = TRUE,
                    bucket = 'nameofyourbucket',
                    baseKey = 'nameofyourbasekey',
                    concreteType = 'org.sagebionetworks.repo.model.project.ExternalS3StorageLocationSetting')
destination <- synRestPOST('/storageLocation', body = toJSON(destination))

# Make that storage location the Folder's upload destination
projectDestination <- list(concreteType = 'org.sagebionetworks.repo.model.project.UploadDestinationListSetting',
                           settingsType = 'upload')
projectDestination$locations <- list(destination$storageLocationId)
projectDestination$projectId <- folderId
projectDestination <- synRestPOST('/projectSettings', body = toJSON(projectDestination))
```
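To confirm that the setup took effect, you can read both settings back. This sketch assumes the Synapse REST endpoints `GET /storageLocation/{id}` and `GET /projectSettings/{id}/type/upload` behave as we understand them (the latter accepting a Folder ID, as the POST above does); the helper name `sts_folder_is_configured` is hypothetical.

```python
def sts_folder_is_configured(syn, folder_id: str, storage_location_id: int) -> bool:
    """Return True if the Folder uploads to the given STS-enabled location."""
    location = syn.restGET(f"/storageLocation/{storage_location_id}")
    if location.get("stsEnabled") is not True:
        return False
    # The upload setting lists the storage location IDs bound to the Folder.
    setting = syn.restGET(f"/projectSettings/{folder_id}/type/upload")
    return storage_location_id in setting.get("locations", [])
```

After running the Python setup code, `sts_folder_is_configured(syn, FOLDER, destination['storageLocationId'])` should return `True`.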
Once your STS storage location is set up on your Synapse Folder, you can add files through the Synapse website or the Synapse client of your choice. If you plan to upload files directly to your S3 bucket, or if you already have files in your S3 bucket, you can add representations of those files to Synapse programmatically. Follow the steps to add files in your S3 bucket to Synapse.
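Adding a representation of an existing S3 object takes two calls: create an S3 file handle that describes the object, then create a Synapse File entity that points at that handle. The sketch below builds the file handle body and computes the MD5 in chunks so large files need not fit in memory. The helper names are hypothetical; the request shape (`POST /externalFileHandle/s3` on the file handle endpoint, then `synapseclient.File(parentId=..., dataFileHandleId=...)`) follows the pattern in the Synapse documentation for adding S3 files. For an STS storage location, the object key should sit under the location's `baseKey`.

```python
import hashlib
import json

S3_FILE_HANDLE = "org.sagebionetworks.repo.model.file.S3FileHandle"


def md5_hex(chunks) -> str:
    """MD5 hex digest over an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def s3_file_handle_body(bucket: str, key: str, storage_location_id: int,
                        content_md5: str, content_size: int,
                        content_type: str = "application/octet-stream") -> dict:
    """Describe an object already in S3 so Synapse can create a file handle."""
    return {
        "concreteType": S3_FILE_HANDLE,
        "fileName": key.rsplit("/", 1)[-1],
        "contentSize": content_size,
        "contentType": content_type,
        "contentMd5": content_md5,
        "bucketName": bucket,
        "key": key,  # keep under the storage location's baseKey
        "storageLocationId": storage_location_id,
    }


def register_s3_object(syn, folder_id: str, body: dict):
    """Create the file handle, then a File entity in the Folder that uses it."""
    import synapseclient  # imported here so the helpers above work without it

    handle = syn.restPOST("/externalFileHandle/s3", body=json.dumps(body),
                          endpoint=syn.fileHandleEndpoint)
    return syn.store(synapseclient.File(parentId=folder_id,
                                        dataFileHandleId=handle["id"]))
```

If you still have the local copy, `md5_hex(iter(lambda: f.read(1 << 20), b""))` over an open binary file `f` gives the checksum without reading the whole file at once.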
Once your STS storage location is set up, you can use Synapse to request temporary AWS credentials to access your data in S3 directly. These temporary credentials are active for 12 hours.
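Because the credentials expire after 12 hours, a long-running job may want to check the remaining lifetime before each batch of S3 calls and request a fresh set when it runs low. This sketch assumes the credentials response includes an ISO 8601 `expiration` timestamp in UTC; the helper names and the 15-minute margin are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_expiration(expiration: str) -> datetime:
    """Parse e.g. "2024-01-01T12:00:00.000Z" into an aware UTC datetime."""
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
    return datetime.fromisoformat(expiration.replace("Z", "+00:00"))


def needs_refresh(sts_credentials: dict,
                  now: Optional[datetime] = None,
                  margin: timedelta = timedelta(minutes=15)) -> bool:
    """True if the credentials expire within `margin` of `now`."""
    now = now or datetime.now(timezone.utc)
    return parse_expiration(sts_credentials["expiration"]) - now <= margin
```

When `needs_refresh(sts_credentials)` is true, request new credentials and rebuild the S3 client before continuing.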
Java and Python examples for requesting temporary credentials are provided below. You can also request them through the REST interface.
```java
import com.amazonaws.auth.*;
import com.amazonaws.services.s3.*;
import org.sagebionetworks.repo.model.sts.*;
// Assumes synapseClient is a logged-in SynapseClient and folderEntityId is your STS-enabled Folder's ID
StsCredentials stsCredentials = synapseClient.getTemporaryCredentialsForEntity(folderEntityId, StsPermission.read_only);
AWSCredentials awsCredentials = new BasicSessionCredentials(stsCredentials.getAccessKeyId(), stsCredentials.getSecretAccessKey(), stsCredentials.getSessionToken());
AWSCredentialsProvider awsCredentialsProvider = new AWSStaticCredentialsProvider(awsCredentials);
AmazonS3 s3Client = AmazonS3ClientBuilder.standard().withCredentials(awsCredentialsProvider).build();
```
```python
import boto3
import synapseclient

syn = synapseclient.login()

FOLDER = 'syn12345'   # your STS-enabled Folder
FILE_ID = 'syn67890'  # hypothetical ID of a File stored in that Folder

# Request read-only temporary AWS credentials scoped to the Folder
sts_credentials = syn.restGET(f"/entity/{FOLDER}/sts?permission=read_only")

# Build an S3 client from the temporary credentials
client = boto3.client(
    's3',
    aws_access_key_id=sts_credentials['accessKeyId'],
    aws_secret_access_key=sts_credentials['secretAccessKey'],
    aws_session_token=sts_credentials['sessionToken'],
)

# Fetch only the File's metadata from Synapse; the bytes come straight from S3
ent = syn.get(FILE_ID, downloadFile=False)
client.download_file(
    ent._file_handle['bucketName'],  # name of the bucket that stores the file
    ent._file_handle['key'],         # path (key) of the file in that bucket
    ent.name,                        # local path to download to
)
```
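On external storage with read-write access, the same credentials can also be used to write objects, but only under the storage location's base key. This sketch assumes the credentials response includes `bucket` and `baseKey` fields alongside the keys used above; the helper names `boto3_s3_kwargs` and `scoped_key` are hypothetical.

```python
import posixpath


def boto3_s3_kwargs(sts_credentials: dict) -> dict:
    """Map a Synapse STS credentials response to boto3 client arguments."""
    return {
        "aws_access_key_id": sts_credentials["accessKeyId"],
        "aws_secret_access_key": sts_credentials["secretAccessKey"],
        "aws_session_token": sts_credentials["sessionToken"],
    }


def scoped_key(sts_credentials: dict, relative_path: str) -> str:
    """Place an object key under baseKey, the prefix the credentials cover."""
    base_key = sts_credentials.get("baseKey", "")
    return posixpath.join(base_key, relative_path.lstrip("/"))


# Usage sketch (read_write credentials on an external STS storage location):
# creds = syn.restGET(f"/entity/{FOLDER}/sts?permission=read_write")
# client = boto3.client("s3", **boto3_s3_kwargs(creds))
# client.upload_file("results.csv", creds["bucket"], scoped_key(creds, "results.csv"))
```

An object written this way is not visible in Synapse until you add a representation of it, as described earlier for files uploaded directly to your S3 bucket.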
If you have existing data in an S3 bucket, either stand-alone data or data from a previous Synapse Project, you can use this sample code to migrate your S3 data to a new Synapse Folder with an STS storage location.
If your Synapse Project uses data from multiple S3 buckets, or if the data is in an S3 bucket you don't own, you may need to download the data and re-upload it to a new Synapse Folder with an STS storage location. Use this sample code to migrate the data in your Project.
Try posting a question to our Forum.
Let us know what was unclear or what has not been covered. Reader feedback is key to making the documentation better, so please let us know or open an issue in our Github repository (Sage-Bionetworks/synapseDocs).