io_data

Module: io_data

Input/output functions to read and write Synapse tables.

Authors:

Copyright 2015, Sage Bionetworks (http://sagebase.org), Apache v2.0 License

Functions

mhealthx.io_data.concatenate_tables_to_synapse_table(tables, synapse_project_id, table_name, username='', password='')

Concatenate multiple table files or dataframes and store as a Synapse table.

The tables are joined column-wise: the row indices of the original files or DataFrames are reused, so the output keeps the same rows and gains columns.

Parameters

tables : list of strings or pandas DataFrames
DataFrames or paths to files
synapse_project_id : string
Synapse ID for project to which output is to be written
table_name : string
schema name of table
username : string
Synapse username (only needed once on a given machine)
password : string
Synapse password (only needed once on a given machine)

Returns

table_data : pandas DataFrame
output table data
table_name : string
schema name of table
synapse_table_id : string
Synapse ID for table
synapse_project_id : string
Synapse ID for project

Examples

>>> import pandas as pd
>>> from mhealthx.io_data import concatenate_tables_to_synapse_table
>>> # Example with DataFrames:
>>> df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
...                     'B': ['B0', 'B1', 'B2', 'B3'],
...                     'C': ['C0', 'C1', 'C2', 'C3'],
...                     'D': ['D0', 'D1', 'D2', 'D3']},
...                    index=[0, 1, 2, 3])
>>> df2 = pd.DataFrame({'E': ['A4', 'A5', 'A6', 'A7'],
...                     'F': ['B4', 'B5', 'B6', 'B7'],
...                     'G': ['C4', 'C5', 'C6', 'C7'],
...                     'H': ['D4', 'D5', 'D6', 'D7']},
...                    index=[0, 1, 2, 3])
>>> tables = [df1, df2]
>>>
>>> # Example with csv files (alternative to the DataFrames above):
>>> tables = ['~/csv/test1.csv', '~/csv/test2.csv']
>>>
>>> synapse_project_id = 'syn4899451'
>>> table_name = 'Test to join tables'
>>> username = ''
>>> password = ''
>>> table_data, table_name, synapse_table_id, synapse_project_id = \
...     concatenate_tables_to_synapse_table(tables, synapse_project_id,
...                                         table_name, username, password)
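
The column-wise join described for this function can be tried locally, without contacting Synapse. The sketch below assumes the join behaves like pandas' `concat` along the column axis; the actual implementation may differ. `concat_columns` is a hypothetical helper name, and only DataFrame inputs are handled here.

```python
import pandas as pd


def concat_columns(tables):
    """Hypothetical helper: join DataFrames side by side on their index."""
    # axis=1 appends columns; rows are aligned on the index labels.
    return pd.concat(tables, axis=1)


df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
df2 = pd.DataFrame({'E': ['A4', 'A5'], 'F': ['B4', 'B5']}, index=[0, 1])

table_data = concat_columns([df1, df2])
print(table_data.shape)  # (2, 4): same two rows, four columns
```

Because both inputs share the index `[0, 1]`, the result keeps two rows and carries the columns of both tables.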

mhealthx.io_data.copy_synapse_table(synapse_table_id, synapse_project_id, table_name='', remove_columns=[], username='', password='')

Copy Synapse table to another Synapse project.

Parameters

synapse_table_id : string
Synapse ID for table to copy
synapse_project_id : string
copy table to project with this Synapse ID
table_name : string
schema name of table
remove_columns : list of strings
column headers for columns to be removed
username : string
Synapse username (only needed once on a given machine)
password : string
Synapse password (only needed once on a given machine)

Returns

table_data : pandas DataFrame
Synapse table contents
table_name : string
schema name of table
synapse_project_id : string
Synapse ID for project within which table is to be written

Examples

>>> from mhealthx.io_data import copy_synapse_table
>>> synapse_table_id = 'syn4590865'
>>> synapse_project_id = 'syn4899451'
>>> table_name = 'Copy of ' + synapse_table_id
>>> remove_columns = ['audio_audio.m4a', 'audio_countdown.m4a']
>>> username = ''
>>> password = ''
>>> table_data, table_name, synapse_project_id = \
...     copy_synapse_table(synapse_table_id, synapse_project_id,
...                        table_name, remove_columns, username, password)
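
The effect of `remove_columns` can be illustrated on a local DataFrame. This is a minimal sketch that assumes the listed headers are simply dropped before the copy is written; `drop_columns` and the `recordId` column are hypothetical, and the values are made up.

```python
import pandas as pd


def drop_columns(table_data, remove_columns):
    """Hypothetical helper: return a copy without the listed columns."""
    # errors='ignore' skips headers that are not present in the table.
    return table_data.drop(columns=remove_columns, errors='ignore')


# Made-up contents standing in for a downloaded Synapse table.
table_data = pd.DataFrame({'recordId': ['r1', 'r2'],
                           'audio_audio.m4a': [101, 102],
                           'audio_countdown.m4a': [201, 202]})

trimmed = drop_columns(table_data,
                       ['audio_audio.m4a', 'audio_countdown.m4a'])
print(list(trimmed.columns))  # ['recordId']
```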

mhealthx.io_data.files_to_synapse_table(in_files, synapse_project_id, table_name, column_name='fileID', username='', password='')

Upload files and file handle IDs to Synapse.

Parameters

in_files : list of strings
paths to files to upload to Synapse
synapse_project_id : string
Synapse ID for project to which table is to be written
table_name : string
schema name of table
column_name : string
header for column of fileIDs
username : string
Synapse username (only needed once on a given machine)
password : string
Synapse password (only needed once on a given machine)

Returns

synapse_project_id : string
Synapse ID for project

Examples

>>> from mhealthx.io_data import files_to_synapse_table
>>> in_files = ['/Users/arno/Local/wav/test1.wav']
>>> synapse_project_id = 'syn4899451'
>>> table_name = 'Test to store files and file handle IDs'
>>> column_name = 'fileID1'
>>> username = ''
>>> password = ''
>>> table_data, synapse_project_id = \
...     files_to_synapse_table(in_files, synapse_project_id, table_name,
...                            column_name, username, password)
>>> #column_name = 'fileID2'
>>> #in_files = ['/Users/arno/Local/wav/test2.wav']
>>> #table_data, synapse_project_id = \
>>> #    files_to_synapse_table(in_files, synapse_project_id, table_name,
>>> #                           column_name, username, password)

mhealthx.io_data.read_synapse_table_files(synapse_table_id, column_names=[], download_limit=None, out_path='.', username='', password='')

Read data from a Synapse table. If column_names is specified, also download the files referenced in those columns.

Parameters

synapse_table_id : string
Synapse ID for table
column_names : list of strings
column headers for columns with fileIDs (needed only if files are to be downloaded)
download_limit : int
limit file downloads to this number of rows (None = all rows)
out_path : string
output path in which to store the files downloaded from the column_names columns
username : string
Synapse username (only needed once on a given machine)
password : string
Synapse password (only needed once on a given machine)

Returns

table_data : pandas DataFrame
Synapse table contents
downloaded_files : list of lists of strings
files from Synapse table column(s) (full paths to downloaded files)

Examples

>>> from mhealthx.io_data import read_synapse_table_files
>>> synapse_table_id = 'syn4590865' #'syn4907789'
>>> column_names = ['audio_audio.m4a', 'audio_countdown.m4a']
>>> download_limit = 3  # None = download files from all rows
>>> out_path = '.'
>>> username = ''
>>> password = ''
>>> table_data, downloaded_files = \
...     read_synapse_table_files(synapse_table_id, column_names,
...                              download_limit, out_path, username, password)
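
How `download_limit` and `column_names` might interact can be sketched on a local DataFrame. The code below only selects which fileIDs would be fetched and downloads nothing. It assumes the limit keeps the first rows and that the result holds one inner list per column; both are assumptions, `select_file_ids` is a hypothetical helper, and the fileIDs are made up.

```python
import pandas as pd


def select_file_ids(table_data, column_names, download_limit=None):
    """Hypothetical helper: pick the fileIDs that would be downloaded."""
    # None means "use every row"; otherwise keep only the first rows.
    if download_limit is None:
        limited = table_data
    else:
        limited = table_data.head(download_limit)
    # One inner list per requested column (assumed layout of the result).
    return [limited[name].tolist() for name in column_names]


# Made-up fileIDs standing in for a Synapse table.
table_data = pd.DataFrame({'audio_audio.m4a': [101, 102, 103, 104],
                           'audio_countdown.m4a': [201, 202, 203, 204]})

file_ids = select_file_ids(table_data, ['audio_audio.m4a'],
                           download_limit=3)
print(file_ids)  # [[101, 102, 103]]
```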

mhealthx.io_data.write_synapse_table(table_data, synapse_project_id, table_name='', username='', password='')

Write data to a Synapse table.

Parameters

table_data : pandas DataFrame
Synapse table contents
synapse_project_id : string
Synapse ID for project within which table is to be written
table_name : string
schema name of table
username : string
Synapse username (only needed once on a given machine)
password : string
Synapse password (only needed once on a given machine)

Examples

>>> from mhealthx.io_data import read_synapse_table_files, write_synapse_table
>>> in_synapse_table_id = 'syn4590865'
>>> synapse_project_id = 'syn4899451'
>>> column_names = []
>>> download_limit = None
>>> out_path = '.'
>>> username = ''
>>> password = ''
>>> table_data, files = \
...     read_synapse_table_files(in_synapse_table_id, column_names,
...                              download_limit, out_path, username, password)
>>> table_name = 'Contents of ' + in_synapse_table_id
>>> write_synapse_table(table_data, synapse_project_id, table_name,
...                     username, password)