io_data
Module: io_data
Input/output functions to read and write Synapse tables.
- Authors:
- Arno Klein, 2015 (arno@sagebase.org) http://binarybottle.com
Copyright 2015, Sage Bionetworks (http://sagebase.org), Apache v2.0 License
Functions
- mhealthx.io_data.concatenate_tables_to_synapse_table(tables, synapse_project_id, table_name, username='', password='')
Concatenate multiple table files or dataframes and store as a Synapse table.
The tables are joined column-wise on the indices of the original files or DataFrames, so the result gains columns rather than rows.
Parameters:
- tables : list of strings or pandas DataFrames
  DataFrames or paths to files
- synapse_project_id : string
  Synapse ID for project to which output is to be written
- table_name : string
  schema name of table
- username : string
  Synapse username (only needed once on a given machine)
- password : string
  Synapse password (only needed once on a given machine)
Returns:
- table_data : Pandas DataFrame
  output table data
- table_name : string
  schema name of table
- synapse_table_id : string
  Synapse ID for table
- synapse_project_id : string
  Synapse ID for project
Examples:
>>> import pandas as pd
>>> from mhealthx.io_data import concatenate_tables_to_synapse_table
>>> # Example with DataFrames:
>>> df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
...                     'B': ['B0', 'B1', 'B2', 'B3'],
...                     'C': ['C0', 'C1', 'C2', 'C3'],
...                     'D': ['D0', 'D1', 'D2', 'D3']},
...                    index=[0, 1, 2, 3])
>>> df2 = pd.DataFrame({'E': ['A4', 'A5', 'A6', 'A7'],
...                     'F': ['B4', 'B5', 'B6', 'B7'],
...                     'G': ['C4', 'C5', 'C6', 'C7'],
...                     'H': ['D4', 'D5', 'D6', 'D7']},
...                    index=[0, 1, 2, 3])
>>> tables = [df1, df2]
>>> # Example with csv files (use instead of the DataFrames above):
>>> tables = ['~/csv/test1.csv', '~/csv/test2.csv']
>>> synapse_project_id = 'syn4899451'
>>> table_name = 'Test to join tables'
>>> username = ''
>>> password = ''
>>> table_data, table_name, synapse_table_id, synapse_project_id = concatenate_tables_to_synapse_table(
...     tables, synapse_project_id, table_name, username, password)
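The column-wise join described for this function can be previewed locally with plain pandas before anything is written to Synapse. The following is a minimal sketch, not mhealthx's implementation: it assumes the function behaves like `pd.concat` along the column axis, and `concatenate_columns` is a hypothetical helper name introduced only for illustration.

```python
import pandas as pd


def concatenate_columns(tables):
    """Join tables side by side on their shared index (hypothetical helper)."""
    frames = []
    for table in tables:
        if isinstance(table, pd.DataFrame):
            frames.append(table)
        else:
            # Assumption: string entries are paths to CSV files whose
            # first column holds the row index.
            frames.append(pd.read_csv(table, index_col=0))
    # axis=1 aligns rows on the index and appends the columns of each table.
    return pd.concat(frames, axis=1)


df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
df2 = pd.DataFrame({'E': ['A4', 'A5'], 'F': ['B4', 'B5']}, index=[0, 1])
combined = concatenate_columns([df1, df2])
print(combined)
```

With matching indices, the result keeps the two rows and holds the columns of both inputs; rows present in only one table would be filled with missing values.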
- mhealthx.io_data.copy_synapse_table(synapse_table_id, synapse_project_id, table_name='', remove_columns=[], username='', password='')
Copy Synapse table to another Synapse project.
Parameters:
- synapse_table_id : string
  Synapse ID for table to copy
- synapse_project_id : string
  copy table to project with this Synapse ID
- table_name : string
  schema name of table
- remove_columns : list of strings
  column headers for columns to be removed
- username : string
  Synapse username (only needed once on a given machine)
- password : string
  Synapse password (only needed once on a given machine)
Returns:
- table_data : Pandas DataFrame
  Synapse table contents
- table_name : string
  schema name of table
- synapse_project_id : string
  Synapse ID for project within which table is to be written
Examples:
>>> from mhealthx.io_data import copy_synapse_table
>>> synapse_table_id = 'syn4590865'
>>> synapse_project_id = 'syn4899451'
>>> table_name = 'Copy of ' + synapse_table_id
>>> remove_columns = ['audio_audio.m4a', 'audio_countdown.m4a']
>>> username = ''
>>> password = ''
>>> table_data, table_name, synapse_project_id = copy_synapse_table(
...     synapse_table_id, synapse_project_id, table_name, remove_columns, username, password)
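The effect of `remove_columns` can be illustrated on a local DataFrame. This sketch only assumes that the listed headers are dropped from the table contents before the copy is stored; `drop_columns` and the `recordId` column are hypothetical names, and the way mhealthx itself removes columns may differ.

```python
import pandas as pd


def drop_columns(table_data, remove_columns):
    """Return a copy of table_data without the listed columns (hypothetical helper)."""
    # errors='ignore' skips headers that are absent instead of raising KeyError.
    return table_data.drop(columns=remove_columns, errors='ignore')


table_data = pd.DataFrame({'recordId': ['r1', 'r2'],
                           'audio_audio.m4a': [101, 102],
                           'audio_countdown.m4a': [201, 202]})
trimmed = drop_columns(table_data, ['audio_audio.m4a', 'audio_countdown.m4a'])
print(list(trimmed.columns))
```

Dropping file-handle columns this way leaves the remaining columns and the original DataFrame untouched.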
- mhealthx.io_data.files_to_synapse_table(in_files, synapse_project_id, table_name, column_name='fileID', username='', password='')
Upload files and file handle IDs to Synapse.
Parameters:
- in_files : list of strings
  paths to files to upload to Synapse
- synapse_project_id : string
  Synapse ID for project to which table is to be written
- table_name : string
  schema name of table
- column_name : string
  header for column of fileIDs
- username : string
  Synapse username (only needed once on a given machine)
- password : string
  Synapse password (only needed once on a given machine)
Returns:
- synapse_project_id : string
  Synapse ID for project
Examples:
>>> from mhealthx.io_data import files_to_synapse_table
>>> in_files = ['/Users/arno/Local/wav/test1.wav']
>>> synapse_project_id = 'syn4899451'
>>> table_name = 'Test to store files and file handle IDs'
>>> column_name = 'fileID1'
>>> username = ''
>>> password = ''
>>> table_data, synapse_project_id = files_to_synapse_table(
...     in_files, synapse_project_id, table_name, column_name, username, password)
>>> #column_name = 'fileID2'
>>> #in_files = ['/Users/arno/Local/wav/test2.wav']
>>> #table_data, synapse_project_id = files_to_synapse_table(in_files, synapse_project_id, table_name, column_name, username, password)
- mhealthx.io_data.read_synapse_table_files(synapse_table_id, column_names=[], download_limit=None, out_path='.', username='', password='')
Read data from a Synapse table. If column_names is specified, also download the files referenced in those columns.
Parameters:
- synapse_table_id : string
  Synapse ID for table
- column_names : list of strings
  column headers for columns with fileIDs (if you wish to download files)
- download_limit : int
  limit file downloads to this number of rows (None = all rows)
- out_path : string
  output path to store column_name files
- username : string
  Synapse username (only needed once on a given machine)
- password : string
  Synapse password (only needed once on a given machine)
Returns:
- table_data : Pandas DataFrame
  Synapse table contents
- downloaded_files : list of lists of strings
  files from Synapse table column(s) (full paths to downloaded files)
Examples:
>>> from mhealthx.io_data import read_synapse_table_files
>>> synapse_table_id = 'syn4590865' #'syn4907789'
>>> column_names = ['audio_audio.m4a', 'audio_countdown.m4a']
>>> download_limit = 3  # None = download files from all rows
>>> out_path = '.'
>>> username = ''
>>> password = ''
>>> table_data, downloaded_files = read_synapse_table_files(
...     synapse_table_id, column_names, download_limit, out_path, username, password)
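How `download_limit` and `column_names` interact can be sketched without a Synapse connection. The example below is only an illustration under stated assumptions: rows are modeled as plain dictionaries, the result holds one inner list per requested column (the actual grouping of `downloaded_files` is not specified here), and `select_file_ids` is a hypothetical helper that picks file IDs rather than downloading anything.

```python
from itertools import islice


def select_file_ids(rows, column_names, download_limit=None):
    """Collect file IDs per column from at most download_limit rows (hypothetical helper)."""
    # islice(rows, None) yields every row, matching "None = all rows".
    limited_rows = list(islice(rows, download_limit))
    # Assumption: one inner list per requested column.
    return [[row[name] for row in limited_rows] for name in column_names]


rows = [{'audio_audio.m4a': 11, 'audio_countdown.m4a': 21},
        {'audio_audio.m4a': 12, 'audio_countdown.m4a': 22},
        {'audio_audio.m4a': 13, 'audio_countdown.m4a': 23},
        {'audio_audio.m4a': 14, 'audio_countdown.m4a': 24}]

selected = select_file_ids(rows, ['audio_audio.m4a', 'audio_countdown.m4a'],
                           download_limit=3)
all_ids = select_file_ids(rows, ['audio_audio.m4a'])
print(selected)
print(all_ids)
```

With an empty `column_names` list the helper returns an empty list, mirroring the case where only the table contents are read and no files are fetched.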
- mhealthx.io_data.write_synapse_table(table_data, synapse_project_id, table_name='', username='', password='')
Write data to a Synapse table.
Parameters:
- table_data : Pandas DataFrame
  Synapse table contents
- synapse_project_id : string
  Synapse ID for project within which table is to be written
- table_name : string
  schema name of table
- username : string
  Synapse username (only needed once on a given machine)
- password : string
  Synapse password (only needed once on a given machine)
Examples:
>>> from mhealthx.io_data import read_synapse_table_files, write_synapse_table
>>> in_synapse_table_id = 'syn4590865'
>>> synapse_project_id = 'syn4899451'
>>> column_names = []
>>> download_limit = None
>>> out_path = '.'
>>> username = ''
>>> password = ''
>>> table_data, files = read_synapse_table_files(
...     in_synapse_table_id, column_names, download_limit, out_path, username, password)
>>> table_name = 'Contents of ' + in_synapse_table_id
>>> write_synapse_table(table_data, synapse_project_id, table_name, username, password)