data_io
Module: data_io
Input/output functions to read and write data files or tables.
See synapse_io.py for reading from and writing to Synapse.org.
- Authors:
 - Arno Klein, 2015 (arno@sagebase.org) http://binarybottle.com
 
Copyright 2015, Sage Bionetworks (http://sagebase.org), Apache v2.0 License
Functions
mhealthx.data_io.arff_to_csv(arff_file, output_csv_file=None)
Convert an arff file to a row.
Column headers are taken from lines that start with '@attribute ', include 'numeric', and whose intervening string (the attribute name) is not exception_string. The function raises an error if the number of resulting columns does not equal the number of numeric values.
Example input: arff output from openSMILE's SMILExtract command.
Adapted some formatting from: http://biggyani.blogspot.com/2014/08/converting-back-and-forth-between-weka.html
Parameters:
- arff_file : string
 - arff file (full path)
- output_csv_file : string or None
 - output table file (full path)

Returns:
- row_data : Pandas Series
 - output table data
- output_csv_file : string or None
 - output table file (full path)

Example:
>>> from mhealthx.data_io import arff_to_csv
>>> arff_file = '/Users/arno/csv/test1.csv'
>>> output_csv_file = None #'test.csv'
>>> row_data, output_csv_file = arff_to_csv(arff_file, output_csv_file)
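The header rule described above can be illustrated with a small standalone parser. This is a hypothetical sketch (parse_arff_row is not part of mhealthx): it assumes the arff text holds a single data instance, uses 'class' as an assumed default for exception_string, and counts a data field as numeric only if Python's float() accepts it.

```python
import pandas as pd


def _is_number(text):
    """Return True if float() can parse the field (e.g. '1.5', but not '?')."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def parse_arff_row(arff_text, exception_string='class'):
    """Hypothetical sketch: return one arff data instance as a pandas Series.

    Headers come from '@attribute <name> numeric' lines whose <name> is not
    exception_string (assumed default 'class'); values come from the first
    non-empty line after '@data'.
    """
    left, right = '@attribute ', ' numeric'
    headers, values = [], None
    lines = arff_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith(left) and stripped.endswith(right):
            name = stripped[len(left):-len(right)].strip()
            if name != exception_string:
                headers.append(name)
        elif stripped.lower().startswith('@data'):
            data_line = next((l for l in lines[i + 1:] if l.strip()), '')
            # Keep only fields that parse as numbers (drops quoted names and '?').
            values = [float(v) for v in data_line.split(',') if _is_number(v)]
            break
    if values is None or len(headers) != len(values):
        raise ValueError('{0} numeric headers but {1} numeric values'.format(
            len(headers), 'no' if values is None else len(values)))
    return pd.Series(values, index=headers)
```

Writing the result as a one-row csv could then be done with row.to_frame().T.to_csv(path, index=False).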
mhealthx.data_io.concatenate_tables_horizontally(tables, output_csv_file=None)
Horizontally concatenate multiple table files or pandas DataFrames that have the same number of rows and store as a csv table.
If any member of the tables list is itself a list, concatenate_tables_vertically() is called on that list.
Parameters:
- tables : list of strings or pandas DataFrames
 - each component table has the same number of rows
- output_csv_file : string or None
 - output table file (full path)

Returns:
- table_data : Pandas DataFrame
 - output table data
- output_csv_file : string or None
 - output table file (full path)

Example:
>>> import pandas as pd
>>> from mhealthx.data_io import concatenate_tables_horizontally
>>> df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
...                     'B': ['B0', 'B1', 'B2', 'B3'],
...                     'C': ['C0', 'C1', 'C2', 'C3']},
...                    index=[0, 1, 2, 3])
>>> df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
...                     'B': ['B4', 'B5', 'B6', 'B7'],
...                     'C': ['C4', 'C5', 'C6', 'C7']},
...                    index=[0, 1, 2, 3])
>>> tables = [df1, df2]
>>> output_csv_file = None #'./test.csv'
>>> table_data, output_csv_file = concatenate_tables_horizontally(tables, output_csv_file)
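A pandas-based sketch of the behavior described above, offered as an illustration rather than the mhealthx source: file paths are read with pandas.read_csv, nested lists are stacked vertically first, and (an assumption) rows are matched by position, so each frame's index is reset before pd.concat(axis=1), which otherwise aligns rows by index label. The function names here are hypothetical.

```python
import pandas as pd


def load_table(table):
    """Return a DataFrame for a csv path, a DataFrame, or a list of either."""
    if isinstance(table, list):
        # Nested list: stack its members vertically first.
        return pd.concat([load_table(t) for t in table], axis=0,
                         ignore_index=True)
    if isinstance(table, str):
        return pd.read_csv(table)
    return table


def concat_tables_horizontally_sketch(tables, output_csv_file=None):
    """Place tables side by side, pairing rows by position (assumption)."""
    frames = [load_table(t).reset_index(drop=True) for t in tables]
    n_rows = {len(frame) for frame in frames}
    if len(n_rows) > 1:
        raise ValueError('tables have different row counts: {0}'.format(
            sorted(n_rows)))
    table_data = pd.concat(frames, axis=1)
    if output_csv_file:
        table_data.to_csv(output_csv_file, index=False)
    return table_data, output_csv_file
```

Resetting the index is the key design choice here: if one frame were indexed [10, 11] and another [0, 1], pd.concat(axis=1) alone would produce four half-empty rows instead of two.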
mhealthx.data_io.concatenate_tables_vertically(tables, output_csv_file=None)
Vertically concatenate multiple table files or pandas DataFrames with the same column names and store as a csv table.
Parameters:
- tables : list of table files or pandas DataFrames
 - each table or dataframe has the same column names
- output_csv_file : string or None
 - output table file (full path)

Returns:
- table_data : Pandas DataFrame
 - output table data
- output_csv_file : string or None
 - output table file (full path)

Example:
>>> import pandas as pd
>>> from mhealthx.data_io import concatenate_tables_vertically
>>> df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
...                     'B': ['B0', 'B1', 'B2', 'B3'],
...                     'C': ['C0', 'C1', 'C2', 'C3']},
...                    index=[0, 1, 2, 3])
>>> df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
...                     'B': ['B4', 'B5', 'B6', 'B7'],
...                     'C': ['C4', 'C5', 'C6', 'C7']},
...                    index=[0, 1, 2, 3])
>>> tables = [df1, df2]
>>> # Alternatively, pass table files:
>>> tables = ['/Users/arno/csv/table1.csv', '/Users/arno/csv/table2.csv']
>>> output_csv_file = None #'./test.csv'
>>> table_data, output_csv_file = concatenate_tables_vertically(tables, output_csv_file)
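Vertical concatenation with pandas can be sketched as follows. This illustration (not the mhealthx code; the function name is hypothetical) checks that every table has the same set of column names, then calls pd.concat with ignore_index=True. That last choice is an assumption, made because the example tables above both use index 0 to 3 and pd.concat would otherwise keep those labels, producing duplicates.

```python
import pandas as pd


def concat_tables_vertically_sketch(tables, output_csv_file=None):
    """Stack csv files and/or DataFrames that share the same column names."""
    frames = [pd.read_csv(t) if isinstance(t, str) else t for t in tables]
    expected = set(frames[0].columns)
    for k, frame in enumerate(frames[1:], start=1):
        if set(frame.columns) != expected:
            raise ValueError('table {0} has different column names'.format(k))
    # ignore_index=True renumbers rows 0..n-1 instead of repeating labels.
    table_data = pd.concat(frames, axis=0, ignore_index=True)
    if output_csv_file:
        table_data.to_csv(output_csv_file, index=False)
    return table_data, output_csv_file
```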
mhealthx.data_io.concatenate_two_tables_horizontally(table1, table2, output_csv_file=None)
Horizontally concatenate two table files or pandas DataFrames that have the same number of rows and store as a csv table.
If either of the tables is itself a list, concatenate_two_tables_horizontally() will call concatenate_tables_vertically() on this list.
Parameters:
- table1 : string or pandas DataFrame
- table2 : string or pandas DataFrame
 - same number of rows as table1
- output_csv_file : string or None
 - output table file (full path)

Returns:
- table_data : Pandas DataFrame
 - output table data
- output_csv_file : string or None
 - output table file (full path)

Example:
>>> import pandas as pd
>>> from mhealthx.data_io import concatenate_two_tables_horizontally
>>> table1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
...                        'B': ['B0', 'B1', 'B2', 'B3'],
...                        'C': ['C0', 'C1', 'C2', 'C3']},
...                       index=[0, 1, 2, 3])
>>> table2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
...                        'B': ['B4', 'B5', 'B6', 'B7'],
...                        'C': ['C4', 'C5', 'C6', 'C7']},
...                       index=[0, 1, 2, 3])
>>> output_csv_file = None #'./test.csv'
>>> table_data, output_csv_file = concatenate_two_tables_horizontally(table1, table2, output_csv_file)
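For two tables, the row-count requirement can be made explicit. In this sketch (an illustration, not the mhealthx implementation; the names are hypothetical) a list argument is stacked vertically first, a mismatch in row counts raises ValueError, and rows are paired by position, which is an assumption.

```python
import pandas as pd


def _as_frame(table):
    """Read a csv path, pass a DataFrame through, or stack a list vertically."""
    if isinstance(table, list):
        return pd.concat([_as_frame(t) for t in table], ignore_index=True)
    return pd.read_csv(table) if isinstance(table, str) else table


def concat_two_tables_horizontally_sketch(table1, table2, output_csv_file=None):
    left, right = _as_frame(table1), _as_frame(table2)
    if len(left) != len(right):
        raise ValueError('table1 has {0} rows but table2 has {1}'.format(
            len(left), len(right)))
    # Pair rows by position rather than by index label.
    table_data = pd.concat([left.reset_index(drop=True),
                            right.reset_index(drop=True)], axis=1)
    if output_csv_file:
        table_data.to_csv(output_csv_file, index=False)
    return table_data, output_csv_file
```

With the example tables above, both inputs have columns A, B and C, so the result has six columns with repeated names, and table_data['A'] then returns a two-column DataFrame rather than a Series.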
mhealthx.data_io.convert_audio_file(old_file, new_file, command='ffmpeg', input_args='-i', output_args='-ac 2')
Convert an audio file to a new format.
Parameters:
- old_file : string
 - full path to the input file
- new_file : string
 - full path to the output file
- command : string
 - executable command without arguments
- input_args : string
 - arguments preceding input file name in command
- output_args : string
 - arguments preceding output file name in command

Returns:
- new_file : string
 - full path to the output file

Example:
>>> from mhealthx.data_io import convert_audio_file
>>> old_file = '/Users/arno/mhealthx_cache/mhealthx/feature_files/test.m4a'
>>> new_file = 'test.wav'
>>> command = 'ffmpeg'
>>> input_args = '-i'
>>> output_args = '-ac 2'
>>> new_file = convert_audio_file(old_file, new_file, command, input_args, output_args)
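The parameter descriptions imply a command line of the form command input_args old_file output_args new_file, for example ffmpeg -i test.m4a -ac 2 test.wav, where -ac 2 asks ffmpeg for two output audio channels. Below is a minimal sketch of assembling and running that command with the standard library; it is an assumption about the mechanism, not the mhealthx code, and build_convert_command and run_convert are hypothetical names.

```python
import shlex
import subprocess


def build_convert_command(old_file, new_file, command='ffmpeg',
                          input_args='-i', output_args='-ac 2'):
    """Assemble: <command> <input_args> <old_file> <output_args> <new_file>."""
    return ([command] + shlex.split(input_args) + [old_file]
            + shlex.split(output_args) + [new_file])


def run_convert(old_file, new_file, **kwargs):
    """Run the conversion; check=True raises CalledProcessError on failure."""
    subprocess.run(build_convert_command(old_file, new_file, **kwargs),
                   check=True)
    return new_file
```

Passing a list to subprocess.run, rather than one shell string, keeps paths that contain spaces intact without extra quoting.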
mhealthx.data_io.get_convert_audio(synapse_table, row, column_name, convert_file_append='', convert_command='ffmpeg', convert_input_args='-i', convert_output_args='-ac 2', out_path=None, username='', password='')
Read data from a row of a Synapse table and convert its audio file.
Calls read_files_from_row (from mhealthx.synapse_io) and convert_audio_file (from mhealthx.data_io).

Parameters:
- synapse_table : string or Schema
 - a synapse ID or synapse table Schema object
- row : pandas Series or string
 - row of a Synapse table converted to a Series or csv file
- column_name : string
 - name of file handle column
- convert_file_append : string
 - append to file name to indicate converted file format (e.g., '.wav')
- convert_command : string
 - executable command without arguments
- convert_input_args : string
 - arguments preceding input file name for convert_command
- convert_output_args : string
 - arguments preceding output file name for convert_command
- out_path : string or None
 - a local path in which to store downloaded files. If None, stores them in ~/.synapseCache.
- username : string
 - Synapse username (only needed once on a given machine)
- password : string
 - Synapse password (only needed once on a given machine)

Returns:
- row : pandas Series
 - same as passed in: row of a Synapse table as a file or Series
- new_file : string
 - full path to the converted file

Example:
>>> from mhealthx.data_io import get_convert_audio
>>> from mhealthx.synapse_io import extract_rows, read_files_from_row
>>> import synapseclient
>>> syn = synapseclient.Synapse()
>>> syn.login()
>>> synapse_table = 'syn4590865'
>>> row_series, row_files = extract_rows(synapse_table, save_path='.', limit=3, username='', password='')
>>> column_name = 'audio_audio.m4a' # or 'audio_countdown.m4a'
>>> convert_file_append = '.wav'
>>> convert_command = 'ffmpeg'
>>> convert_input_args = '-i'
>>> convert_output_args = '-ac 2'
>>> out_path = '.'
>>> username = ''
>>> password = ''
>>> for i in range(1):
...     row = row_series[i]
...     row, filepath = read_files_from_row(synapse_table, row,
...                                         column_name, out_path, username, password)
...     print(row)
...     row, new_file = get_convert_audio(synapse_table,
...                                       row, column_name,
...                                       convert_file_append,
...                                       convert_command,
...                                       convert_input_args,
...                                       convert_output_args,
...                                       out_path, username, password)
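The list of called functions above suggests a fetch-then-convert flow. The sketch below models it with the two steps passed in as plain callables, so it runs without Synapse or ffmpeg. fetch and convert are hypothetical stand-ins for read_files_from_row and convert_audio_file, and forming the new name by appending convert_file_append to the full downloaded path (test.m4a becoming test.m4a.wav) is a literal reading of the parameter description, not confirmed behavior.

```python
def get_convert_audio_sketch(row, column_name, fetch, convert,
                             convert_file_append='.wav'):
    """Hypothetical sketch: download a row's file, then convert it.

    fetch(row, column_name) -> (row, filepath) stands in for read_files_from_row.
    convert(old_file, new_file) -> new_file stands in for convert_audio_file.
    """
    row, filepath = fetch(row, column_name)
    new_file = None
    if filepath and convert_file_append:
        # Assumption: the suffix is appended to the full downloaded file name.
        new_file = convert(filepath, filepath + convert_file_append)
    return row, new_file
```

Injecting the two steps keeps the control flow testable; in real use they would be bound to the Synapse download and the ffmpeg conversion.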
mhealthx.data_io.row_to_table(row_data, output_table)
Add a row to a table file using nipype (thread-safe in multi-processor execution). Requires the Python module lockfile.
Parameters:
- row_data : pandas Series
 - row of data
- output_table : string
 - add row to this table file

Example:
>>> import pandas as pd
>>> from mhealthx.data_io import row_to_table
>>> row_data = pd.Series({'A': ['A0'], 'B': ['B0'], 'C': ['C0']})
>>> output_table = 'test.csv'
>>> row_to_table(row_data, output_table)
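The locking idea can be illustrated without nipype. This POSIX-only sketch swaps in the standard library's fcntl.flock for the lockfile module: each writer takes an exclusive lock on the csv file, writes a header if the file is still empty, appends one row, and releases the lock. append_row is a hypothetical name, and the sketch assumes every Series has the same index in the same order.

```python
import csv
import fcntl
import os


def append_row(row_data, output_table):
    """Append a pandas Series to a csv file as one row, under an exclusive lock."""
    with open(output_table, 'a', newline='') as fid:
        # Block until no other process holds the lock on this file.
        fcntl.flock(fid.fileno(), fcntl.LOCK_EX)
        try:
            writer = csv.writer(fid)
            # Check the size only after locking, so two writers cannot both
            # see an empty file and both write a header.
            if os.fstat(fid.fileno()).st_size == 0:
                writer.writerow(list(row_data.index))
            writer.writerow(list(row_data.values))
            fid.flush()  # push buffered text to the file before unlocking
        finally:
            fcntl.flock(fid.fileno(), fcntl.LOCK_UN)
```

fcntl is not available on Windows, so this sketch applies only to Unix-like systems.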