data_io

Module: `data_io`

Input/output functions to read and write data files or tables.

See synapse_io.py for reading from and writing to Synapse.org.

Authors:

Arno Klein, 2015 (arno@sagebase.org) http://binarybottle.com

Functions

mhealthx.data_io.arff_to_csv(arff_file, output_csv_file=None)

Convert an arff file to a row.

Column headers are taken from lines that start with '@attribute ', contain 'numeric', and whose intervening string is not exception_string. The function raises an error if the number of resulting columns does not equal the number of numeric values.

Example input: arff output from openSMILE’s SMILExtract command

Adapted some formatting from: http://biggyani.blogspot.com/2014/08/converting-back-and-forth-between-weka.html

arff_file : string: arff file (full path)
output_csv_file : string or None: output table file (full path)

row_data : Pandas Series: output table data
output_csv_file : string or None: output table file (full path)

>>> from mhealthx.data_io import arff_to_csv
>>> arff_file = '/Users/arno/csv/test1.csv'
>>> output_csv_file = None #'test.csv'
>>> row_data, output_csv_file = arff_to_csv(arff_file, output_csv_file)
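
The header-selection rule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the mhealthx implementation: the helper name `parse_arff_row`, the default `exception_string="class"`, and the choice to return two lists (rather than a pandas Series) are all hypothetical.

```python
def parse_arff_row(arff_text, exception_string="class"):
    """Hypothetical helper: return (column_names, numeric_values) for one ARFF row."""
    columns = []
    data_line = None
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        if in_data:
            data_line = line  # keep only the first data row
            break
        lower = line.lower()
        if lower.startswith("@attribute "):
            parts = line.split()
            # Expect: @attribute <name> <type>; keep numeric attributes only,
            # and skip the attribute whose name equals exception_string.
            if (len(parts) == 3 and parts[2].lower() == "numeric"
                    and parts[1] != exception_string):
                columns.append(parts[1])
        elif lower.startswith("@data"):
            in_data = True
    if data_line is None:
        raise ValueError("no data row found")

    # Keep only the tokens of the data row that parse as numbers.
    values = []
    for token in data_line.split(","):
        try:
            values.append(float(token))
        except ValueError:
            pass
    if len(values) != len(columns):
        raise ValueError("%d columns but %d numeric values"
                         % (len(columns), len(values)))
    return columns, values


demo_arff = "\n".join([
    "@relation demo",
    "@attribute name string",
    "@attribute f0 numeric",
    "@attribute f1 numeric",
    "@attribute class numeric",
    "@data",
    "'clip',1.5,2.0,?",
])
columns, values = parse_arff_row(demo_arff)
```

The two lists could then be wrapped as `pandas.Series(values, index=columns)` to obtain a row like the `row_data` returned above.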

mhealthx.data_io.concatenate_tables_horizontally(tables, output_csv_file=None)

Horizontally concatenate multiple table files or pandas DataFrames that have the same number of rows and store as a csv table.

If any one of the members of the tables list is itself a list, call concatenate_tables_vertically() on this list.

tables : list of strings or pandas DataFrames: each component table has the same number of rows
output_csv_file : string or None: output table file (full path)

table_data : Pandas DataFrame: output table data
output_csv_file : string or None: output table file (full path)

>>> import pandas as pd
>>> from mhealthx.data_io import concatenate_tables_horizontally
>>> df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
>>>                     'B': ['B0', 'B1', 'B2', 'B3'],
>>>                     'C': ['C0', 'C1', 'C2', 'C3']},
>>>                    index=[0, 1, 2, 3])
>>> df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
>>>                     'B': ['B4', 'B5', 'B6', 'B7'],
>>>                     'C': ['C4', 'C5', 'C6', 'C7']},
>>>                     index=[0, 1, 2, 3])
>>> tables = [df1, df2]
>>> output_csv_file = None #'./test.csv'
>>> table_data, output_csv_file = concatenate_tables_horizontally(tables, output_csv_file)
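
For readers unfamiliar with the underlying operation, horizontal concatenation can be sketched directly with pandas. The helper name `concat_columns` is hypothetical, and the row-count check and index reset are assumptions about reasonable behavior rather than a description of what mhealthx does internally.

```python
import pandas as pd


def concat_columns(frames):
    """Hypothetical helper: place DataFrames side by side after a row-count check."""
    row_counts = {len(frame) for frame in frames}
    if len(row_counts) != 1:
        raise ValueError("all tables must have the same number of rows")
    # Reset the index so rows are paired by position, not by index label.
    frames = [frame.reset_index(drop=True) for frame in frames]
    return pd.concat(frames, axis=1)


left = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
right = pd.DataFrame({'C': ['C0', 'C1'], 'D': ['D0', 'D1']})
wide = concat_columns([left, right])  # 2 rows, columns A, B, C, D
```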

mhealthx.data_io.concatenate_tables_vertically(tables, output_csv_file=None)

Vertically concatenate multiple table files or pandas DataFrames with the same column names and store as a csv table.

tables : list of table files or pandas DataFrames: each table or dataframe has the same column names
output_csv_file : string or None: output table file (full path)

table_data : Pandas DataFrame: output table data
output_csv_file : string or None: output table file (full path)

>>> import pandas as pd
>>> from mhealthx.data_io import concatenate_tables_vertically
>>> df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
>>>                     'B': ['B0', 'B1', 'B2', 'B3'],
>>>                     'C': ['C0', 'C1', 'C2', 'C3']},
>>>                    index=[0, 1, 2, 3])
>>> df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
>>>                     'B': ['B4', 'B5', 'B6', 'B7'],
>>>                     'C': ['C4', 'C5', 'C6', 'C7']},
>>>                     index=[0, 1, 2, 3])
>>> tables = [df1, df2]
>>> # Or pass csv file paths instead of DataFrames:
>>> # tables = ['/Users/arno/csv/table1.csv', '/Users/arno/csv/table2.csv']
>>> output_csv_file = None #'./test.csv'
>>> table_data, output_csv_file = concatenate_tables_vertically(tables, output_csv_file)
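
The vertical case can be sketched the same way. This is a minimal illustration, assuming that file paths are read with `pandas.read_csv` and that mismatched column names should raise an error; the helper name `concat_rows` is hypothetical and the real function may behave differently.

```python
import pandas as pd


def concat_rows(tables):
    """Hypothetical helper: stack tables (DataFrames or csv paths) on top of each other."""
    frames = [t if isinstance(t, pd.DataFrame) else pd.read_csv(t)
              for t in tables]
    expected = list(frames[0].columns)
    for frame in frames[1:]:
        if list(frame.columns) != expected:
            raise ValueError("all tables must have the same column names")
    # ignore_index renumbers the rows 0..n-1 in the stacked result.
    return pd.concat(frames, axis=0, ignore_index=True)


top = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
bottom = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})
tall = concat_rows([top, bottom])  # 4 rows, columns A and B
```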

mhealthx.data_io.concatenate_two_tables_horizontally(table1, table2, output_csv_file=None)

Horizontally concatenate two table files or pandas DataFrames that have the same number of rows and store as a csv table.

If either of the tables is itself a list, concatenate_two_tables_horizontally() will call concatenate_tables_vertically() on this list.

table1 : string or pandas DataFrame
table2 : string or pandas DataFrame: same number of rows as table1

output_csv_file : string or None: output table file (full path)

table_data : Pandas DataFrame: output table data
output_csv_file : string or None: output table file (full path)

>>> import pandas as pd
>>> from mhealthx.data_io import concatenate_two_tables_horizontally
>>> table1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
>>>                     'B': ['B0', 'B1', 'B2', 'B3'],
>>>                     'C': ['C0', 'C1', 'C2', 'C3']},
>>>                    index=[0, 1, 2, 3])
>>> table2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
>>>                     'B': ['B4', 'B5', 'B6', 'B7'],
>>>                     'C': ['C4', 'C5', 'C6', 'C7']},
>>>                     index=[0, 1, 2, 3])
>>> output_csv_file = None #'./test.csv'
>>> table_data, output_csv_file = concatenate_two_tables_horizontally(table1, table2, output_csv_file)

mhealthx.data_io.convert_audio_file(old_file, new_file, command='ffmpeg', input_args='-i', output_args='-ac 2')

Convert audio file to new format.

old_file : string: full path to the input file
new_file : string: full path to the output file
command : string: executable command without arguments
input_args : string: arguments preceding input file name in command
output_args : string: arguments preceding output file name in command

new_file : string: full path to the output file

>>> from mhealthx.data_io import convert_audio_file
>>> old_file = '/Users/arno/mhealthx_cache/mhealthx/feature_files/test.m4a'
>>> new_file = 'test.wav'
>>> command = 'ffmpeg'
>>> input_args = '-i'
>>> output_args = '-ac 2'
>>> new_file = convert_audio_file(old_file, new_file, command, input_args, output_args)
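
The parameter descriptions above imply a command of the form `command input_args old_file output_args new_file`. The sketch below shows one way such a command could be assembled and run with the standard library; the helper names `build_convert_command` and `run_conversion` are hypothetical, and this is not necessarily how mhealthx invokes the converter.

```python
import shlex
import subprocess


def build_convert_command(old_file, new_file, command='ffmpeg',
                          input_args='-i', output_args='-ac 2'):
    """Hypothetical helper: assemble the argument list for the converter."""
    return ([command] + shlex.split(input_args) + [old_file]
            + shlex.split(output_args) + [new_file])


def run_conversion(argv):
    """Run the converter; requires the executable (e.g. ffmpeg) on the PATH."""
    subprocess.run(argv, check=True)


argv = build_convert_command('test.m4a', 'test.wav')
# argv is ['ffmpeg', '-i', 'test.m4a', '-ac', '2', 'test.wav']
# run_conversion(argv)  # uncomment to perform the conversion
```

Passing a list to `subprocess.run` avoids shell quoting problems when file paths contain spaces.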

mhealthx.data_io.get_convert_audio(synapse_table, row, column_name, convert_file_append='', convert_command='ffmpeg', convert_input_args='-i', convert_output_args='-ac 2', out_path=None, username='', password='')

Read data from a row of a Synapse table and convert audio file.

Calls: mhealthx.synapse_io.read_files_from_row() and mhealthx.data_io.convert_audio_file().

synapse_table : string or Schema: a synapse ID or synapse table Schema object
row : pandas Series or string: row of a Synapse table converted to a Series or csv file
column_name : string: name of file handle column
convert_file_append : string: append to file name to indicate converted file format (e.g., ‘.wav’)
convert_command : string: executable command without arguments
convert_input_args : string: arguments preceding input file name for convert_command
convert_output_args : string: arguments preceding output file name for convert_command
out_path : string or None: a local path in which to store downloaded files. If None, files are stored in ~/.synapseCache
username : string: Synapse username (only needed once on a given machine)
password : string: Synapse password (only needed once on a given machine)

row : pandas Series: same as passed in: row of a Synapse table as a file or Series
new_file : string: full path to the converted file

>>> from mhealthx.data_io import get_convert_audio
>>> from mhealthx.synapse_io import extract_rows, read_files_from_row
>>> import synapseclient
>>> syn = synapseclient.Synapse()
>>> syn.login()
>>> synapse_table = 'syn4590865'
>>> row_series, row_files = extract_rows(synapse_table, save_path='.', limit=3, username='', password='')
>>> column_name = 'audio_audio.m4a' #, 'audio_countdown.m4a']
>>> convert_file_append = '.wav'
>>> convert_command = 'ffmpeg'
>>> convert_input_args = '-i'
>>> convert_output_args = '-ac 2'
>>> out_path = '.'
>>> username = ''
>>> password = ''
>>> for i in range(1):
>>>     row = row_series[i]
>>>     row, filepath = read_files_from_row(synapse_table, row,
>>>         column_name, out_path, username, password)
>>>     print(row)
>>>     row, new_file = get_convert_audio(synapse_table,
>>>                                       row, column_name,
>>>                                       convert_file_append,
>>>                                       convert_command,
>>>                                       convert_input_args,
>>>                                       convert_output_args,
>>>                                       out_path, username, password)

mhealthx.data_io.row_to_table(row_data, output_table)

Add row to table using nipype (thread-safe in multi-processor execution).

(Requires Python module lockfile)

row_data : pandas Series: row of data
output_table : string: add row to this table file

>>> import pandas as pd
>>> from mhealthx.data_io import row_to_table
>>> row_data = pd.Series({'A': ['A0'], 'B': ['B0'], 'C': ['C0']})
>>> output_table = 'test.csv'
>>> row_to_table(row_data, output_table)
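
To illustrate why a lock matters when several processes append to one table, here is a minimal standard-library sketch. It deliberately swaps in a different technique from the one named above: instead of nipype and the lockfile module, it uses an exclusively created `.lock` file as a mutex. The helper name `append_row_locked`, the `timeout` parameter, and the lock file naming are all assumptions for illustration.

```python
import csv
import os
import tempfile
import time


def append_row_locked(row, table_path, timeout=10.0):
    """Hypothetical helper: append a dict as one csv row while holding a lock file."""
    lock_path = table_path + '.lock'
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another process holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError("could not acquire " + lock_path)
            time.sleep(0.05)
    try:
        # Write the header only when the table is new or empty.
        is_new = (not os.path.exists(table_path)
                  or os.path.getsize(table_path) == 0)
        with open(table_path, 'a', newline='') as handle:
            writer = csv.DictWriter(handle, fieldnames=list(row.keys()))
            if is_new:
                writer.writeheader()
            writer.writerow(row)
    finally:
        os.close(fd)
        os.remove(lock_path)  # release the lock


with tempfile.TemporaryDirectory() as folder:
    demo_table = os.path.join(folder, 'test.csv')
    append_row_locked({'A': 'A0', 'B': 'B0', 'C': 'C0'}, demo_table)
    append_row_locked({'A': 'A1', 'B': 'B1', 'C': 'C1'}, demo_table)
    with open(demo_table, newline='') as handle:
        demo_rows = list(csv.reader(handle))
```

A stale lock file left behind by a crashed process would block later writers until the timeout expires, which is one reason a dedicated locking library may be preferable in practice.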
