data_io

Module: data_io

Input/output functions to read and write data files or tables.

See synapse_io.py for reading from and writing to Synapse.org.

Authors:

Copyright 2015, Sage Bionetworks (http://sagebase.org), Apache v2.0 License

Functions

mhealthx.data_io.arff_to_csv(arff_file, output_csv_file=None)

Convert an arff file to a row of table data (a pandas Series), optionally saved as a csv file.

Column headers are taken from lines that start with '@attribute ', contain 'numeric', and whose intervening string is not exception_string. The function raises an error if the number of resulting columns does not equal the number of numeric values.

Example input: arff output from openSMILE’s SMILExtract command

Adapted some formatting from: http://biggyani.blogspot.com/2014/08/converting-back-and-forth-between-weka.html

Parameters:

arff_file : string
    arff file (full path)
output_csv_file : string or None
    output table file (full path)

Returns:

row_data : Pandas Series
    output table data
output_csv_file : string or None
    output table file (full path)

Examples:

>>> from mhealthx.data_io import arff_to_csv
>>> arff_file = '/Users/arno/csv/test1.csv'
>>> output_csv_file = None #'test.csv'
>>> row_data, output_csv_file = arff_to_csv(arff_file, output_csv_file)
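
The header rule described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the mhealthx implementation: the helper name `parse_arff_row`, the default `exception_string="class"`, and the choice to read only the first data line and to skip fields that do not parse as numbers are all hypothetical.

```python
def parse_arff_row(lines, exception_string="class"):
    """Return (column_names, values) from the lines of a one-instance arff file.

    Hypothetical helper: names and defaults are assumptions for illustration.
    """
    columns = []
    values = []
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if in_data:
            # Keep only the fields of the first data line that parse as numbers
            for field in line.split(","):
                try:
                    values.append(float(field))
                except ValueError:
                    pass
            break
        if line.lower().startswith("@data"):
            in_data = True
        elif line.startswith("@attribute ") and "numeric" in line:
            # The column name is the string between '@attribute ' and 'numeric'
            name = line[len("@attribute "):line.rindex("numeric")].strip()
            if name != exception_string:
                columns.append(name)
    if len(columns) != len(values):
        raise ValueError("%d columns but %d numeric values"
                         % (len(columns), len(values)))
    return columns, values
```

The two returned lists could then be combined into a pandas Series, for example with `pd.Series(values, index=columns)`.
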
mhealthx.data_io.concatenate_tables_horizontally(tables, output_csv_file=None)

Horizontally concatenate multiple table files or pandas DataFrames that have the same number of rows and store as a csv table.

If any one of the members of the tables list is itself a list, call concatenate_tables_vertically() on this list.

Parameters:

tables : list of strings or pandas DataFrames
    each component table has the same number of rows
output_csv_file : string or None
    output table file (full path)

Returns:

table_data : Pandas DataFrame
    output table data
output_csv_file : string or None
    output table file (full path)

Examples:

>>> import pandas as pd
>>> from mhealthx.data_io import concatenate_tables_horizontally
>>> df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
...                     'B': ['B0', 'B1', 'B2', 'B3'],
...                     'C': ['C0', 'C1', 'C2', 'C3']},
...                    index=[0, 1, 2, 3])
>>> df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
...                     'B': ['B4', 'B5', 'B6', 'B7'],
...                     'C': ['C4', 'C5', 'C6', 'C7']},
...                    index=[0, 1, 2, 3])
>>> tables = [df1, df2]
>>> output_csv_file = None #'./test.csv'
>>> table_data, output_csv_file = concatenate_tables_horizontally(tables, output_csv_file)
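
The behavior described above (side-by-side concatenation, with any nested list stacked vertically first) might look roughly like the following. This is a sketch assuming in-memory DataFrames only; the function name `concat_horizontally_sketch` and the explicit row-count check are assumptions, not the library's code.

```python
import pandas as pd


def concat_horizontally_sketch(tables):
    """Place tables side by side; nested lists are stacked vertically first."""
    frames = []
    for table in tables:
        if isinstance(table, list):
            # A nested list is first combined into one tall table
            table = pd.concat(table, axis=0, ignore_index=True)
        # Reset the index so rows are aligned by position, not by label
        frames.append(table.reset_index(drop=True))
    n_rows = {len(frame) for frame in frames}
    if len(n_rows) > 1:
        raise ValueError("tables have different numbers of rows: %s"
                         % sorted(n_rows))
    return pd.concat(frames, axis=1)
```

Resetting each index before concatenating is a design choice in this sketch: `pd.concat(..., axis=1)` aligns on index labels, so tables with mismatched labels would otherwise produce rows padded with missing values.
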
mhealthx.data_io.concatenate_tables_vertically(tables, output_csv_file=None)

Vertically concatenate multiple table files or pandas DataFrames with the same column names and store as a csv table.

Parameters:

tables : list of table files or pandas DataFrames
    each table or dataframe has the same column names
output_csv_file : string or None
    output table file (full path)

Returns:

table_data : Pandas DataFrame
    output table data
output_csv_file : string or None
    output table file (full path)

Examples:

>>> import pandas as pd
>>> from mhealthx.data_io import concatenate_tables_vertically
>>> df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
...                     'B': ['B0', 'B1', 'B2', 'B3'],
...                     'C': ['C0', 'C1', 'C2', 'C3']},
...                    index=[0, 1, 2, 3])
>>> df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
...                     'B': ['B4', 'B5', 'B6', 'B7'],
...                     'C': ['C4', 'C5', 'C6', 'C7']},
...                    index=[0, 1, 2, 3])
>>> tables = [df1, df2]
>>> # Alternatively, pass a list of table files:
>>> # tables = ['/Users/arno/csv/table1.csv', '/Users/arno/csv/table2.csv']
>>> output_csv_file = None #'./test.csv'
>>> table_data, output_csv_file = concatenate_tables_vertically(tables, output_csv_file)
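
Because `tables` may mix csv file paths and DataFrames, a vertical concatenation has to load any paths before stacking. The following is a minimal sketch of that idea; the name `concat_vertically_sketch` and the strict column-name check are assumptions made for illustration.

```python
import pandas as pd


def concat_vertically_sketch(tables):
    """Stack tables (csv file paths or DataFrames) that share column names."""
    # Load any file paths so that everything is a DataFrame
    frames = [pd.read_csv(t) if isinstance(t, str) else t for t in tables]
    first_columns = list(frames[0].columns)
    for frame in frames[1:]:
        if list(frame.columns) != first_columns:
            raise ValueError("tables do not share the same column names")
    # ignore_index=True renumbers the rows 0..n-1 in the combined table
    return pd.concat(frames, axis=0, ignore_index=True)
```
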
mhealthx.data_io.concatenate_two_tables_horizontally(table1, table2, output_csv_file=None)

Horizontally concatenate two table files or pandas DataFrames that have the same number of rows and store as a csv table.

If either of the tables is itself a list, concatenate_two_tables_horizontally() will call concatenate_tables_vertically() on this list.

Parameters:

table1 : string or pandas DataFrame
table2 : string or pandas DataFrame
    same number of rows as table1
output_csv_file : string or None
    output table file (full path)

Returns:

table_data : Pandas DataFrame
    output table data
output_csv_file : string or None
    output table file (full path)

Examples:

>>> import pandas as pd
>>> from mhealthx.data_io import concatenate_two_tables_horizontally
>>> table1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
...                        'B': ['B0', 'B1', 'B2', 'B3'],
...                        'C': ['C0', 'C1', 'C2', 'C3']},
...                       index=[0, 1, 2, 3])
>>> table2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
...                        'B': ['B4', 'B5', 'B6', 'B7'],
...                        'C': ['C4', 'C5', 'C6', 'C7']},
...                       index=[0, 1, 2, 3])
>>> output_csv_file = None #'./test.csv'
>>> table_data, output_csv_file = concatenate_two_tables_horizontally(table1, table2, output_csv_file)

mhealthx.data_io.convert_audio_file(old_file, new_file, command='ffmpeg', input_args='-i', output_args='-ac 2')

Convert audio file to new format.

Parameters:

old_file : string
    full path to the input file
new_file : string
    full path to the output file
command : string
    executable command without arguments
input_args : string
    arguments preceding input file name in command
output_args : string
    arguments preceding output file name in command

Returns:

new_file : string
    full path to the output file

Examples:

>>> from mhealthx.data_io import convert_audio_file
>>> old_file = '/Users/arno/mhealthx_cache/mhealthx/feature_files/test.m4a'
>>> new_file = 'test.wav'
>>> command = 'ffmpeg'
>>> input_args = '-i'
>>> output_args = '-ac 2'
>>> new_file = convert_audio_file(old_file, new_file, command, input_args, output_args)
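
The parameter descriptions imply a command line of the form `command input_args old_file output_args new_file`. The sketch below shows one way such a command could be assembled and run. The helper names `build_convert_command` and `convert_audio_sketch` are hypothetical, and the use of `subprocess.run` is an assumption about how the command is executed rather than a statement about mhealthx internals.

```python
import shlex
import subprocess


def build_convert_command(old_file, new_file, command="ffmpeg",
                          input_args="-i", output_args="-ac 2"):
    """Return the argument list for the conversion command."""
    # shlex.split turns an argument string such as '-ac 2' into ['-ac', '2']
    return ([command] + shlex.split(input_args) + [old_file]
            + shlex.split(output_args) + [new_file])


def convert_audio_sketch(old_file, new_file, **kwargs):
    """Run the conversion and return the output path (requires the command)."""
    argv = build_convert_command(old_file, new_file, **kwargs)
    # check=True raises CalledProcessError if the command exits with an error
    subprocess.run(argv, check=True)
    return new_file
```

Passing an argument list rather than a single shell string avoids quoting problems when file paths contain spaces.
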
mhealthx.data_io.get_convert_audio(synapse_table, row, column_name, convert_file_append='', convert_command='ffmpeg', convert_input_args='-i', convert_output_args='-ac 2', out_path=None, username='', password='')

Read data from a row of a Synapse table and convert audio file.

Calls ::

    from mhealthx.synapse_io import read_files_from_row
    from mhealthx.data_io import convert_audio_file

Parameters:

synapse_table : string or Schema
    a synapse ID or synapse table Schema object
row : pandas Series or string
    row of a Synapse table converted to a Series or csv file
column_name : string
    name of file handle column
convert_file_append : string
    append to file name to indicate converted file format (e.g., '.wav')
convert_command : string
    executable command without arguments
convert_input_args : string
    arguments preceding input file name for convert_command
convert_output_args : string
    arguments preceding output file name for convert_command
out_path : string or None
    a local path in which to store downloaded files. If None, files are stored in ~/.synapseCache
username : string
    Synapse username (only needed once on a given machine)
password : string
    Synapse password (only needed once on a given machine)

Returns:

row : pandas Series
    same as passed in: row of a Synapse table as a file or Series
new_file : string
    full path to the converted file

Examples:

>>> from mhealthx.data_io import get_convert_audio
>>> from mhealthx.synapse_io import extract_rows, read_files_from_row
>>> import synapseclient
>>> syn = synapseclient.Synapse()
>>> syn.login()
>>> synapse_table = 'syn4590865'
>>> row_series, row_files = extract_rows(synapse_table, save_path='.', limit=3, username='', password='')
>>> column_name = 'audio_audio.m4a' #, 'audio_countdown.m4a']
>>> convert_file_append = '.wav'
>>> convert_command = 'ffmpeg'
>>> convert_input_args = '-i'
>>> convert_output_args = '-ac 2'
>>> out_path = '.'
>>> username = ''
>>> password = ''
>>> for i in range(1):
...     row = row_series[i]
...     row, filepath = read_files_from_row(synapse_table, row,
...         column_name, out_path, username, password)
...     print(row)
...     row, new_file = get_convert_audio(synapse_table,
...                                       row, column_name,
...                                       convert_file_append,
...                                       convert_command,
...                                       convert_input_args,
...                                       convert_output_args,
...                                       out_path, username, password)

mhealthx.data_io.row_to_table(row_data, output_table)

Add row to table using nipype (thread-safe in multi-processor execution).

(Requires Python module lockfile)

Parameters:

row_data : pandas Series
    row of data
output_table : string
    add row to this table file

Examples:

>>> import pandas as pd
>>> from mhealthx.data_io import row_to_table
>>> row_data = pd.Series({'A': ['A0'], 'B': ['B0'], 'C': ['C0']})
>>> output_table = 'test.csv'
>>> row_to_table(row_data, output_table)
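
To illustrate why a lock matters when several processes append to the same table, here is a standard-library sketch of a lock-protected append. It does not use the `lockfile` module or nipype that the function above relies on; the name `append_row_locked`, the plain-dict row, the `.lock` suffix, and the timeout values are all assumptions for the example.

```python
import csv
import os
import time


def append_row_locked(row, output_table, timeout=10.0, poll=0.05):
    """Append a dict as one csv row while holding an exclusive lock file."""
    lock_path = output_table + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another process holds the lock
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError("could not acquire %s" % lock_path)
            time.sleep(poll)
    try:
        is_new = (not os.path.exists(output_table)
                  or os.path.getsize(output_table) == 0)
        with open(output_table, "a", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(row))
            if is_new:
                writer.writeheader()  # write column names only once
            writer.writerow(row)
    finally:
        # Always release the lock, even if writing failed
        os.close(fd)
        os.remove(lock_path)
```

Exclusive file creation may not be reliable on some network file systems, so this sketch is best suited to a local disk.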