data_io
Module: data_io
Input/output functions to read and write data files or tables.
See synapse_io.py for reading from and writing to Synapse.org.
- Authors:
- Arno Klein, 2015 (arno@sagebase.org) http://binarybottle.com
Copyright 2015, Sage Bionetworks (http://sagebase.org), Apache v2.0 License
Functions
mhealthx.data_io.arff_to_csv(arff_file, output_csv_file=None)
Convert an arff file to a row.
Column headers are taken from lines that start with '@attribute ', contain 'numeric', and whose intervening string is not exception_string. The function raises an error if the number of resulting columns does not equal the number of numeric values.
Example input: arff output from openSMILE’s SMILExtract command
Adapted some formatting from: http://biggyani.blogspot.com/2014/08/converting-back-and-forth-between-weka.html
Parameters:
- arff_file : string
- arff file (full path)
- output_csv_file : string or None
- output table file (full path)
Returns:
- row_data : Pandas Series
- output table data
- output_csv_file : string or None
- output table file (full path)
>>> from mhealthx.data_io import arff_to_csv
>>> arff_file = '/Users/arno/csv/test1.csv'
>>> output_csv_file = None #'test.csv'
>>> row_data, output_csv_file = arff_to_csv(arff_file, output_csv_file)
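The header rule described for arff_to_csv can be sketched as follows. This is a minimal illustration under stated assumptions, not the mhealthx implementation: the helper name `arff_numeric_row`, the default value of `exception_string`, the single-row data section, and the choice to skip non-numeric data fields are all hypothetical.

```python
import pandas as pd


def arff_numeric_row(arff_text, exception_string="class"):
    """Hypothetical helper: parse ARFF text into a one-row pandas Series."""
    columns = []
    values = []
    in_data = False
    for line in arff_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if in_data:
            # Keep only the fields of the data row that parse as numbers.
            for field in line.split(","):
                try:
                    values.append(float(field))
                except ValueError:
                    pass
            break  # this sketch reads a single data row
        if line.lower().startswith("@data"):
            in_data = True
        elif line.startswith("@attribute ") and "numeric" in line:
            # The name is the string between '@attribute ' and 'numeric'.
            name = line[len("@attribute "):line.rindex("numeric")].strip()
            if name != exception_string:
                columns.append(name)
    if len(columns) != len(values):
        raise ValueError("number of columns does not match number of numeric values")
    return pd.Series(values, index=columns)


# Made-up ARFF content, for illustration only.
arff_text = """@relation demo
@attribute name string
@attribute F0_mean numeric
@attribute energy numeric
@data
'sample',120.5,0.75
"""
row = arff_numeric_row(arff_text)
```

Here `row` is indexed by the two numeric attribute names; the string attribute contributes neither a column nor a value, so the count check passes.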
mhealthx.data_io.concatenate_tables_horizontally(tables, output_csv_file=None)
Horizontally concatenate multiple table files or pandas DataFrames that have the same number of rows and store as a csv table.
If any one of the members of the tables list is itself a list, call concatenate_tables_vertically() on this list.
Parameters:
- tables : list of strings or pandas DataFrames
- each component table has the same number of rows
- output_csv_file : string or None
- output table file (full path)
Returns:
- table_data : Pandas DataFrame
- output table data
- output_csv_file : string or None
- output table file (full path)
>>> import pandas as pd
>>> from mhealthx.data_io import concatenate_tables_horizontally
>>> df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
...                     'B': ['B0', 'B1', 'B2', 'B3'],
...                     'C': ['C0', 'C1', 'C2', 'C3']},
...                    index=[0, 1, 2, 3])
>>> df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
...                     'B': ['B4', 'B5', 'B6', 'B7'],
...                     'C': ['C4', 'C5', 'C6', 'C7']},
...                    index=[0, 1, 2, 3])
>>> tables = [df1, df2]
>>> output_csv_file = None #'./test.csv'
>>> table_data, output_csv_file = concatenate_tables_horizontally(tables, output_csv_file)
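The nested-list behavior described for concatenate_tables_horizontally can be sketched with plain pandas. This is an illustrative sketch, not the mhealthx code: the helper name `concat_horizontally` is hypothetical, and resetting each table's index before joining is an assumption made so that rows align by position.

```python
import pandas as pd


def concat_horizontally(tables):
    """Hypothetical helper: column-wise concatenation of DataFrames or CSV paths."""
    frames = []
    for table in tables:
        if isinstance(table, list):
            # A nested list is first stacked vertically into one table.
            parts = [t if isinstance(t, pd.DataFrame) else pd.read_csv(t)
                     for t in table]
            table = pd.concat(parts, ignore_index=True)
        elif not isinstance(table, pd.DataFrame):
            table = pd.read_csv(table)
        # Assumption: rows are matched by position, not by index label.
        frames.append(table.reset_index(drop=True))
    if len({len(frame) for frame in frames}) > 1:
        raise ValueError("tables must have the same number of rows")
    return pd.concat(frames, axis=1)


df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df2 = pd.DataFrame({"C": ["C0"]})
df3 = pd.DataFrame({"C": ["C1"]})
# The second member is a list, so df2 and df3 are stacked before joining.
table_data = concat_horizontally([df1, [df2, df3]])
```

The result has two rows and the columns A, B and C, with column C built from the two stacked one-row tables.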
mhealthx.data_io.concatenate_tables_vertically(tables, output_csv_file=None)
Vertically concatenate multiple table files or pandas DataFrames with the same column names and store as a csv table.
Parameters:
- tables : list of table files or pandas DataFrames
- each table or dataframe has the same column names
- output_csv_file : string or None
- output table file (full path)
Returns:
- table_data : Pandas DataFrame
- output table data
- output_csv_file : string or None
- output table file (full path)
>>> import pandas as pd
>>> from mhealthx.data_io import concatenate_tables_vertically
>>> df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
...                     'B': ['B0', 'B1', 'B2', 'B3'],
...                     'C': ['C0', 'C1', 'C2', 'C3']},
...                    index=[0, 1, 2, 3])
>>> df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
...                     'B': ['B4', 'B5', 'B6', 'B7'],
...                     'C': ['C4', 'C5', 'C6', 'C7']},
...                    index=[0, 1, 2, 3])
>>> tables = [df1, df2]
>>> # Alternatively, pass table files instead of DataFrames:
>>> tables = ['/Users/arno/csv/table1.csv', '/Users/arno/csv/table2.csv']
>>> output_csv_file = None #'./test.csv'
>>> table_data, output_csv_file = concatenate_tables_vertically(tables, output_csv_file)
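A minimal sketch of vertical concatenation with an optional csv output, assuming plain pandas. The helper name `concat_vertically`, the strict column-name check, and the renumbered index are assumptions made for illustration; the actual function may behave differently.

```python
import os
import tempfile

import pandas as pd


def concat_vertically(tables, output_csv_file=None):
    """Hypothetical helper: stack tables that share the same column names."""
    frames = [t if isinstance(t, pd.DataFrame) else pd.read_csv(t)
              for t in tables]
    first_columns = list(frames[0].columns)
    for frame in frames[1:]:
        if list(frame.columns) != first_columns:
            raise ValueError("tables must have the same column names")
    # Assumption: the stacked table gets a fresh 0..n-1 index.
    table_data = pd.concat(frames, ignore_index=True)
    if output_csv_file:
        table_data.to_csv(output_csv_file, index=False)
    return table_data, output_csv_file


df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]})
csv_path = os.path.join(tempfile.mkdtemp(), "stacked.csv")
table_data, output_csv_file = concat_vertically([df1, df2], csv_path)
```

Returning the output path alongside the data mirrors the documented return values, so the result can be passed on either as a DataFrame or as a file.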
mhealthx.data_io.concatenate_two_tables_horizontally(table1, table2, output_csv_file=None)
Horizontally concatenate two table files or pandas DataFrames that have the same number of rows and store as a csv table.
If either of the tables is itself a list, concatenate_two_tables_horizontally() will call concatenate_tables_vertically() on this list.
Parameters:
- table1 : string or pandas DataFrame
- table2 : string or pandas DataFrame
- same number of rows as table1
- output_csv_file : string or None
- output table file (full path)
Returns:
- table_data : Pandas DataFrame
- output table data
- output_csv_file : string or None
- output table file (full path)
>>> import pandas as pd
>>> from mhealthx.data_io import concatenate_two_tables_horizontally
>>> table1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
...                        'B': ['B0', 'B1', 'B2', 'B3'],
...                        'C': ['C0', 'C1', 'C2', 'C3']},
...                       index=[0, 1, 2, 3])
>>> table2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
...                        'B': ['B4', 'B5', 'B6', 'B7'],
...                        'C': ['C4', 'C5', 'C6', 'C7']},
...                       index=[0, 1, 2, 3])
>>> output_csv_file = None #'./test.csv'
>>> table_data, output_csv_file = concatenate_two_tables_horizontally(table1, table2, output_csv_file)
mhealthx.data_io.convert_audio_file(old_file, new_file, command='ffmpeg', input_args='-i', output_args='-ac 2')
Convert audio file to new format.
Parameters:
- old_file : string
- full path to the input file
- new_file : string
- full path to the output file
- command : string
- executable command without arguments
- input_args : string
- arguments preceding input file name in command
- output_args : string
- arguments preceding output file name in command
Returns:
- new_file : string
- full path to the output file
>>> from mhealthx.data_io import convert_audio_file
>>> old_file = '/Users/arno/mhealthx_cache/mhealthx/feature_files/test.m4a'
>>> new_file = 'test.wav'
>>> command = 'ffmpeg'
>>> input_args = '-i'
>>> output_args = '-ac 2'
>>> new_file = convert_audio_file(old_file, new_file, command, input_args, output_args)
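The parameter descriptions suggest that the command is assembled in the order command, input_args, old_file, output_args, new_file. The sketch below only builds that argument list and does not run it. The helper name `build_convert_command` is hypothetical, and the ordering is an assumption read off the parameter descriptions, not confirmed from the mhealthx source.

```python
import shlex


def build_convert_command(old_file, new_file, command="ffmpeg",
                          input_args="-i", output_args="-ac 2"):
    """Hypothetical helper: assemble the conversion command as an argument list."""
    # shlex.split turns an argument string such as '-ac 2' into ['-ac', '2'].
    return ([command] + shlex.split(input_args) + [old_file]
            + shlex.split(output_args) + [new_file])


cmd = build_convert_command("test.m4a", "test.wav")
# To actually run the conversion (requires ffmpeg on the PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` instead of a single shell string avoids quoting problems when file paths contain spaces.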
mhealthx.data_io.get_convert_audio(synapse_table, row, column_name, convert_file_append='', convert_command='ffmpeg', convert_input_args='-i', convert_output_args='-ac 2', out_path=None, username='', password='')
Read data from a row of a Synapse table and convert audio file.
- Calls:
- from mhealthx.synapse_io import read_files_from_row
- from mhealthx.data_io import convert_audio_file
Parameters:
- synapse_table : string or Schema
- a synapse ID or synapse table Schema object
- row : pandas Series or string
- row of a Synapse table converted to a Series or csv file
- column_name : string
- name of file handle column
- convert_file_append : string
- append to file name to indicate converted file format (e.g., '.wav')
- convert_command : string
- executable command without arguments
- convert_input_args : string
- arguments preceding input file name for convert_command
- convert_output_args : string
- arguments preceding output file name for convert_command
- out_path : string or None
- a local path in which to store downloaded files. If None, stores them in ~/.synapseCache
- username : string
- Synapse username (only needed once on a given machine)
- password : string
- Synapse password (only needed once on a given machine)
Returns:
- row : pandas Series
- same as passed in: row of a Synapse table as a file or Series
- new_file : string
- full path to the converted file
>>> from mhealthx.data_io import get_convert_audio
>>> from mhealthx.synapse_io import extract_rows, read_files_from_row
>>> import synapseclient
>>> syn = synapseclient.Synapse()
>>> syn.login()
>>> synapse_table = 'syn4590865'
>>> row_series, row_files = extract_rows(synapse_table, save_path='.', limit=3, username='', password='')
>>> column_name = 'audio_audio.m4a' #, 'audio_countdown.m4a']
>>> convert_file_append = '.wav'
>>> convert_command = 'ffmpeg'
>>> convert_input_args = '-i'
>>> convert_output_args = '-ac 2'
>>> out_path = '.'
>>> username = ''
>>> password = ''
>>> for i in range(1):
...     row = row_series[i]
...     row, filepath = read_files_from_row(synapse_table, row,
...                                         column_name, out_path, username, password)
...     print(row)
...     row, new_file = get_convert_audio(synapse_table,
...                                       row, column_name,
...                                       convert_file_append,
...                                       convert_command,
...                                       convert_input_args,
...                                       convert_output_args,
...                                       out_path, username, password)
mhealthx.data_io.row_to_table(row_data, output_table)
Add row to table using nipype (thread-safe in multi-processor execution).
(Requires Python module lockfile)
Parameters:
- row_data : pandas Series
- row of data
- output_table : string
- add row to this table file
>>> import pandas as pd
>>> from mhealthx.data_io import row_to_table
>>> row_data = pd.Series({'A': ['A0'], 'B': ['B0'], 'C': ['C0']})
>>> output_table = 'test.csv'
>>> row_to_table(row_data, output_table)
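The idea of appending a row to a shared table file without concurrent writers interfering can be sketched with the standard library alone. Note the substitution: row_to_table relies on nipype and the lockfile module, whereas this sketch uses an exclusively created lock file as a stand-in. The helper name `append_row`, the `.lock` suffix, and the `timeout` parameter are hypothetical.

```python
import os
import tempfile
import time

import pandas as pd


def append_row(row_data, output_table, timeout=10.0):
    """Hypothetical helper: append one row to a CSV file under a simple lock."""
    lock_path = output_table + ".lock"
    deadline = time.time() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another process holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.time() > deadline:
                raise TimeoutError("could not acquire " + lock_path)
            time.sleep(0.05)
    try:
        row = pd.DataFrame([row_data])
        # Write the header only when the table does not exist yet.
        write_header = not os.path.exists(output_table)
        row.to_csv(output_table, mode="a", header=write_header, index=False)
    finally:
        # Release the lock even if writing failed.
        os.close(fd)
        os.remove(lock_path)


output_table = os.path.join(tempfile.mkdtemp(), "test.csv")
append_row(pd.Series({"A": "A0", "B": "B0"}), output_table)
append_row(pd.Series({"A": "A1", "B": "B1"}), output_table)
table = pd.read_csv(output_table)
```

A lock file left behind by a crashed process would block later writers until the timeout expires; dedicated locking libraries handle such stale locks more gracefully.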