extract
Module: extract
Functions that run feature extraction programs and save feature tables.
- Authors:
- Arno Klein, 2015 (arno@sagebase.org) http://binarybottle.com
Copyright 2015, Sage Bionetworks (http://sagebase.org), Apache v2.0 License
Functions
- mhealthx.extract.make_row_table(file_path, table_stem, save_rows, row, row_data, feature_row=None)
Store a feature row in a table.
- Parameters:
- file_path : string
- path to accelerometer file (from row)
- table_stem : string
- path stem prepended to the output table file name
- save_rows : Boolean
- if True, save individual rows rather than writing to a single feature table
- row : pandas Series
- row to prepend, unaltered, to feature row (if feature_row is None)
- row_data : pandas Series
- feature row data (used only if feature_row is None)
- feature_row : pandas Series
- precomputed feature row; if given, feature row construction is skipped
- Returns:
- feature_row : pandas Series
- row combining the original row with a row of feature values
- feature_table : string
- output table file (full path)
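The sketch below illustrates the combine-and-save idea described for make_row_table. The unaltered original row is prepended to the feature values. The result is then either written to its own file (save_rows=True) or appended to one shared table. It is a hypothetical reimplementation, not mhealthx's code. The function name store_feature_row, the row_id parameter, and the `<stem>.csv` / `<stem>_<row_id>.csv` naming scheme are all assumptions made for illustration.

```python
import os

import pandas as pd


def store_feature_row(row, row_data, table_stem, save_rows, row_id):
    """Hypothetical sketch of the make_row_table idea (not mhealthx's code).

    row_id and the output file naming scheme are illustrative assumptions.
    """
    # Prepend the unaltered original row to the feature values.
    feature_row = pd.concat([row, row_data])
    # One-row DataFrame whose columns are the combined index labels.
    frame = feature_row.to_frame().T

    if save_rows:
        # Assumed scheme: one CSV file per row, named <stem>_<row_id>.csv.
        feature_table = os.path.abspath("{0}_{1}.csv".format(table_stem, row_id))
        frame.to_csv(feature_table, index=False)
    else:
        # Assumed scheme: append to <stem>.csv, writing the header only once.
        feature_table = os.path.abspath("{0}.csv".format(table_stem))
        write_header = not os.path.exists(feature_table)
        frame.to_csv(feature_table, mode="a", header=write_header, index=False)

    return feature_row, feature_table
```

Calling it twice with save_rows=False on the same stem yields one table with two data rows under a single header.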
- mhealthx.extract.run_openSMILE(audio_file, command, flag1, flags, flagn, args, closing, row, table_stem, save_rows)
Run openSMILE to process an audio file and store the feature row in a table.
- Steps:
- Run openSMILE’s SMILExtract audio feature extraction command.
- Construct a feature row from the original and openSMILE rows.
- Write the feature row to a table or append to a feature table.
- Parameters:
- audio_file : string
- full path to the input audio file
- command : string
- name of command: “SMILExtract”
- flag1 : string
- optional first command line flag
- flags : string or list of strings
- command line flags precede their respective args: [“-C”, “-I”, “-O”]
- flagn : string
- optional last command line flag
- args : string or list of strings
- command line arguments: [“config.conf”, “input.wav”, “output.csv”]
- closing : string
- closing string in command
- row : pandas Series
- row to prepend, unaltered, to feature row
- table_stem : string
- path stem prepended to the output table file name
- save_rows : Boolean
- if True, save individual rows rather than writing to a single feature table
- Returns:
- feature_row : pandas Series
- row combining the original row with a row of openSMILE feature values
- feature_table : string
- output table file (full path)
>>> # openSMILE setup, with examples below:
>>> import os
>>> from mhealthx.extract import run_openSMILE
>>> command = 'SMILExtract'
>>> flag1 = '-I'
>>> flags = '-C'
>>> flagn = '-csvoutput'
>>> args = os.path.join('/software', 'openSMILE-2.1.0', 'config',
...                     'IS13_ComParE.conf')
>>> closing = '-nologfile 1'
>>> save_rows = True
>>> # Example: phonation data
>>> from mhealthx.xio import get_convert_audio
>>> from mhealthx.xio import extract_synapse_rows, read_file_from_synapse_table
>>> import synapseclient
>>> syn = synapseclient.Synapse()
>>> syn.login()
>>> synapse_table = 'syn4590865'
>>> row_series, row_files = extract_synapse_rows(synapse_table, save_path='.', limit=1, username='', password='')
>>> column_name = 'audio_audio.m4a'  # or 'audio_countdown.m4a'
>>> convert_file_append = '.wav'
>>> convert_command = 'ffmpeg'
>>> convert_input_args = '-y -i'
>>> convert_output_args = '-ac 2'
>>> out_path = '.'
>>> username = ''
>>> password = ''
>>> for i in range(1):
...     row = row_series[i]
...     row, filepath = read_file_from_synapse_table(synapse_table, row,
...         column_name, out_path, username, password)
...     print(row)
...     row, audio_file = get_convert_audio(synapse_table,
...         row, column_name,
...         convert_file_append,
...         convert_command,
...         convert_input_args,
...         convert_output_args,
...         out_path, username, password)
...     table_stem = './phonation'
...     feature_row, feature_table = run_openSMILE(audio_file, command,
...         flag1, flags, flagn, args, closing,
...         row, table_stem, save_rows)
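The flags and args parameters of run_openSMILE follow the convention that each flag precedes its argument. The hypothetical helper build_command below is a minimal sketch of that convention only. It interleaves the two lists and appends an optional closing string. How run_openSMILE itself positions flag1, flagn, audio_file and the output file is not shown here and may differ.

```python
import shlex


def build_command(command, flags, args, closing=""):
    """Hypothetical sketch: place each flag directly before its argument.

    Mirrors the documented convention flags=["-C", "-I", "-O"] with
    args=["config.conf", "input.wav", "output.csv"]; placing `closing`
    at the end is an assumption.
    """
    # Accept a single flag/argument string as well as lists.
    if isinstance(flags, str):
        flags = [flags]
    if isinstance(args, str):
        args = [args]
    if len(flags) != len(args):
        raise ValueError("each flag needs exactly one argument")

    parts = [command]
    for flag, arg in zip(flags, args):
        # shlex.quote protects paths that contain spaces.
        parts.extend([flag, shlex.quote(arg)])
    if closing:
        parts.append(closing)
    return " ".join(parts)
```

The resulting string could be split with shlex.split and passed to subprocess.run. That step is left out so the sketch runs without openSMILE installed.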
- mhealthx.extract.run_pyGait(data, t, sample_rate, duration, threshold, order, cutoff, distance, row, file_path, table_stem, save_rows=False)
Run pyGait (a replication of iGAIT) accelerometer feature extraction code.
- Steps:
- Run pyGait accelerometer feature extraction.
- Construct a feature row from the original and pyGait rows.
- Write the feature row to a table or append to a feature table.
- Parameters:
- data : numpy array
- accelerometer data along any (preferably forward walking) axis
- t : list or numpy array
- accelerometer time points
- sample_rate : float
- sample rate of accelerometer reading (Hz)
- duration : float
- duration of accelerometer reading (s)
- threshold : float
- ratio to the maximum value of the anterior-posterior acceleration
- order : integer
- order of the Butterworth filter
- cutoff : integer
- cutoff frequency of the Butterworth filter (Hz)
- distance : float
- estimate of distance traversed
- row : pandas Series
- row to prepend, unaltered, to feature row
- file_path : string
- path to accelerometer file (from row)
- table_stem : string
- path stem prepended to the output table file name
- save_rows : Boolean
- if True, save individual rows rather than writing to a single feature table
- Returns:
- feature_row : pandas Series
- row combining the original row with a row of pyGait feature values
- feature_table : string
- output table file (full path)
>>> import pandas as pd
>>> from mhealthx.xio import read_accel_json
>>> from mhealthx.extract import run_pyGait
>>> from mhealthx.extractors.pyGait import project_on_walking_direction
>>> input_file = '/Users/arno/DriveWork/mhealthx/mpower_sample_data/accel_walking_outbound.json.items-6dc4a144-55c3-4e6d-982c-19c7a701ca243282023468470322798.tmp'
>>> start = 150
>>> device_motion = False
>>> t, axyz, gxyz, uxyz, rxyz, sample_rate, duration = read_accel_json(input_file, start, device_motion)
>>> ax, ay, az = axyz
>>> stride_fraction = 1.0/8.0
>>> threshold0 = 0.5
>>> threshold = 0.2
>>> order = 4
>>> cutoff = max([1, sample_rate/10])
>>> distance = None
>>> row = pd.Series({'a':[1], 'b':[2], 'c':[3]})
>>> file_path = '/fake/path'
>>> table_stem = './walking'
>>> save_rows = True
>>> px, py, pz = project_on_walking_direction(ax, ay, az, t, sample_rate, stride_fraction, threshold0, order, cutoff)
>>> feature_row, feature_table = run_pyGait(py, t, sample_rate, duration, threshold, order, cutoff, distance, row, file_path, table_stem, save_rows)
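The threshold parameter of run_pyGait is described as a ratio to the maximum anterior-posterior acceleration. One way to read that is sketched below with NumPy, under that assumption. Only local maxima at or above threshold * max(data) are kept as candidate step events, and a rough step rate is derived from their count and the recording duration. The helpers peaks_above_threshold and steps_per_minute are hypothetical and are not pyGait's actual heel-strike detection.

```python
import numpy as np


def peaks_above_threshold(data, threshold):
    """Hypothetical: indices of local maxima >= threshold * max(data)."""
    data = np.asarray(data, dtype=float)
    min_height = threshold * data.max()
    # Interior samples greater than the left neighbor and not smaller than
    # the right neighbor; a flat top is counted once, at its first sample.
    is_peak = (data[1:-1] > data[:-2]) & (data[1:-1] >= data[2:])
    candidates = np.nonzero(is_peak)[0] + 1
    return candidates[data[candidates] >= min_height]


def steps_per_minute(peak_indices, duration):
    """Hypothetical rough step rate: one step per detected peak."""
    return len(peak_indices) / duration * 60.0
```

For a clean 2 Hz sine sampled at 100 Hz for 5 s, this keeps one peak per cycle (10 in total), which corresponds to 120 steps per minute.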
- mhealthx.extract.run_quality(gx, gy, gz, row, file_path, table_stem, save_rows=False)
Extract quality measures from gravity acceleration time series data.
- Parameters:
- gx : list
- x-axis gravity acceleration
- gy : list
- y-axis gravity acceleration
- gz : list
- z-axis gravity acceleration
- row : pandas Series
- row to prepend, unaltered, to feature row
- file_path : string
- path to accelerometer file (from row)
- table_stem : string
- path stem prepended to the output table file name
- save_rows : Boolean
- if True, save individual rows rather than writing to a single feature table
- Returns:
- feature_row : pandas Series
- row combining the original row with a row of quality measures
- feature_table : string
- output table file (full path)
>>> import pandas as pd
>>> from mhealthx.xio import read_accel_json
>>> from mhealthx.extract import run_quality
>>> input_file = '/Users/arno/DriveWork/mhealthx/mpower_sample_data/deviceMotion_walking_outbound.json.items-a2ab9333-6d63-4676-977a-08591a5d837f5221783798792869048.tmp'
>>> device_motion = True
>>> start = 150
>>> t, axyz, gxyz, uxyz, rxyz, sample_rate, duration = read_accel_json(input_file, start, device_motion)
>>> # ax, ay, az = axyz
>>> gx, gy, gz = gxyz
>>> # rx, ry, rz = rxyz
>>> # uw, ux, uy, uz = uxyz
>>> row = pd.Series({'a':[1], 'b':[2], 'c':[3]})
>>> file_path = '.'
>>> table_stem = './walking'
>>> save_rows = True
>>> feature_row, feature_table = run_quality(gx, gy, gz, row, file_path, table_stem, save_rows)
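The hypothetical helper below illustrates one kind of quality measure that gravity data allows. It summarizes the magnitude of the gravity vector (gx, gy, gz), which should stay close to 1 if the values are expressed in units of g. Both the units and the choice of measure are assumptions. The sketch does not reproduce what run_quality computes.

```python
import numpy as np


def gravity_magnitude_summary(gx, gy, gz):
    """Hypothetical quality summary of the gravity vector magnitude.

    Assumes gx, gy, gz are in units of g, so the magnitude should be near 1.
    """
    g = np.sqrt(np.square(np.asarray(gx, dtype=float))
                + np.square(np.asarray(gy, dtype=float))
                + np.square(np.asarray(gz, dtype=float)))
    return {
        "mean_magnitude": float(g.mean()),
        "std_magnitude": float(g.std()),
        # Largest departure from the expected 1 g.
        "max_deviation_from_1g": float(np.abs(g - 1.0).max()),
    }
```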
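- mhealthx.extract.run_sdf_features(data, number_of_symbols, row, file_path, table_stem, save_rows)
Extract symbolic dynamic filtering (SDF) features.
- Parameters:
- data : numpy array
- time series data
- number_of_symbols : integer
- number of symbols for the symbolic dynamic filtering method
- row : pandas Series
- row to prepend, unaltered, to feature row
- file_path : string
- path to accelerometer file (from row)
- table_stem : string
- path stem prepended to the output table file name
- save_rows : Boolean
- if True, save individual rows rather than writing to a single feature table
- Returns:
- feature_row : pandas Series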
- row combining the original row with a row of SDF feature values
- feature_table : string
- output table file (full path)
>>> import pandas as pd
>>> from mhealthx.xio import read_accel_json
>>> from mhealthx.extract import run_sdf_features
>>> input_file = '/Users/arno/DriveWork/mhealthx/mpower_sample_data/accel_walking_outbound.json.items-6dc4a144-55c3-4e6d-982c-19c7a701ca243282023468470322798.tmp'
>>> start = 150
>>> device_motion = False
>>> t, axyz, gxyz, uxyz, rxyz, sample_rate, duration = read_accel_json(input_file, start, device_motion)
>>> ax, data, az = axyz
>>> number_of_symbols = 4
>>> row = pd.Series({'a':[1], 'b':[2], 'c':[3]})
>>> file_path = '.'
>>> table_stem = './walking'
>>> save_rows = True
>>> feature_row, feature_table = run_sdf_features(data, number_of_symbols, row, file_path, table_stem, save_rows)
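Symbolic dynamic filtering generally proceeds in two stages: it partitions the signal into number_of_symbols symbols, then estimates symbol statistics. The sketch below shows one common variant, assuming maximum-entropy (equal-frequency) partitioning and a depth-1 symbol transition matrix. It is a generic illustration with a hypothetical function name, not necessarily the method behind run_sdf_features.

```python
import numpy as np


def sdf_state_probabilities(data, number_of_symbols):
    """Generic symbolic dynamic filtering sketch (not mhealthx's code).

    1. Place partition boundaries at equally spaced quantiles
       (maximum-entropy partitioning), so symbols occur about equally often.
    2. Map each sample to a symbol index 0 .. number_of_symbols - 1.
    3. Count symbol-to-symbol transitions (depth-1 D-Markov machine).
    Returns (state probability vector, row-normalized transition matrix).
    """
    data = np.asarray(data, dtype=float)
    # Interior quantiles give number_of_symbols - 1 partition boundaries.
    levels = np.linspace(0.0, 1.0, number_of_symbols + 1)[1:-1]
    edges = np.quantile(data, levels)
    symbols = np.searchsorted(edges, data, side="right")

    counts = np.zeros((number_of_symbols, number_of_symbols))
    np.add.at(counts, (symbols[:-1], symbols[1:]), 1)

    state_probs = np.bincount(symbols, minlength=number_of_symbols) / len(symbols)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no outgoing transitions stay all zero.
    transitions = np.divide(counts, row_sums,
                            out=np.zeros_like(counts), where=row_sums > 0)
    return state_probs, transitions
```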
- mhealthx.extract.run_signal_features(data, row, file_path, table_stem, save_rows=False)
Extract various features from time series data.
- Parameters:
- data : numpy array of floats
- time series data
- row : pandas Series
- row to prepend, unaltered, to feature row
- file_path : string
- path to accelerometer file (from row)
- table_stem : string
- path stem prepended to the output table file name
- save_rows : Boolean
- if True, save individual rows rather than writing to a single feature table
- Returns:
- feature_row : pandas Series
- row combining the original row with a row of signal feature values
- feature_table : string
- output table file (full path)
>>> import pandas as pd
>>> from mhealthx.xio import read_accel_json
>>> from mhealthx.extract import run_signal_features
>>> input_file = '/Users/arno/DriveWork/mhealthx/mpower_sample_data/accel_walking_outbound.json.items-6dc4a144-55c3-4e6d-982c-19c7a701ca243282023468470322798.tmp'
>>> start = 150
>>> device_motion = False
>>> t, axyz, gxyz, uxyz, rxyz, sample_rate, duration = read_accel_json(input_file, start, device_motion)
>>> ax, data, az = axyz
>>> row = pd.Series({'a':[1], 'b':[2], 'c':[3]})
>>> file_path = '.'
>>> table_stem = './walking'
>>> save_rows = True
>>> feature_row, feature_table = run_signal_features(data, row, file_path, table_stem, save_rows)
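run_signal_features does not list the features it extracts. For orientation only, the sketch below computes a few common time-series summaries: mean, standard deviation, median, range, and the number of zero crossings of the mean-centered signal. The function name and the feature set are assumptions and may not match the columns run_signal_features writes.

```python
import numpy as np


def basic_signal_features(data):
    """Hypothetical set of common time-series summary features."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean()
    # Sign changes of the mean-centered signal (zero crossings).
    signs = np.signbit(centered).astype(int)
    zero_crossings = int(np.count_nonzero(np.diff(signs)))
    return {
        "mean": float(data.mean()),
        "std": float(data.std()),
        "median": float(np.median(data)),
        "range": float(data.max() - data.min()),
        "zero_crossings": zero_crossings,
    }
```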
- mhealthx.extract.run_tap_features(xtaps, ytaps, t, threshold, row, file_path, table_stem, save_rows=False)
Run touch screen tap feature extraction methods.
- Parameters:
- xtaps : numpy array of integers
- x coordinates of the taps on the touch screen
- ytaps : numpy array of integers
- y coordinates of the taps on the touch screen
- t : numpy array of floats
- time points of taps
- threshold : integer
- x offset threshold for left/right press event (pixels)
- row : pandas Series
- row to prepend, unaltered, to feature row
- file_path : string
- path to input data file (from row)
- table_stem : string
- path stem prepended to the output table file name
- save_rows : Boolean
- if True, save individual rows rather than writing to a single feature table
- Returns:
- feature_row : pandas Series
- row combining the original row with a row of tap feature values
- feature_table : string
- output table file (full path)
>>> import numpy as np
>>> import pandas as pd
>>> from mhealthx.extract import run_tap_features
>>> xtaps = np.round(200 * np.random.random(100))
>>> ytaps = np.round(300 * np.random.random(100))
>>> t = np.linspace(1, 100, 100) / 30.0
>>> threshold = 20
>>> row = pd.Series({'a':[1], 'b':[2], 'c':[3]})
>>> file_path = '.'
>>> table_stem = './tapping'
>>> save_rows = True
>>> feature_row, feature_table = run_tap_features(xtaps, ytaps, t, threshold, row, file_path, table_stem, save_rows)
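The threshold parameter of run_tap_features is an x offset in pixels used to decide left versus right press events. The sketch below rests on three assumptions. Taps are split around the midpoint of the observed x range, and taps within threshold pixels of that midpoint are counted as ambiguous. Inter-tap intervals come from successive differences of t. The function name, the midpoint rule, and the output keys are hypothetical, not mhealthx's algorithm.

```python
import numpy as np


def simple_tap_features(xtaps, t, threshold):
    """Hypothetical tap summary; the left/right rule is an assumption.

    Expects at least two taps so that inter-tap intervals exist.
    """
    xtaps = np.asarray(xtaps, dtype=float)
    t = np.asarray(t, dtype=float)
    # Inter-tap intervals from successive time points.
    intervals = np.diff(t)
    # Assumed rule: offsets from the midpoint of the observed x range decide
    # left versus right; taps within `threshold` pixels are ambiguous.
    midpoint = (xtaps.min() + xtaps.max()) / 2.0
    offsets = xtaps - midpoint
    num_left = int(np.sum(offsets < -threshold))
    num_right = int(np.sum(offsets > threshold))
    return {
        "num_taps": int(xtaps.size),
        "mean_intertap_interval": float(intervals.mean()),
        "std_intertap_interval": float(intervals.std()),
        "num_left": num_left,
        "num_right": num_right,
        "num_ambiguous": int(xtaps.size) - num_left - num_right,
    }
```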