extract

Module: extract

Functions that run feature extraction programs and save feature tables.

Authors:

Copyright 2015, Sage Bionetworks (http://sagebase.org), Apache v2.0 License

Functions

mhealthx.extract.make_row_table(file_path, table_stem, save_rows, row, row_data, feature_row=None)

Function to store feature row to a table.

Parameters:

file_path : string
    path to accelerometer file (from row)
table_stem : string
    prepend to output table file
save_rows : Boolean
    save individual rows rather than write to a single feature table?
row : pandas Series
    row to prepend, unaltered, to feature row (if feature_row is None)
row_data : pandas Series (if feature_row is None)
    feature row
feature_row : pandas Series
    feature row (skip feature row construction)

Returns:

feature_row : pandas Series
    row combining the original row with a row of feature values
feature_table : string
    output table file (full path)

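The behavior described above (prepend the original row to the feature values, then either save an individual row or append to a single table) can be sketched as follows. This is a minimal illustration and not the mhealthx implementation: plain dicts stand in for pandas Series so the sketch has no dependencies, and the helper name `save_feature_row` and the output file naming scheme are assumptions.

```python
import csv
import os
import tempfile


def save_feature_row(row, row_data, file_path, table_stem, save_rows):
    """Combine two rows and write the result to a CSV table.

    Hypothetical helper for illustration; not the mhealthx implementation.
    """
    # Prepend the original row, unaltered, to the feature values.
    feature_row = dict(row)
    feature_row.update(row_data)

    if save_rows:
        # Assumed naming scheme: one table per input file.
        name = os.path.basename(file_path)
        feature_table = '{0}_{1}.csv'.format(table_stem, name)
        mode = 'w'
    else:
        # A single shared table that grows by one line per call.
        feature_table = table_stem + '.csv'
        mode = 'a'

    # Write the header only when starting a new file.
    write_header = mode == 'w' or not os.path.exists(feature_table)
    with open(feature_table, mode, newline='') as f:
        writer = csv.DictWriter(f, fieldnames=list(feature_row))
        if write_header:
            writer.writeheader()
        writer.writerow(feature_row)
    return feature_row, feature_table


out_dir = tempfile.mkdtemp()
table_stem = os.path.join(out_dir, 'walking')
row = {'a': 1, 'b': 2}
row_data = {'feature1': 0.5}

# Two calls with save_rows=False append to the same table.
for file_path in ['/fake/path/one.json', '/fake/path/two.json']:
    feature_row, feature_table = save_feature_row(
        row, row_data, file_path, table_stem, save_rows=False)

with open(feature_table) as f:
    lines = f.read().splitlines()
print(lines)
```

With `save_rows=True` the same helper would instead write one small table per input file, which is convenient when rows are processed in parallel and merged later.
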
mhealthx.extract.run_openSMILE(audio_file, command, flag1, flags, flagn, args, closing, row, table_stem, save_rows)

Run openSMILE to process audio file and store feature row to a table.

Steps:
  1. Run openSMILE’s SMILExtract audio feature extraction command.
  2. Construct a feature row from the original and openSMILE rows.
  3. Write the feature row to a table or append to a feature table.

Parameters:

audio_file : string
    full path to the input audio file
command : string
    name of command: "SMILExtract"
flag1 : string
    optional first command line flag
flags : string or list of strings
    command line flags precede their respective args: ["-C", "-I", "-O"]
flagn : string
    optional last command line flag
args : string or list of strings
    command line arguments: ["config.conf", "input.wav", "output.csv"]
closing : string
    closing string in command
row : pandas Series
    row to prepend, unaltered, to feature row
table_stem : string
    prepend to output table file
save_rows : Boolean
    save individual rows rather than write to a single feature table?

Returns:

feature_row : pandas Series
    row combining the original row with a row of openSMILE feature values
feature_table : string
    output table file (full path)

Examples:

>>> # openSMILE setup, with examples below:
>>>
>>> import os
>>> from mhealthx.extract import run_openSMILE
>>> command = 'SMILExtract'
>>> flag1 = '-I'
>>> flags = '-C'
>>> flagn = '-csvoutput'
>>> args = os.path.join('/software', 'openSMILE-2.1.0', 'config',
...                     'IS13_ComParE.conf')
>>> closing = '-nologfile 1'
>>> save_rows = True
>>>
>>> # Example: phonation data
>>>
>>> from mhealthx.xio import get_convert_audio
>>> from mhealthx.xio import extract_synapse_rows, read_file_from_synapse_table
>>> import synapseclient
>>> syn = synapseclient.Synapse()
>>> syn.login()
>>> synapse_table = 'syn4590865'
>>> row_series, row_files = extract_synapse_rows(synapse_table, save_path='.', limit=1, username='', password='')
>>> column_name = 'audio_audio.m4a'  # or 'audio_countdown.m4a'
>>> convert_file_append = '.wav'
>>> convert_command = 'ffmpeg'
>>> convert_input_args = '-y -i'
>>> convert_output_args = '-ac 2'
>>> out_path = '.'
>>> username = ''
>>> password = ''
>>> for i in range(1):
...     row = row_series[i]
...     row, filepath = read_file_from_synapse_table(synapse_table, row,
...         column_name, out_path, username, password)
...     print(row)
...     row, audio_file = get_convert_audio(synapse_table,
...                                         row, column_name,
...                                         convert_file_append,
...                                         convert_command,
...                                         convert_input_args,
...                                         convert_output_args,
...                                         out_path, username, password)
>>> table_stem = './phonation'
>>> feature_row, feature_table = run_openSMILE(audio_file, command,
...                      flag1, flags, flagn, args, closing,
...                      row, table_stem, save_rows)
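
To make the roles of `command`, `flag1`, `flags`, `flagn`, `args`, and `closing` more concrete, here is one way such pieces could be assembled into a command line. The ordering is an assumption inferred from the example values above (`flag1` paired with the input audio file, `flagn` paired with the output table); `build_command` and `output_file` are hypothetical names and not part of mhealthx.

```python
import shlex


def build_command(command, flag1, audio_file, flags, args,
                  flagn, output_file, closing):
    """Assemble a command line as a list of tokens.

    Hypothetical helper; the token order is an assumption.
    """
    # Accept a single string or a list for flags and args.
    if isinstance(flags, str):
        flags = [flags]
    if isinstance(args, str):
        args = [args]
    if len(flags) != len(args):
        raise ValueError('each flag needs exactly one argument')

    parts = [command]
    if flag1:
        # Assumed: the first flag introduces the input audio file.
        parts += [flag1, audio_file]
    for flag, arg in zip(flags, args):
        # Each flag precedes its respective argument.
        parts += [flag, arg]
    if flagn:
        # Assumed: the last flag introduces the output table.
        parts += [flagn, output_file]
    if closing:
        parts += shlex.split(closing)
    return parts


cmd = build_command('SMILExtract', '-I', 'input.wav',
                    '-C', 'IS13_ComParE.conf',
                    '-csvoutput', 'output.csv', '-nologfile 1')
print(' '.join(cmd))
# SMILExtract -I input.wav -C IS13_ComParE.conf -csvoutput output.csv -nologfile 1
```

Returning a list of tokens rather than a single string means the result can be passed to `subprocess.run` without going through a shell.
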
mhealthx.extract.run_pyGait(data, t, sample_rate, duration, threshold, order, cutoff, distance, row, file_path, table_stem, save_rows=False)

Run pyGait (replication of iGAIT) accelerometer feature extraction code.

Steps:
  1. Run pyGait accelerometer feature extraction.
  2. Construct a feature row from the original and pyGait rows.
  3. Write the feature row to a table or append to a feature table.

Parameters:

data : numpy array
    accelerometer data along any (preferably forward walking) axis
t : list or numpy array
    accelerometer time points
sample_rate : float
    sample rate of accelerometer reading (Hz)
duration : float
    duration of accelerometer reading (s)
threshold : float
    ratio to the maximum value of the anterior-posterior acceleration
order : integer
    order of the Butterworth filter
cutoff : integer
    cutoff frequency of the Butterworth filter (Hz)
distance : float
    estimate of distance traversed
row : pandas Series
    row to prepend, unaltered, to feature row
file_path : string
    path to accelerometer file (from row)
table_stem : string
    prepend to output table file
save_rows : Boolean
    save individual rows rather than write to a single feature table?

Returns:

feature_row : pandas Series
    row combining the original row with a row of pyGait feature values
feature_table : string
    output table file (full path)

Examples:

>>> import pandas as pd
>>> from mhealthx.xio import read_accel_json
>>> from mhealthx.extract import run_pyGait
>>> from mhealthx.extractors.pyGait import project_on_walking_direction
>>> input_file = '/Users/arno/DriveWork/mhealthx/mpower_sample_data/accel_walking_outbound.json.items-6dc4a144-55c3-4e6d-982c-19c7a701ca243282023468470322798.tmp'
>>> start = 150
>>> device_motion = False
>>> t, axyz, gxyz, uxyz, rxyz, sample_rate, duration = read_accel_json(input_file, start, device_motion)
>>> ax, ay, az = axyz
>>> stride_fraction = 1.0/8.0
>>> threshold0 = 0.5
>>> threshold = 0.2
>>> order = 4
>>> cutoff = max([1, sample_rate/10])
>>> distance = None
>>> row = pd.Series({'a':[1], 'b':[2], 'c':[3]})
>>> file_path = '/fake/path'
>>> table_stem = './walking'
>>> save_rows = True
>>> px, py, pz = project_on_walking_direction(ax, ay, az, t, sample_rate, stride_fraction, threshold0, order, cutoff)
>>> feature_row, feature_table = run_pyGait(py, t, sample_rate, duration, threshold, order, cutoff, distance, row, file_path, table_stem, save_rows)
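
The `threshold` parameter is described as a ratio to the maximum acceleration value. The sketch below shows one way such a ratio might be used to pick out candidate peaks in a signal. It is a simplified illustration on synthetic data, not pyGait's algorithm; `peaks_above_ratio` is a hypothetical helper, and the 1 Hz sinusoid merely stands in for filtered accelerometer data.

```python
import math


def peaks_above_ratio(data, threshold):
    """Indices of local maxima exceeding threshold * max(data).

    Hypothetical helper for illustration only.
    """
    level = threshold * max(data)
    return [i for i in range(1, len(data) - 1)
            if data[i] > level and data[i - 1] < data[i] >= data[i + 1]]


sample_rate = 100.0  # Hz
t = [i / sample_rate for i in range(300)]

# Synthetic 1 Hz signal standing in for acceleration along one axis.
data = [math.sin(2 * math.pi * ti) for ti in t]

peaks = peaks_above_ratio(data, threshold=0.2)
peak_times = [t[i] for i in peaks]
print(peaks)       # [25, 125, 225]
print(peak_times)  # [0.25, 1.25, 2.25]
```

A lower `threshold` admits smaller peaks, while a higher one keeps only those close to the signal's maximum.
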
mhealthx.extract.run_quality(gx, gy, gz, row, file_path, table_stem, save_rows=False)

Extract quality measures from gravity acceleration time series data.

Parameters:

gx : list
    x-axis gravity acceleration
gy : list
    y-axis gravity acceleration
gz : list
    z-axis gravity acceleration
row : pandas Series
    row to prepend, unaltered, to feature row
file_path : string
    path to accelerometer file (from row)
table_stem : string
    prepend to output table file
save_rows : Boolean
    save individual rows rather than write to a single feature table?

Returns:

feature_row : pandas Series
    row combining the original row with a row of quality measures
feature_table : string
    output table file (full path)

Examples:

>>> import pandas as pd
>>> from mhealthx.xio import read_accel_json
>>> from mhealthx.extract import run_quality
>>> input_file = '/Users/arno/DriveWork/mhealthx/mpower_sample_data/deviceMotion_walking_outbound.json.items-a2ab9333-6d63-4676-977a-08591a5d837f5221783798792869048.tmp'
>>> device_motion = True
>>> start = 150
>>> t, axyz, gxyz, uxyz, rxyz, sample_rate, duration = read_accel_json(input_file, start, device_motion)
>>> #ax, ay, az = axyz
>>> gx, gy, gz = gxyz
>>> #rx, ry, rz = rxyz
>>> #uw, ux, uy, uz = wxyz
>>> row = pd.Series({'a':[1], 'b':[2], 'c':[3]})
>>> file_path = '.'
>>> table_stem = './walking'
>>> save_rows = True
>>> feature_row, feature_table = run_quality(gx, gy, gz, row, file_path, table_stem, save_rows)
mhealthx.extract.run_sdf_features(data, number_of_symbols, row, file_path, table_stem, save_rows)

Extract symbolic dynamic filtering features.

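Parameters:

data : numpy array
number_of_symbols : integer
    number of symbols for symbolic dynamic filtering method
row : pandas Series
    row to prepend, unaltered, to feature row
file_path : string
    path to accelerometer file (from row)
table_stem : string
    prepend to output table file
save_rows : Boolean
    save individual rows rather than write to a single feature table?

Returns:

feature_row : pandas Series
    row combining the original row with a row of SDF feature values
feature_table : string
    output table file (full path)

Examples:
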
>>> import pandas as pd
>>> from mhealthx.xio import read_accel_json
>>> from mhealthx.extract import run_sdf_features
>>> input_file = '/Users/arno/DriveWork/mhealthx/mpower_sample_data/accel_walking_outbound.json.items-6dc4a144-55c3-4e6d-982c-19c7a701ca243282023468470322798.tmp'
>>> start = 150
>>> device_motion = False
>>> t, axyz, gxyz, uxyz, rxyz, sample_rate, duration = read_accel_json(input_file, start, device_motion)
>>> ax, data, az = axyz
>>> number_of_symbols = 4
>>> row = pd.Series({'a':[1], 'b':[2], 'c':[3]})
>>> file_path = '.'
>>> table_stem = './walking'
>>> save_rows = True
>>> feature_row, feature_table = run_sdf_features(data, number_of_symbols, row, file_path, table_stem, save_rows)
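
For readers unfamiliar with symbolic dynamic filtering, the general idea is to map a time series onto a small alphabet of symbols and then summarize how often one symbol follows another. The sketch below illustrates that idea with a uniform partition of the value range, which is an assumption; mhealthx may partition differently. `symbolize` and `transition_matrix` are hypothetical helpers.

```python
def symbolize(data, number_of_symbols):
    """Map each value to a symbol 0..number_of_symbols-1 (uniform bins)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / number_of_symbols
    if width == 0:
        # Constant signal: everything falls in the first bin.
        return [0] * len(data)
    # Clamp so that the maximum value lands in the last bin.
    return [min(int((x - lo) / width), number_of_symbols - 1) for x in data]


def transition_matrix(symbols, number_of_symbols):
    """Row-normalized counts of transitions between consecutive symbols."""
    counts = [[0] * number_of_symbols for _ in range(number_of_symbols)]
    for a, b in zip(symbols, symbols[1:]):
        counts[a][b] += 1
    matrix = []
    for row_counts in counts:
        total = sum(row_counts)
        matrix.append([c / total if total else 0.0 for c in row_counts])
    return matrix


data = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
number_of_symbols = 4

symbols = symbolize(data, number_of_symbols)
matrix = transition_matrix(symbols, number_of_symbols)
print(symbols)  # [0, 1, 2, 3, 0, 1, 2, 3]
for matrix_row in matrix:
    print(matrix_row)
```

The entries of such a matrix (or a vector derived from it) could then serve as feature values for the row.
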
mhealthx.extract.run_signal_features(data, row, file_path, table_stem, save_rows=False)

Extract various features from time series data.

Parameters:

data : numpy array of floats
    time series data
row : pandas Series
    row to prepend, unaltered, to feature row
file_path : string
    path to accelerometer file (from row)
table_stem : string
    prepend to output table file
save_rows : Boolean
    save individual rows rather than write to a single feature table?

Returns:

feature_row : pandas Series
    row combining the original row with a row of signal feature values
feature_table : string
    output table file (full path)

Examples:

>>> import pandas as pd
>>> from mhealthx.xio import read_accel_json
>>> from mhealthx.extract import run_signal_features
>>> input_file = '/Users/arno/DriveWork/mhealthx/mpower_sample_data/accel_walking_outbound.json.items-6dc4a144-55c3-4e6d-982c-19c7a701ca243282023468470322798.tmp'
>>> start = 150
>>> device_motion = False
>>> t, axyz, gxyz, uxyz, rxyz, sample_rate, duration = read_accel_json(input_file, start, device_motion)
>>> ax, data, az = axyz
>>> row = pd.Series({'a':[1], 'b':[2], 'c':[3]})
>>> file_path = '.'
>>> table_stem = './walking'
>>> save_rows = True
>>> feature_row, feature_table = run_signal_features(data, row, file_path, table_stem, save_rows)
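
The documentation does not list which signal features are computed. As a rough illustration of what a "row of signal feature values" could look like, the sketch below computes a few common summary statistics and prepends the original row. The feature set and the names `signal_features`, `mean`, `std`, `min`, and `max` are assumptions, not the features mhealthx extracts.

```python
import statistics


def signal_features(data):
    """Return a dict of simple summary statistics (hypothetical feature set)."""
    return {
        'mean': statistics.mean(data),
        'std': statistics.pstdev(data),  # population standard deviation
        'min': min(data),
        'max': max(data),
    }


data = [1.0, 2.0, 3.0, 4.0, 5.0]
row = {'a': 1, 'b': 2}

# Prepend the original row, unaltered, to the feature values.
feature_row = dict(row)
feature_row.update(signal_features(data))
print(feature_row)
```
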
mhealthx.extract.run_tap_features(xtaps, ytaps, t, threshold, row, file_path, table_stem, save_rows=False)

Run touch screen tap feature extraction methods.

Parameters:

xtaps : numpy array of integers
    x coordinates of touch screen where tapped
ytaps : numpy array of integers
    y coordinates of touch screen where tapped
t : numpy array of floats
    time points of taps
threshold : integer
    x offset threshold for left/right press event (pixels)
row : pandas Series
    row to prepend, unaltered, to feature row
file_path : string
    path to accelerometer file (from row)
table_stem : string
    prepend to output table file
save_rows : Boolean
    save individual rows rather than write to a single feature table?

Returns:

feature_row : pandas Series
    row combining the original row with a row of tap feature values
feature_table : string
    output table file (full path)

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> from mhealthx.extract import run_tap_features
>>> xtaps = np.round(200 * np.random.random(100))
>>> ytaps = np.round(300 * np.random.random(100))
>>> t = np.linspace(1, 100, 100) / 30.0
>>> threshold = 20
>>> row = pd.Series({'a':[1], 'b':[2], 'c':[3]})
>>> file_path = '.'
>>> table_stem = './tapping'
>>> save_rows = True
>>> feature_row, feature_table = run_tap_features(xtaps, ytaps, t, threshold, row, file_path, table_stem, save_rows)
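
The `threshold` parameter is described as an x offset threshold for left/right press events. The sketch below shows one plausible reading: a tap counts as left or right when its x coordinate lies more than `threshold` pixels from the mean x position. Measuring the offset from the mean is an assumption, and `classify_taps` and `intertap_intervals` are hypothetical helpers rather than mhealthx functions.

```python
def classify_taps(xtaps, threshold):
    """Label each tap 'left', 'right', or 'center' by its x offset."""
    # Assumed reference point: the mean x coordinate of all taps.
    center = sum(xtaps) / len(xtaps)
    labels = []
    for x in xtaps:
        offset = x - center
        if offset > threshold:
            labels.append('right')
        elif offset < -threshold:
            labels.append('left')
        else:
            labels.append('center')
    return labels


def intertap_intervals(t):
    """Time differences between consecutive taps."""
    return [b - a for a, b in zip(t, t[1:])]


xtaps = [50, 150, 52, 148, 100]
t = [0.0, 0.5, 1.0, 1.5, 2.0]
threshold = 20

labels = classify_taps(xtaps, threshold)
intervals = intertap_intervals(t)
print(labels)     # ['left', 'right', 'left', 'right', 'center']
print(intervals)  # [0.5, 0.5, 0.5, 0.5]
```
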