transform
genie.transform
This module contains all the transformation functions used throughout the GENIE package.
Functions
_col_name_to_titlecase(string)
Converts a string to title case. Supports strings whose words are separated by underscores (_).
PARAMETER | DESCRIPTION |
---|---|
string | A string |
RETURNS | DESCRIPTION |
---|---|
str | A string converted to title case |
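As a rough illustration of the behavior described above, the sketch below uses Python's built-in str.title(), which treats an underscore as a word boundary. The helper name titlecase_sketch is hypothetical and this is not the code in genie/transform.py; in particular, whether the real function keeps or strips the underscores is not stated here.

```python
def titlecase_sketch(string: str) -> str:
    """Hypothetical stand-in, not the GENIE implementation."""
    # str.title() uppercases the first letter of each run of letters,
    # so every "_" starts a new word. The underscores themselves are kept.
    return string.title()


example = titlecase_sketch("seq_assay_id")
print(example)  # Seq_Assay_Id
```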
Source code in genie/transform.py
_convert_col_with_nas_to_str(df, col)
Converts a column to str while preserving NAs.
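One way to get this behavior, shown here only as a sketch, is to stringify each non-missing value and pass missing values through untouched. The name col_with_nas_to_str_sketch and the choice to return a plain list are assumptions made for illustration, not details taken from genie/transform.py.

```python
import pandas as pd


def col_with_nas_to_str_sketch(df: pd.DataFrame, col: str) -> list:
    """Hypothetical stand-in, not the GENIE implementation."""
    # Only non-missing values are converted; NAs are returned unchanged.
    return [str(val) if pd.notna(val) else val for val in df[col]]


demo = pd.DataFrame({"code": ["A", None, 7]})
converted = col_with_nas_to_str_sketch(demo, "code")
print(converted)
```

The missing entry stays missing instead of becoming a literal string, which is the point of preserving NAs.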
Source code in genie/transform.py
_convert_float_col_with_nas_to_int(df, col)
Converts an int column that pandas turned into a float column (which pandas does for int values that have NAs) back into an int column with the NAs intact.
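The sketch below reproduces the situation described above and restores integers using pandas's nullable Int64 dtype. Using Int64 is an assumption about how such a conversion could be done, and float_col_to_int_sketch is a hypothetical name; neither is confirmed by this page.

```python
import pandas as pd


def float_col_to_int_sketch(df: pd.DataFrame, col: str) -> pd.Series:
    """Hypothetical stand-in, not the GENIE implementation."""
    series = df[col]
    # Only touch float columns that actually contain missing values.
    if pd.api.types.is_float_dtype(series) and series.isna().any():
        # "Int64" (capital I) is pandas's nullable integer dtype.
        return series.astype("Int64")
    return series


# Integers plus a missing value are stored by pandas as float64.
demo = pd.DataFrame({"age": [1, None, 3]})
restored = float_col_to_int_sketch(demo, "age")
print(demo["age"].dtype, "->", restored.dtype)
```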
Source code in genie/transform.py
_convert_df_with_mixed_dtypes(read_csv_params)
Checks whether a dataframe read in normally comes out with mixed data types (which happens when low_memory = True, because read_csv parses the file in chunks and guesses dtypes per chunk) and, if so, converts the dataframe with mixed data types to a single data type.
PARAMETER | DESCRIPTION |
---|---|
read_csv_params | Input parameters and their values for pandas's read_csv function. Must include the file path of the dataset to be read in |
RETURNS | DESCRIPTION |
---|---|
DataFrame | pd.DataFrame: The dataset read in |
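A minimal sketch of the check described above: read with low_memory=True, treat pandas's DtypeWarning as the signal that a column came out mixed, and re-read in a single pass. It assumes read_csv_params is a dict of read_csv keyword arguments that does not already contain low_memory; the function name is hypothetical and the real implementation may differ.

```python
import os
import tempfile
import warnings

import pandas as pd


def read_csv_single_pass_sketch(read_csv_params: dict) -> pd.DataFrame:
    """Hypothetical stand-in, not the GENIE implementation."""
    with warnings.catch_warnings():
        # Turn pandas's mixed-dtype warning into an exception so it can be caught.
        warnings.simplefilter("error", pd.errors.DtypeWarning)
        try:
            # Chunked parsing: dtypes are guessed per chunk.
            return pd.read_csv(**read_csv_params, low_memory=True)
        except pd.errors.DtypeWarning:
            # Re-read in one pass so each column is inferred from all of its values.
            return pd.read_csv(**read_csv_params, low_memory=False)


# Small demo file; it is too short to trigger the fallback, but it exercises the call.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as handle:
    handle.write("sample_id,age\nS1,34\nS2,unknown\n")
    demo_path = handle.name

demo_df = read_csv_single_pass_sketch({"filepath_or_buffer": demo_path})
os.remove(demo_path)
```

The warning filter is scoped with warnings.catch_warnings() so that the caller's own warning settings are restored afterwards.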
Source code in genie/transform.py
_convert_values_to_na(input_df, values_to_replace, columns_to_convert)
Converts given values to NA in an input dataset.
PARAMETER | DESCRIPTION |
---|---|
input_df | Input dataset |
values_to_replace | String values to replace with NA |
columns_to_convert | Subset of columns in which to replace the given values with NA |
RETURNS | DESCRIPTION |
---|---|
DataFrame | pd.DataFrame: Dataset with the specified values replaced with NAs |
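To make the three parameters concrete, here is a small sketch that blanks out the listed values in the chosen columns only. The name values_to_na_sketch, the use of isin with mask, and the decision to work on a copy are all assumptions made for illustration; the actual function may, for example, modify input_df in place.

```python
import pandas as pd


def values_to_na_sketch(
    input_df: pd.DataFrame, values_to_replace: list, columns_to_convert: list
) -> pd.DataFrame:
    """Hypothetical stand-in, not the GENIE implementation."""
    out = input_df.copy()
    subset = out[columns_to_convert]
    # mask() replaces cells with NA wherever the condition is True.
    out[columns_to_convert] = subset.mask(subset.isin(values_to_replace))
    return out


demo = pd.DataFrame(
    {
        "sex": ["Male", "Unknown", "Female"],
        "race": ["Not Collected", "Asian", "Unknown"],
        "note": ["Unknown", "ok", "ok"],
    }
)
cleaned = values_to_na_sketch(demo, ["Unknown", "Not Collected"], ["sex", "race"])
print(cleaned)
```

Columns outside columns_to_convert, such as note here, keep their original values even when they contain one of the listed strings.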
Source code in genie/transform.py