DataEncoder

from eflow.data_pipeline_segments.data_encoder import DataEncoder

class DataEncoder(segment_id=None, create_file=True)

Attempts to convert features to the correct types, updating both the dataframe and df_features.

apply_value_representation(df, df_features, _add_to_que=True)

Translate features into their most understandable representation.

Args:

df: pd.DataFrame: Pandas dataframe.
df_features: DataFrameTypes from eflow: DataFrameTypes object.
_add_to_que: bool: Hidden variable to determine if the function should be pushed to the pipeline segment.

decode_data(df, df_features, apply_value_representation=True, _add_to_que=True)

Decode the data into non-numerical values for more descriptive analysis.

Args:

df: pd.DataFrame: Pandas dataframe.
df_features: DataFrameTypes from eflow: DataFrameTypes object.
apply_value_representation: bool: Translate features into their most understandable representation.
_add_to_que: bool: Hidden variable to determine if the function should be pushed to the pipeline segment.

encode_data(df, df_features, apply_value_representation=True, _add_to_que=True)

Encode the data into numerical values for machine learning processes.

Args:

df: pd.DataFrame: Pandas dataframe.
df_features: DataFrameTypes from eflow: DataFrameTypes object.
apply_value_representation: bool: Translate features into their most understandable representation.
_add_to_que: bool: Hidden variable to determine if the function should be pushed to the pipeline segment.
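
The idea behind encode_data and decode_data can be illustrated with plain pandas. The following is only a minimal sketch of label encoding and its inverse; it does not call eflow and is not eflow's implementation. The column name, the `encoder`/`decoder` dictionaries, and the sorted-label ordering are all assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "red", "blue"]})

# Build a fixed label -> code mapping (sorted so the codes are deterministic).
labels = sorted(df["color"].unique())
encoder = {label: code for code, label in enumerate(labels)}
decoder = {code: label for label, code in encoder.items()}

# Encode: descriptive strings -> integer codes for machine learning use.
encoded = df.assign(color=df["color"].map(encoder))

# Decode: integer codes -> the original descriptive strings.
decoded = encoded.assign(color=encoded["color"].map(decoder))

print(encoded["color"].tolist())
print(decoded["color"].tolist())
```

Keeping the mapping alongside the data is what makes the round trip lossless; in this sketch the dictionaries play that role.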

make_dummies(df, df_features, qualitative_features=[], _feature_values_dict=None, _add_to_que=True)

Create dummy features based on qualitative feature data and remove the original feature.

Note
_feature_values_dict does not need to be initialized; it is used as a backend resource.

Args:

df: pd.DataFrame: Pandas dataframe.
df_features: DataFrameTypes from eflow: DataFrameTypes object.
qualitative_features: collection of strings: Feature names to convert the feature data into dummy features.
_add_to_que: bool: Hidden variable to determine if the function should be pushed to the pipeline segment.
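
For readers unfamiliar with dummy features, the transformation described above can be sketched with pandas' own `pd.get_dummies`. This is an illustration of the concept only; whether eflow uses `get_dummies` internally, and how it names the resulting columns, is not stated here. The sample data and the `_` separator are assumptions.

```python
import pandas as pd

df = pd.DataFrame({"age": [22, 35, 58], "embarked": ["S", "C", "S"]})
qualitative_features = ["embarked"]

# Each listed column is replaced by one 0/1 indicator column per distinct value.
dummied = pd.get_dummies(
    df, columns=qualitative_features, prefix_sep="_", dtype=int
)

print(list(dummied.columns))
```

Note that the original `embarked` column no longer exists after the call, which matches the "removes the original feature" behavior described for make_dummies.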

make_values_bool(df, df_features, _add_to_que=True)

Convert all string booleans to numeric boolean values.

Args:

df: pd.DataFrame: Pandas dataframe.
df_features: DataFrameTypes from eflow: DataFrameTypes object.
_add_to_que: bool: Hidden variable to determine if the function should be pushed to the pipeline segment.
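
As a rough illustration of what converting string booleans involves, the sketch below maps columns consisting solely of "true"/"false" strings (in any letter case) to 1/0 using plain pandas. The helper `string_bools_to_numeric` and the set of accepted spellings are hypothetical; which spellings eflow itself recognizes is not documented here.

```python
import pandas as pd

# Assumed spellings; matching is case-insensitive after stripping whitespace.
BOOL_STRINGS = {"true": 1, "false": 0}

def string_bools_to_numeric(frame):
    """Return a copy where all-'true'/'false' columns are mapped to 1/0."""
    out = frame.copy()
    for col in out.columns:
        lowered = out[col].astype(str).str.strip().str.lower()
        # Only convert a column if every value is a recognized boolean string.
        if lowered.isin(list(BOOL_STRINGS)).all():
            out[col] = lowered.map(BOOL_STRINGS)
    return out

df = pd.DataFrame({
    "survived": ["True", "false", "TRUE"],
    "name": ["Ann", "Bob", "Cy"],
})
converted = string_bools_to_numeric(df)
print(converted["survived"].tolist())
```

Columns that contain any other value, such as `name` here, are left unchanged.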

revert_dummies(df, df_features, qualitative_features=[], _add_to_que=True)

Convert dummy features back to the original feature.

Args:

df: pd.DataFrame: Pandas dataframe.
df_features: DataFrameTypes from eflow: DataFrameTypes object.
qualitative_features: collection of strings: Feature names to convert the dummy features into original feature data.
_add_to_que: bool: Hidden variable to determine if the function should be pushed to the pipeline segment.
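
Reverting dummy features amounts to collapsing a group of indicator columns back into a single column. The following pandas-only sketch shows one way to do that; the helper `revert_dummy_columns` and the `feature_value` column naming convention are assumptions for illustration, not eflow's implementation.

```python
import pandas as pd

def revert_dummy_columns(frame, feature, sep="_"):
    """Collapse columns named '<feature><sep><value>' into one column."""
    prefix = feature + sep
    dummy_cols = [c for c in frame.columns if c.startswith(prefix)]
    # idxmax(axis=1) returns, per row, the name of the column holding the 1.
    restored = frame[dummy_cols].idxmax(axis=1).str[len(prefix):]
    out = frame.drop(columns=dummy_cols)
    out[feature] = restored
    return out

dummied = pd.DataFrame({
    "age": [22, 35, 58],
    "embarked_C": [0, 1, 0],
    "embarked_S": [1, 0, 1],
})
reverted = revert_dummy_columns(dummied, "embarked")
print(reverted["embarked"].tolist())
```

One caveat of this approach: a row whose indicator columns are all 0 would silently receive the first category, so real code would need to check for that case.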

revert_value_representation(df, df_features, _add_to_que=True)

Translate features back out of their most understandable representation (the inverse of apply_value_representation).

Args:

df: pd.DataFrame: Pandas dataframe.
df_features: DataFrameTypes from eflow: DataFrameTypes object.
_add_to_que: bool: Hidden variable to determine if the function should be pushed to the pipeline segment.
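
One plausible reading of "value representation" is a per-feature lookup table between raw values and human-readable labels. The sketch below illustrates applying and reverting such a table with plain pandas. Everything in it is an assumption: the `value_representation` dictionary, the helper names, and the sample data are hypothetical, and how eflow actually stores this information in df_features is not described here.

```python
import pandas as pd

# Hypothetical lookup table: feature name -> {raw value: readable label}.
value_representation = {
    "pclass": {1: "first class", 2: "second class", 3: "third class"},
}

def apply_representation(frame, mapping):
    """Return a copy with each mapped column translated through its table."""
    out = frame.copy()
    for col, table in mapping.items():
        out[col] = out[col].map(table)
    return out

def revert_representation(frame, mapping):
    """Undo apply_representation by inverting every lookup table."""
    inverse = {
        col: {label: raw for raw, label in table.items()}
        for col, table in mapping.items()
    }
    return apply_representation(frame, inverse)

df = pd.DataFrame({"pclass": [1, 3, 2]})
readable = apply_representation(df, value_representation)
restored = revert_representation(readable, value_representation)
print(readable["pclass"].tolist())
```

The revert step only works cleanly when each table is one-to-one; two raw values sharing a label could not be told apart afterwards.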