DataEncoder
from eflow.data_pipeline_segments.data_encoder import DataEncoder
-
class DataEncoder(segment_id=None, create_file=True)
Attempts to convert features to the correct types. Updates both the dataframe and df_features.
-
apply_value_representation(df, df_features, _add_to_que=True)
Translate features into their most understandable representation.
- Args:
- df: pd.DataFrame
Pandas dataframe.
- df_features: DataFrameTypes from eflow
DataFrameTypes object.
- _add_to_que: bool
Hidden variable to determine if the function should be pushed to the pipeline segment.
-
decode_data(df, df_features, apply_value_representation=True, _add_to_que=True)
Decode the data into non-numerical values for more descriptive analysis.
- Args:
- df: pd.DataFrame
Pandas dataframe.
- df_features: DataFrameTypes from eflow
DataFrameTypes object.
- apply_value_representation: bool
Whether to translate features into their most understandable representation.
- _add_to_que: bool
Hidden variable to determine if the function should be pushed to the pipeline segment.
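To illustrate what decoding means here, the following is a minimal pandas-only sketch, not eflow's implementation. It assumes an integer-coded column and a value-to-code table; the names `color_codes` and `decode_column` are hypothetical and do not come from eflow.

```python
import pandas as pd

# Hypothetical encoding table, e.g. one recorded when the data was encoded.
color_codes = {"blue": 0, "green": 1, "red": 2}


def decode_column(series, mapping):
    """Replace integer codes with their original labels."""
    # Invert the value -> code table into code -> value.
    inverse = {code: value for value, code in mapping.items()}
    return series.map(inverse)


df = pd.DataFrame({"color": [2, 0, 2, 1]})
df["color"] = decode_column(df["color"], color_codes)
```

After this runs, the `color` column holds labels again, which is the form a descriptive analysis would typically want.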
-
encode_data(df, df_features, apply_value_representation=True, _add_to_que=True)
Encode the data into numerical values for machine learning processes.
- Args:
- df: pd.DataFrame
Pandas dataframe.
- df_features: DataFrameTypes from eflow
DataFrameTypes object.
- apply_value_representation: bool
Whether to translate features into their most understandable representation.
- _add_to_que: bool
Hidden variable to determine if the function should be pushed to the pipeline segment.
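As a rough illustration of encoding qualitative values as numbers, here is a small pandas-only sketch. It is an assumption about the general technique, not a description of how eflow assigns codes; `encode_column` is a hypothetical helper.

```python
import pandas as pd


def encode_column(series):
    """Map each distinct value to an integer code.

    Returns the encoded series and the value -> code mapping so the
    encoding can be reversed later.
    """
    categories = sorted(series.dropna().unique())
    mapping = {value: code for code, value in enumerate(categories)}
    return series.map(mapping), mapping


df = pd.DataFrame({"color": ["red", "blue", "red", "green"]})
df["color"], color_mapping = encode_column(df["color"])
```

Keeping the mapping alongside the encoded column is what makes a later decode step possible.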
-
make_dummies(df, df_features, qualitative_features=[], _feature_values_dict=None, _add_to_que=True)
Create dummy features from qualitative feature data and remove the original features.
- Note
_feature_values_dict does not need to be initialized by the caller; it is used as a backend resource.
- Args:
- df: pd.DataFrame
Pandas dataframe.
- df_features: DataFrameTypes from eflow
DataFrameTypes object.
- qualitative_features: collection of strings
Feature names to convert the feature data into dummy features.
- _add_to_que: bool
Hidden variable to determine if the function should be pushed to the pipeline segment.
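The dummy-feature idea can be sketched with plain pandas. This is not eflow's code: `make_dummies_sketch` is a hypothetical helper built on `pd.get_dummies`, and the `feature_value` column naming is an assumption about the output layout.

```python
import pandas as pd


def make_dummies_sketch(df, qualitative_features):
    """One-hot encode the listed columns and drop the originals."""
    # pd.get_dummies removes each listed column and appends one
    # indicator column per distinct value, named "<feature>_<value>".
    return pd.get_dummies(
        df, columns=qualitative_features, prefix_sep="_", dtype=int
    )


df = pd.DataFrame({"age": [31, 45, 27], "color": ["red", "blue", "red"]})
encoded = make_dummies_sketch(df, ["color"])
```

Here `encoded` keeps `age` untouched and replaces `color` with the indicator columns `color_blue` and `color_red`.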
-
make_values_bool(df, df_features, _add_to_que=True)
Convert all string booleans to numeric boolean values.
- Args:
- df: pd.DataFrame
Pandas dataframe.
- df_features: DataFrameTypes from eflow
DataFrameTypes object.
- _add_to_que: bool
Hidden variable to determine if the function should be pushed to the pipeline segment.
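A minimal sketch of converting string booleans, using pandas only. The set of recognized strings in `BOOL_STRINGS` and the helper name are assumptions for illustration; the strings eflow actually recognizes are not stated here.

```python
import pandas as pd

# Hypothetical table of strings treated as booleans.
BOOL_STRINGS = {"true": 1, "false": 0, "yes": 1, "no": 0}


def string_bools_to_numeric(series):
    """Return a 0/1 series if every value is a boolean-like string."""
    normalized = series.astype(str).str.strip().str.lower()
    if normalized.isin(list(BOOL_STRINGS)).all():
        return normalized.map(BOOL_STRINGS)
    # Leave columns that are not purely boolean-like unchanged.
    return series


df = pd.DataFrame({"active": ["True", "false", "YES"], "name": ["a", "b", "c"]})
for column in df.columns:
    df[column] = string_bools_to_numeric(df[column])
```

Only `active` is converted; `name` is left as is because its values are not boolean-like.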
-
revert_dummies(df, df_features, qualitative_features=[], _add_to_que=True)
Convert dummy features back into the original feature.
- Args:
- df: pd.DataFrame
Pandas dataframe.
- df_features: DataFrameTypes from eflow
DataFrameTypes object.
- qualitative_features: collection of strings
Feature names to convert the dummy features into original feature data.
- _add_to_que: bool
Hidden variable to determine if the function should be pushed to the pipeline segment.
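Reverting dummy features can be sketched as follows with pandas only. This is an illustration under stated assumptions, not eflow's implementation: `revert_dummies_sketch` is hypothetical, indicator columns are assumed to be named `<feature>_<value>`, and each row is assumed to have exactly one indicator set to 1.

```python
import pandas as pd


def revert_dummies_sketch(df, feature, prefix_sep="_"):
    """Collapse "<feature>_<value>" indicator columns into one column."""
    prefix = feature + prefix_sep
    dummy_cols = [col for col in df.columns if col.startswith(prefix)]
    # idxmax returns, per row, the name of the column holding the 1;
    # stripping the prefix leaves the original value. Rows with no
    # indicator set would wrongly resolve to the first dummy column.
    restored = df[dummy_cols].idxmax(axis=1).str[len(prefix):]
    out = df.drop(columns=dummy_cols)
    out[feature] = restored
    return out


encoded = pd.DataFrame(
    {"age": [31, 45, 27], "color_blue": [0, 1, 0], "color_red": [1, 0, 1]}
)
decoded = revert_dummies_sketch(encoded, "color")
```

The result has a single `color` column again, with the indicator columns removed.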
-
revert_value_representation(df, df_features, _add_to_que=True)
Translate features back into their less descriptive representation, reversing apply_value_representation.
- Args:
- df: pd.DataFrame
Pandas dataframe.
- df_features: DataFrameTypes from eflow
DataFrameTypes object.
- _add_to_que: bool
Hidden variable to determine if the function should be pushed to the pipeline segment.
-