FeatureDataCleaner
from eflow.data_pipeline_segments.feature_data_cleaner import FeatureDataCleaner
-
class
FeatureDataCleaner
(segment_id=None, create_file=True) A multipurpose data cleaner.
-
drop_feature
(df, df_features, feature_name, _add_to_que=True) Drop a feature from the dataframe.
- Args:
- df: pd.DataFrame
Pandas DataFrame
- df_features: DataFrameType from eflow
Organizes feature types into groups.
- feature_name: string
Name of the feature in the dataframe
- _add_to_que: bool
Pushes the function to the pipeline segment parent if set to True.
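As a rough illustration of what dropping a feature means at the pandas level, the following is a minimal sketch and not eflow's implementation. The helper name `drop_feature_sketch` is hypothetical, and it returns a new frame; whether `drop_feature` itself mutates `df` in place or updates `df_features` is not stated on this page.

```python
import pandas as pd


def drop_feature_sketch(df: pd.DataFrame, feature_name: str) -> pd.DataFrame:
    """Return a copy of df without the column feature_name (hypothetical helper)."""
    if feature_name not in df.columns:
        raise KeyError(f"Feature '{feature_name}' not found in the dataframe")
    # DataFrame.drop with columns=... returns a new frame by default.
    return df.drop(columns=[feature_name])


df = pd.DataFrame({"age": [22, 35, 58], "cabin": ["C85", None, "E46"]})
cleaned = drop_feature_sketch(df, "cabin")
print(list(cleaned.columns))  # ['age']
```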
-
fill_nan_by_distribution
(df, df_features, feature_name, percentile, z_score=None, _add_to_que=True) Fill NaN values based on the distribution of the data.
- Args:
- df: pd.DataFrame
Pandas DataFrame
- df_features: DataFrameType from eflow
Organizes feature types into groups.
- feature_name: string
Name of the feature in the dataframe
- percentile: float or int
- z_score:
- _add_to_que: bool
Pushes the function to the pipeline segment parent if set to True.
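The page does not describe how `percentile` and `z_score` are interpreted, so the sketch below rests on two explicit assumptions: missing values are replaced with the value at the given percentile of the observed data, and, when `z_score` is set, observations whose absolute z-score exceeds it are excluded before that percentile is computed. The helper `fill_nan_by_percentile_sketch` is hypothetical and only illustrates the general idea with pandas and NumPy.

```python
import numpy as np
import pandas as pd


def fill_nan_by_percentile_sketch(df, feature_name, percentile, z_score=None):
    """Fill NaNs in one column with a percentile of its observed values.

    Assumption (not confirmed by the eflow docs): z_score acts as an
    outlier cutoff applied before the percentile is taken.
    """
    series = df[feature_name]
    observed = series.dropna()
    if observed.empty:
        raise ValueError(f"Feature '{feature_name}' has no observed values")

    if z_score is not None:
        std = observed.std(ddof=0)
        if std > 0:
            z = (observed - observed.mean()) / std
            observed = observed[z.abs() <= z_score]

    fill_value = np.percentile(observed, percentile)
    result = df.copy()
    result[feature_name] = series.fillna(fill_value)
    return result


df = pd.DataFrame({"fare": [10.0, 20.0, np.nan, 30.0, 40.0]})
filled = fill_nan_by_percentile_sketch(df, "fare", percentile=50)
print(filled["fare"].tolist())  # [10.0, 20.0, 25.0, 30.0, 40.0]

# With an outlier present, a z-score cutoff keeps it from skewing the fill value.
skewed = pd.DataFrame({"fare": [10.0, 20.0, 30.0, 40.0, 1000.0, np.nan]})
robust = fill_nan_by_percentile_sketch(skewed, "fare", percentile=50, z_score=1.5)
```

In the second call the value 1000.0 has a z-score of roughly 2 and is excluded, so the fill value is again the median of the remaining four observations.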
-
ignore_feature
(df, df_features, feature_name, _add_to_que=True) Ignore the given feature.
- Args:
- df: pd.DataFrame
Pandas DataFrame
- df_features: DataFrameType from eflow
Organizes feature types into groups.
- feature_name: string
Name of the feature in the dataframe
- _add_to_que: bool
Pushes the function to the pipeline segment parent if set to True.
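Every method on this page accepts `_add_to_que`. One way to picture "pushing the function to the pipeline segment parent" is a segment that records each call so the same steps can later be replayed on another dataframe. The class below is a hypothetical stand-in written for illustration; `PipelineSegmentSketch`, its `_queue` attribute, and `replay` are assumptions and do not reflect eflow's actual internals.

```python
import pandas as pd


class PipelineSegmentSketch:
    """Hypothetical segment that records cleaning steps in call order."""

    def __init__(self):
        self._queue = []  # list of (method name, keyword arguments)

    def drop_feature(self, df, feature_name, _add_to_que=True):
        # Apply the step in place, then optionally record it.
        df.drop(columns=[feature_name], inplace=True)
        if _add_to_que:
            self._queue.append(("drop_feature", {"feature_name": feature_name}))

    def replay(self, df):
        """Re-apply every recorded step to another dataframe."""
        for method_name, kwargs in list(self._queue):
            # _add_to_que=False keeps the replay from recording itself again.
            getattr(self, method_name)(df, _add_to_que=False, **kwargs)


segment = PipelineSegmentSketch()
train = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
holdout = pd.DataFrame({"a": [5], "b": [6]})

segment.drop_feature(train, "b")  # recorded in the queue
segment.replay(holdout)           # same step applied, nothing new recorded
print(list(holdout.columns))      # ['a']
```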
-
make_nan_assertions
(df, df_features, feature_name, _add_to_que=True) Make NaN assertions for boolean features.
- Args:
- df: pd.DataFrame
Pandas DataFrame
- df_features: DataFrameType from eflow
Organizes feature types into groups.
- feature_name: string
Name of the feature in the dataframe
- _add_to_que: bool
Pushes the function to the pipeline segment parent if set to True.
-
remove_nans
(df, df_features, feature_name, _add_to_que=True) Remove rows of data in which the given feature is NaN.
- Args:
- df: pd.DataFrame
Pandas DataFrame
- df_features: DataFrameType from eflow
Organizes feature types into groups.
- feature_name: string
Name of the feature in the dataframe
- _add_to_que: bool
Pushes the function to the pipeline segment parent if set to True.
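Assuming "remove rows based on the given feature" means dropping the rows where that feature is missing, the plain pandas equivalent could look like the sketch below. `remove_nans_sketch` is a hypothetical helper, and resetting the index is a choice made here for readability, not documented eflow behavior.

```python
import numpy as np
import pandas as pd


def remove_nans_sketch(df: pd.DataFrame, feature_name: str) -> pd.DataFrame:
    """Drop rows where feature_name is NaN; other columns may still hold NaNs."""
    return df.dropna(subset=[feature_name]).reset_index(drop=True)


df = pd.DataFrame({"age": [22.0, np.nan, 58.0], "name": ["a", "b", None]})
kept = remove_nans_sketch(df, "age")
print(len(kept))  # 2
```

Only the `age` column is inspected, so the row whose `name` is missing is kept.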
-