FeatureDataCleaner

from eflow.data_pipeline_segments.feature_data_cleaner import FeatureDataCleaner

class FeatureDataCleaner(segment_id=None, create_file=True)

A multipurpose data cleaner.

drop_feature(df, df_features, feature_name, _add_to_que=True)

Drop a feature in the dataframe.

Args:
df: pd.DataFrame

Pandas DataFrame.

df_features: DataFrameTypes from eflow

Organizes feature types into groups.

feature_name: string

Name of the feature in the dataframe.

_add_to_que: bool

If set to True, pushes this function call to the parent pipeline segment.
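As a rough illustration of what dropping a feature amounts to, here is a minimal sketch in plain pandas. It is not eflow's implementation: `drop_feature_sketch` is a hypothetical helper, it ignores `df_features` and `_add_to_que`, and it returns a copy, whereas this page does not state whether the real method modifies `df` in place.

```python
import pandas as pd


def drop_feature_sketch(df: pd.DataFrame, feature_name: str) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df without the given column."""
    return df.drop(columns=[feature_name])


# Small made-up frame for illustration only.
df = pd.DataFrame({"age": [22, 35, 58], "cabin": ["C85", None, "E46"]})
cleaned = drop_feature_sketch(df, "cabin")
print(list(cleaned.columns))
```

The original frame keeps its `cabin` column because `DataFrame.drop` returns a new object unless asked to operate in place.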

fill_nan_by_distribution(df, df_features, feature_name, percentile, z_score=None, _add_to_que=True)

Fill NaN values based on the distribution of the data.

Args:
df: pd.DataFrame

Pandas DataFrame.

df_features: DataFrameTypes from eflow

Organizes feature types into groups.

feature_name: string

Name of the feature in the dataframe.

percentile: float or int

z_score:

_add_to_que: bool

If set to True, pushes this function call to the parent pipeline segment.

ignore_feature(df, df_features, feature_name, _add_to_que=True)

Ignore the given feature.

Args:
df: pd.DataFrame

Pandas DataFrame.

df_features: DataFrameTypes from eflow

Organizes feature types into groups.

feature_name: string

Name of the feature in the dataframe.

_add_to_que: bool

If set to True, pushes this function call to the parent pipeline segment.
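The `_add_to_que` flag shared by these methods suggests a record-and-replay pattern: each cleaning call is recorded so the same steps can later be applied to other data. The sketch below shows that general idea using only the standard library. Everything in it (`SegmentSketch`, `replay`, the dict-of-lists table) is hypothetical and is not taken from eflow.

```python
from typing import Any, Dict, List, Tuple


class SegmentSketch:
    """Hypothetical stand-in for a pipeline segment; not eflow's code."""

    def __init__(self) -> None:
        # Each entry is (method name, keyword arguments) of a recorded call.
        self._queue: List[Tuple[str, Dict[str, Any]]] = []

    def drop_feature(self, table: Dict[str, list], feature_name: str,
                     _add_to_que: bool = True) -> None:
        # Apply the cleaning step to the table (a dict of column lists).
        table.pop(feature_name, None)
        # Record the call only when asked to, so replays do not re-record.
        if _add_to_que:
            self._queue.append(("drop_feature", {"feature_name": feature_name}))

    def replay(self, table: Dict[str, list]) -> None:
        # Re-run every recorded step on another table without re-queueing.
        for name, params in self._queue:
            getattr(self, name)(table, _add_to_que=False, **params)


segment = SegmentSketch()
train = {"age": [22, 35], "cabin": ["C85", None]}
segment.drop_feature(train, "cabin")

new_data = {"age": [41], "cabin": ["B20"]}
segment.replay(new_data)
print(sorted(new_data))
```

Passing `_add_to_que=False` during replay keeps the recorded list from growing each time the steps are re-applied.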

make_nan_assertions(df, df_features, feature_name, _add_to_que=True)

Make nan assertions for boolean features.

Args:
df: pd.DataFrame

Pandas DataFrame.

df_features: DataFrameTypes from eflow

Organizes feature types into groups.

feature_name: string

Name of the feature in the dataframe.

_add_to_que: bool

If set to True, pushes this function call to the parent pipeline segment.

remove_nans(df, df_features, feature_name, _add_to_que=True)

Remove rows in which the given feature is NaN.

Args:
df: pd.DataFrame

Pandas DataFrame.

df_features: DataFrameTypes from eflow

Organizes feature types into groups.

feature_name: string

Name of the feature in the dataframe.

_add_to_que: bool

If set to True, pushes this function call to the parent pipeline segment.
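For comparison, removing rows based on one feature's missing values can be sketched in plain pandas as follows. This assumes the method drops rows where `feature_name` is NaN, which the name implies but this page does not spell out. `remove_nans_sketch` is a hypothetical helper, not eflow's implementation.

```python
import pandas as pd


def remove_nans_sketch(df: pd.DataFrame, feature_name: str) -> pd.DataFrame:
    """Hypothetical helper: drop rows whose value in feature_name is NaN."""
    # subset limits the NaN check to one column; other columns may keep NaNs.
    return df.dropna(subset=[feature_name]).reset_index(drop=True)


# Small made-up frame for illustration only.
df = pd.DataFrame({"age": [22.0, None, 58.0], "fare": [7.25, 71.28, None]})
cleaned = remove_nans_sketch(df, "age")
print(len(cleaned))
```

Only the row with a missing `age` is removed; the row with a missing `fare` stays because that column is not part of the check.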

run_widget(df, df_features, nan_feature_names=[])
Args:
df: pd.DataFrame

Pandas DataFrame.

df_features: DataFrameTypes from eflow

Organizes feature types into groups.

Returns:

A UI widget used to create a JSON file for cleaning.