eflow.data_pipeline_segments.feature_data_cleaner
Functions
- Checks if a feature exists in the dataframe.
- Creates required directory structures inside the parent
- Creates a string of a random hexadecimal value.
- Writes a dict to a JSON file.
- Display a Python object in all frontends.
- Gets all filenames with the provided path.
- Gets the parameters of a given function definition.
- Returns the dictionary form of a JSON file.
- Returns a boolean value of whether or not the conditional the
- Writes the object’s string representation to a text file.
Classes
- Separates the features based on dtypes to better keep track of feature types and help make type assertions.
- Holds the function names and arguments to be pushed to a JSON file.
- Designed for a multipurpose data cleaner.
- Ensures a folder is part of eflow’s main output stream and creates a subdirectory based on the arg dataset_name.
- Layout specification
- alias of
- deque([iterable[, maxlen]]) --> deque object
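The class summaries above mention holding function names and arguments for a JSON file, alongside a deque. The following is a minimal sketch of that general pattern using only the standard library. It is not eflow's implementation: `call_queue`, `record_call`, and the file name `segment.json` are hypothetical names chosen for illustration.

```python
import json
import tempfile
from collections import deque
from pathlib import Path

# Hypothetical queue of function-call descriptions (not eflow's own structure).
call_queue = deque()


def record_call(function_name, **params):
    """Append one call description (name plus keyword arguments) to the queue."""
    call_queue.append({"function_name": function_name, "params": params})


# Record two cleaning steps in the order they were requested.
record_call("drop_feature", feature_name="cabin")
record_call("remove_nans", feature_name="age")

# Serialize the queued calls, preserving order, to a JSON file.
out_path = Path(tempfile.mkdtemp()) / "segment.json"
out_path.write_text(json.dumps(list(call_queue), indent=2))

# Reading the file back recovers the same ordered list of calls.
restored = json.loads(out_path.read_text())
```

Storing the calls as plain dictionaries keeps the file readable and makes it possible, in principle, to replay the same steps on another dataframe.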
class FeatureDataCleaner(segment_id=None, create_file=True)
Designed for a multipurpose data cleaner.
drop_feature(df, df_features, feature_name, _add_to_que=True)
Drop a feature in the dataframe.
- Args:
  - df: pd.DataFrame
    Pandas DataFrame.
  - df_features: DataFrameType from eflow
    Organizes feature types into groups.
  - feature_name: string
    Name of the feature in the dataframe.
  - _add_to_que: bool
    Pushes the function to the pipeline segment parent if set to ‘True’.
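As a rough illustration of what dropping a feature amounts to at the pandas level, the sketch below removes one column. It does not call eflow and does not update a `df_features` object. `drop_feature_sketch` is a hypothetical helper, and returning a copy (rather than modifying `df` in place) is an assumption made here for clarity.

```python
import pandas as pd


def drop_feature_sketch(df: pd.DataFrame, feature_name: str) -> pd.DataFrame:
    """Return a copy of df without the named column (illustrative only)."""
    if feature_name not in df.columns:
        raise KeyError(f"Feature '{feature_name}' not found in the dataframe.")
    return df.drop(columns=[feature_name])


df = pd.DataFrame({"age": [22, 35, 58], "cabin": [None, "C85", None]})
cleaned = drop_feature_sketch(df, "cabin")
```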
fill_nan_by_distribution(df, df_features, feature_name, percentile, z_score=None, _add_to_que=True)
Fill NaN values based on the distribution of the data.
- Args:
  - df: pd.DataFrame
    Pandas DataFrame.
  - df_features: DataFrameType from eflow
    Organizes feature types into groups.
  - feature_name: string
    Name of the feature in the dataframe.
  - percentile: float or int
  - z_score:
  - _add_to_que: bool
    Pushes the function to the pipeline segment parent if set to ‘True’.
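The entry above does not describe how `percentile` and `z_score` are used, so the following sketch shows only one plausible reading: fill missing values with the given percentile of the observed values, optionally ignoring observations whose absolute z-score exceeds `z_score`. This interpretation, the helper name `fill_nan_by_percentile_sketch`, and the choice to return a copy are all assumptions, not eflow's documented behavior.

```python
import numpy as np
import pandas as pd


def fill_nan_by_percentile_sketch(df, feature_name, percentile, z_score=None):
    """Fill NaNs in one column with a percentile of its observed values."""
    observed = df[feature_name].dropna()

    # Assumed role of z_score: exclude outliers before taking the percentile.
    if z_score is not None:
        std = observed.std(ddof=0)
        if std > 0:
            z = (observed - observed.mean()) / std
            observed = observed[z.abs() <= z_score]

    fill_value = np.percentile(observed, percentile)

    # Work on a copy so the caller's dataframe is left untouched.
    result = df.copy()
    result[feature_name] = result[feature_name].fillna(fill_value)
    return result


df = pd.DataFrame({"fare": [10.0, 20.0, np.nan, 30.0, np.nan]})
filled = fill_nan_by_percentile_sketch(df, "fare", percentile=50)
```

With `percentile=50` the fill value is the median of the observed values, so a skewed feature is filled with a typical value rather than a mean pulled toward the tail.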
ignore_feature(df, df_features, feature_name, _add_to_que=True)
Ignore the given feature.
- Args:
  - df: pd.DataFrame
    Pandas DataFrame.
  - df_features: DataFrameType from eflow
    Organizes feature types into groups.
  - feature_name: string
    Name of the feature in the dataframe.
  - _add_to_que: bool
    Pushes the function to the pipeline segment parent if set to ‘True’.
make_nan_assertions(df, df_features, feature_name, _add_to_que=True)
Make NaN assertions for boolean features.
- Args:
  - df: pd.DataFrame
    Pandas DataFrame.
  - df_features: DataFrameType from eflow
    Organizes feature types into groups.
  - feature_name: string
    Name of the feature in the dataframe.
  - _add_to_que: bool
    Pushes the function to the pipeline segment parent if set to ‘True’.
remove_nans(df, df_features, feature_name, _add_to_que=True)
Remove rows of data based on the given feature.
- Args:
  - df: pd.DataFrame
    Pandas DataFrame.
  - df_features: DataFrameType from eflow
    Organizes feature types into groups.
  - feature_name: string
    Name of the feature in the dataframe.
  - _add_to_que: bool
    Pushes the function to the pipeline segment parent if set to ‘True’.
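Removing rows based on a feature can be pictured in plain pandas as dropping every row where that feature is NaN. The sketch below illustrates this idea without calling eflow. `remove_nans_sketch` is a hypothetical helper, and resetting the index afterwards is a choice made here, not something the entry above specifies.

```python
import numpy as np
import pandas as pd


def remove_nans_sketch(df: pd.DataFrame, feature_name: str) -> pd.DataFrame:
    """Return df without the rows where the named feature is NaN."""
    # Only the named column is inspected; NaNs elsewhere do not drop a row.
    return df.dropna(subset=[feature_name]).reset_index(drop=True)


df = pd.DataFrame({"age": [22.0, np.nan, 58.0], "cabin": [None, "C85", None]})
kept = remove_nans_sketch(df, "age")
```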