eflow.data_pipeline_segments.feature_data_cleaner

Functions

`check_if_feature_exists`(df, feature_name)	Checks if feature exists in the dataframe.
`create_dir_structure`(directory_path, …)	Creates required directory structures inside the parent
`create_hex_decimal_string`([string_len])	Creates a string of a random Hexadecimal value.
`dict_to_json_file`(dict_obj, directory_path, …)	Writes a dict to a json file.
`display`(*objs[, include, exclude, metadata, …])	Display a Python object in all frontends.
`get_all_files_from_path`(directory_path[, …])	Gets all filenames with the provided path.
`get_parameters`(f)	Gets the parameters of a given function definition.
`json_file_to_dict`(filepath)	Returns the dictionary form of a json file.
`string_condtional`(given_val, full_condtional)	Returns back a boolean value of whether or not the conditional the
`write_object_text_to_file`(obj, …[, …])	Writes the object’s string representation to a text file.

Classes

`DataCleaningWidget`([require_input, …])
`DataFrameTypes`([df, target_feature, …])	Separates features by dtype to better keep track of feature types and to help make type assertions.
`DataPipelineSegment`(object_type[, …])	Holds the function names and arguments to be pushed to a json file.
`FeatureDataCleaner`([segment_id, create_file])	Designed as a multipurpose data cleaner.
`FileOutput`(dataset_name[, overwrite_full_path])	Ensures a folder is part of eflow’s main output stream and creates a subdirectory based on the arg dataset_name.
`Layout`(**kwargs)	Layout specification
`SYS_CONSTANTS`	alias of `eflow._hidden.constants.Enum`
`deque`	deque([iterable[, maxlen]]) --> deque object

Exceptions

`PipelineSegmentError`([error_message])
`UnsatisfiedRequirments`([error_message])

class FeatureDataCleaner(segment_id=None, create_file=True)

Designed as a multipurpose data cleaner.

drop_feature(df, df_features, feature_name, _add_to_que=True)

Drop a feature from the dataframe.

Args:

df: pd.DataFrame: Pandas DataFrame
df_features: DataFrameTypes from eflow: Organizes feature types into groups.
feature_name: string: Name of the feature in the dataframe
_add_to_que: bool: Pushes the function to the parent pipeline segment if set to ‘True’.
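
The `_add_to_que` flag, together with the `DataPipelineSegment` summary ("holds the function names and arguments to be pushed to a json file"), suggests a record-and-replay pattern. The sketch below illustrates that general idea using only the standard library. `SegmentSketch`, its `record` and `to_json` methods, the JSON layout, and the list-of-dicts data are all hypothetical stand-ins chosen for illustration; none of this is eflow's actual implementation or file format.

```python
import json
from collections import deque


class SegmentSketch:
    """Hypothetical stand-in for a pipeline segment; not eflow's class."""

    def __init__(self):
        # Ordered record of the cleaning steps that were applied.
        self.queue = deque()

    def record(self, function_name, params):
        # Store the function name and its arguments for later replay.
        self.queue.append({"function": function_name, "params": params})

    def drop_feature(self, rows, feature_name, _add_to_que=True):
        # Remove the feature from every row (rows are plain dicts here).
        cleaned = [
            {key: value for key, value in row.items() if key != feature_name}
            for row in rows
        ]
        # Only record the step when the caller asks for it.
        if _add_to_que:
            self.record("drop_feature", {"feature_name": feature_name})
        return cleaned

    def to_json(self):
        # Serialize the recorded steps in the order they were applied.
        return json.dumps(list(self.queue), indent=2)


segment = SegmentSketch()
rows = [{"age": 22, "cabin": "C85"}, {"age": 35, "cabin": None}]
rows = segment.drop_feature(rows, "cabin")                    # recorded
rows = segment.drop_feature(rows, "age", _add_to_que=False)   # not recorded
print(segment.to_json())
```

In this sketch, passing `_add_to_que=False` applies the operation without recording it, which is one plausible reason for such a flag to exist (for example, when replaying steps that are already stored).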

fill_nan_by_distribution(df, df_features, feature_name, percentile, z_score=None, _add_to_que=True)

Fill NaN values based on the distribution of the data.

Args:

percentile: float or int

z_score:

_add_to_que: bool: Pushes the function to the parent pipeline segment if set to ‘True’.

ignore_feature(df, df_features, feature_name, _add_to_que=True)

Ignore the given feature.

Args:

df: pd.DataFrame: Pandas DataFrame
df_features: DataFrameTypes from eflow: Organizes feature types into groups.
feature_name: string: Name of the feature in the dataframe
_add_to_que: bool: Pushes the function to the parent pipeline segment if set to ‘True’.

make_nan_assertions(df, df_features, feature_name, _add_to_que=True)

Make NaN assertions for boolean features.

Args:

df: pd.DataFrame: Pandas DataFrame
df_features: DataFrameTypes from eflow: Organizes feature types into groups.
feature_name: string: Name of the feature in the dataframe
_add_to_que: bool: Pushes the function to the parent pipeline segment if set to ‘True’.

remove_nans(df, df_features, feature_name, _add_to_que=True)

Remove rows in which the given feature is NaN.

Args:

df: pd.DataFrame: Pandas DataFrame
df_features: DataFrameTypes from eflow: Organizes feature types into groups.
feature_name: string: Name of the feature in the dataframe
_add_to_que: bool: Pushes the function to the parent pipeline segment if set to ‘True’.
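
For orientation, the effect described for `drop_feature` and `remove_nans` resembles two standard pandas operations. The sketch below uses plain pandas only and is not eflow's code; the sample columns are made up, and whether eflow modifies the dataframe in place or also updates `df_features` is not stated on this page, so neither is assumed here.

```python
import pandas as pd

# Made-up sample data with missing values in two columns.
df = pd.DataFrame(
    {
        "age": [22.0, None, 35.0],
        "cabin": ["C85", None, None],
        "survived": [1, 0, 1],
    }
)

# Roughly what dropping a feature amounts to: remove the column.
dropped = df.drop(columns=["cabin"])

# Roughly what removing NaNs by feature amounts to: drop the rows
# where that one feature is missing, leaving other columns untouched.
cleaned = dropped.dropna(subset=["age"])

print(cleaned)
```

Restricting `dropna` to `subset=["age"]` keeps rows that are missing values in other columns, which matches the per-feature signature of the methods above.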

run_widget(df, df_features, nan_feature_names=[])