eflow.data_pipeline_segments.feature_data_cleaner

Functions

check_if_feature_exists(df, feature_name)

Checks if feature exists in the dataframe.

create_dir_structure(directory_path, …)

Creates required directory structures inside the parent

create_hex_decimal_string([string_len])

Creates a string of a random Hexadecimal value.

dict_to_json_file(dict_obj, directory_path, …)

Writes a dict to a json file.

display(*objs[, include, exclude, metadata, …])

Display a Python object in all frontends.

get_all_files_from_path(directory_path[, …])

Gets all filenames with the provided path.

get_parameters(f)

Gets the parameters of a given function definition.

json_file_to_dict(filepath)

Returns the dictionary form of a JSON file.

string_condtional(given_val, full_condtional)

Returns back a boolean value of whether or not the conditional the

write_object_text_to_file(obj, …[, …])

Writes the object’s string representation to a text file.
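The two JSON helpers listed above, dict_to_json_file and json_file_to_dict, read as inverses of each other. Their full signatures are elided in this listing, so the following is only a standard-library sketch of the round trip. The names write_dict_as_json and read_json_as_dict, the filename argument, and the sample settings are hypothetical stand-ins, not eflow functions or data.

```python
import json
import os
import tempfile


def write_dict_as_json(dict_obj, directory_path, filename):
    """Hypothetical stand-in: write a dict to <directory_path>/<filename>.json."""
    os.makedirs(directory_path, exist_ok=True)
    filepath = os.path.join(directory_path, filename + ".json")
    with open(filepath, "w") as fp:
        json.dump(dict_obj, fp, indent=2)
    return filepath


def read_json_as_dict(filepath):
    """Hypothetical stand-in: load a JSON file back into a dict."""
    with open(filepath) as fp:
        return json.load(fp)


# Illustrative settings only; not an eflow file format.
settings = {"feature_name": "age", "percentile": 50}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = write_dict_as_json(settings, tmp_dir, "cleaning_settings")
    restored = read_json_as_dict(path)
```

After the round trip, restored holds the same keys and values as settings.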

Classes

DataCleaningWidget([require_input, …])

DataFrameTypes([df, target_feature, …])

Separates the features based off of dtypes to better keep track of feature types and helps make type assertions.

DataPipelineSegment(object_type[, …])

Holds the function names and arguments to be pushed to a JSON file.

FeatureDataCleaner([segment_id, create_file])

A multipurpose data cleaner.

FileOutput(dataset_name[, overwrite_full_path])

Ensures a folder is part of eflow’s main output stream and creates a subdirectory based on the arg dataset_name.

Layout(**kwargs)

Layout specification

SYS_CONSTANTS

alias of eflow._hidden.constants.Enum

deque

deque([iterable[, maxlen]]) --> deque object
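DataPipelineSegment is described above as holding function names and arguments to be pushed to a JSON file, and deque appears in this module's namespace. A minimal sketch of that idea follows, using only the standard library. It makes no claim about eflow's actual queue or file layout: record_call, the "function_name" and "params" keys, and the "pipeline_segment" wrapper are all hypothetical.

```python
import json
from collections import deque

# Hypothetical queue of recorded calls, kept in the order they were made.
function_queue = deque()


def record_call(function_name, **kwargs):
    """Hypothetical helper: remember a function name and its keyword arguments."""
    function_queue.append({"function_name": function_name, "params": kwargs})


# Record two cleaning steps without executing them.
record_call("drop_feature", feature_name="cabin")
record_call("remove_nans", feature_name="age")

# Serialize the recorded steps; a deque must be converted to a list first.
segment_json = json.dumps({"pipeline_segment": list(function_queue)}, indent=2)
```

Storing names and arguments rather than function objects keeps the record serializable, so the same steps can be replayed later from the JSON text.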

Exceptions

PipelineSegmentError([error_message])

UnsatisfiedRequirments([error_message])

class FeatureDataCleaner(segment_id=None, create_file=True)

A multipurpose data cleaner.

drop_feature(df, df_features, feature_name, _add_to_que=True)

Drops a feature from the dataframe.

Args:
df: pd.DataFrame

Pandas DataFrame

df_features: DataFrameTypes from eflow

Organizes feature types into groups.

feature_name: string

Name of the feature in the dataframe

_add_to_que: bool

Pushes the function to pipeline segment parent if set to ‘True’.
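In plain pandas terms, dropping a feature amounts to removing a column. The sketch below does not call eflow and does not update a DataFrameTypes object or a pipeline segment; the sample dataframe and column names are made up for illustration.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"age": [22, 35, 58], "cabin": ["C85", None, "E46"]})
feature_name = "cabin"

# Guard against a missing column, then remove it.
if feature_name in df.columns:
    df = df.drop(columns=[feature_name])

remaining_columns = list(df.columns)
```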

fill_nan_by_distribution(df, df_features, feature_name, percentile, z_score=None, _add_to_que=True)

Fills NaN values based on the distribution of the data.

Args:
df: pd.DataFrame

Pandas DataFrame

df_features: DataFrameTypes from eflow

Organizes feature types into groups.

feature_name: string

Name of the feature in the dataframe

percentile: float or int

z_score:

_add_to_que: bool

Pushes the function to pipeline segment parent if set to ‘True’.
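The docstring does not say how percentile is applied, so the following is only one plausible reading, sketched in plain pandas: NaN entries are replaced with the value found at the given percentile of the non-null data. It assumes percentile is given on a 0 to 100 scale, which the source does not confirm, and it ignores z_score because that argument is undocumented. The sample data is made up.

```python
import numpy as np
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"fare": [1.0, 2.0, np.nan, 3.0, 5.0]})
feature_name = "fare"
percentile = 50  # Assumption: expressed on a 0-100 scale.

# Series.quantile skips NaN values and expects a fraction between 0 and 1.
fill_value = df[feature_name].quantile(percentile / 100)

# Replace the missing entries with the computed value.
df[feature_name] = df[feature_name].fillna(fill_value)
```

With percentile set to 50 this reduces to filling with the median of the observed values.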

ignore_feature(df, df_features, feature_name, _add_to_que=True)

Ignores the given feature.

Args:
df: pd.DataFrame

Pandas DataFrame

df_features: DataFrameTypes from eflow

Organizes feature types into groups.

feature_name: string

Name of the feature in the dataframe

_add_to_que: bool

Pushes the function to pipeline segment parent if set to ‘True’.

make_nan_assertions(df, df_features, feature_name, _add_to_que=True)

Makes NaN assertions for boolean features.

Args:
df: pd.DataFrame

Pandas DataFrame

df_features: DataFrameTypes from eflow

Organizes feature types into groups.

feature_name: string

Name of the feature in the dataframe

_add_to_que: bool

Pushes the function to pipeline segment parent if set to ‘True’.

remove_nans(df, df_features, feature_name, _add_to_que=True)

Removes rows of data in which the given feature is NaN.

Args:
df: pd.DataFrame

Pandas DataFrame

df_features: DataFrameTypes from eflow

Organizes feature types into groups.

feature_name: string

Name of the feature in the dataframe

_add_to_que: bool

Pushes the function to pipeline segment parent if set to ‘True’.
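A rough pandas equivalent of removing rows by feature, assuming the method drops every row whose value for that feature is NaN and leaves NaNs in other columns alone. This sketch does not call eflow; the sample dataframe is made up.

```python
import numpy as np
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"age": [22.0, np.nan, 58.0], "fare": [7.25, 71.28, np.nan]})
feature_name = "age"

# Drop only the rows where the chosen feature is missing.
cleaned = df.dropna(subset=[feature_name])
```

Restricting dropna to a subset keeps rows that are missing values in unrelated columns, here the row with a missing fare.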

run_widget(df, df_features, nan_feature_names=[])

Args:
df: pd.DataFrame

Pandas DataFrame

df_features: DataFrameTypes from eflow

Organizes feature types into groups.

Returns:

A UI widget for creating a JSON file for cleaning.