

display(*objs[, include, exclude, metadata, …])

Display a Python object in all frontends.

generate_meta_data(df, output_folder_path, …)

Creates files representing the shape and feature types of the dataframe.


Creates a pandas dataframe based on the missing data inside the


DataAnalysis(dataset_name[, overwrite_full_path])

All objects in the data_analysis folder of eflow are related to this object.

DataFrameSnapshot([compare_shape, …])

Attempts to get a “snapshot” of a dataframe by extracting various attributes of the pandas DataFrame object, then generates a file to compare against later in a set directory.

FeatureAnalysis(df_features[, …])

Analyzes the feature data of a pandas DataFrame object.


alias of eflow._hidden.constants.Enum

NullAnalysis(df_features[, dataset_sub_dir, …])

Analyzes a pandas DataFrame object for null data; creates visuals such as graphs and tables.
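The DataFrameSnapshot idea (record identifying properties of a dataframe, save them to a file, and compare later) can be sketched in plain Python. This is a hypothetical illustration, not eflow's implementation: the dataframe is modeled as a dict of column lists, the "snapshot" is assumed to be the shape, column names, and per-column value type names, and the helpers `take_snapshot` and `snapshot_matches` are invented for this example.

```python
import json
import math
import os
import tempfile


def is_null(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def take_snapshot(columns):
    """Extract a few identifying properties of a column-oriented table (hypothetical)."""
    n_rows = len(next(iter(columns.values()), []))
    return {
        "shape": [n_rows, len(columns)],
        "columns": list(columns),
        # Sorted type names of the non-null values found in each column.
        "types": {
            name: sorted({type(v).__name__ for v in values if not is_null(v)})
            for name, values in columns.items()
        },
    }


def snapshot_matches(columns, directory, filename="dataframe_snapshot.json"):
    """Save a snapshot on the first call; on later calls compare against it."""
    path = os.path.join(directory, filename)
    current = take_snapshot(columns)
    if not os.path.exists(path):
        os.makedirs(directory, exist_ok=True)
        with open(path, "w") as f:
            json.dump(current, f)
        return True
    with open(path) as f:
        return json.load(f) == current


with tempfile.TemporaryDirectory() as tmp:
    df = {"age": [31, None, 45], "city": ["NY", "LA", None]}
    print(snapshot_matches(df, tmp))  # True (snapshot saved)
    print(snapshot_matches(df, tmp))  # True (same structure)
    df["income"] = [1.0, 2.0, float("nan")]
    print(snapshot_matches(df, tmp))  # False (a column was added)
```

A mismatch signals that files in that directory were probably generated from a different dataframe, which is the kind of mix-up the `dataframe_snapshot` parameter is described as guarding against.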



class NullAnalysis(df_features, dataset_sub_dir='', dataset_name='Default Dataset Name', overwrite_full_path=None, notebook_mode=False)[source]

Analyzes a pandas DataFrame object for null data; creates visuals such as graphs and tables.

feature_analysis_of_null_data(df, dataset_name, target_features=None, display_visuals=True, display_print=True, save_file=True, suppress_runtime_errors=True, aggregate_target_feature=True, selected_features=None, extra_tables=True, statistical_analysis_on_aggregates=True, nan_features=[])[source]

Runs all public methods that generate the visualizations/insights FeatureAnalysis uses, applied to an aggregation of the null data in a feature.


Pretty much my personal lazy button for running the entire object without specifying any method in particular.

df: pd.DataFrame

Pandas DataFrame object.

dataset_name: string

The dataset’s name; a sub-directory with this name is created, and generated graphs are nested inside it.

target_features: collection of string or None

Feature names that exist both in the df_features passed at initialization and in the passed DataFrame.

If set to None, df_features will try to extract the target feature.

display_visuals: bool

Boolean value determining whether or not to display visualizations.

display_print: bool

Determines whether or not to print the function’s embedded print statements.

save_file: bool

Boolean value determining whether or not to save the file.

suppress_runtime_errors: bool

If set to True, runtime errors raised while generating graphs are suppressed so the program can keep running.

extra_tables: bool

When handling two types of features, if set to True this will generate any extra tables that might be helpful. Note - These graphics may create duplicates if you already applied an aggregation in ‘perform_analysis’.

statistical_analysis_on_aggregates: bool

If set to True, the function ‘statistical_analysis_on_aggregates’ will run, which aggregates the data of the target feature either by discrete values or by binning/labeling continuous data.

aggregate_target_feature: bool

Aggregates the data of the target feature if the data is non-continuous.

In the future I will have this working with continuous data as well.

selected_features: collection object of features

Only these selected features are analyzed; the other given features are ignored.

nan_features: collection of strings

Feature names that must contain NaN data to aggregate on.

Raises an error if an empty DataFrame is passed to this function or if the same DataFrame is passed to it.
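The general idea of aggregating on the null data of a feature can be sketched as follows. This is an assumption about the technique, not eflow's code: rows are split by whether a chosen NaN feature is null, and the hypothetical helper `aggregate_target_by_nullity` counts the target feature's values in each group, so you can see whether missingness relates to the target. Raising ValueError for an empty table loosely mirrors the documented error condition.

```python
import math
from collections import Counter


def is_null(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def aggregate_target_by_nullity(columns, nan_feature, target_feature):
    """Count target values separately for rows where nan_feature is null / not null."""
    if not columns or not columns[nan_feature]:
        raise ValueError("An empty table was passed.")
    if not any(is_null(v) for v in columns[nan_feature]):
        raise ValueError(f"{nan_feature!r} contains no null data to aggregate on.")
    groups = {"null": Counter(), "not null": Counter()}
    for feature_value, target_value in zip(columns[nan_feature], columns[target_feature]):
        key = "null" if is_null(feature_value) else "not null"
        groups[key][target_value] += 1
    return groups


table = {
    "income": [52000.0, float("nan"), 61000.0, None, float("nan")],
    "churned": ["no", "yes", "no", "yes", "no"],
}
result = aggregate_target_by_nullity(table, "income", "churned")
print(dict(result["null"]))      # {'yes': 2, 'no': 1}
print(dict(result["not null"]))  # {'no': 2}
```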

missing_values_table(df, dataset_name, display_visuals=True, filename=None, sub_dir=None, save_file=True, dataframe_snapshot=True, suppress_runtime_errors=True, display_print=True)[source]

Creates and saves a pandas DataFrame object giving the percentage of null data for each column of the original DataFrame.

df: pd.DataFrame

Pandas DataFrame object.

dataset_name: string

The dataset’s name; a sub-directory with this name is created, and generated graphs are nested inside it.

display_visuals: bool

Boolean value determining whether or not to display visualizations.

display_print: bool

Determines whether or not to print the function’s embedded print statements.

filename: string

If set to None, defaults to a pre-defined string; otherwise the given filename is used.

save_file: bool

Boolean value determining whether or not to save the file.

dataframe_snapshot: bool

Boolean value determining whether or not to generate and compare a snapshot of the DataFrame in the dataset’s directory structure. Helps ensure that data generated in that directory is correctly associated with a DataFrame.

suppress_runtime_errors: bool

If set to True, runtime errors raised while generating graphs are suppressed so the program can keep running.
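What a missing-values table reports can be sketched in plain Python. This is an illustration under assumptions, not the NullAnalysis implementation: the table is a dict of column lists, `None` and `NaN` count as null, and the hypothetical function `null_percentage_table` returns one row per column with the null count and percentage. Sorting from most to least missing is done here only for readability.

```python
import math


def is_null(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def null_percentage_table(columns):
    """Return (column, missing_count, percent_missing) rows, most-missing first."""
    rows = []
    for name, values in columns.items():
        missing = sum(1 for v in values if is_null(v))
        percent = 100.0 * missing / len(values) if values else 0.0
        rows.append((name, missing, round(percent, 2)))
    # Sort by percentage of missing values, highest first (stable for ties).
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


table = {
    "age": [31, None, 45, 27],
    "city": ["NY", "LA", None, None],
    "id": [1, 2, 3, 4],
}
for row in null_percentage_table(table):
    print(row)
# ('city', 2, 50.0)
# ('age', 1, 25.0)
# ('id', 0, 0.0)
```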

perform_analysis(df, dataset_name, display_visuals=True, save_file=True, dataframe_snapshot=True, suppress_runtime_errors=True, display_print=True, null_features_only=False)[source]

Performs all public methods of the NullAnalysis object except feature_analysis_of_null_data.

df: pd.DataFrame

Pandas DataFrame object.

dataset_name: string

The dataset’s name; a sub-directory with this name is created, and generated graphs are nested inside it.

display_visuals: bool

Boolean value determining whether or not to display visualizations.

display_print: bool

Determines whether or not to print the function’s embedded print statements.

save_file: bool

Boolean value determining whether or not to save the file.

dataframe_snapshot: bool

Boolean value determining whether or not to generate and compare a snapshot of the DataFrame in the dataset’s directory structure. Helps ensure that data generated in that directory is correctly associated with a DataFrame.

suppress_runtime_errors: bool

If set to True, runtime errors raised while generating graphs are suppressed so the program can keep running.

null_features_only: bool

If set to True, only features containing null data are passed on to the visualizations.
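The "run everything" behavior of perform_analysis, together with the `suppress_runtime_errors` and `null_features_only` flags, can be sketched generically. The runner `run_all` and the step functions are hypothetical stand-ins, not eflow methods; the sketch only shows how such a method might filter to null-containing features and keep going when one graph raises a RuntimeError.

```python
import math


def is_null(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def null_features(columns):
    """Keep only the columns that contain at least one null value."""
    return {name: values for name, values in columns.items()
            if any(is_null(v) for v in values)}


def run_all(columns, steps, suppress_runtime_errors=True, null_features_only=False):
    """Call every step on the table; optionally swallow RuntimeErrors (hypothetical)."""
    if null_features_only:
        columns = null_features(columns)
    completed = []
    for step in steps:
        try:
            step(columns)
            completed.append(step.__name__)
        except RuntimeError as error:
            if not suppress_runtime_errors:
                raise
            print(f"Skipped {step.__name__}: {error}")
    return completed


def count_columns(columns):
    return len(columns)


def broken_plot(columns):
    raise RuntimeError("plotting backend failed")


table = {"age": [31, None], "id": [1, 2]}
print(run_all(table, [count_columns, broken_plot], null_features_only=True))
# Skipped broken_plot: plotting backend failed
# ['count_columns']
```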

plot_null_bar_graph(df, dataset_name, display_visuals=True, filename=None, sub_dir=None, save_file=True, dataframe_snapshot=True, suppress_runtime_errors=True, display_print=True, null_features_only=False, figsize=(24, 10), fontsize=16, labels=None, log=False, color='#072F5F', inline=False, filter=False, n=0, p=0, sort=None)[source]
Desc (Taken from missingno):

A bar graph visualization of the nullity of the given DataFrame; the image is then pushed to the output folder.

df: pd.DataFrame

Pandas DataFrame object.

dataset_name: string

The dataset’s name; a sub-directory with this name is created, and generated graphs are nested inside it.

display_visuals: bool

Boolean value determining whether or not to display visualizations.

display_print: bool

Determines whether or not to print the function’s embedded print statements.

filename: string

If set to None, defaults to a pre-defined string; otherwise the given filename is used.

save_file: bool

Boolean value determining whether or not to save the file.

dataframe_snapshot: bool

Boolean value determining whether or not to generate and compare a snapshot of the DataFrame in the dataset’s directory structure. Helps ensure that data generated in that directory is correctly associated with a DataFrame.

suppress_runtime_errors: bool

If set to True, runtime errors raised while generating graphs are suppressed so the program can keep running.

null_features_only: bool

If set to True, only features containing null data are passed on to the visualizations.

Please read the official documentation for more about the parameters: Link - https://github.com/ResidentMario/missingno

Note -

Changed the default color of the bar graph because I thought it was ugly.
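As a rough, text-only illustration of what a nullity bar graph encodes: missingno's bar chart draws one bar per column whose height reflects the number of non-null values. The function `text_null_bar_graph` below is hypothetical and only mimics that idea with `#` characters; the real graph is drawn with matplotlib.

```python
import math


def is_null(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def text_null_bar_graph(columns, width=20):
    """Render one text bar per column; bar length is proportional to non-null values."""
    lines = []
    for name, values in columns.items():
        present = sum(1 for v in values if not is_null(v))
        filled = round(width * present / len(values)) if values else 0
        lines.append(f"{name:<6}|{'#' * filled:<{width}}| {present}/{len(values)}")
    return "\n".join(lines)


table = {"age": [31, None, 45, 27], "city": ["NY", None, None, None]}
print(text_null_bar_graph(table, width=8))
# age   |######  | 3/4
# city  |##      | 1/4
```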

plot_null_dendrogram_graph(df, dataset_name, display_visuals=True, filename=None, sub_dir=None, save_file=True, dataframe_snapshot=True, suppress_runtime_errors=True, display_print=True, null_features_only=False, method='average', filter=None, n=0, p=0, orientation=None, figsize=(24, 10), fontsize=16, inline=False)[source]
Desc (Taken from missingno):

Fits a scipy hierarchical clustering algorithm to the given DataFrame’s variables and visualizes the results as a scipy dendrogram.

df: pd.DataFrame

Pandas DataFrame object.

dataset_name: string

The dataset’s name; a sub-directory with this name is created, and generated graphs are nested inside it.

display_visuals: bool

Boolean value determining whether or not to display visualizations.

display_print: bool

Determines whether or not to print the function’s embedded print statements.

filename: string

If set to None, defaults to a pre-defined string; otherwise the given filename is used.

save_file: bool

Boolean value determining whether or not to save the file.

dataframe_snapshot: bool

Boolean value determining whether or not to generate and compare a snapshot of the DataFrame in the dataset’s directory structure. Helps ensure that data generated in that directory is correctly associated with a DataFrame.

suppress_runtime_errors: bool

If set to True, runtime errors raised while generating graphs are suppressed so the program can keep running.

null_features_only: bool

If set to True, only features containing null data are passed on to the visualizations.

Please read the official documentation for more about the parameters: Link: https://github.com/ResidentMario/missingno
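The clustering step that the dendrogram visualizes can be sketched without scipy. Assuming, as missingno appears to do, that each column is turned into a binary nullity vector and compared with Euclidean distance, the naive average-linkage loop below merges the two closest clusters until one remains; columns whose values go missing together merge first. `average_linkage` is a hypothetical helper, far slower than `scipy.cluster.hierarchy.linkage`, and is only meant to show the idea behind `method='average'`.

```python
import math
from itertools import combinations


def is_null(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nullity_vectors(columns):
    """Binary vector per column: 1 where the value is null, 0 otherwise."""
    return {name: [1 if is_null(v) else 0 for v in values]
            for name, values in columns.items()}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_distance(members_a, members_b):
    """Average linkage: mean Euclidean distance over all cross-cluster pairs."""
    dists = [euclidean(u, v) for u in members_a for v in members_b]
    return sum(dists) / len(dists)


def average_linkage(columns):
    """Naive agglomerative clustering; returns merges as (left, right, distance)."""
    # Each cluster is keyed by a tuple of column names and holds their vectors.
    clusters = {(name,): [vec] for name, vec in nullity_vectors(columns).items()}
    merges = []
    while len(clusters) > 1:
        left, right = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        distance = cluster_distance(clusters[left], clusters[right])
        merges.append((left, right, round(distance, 3)))
        clusters[left + right] = clusters.pop(left) + clusters.pop(right)
    return merges


table = {"a": [None, 1, None, 4], "b": [None, 2, None, 5], "c": [1, None, 3, 4]}
print(average_linkage(table))
# [(('a',), ('b',), 0.0), (('c',), ('a', 'b'), 1.732)]
```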

plot_null_heatmap_graph(df, dataset_name, display_visuals=True, filename=None, sub_dir=None, save_file=True, dataframe_snapshot=True, suppress_runtime_errors=True, display_print=True, inline=False, filter=None, n=0, p=0, sort=None, figsize=(24, 10), fontsize=16, labels=True, cmap='RdBu', vmin=-1, vmax=1, cbar=True)[source]
Desc (Taken from missingno):

Presents a seaborn heatmap visualization of nullity correlation in the given DataFrame.

df: pd.DataFrame

Pandas DataFrame object.

dataset_name: string

The dataset’s name; a sub-directory with this name is created, and generated graphs are nested inside it.

display_visuals: bool

Boolean value determining whether or not to display visualizations.

display_print: bool

Determines whether or not to print the function’s embedded print statements.

filename: string

If set to None, defaults to a pre-defined string; otherwise the given filename is used.

save_file: bool

Boolean value determining whether or not to save the file.

dataframe_snapshot: bool

Boolean value determining whether or not to generate and compare a snapshot of the DataFrame in the dataset’s directory structure. Helps ensure that data generated in that directory is correctly associated with a DataFrame.

suppress_runtime_errors: bool

If set to True, runtime errors raised while generating graphs are suppressed so the program can keep running.

Please read the official documentation for more about the parameters: Link: https://github.com/ResidentMario/missingno


Note -

Changed the default color of the graph because I thought it was ugly.
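Nullity correlation can be sketched in plain Python. Following the approach missingno describes (correlating the null indicators of each column and leaving out columns that are completely filled or completely empty), the hypothetical `nullity_correlation` below returns the pairwise Pearson coefficients that the heatmap colors. Values near 1 mean two columns tend to be missing together; values near -1 mean one tends to be missing when the other is present. With the defaults vmin=-1 and vmax=1, the color scale spans that full range.

```python
import math
from itertools import combinations


def is_null(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def nullity_correlation(columns):
    """Pearson correlation between the null indicators of each pair of columns."""
    indicators = {name: [1 if is_null(v) else 0 for v in values]
                  for name, values in columns.items()}
    # Columns that are always or never null have zero variance, so drop them.
    indicators = {name: ind for name, ind in indicators.items()
                  if 0 < sum(ind) < len(ind)}
    return {(a, b): round(pearson(indicators[a], indicators[b]), 3)
            for a, b in combinations(indicators, 2)}


table = {
    "a": [None, 1, None, 4],
    "b": [None, 2, None, 5],
    "c": [1, None, 3, 4],
    "d": [1, 2, 3, 4],  # never null, so it is left out
}
print(nullity_correlation(table))
# {('a', 'b'): 1.0, ('a', 'c'): -0.577, ('b', 'c'): -0.577}
```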

plot_null_matrix_graph(df, dataset_name, display_visuals=True, display_print=True, filename=None, sub_dir=None, save_file=True, dataframe_snapshot=True, suppress_runtime_errors=True, null_features_only=False, filter=None, n=0, p=0, sort=None, figsize=(24, 10), width_ratios=(15, 1), color=(0.027, 0.184, 0.373), fontsize=16, labels=None, sparkline=True, inline=False, freq=None)[source]
Desc (Taken from missingno):

A matrix visualization of the nullity of the given DataFrame; the image is then pushed to the output folder.

df: pd.DataFrame

Pandas DataFrame object.

dataset_name: string

The dataset’s name; a sub-directory with this name is created, and generated graphs are nested inside it.

display_visuals: bool

Boolean value determining whether or not to display visualizations.

display_print: bool

Determines whether or not to print the function’s embedded print statements.

save_file: bool

Boolean value determining whether or not to save the file.

filename: string

If set to None, defaults to a pre-defined string; otherwise the given filename is used.

sub_dir: string

The sub-directory to append to the pre-defined folder path.

dataframe_snapshot: bool

Boolean value determining whether or not to generate and compare a snapshot of the DataFrame in the dataset’s directory structure. Helps ensure that data generated in that directory is correctly associated with a DataFrame.

suppress_runtime_errors: bool

If set to True, runtime errors raised while generating graphs are suppressed so the program can keep running.

null_features_only: bool

If set to True, only features containing null data are passed on to the visualizations.

Please read the official documentation for more about the parameters: Link: https://github.com/ResidentMario/missingno
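A text-only analogue of the nullity matrix may help when reading the real graph. The hypothetical `text_null_matrix` below prints one line per row, marking present values with `#` and nulls with `.`; the trailing number is the count of non-null values in that row, which is roughly what missingno's sparkline (enabled by `sparkline=True`) tracks along the side of the matrix.

```python
import math


def is_null(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def text_null_matrix(columns):
    """One text row per record: '#' for present values, '.' for nulls, plus a completeness count."""
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    lines = []
    for i in range(n_rows):
        cells = ["." if is_null(columns[name][i]) else "#" for name in names]
        present = cells.count("#")
        lines.append(f"{''.join(cells)}  {present}")
    return "\n".join(lines)


table = {"age": [31, None, 45], "city": ["NY", "LA", None], "id": [1, 2, 3]}
print(text_null_matrix(table))
# ###  3
# .##  2
# #.#  2
```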


Note -

Changed the default color of the matrix graph because I thought it was ugly.
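The default matrix color (0.027, 0.184, 0.373) is the same navy as plot_null_bar_graph's default '#072F5F', just written as matplotlib-style RGB fractions (each hex channel divided by 255). The helper below is a hypothetical illustration of that arithmetic, not part of eflow.

```python
def hex_to_rgb_fraction(hex_color, digits=3):
    """Convert '#RRGGBB' to an (r, g, b) tuple of fractions in [0, 1]."""
    hex_color = hex_color.lstrip("#")
    return tuple(round(int(hex_color[i:i + 2], 16) / 255, digits) for i in (0, 2, 4))


print(hex_to_rgb_fraction("#072F5F"))  # (0.027, 0.184, 0.373)
```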