eflow.model_analysis.regression_analysis

Functions

check_if_directory_exists(directory_path)

Checks if the given directory path exists. Raises an error if it doesn’t.

convert_to_filename(filename[, …])

Attempts to make the filename string valid.

correct_directory_path(directory_path)

Attempts to convert the directory path to a proper one by removing

create_dir_structure(directory_path, …)

Creates required directory structures inside the parent

create_plt_png

create_unique_directory(directory_path, …)

Creates a unique folder in the proper directory structure.

df_to_image(df, directory_path, sub_dir, …)

dict_to_json_file(dict_obj, directory_path, …)

Writes a dict to a json file.

display(*objs[, include, exclude, metadata, …])

Display a Python object in all frontends.

explained_variance_score(y_true, y_pred[, …])

Explained variance regression score function

get_all_directories_from_path(directory_path)

Gets directory names from the provided path.

get_all_files_from_path(directory_path[, …])

Gets all filenames from the provided path.

get_unique_directory_path(directory_path, …)

Iterate through directory structure until a unique folder name can be

json_file_to_dict(filepath)

Returns the dictionary form of a JSON file.

load_pickle_object(file_path)

max_error

mean_absolute_error(y_true, y_pred[, …])

Mean absolute error regression loss

mean_squared_error(y_true, y_pred[, …])

Mean squared error regression loss

mean_squared_log_error(y_true, y_pred[, …])

Mean squared logarithmic error regression loss

median_absolute_error(y_true, y_pred[, …])

Median absolute error regression loss

pickle_object_to_file(obj, directory_path, …)

Writes the object to a pickle file.

r2_score(y_true, y_pred[, sample_weight, …])

R^2 (coefficient of determination) regression score function.

write_object_text_to_file(obj, …[, …])

Writes the object’s string representation to a text file.
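Several of the functions listed above (explained_variance_score, max_error, mean_absolute_error, mean_squared_error, median_absolute_error, r2_score) share their names and signatures with the regression metrics in scikit-learn. The sketch below assumes they are the scikit-learn functions and calls them directly from sklearn.metrics; the target and prediction values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import (
    explained_variance_score,
    max_error,
    mean_absolute_error,
    mean_squared_error,
    median_absolute_error,
    r2_score,
)

# Made-up targets and predictions, purely for illustration.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Each metric takes the true values first and the predictions second.
scores = {
    "Explained variance": explained_variance_score(y_true, y_pred),
    "Max error": max_error(y_true, y_pred),
    "Mean absolute error": mean_absolute_error(y_true, y_pred),
    "Mean squared error": mean_squared_error(y_true, y_pred),
    "Median absolute error": median_absolute_error(y_true, y_pred),
    "R^2": r2_score(y_true, y_pred),
}

for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

mean_squared_log_error is left out here because it requires non-negative values and the sample targets include a negative number.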

Classes

FeatureAnalysis(df_features[, …])

Analyzes the feature data of a pandas DataFrame object.

GRAPH_DEFAULTS

ModelAnalysis(dataset_name[, …])

All objects in model_analysis folder of eflow are related to this object.

RegressionAnalysis(dataset_name, model, …)

Analyzes a regression model’s results based on the prediction function(s) passed to it.

Exceptions

ProbasNotPossible

RequiresPredictionMethods

UnsatisfiedRequirments

class RegressionAnalysis(dataset_name, model, model_name, feature_order, target_feature, pred_funcs_dict, df_features, project_sub_dir='Regression Analysis', overwrite_full_path=None, save_model=True, notebook_mode=False)[source]

Analyzes a regression model’s results based on the prediction function(s) passed to it. Creates graphs and tables and saves them in a directory structure.
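A minimal sketch of what pred_funcs_dict might look like, assuming it maps a display name to a callable that takes the feature matrix and returns predictions. That shape is an assumption drawn from the parameter name and the pred_name descriptions in the methods of this class, not something this page confirms. The key "Predictions" and the toy data are hypothetical, and the model is a plain scikit-learn LinearRegression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data following y = 2 * x + 1, made up for illustration.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression().fit(X, y)

# Assumed shape of pred_funcs_dict: name -> prediction callable.
# The key "Predictions" is a hypothetical label.
pred_funcs_dict = {"Predictions": model.predict}

# Look the function up by name (the role pred_name appears to play)
# and call it on the feature matrix.
pred_name = "Predictions"
predictions = pred_funcs_dict[pred_name](X)
print(predictions)
```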

perform_analysis(X, y, dataset_name, regression_error_analysis=False, regression_correct_analysis=False, ignore_metrics=[], custom_metrics_dict={}, display_visuals=True, mse_score=None)[source]

Runs all available analysis functions on the model’s predicted data.

Args:
X:

Feature matrix.

y:

Target data vector.

dataset_name:

The dataset’s name; the generated graphs are saved in a sub-directory with this name.

regression_error_analysis: bool

Perform feature analysis on data that was incorrectly predicted.

regression_correct_analysis: bool

Perform feature analysis on data that was correctly predicted.

ignore_metrics:

Names of the default metrics to skip during the analysis.

custom_metrics_dict:

Pass the name of metric(s) with the function definition(s) in a dictionary.

display_visuals:

Controls the visual display of the error analysis, if it is able to run.

Returns:

Performs all classification functionality with the provided feature data and target data.

  • plot_precision_recall_curve

  • classification_evaluation

  • plot_confusion_matrix
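The custom_metrics_dict and ignore_metrics arguments of perform_analysis can be pictured with the sketch below. It assumes each custom metric is a callable taking (y_true, y_pred), as the scikit-learn metrics do, and that ignore_metrics holds the names of default metrics to skip. The metric names, the rmse helper, and the merging logic are hypothetical illustrations, not the library's actual implementation.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error


def rmse(y_true, y_pred):
    """Hypothetical custom metric: root of the mean squared error."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# Metric name -> function definition, as the argument description states.
custom_metrics_dict = {"RMSE": rmse}

# Hypothetical default metric name to skip.
ignore_metrics = ["Mean Absolute Error"]

# Stand-in for the defaults; the real default names are not given here.
default_metrics = {
    "Mean Absolute Error": mean_absolute_error,
    "Mean Squared Error": mean_squared_error,
}

# Drop ignored defaults, then add the custom metrics.
metrics = {
    name: func
    for name, func in default_metrics.items()
    if name not in ignore_metrics
}
metrics.update(custom_metrics_dict)

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

results = {name: func(y_true, y_pred) for name, func in metrics.items()}
print(results)
```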

regression_correct_analysis(X, y, pred_name, dataset_name, mse_score, display_visuals=True, save_file=True, display_print=True, suppress_runtime_errors=True, aggregate_target_feature=True, selected_features=None, extra_tables=True, statistical_analysis_on_aggregates=True)[source]

Compares the actual target value to the predicted value and performs analysis of all the data.

Args:
X: np.matrix or lists of lists

Feature matrix.

y: collection object

Target data vector.

pred_name: str

The name of the prediction function in question, stored in ‘self.__pred_funcs_dict’.

dataset_name: str

The dataset’s name; the generated graphs are saved in a sub-directory with this name.

feature_order: collection object

Features names in proper order to re-create the pandas dataframe.

display_visuals: bool

Whether or not to display visualizations.

display_print: bool

Determines whether or not to print the function’s embedded print statements.

save_file: bool

Whether or not to save the file.

dataframe_snapshot: bool

Determines whether or not to generate and compare a snapshot of the dataframe in the dataset’s directory structure. Helps ensure that data generated in that directory is correctly associated with a dataframe.

suppress_runtime_errors: bool

If set to true, any runtime errors raised while generating graphs are suppressed so the program can keep running.

extra_tables: bool

When handling two types of features, if set to true this will generate any extra tables that might be helpful.

Note

These graphics may create duplicates if you already applied an aggregation in ‘perform_analysis’.

aggregate_target_feature: bool

Aggregate the data of the target feature if the data is non-continuous.

Note

In the future I will have this also working with continuous data.

selected_features: collection object of features

Will only focus on these selected features and will ignore the other given features.

statistical_analysis_on_aggregates: bool

If set to true, the function ‘statistical_analysis_on_aggregates’ will run, which aggregates the data of the target feature either by discrete values or by binning/labeling continuous data.
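The idea of aggregating a target feature by binning/labeling continuous data can be sketched with pandas. This is only an illustration of the general technique using pd.cut and groupby; the column names, bin edges, and labels are made up, and it does not reproduce what ‘statistical_analysis_on_aggregates’ does internally.

```python
import pandas as pd

# Made-up feature and continuous target, purely for illustration.
df = pd.DataFrame(
    {
        "sqft": [650, 800, 1200, 1500, 2100, 3000],
        "price": [100.0, 150.0, 220.0, 260.0, 400.0, 550.0],
    }
)

# Bin the continuous target into labeled ranges (edges are arbitrary):
# (0, 200] -> low, (200, 300] -> mid, (300, 600] -> high.
df["price_bin"] = pd.cut(
    df["price"], bins=[0, 200, 300, 600], labels=["low", "mid", "high"]
)

# Aggregate a feature within each target bin.
summary = df.groupby("price_bin", observed=True)["sqft"].agg(["count", "mean"])
print(summary)
```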

regression_error_analysis(X, y, pred_name, dataset_name, mse_score, display_visuals=True, save_file=True, display_print=True, suppress_runtime_errors=True, aggregate_target_feature=True, selected_features=None, extra_tables=True, statistical_analysis_on_aggregates=True)[source]

Compares the actual target value to the predicted value and performs analysis of all the data.

Args:
X: np.matrix or lists of lists

Feature matrix.

y: collection object

Target data vector.

pred_name: str

The name of the prediction function in question, stored in ‘self.__pred_funcs_dict’.

dataset_name: str

The dataset’s name; the generated graphs are saved in a sub-directory with this name.

feature_order: collection object

Features names in proper order to re-create the pandas dataframe.

thresholds:

If the model outputs a probability list/numpy array then we apply thresholds to the output of the model. For classification only; will not affect the direct output of the probabilities.

display_visuals: bool

Whether or not to display visualizations.

display_print: bool

Determines whether or not to print the function’s embedded print statements.

save_file: bool

Whether or not to save the file.

dataframe_snapshot: bool

Determines whether or not to generate and compare a snapshot of the dataframe in the dataset’s directory structure. Helps ensure that data generated in that directory is correctly associated with a dataframe.

suppress_runtime_errors: bool

If set to true, any runtime errors raised while generating graphs are suppressed so the program can keep running.

extra_tables: bool

When handling two types of features, if set to true this will generate any extra tables that might be helpful.

Note

These graphics may create duplicates if you already applied an aggregation in ‘perform_analysis’.

aggregate_target_feature: bool

Aggregate the data of the target feature if the data is non-continuous.

Note

In the future I will have this also working with continuous data.

selected_features: collection object of features

Will only focus on these selected features and will ignore the other given features.

statistical_analysis_on_aggregates: bool

If set to true, the function ‘statistical_analysis_on_aggregates’ will run, which aggregates the data of the target feature either by discrete values or by binning/labeling continuous data.
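This page does not say how a regression prediction is judged "correct" or "incorrect", nor how the mse_score argument is used. One plausible reading, shown here purely as an assumption, is that mse_score acts as a threshold on each row's squared error. The sketch below splits a feature matrix on that basis; the threshold value and the data are hypothetical.

```python
import numpy as np

# Made-up feature matrix, targets, and predictions.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Hypothetical threshold; how the library uses mse_score is assumed.
mse_score = 0.5

# Per-row squared error between the actual and predicted target.
squared_error = (y_true - y_pred) ** 2

# Rows above the threshold are treated as errors, the rest as correct.
error_mask = squared_error > mse_score
X_error = X[error_mask]
X_correct = X[~error_mask]

print("error rows:", X_error.shape[0], "correct rows:", X_correct.shape[0])
```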

regression_metrics(X, y, pred_name, dataset_name, display_visuals=True, save_file=True, title='', custom_metrics_dict={}, ignore_metrics=[], multioutput=[None, 'uniform_average', 'variance_weighted'])[source]

Creates a dataframe based on the prediction metrics of the feature matrix and target vector.

Args:
X:

Feature matrix.

y:

Target data vector.

pred_name:

The name of the prediction function in question, stored in ‘self.__pred_funcs_dict’.

dataset_name:

The dataset’s name; the generated graphs are saved in a sub-directory with this name.

thresholds:

If the model outputs a probability list/numpy array then we apply thresholds to the output of the model. For classification only; will not affect the direct output of the probabilities.

display_visuals:

Display tables.

save_file:

Determines whether or not to save the generated document.

title:

Adds to the column ‘Metric Score’.

custom_metrics_dict:

Pass the name of metric(s) and the function definition(s) in a dictionary.

ignore_metrics:

Names of the default metrics to skip during the analysis.

  • Precision

  • MCC

  • Recall

  • F1-Score

  • Accuracy

average_scoring:

Determines the type of averaging performed on the data.

  • micro

  • macro

  • weighted

Returns:

Returns a dataframe of the metric values.
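As a rough illustration of a metrics dataframe, the sketch below scores a two-output prediction with scikit-learn and collects the results into a pandas DataFrame, varying the multioutput option the way the method's default list suggests. The table layout is an assumption, apart from the ‘Metric Score’ column name mentioned under title; the data is made up, and only the 'uniform_average' and 'variance_weighted' options are exercised.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up two-output targets and predictions.
y_true = np.array([[0.5, 1.0], [-1.0, 1.0], [7.0, -6.0]])
y_pred = np.array([[0.0, 2.0], [-1.0, 2.0], [8.0, -5.0]])

rows = []

# r2_score accepts both averaging strategies for multi-output data.
for multioutput in ["uniform_average", "variance_weighted"]:
    rows.append(
        {
            "Metric": "R^2",
            "Multioutput": multioutput,
            "Metric Score": r2_score(y_true, y_pred, multioutput=multioutput),
        }
    )

# mean_absolute_error supports 'uniform_average' (its default).
rows.append(
    {
        "Metric": "Mean Absolute Error",
        "Multioutput": "uniform_average",
        "Metric Score": mean_absolute_error(
            y_true, y_pred, multioutput="uniform_average"
        ),
    }
)

metrics_df = pd.DataFrame(rows)
print(metrics_df)
```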