linreg_ally.models

Functions

run_linear_regression(dataframe, target_column, ...[, ...])

Performs linear regression with preprocessing using sklearn and outputs evaluation scoring metrics.

preprocess(numeric_feats, categorical_feats, drop_feats)

fit_predict(pipeline, X_train, X_test, y_train, ...)

Module Contents

linreg_ally.models.run_linear_regression(dataframe, target_column, numeric_feats, categorical_feats, drop_feats=None, test_size=0.2, random_state=None, scoring_metrics=['r2', 'neg_mean_squared_error'])[source]

Performs linear regression with preprocessing using sklearn and outputs evaluation scoring metrics.

Parameters:
  • dataframe (pandas.DataFrame) – full dataset including features and target.

  • target_column (string) – name of the target variable column.

  • numeric_feats (list) – names of the numeric columns to scale with StandardScaler.

  • categorical_feats (list) – names of the categorical columns to encode with OneHotEncoder.

  • drop_feats (list, optional) – columns to drop (default None).

  • test_size (float, optional) – proportion of the dataset to include in the test split (default 0.2).

  • random_state (int, optional) – controls the shuffling applied to the data before the split (default None).

  • scoring_metrics (list, optional) – scoring metrics to evaluate the model (default ['r2', 'neg_mean_squared_error']).

Returns:

A tuple containing, in order:

  • the fitted model

  • DataFrames for the training and test features

  • Series for the training and test labels

  • a dictionary of metric scores with metric names as keys

Return type:

tuple

Raises:
  • ValueError – When dataframe, target_column, test_size or scoring_metrics is not within the range of acceptable values

  • TypeError – When dataframe, random_state or scoring_metrics is not the expected type

Examples

>>> import pandas as pd
>>> from linreg_ally.models import run_linear_regression
>>> df = pd.DataFrame({
...     "feature_1": [1, 2, 3, 4],
...     "feature_2": [0.5, 0.1, 0.4, 0.9],
...     "category": ["a", "b", "a", "b"],
...     "target": [1.0, 2.5, 3.4, 4.3]
... })
>>> target_column = 'target'
>>> numeric_feats = ['feature_1', 'feature_2']
>>> categorical_feats = ['category']
>>> drop_feats = []
>>> best_model, X_train, X_test, y_train, y_test, scores = run_linear_regression(
...     df, target_column, numeric_feats, categorical_feats, drop_feats, scoring_metrics=['r2', 'neg_mean_squared_error']
... )
>>> scores
{'r2': 0.52, 'neg_mean_squared_error': 1.23}
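
The steps described above (drop columns, split, scale numeric features, one-hot encode categorical features, fit, score) can be sketched directly with scikit-learn. This is a minimal sketch under stated assumptions, not the package's actual implementation: sketch_linear_regression is a hypothetical stand-in, and the choice of ColumnTransformer, LinearRegression and get_scorer is an assumption about how such a pipeline could be assembled. Input validation (the ValueError and TypeError cases) is omitted.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import get_scorer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def sketch_linear_regression(df, target_column, numeric_feats, categorical_feats,
                             drop_feats=None, test_size=0.2, random_state=None,
                             scoring_metrics=("r2", "neg_mean_squared_error")):
    """Hypothetical stand-in for run_linear_regression (assumed behaviour only)."""
    # Separate the target and remove any columns the caller asked to drop.
    X = df.drop(columns=[target_column] + list(drop_feats or []))
    y = df[target_column]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )

    # Scale numeric columns, one-hot encode categorical ones;
    # columns not listed are dropped (the ColumnTransformer default).
    preprocessor = make_column_transformer(
        (StandardScaler(), numeric_feats),
        (OneHotEncoder(handle_unknown="ignore"), categorical_feats),
    )
    pipeline = make_pipeline(preprocessor, LinearRegression())
    pipeline.fit(X_train, y_train)

    # Evaluate each named scikit-learn scorer on the held-out test split.
    scores = {
        name: get_scorer(name)(pipeline, X_test, y_test)
        for name in scoring_metrics
    }
    return pipeline, X_train, X_test, y_train, y_test, scores


# Illustrative data (made up for this sketch): 20 rows, so the test split has 4.
df = pd.DataFrame({
    "feature_1": range(20),
    "feature_2": [0.1 * (i % 7) for i in range(20)],
    "category": ["a", "b"] * 10,
})
df["target"] = 2.0 * df["feature_1"] + df["feature_2"]

model, X_train, X_test, y_train, y_test, scores = sketch_linear_regression(
    df, "target", ["feature_1", "feature_2"], ["category"], random_state=0
)
print(scores)
```

In scikit-learn, the 'neg_mean_squared_error' scorer returns the negated mean squared error, so its value is never positive and higher is better, consistent with 'r2'. The sketch uses more rows than the doctest above because R² is not well defined on a test split containing a single sample.
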

linreg_ally.models.preprocess(numeric_feats, categorical_feats, drop_feats)[source]
linreg_ally.models.fit_predict(pipeline, X_train, X_test, y_train, y_test, scoring_metrics)[source]