linreg_ally.models
==================

.. py:module:: linreg_ally.models


Functions
---------

.. autoapisummary::

   linreg_ally.models.run_linear_regression
   linreg_ally.models.preprocess
   linreg_ally.models.fit_predict


Module Contents
---------------

.. py:function:: run_linear_regression(dataframe, target_column, numeric_feats, categorical_feats, drop_feats=None, test_size=0.2, random_state=None, scoring_metrics=['r2', 'neg_mean_squared_error'])

   Fits a linear regression model with sklearn preprocessing and returns the fitted model, the train/test splits, and the evaluation scores.

   :param dataframe: full dataset including features and target.
   :type dataframe: `pandas.DataFrame`
   :param target_column: name of the target variable column.
   :type target_column: `str`
   :param numeric_feats: names of the columns to scale with StandardScaler.
   :type numeric_feats: `list`
   :param categorical_feats: names of the columns to encode with OneHotEncoder.
   :type categorical_feats: `list`
   :param drop_feats: columns to drop (default None).
   :type drop_feats: `list`, optional
   :param test_size: proportion of the dataset to include in the test split (default 0.2).
   :type test_size: `float`, optional
   :param random_state: controls the shuffling applied to the data before the split (default None).
   :type random_state: `int`, optional
   :param scoring_metrics: names of the scoring metrics used to evaluate the model (default ['r2', 'neg_mean_squared_error']).
   :type scoring_metrics: `list`, optional

   :returns: a 6-tuple containing, in order: the fitted model;
             the DataFrames of training and test features;
             the Series of training and test labels;
             and a dictionary of metric scores keyed by metric name.
   :rtype: tuple

   :raises ValueError: If `dataframe`, `target_column`, `test_size` or `scoring_metrics` has a value outside the accepted range.
   :raises TypeError: If `dataframe`, `random_state` or `scoring_metrics` is not of the expected type.

   .. rubric:: Examples

   >>> import pandas as pd
   >>> from linreg_ally.models import run_linear_regression
   >>> df = pd.DataFrame({
   ...     "feature_1": [1, 2, 3, 4],
   ...     "feature_2": [0.5, 0.1, 0.4, 0.9],
   ...     "category": ["a", "b", "a", "b"],
   ...     "target": [1.0, 2.5, 3.4, 4.3]
   ... })
   >>> target_column = 'target'
   >>> numeric_feats = ['feature_1', 'feature_2']
   >>> categorical_feats = ['category']
   >>> drop_feats = []
   >>> best_model, X_train, X_test, y_train, y_test, scores = run_linear_regression(
   ...     df, target_column, numeric_feats, categorical_feats, drop_feats, scoring_metrics=['r2', 'neg_mean_squared_error']
   ... )
   >>> sorted(scores)
   ['neg_mean_squared_error', 'r2']
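
The description of ``run_linear_regression`` names the sklearn pieces involved: StandardScaler for the numeric columns, OneHotEncoder for the categorical columns, and a train/test split. The block below is a minimal sketch of how those pieces could be composed by hand. It is an assumption about the general approach, not the library's actual implementation. The ``handle_unknown="ignore"`` setting and the ``test_size=0.25`` and ``random_state=0`` values are illustrative choices.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Same toy data as in the docstring example.
df = pd.DataFrame({
    "feature_1": [1, 2, 3, 4],
    "feature_2": [0.5, 0.1, 0.4, 0.9],
    "category": ["a", "b", "a", "b"],
    "target": [1.0, 2.5, 3.4, 4.3],
})
target_column = "target"
numeric_feats = ["feature_1", "feature_2"]
categorical_feats = ["category"]
drop_feats = []

# Separate features from the target and remove any unwanted columns.
X = df.drop(columns=[target_column] + drop_feats)
y = df[target_column]

# Hold out part of the data; random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale numeric columns and one-hot encode categorical columns.
# handle_unknown="ignore" (an assumption) avoids errors when the test
# split contains a category that was absent from the training split.
preprocessor = make_column_transformer(
    (StandardScaler(), numeric_feats),
    (OneHotEncoder(handle_unknown="ignore"), categorical_feats),
)

# Chain preprocessing and the regressor so both are fitted on training data only.
pipeline = make_pipeline(preprocessor, LinearRegression())
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
```

Wrapping both steps in one pipeline means the scaler and encoder learn their parameters from the training rows only, so nothing from the test split leaks into the fit.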


.. py:function:: preprocess(numeric_feats, categorical_feats, drop_feats)

.. py:function:: fit_predict(pipeline, X_train, X_test, y_train, y_test, scoring_metrics)
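
``fit_predict`` is not documented here. Its ``scoring_metrics`` argument presumably takes sklearn scorer names such as ``'r2'`` and ``'neg_mean_squared_error'``. As a hedged sketch, and not the library's own code, such names can be resolved with ``sklearn.metrics.get_scorer`` and collected into a dictionary keyed by metric name. The synthetic data below is invented purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import get_scorer
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustrative only): a linear signal plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=40)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# Each scorer is a callable taking (estimator, X, y) and returning a float.
scoring_metrics = ["r2", "neg_mean_squared_error"]
scores = {
    name: get_scorer(name)(model, X_test, y_test)
    for name in scoring_metrics
}
```

Because sklearn scorers follow a "greater is better" convention, ``neg_mean_squared_error`` is the mean squared error multiplied by -1, so its value is never positive.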