linreg_ally.models
==================

.. py:module:: linreg_ally.models


Functions
---------

.. autoapisummary::

   linreg_ally.models.run_linear_regression
   linreg_ally.models.preprocess
   linreg_ally.models.fit_predict


Module Contents
---------------

.. py:function:: run_linear_regression(dataframe, target_column, numeric_feats, categorical_feats, drop_feats=None, test_size=0.2, random_state=None, scoring_metrics=['r2', 'neg_mean_squared_error'])

   Performs linear regression with scikit-learn preprocessing and returns evaluation scores for the requested metrics.

   :param dataframe: full dataset, including features and target.
   :type dataframe: `pandas.DataFrame`
   :param target_column: name of the target variable column.
   :type target_column: `str`
   :param numeric_feats: columns to scale with StandardScaler.
   :type numeric_feats: `list`
   :param categorical_feats: columns to encode with OneHotEncoder.
   :type categorical_feats: `list`
   :param drop_feats: columns to drop (default None).
   :type drop_feats: `list`, optional
   :param test_size: proportion of the dataset to include in the test split (default 0.2).
   :type test_size: `float`, optional
   :param random_state: controls the shuffling applied to the data before the split (default None).
   :type random_state: `int`, optional
   :param scoring_metrics: scoring metrics used to evaluate the model (default ``['r2', 'neg_mean_squared_error']``).
   :type scoring_metrics: `list`, optional

   :returns: * the fitted model
             * DataFrames for the training and test features
             * Series for the training and test labels
             * a dictionary of metric scores, keyed by metric name
   :rtype: tuple

   :raises ValueError: when `dataframe`, `target_column`, `test_size` or `scoring_metrics` is not within the range of acceptable values.
   :raises TypeError: when `dataframe`, `random_state` or `scoring_metrics` is not of the expected type.

   .. rubric:: Examples

   >>> import pandas as pd
   >>> from linreg_ally.models import run_linear_regression
   >>> df = pd.DataFrame({
   ...     "feature_1": [1, 2, 3, 4],
   ...     "feature_2": [0.5, 0.1, 0.4, 0.9],
   ...     "category": ["a", "b", "a", "b"],
   ...     "target": [1.0, 2.5, 3.4, 4.3]
   ... })
   >>> target_column = 'target'
   >>> numeric_feats = ['feature_1', 'feature_2']
   >>> categorical_feats = ['category']
   >>> drop_feats = []
   >>> best_model, X_train, X_test, y_train, y_test, scores = run_linear_regression(
   ...     df, target_column, numeric_feats, categorical_feats, drop_feats, scoring_metrics=['r2', 'neg_mean_squared_error']
   ... )
   >>> scores
   {'r2': 0.52, 'neg_mean_squared_error': 1.23}


.. py:function:: preprocess(numeric_feats, categorical_feats, drop_feats)


.. py:function:: fit_predict(pipeline, X_train, X_test, y_train, y_test, scoring_metrics)
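
The three functions above appear to compose into a scikit-learn pipeline. The sketch below is a minimal, non-authoritative illustration under stated assumptions: that ``preprocess`` builds a ``ColumnTransformer`` (StandardScaler on ``numeric_feats``, OneHotEncoder on ``categorical_feats``, dropping ``drop_feats``) and that ``fit_predict`` fits the pipeline on the training split and scores it on the test split with each named scorer. ``sketch_preprocess`` and ``sketch_fit_predict`` are hypothetical stand-ins, not the package's implementation; the ``handle_unknown="ignore"`` option, the use of ``sklearn.metrics.get_scorer`` and the synthetic data are likewise assumptions made for illustration.

.. code-block:: python

   import pandas as pd
   from sklearn.compose import make_column_transformer
   from sklearn.linear_model import LinearRegression
   from sklearn.metrics import get_scorer
   from sklearn.model_selection import train_test_split
   from sklearn.pipeline import make_pipeline
   from sklearn.preprocessing import OneHotEncoder, StandardScaler


   def sketch_preprocess(numeric_feats, categorical_feats, drop_feats):
       """Hypothetical stand-in for ``preprocess``: scale, one-hot encode, drop."""
       transformers = [
           (StandardScaler(), numeric_feats),
           # Assumption: ignore categories that only appear in the test split
           (OneHotEncoder(handle_unknown="ignore"), categorical_feats),
       ]
       if drop_feats:
           transformers.append(("drop", drop_feats))
       # Columns not listed above are dropped (remainder="drop" is the default)
       return make_column_transformer(*transformers)


   def sketch_fit_predict(pipeline, X_train, X_test, y_train, y_test, scoring_metrics):
       """Hypothetical stand-in for ``fit_predict``: fit, then score on the test split."""
       pipeline.fit(X_train, y_train)
       # get_scorer(name) returns a callable scorer(estimator, X, y)
       return {m: get_scorer(m)(pipeline, X_test, y_test) for m in scoring_metrics}


   # Synthetic data with an exactly linear target, so the fit should be near perfect
   df = pd.DataFrame({
       "feature_1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
       "feature_2": [0.5, 0.1, 0.4, 0.9, 0.3, 0.8, 0.2, 0.7, 0.6, 0.0],
       "category": ["a", "b"] * 5,
   })
   df["target"] = 2 * df["feature_1"] - df["feature_2"] + (df["category"] == "b") * 1.5

   X = df.drop(columns="target")
   y = df["target"]
   X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

   pipeline = make_pipeline(
       sketch_preprocess(["feature_1", "feature_2"], ["category"], drop_feats=None),
       LinearRegression(),
   )
   scores = sketch_fit_predict(
       pipeline, X_train, X_test, y_train, y_test, ["r2", "neg_mean_squared_error"]
   )
   print(scores)

In scikit-learn, scorers whose names start with ``neg_`` return the negated error, so their values are never positive and, as with every scikit-learn scorer, higher is better.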