linreg_ally.multicollinearity
Functions
check_multicollinearity – Detects multicollinearity in the training dataset by computing the variance inflation factor (VIF) of each numeric feature and the pairwise Pearson correlations between numeric features.
Module Contents
- linreg_ally.multicollinearity.check_multicollinearity(train_df: pandas.DataFrame, threshold=None, vif_only=False)
Detects multicollinearity in the training dataset by computing the variance inflation factor (VIF) of each numeric feature and the pairwise Pearson correlations between numeric features.
- Parameters:
train_df (pd.DataFrame) – Training dataset
threshold (int) – Minimum VIF a feature must have to be included in the returned dataframe. Default is None.
vif_only (bool) – If True, only a dataframe containing the VIF scores is returned. Otherwise, the correlation chart is also returned. Default is False.
- Returns:
pd.DataFrame – A dataframe containing the VIF of all numeric features in train_df.
alt.Chart – A chart that shows the pairwise Pearson correlations of all numeric columns in train_df. Returned only when vif_only is False.
- Raises:
TypeError – If train_df is not a pandas DataFrame.
Examples
>>> from linreg_ally.multicollinearity import check_multicollinearity
>>> vif_df, corr_chart = check_multicollinearity(train_df)
>>> vif_df = check_multicollinearity(train_df, threshold=5, vif_only=True)
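
To make the two quantities concrete, here is a minimal sketch of how VIF scores and the Pearson correlation matrix can be computed with pandas and NumPy. It is not the linreg_ally implementation: the helper name vif_sketch, the output column names, the toy data, and the "at least threshold" filtering rule are all assumptions made for illustration. It relies on the standard identity that the VIF of feature j, 1 / (1 - R_j^2), equals the j-th diagonal entry of the inverse correlation matrix.

```python
import numpy as np
import pandas as pd


def vif_sketch(df, threshold=None):
    """Hypothetical helper for illustration; not part of linreg_ally."""
    # Keep numeric columns only, since VIF and Pearson correlation
    # are defined for numeric features.
    numeric = df.select_dtypes(include="number")
    # Pairwise Pearson correlation matrix R of the numeric columns.
    corr = numeric.corr(method="pearson")
    # VIF_j = 1 / (1 - R_j^2) is the j-th diagonal entry of inv(R).
    vif = np.diag(np.linalg.inv(corr.to_numpy()))
    out = pd.DataFrame({"feature": corr.columns, "vif": vif})
    if threshold is not None:
        # Assumed semantics: keep features whose VIF is at least `threshold`.
        out = out[out["vif"] >= threshold].reset_index(drop=True)
    return out, corr


# Toy data (made up): x2 is roughly 2 * x1, x3 is unrelated to x1.
train_df = pd.DataFrame(
    {
        "x1": [1, 2, 3, 4, 5, 6, 7, 8],
        "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8],
        "x3": [1, -1, -1, 1, 1, -1, -1, 1],
        "label": list("abcdefgh"),  # non-numeric column, ignored
    }
)

vif_all, corr = vif_sketch(train_df)
vif_high, _ = vif_sketch(train_df, threshold=5)
print(vif_all)
print(vif_high)
```

With this data, x1 and x2 are nearly collinear, so both receive a large VIF and survive the threshold of 5, while x3 stays close to 1 and is filtered out. The inverse-matrix shortcut fails when two columns are perfectly collinear, because the correlation matrix is then singular.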