linreg_ally.multicollinearity

Functions

check_multicollinearity(train_df[, threshold, vif_only])

Detects multicollinearity in the training dataset by computing the variance inflation factor (‘VIF’) of each numeric feature and the pairwise Pearson correlations between numeric features.

Module Contents

linreg_ally.multicollinearity.check_multicollinearity(train_df: pandas.DataFrame, threshold=None, vif_only=False)[source]

Detects multicollinearity in the training dataset by computing the variance inflation factor (‘VIF’) of each numeric feature and the pairwise Pearson correlations between numeric features.
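For readers unfamiliar with VIF, the sketch below shows the textbook definition, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all other numeric features. It is an illustration only and is not taken from the linreg_ally source, whose computation may differ in detail. The names vif_sketch, demo_df, and demo_vif are hypothetical, and the data is synthetic.

```python
import numpy as np
import pandas as pd


def vif_sketch(df: pd.DataFrame) -> pd.Series:
    """Textbook VIF for each numeric column (illustrative, not the package code)."""
    numeric = df.select_dtypes(include="number")
    vifs = {}
    for col in numeric.columns:
        y = numeric[col].to_numpy(dtype=float)
        others = numeric.drop(columns=col).to_numpy(dtype=float)
        # Regress this column on every other numeric column plus an intercept.
        X = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        residual = y - X @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        # A perfectly explained column has an unbounded VIF.
        vifs[col] = np.inf if np.isclose(r_squared, 1.0) else 1.0 / (1.0 - r_squared)
    return pd.Series(vifs, name="vif")


# Synthetic data (an assumption for this sketch): x2 closely tracks x1,
# while x3 is independent noise.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
demo_df = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=200),
    "x3": rng.normal(size=200),
})
demo_vif = vif_sketch(demo_df)
```

In this synthetic data, x1 and x2 receive large VIF values because each is almost fully explained by the other, while x3 stays close to 1.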

Parameters:
  • train_df (pd.DataFrame) – Training dataset

  • threshold (int) – Minimum VIF a feature must have to be included in the returned dataframe. Default is None.

  • vif_only (bool) – If True, only the dataframe containing the VIF scores is returned. Otherwise, the correlation chart is also returned. Default is False.

Returns:

  • pd.DataFrame – A dataframe containing the VIF of all numeric features in train_df.

  • alt.Chart – A chart that shows the pairwise Pearson correlations of all numeric columns in train_df. Returned only when vif_only is False.
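The pairwise Pearson correlations behind such a chart can be reproduced with pandas alone. The sketch below is a rough illustration under assumptions: demo_df and the column names feature_1, feature_2, and correlation are hypothetical, and the long-form table is simply the shape a heatmap-style chart would typically consume, not necessarily what linreg_ally builds internally.

```python
import pandas as pd

# Hypothetical training data for illustration.
demo_df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],      # exactly 2 * a
    "c": [4.0, 3.0, 2.0, 1.0],      # decreases linearly as a increases
    "label": ["w", "x", "y", "z"],  # non-numeric, excluded below
})

# Square matrix of pairwise Pearson correlations over numeric columns only.
corr_matrix = demo_df.select_dtypes(include="number").corr(method="pearson")

# Long form: one row per (feature_1, feature_2) pair.
corr_long = (
    corr_matrix.rename_axis("feature_1")
    .reset_index()
    .melt(id_vars="feature_1", var_name="feature_2", value_name="correlation")
)
```

Here a and b are perfectly positively correlated and a and c perfectly negatively correlated, so both pairs would stand out at the extremes of a correlation chart.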

Raises:

TypeError – If train_df is not a pandas DataFrame.
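The documented TypeError implies an input check along the following lines. This is a minimal sketch of that kind of guard, not the package's actual code: require_dataframe is a hypothetical helper and the error message is an assumption.

```python
import pandas as pd


def require_dataframe(train_df):
    """Hypothetical guard mirroring the documented TypeError behaviour."""
    if not isinstance(train_df, pd.DataFrame):
        raise TypeError(
            f"train_df must be a pandas DataFrame, got {type(train_df).__name__}"
        )
    return train_df


# A plain dict of columns is rejected; wrapping it in a DataFrame is accepted.
raw_columns = {"x1": [1, 2, 3], "x2": [2, 4, 7]}
try:
    require_dataframe(raw_columns)
    rejected = False
except TypeError:
    rejected = True

accepted = require_dataframe(pd.DataFrame(raw_columns))
```

In practice this means array-like or dict inputs should be converted with pd.DataFrame(...) before being passed as train_df.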

Examples

>>> from linreg_ally.multicollinearity import check_multicollinearity
>>> vif_df, corr_chart = check_multicollinearity(train_df)
>>> vif_df = check_multicollinearity(train_df, threshold=5, vif_only=True)