
These are my notes from working through "Check your (Mixed) Model for Multicollinearity with 'performance'".

The goal of `performance` is to provide lightweight tools to assess and check the quality of your model. It includes functions such as `r2()` for many models (including logistic, mixed and Bayesian models), `icc()`, and helpers such as `check_convergence()`, `check_overdispersion()` or `check_zero_inflation()`.

In this post, we focus on multicollinearity. Multicollinearity is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others, i.e. two or more predictors are more or less strongly correlated (also described as non-independent covariates). Multicollinearity can make regression coefficient estimates unstable and inflate their standard errors.

`check_collinearity()` checks your model predictors for collinearity. The function works for simple models, but also for mixed models, including zero-inflated mixed models fitted with the **glmmTMB** or **GLMMadaptive** packages. It provides nice `print()` and `plot()` methods; examples are shown below.

First, we fit a simple linear model:

`# devtools::install_github("easystats/performance")`

Now, let’s check the model. The output contains two columns, one of which is the variance inflation factor (VIF).

`check_collinearity(model)`

The variance inflation factor is a measure of the magnitude of multicollinearity among model terms. A VIF below 5 indicates a low correlation of that predictor with the other predictors. A value between 5 and 10 indicates a moderate correlation, while VIF values larger than 10 are a sign of high, intolerable correlation among model predictors.
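For reference, the usual textbook definition of the VIF for a single predictor is sketched below. The notation is my own and is not taken from the `performance` documentation: `R_j^2` is the R-squared obtained when predictor `j` is regressed on all other predictors in the model.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

Under this definition, a predictor that is uncorrelated with the others has `R_j^2 = 0` and a VIF of 1, while `R_j^2 = 0.8` gives a VIF of 5 and `R_j^2 = 0.9` gives a VIF of 10, which lines up with the thresholds mentioned above.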

The `Increased SE` column in the output indicates how much larger the standard error is due to the correlation with other predictors.
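To make both columns more concrete, here is a minimal sketch that computes VIFs by hand using base R only. The predictors (`wt`, `cyl`, `gear`, `disp` from `mtcars`) are my own illustrative choice and not necessarily the model used in this post, and treating `Increased SE` as the square root of the VIF reflects my understanding of the output rather than something stated above.

```r
# Hand-computed VIFs with base R only (no extra packages needed).
# ASSUMPTION: the predictors wt, cyl, gear and disp from mtcars are
# illustrative choices, not necessarily the model used in this post.
data(mtcars)
predictors <- c("wt", "cyl", "gear", "disp")

vif_by_hand <- sapply(predictors, function(p) {
  # Regress predictor p on all remaining predictors
  aux <- lm(reformulate(setdiff(predictors, p), response = p), data = mtcars)
  # VIF = 1 / (1 - R^2) of that auxiliary regression
  1 / (1 - summary(aux)$r.squared)
})

# ASSUMPTION: the factor by which the standard error grows is sqrt(VIF)
increased_se <- sqrt(vif_by_hand)

round(cbind(VIF = vif_by_hand, `Increased SE` = increased_se), 2)
```

All predictors here are numeric; for models with factors or interaction terms, the values reported by `check_collinearity()` may be computed differently from this simple version.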

Now let’s plot the results.

`x <- check_collinearity(model)`

For models with a zero-inflation component, multicollinearity may occur in both the count and the zero-inflation component. By default, `check_collinearity()` checks the complete model; however, you can check only certain components of the model using the `component` argument. In the following example, we will focus on the complete model.

`library(glmmTMB)`

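Zero-inflated models are suited to count data in which the observed number of events contains a large share of zeros, for example insurance claims.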

`# fit mixed model with zero-inflation`

As we can see, the `print()` method separates the results into the count and zero-inflated model components for a clear output. Similarly, `plot()` produces a facet for each component, so the results are easier to read.

`plot(check_collinearity(model)) +`
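To check a single component instead of the complete model, something like the following sketch should work. The model is an illustrative stand-in fitted on the `Salamanders` data that ships with **glmmTMB**, not the model from this post, and the `component` values `"conditional"` and `"zi"` are the ones I believe `check_collinearity()` accepts; please verify them against the package documentation.

```r
library(glmmTMB)
library(performance)

# ASSUMPTION: this model on glmmTMB's Salamanders data is an illustrative
# stand-in; the model fitted in this post is not reproduced here.
data(Salamanders)
zi_model <- glmmTMB(
  count ~ spp + mined + (1 | site),
  ziformula = ~ spp + mined,
  family = poisson(),
  data = Salamanders
)

# Restrict the check to one part of the model
vif_count <- check_collinearity(zi_model, component = "conditional")
vif_zi    <- check_collinearity(zi_model, component = "zi")

vif_count
vif_zi
```

Checking the components separately can be handy when only the zero-inflation part shares predictors that are strongly related to each other.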

Multicollinearity can have different causes. In many cases it helps to center or standardize the predictors. Sometimes the only way to avoid multicollinearity is to remove one of the predictors with a very high VIF value. Collecting more data may also help, but this is of course not always possible.
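One case where centering clearly helps is a predictor that enters the model together with its own square (or in an interaction term). Centering does not change the correlation between two distinct predictors, but it can remove the correlation between the linear and the quadratic term. A minimal base R sketch, using a made-up variable `x` that is purely illustrative:

```r
# ASSUMPTION: x is a made-up predictor used only for illustration.
x <- 1:20

# The raw predictor and its square are very strongly correlated
cor_raw <- cor(x, x^2)

# After centering, the linear and quadratic terms are uncorrelated here,
# because the centered values are symmetric around zero
x_centered <- x - mean(x)
cor_centered <- cor(x_centered, x_centered^2)

c(raw = cor_raw, centered = cor_centered)
```

With real data the centered values are rarely perfectly symmetric, so the correlation usually shrinks substantially rather than vanishing completely.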
