Logistic regression via generalized estimating equations (GEE)
Source: R/logistic_reg_gee.R
details_logistic_reg_gee.Rd
gee::gee() uses generalized least squares to fit different types of models with errors that are not independent.
Details
For this engine, there is a single mode: classification.
Tuning Parameters
This model has no formal tuning parameters. It can still be worthwhile to choose an appropriate correlation structure (for example, through the corstr engine argument). That choice typically does not affect the predicted values of the model, but it does affect the inferential results and the parameter covariance values.
Translation from parsnip to the original package
The multilevelmod extension package is required to fit this model.
library(multilevelmod)
logistic_reg() %>%
  set_engine("gee") %>%
  translate()
## Logistic Regression Model Specification (classification)
##
## Computational engine: gee
##
## Model fit template:
## multilevelmod::gee_fit(formula = missing_arg(), data = missing_arg(),
## family = binomial)
multilevelmod::gee_fit() is a wrapper model around gee::gee().
Preprocessing requirements
There are no specific preprocessing needs. However, it is helpful to keep the clustering/subject identifier column as a factor or character column (instead of converting it into dummy variables). See the examples in the next section.
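As a rough illustration of why the identifier should stay as a single column, the base R sketch below uses a made-up toy data frame (the values are invented; only the column name patientID echoes the example data used later). Dummy expansion replaces the identifier with indicator columns, so no single column is left to name the cluster.

```r
# Toy data, invented for illustration only.
toy <- data.frame(
  patientID = factor(c("a", "a", "b", "b", "c", "c")),
  visit = c(1, 2, 1, 2, 1, 2)
)

# Kept as-is, the identifier is one factor column that can name the cluster.
id_is_factor <- is.factor(toy$patientID)

# Dummy expansion spreads the identifier over indicator columns instead.
dummies <- model.matrix(~ patientID + visit, data = toy)
dummy_names <- colnames(dummies)
print(dummy_names)
```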
Other details
The model cannot accept case weights.
Both gee::gee() and geepack::geeglm() specify the id/cluster variable using an argument id that requires a vector. parsnip doesn't work that way, so we enable this model to be fit using an artificial function, id_var(), that is used in the formula. So, in the original package, the call would look like:
gee(breaks ~ tension, id = wool, data = warpbreaks, corstr = "exchangeable")
With parsnip, we suggest using the formula method when fitting:
library(tidymodels)
data("toenail", package = "HSAUR3")
logistic_reg() %>%
  set_engine("gee", corstr = "exchangeable") %>%
  fit(outcome ~ treatment * visit + id_var(patientID), data = toenail)
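To make the role of id_var() more concrete, here is a minimal base R sketch of how a formula containing such a marker could be split into a cluster column name and an ordinary model formula. The helper name split_id_formula() is hypothetical, and this is not presented as the code multilevelmod actually runs; it only shows that the marker can be located with the specials argument of terms().

```r
# Hypothetical helper, for illustration only: separate the id_var() term
# from the rest of the formula.
split_id_formula <- function(f) {
  trms <- terms(f, specials = "id_var")
  id_pos <- attr(trms, "specials")$id_var
  if (length(id_pos) != 1) {
    stop("Expected exactly one id_var() term in the formula.")
  }
  # `variables` is a call to list(); element 1 is `list`, so shift by one.
  id_call <- attr(trms, "variables")[[id_pos + 1]]
  id_name <- all.vars(id_call)

  # Rebuild the model formula from every term except the id_var() one.
  labels <- attr(trms, "term.labels")
  keep <- labels[!grepl("id_var(", labels, fixed = TRUE)]
  response <- deparse(f[[2]])
  list(id = id_name, formula = reformulate(keep, response = response))
}

parts <- split_id_formula(outcome ~ treatment * visit + id_var(patientID))
print(parts$id)
print(parts$formula)
```

The recovered column name could then be used to pull the id vector out of the data, which is the form that the id argument of gee::gee() expects.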
When using tidymodels infrastructure, it may be better to use a workflow. In this case, you can add the appropriate columns using add_variables() then supply the GEE formula when adding the model:
library(tidymodels)
gee_spec <-
  logistic_reg() %>%
  set_engine("gee", corstr = "exchangeable")
gee_wflow <-
  workflow() %>%
  # The data are included as-is using:
  add_variables(outcomes = outcome, predictors = c(treatment, visit, patientID)) %>%
  add_model(gee_spec, formula = outcome ~ treatment * visit + id_var(patientID))
fit(gee_wflow, data = toenail)
The gee::gee() function always prints out warnings and output even when silent = TRUE. The parsnip "gee" engine, by contrast, silences all console output coming from gee::gee(), even if silent = FALSE.
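For readers curious how console output from a chatty function can be suppressed in general, the sketch below shows one common base R idiom. Both noisy_fit() and quietly_run() are hypothetical stand-ins written for this example; this is not a claim about how the "gee" engine is implemented.

```r
# Hypothetical stand-in for a function that prints and warns no matter what.
noisy_fit <- function() {
  print("Beginning fit")
  cat("running iterations\n")
  warning("something noisy")
  42
}

# Evaluate an expression while discarding printed output, messages, and
# warnings, and return its value.
quietly_run <- function(expr) {
  value <- NULL
  discarded <- utils::capture.output(
    value <- suppressWarnings(suppressMessages(expr))
  )
  value
}

result <- quietly_run(noisy_fit())
print(result)
```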
Also, because of issues with the gee() function, a supplementary call to glm() is needed to get the rank and QR decomposition objects so that predict() can be used.
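As a small illustration of the objects mentioned above, the base R sketch below fits an ordinary logistic regression with glm() and inspects its rank and qr components. It uses the built-in mtcars data purely as a convenient example and does not reproduce the engine's internal call.

```r
# A plain logistic regression on built-in data, for illustration only.
glm_fit <- glm(am ~ wt, data = mtcars, family = binomial)

# glm objects carry the numeric rank of the model matrix...
glm_rank <- glm_fit$rank

# ...and the QR decomposition of the weighted model matrix.
glm_qr_class <- class(glm_fit$qr)

print(glm_rank)
print(glm_qr_class)
```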