pscl::zeroinfl() uses maximum likelihood estimation to fit a model for
count data that has separate model terms for predicting the counts and for
predicting the probability of a zero count.
Details
For this engine, there is a single mode: regression.
Translation from parsnip to the underlying model call (regression)
The poissonreg extension package is required to fit this model.
library(poissonreg)
poisson_reg() %>%
  set_engine("zeroinfl") %>%
  translate()
## Poisson Regression Model Specification (regression)
##
## Computational engine: zeroinfl
##
## Model fit template:
## pscl::zeroinfl(formula = missing_arg(), data = missing_arg(),
## weights = missing_arg())
Preprocessing and special formulas for zero-inflated Poisson models
Factor/categorical predictors need to be converted to numeric values
(e.g., dummy or indicator variables) for this engine. When using the
formula method via fit(), parsnip will convert factor columns to
indicators.
For this particular model, a special formula is used to specify which
columns affect the counts and which affect the model for the probability
of zero counts. These two sets of terms are separated by a vertical bar,
for example y ~ x | z. This type of formula is not understood by the base R
modeling infrastructure (e.g., model.matrix()).
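To see why base R tooling does not understand this syntax, the following base R sketch inspects a two-part formula. It is an illustration only and is not meant to describe how pscl or parsnip process the formula internally; the object names (f, rhs, count_vars, zero_vars) are made up for the example.

```r
# A two-part formula: count-model terms | zero-inflation terms
f <- art ~ fem + mar | ment

# Base R's terms() does not treat the bar as a separator; everything on the
# right-hand side is kept together as one term.
term_labels <- attr(terms(f), "term.labels")
length(term_labels)

# The two parts can still be recovered by walking the formula object,
# because the bar is the top-level operator on the right-hand side.
rhs <- f[[3]]
identical(rhs[[1]], as.name("|"))

count_vars <- all.vars(rhs[[2]])  # variables in the count model
zero_vars  <- all.vars(rhs[[3]])  # variables in the zero-inflation model
count_vars
zero_vars
```

Because the bar is not a separator as far as terms() is concerned, any preprocessing that relies on the standard model-frame machinery cannot split the formula into its two parts.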
When fitting a parsnip model with this engine directly, the formula method is required and the formula is just passed through. For example:
library(tidymodels)
tidymodels_prefer()
data("bioChemists", package = "pscl")
poisson_reg() %>%
  set_engine("zeroinfl") %>%
  fit(art ~ fem + mar | ment, data = bioChemists)
## parsnip model object
##
##
## Call:
## pscl::zeroinfl(formula = art ~ fem + mar | ment, data = data)
##
## Count model coefficients (poisson with log link):
## (Intercept) femWomen marMarried
## 0.82840 -0.21365 0.02576
##
## Zero-inflation model coefficients (binomial with logit link):
## (Intercept) ment
## -0.363 -0.166
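As a hedged follow-up, the sketch below stores the same fit in an object and uses it for prediction. It assumes the poissonreg and pscl packages are installed; the object name zip_fit is hypothetical, and the printed output is omitted because it was not run for this page.

```r
library(poissonreg)  # also attaches parsnip

data("bioChemists", package = "pscl")

# Same model as above, saved so that it can be reused
zip_fit <-
  poisson_reg() %>%
  set_engine("zeroinfl") %>%
  fit(art ~ fem + mar | ment, data = bioChemists)

# Predictions for the first few rows; parsnip returns a tibble with one row
# per row of new_data and a .pred column
preds <- predict(zip_fit, new_data = head(bioChemists))
preds

# The underlying pscl object is available for engine-specific summaries
zip_fit %>%
  extract_fit_engine() %>%
  summary()
```

Keeping the fitted object around also makes it easy to compare alternative specifications of the zero-inflation part by changing only the terms after the bar.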
However, when using a workflow, the best approach is to avoid using
workflows::add_formula() and use workflows::add_variables() in
conjunction with a model formula:
data("bioChemists", package = "pscl")
spec <-
  poisson_reg() %>%
  set_engine("zeroinfl")

workflow() %>%
  add_variables(outcomes = c(art), predictors = c(fem, mar, ment)) %>%
  add_model(spec, formula = art ~ fem + mar | ment) %>%
  fit(data = bioChemists)
## ══ Workflow [trained] ══════════════════════════════════════════════════════════
## Preprocessor: Variables
## Model: poisson_reg()
##
## ── Preprocessor ────────────────────────────────────────────────────────────────
## Outcomes: c(art)
## Predictors: c(fem, mar, ment)
##
## ── Model ───────────────────────────────────────────────────────────────────────
##
## Call:
## pscl::zeroinfl(formula = art ~ fem + mar | ment, data = data)
##
## Count model coefficients (poisson with log link):
## (Intercept) femWomen marMarried
## 0.82840 -0.21365 0.02576
##
## Zero-inflation model coefficients (binomial with logit link):
## (Intercept) ment
## -0.363 -0.166
The reason for this is that workflows::add_formula() will try to
create the model matrix and either fail or create dummy variables
prematurely.