logistic_reg() is a way to generate a specification of a model before fitting and allows the model to be created using different packages in R, Stan, keras, or via Spark. The main arguments for the model are:

  • penalty: The total amount of regularization in the model. Note that this must be zero for some engines.

  • mixture: The mixture amounts of different types of regularization (see below). Note that this will be ignored for some engines.

These arguments are converted to their specific names at the time that the model is fit. Other options and arguments can be set using set_engine(). If left to their defaults here (NULL), the values are taken from the underlying model functions. If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.
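This is a minimal sketch of the workflow described above. It assumes parsnip is installed. No model is fit, so glmnet itself is never called. `nlambda` is a real `glmnet::glmnet()` argument, used here only to show how an engine-specific option is passed through. `spec` and `spec_small` are illustrative names.

```r
library(parsnip)

# Main arguments go in logistic_reg(); engine-specific options go in set_engine().
# nlambda is forwarded unchanged to glmnet::glmnet() when the model is fit.
spec <- logistic_reg(penalty = 0.1, mixture = 1) %>%
  set_engine("glmnet", nlambda = 50)

# update() replaces only the arguments it is given; mixture, the engine,
# and the engine-specific arguments are carried over from spec.
spec_small <- update(spec, penalty = 0.01)

# Arguments are stored as quosures, so evaluate them to inspect the values.
rlang::eval_tidy(spec_small$args$penalty)
rlang::eval_tidy(spec_small$args$mixture)
rlang::eval_tidy(spec_small$eng_args$nlambda)
spec_small$engine
```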

logistic_reg(mode = "classification", penalty = NULL, mixture = NULL)

Arguments

mode

A single character string for the type of model. The only possible value for this model is "classification".

penalty

A non-negative number representing the total amount of regularization (glmnet, LiblineaR, keras, and spark only). For keras models, this corresponds to purely L2 regularization (aka weight decay), while the other engines can use L1, L2, or a combination of the two (depending on the value of mixture).

mixture

A number between zero and one (inclusive) that gives the proportion of L1 regularization (i.e., lasso) in the model (glmnet, LiblineaR, and spark only). When mixture = 1, the model is a pure lasso model; when mixture = 0, it is a pure ridge regression model. For LiblineaR models, mixture must be exactly 0 or 1.
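glmnet defines the regularization term added to the negative log-likelihood as lambda * ((1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1), where lambda corresponds to penalty and alpha to mixture. The helper below is a hypothetical illustration of that formula, not a parsnip function. Other engines may scale the term differently. In glmnet, the intercept is not included in beta.

```r
# Hypothetical helper: the elastic-net penalty term as glmnet defines it.
# penalty plays the role of glmnet's lambda, mixture the role of alpha.
enet_penalty <- function(beta, penalty, mixture) {
  stopifnot(penalty >= 0, mixture >= 0, mixture <= 1)
  l1 <- sum(abs(beta))        # lasso part
  l2 <- sum(beta^2) / 2       # ridge part (half the squared L2 norm)
  penalty * ((1 - mixture) * l2 + mixture * l1)
}

beta <- c(1, -2, 0.5)
enet_penalty(beta, penalty = 0.1, mixture = 1)    # pure lasso: 0.1 * 3.5 = 0.35
enet_penalty(beta, penalty = 0.1, mixture = 0)    # pure ridge: 0.1 * 5.25 / 2 = 0.2625
enet_penalty(beta, penalty = 0.1, mixture = 0.5)  # an even blend of the two
```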

Details

For logistic_reg(), the mode will always be "classification".

The model can be fit with the fit() function using the following engines:

  • R: "glm" (the default), "glmnet", or "LiblineaR"

  • Stan: "stan"

  • Spark: "spark"

  • keras: "keras"

For this model, other packages may add additional engines. Use show_engines() to see the current set of engines.
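As an illustration, here is a sketch of fitting with the default "glm" engine. It assumes parsnip is installed and uses the built-in mtcars data. The 0/1 column am is recoded into a two-level factor because logistic_reg() expects a factor outcome. The labels "automatic" and "manual" and the name car_data are choices made for this example.

```r
library(parsnip)

# logistic_reg() needs a factor outcome; recode mtcars$am (0/1) as labeled classes.
car_data <- transform(mtcars, am = factor(am, levels = c(0, 1),
                                          labels = c("automatic", "manual")))

# "glm" is the default engine; naming it explicitly just documents the choice.
glm_fit <- logistic_reg() %>%
  set_engine("glm") %>%
  fit(am ~ wt, data = car_data)

# Class probabilities (one .pred_<level> column per class) and hard class predictions.
probs <- predict(glm_fit, new_data = car_data[1:3, ], type = "prob")
classes <- predict(glm_fit, new_data = car_data[1:3, ])
probs
classes
```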

Note

For models created using the spark engine, there are several differences to consider. First, only the formula interface via fit() is available; using fit_xy() will generate an error. Second, the predictions will always be in a Spark table format. The names will be the same as documented but without the dots. Third, there is no equivalent to factor columns in Spark tables, so class predictions are returned as character columns. Fourth, to retain the model object for a new R session (via save()), the model$fit element of the parsnip object should be serialized via ml_save(object$fit) and saved to disk separately. In a new session, the object can be reloaded and reattached to the parsnip object.

Engine Details

Engines may have pre-set default arguments when executing the model fit call. For this type of model, the templates of the fit calls are shown below.

glm

logistic_reg() %>% 
  set_engine("glm") %>% 
  translate()

## Logistic Regression Model Specification (classification)
## 
## Computational engine: glm 
## 
## Model fit template:
## stats::glm(formula = missing_arg(), data = missing_arg(), weights = missing_arg(), 
##     family = stats::binomial)

glmnet

logistic_reg(penalty = 0.1) %>% 
  set_engine("glmnet") %>% 
  translate()

## Logistic Regression Model Specification (classification)
## 
## Main Arguments:
##   penalty = 0.1
## 
## Computational engine: glmnet 
## 
## Model fit template:
## glmnet::glmnet(x = missing_arg(), y = missing_arg(), weights = missing_arg(), 
##     family = "binomial")

The glmnet engine requires a single value for the penalty argument (a number or tune()), but the full regularization path is always fit regardless of the value given to penalty. To pass in a custom sequence of values for glmnet's lambda, use the path_values argument in set_engine(). This sets the glmnet lambda parameter without changing the value given to logistic_reg(penalty). For example:

logistic_reg(penalty = .1) %>% 
  set_engine("glmnet", path_values = c(0, 10^seq(-10, 1, length.out = 20))) %>% 
  translate()

## Logistic Regression Model Specification (classification)
## 
## Main Arguments:
##   penalty = 0.1
## 
## Computational engine: glmnet 
## 
## Model fit template:
## glmnet::glmnet(x = missing_arg(), y = missing_arg(), weights = missing_arg(), 
##     lambda = c(0, 10^seq(-10, 1, length.out = 20)), family = "binomial")

When fitting a model with penalty = 0, we strongly suggest that you pass in a vector for path_values that includes zero, because glmnet's default lambda sequence does not contain zero. See issue #431 for a discussion.

When using predict(), the single penalty value used for prediction is the one specified in logistic_reg().

To predict on multiple penalties, use the multi_predict() function. This function returns a tibble with a list column called .pred containing all of the penalty results.
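Here is a sketch of the difference between predict() and multi_predict(). It assumes parsnip and glmnet are installed and uses mtcars with am recoded as a factor, as in the example above. The penalty values and predictor set are arbitrary choices for illustration. Requested penalties outside the fitted lambda range are handled by glmnet's own prediction rules.

```r
library(parsnip)

car_data <- transform(mtcars, am = factor(am, levels = c(0, 1),
                                          labels = c("automatic", "manual")))

# The full lambda path is fit; penalty = 0.01 is only the value predict() uses.
glmnet_fit <- logistic_reg(penalty = 0.01, mixture = 1) %>%
  set_engine("glmnet") %>%
  fit(am ~ mpg + wt + hp + qsec, data = car_data)

new_cars <- car_data[1:3, ]

# Single-penalty predictions at the penalty stored in the specification.
predict(glmnet_fit, new_data = new_cars, type = "prob")

# One row per new observation; .pred holds a nested tibble with one row per penalty.
multi <- multi_predict(glmnet_fit, new_data = new_cars, type = "prob",
                       penalty = c(0.001, 0.01, 0.1))
multi$.pred[[1]]
```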

LiblineaR

logistic_reg() %>% 
  set_engine("LiblineaR") %>% 
  translate()

## Logistic Regression Model Specification (classification)
## 
## Computational engine: LiblineaR 
## 
## Model fit template:
## LiblineaR::LiblineaR(x = missing_arg(), y = missing_arg(), wi = missing_arg(), 
##     verbose = FALSE)

For LiblineaR models, the value for mixture can either be 0 (for ridge) or 1 (for lasso) but not other intermediate values. In the LiblineaR documentation, these correspond to types 0 (L2-regularized) and 6 (L1-regularized).

Be aware that the LiblineaR engine regularizes the intercept. Other regularized regression models do not, which will result in different parameter estimates.
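The sketch below shows the two mixture values LiblineaR accepts. It assumes parsnip and LiblineaR are installed and uses the same recoded mtcars data as the earlier examples. The penalty of 0.1 and the predictors are arbitrary. Because the intercept is penalized, these fits are not expected to match glmnet fits with the same arguments.

```r
library(parsnip)

car_data <- transform(mtcars, am = factor(am, levels = c(0, 1),
                                          labels = c("automatic", "manual")))

# LiblineaR supports only pure ridge (mixture = 0, LiblineaR type 0)
# or pure lasso (mixture = 1, LiblineaR type 6).
ridge_spec <- logistic_reg(penalty = 0.1, mixture = 0) %>% set_engine("LiblineaR")
lasso_spec <- logistic_reg(penalty = 0.1, mixture = 1) %>% set_engine("LiblineaR")

ridge_fit <- fit(ridge_spec, am ~ mpg + wt + hp, data = car_data)
lasso_fit <- fit(lasso_spec, am ~ mpg + wt + hp, data = car_data)

ridge_pred <- predict(ridge_fit, new_data = car_data[1:3, ])
lasso_pred <- predict(lasso_fit, new_data = car_data[1:3, ])
ridge_pred
lasso_pred
```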

stan

logistic_reg() %>% 
  set_engine("stan") %>% 
  translate()

## Logistic Regression Model Specification (classification)
## 
## Computational engine: stan 
## 
## Model fit template:
## rstanarm::stan_glm(formula = missing_arg(), data = missing_arg(), 
##     weights = missing_arg(), family = stats::binomial, refresh = 0)

Note that the refresh default prevents logging of the estimation process. Change this value in set_engine() to show the logs.

For prediction, the stan engine can compute posterior intervals analogous to confidence and prediction intervals. In these instances, the units are the original outcome and when std_error = TRUE, the standard deviation of the posterior distribution (or posterior predictive distribution as appropriate) is returned.
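This sketch combines both stan points. It assumes parsnip and rstanarm are installed. chains, iter, and seed are sampler arguments that rstanarm::stan_glm() forwards to rstan. The small chain and iteration counts only keep the example quick, and the recoded mtcars data is the same illustrative choice as above.

```r
library(parsnip)

car_data <- transform(mtcars, am = factor(am, levels = c(0, 1),
                                          labels = c("automatic", "manual")))

# refresh = 0 keeps sampling silent; a positive value (e.g. 500) prints progress.
stan_fit <- logistic_reg() %>%
  set_engine("stan", refresh = 0, chains = 1, iter = 1000, seed = 123) %>%
  fit(am ~ wt, data = car_data)

# 90% posterior intervals for the class probabilities, plus posterior standard deviations.
ints <- predict(stan_fit, new_data = car_data[1:3, ], type = "conf_int",
                level = 0.90, std_error = TRUE)
ints
```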

spark

logistic_reg() %>% 
  set_engine("spark") %>% 
  translate()

## Logistic Regression Model Specification (classification)
## 
## Computational engine: spark 
## 
## Model fit template:
## sparklyr::ml_logistic_regression(x = missing_arg(), formula = missing_arg(), 
##     weight_col = missing_arg(), family = "binomial")

keras

logistic_reg() %>% 
  set_engine("keras") %>% 
  translate()

## Logistic Regression Model Specification (classification)
## 
## Computational engine: keras 
## 
## Model fit template:
## parsnip::keras_mlp(x = missing_arg(), y = missing_arg(), hidden_units = 1, 
##     act = "linear")
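As a sketch, a keras specification can be built and inspected without fitting. It assumes only that parsnip is installed, because keras and TensorFlow are not called until fit(). For this engine, penalty is a pure L2 (weight decay) amount and mixture has no equivalent. epochs is an argument of parsnip::keras_mlp() passed through set_engine(), and the values used are arbitrary.

```r
library(parsnip)

# penalty becomes an L2 weight-decay amount; mixture is not used by keras.
keras_spec <- logistic_reg(penalty = 0.001) %>%
  set_engine("keras", epochs = 50)

# translate() shows the keras_mlp() template without fitting anything.
translate(keras_spec)
```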

Parameter translations

The standardized parameter names in parsnip can be mapped to their original names in each engine that has main parameters. Each engine typically has a different default value (shown in parentheses) for each parameter.

parsnip   glmnet      LiblineaR   spark                   keras
penalty   lambda      cost        reg_param (0)           penalty (0)
mixture   alpha (1)   type (0)    elastic_net_param (0)   NA

See also

Examples

show_engines("logistic_reg")
#> # A tibble: 6 x 2
#>   engine    mode
#>   <chr>     <chr>
#> 1 glm       classification
#> 2 glmnet    classification
#> 3 LiblineaR classification
#> 4 spark     classification
#> 5 keras     classification
#> 6 stan      classification
logistic_reg()
#> Logistic Regression Model Specification (classification)
#>
# Parameters can be represented by a placeholder:
logistic_reg(penalty = varying())
#> Logistic Regression Model Specification (classification)
#>
#> Main Arguments:
#>   penalty = varying()
#>