linear_reg() is a way to generate a specification of a model before fitting, and it allows the model to be created using different packages in R, Stan, keras, or via Spark. The main arguments for the model are:
penalty: The total amount of regularization in the model. Note that this must be zero for some engines.
mixture: The proportions of the different types of regularization (see below). Note that this will be ignored for some engines.
These arguments are converted to their engine-specific names at the time that the model is fit. Other options and arguments can be set using set_engine(). If left to their defaults here (NULL), the values are taken from the underlying model functions. If parameters need to be modified, update() can be used in lieu of recreating the object from scratch.
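As a minimal sketch, assuming the parsnip package is installed, a specification can be built from the main arguments plus an engine; nothing is estimated at this point. The values 0.1 and 0.5 are arbitrary, and reading the stored arguments through $args and rlang::eval_tidy() relies on parsnip's internal object layout, which may differ between versions.

```r
library(parsnip)

# Build a specification only: no data, no estimation yet.
spec <- linear_reg(penalty = 0.1, mixture = 0.5) %>%
  set_engine("glmnet")

# Main arguments are stored (as quosures) until fit time;
# rlang::eval_tidy() recovers the plain values.
rlang::eval_tidy(spec$args$penalty)
rlang::eval_tidy(spec$args$mixture)

# Engine and mode recorded on the specification.
spec$engine
spec$mode
```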
linear_reg(mode = "regression", penalty = NULL, mixture = NULL)

# S3 method for linear_reg
update(
  object,
  parameters = NULL,
  penalty = NULL,
  mixture = NULL,
  fresh = FALSE,
  ...
)
mode: A single character string for the type of model. The only possible value for this model is "regression".
penalty: A non-negative number representing the total amount of regularization.
mixture: A number between zero and one (inclusive) that is the proportion of L1 regularization (i.e. lasso) in the model. When mixture = 1, the model is a pure lasso; when mixture = 0, it is a pure ridge regression.
object: A linear regression model specification.
parameters: A 1-row tibble or named list with main parameters to update. If the individual arguments are used, these will supersede the values in parameters.
fresh: A logical for whether the arguments should be modified in place or replaced wholesale.
...: Not used for update().
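A minimal sketch of update() with the parameters argument, assuming parsnip (and its dependency tibble) is installed. The one-row tibble stands in for a hypothetical set of final values, such as the result of a tuning step; the numbers are made up for illustration.

```r
library(parsnip)

spec <- linear_reg(penalty = 10, mixture = 0.1) %>%
  set_engine("glmnet")

# Hypothetical "best" values, e.g. picked after tuning.
best <- tibble::tibble(penalty = 0.01, mixture = 1)

# Returns a modified copy; `spec` itself is left unchanged.
final_spec <- update(spec, parameters = best)

rlang::eval_tidy(final_spec$args$penalty)
rlang::eval_tidy(final_spec$args$mixture)
```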
The data given to the function are not saved and are only used to determine the mode of the model. For linear_reg(), the mode will always be "regression".
The model can be created with the fit() function using the following engines:
R: "lm" (the default) or "glmnet"
Stan: "stan"
Spark: "spark"
keras: "keras"
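A short sketch of fitting with the default "lm" engine, assuming parsnip is installed; mtcars (shipped with base R) and the formula mpg ~ wt + hp are chosen only for illustration. The underlying stats::lm object lives in the $fit element, so its coefficients match a direct lm() call.

```r
library(parsnip)

# "lm" needs nothing beyond base R's stats package.
lm_fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt + hp, data = mtcars)

# The engine's own model object.
coef(lm_fit$fit)

# predict() returns a tibble with a .pred column.
preds <- predict(lm_fit, new_data = mtcars[1:3, ])
preds
```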
For models created using the spark engine, there are several differences to consider. First, only the formula interface via fit() is available; using fit_xy() will generate an error. Second, the predictions will always be in a Spark table format. The names will be the same as documented but without the dots. Third, there is no equivalent to factor columns in Spark tables, so class predictions are returned as character columns. Fourth, to retain the model object for a new R session (via save()), the model$fit element of the parsnip object should be serialized via ml_save(object$fit) and saved to disk separately. In a new session, the object can be reloaded and reattached to the parsnip object.
Engines may have preset default arguments when executing the model fit call. For this type of model, the templates of the fit calls are shown below.
linear_reg() %>% set_engine("lm") %>% set_mode("regression") %>% translate()
## Linear Regression Model Specification (regression)
##
## Computational engine: lm
##
## Model fit template:
## stats::lm(formula = missing_arg(), data = missing_arg(), weights = missing_arg())
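The printed template is also stored in the translated object, so it can be inspected programmatically. A small sketch, assuming parsnip is installed; the element path method$fit is parsnip's internal structure and may change between versions.

```r
library(parsnip)

lm_spec <- linear_reg() %>% set_engine("lm") %>% set_mode("regression")
tr <- translate(lm_spec)

# Package and function that fit() will call.
tr$method$fit$func

# Argument names in the call template.
names(tr$method$fit$args)
```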
linear_reg() %>% set_engine("glmnet") %>% set_mode("regression") %>% translate()
## Linear Regression Model Specification (regression)
##
## Computational engine: glmnet
##
## Model fit template:
## glmnet::glmnet(x = missing_arg(), y = missing_arg(), weights = missing_arg(),
##     family = "gaussian")
For glmnet models, the full regularization path is always fit regardless of the value given to penalty. Also, there is the option to pass multiple values (or no values) to the penalty argument. When using the predict() method in these cases, the return value depends on the value of penalty. predict() can only use a single penalty value; to predict on multiple penalties, use the multi_predict() function instead. It returns a tibble with a list column called .pred that contains a tibble with all of the penalty results.
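A sketch contrasting predict() and multi_predict() for the glmnet engine, assuming both parsnip and glmnet are installed. The data (mtcars) and the penalty values are illustrative choices, not recommendations.

```r
library(parsnip)

# The whole path is fit; penalty = 0.1 is the default value used by predict().
glmnet_fit <- linear_reg(penalty = 0.1, mixture = 1) %>%
  set_engine("glmnet") %>%
  fit(mpg ~ ., data = mtcars)

new_cars <- mtcars[1:3, ]

# predict(): exactly one penalty value.
single <- predict(glmnet_fit, new_data = new_cars, penalty = 0.05)

# multi_predict(): several penalties; one nested tibble per row in .pred.
several <- multi_predict(glmnet_fit, new_data = new_cars,
                         penalty = c(0.01, 0.1, 1))
several$.pred[[1]]  # one row per penalty value
```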
linear_reg() %>% set_engine("stan") %>% set_mode("regression") %>% translate()
## Linear Regression Model Specification (regression)
##
## Computational engine: stan
##
## Model fit template:
## rstanarm::stan_glm(formula = missing_arg(), data = missing_arg(),
##     weights = missing_arg(), family = stats::gaussian, refresh = 0)
Note that the refresh default prevents logging of the estimation process. Changing this value in set_engine() will show the logs.
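A minimal sketch of overriding refresh, assuming parsnip is installed; only the specification is built and translated here, so rstanarm is not needed until fit() is called. The value 500 (report progress every 500 iterations) is arbitrary.

```r
library(parsnip)

# User-supplied engine argument replacing the template's refresh = 0.
stan_spec <- linear_reg() %>%
  set_engine("stan", refresh = 500)

# The translated call now carries refresh = 500 instead of the default.
translate(stan_spec)
```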
For prediction, the stan engine can compute posterior intervals analogous to confidence and prediction intervals. In these instances, the units are those of the original outcome, and when std_error = TRUE, the standard deviation of the posterior distribution (or of the posterior predictive distribution, as appropriate) is also returned.
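A sketch of both interval types, assuming parsnip and rstanarm are installed. The chains, iter and seed values are passed through set_engine() to rstanarm::stan_glm() and are chosen only to keep the run short and reproducible; mtcars and the 90% level are illustrative.

```r
library(parsnip)

stan_fit <- linear_reg() %>%
  set_engine("stan", chains = 2, iter = 1000, seed = 123) %>%
  fit(mpg ~ wt + hp, data = mtcars)

new_cars <- mtcars[1:3, ]

# Interval for the mean response (posterior distribution).
conf_int <- predict(stan_fit, new_data = new_cars, type = "conf_int",
                    level = 0.9, std_error = TRUE)

# Interval for a new observation (posterior predictive distribution).
pred_int <- predict(stan_fit, new_data = new_cars, type = "pred_int",
                    level = 0.9, std_error = TRUE)

conf_int
pred_int
```

Because the posterior predictive distribution also includes the residual noise, the prediction intervals should come out wider than the confidence intervals.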
linear_reg() %>% set_engine("spark") %>% set_mode("regression") %>% translate()
## Linear Regression Model Specification (regression)
##
## Computational engine: spark
##
## Model fit template:
## sparklyr::ml_linear_regression(x = missing_arg(), formula = missing_arg(),
##     weight_col = missing_arg())
linear_reg() %>% set_engine("keras") %>% set_mode("regression") %>% translate()
## Linear Regression Model Specification (regression)
##
## Computational engine: keras
##
## Model fit template:
## parsnip::keras_mlp(x = missing_arg(), y = missing_arg(), hidden_units = 1,
##     act = "linear")
The standardized parameter names in parsnip can be mapped to their original names in each engine that has main parameters. Each engine typically has a different default value (shown in parentheses) for each parameter.
parsnip   glmnet      spark                   keras
penalty   lambda      reg_param (0)           penalty (0)
mixture   alpha (1)   elastic_net_param (0)   NA
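To make the mapping concrete for glmnet, penalty (lambda) and mixture (alpha) enter the elastic-net penalty term lambda * (alpha * sum(abs(beta)) + (1 - alpha) / 2 * sum(beta^2)), as documented for glmnet. The helper below is a hypothetical base-R illustration, not a parsnip or glmnet function; it only evaluates that term for a given coefficient vector.

```r
# Hypothetical helper (not part of parsnip): glmnet-style elastic-net
# penalty term, with penalty = lambda and mixture = alpha.
elastic_net_penalty <- function(beta, penalty, mixture) {
  stopifnot(penalty >= 0, mixture >= 0, mixture <= 1)
  l1 <- sum(abs(beta))   # lasso part
  l2 <- sum(beta^2) / 2  # ridge part, halved as in glmnet
  penalty * (mixture * l1 + (1 - mixture) * l2)
}

beta <- c(1, -2)
elastic_net_penalty(beta, penalty = 1, mixture = 1)    # pure lasso: 3
elastic_net_penalty(beta, penalty = 1, mixture = 0)    # pure ridge: 2.5
elastic_net_penalty(beta, penalty = 1, mixture = 0.5)  # 1.5 + 1.25 = 2.75
```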
#> Linear Regression Model Specification (regression)
#>

#> Linear Regression Model Specification (regression)
#>
#> Main Arguments:
#>   penalty = varying()
#>

#> Linear Regression Model Specification (regression)
#>
#> Main Arguments:
#>   penalty = 10
#>   mixture = 0.1
#>

#> Linear Regression Model Specification (regression)
#>
#> Main Arguments:
#>   penalty = 1
#>   mixture = 0.1
#>

#> Linear Regression Model Specification (regression)
#>
#> Main Arguments:
#>   penalty = 1
#>
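The input code for these printouts is not shown on this page. A sketch of calls that would produce the last three specifications, assuming parsnip is installed, contrasts a default update() with fresh = TRUE:

```r
library(parsnip)

model <- linear_reg(penalty = 10, mixture = 0.1)
model

# Modify one argument in place; mixture = 0.1 is kept.
kept <- update(model, penalty = 1)
kept

# fresh = TRUE replaces the arguments wholesale, so mixture is dropped.
replaced <- update(model, penalty = 1, fresh = TRUE)
replaced
```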