A model function (gen_additive_mod()) was added for generalized additive models.
Each model now has a default engine that is used when the model is defined. The default for each model is listed in the help documents. This also adds functionality to declare an engine in the model specification function.
set_engine() is still required if engine-specific arguments need to be added. (#513)
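As a minimal sketch of the two ways of choosing an engine described above, assuming a parsnip version that includes this change: linear_reg() and its "lm" default are used for illustration only, and nlambda is an example of a glmnet-specific argument.

```r
library(parsnip)

# Engine declared directly in the model specification function.
spec_direct <- linear_reg(engine = "lm")

# No engine given: the documented default for linear_reg() is used.
spec_default <- linear_reg()

# Engine-specific arguments still go through set_engine().
# The glmnet package itself is only needed once the model is fit.
spec_args <- set_engine(linear_reg(penalty = 0.1), "glmnet", nlambda = 50)

spec_direct$engine
spec_default$engine
spec_args$engine
```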
parsnip now checks for a valid combination of engine and mode (#529)
The default engine for
multinom_reg() was changed to
The helper functions .convert_xy_to_form_new() for converting between formula and matrix interface are now exported for developer use (#508).
New article “Fitting and Predicting with parsnip”, which contains examples for various combinations of model type and engine. (#527)
A new linear SVM model svm_linear() is now available with the LiblineaR engine (#424) and the kernlab engine (#438), and the LiblineaR engine is available for logistic_reg() as well (#429). These models can use sparse matrices via fit_xy() (#447) and have a tidy method (#474).
For models with
penalty (either a single numeric value or a value of
path_values can be used to set the lambda path as a specific set of numbers (independent of the value of penalty). Pure ridge regression models (i.e., mixture = 0) will generate incorrect values if the path does not include zero. See issue #431 for discussion (#486).
The xgboost engine for boosted trees was translating mtry to xgboost’s colsample_bytree. We now map mtry to colsample_bynode since that is more consistent with how random forest works. colsample_bytree can still be optimized by passing it in as an engine argument. colsample_bynode was added to xgboost after the parsnip package code was written. (#495)
colsample_bytree can be passed as integer counts or proportions, while
validation should always be proportions.
xgb_train() now has a new option
FALSE) that states which scale for
colsample_bytree is being used. (#461)
Re-licensed package from GPL-2 to MIT. See consent from copyright holders here.
Re-organized model documentation:
update methods were moved out of the model help files (#479).
generics::required_pkgs() was extended for
Prediction functions now give a consistent error when a user uses an unavailable value of
xgboost engines now respect the event_level option for predictions (#460).
An RStudio add-in is available that writes multiple parsnip model specifications to the source window. It can be accessed via the IDE addin menus or by calling
Changes to test for cases when CRAN cannot get xgboost to work on their Solaris configuration.
There is now an augment() method for fitted models. See
There is now an event_level argument for the xgboost engine. (#420)
New mode “censored regression” and new prediction types “linear_pred”, “time”, “survival”, “hazard”. (#396)
show_engines() will provide information on the current set of engines for a model.
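A short sketch of how this might be used, assuming a recent parsnip installation. The model name "linear_reg" is chosen only for illustration, and the rows returned depend on which extension packages are loaded.

```r
library(parsnip)

# List the engines (and the modes they support) registered for linear_reg().
engines <- show_engines("linear_reg")
engines
```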
Protections were added for function arguments that are dependent on the data dimensions (e.g., min_n, etc.). (#184)
Infrastructure was improved for running parsnip models in parallel using PSOCK clusters on Windows.
parsnip now has options to set specific types of predictor encodings for different models. For example,
ranger models run using
workflows do the same thing by not creating indicator variables. These encodings can be overridden using the
workflows. As a consequence, it is possible to get a different model fit than previous versions of parsnip. More details about specific encoding changes are below. (#326)
tidyr >= 1.0.0 is now required.
SVM models produced by kernlab now use the formula method (see breaking change notice above). This change was due to how ksvm() made indicator variables for factor predictors (with one-hot encodings). Since the ordinary formula method did not do this, the data are passed as-is to ksvm() so that the results are closer to what one would get if ksvm() were called directly.
MARS models produced by earth now use the formula method.
For xgboost, a one-hot encoding is used when indicator variables are created.
Under-the-hood changes were made so that non-standard data arguments in the modeling packages can be accommodated. (#315)
A new main argument was added to
stop_iter for early stopping. The xgb_train() function gained arguments for early stopping and a percentage of data to leave out for a validation set.
If fit() is used and the underlying model uses a formula, the actual formula is passed to the model (instead of a placeholder). This makes the stored model call more informative.
A function named repair_call() was added. This can help change the underlying model's call object to better reflect what users would have obtained if the model function had been used directly (instead of via parsnip). This is only useful when the user chooses a formula interface and the model uses a formula interface. It will also be of limited use when a recipe is used to construct the feature set in
tidy() was broken on R 4.0.
glmnet was removed as a dependency since the new version depends on R 3.6.0 or greater. Keeping it would constrain parsnip to that same requirement. All glmnet tests are run locally.
A set of internal functions are now exported. These are helpful when creating a new package that registers new model specifications.
parsnip and the underlying model function) for
spark boosted trees and some keras models. See 897c927.
The time elapsed during model fitting is stored in the $elapsed slot of the parsnip model object, and is printed when the model object is printed.
Some default parameter ranges were updated for SVM, KNN, and MARS models.
update() methods gained a parameters argument for cases when the parameters are contained in a tibble or list.
A bug was fixed standardizing the output column types of
A bug was fixed related to the column names generated by multi_predict(). The top-level tibble will always have a column named .pred and this list column contains tibbles across sub-models. The column names for these sub-model tibbles will have names consistent with predict() (which was previously incorrect). See 43c15db.
A bug was fixed standardizing the column names of nnet class probability predictions.
Unplanned release based on CRAN requirements for Solaris.
The way that parsnip stores the model information has changed. Any custom models from previous versions will need to use the new method for registering models. The methods are detailed in ?get_model_env and the package vignette for adding models.
The mode needs to be declared for models that can be used for more than one mode prior to fitting and/or translation.
surv_reg(), the engine that uses the
survival package is now called
survival instead of
glmnet models, the full regularization path is always fit regardless of the value given to
penalty. Previously, the model was fit with passing
lambda argument and the model could only make predictions at those specific values. (#195)
add_rowindex() can add a column called .row to a data frame.
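For illustration, a minimal sketch of add_rowindex() on a built-in data set. The choice of mtcars is arbitrary, and the expectation that .row holds the row positions 1 to n is an assumption about the helper's behaviour.

```r
library(parsnip)

# Take a few rows of a built-in data set and add the .row index column.
small <- head(mtcars, 3)
indexed <- add_rowindex(small)

names(indexed)
indexed$.row
```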
If a computational engine is not explicitly set, a default will be used. Each default is documented on the corresponding model page. A warning is issued at fit time unless verbosity is zero.
A suite of internal functions were added to help with upcoming model tuning features.
parsnip object always saved the name(s) of the outcome variable(s) for proper naming of the predicted values.
Small release driven by changes in sample() in the current r-devel.
A “null model” is now available that fits a predictor-free model (using the mean of the outcome for regression or the mode for classification).
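A hedged sketch of the null model for regression follows, assuming the null_model() specification with its "parsnip" engine is available in the installed version. The mtcars data and the mpg outcome are used only as an example.

```r
library(parsnip)

# A predictor-free regression model: every prediction is the outcome mean.
spec <- set_engine(null_model(mode = "regression"), "parsnip")
null_fit <- fit(spec, mpg ~ ., data = mtcars)

preds <- predict(null_fit, new_data = mtcars[1:3, ])
preds
```

Each value in preds$.pred should match mean(mtcars$mpg), since no predictors are used.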
fit_xy() can take a single column data frame or matrix for y without error.
varying_args() now has a full argument to control whether the full set of possible varying arguments is returned (as opposed to only the arguments that are actually varying).
fit_control() now returns an S3 method.
For classification models, an error occurs if the outcome data are not encoded as factors (#115).
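The check can be sketched as follows, assuming the "glm" engine for logistic_reg(). The mtcars data and the am column are illustrative only, and the exact wording of the error message may differ between versions.

```r
library(parsnip)

spec <- set_engine(logistic_reg(), "glm")

# am is stored as numeric 0/1 in mtcars, so fitting is expected to fail.
bad <- tryCatch(
  fit(spec, am ~ mpg, data = mtcars),
  error = function(e) e
)

# Encoding the outcome as a factor lets the same specification fit.
cars <- mtcars
cars$am <- factor(cars$am)
good <- fit(spec, am ~ mpg, data = cars)

inherits(bad, "error")
class(good)
```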
The prediction modules (e.g. predict_numeric, etc.) were de-exported. These were internal functions that were not intended for users, but users were calling them.
An event time data set (check_times) was included that is the time (in seconds) to run R CMD check using the "r-devel-windows-ix86+x86_64" flavor. Packages that errored are censored.
varying_args() now uses the version from the generics package. This means that the first argument, x, has been renamed to object to align with generics.
find_varying(), the internal function for detecting varying arguments, now returns correct results when a size 0 argument is provided. It can also now detect varying arguments nested deeply into a call (#131, #134).
For multinomial regression, the .pred_ prefix is now only added to prediction column names once (#107).
Confidence and prediction intervals for logistic regression were only computed for a single level. Both are now computed. (#156)
set_engine(). There is no
others has been replaced by
regularization was changed to
penalty in a few models to be consistent with this change.
earth package will need to be attached to be fully operational.
newdata was changed to
predict_raw method was added.