show_best() displays the top sub-models and their performance estimates.

show_best(x, metric, n = 5, ...)

select_best(x, metric, ...)

select_by_pct_loss(x, ..., metric, limit = 2)

select_by_one_std_err(x, ..., metric)

Arguments

x

The results of tune_grid() or tune_bayes().

metric

A character value for the metric that will be used to sort the models. (See https://tidymodels.github.io/yardstick/articles/metric-types.html for more details). Not required if a single metric exists in x.

n

An integer for the number of top results/rows to return.

...

For select_by_one_std_err() and select_by_pct_loss(), this argument is passed directly to dplyr::arrange() so that the user can sort the models from most simple to most complex. See the examples below. At least one term is required for these two functions.

limit

The limit of loss of performance that is acceptable (in percent units). See details below.

Value

A tibble with columns for the parameters. show_best() also includes columns for performance metrics.

Details

select_best() finds the tuning parameter combination with the best performance values.

select_by_one_std_err() uses the "one-standard-error rule" (Breiman et al., 1984), which selects the simplest model that is within one standard error of the numerically optimal results.
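
One way to write the rule down, as a sketch rather than the package's documented definition: the symbols below are introduced here for illustration, and the sketch assumes a metric that should be minimized (such as RMSE). The bound b is assumed to correspond to the .bound column that appears in the example output.

```latex
% Assumption: the metric is minimized (e.g. RMSE).
% \bar{m}_i : mean performance estimate for candidate i
% s_i       : standard error of that estimate
\[
  i^{*} = \arg\min_i \bar{m}_i, \qquad
  b = \bar{m}_{i^{*}} + s_{i^{*}}, \qquad
  \mathcal{C} = \{\, i : \bar{m}_i \le b \,\}
\]
```

Under this reading, the function returns the first member of the candidate set after the rows are sorted by the terms supplied in `...`. For a metric that should be maximized, the bound would instead be the best mean minus its standard error, and the inequality would be reversed.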

select_by_pct_loss() selects the simplest model whose loss of performance is within some acceptable limit.

For percent loss, suppose the best model has an RMSE of 0.75 and a simpler model has an RMSE of 1. The percent loss would be (1.00 - 0.75)/1.00 * 100, or 25 percent. Note that loss will always be non-negative.
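
The arithmetic in the example above can be written out as follows. This only restates the worked example as given in the text; the symbols are introduced here for illustration, and the general form (in particular the choice of denominator) is an assumption read off that example, not a statement of how the implementation computes the loss.

```latex
% m_best   : performance of the numerically best model (0.75 in the example)
% m_simple : performance of the simpler candidate (1.00 in the example)
\[
  \text{loss}
  = \frac{m_{\text{simple}} - m_{\text{best}}}{m_{\text{simple}}} \times 100
  = \frac{1.00 - 0.75}{1.00} \times 100
  = 25
\]
```

With the default limit = 2 shown in the usage, a candidate with a 25 percent loss would fall outside the acceptable limit.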

References

Breiman, Leo; Friedman, J. H.; Olshen, R. A.; Stone, C. J. (1984). Classification and Regression Trees. Monterey, CA: Wadsworth.

Examples

# \donttest{
data("example_ames_knn")

show_best(ames_iter_search, metric = "rmse")
#> # A tibble: 5 x 11
#>       K weight_func dist_power   lon   lat .iter .metric .estimator   mean     n
#>   <int> <chr>            <dbl> <int> <int> <dbl> <chr>   <chr>       <dbl> <int>
#> 1    33 triweight        0.325    10     3     0 rmse    standard   0.0733    10
#> 2    21 cos              0.415     1     4     0 rmse    standard   0.0744    10
#> 3     5 rank             0.245     2     7     0 rmse    standard   0.0747    10
#> 4    12 epanechnik…      1.13      4     7     0 rmse    standard   0.0753    10
#> 5     9 optimal          1.00      2     8     8 rmse    standard   0.0755    10
#> # … with 1 more variable: std_err <dbl>

select_best(ames_iter_search, metric = "rsq")
#> # A tibble: 1 x 5
#>       K weight_func dist_power   lon   lat
#>   <int> <chr>            <dbl> <int> <int>
#> 1    33 triweight        0.325    10     3

# To find the least complex model within one std error of the numerically
# optimal model, the number of nearest neighbors are sorted from the largest
# number of neighbors (the least complex class boundary) to the smallest
# (corresponding to the most complex model).
select_by_one_std_err(ames_grid_search, metric = "rmse", desc(K))
#> # A tibble: 1 x 12
#>       K weight_func dist_power   lon   lat .metric .estimator   mean     n
#>   <int> <chr>            <dbl> <int> <int> <chr>   <chr>       <dbl> <int>
#> 1    33 triweight        0.325    10     3 rmse    standard   0.0733    10
#> # … with 3 more variables: std_err <dbl>, .best <dbl>, .bound <dbl>

# Now find the least complex model that has no more than a 5% loss of RMSE:
select_by_pct_loss(ames_grid_search, metric = "rmse", limit = 5, desc(K))
#> # A tibble: 1 x 12
#>       K weight_func dist_power   lon   lat .metric .estimator   mean     n
#>   <int> <chr>            <dbl> <int> <int> <chr>   <chr>       <dbl> <int>
#> 1    33 triweight        0.325    10     3 rmse    standard   0.0733    10
#> # … with 3 more variables: std_err <dbl>, .best <dbl>, .loss <dbl>
# }