--- title: "MetricsWeighted" date: "`r Sys.Date()`" bibliography: "biblio.bib" link-citations: true output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{MetricsWeighted} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE, fig.width = 5.5, fig.height = 4.5 ) ``` ## Overview {MetricsWeighted} provides weighted versions of different machine learning metrics and performance measures. They all take at least four arguments: 1. `actual`: Actual observed values. 2. `predicted`: Predicted values. 3. `w`: Optional vector with case weights. 4. `...`: Further arguments. ## Installation ```r # From CRAN install.packages("MetricsWeighted") # Development version devtools::install_github("mayer79/MetricsWeighted") ``` ## Usage ### Regression ```{r} library(MetricsWeighted) # The data y_num <- iris[["Sepal.Length"]] fit_num <- lm(Sepal.Length ~ ., data = iris) pred_num <- fit_num$fitted weights <- seq_len(nrow(iris)) # Performance metrics rmse(y_num, pred_num) rmse(y_num, pred_num, w = rep(1, length(y_num))) # same rmse(y_num, pred_num, w = weights) # different mae(y_num, pred_num) medae(y_num, pred_num, w = weights) # MSE = mean normal deviance = mean Tweedie deviance with p = 0 mse(y_num, pred_num) deviance_normal(y_num, pred_num) deviance_tweedie(y_num, pred_num, tweedie_p = 0) # Mean Poisson deviance equals mean Tweedie deviance with parameter 1 deviance_poisson(y_num, pred_num) deviance_tweedie(y_num, pred_num, tweedie_p = 1) # Mean Gamma deviance equals mean Tweedie deviance with parameter 2 deviance_gamma(y_num, pred_num) deviance_tweedie(y_num, pred_num, tweedie_p = 2) ``` ### Binary classification ```{r} # The data y_cat <- iris[["Species"]] == "setosa" fit_cat <- glm(y_cat ~ Sepal.Length, data = iris, family = binomial()) pred_cat <- predict(fit_cat, type = "response") # Performance metrics AUC(y_cat, pred_cat) # unweighted AUC(y_cat, pred_cat, w = weights) # weighted logLoss(y_cat, pred_cat) # Log loss = binary cross-entropy deviance_bernoulli(y_cat, pred_cat) # Log Loss * 2 ``` ### Generalized R-squared Furthermore, we provide a generalization of R-squared, defined as the proportion of deviance explained, i.e., one minus the ratio of residual deviance and intercept-only deviance, see [@cohen]. For out-of-sample calculations, the null deviance is ideally calculated from the average in the training data. This can be controlled by setting `reference_mean` to the (possibly weighted) average in the training data. ```{r} summary(fit_num)$r.squared # Same r_squared(y_num, pred_num) r_squared(y_num, pred_num, deviance_function = deviance_tweedie, tweedie_p = 0) ``` ### Pipe In order to facilitate the use of these metrics with the pipe, use the function `performance()`: Starting from a data set with actual and predicted values (and optional case weights), it calculates one or more metrics. The resulting values are returned as a `data.frame`. ```r library(dplyr) fit_num <- lm(Sepal.Length ~ ., data = iris) # Regression with `Sepal.Length` as response iris %>% mutate(pred = predict(fit_num, data = .)) %>% performance("Sepal.Length", "pred") > metric value > rmse 0.300627 # Multiple measures iris %>% mutate(pred = predict(fit_num, data = .)) %>% performance( "Sepal.Length", "pred", metrics = list(rmse = rmse, mae = mae, `R-squared` = r_squared) ) > metric value > rmse 0.3006270 > mae 0.2428628 > R-squared 0.8673123 ``` ### Parametrized scoring functions Some scoring functions depend on a further parameter $p$: - `tweedie_deviance()`: depends on `tweedie_p`. - `elementary_score_expectile()`, `elementary_score_quantile()`: depend on `theta`. - `prop_within()`: Depends on `tol`. It might be of key relevance to evaluate such function for varying $p$. That is where the function `multi_metric()` shines. ```{r} ir <- iris ir$pred <- predict(fit_num, data = ir) # Create multiple Tweedie deviance functions multi_Tweedie <- multi_metric(deviance_tweedie, tweedie_p = c(0, seq(1, 3, by = 0.2))) perf <- performance( ir, actual = "Sepal.Length", predicted = "pred", metrics = multi_Tweedie, key = "Tweedie_p", value = "deviance" ) head(perf) # Deviance against p plot(deviance ~ as.numeric(as.character(Tweedie_p)), data = perf, type = "s") ``` ### Murphy diagrams The same logic as in the last example can be used to create so-called *Murphy diagrams* [@gneiting]. The function `murphy_diagram()` wraps above calls and allows to get elementary scores for one or multiple models across a range of theta values, see also R package [murphydiagram](https://CRAN.R-project.org/package=murphydiagram). ```{r} y <- 1:10 two_models <- cbind(m1 = 1.1 * y, m2 = 1.2 * y) murphy_diagram(y, two_models, theta = seq(0.9, 1.3, by = 0.01)) ``` ## References