Title: | Effect Plots |
---|---|
Description: | High-performance implementation of various effect plots useful for regression and probabilistic classification tasks. The package includes partial dependence plots (Friedman, 2001, <doi:10.1214/aos/1013203451>), accumulated local effect plots and M-plots (both from Apley and Zhu, 2020, <doi:10.1111/rssb.12377>), as well as plots that describe the statistical associations between model response and features. It supports visualizations with either 'ggplot2' or 'plotly', and is compatible with most models, including 'Tidymodels', models wrapped in 'DALEX' explainers, or models with case weights. |
Authors: | Michael Mayer [aut, cre] |
Maintainer: | Michael Mayer <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.2.1 |
Built: | 2025-01-11 22:26:30 UTC |
Source: | https://github.com/mayer79/effectplots |
This is a barebone implementation of Apley's ALE.
Per bin, the local effect is calculated, and the local effects are then accumulated over bins.
The local effect of a bin equals the difference between the partial dependence at its upper and lower breaks, using only the observations within the bin.
To plot the values, make a line plot of the resulting vector against the upper bin breaks.
Alternatively, extend the vector from the left by the value 0 and plot it against all breaks.
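The per-bin logic above can be illustrated with a short base-R sketch. It is not the package's .ale(): the helper name ale_sketch() is hypothetical, it assumes a model whose predict() method accepts a data.frame, it uses right-closed bins via findInterval(left.open = TRUE), and it ignores case weights, trafo, which_pred, and the per-bin subsampling controlled by bin_size.

```r
# Hypothetical helper (not the package's .ale()): ALE of feature `v`
# for given `breaks`, ignoring case weights and per-bin subsampling.
ale_sketch <- function(object, v, data, breaks, right = TRUE) {
  nb <- length(breaks) - 1L
  # Bin index in 1..nb; values outside the breaks go to the outermost bins
  bin <- findInterval(data[[v]], breaks, left.open = right, all.inside = TRUE)
  local_effect <- numeric(nb)  # empty bins contribute 0 in this sketch
  for (j in seq_len(nb)) {
    in_bin <- which(bin == j)
    if (length(in_bin) == 0L) next
    lower <- upper <- data[in_bin, , drop = FALSE]
    lower[[v]] <- breaks[j]       # lower bin break
    upper[[v]] <- breaks[j + 1L]  # upper bin break
    # Local effect: average prediction difference, using only in-bin rows
    local_effect[j] <- mean(stats::predict(object, upper) - stats::predict(object, lower))
  }
  cumsum(local_effect)  # one accumulated value per bin, at its upper break
}

fit <- lm(Sepal.Length ~ ., data = iris)
breaks <- seq(2, 4, length.out = 5)
ale_vals <- ale_sketch(fit, "Sepal.Width", iris, breaks)
# Prepend 0 to plot against all breaks, e.g. plot(breaks, ale_curve, type = "l")
ale_curve <- c(0, ale_vals)
```

For a linear model, each local effect equals the coefficient times the bin width, so the accumulated values grow linearly across bins.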
.ale( object, v, data, breaks, right = TRUE, pred_fun = stats::predict, trafo = NULL, which_pred = NULL, bin_size = 200L, w = NULL, g = NULL, ... )
object |
Fitted model. |
v |
Variable name in |
data |
Matrix or data.frame. |
breaks |
Bin breaks. |
right |
Should bins be right-closed?
The default is TRUE. |
pred_fun |
Prediction function, by default stats::predict. |
trafo |
How should predictions be transformed?
A function or NULL (default). |
which_pred |
If the predictions are multivariate: which column to pick
(integer or column name). By default NULL. |
bin_size |
Maximal number of observations used per bin. If there are more
observations in a bin, |
w |
Optional vector with case weights. |
g |
For internal use. The result of |
... |
Further arguments passed to |
Vector representing one ALE per bin.
Apley, Daniel W., and Jingyu Zhu. 2020. Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82 (4): 1059–1086. doi:10.1111/rssb.12377.
fit <- lm(Sepal.Length ~ ., data = iris)
v <- "Sepal.Width"
.ale(fit, v, data = iris, breaks = seq(2, 4, length.out = 5))
This is a barebone implementation of Friedman's partial dependence intended for developers. To get more information on partial dependence, see partial_dependence().
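For orientation, the core of partial dependence fits into a few lines of base R. This is a hypothetical pd_sketch(), not the package's .pd(); it assumes predict() accepts a data.frame and handles optional case weights via weighted.mean().

```r
# Hypothetical helper (not the package's .pd()): partial dependence of `v`
pd_sketch <- function(object, v, data, grid, w = NULL) {
  if (is.null(w)) w <- rep(1, nrow(data))
  vapply(seq_along(grid), function(k) {
    data_k <- data
    data_k[[v]] <- grid[k]  # set v to the k-th grid value in every row
    stats::weighted.mean(stats::predict(object, data_k), w)
  }, numeric(1))
}

fit <- lm(Sepal.Length ~ ., data = iris)
grid <- c(2.5, 3, 3.5)
pd <- pd_sketch(fit, "Sepal.Width", iris, grid)
```

For a linear model without interactions, the resulting PD curve is a straight line whose slope equals the feature's coefficient.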
.pd( object, v, data, grid, pred_fun = stats::predict, trafo = NULL, which_pred = NULL, w = NULL, ... )
object |
Fitted model. |
v |
Variable name in |
data |
Matrix or data.frame. |
grid |
Vector or factor of values to calculate partial dependence for. |
pred_fun |
Prediction function, by default stats::predict. |
trafo |
How should predictions be transformed?
A function or NULL (default). |
which_pred |
If the predictions are multivariate: which column to pick
(integer or column name). By default NULL. |
w |
Optional vector with case weights. |
... |
Further arguments passed to |
Vector of partial dependence values in the same order as grid.
Friedman, Jerome H. 2001, Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics 29 (5): 1189-1232. doi:10.1214/aos/1013203451.
fit <- lm(Sepal.Length ~ ., data = iris)
.pd(fit, "Sepal.Width", data = iris, grid = hist(iris$Sepal.Width, plot = FALSE)$mids)
.pd(fit, "Species", data = iris, grid = levels(iris$Species))
Calculates ALE for one or multiple continuous features specified by v.
The concept of ALE was introduced in Apley and Zhu (2020) as an alternative to partial dependence (PD). The Ceteris Paribus clause behind PD is a blessing and a curse at the same time:
Blessing: The interpretation is easy and similar to what we know from linear regression (just averaging out interaction effects).
Curse: The model is applied to very unlikely or even impossible feature combinations, especially with strongly dependent features.
ALE fixes the curse as follows: per bin, the local effect is calculated as the partial dependence difference between the lower and upper bin breaks, using only the observations falling into this bin. This is repeated for all bins, and the values are accumulated.
ALE values are plotted against right bin breaks.
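Written out for one feature v (notation introduced here for illustration; unweighted and without any centering): let $b_0 < b_1 < \dots < b_K$ be the bin breaks, $B_j$ the set of observations whose value of v falls into bin $j$, $n_j = |B_j|$, $\hat f$ the fitted model, and $\mathbf{x}_{i,-v}$ the other feature values of observation $i$. Then

$$
D_j = \frac{1}{n_j} \sum_{i \in B_j} \Big[ \hat f\big(b_j, \mathbf{x}_{i,-v}\big) - \hat f\big(b_{j-1}, \mathbf{x}_{i,-v}\big) \Big], \qquad \mathrm{ALE}_v(b_k) = \sum_{j=1}^{k} D_j, \quad k = 1, \dots, K,
$$

so each accumulated value belongs to a right bin break $b_k$. With case weights, the average over $B_j$ would become a weighted average.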
ale(object, ...)
## Default S3 method:
ale(object, v, data, pred_fun = stats::predict, trafo = NULL, which_pred = NULL, w = NULL, breaks = "Sturges", right = TRUE, discrete_m = 13L, outlier_iqr = 2, ale_n = 50000L, ale_bin_size = 200L, seed = NULL, ...)
## S3 method for class 'ranger'
ale(object, v, data, pred_fun = NULL, trafo = NULL, which_pred = NULL, w = NULL, breaks = "Sturges", right = TRUE, discrete_m = 13L, outlier_iqr = 2, ale_n = 50000L, ale_bin_size = 200L, seed = NULL, ...)
## S3 method for class 'explainer'
ale(object, v = colnames(data), data = object$data, pred_fun = object$predict_function, trafo = NULL, which_pred = NULL, w = object$weights, breaks = "Sturges", right = TRUE, discrete_m = 13L, outlier_iqr = 2, ale_n = 50000L, ale_bin_size = 200L, seed = NULL, ...)
## S3 method for class 'H2OModel'
ale(object, data, v = object@parameters$x, pred_fun = NULL, trafo = NULL, which_pred = NULL, w = object@parameters$weights_column$column_name, breaks = "Sturges", right = TRUE, discrete_m = 13L, outlier_iqr = 2, ale_n = 50000L, ale_bin_size = 200L, seed = NULL, ...)
object |
Fitted model. |
... |
Further arguments passed to |
v |
Variable names to calculate statistics for. |
data |
Matrix or data.frame. |
pred_fun |
Prediction function, by default stats::predict. |
trafo |
How should predictions be transformed?
A function or NULL (default). |
which_pred |
If the predictions are multivariate: which column to pick
(integer or column name). By default NULL. |
w |
Optional vector with case weights. Can also be a column name in |
breaks |
An integer, vector, or "Sturges" (the default) used to determine
bin breaks of continuous features. Values outside the total bin range are placed
in the outmost bins. To allow varying values of |
right |
Should bins be right-closed? The default is TRUE. |
discrete_m |
Numeric features with up to this number of unique values are treated as discrete and are therefore dropped from the calculations. |
outlier_iqr |
If |
ale_n |
Size of the data used for calculating ALE.
The default is 50000. For larger |
ale_bin_size |
Maximal number of observations used per bin for ALE calculations.
If there are more observations in a bin, |
seed |
Optional integer random seed used for:
|
The function is a convenience wrapper around feature_effects(), which calls the barebone implementation .ale() to calculate ALE.
A list (of class "EffectData") with a data.frame per feature having columns:
bin_mid
: Bin mid points. In the plots, the bars are centered around these.
bin_width
: Absolute width of the bin. In the plots, these equal the bar widths.
bin_mean
: For continuous features, the (possibly weighted) average feature value within bin. For discrete features equivalent to bin_mid.
N
: The number of observations within bin.
weight
: The weight sum within bin. When w = NULL, equivalent to N.
Different statistics, depending on the function call.
Use single bracket subsetting to select part of the output. Note that each data.frame contains an attribute "discrete" with the information whether the feature is discrete or continuous. This attribute might be lost when you manually modify the data.frames.
ale(default)
: Default method.
ale(ranger)
: Method for ranger models.
ale(explainer)
: Method for DALEX explainers.
ale(H2OModel)
: Method for H2O models.
Apley, Daniel W., and Jingyu Zhu. 2020. Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82 (4): 1059–1086. doi:10.1111/rssb.12377.
fit <- lm(Sepal.Length ~ ., data = iris)
M <- ale(fit, v = "Petal.Length", data = iris)
M |> plot()
M2 <- ale(fit, v = colnames(iris)[-1], data = iris, breaks = 5)
plot(M2, share_y = "all")  # Only continuous variables shown
Calculates average observed response over the values of one or multiple variables specified by X. This describes the statistical association between the response y and potential model features.
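The core statistic is a (weighted) mean of y per bin. A minimal base-R sketch for one continuous feature with given breaks, assuming the out-of-range rule described for breaks (values outside go to the outermost bins); avg_observed_sketch() is hypothetical and skips the package's automatic break selection and discrete handling.

```r
# Hypothetical helper (not the package's average_observed())
avg_observed_sketch <- function(x, y, breaks, w = NULL, right = TRUE) {
  if (is.null(w)) w <- rep(1, length(y))
  nb <- length(breaks) - 1L
  # Bin codes 1..nb; out-of-range values land in the outermost bins
  bin <- findInterval(x, breaks, left.open = right, all.inside = TRUE)
  idx <- split(seq_along(y), factor(bin, levels = seq_len(nb)))  # keeps empty bins
  # Weighted mean of z within each bin (NaN for empty bins)
  wmean <- function(z) {
    vapply(idx, function(i) sum(w[i] * z[i]) / sum(w[i]), numeric(1))
  }
  data.frame(
    bin_mid = (breaks[-1L] + breaks[-(nb + 1L)]) / 2,
    bin_width = diff(breaks),
    bin_mean = unname(wmean(x)),  # weighted average feature value per bin
    N = lengths(idx, use.names = FALSE),
    weight = unname(vapply(idx, function(i) sum(w[i]), numeric(1))),
    y_mean = unname(wmean(y))
  )
}

M <- avg_observed_sketch(iris$Sepal.Width, iris$Sepal.Length, breaks = seq(2, 4.5, by = 0.5))
```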
average_observed( X, y, w = NULL, x_name = "x", breaks = "Sturges", right = TRUE, discrete_m = 13L, outlier_iqr = 2, seed = NULL, ... )
X |
A vector, matrix, or data.frame with features. |
y |
A numeric vector representing observed response values. |
w |
An optional numeric vector of weights. Having observations with non-positive weight is equivalent to excluding them. |
x_name |
If |
breaks |
An integer, vector, or "Sturges" (the default) used to determine
bin breaks of continuous features. Values outside the total bin range are placed
in the outmost bins. To allow varying values of |
right |
Should bins be right-closed? The default is TRUE. |
discrete_m |
Numeric features with up to this number of unique values should not
be binned but rather treated as discrete. The default is 13. Vectorized over |
outlier_iqr |
If |
seed |
Optional integer random seed used for calculating breaks: The bin range is determined without values outside quartiles +- 2 IQR using a sample of <= 9997 observations to calculate quartiles. |
... |
Currently unused. |
The function is a convenience wrapper around feature_effects().
A list (of class "EffectData") with a data.frame per feature having columns:
bin_mid
: Bin mid points. In the plots, the bars are centered around these.
bin_width
: Absolute width of the bin. In the plots, these equal the bar widths.
bin_mean
: For continuous features, the (possibly weighted) average feature value within bin. For discrete features equivalent to bin_mid.
N
: The number of observations within bin.
weight
: The weight sum within bin. When w = NULL, equivalent to N.
Different statistics, depending on the function call.
Use single bracket subsetting to select part of the output. Note that each data.frame contains an attribute "discrete" with the information whether the feature is discrete or continuous. This attribute might be lost when you manually modify the data.frames.
M <- average_observed(iris$Species, y = iris$Sepal.Length)
M
M |> plot()
# Or multiple potential features X
average_observed(iris[2:5], y = iris[, 1], breaks = 5) |> plot()
Calculates average predictions over the values of one or multiple features specified by X. Shows the combined effect of a feature and other (correlated) features.
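A small simulation illustrates what "combined effect" means. Below, the response depends only on x2, but x2 is strongly correlated with x1. Averaging predictions over bins of x1 (the M-plot idea) then shows a clear slope, whereas the partial dependence of x1 is essentially flat. Base R only; the simulated data and variable names are made up for illustration.

```r
set.seed(1)
n <- 2000
x1 <- rnorm(n)
df <- data.frame(x1 = x1, x2 = x1 + rnorm(n, sd = 0.3))  # x2 correlated with x1
df$y <- 2 * df$x2 + rnorm(n, sd = 0.1)                  # y depends only on x2
fit <- lm(y ~ x1 + x2, data = df)
pred <- unname(predict(fit, df))

breaks <- seq(-2, 2, by = 0.5)
mids <- (breaks[-1L] + breaks[-length(breaks)]) / 2
bin <- findInterval(df$x1, breaks, left.open = TRUE, all.inside = TRUE)

# Average predicted per x1 bin: picks up the effect of the correlated x2
avg_pred <- as.vector(tapply(pred, factor(bin, levels = seq_along(mids)), mean))

# Partial dependence of x1 at the bin mids: x2 is kept at its observed values
pd <- vapply(mids, function(g) {
  d <- df
  d$x1 <- g
  mean(predict(fit, d))
}, numeric(1))

slope_avg_pred <- unname(coef(lm(avg_pred ~ mids))[2L])  # roughly 2
slope_pd <- unname(coef(lm(pd ~ mids))[2L])              # close to 0
```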
average_predicted( X, pred, w = NULL, x_name = "x", breaks = "Sturges", right = TRUE, discrete_m = 13L, outlier_iqr = 2, seed = NULL, ... )
X |
A vector, matrix, or data.frame with features. |
pred |
A numeric vector of predictions. |
w |
An optional numeric vector of weights. Having observations with non-positive weight is equivalent to excluding them. |
x_name |
If |
breaks |
An integer, vector, or "Sturges" (the default) used to determine
bin breaks of continuous features. Values outside the total bin range are placed
in the outmost bins. To allow varying values of |
right |
Should bins be right-closed? The default is TRUE. |
discrete_m |
Numeric features with up to this number of unique values should not
be binned but rather treated as discrete. The default is 13. Vectorized over |
outlier_iqr |
If |
seed |
Optional integer random seed used for calculating breaks: The bin range is determined without values outside quartiles +- 2 IQR using a sample of <= 9997 observations to calculate quartiles. |
... |
Currently unused. |
The function is a convenience wrapper around feature_effects().
A list (of class "EffectData") with a data.frame per feature having columns:
bin_mid
: Bin mid points. In the plots, the bars are centered around these.
bin_width
: Absolute width of the bin. In the plots, these equal the bar widths.
bin_mean
: For continuous features, the (possibly weighted) average feature value within bin. For discrete features equivalent to bin_mid.
N
: The number of observations within bin.
weight
: The weight sum within bin. When w = NULL, equivalent to N.
Different statistics, depending on the function call.
Use single bracket subsetting to select part of the output. Note that each data.frame contains an attribute "discrete" with the information whether the feature is discrete or continuous. This attribute might be lost when you manually modify the data.frames.
Apley, Daniel W., and Jingyu Zhu. 2020. Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82 (4): 1059–1086. doi:10.1111/rssb.12377.
fit <- lm(Sepal.Length ~ ., data = iris)
M <- average_predicted(iris[2:5], pred = predict(fit, iris), breaks = 5)
M
M |> plot()
Calculates average residuals (= bias) over the values of one or multiple features specified by X.
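Per bin, the average residual equals the average observed response minus the average prediction (with identical weights), so a bias plot shows the gap between the average observed and average predicted curves. A base-R check with the iris model from the examples; for a discrete feature, each level is its own bin. Because this least-squares fit includes Species as dummy variables, its residuals average to zero within each species.

```r
fit <- lm(Sepal.Length ~ ., data = iris)
y <- iris$Sepal.Length
pred <- unname(fitted(fit))
resid <- y - pred
bin <- iris$Species  # discrete feature: one bin per level

y_mean <- tapply(y, bin, mean)
pred_mean <- tapply(pred, bin, mean)
resid_mean <- tapply(resid, bin, mean)  # bias per level, numerically zero here
```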
bias( X, resid, w = NULL, x_name = "x", breaks = "Sturges", right = TRUE, discrete_m = 13L, outlier_iqr = 2, seed = NULL, ... )
X |
A vector, matrix, or data.frame with features. |
resid |
A numeric vector of residuals, i.e., y - pred. |
w |
An optional numeric vector of weights. Having observations with non-positive weight is equivalent to excluding them. |
x_name |
If |
breaks |
An integer, vector, or "Sturges" (the default) used to determine
bin breaks of continuous features. Values outside the total bin range are placed
in the outmost bins. To allow varying values of |
right |
Should bins be right-closed? The default is TRUE. |
discrete_m |
Numeric features with up to this number of unique values should not
be binned but rather treated as discrete. The default is 13. Vectorized over |
outlier_iqr |
If |
seed |
Optional integer random seed used for calculating breaks: The bin range is determined without values outside quartiles +- 2 IQR using a sample of <= 9997 observations to calculate quartiles. |
... |
Currently unused. |
The function is a convenience wrapper around feature_effects().
A list (of class "EffectData") with a data.frame per feature having columns:
bin_mid
: Bin mid points. In the plots, the bars are centered around these.
bin_width
: Absolute width of the bin. In the plots, these equal the bar widths.
bin_mean
: For continuous features, the (possibly weighted) average feature value within bin. For discrete features equivalent to bin_mid.
N
: The number of observations within bin.
weight
: The weight sum within bin. When w = NULL, equivalent to N.
Different statistics, depending on the function call.
Use single bracket subsetting to select part of the output. Note that each data.frame contains an attribute "discrete" with the information whether the feature is discrete or continuous. This attribute might be lost when you manually modify the data.frames.
fit <- lm(Sepal.Length ~ ., data = iris)
M <- bias(iris[2:5], resid = fit$residuals, breaks = 5)
M |> update(sort_by = "resid_mean") |> plot(share_y = "all")
Extracts from an "EffectData" object a simple variable importance measure, namely the (bin size weighted) variance of the partial dependence values, or of any other calculated statistic (e.g., "pred_mean" or "y_mean"). It can be used via update.EffectData(, sort_by = "pd") to sort the variables in decreasing importance.
Note that this measure captures only the main effect strength.
If the importance is calculated with respect to "pd", it is closely related
to the suggestion of Greenwell et al. (2018).
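A minimal sketch of the importance measure, assuming it is the weighted variance with denominator sum(weight) (the package's exact normalization may differ). importance_sketch() is a hypothetical name; stat would be, e.g., the pd values of one feature and weight its bin weights.

```r
# Hypothetical helper (not the package's effect_importance())
importance_sketch <- function(stat, weight) {
  ok <- !is.na(stat)                          # skip bins without a value
  stat <- stat[ok]
  weight <- weight[ok]
  m <- sum(weight * stat) / sum(weight)       # weighted mean
  sum(weight * (stat - m)^2) / sum(weight)    # weighted variance
}

# A flat curve has zero importance; a steeper one ranks higher
imp <- c(
  flat = importance_sketch(c(5, 5, 5), c(10, 30, 10)),
  steep = importance_sketch(c(0, 10), c(3, 1))
)
sort(imp, decreasing = TRUE)
```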
effect_importance(x, by = NULL)
x |
Object of class "EffectData". |
by |
The statistic whose variance is calculated.
One of 'pd', 'pred_mean', 'y_mean', 'resid_mean', or 'ale' (if available).
The default is NULL. |
A named vector of importance values of the same length as x.
Greenwell, Brandon M., Bradley C. Boehmke, and Andrew J. McCarthy. 2018. A Simple and Effective Model-Based Variable Importance Measure. arXiv preprint. https://arxiv.org/abs/1805.04755.
fit <- lm(Sepal.Length ~ ., data = iris)
M <- feature_effects(fit, v = colnames(iris)[-1], data = iris)
effect_importance(M)
Bins a numeric vector x into bins specified by breaks.
Values outside the range of breaks will be placed in the lowest or highest bin.
Set labels = FALSE to return integer codes only, and explicit_na = TRUE for maximal synergy with the "collapse" package.
Uses the logic of spatstat.utils::fastFindInterval() for equi-length bins.
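A base-R analogue of this binning rule can be written with findInterval() instead of the fastFindInterval() logic mentioned above. fcut_sketch() and its interval labels are hypothetical and need not match fcut()'s output format; it also has no explicit_na option.

```r
# Hypothetical helper (not the package's fcut())
fcut_sketch <- function(x, breaks, labels = TRUE, right = TRUE) {
  # left.open = TRUE gives right-closed bins (a, b]; all.inside = TRUE moves
  # values below/above the breaks into the first/last bin; NA stays NA
  codes <- findInterval(x, breaks, left.open = right, all.inside = TRUE)
  if (isFALSE(labels)) return(codes)
  lo <- breaks[-length(breaks)]
  hi <- breaks[-1L]
  labs <- if (right) paste0("(", lo, ",", hi, "]") else paste0("[", lo, ",", hi, ")")
  factor(codes, levels = seq_along(labs), labels = labs)
}

x <- c(NA, 1:10)
codes <- fcut_sketch(x, breaks = c(3, 5, 7), labels = FALSE)
codes_left <- fcut_sketch(x, breaks = c(3, 5, 7), labels = FALSE, right = FALSE)
f <- fcut_sketch(x, breaks = c(3, 5, 7))
```

With right-closed bins, the value 5 falls into the first bin (3,5]; with right = FALSE it moves to [5,7).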
fcut(x, breaks, labels = NULL, right = TRUE, explicit_na = FALSE)
x |
A numeric vector. |
breaks |
A monotonically increasing numeric vector of breaks. |
labels |
A character vector of length |
right |
Right closed bins ( |
explicit_na |
If |
Binned version of x. Either a factor, or integer codes.
x <- c(NA, 1:10)
fcut(x, breaks = c(3, 5, 7))
fcut(x, breaks = c(3, 5, 7), right = FALSE)
fcut(x, breaks = c(3, 5, 7), labels = FALSE)
This is the main function of the package. By default, it calculates the following statistics per feature X over values/bins:
"y_mean": Average observed y values. Used to assess descriptive associations between response and features.
"pred_mean": Average predictions. Corresponds to "M Plots" (from "marginal") in Apley (2020). Shows the combined effect of X and other (correlated) features. The difference from the average observed y values shows model bias.
"resid_mean": Average residuals. Calculated when both y and predictions are available. Useful to study model bias.
"pd": Partial dependence (Friedman, 2001): See partial_dependence(). Evaluated at bin averages, not at bin midpoints.
"ale": Accumulated local effects (Apley, 2020): See ale(). Only for continuous features.
Additionally, corresponding counts/weights are calculated, as well as standard deviations of observed y and residuals.
Numeric features with more than discrete_m = 13 unique values are binned via breaks. If breaks is a single integer or "Sturges", the total bin range is calculated without values outside +-2 IQR from the quartiles. Values outside the bin range are placed in the outermost bins. Note that at most 9997 observations are used to calculate quartiles and IQR.
All averages and standard deviations are weighted by the optional weights w.
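The outlier-robust choice of the bin range can be sketched as follows. Assumptions not taken from the package: the trimmed range is split the way graphics::hist() splits data with breaks = "Sturges" (pretty() with nclass.Sturges() intervals), the quartile subsample is drawn with sample(), and robust_breaks_sketch() is a hypothetical name. The last step uses findInterval(..., all.inside = TRUE) so that outliers land in the outermost bins.

```r
# Hypothetical helper: bin breaks that ignore values beyond quartiles +- 2 IQR
robust_breaks_sketch <- function(x, outlier_iqr = 2, max_n = 9997L) {
  x <- x[!is.na(x)]
  xs <- if (length(x) > max_n) sample(x, max_n) else x  # subsample for quartiles
  q <- stats::quantile(xs, probs = c(0.25, 0.75), names = FALSE)
  r <- outlier_iqr * (q[2L] - q[1L])
  x_in <- x[x >= q[1L] - r & x <= q[2L] + r]  # outliers excluded for the range only
  pretty(range(x_in), n = grDevices::nclass.Sturges(x_in))
}

set.seed(1)
x <- c(rnorm(1000), 50)  # one extreme outlier
b <- robust_breaks_sketch(x)
# The outlier is still binned: it falls into the last bin
bin <- findInterval(x, b, left.open = TRUE, all.inside = TRUE)
```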
If you need only one specific statistic, you can use the simplified APIs of
feature_effects(object, ...)
## Default S3 method:
feature_effects(object, v, data, y = NULL, pred = NULL, pred_fun = stats::predict, trafo = NULL, which_pred = NULL, w = NULL, breaks = "Sturges", right = TRUE, discrete_m = 13L, outlier_iqr = 2, calc_pred = TRUE, pd_n = 500L, ale_n = 50000L, ale_bin_size = 200L, seed = NULL, ...)
## S3 method for class 'ranger'
feature_effects(object, v, data, y = NULL, pred = NULL, pred_fun = NULL, trafo = NULL, which_pred = NULL, w = NULL, breaks = "Sturges", right = TRUE, discrete_m = 13L, outlier_iqr = 2, calc_pred = TRUE, pd_n = 500L, ale_n = 50000L, ale_bin_size = 200L, ...)
## S3 method for class 'explainer'
feature_effects(object, v = colnames(data), data = object$data, y = object$y, pred = NULL, pred_fun = object$predict_function, trafo = NULL, which_pred = NULL, w = object$weights, breaks = "Sturges", right = TRUE, discrete_m = 13L, outlier_iqr = 2, calc_pred = TRUE, pd_n = 500L, ale_n = 50000L, ale_bin_size = 200L, ...)
## S3 method for class 'H2OModel'
feature_effects(object, data, v = object@parameters$x, y = NULL, pred = NULL, pred_fun = NULL, trafo = NULL, which_pred = NULL, w = object@parameters$weights_column$column_name, breaks = "Sturges", right = TRUE, discrete_m = 13L, outlier_iqr = 2, calc_pred = TRUE, pd_n = 500L, ale_n = 50000L, ale_bin_size = 200L, ...)
object |
Fitted model. |
... |
Further arguments passed to |
v |
Variable names to calculate statistics for. |
data |
Matrix or data.frame. |
y |
Numeric vector with observed values of the response.
Can also be a column name in |
pred |
Pre-computed predictions (as from |
pred_fun |
Prediction function, by default stats::predict. |
trafo |
How should predictions be transformed?
A function or NULL (default). |
which_pred |
If the predictions are multivariate: which column to pick
(integer or column name). By default NULL. |
w |
Optional vector with case weights. Can also be a column name in |
breaks |
An integer, vector, or "Sturges" (the default) used to determine
bin breaks of continuous features. Values outside the total bin range are placed
in the outmost bins. To allow varying values of |
right |
Should bins be right-closed? The default is TRUE. |
discrete_m |
Numeric features with up to this number of unique values should not
be binned but rather treated as discrete. The default is 13. Vectorized over |
outlier_iqr |
If |
calc_pred |
Should predictions be calculated? Default is TRUE. |
pd_n |
Size of the data used for calculating partial dependence.
The default is 500. For larger |
ale_n |
Size of the data used for calculating ALE.
The default is 50000. For larger |
ale_bin_size |
Maximal number of observations used per bin for ALE calculations.
If there are more observations in a bin, |
seed |
Optional integer random seed used for:
|
A list (of class "EffectData") with a data.frame per feature having columns:
bin_mid
: Bin mid points. In the plots, the bars are centered around these.
bin_width
: Absolute width of the bin. In the plots, these equal the bar widths.
bin_mean
: For continuous features, the (possibly weighted) average feature value within bin. For discrete features equivalent to bin_mid.
N
: The number of observations within bin.
weight
: The weight sum within bin. When w = NULL, equivalent to N.
Different statistics, depending on the function call.
Use single bracket subsetting to select part of the output. Note that each data.frame contains an attribute "discrete" with the information whether the feature is discrete or continuous. This attribute might be lost when you manually modify the data.frames.
feature_effects(default)
: Default method.
feature_effects(ranger)
: Method for ranger models.
feature_effects(explainer)
: Method for DALEX explainer.
feature_effects(H2OModel)
: Method for H2O models.
Molnar, Christoph. 2019. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/.
Friedman, Jerome H. 2001, Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics 29 (5): 1189-1232. doi:10.1214/aos/1013203451.
Apley, Daniel W., and Jingyu Zhu. 2020. Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82 (4): 1059–1086. doi:10.1111/rssb.12377.
plot.EffectData(), update.EffectData(), partial_dependence(), ale(), average_observed(), average_predicted(), bias()
fit <- lm(Sepal.Length ~ ., data = iris)
xvars <- colnames(iris)[2:5]
M <- feature_effects(fit, v = xvars, data = iris, y = "Sepal.Length", breaks = 5)
M
M |> update(sort_by = "pd") |> plot(share_y = "all")
Calculates PD for one or multiple features.
PD was introduced by Friedman (2001) to study the (main) effects of an ML model. The PD of a model f and variable X at a certain value g is derived by replacing the X values in a reference dataset by g, and then calculating the average prediction of f over this modified data. This is done for different values of g to see how the average prediction of f changes in X, keeping all other feature values constant (Ceteris Paribus).
This function is a convenience wrapper around feature_effects(), which calls the barebone implementation .pd() to calculate PD.
As grid points, it uses the arithmetic mean of X per bin (specified by breaks), optionally weighted by w.
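In formulas (notation introduced here for illustration): for a reference dataset with rows $i = 1, \dots, n$, case weights $w_i$ (all equal to 1 when w = NULL), fitted model $\hat f$, and $\mathbf{x}_{i,-v}$ the observed values of all other features in row $i$,

$$
\mathrm{PD}_v(g) = \frac{\sum_{i=1}^{n} w_i \, \hat f\big(g, \mathbf{x}_{i,-v}\big)}{\sum_{i=1}^{n} w_i}.
$$

For a continuous feature, $g$ runs over the (weighted) bin means of X described above.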
partial_dependence(object, ...)
## Default S3 method:
partial_dependence(object, v, data, pred_fun = stats::predict, trafo = NULL, which_pred = NULL, w = NULL, breaks = "Sturges", right = TRUE, discrete_m = 13L, outlier_iqr = 2, pd_n = 500L, seed = NULL, ...)
## S3 method for class 'ranger'
partial_dependence(object, v, data, pred_fun = NULL, trafo = NULL, which_pred = NULL, w = NULL, breaks = "Sturges", right = TRUE, discrete_m = 13L, outlier_iqr = 2, pd_n = 500L, seed = NULL, ...)
## S3 method for class 'explainer'
partial_dependence(object, v = colnames(data), data = object$data, pred_fun = object$predict_function, trafo = NULL, which_pred = NULL, w = object$weights, breaks = "Sturges", right = TRUE, discrete_m = 13L, outlier_iqr = 2, pd_n = 500L, seed = NULL, ...)
## S3 method for class 'H2OModel'
partial_dependence(object, data, v = object@parameters$x, pred_fun = NULL, trafo = NULL, which_pred = NULL, w = object@parameters$weights_column$column_name, breaks = "Sturges", right = TRUE, discrete_m = 13L, outlier_iqr = 2, pd_n = 500L, seed = NULL, ...)
object |
Fitted model. |
... |
Further arguments passed to |
v |
Variable names to calculate statistics for. |
data |
Matrix or data.frame. |
pred_fun |
Prediction function, by default stats::predict. |
trafo |
How should predictions be transformed?
A function or NULL (default). |
which_pred |
If the predictions are multivariate: which column to pick
(integer or column name). By default NULL. |
w |
Optional vector with case weights. Can also be a column name in |
breaks |
An integer, vector, or "Sturges" (the default) used to determine
bin breaks of continuous features. Values outside the total bin range are placed
in the outmost bins. To allow varying values of |
right |
Should bins be right-closed? The default is TRUE. |
discrete_m |
Numeric features with up to this number of unique values should not
be binned but rather treated as discrete. The default is 13. Vectorized over |
outlier_iqr |
If |
pd_n |
Size of the data used for calculating partial dependence.
The default is 500. For larger |
seed |
Optional integer random seed used for:
|
A list (of class "EffectData") with a data.frame per feature having columns:
bin_mid
: Bin mid points. In the plots, the bars are centered around these.
bin_width
: Absolute width of the bin. In the plots, these equal the bar widths.
bin_mean
: For continuous features, the (possibly weighted) average feature value within bin. For discrete features equivalent to bin_mid.
N
: The number of observations within bin.
weight
: The weight sum within bin. When w = NULL, equivalent to N.
Different statistics, depending on the function call.
Use single bracket subsetting to select part of the output. Note that each data.frame contains an attribute "discrete" with the information whether the feature is discrete or continuous. This attribute might be lost when you manually modify the data.frames.
partial_dependence(default): Default method.
partial_dependence(ranger): Method for ranger models.
partial_dependence(explainer): Method for DALEX explainers.
partial_dependence(H2OModel): Method for H2O models.
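The class-specific methods differ mainly in their defaults; for example, the explainer method reads data, pred_fun, and w from the explainer object. The sketch below shows this S3 pattern with a hypothetical generic effect_sketch() and a toy class "toy_explainer". Neither is part of effectplots or DALEX, and the computed "statistic" is just the average prediction.

```r
# Hypothetical generic illustrating S3 dispatch with object-derived defaults;
# not the package's code.
effect_sketch <- function(object, ...) UseMethod("effect_sketch")

effect_sketch.default <- function(object, v, data, pred_fun = stats::predict, ...) {
  # Stand-in statistic: the average prediction on 'data'
  mean(pred_fun(object, data, ...))
}

# Method for a toy explainer: a list bundling model, data and prediction function.
# Defaults are taken from the object (lazy evaluation lets v refer to data).
effect_sketch.toy_explainer <- function(object, v = colnames(data),
                                        data = object$data,
                                        pred_fun = object$predict_function, ...) {
  # Unwrap the model and delegate to the default method
  effect_sketch.default(object$model, v = v, data = data, pred_fun = pred_fun, ...)
}

fit <- lm(Sepal.Length ~ ., data = iris)
expl <- structure(
  list(model = fit, data = iris[-1],
       predict_function = function(m, x) stats::predict(m, x)),
  class = "toy_explainer"
)
effect_sketch(fit, v = "Species", data = iris)
effect_sketch(expl)
```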
Friedman, Jerome H. 2001. Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics 29 (5): 1189-1232. doi:10.1214/aos/1013203451.
feature_effects(), .pd(), ale().
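# Partial dependence of a linear model on a single feature
fit <- lm(Sepal.Length ~ ., data = iris)
M <- partial_dependence(fit, v = "Species", data = iris)
M |> plot()

# Partial dependence on all features, with a common y axis
M2 <- partial_dependence(fit, v = colnames(iris)[-1], data = iris)
plot(M2, share_y = "all")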
Versatile plot function for an "EffectData" object. By default, all calculated
statistics (except "resid_mean") are shown. To select certain statistics,
use the stats argument. Set plotly = TRUE for interactive plots. Note that
all statistics are plotted at bin means, except for ALE (shown at right bin breaks).
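Why ALE is drawn at right bin breaks becomes clearer from how it is accumulated. The following base R sketch uses a hypothetical helper ale_sketch() and assumes a numeric feature in a data.frame: per bin, it averages the difference of predictions at the upper and lower break over the observations in that bin, then takes the cumulative sum. Treating empty bins as contributing zero is an assumption of the sketch.

```r
# Minimal ALE sketch for one numeric feature (not the package's implementation).
ale_sketch <- function(object, v, data, breaks, pred_fun = stats::predict, ...) {
  bin <- cut(data[[v]], breaks = breaks, include.lowest = TRUE, labels = FALSE)
  k <- length(breaks) - 1L
  local <- vapply(seq_len(k), function(j) {
    X <- data[!is.na(bin) & bin == j, , drop = FALSE]  # observations in bin j only
    if (nrow(X) == 0L) return(0)                       # assumption: empty bin adds 0
    X_lo <- X; X_lo[[v]] <- breaks[j]                  # move to lower break
    X_hi <- X; X_hi[[v]] <- breaks[j + 1L]             # move to upper break
    mean(pred_fun(object, X_hi, ...) - pred_fun(object, X_lo, ...))
  }, numeric(1))
  # Accumulated local effects, each attached to the upper (right) break of its bin
  data.frame(x = breaks[-1L], ale = cumsum(local))
}

fit <- lm(Sepal.Length ~ ., data = iris)
brk <- seq(2, 4.5, by = 0.5)
a <- ale_sketch(fit, v = "Sepal.Width", data = iris, breaks = brk)
a
```

Prepending 0 at the lowest break gives an equivalent curve plotted against all breaks.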
## S3 method for class 'EffectData'
plot(
  x, stats = NULL, ncol = grDevices::n2mfrow(length(x))[2L], byrow = TRUE,
  share_y = c("no", "all", "rows", "cols"), ylim = NULL, discrete_lines = TRUE,
  continuous_points = FALSE, title = "", subplot_titles = TRUE, ylab = NULL,
  legend_labels = NULL, interval = c("no", "ci", "ciw", "sd"), ci_level = 0.95,
  colors = getOption("effectplots.colors"), fill = getOption("effectplots.fill"),
  alpha = 1, bar_height = 1, bar_width = 1, bar_measure = c("weight", "N"),
  wrap_x = 10, rotate_x = 0, plotly = getOption("effectplots.plotly"), ...
)
x |
An object of class "EffectData". |
stats |
Vector of statistics to show. The default |
ncol |
Number of columns of the plot layout, by default
|
byrow |
Should plots be placed by row? Default is |
share_y |
Should the y axis be shared across subplots? The default is "no".
Other choices are "all", "rows", and "cols". Note that this currently does not
take error bars/ribbons into account.
Has no effect if |
ylim |
A vector of length 2 with manual y axis limits, or a list thereof. |
discrete_lines |
Show lines for discrete features. Default is |
continuous_points |
Show points for continuous features. Default is |
title |
Overall plot title, by default |
subplot_titles |
Should variable names be shown as subplot titles?
Default is |
ylab |
Label of the y axis. The default |
legend_labels |
Vector of legend labels in the same order as the
statistics plotted, or |
interval |
What intervals should be shown for observed y and residuals? One of
|
ci_level |
The nominal level of the Z confidence intervals (only when
|
colors |
Vector of line/point colors of sufficient length.
By default, a color-blind-friendly palette from "ggthemes".
To change globally, set |
fill |
Fill color of bars. The default equals "lightgrey".
To change globally, set |
alpha |
Alpha transparency of lines and points. Default is 1. |
bar_height |
Relative bar height (default 1). Set to 0 for no bars. |
bar_width |
Bar width multiplier (for discrete features). By default 1. |
bar_measure |
What should bars represent? Either "weight" (default) or "N". |
wrap_x |
Should categorical x axis labels be wrapped after this length?
The default is 10. Set to 0 for no wrapping. Vectorized over |
rotate_x |
Should categorical x axis labels be rotated by this angle?
The default is 0 (no rotation). Vectorized over |
plotly |
Should 'plotly' be used? The default is |
... |
Passed to |
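As an illustration of ci_level, here is a sketch of a plain Z confidence interval for an unweighted bin average: mean ± z * sd / sqrt(n) with z = qnorm(1 - (1 - ci_level) / 2). The helper z_ci() is hypothetical; how the package handles case weights or the "ciw" option is not shown and may differ.

```r
# Sketch of a Z confidence interval for a bin average (unweighted).
# z_ci() is hypothetical; weighting as done by the package is not modeled.
z_ci <- function(y, ci_level = 0.95) {
  z <- stats::qnorm(1 - (1 - ci_level) / 2)  # about 1.96 for a 95% interval
  m <- mean(y)
  se <- stats::sd(y) / sqrt(length(y))       # standard error of the mean
  c(lower = m - z * se, mean = m, upper = m + z * se)
}

set.seed(1)
y <- rnorm(200, mean = 5)
ci <- z_ci(y)
ci80 <- z_ci(y, ci_level = 0.8)  # lower level gives a narrower interval
ci
ci80
```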
If a single plot, an object of class "ggplot" or "plotly". Otherwise, an object of class "patchwork", or a "plotly" subplot.
feature_effects(), average_observed(), average_predicted(), partial_dependence(), bias(), ale()
# Compute several effect statistics at once, using 5 bins for continuous features
fit <- lm(Sepal.Length ~ ., data = iris)
xvars <- colnames(iris)[-1]
M <- feature_effects(fit, v = xvars, data = iris, y = "Sepal.Length", breaks = 5)

plot(M, share_y = "all")
plot(M, stats = c("pd", "ale"), legend_labels = c("PD", "ALE"))
plot(M, stats = "resid_mean", share_y = "all", interval = "ci")
Updates an "EffectData" object by
converting discrete features to factors (especially useful with the next option),
collapsing levels of categorical variables with many levels,
dropping empty bins,
dropping small bins,
dropping bins with missing names, or
sorting the variables by their importance (see effect_importance()).
Except for sort_by, all arguments are vectorized, i.e., you can
pass a vector or list of the same length as object.
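Collapsing rare levels can be sketched as follows. The helper collapse_sketch() is hypothetical, and its rules (keep the collapse_m - 1 levels with the largest weight, merge the rest into a level "other", combine the statistic by a weighted mean) are assumptions for illustration, not the documented behavior of update().

```r
# Sketch of collapsing rare levels in a per-bin table of a categorical feature.
# Assumed rules (not taken from the package): keep the collapse_m - 1 heaviest
# levels, merge the remaining ones into "other", combine 'stat' by weighted mean.
collapse_sketch <- function(df, collapse_m = 30L, stat = "pd") {
  if (nrow(df) <= collapse_m) return(df)  # nothing to collapse
  keep <- order(df$weight, decreasing = TRUE)[seq_len(collapse_m - 1L)]
  rest <- df[-keep, , drop = FALSE]
  other <- data.frame(bin_mid = "other", N = sum(rest$N), weight = sum(rest$weight))
  other[[stat]] <- stats::weighted.mean(rest[[stat]], rest$weight)
  rbind(df[keep, , drop = FALSE], other)
}

tab <- data.frame(
  bin_mid = c("a", "b", "c", "d"),
  N = c(50, 30, 15, 5),
  weight = c(50, 30, 15, 5),
  pd = c(1, 2, 3, 4)
)
res <- collapse_sketch(tab, collapse_m = 2L)
res
```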
## S3 method for class 'EffectData'
update(
  object, sort_by = c("no", "pd", "pred_mean", "y_mean", "resid_mean", "ale"),
  to_factor = FALSE, collapse_m = 30L, collapse_by = c("weight", "N"),
  drop_empty = FALSE, drop_below_n = 0, drop_below_weight = 0, na.rm = FALSE, ...
)
object |
Object of class "EffectData". |
sort_by |
By which statistic ("pd", "pred_mean", "y_mean", "resid_mean", "ale") should the results be sorted? The default is "no" (no sorting). Sorting is done after all other update steps, e.g., after collapsing or dropping rare levels. |
to_factor |
Should discrete features be treated as factors?
In combination with |
collapse_m |
If a factor or character feature has more than |
collapse_by |
How to determine "rare" levels in |
drop_empty |
Drop empty bins. Equivalent to |
drop_below_n |
Drop bins with N below this value. Applied after collapsing. |
drop_below_weight |
Drop bins with weight below this value. Applied after collapsing. |
na.rm |
Should missing bin centers be dropped? Default is |
... |
Currently not used. |
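The dropping arguments act as simple row filters on each per-bin table (applied after collapsing, per the descriptions above). A minimal sketch with a hypothetical helper drop_sketch(); treating drop_empty as "keep only bins with N > 0" is an assumption.

```r
# Sketch of the bin-dropping filters; drop_sketch() is hypothetical.
drop_sketch <- function(df, drop_empty = FALSE, drop_below_n = 0,
                        drop_below_weight = 0, na.rm = FALSE) {
  keep <- df$N >= drop_below_n & df$weight >= drop_below_weight
  if (drop_empty) keep <- keep & df$N > 0          # assumption: empty means N == 0
  if (na.rm) keep <- keep & !is.na(df$bin_mid)     # drop bins with missing center
  df[keep, , drop = FALSE]
}

tab <- data.frame(
  bin_mid = c(1, 2, NA, 4),
  N = c(0, 10, 5, 2),
  weight = c(0, 10, 5, 2)
)
drop_sketch(tab, drop_empty = TRUE)
drop_sketch(tab, drop_below_n = 3, na.rm = TRUE)
```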
A modified object of class "EffectData".
feature_effects(), average_observed(), average_predicted(), partial_dependence(), ale(), bias(), effect_importance()
fit <- lm(Sepal.Length ~ ., data = iris)
xvars <- colnames(iris)[-1]

# Sort variables by partial dependence importance and collapse to at most 2 levels
feature_effects(fit, v = xvars, data = iris, y = "Sepal.Length", breaks = 5) |>
  update(sort_by = "pd", collapse_m = 2) |>
  plot()