• About
  • Documentation

  • More Universes
  • Recent Updates
  • Leader board

  • All repositories
  • All packages
  • All articles
  • All datasets
  • All system Libraries
mayer79
  • Builds
  • Packages
  • Articles
  • Datasets
  • Contribution
  • Badges
  • API
  • Feed

Links tomayer79

shapviz - SHAP Visualizations

Visualizations for SHAP (SHapley Additive exPlanations), such as waterfall plots, force plots, various types of importance plots, dependence plots, and interaction plots. These plots act on a 'shapviz' object created from a matrix of SHAP values and a corresponding feature dataset. Wrappers for the R packages 'xgboost', 'lightgbm', 'fastshap', 'shapr', 'h2o', 'treeshap', 'DALEX', and 'kernelshap' are added for convenience. By separating visualization and computation, it is possible to display factor variables in graphs, even if the SHAP values are calculated by a model that requires numerical features. The plots are inspired by those provided by the 'shap' package in Python, but there is no dependency on it.

Last updated

explainable-aimachine-learningshapshapley-valuevisualizationxai

11.15 score 120 stars 1 dependents 718 scripts 9.1k downloads

missRanger - Fast Imputation of Missing Values

Alternative implementation of the beautiful 'MissForest' algorithm used to impute mixed-type data sets by chaining random forests, introduced by Stekhoven, D.J. and Buehlmann, P. (2012) <doi:10.1093/bioinformatics/btr597>. Under the hood, it uses the lightning fast random forest package 'ranger'. Between the iterative model fitting, we offer the option of using predictive mean matching. This firstly avoids imputation with values not already present in the original data (like a value 0.3334 in 0-1 coded variable). Secondly, predictive mean matching tries to raise the variance in the resulting conditional distributions to a realistic level. This would allow, e.g., to do multiple imputation when repeating the call to missRanger(). Out-of-sample application is supported as well.

Last updated

imputationmachine-learningmissing-valuesrandom-forest

10.69 score 72 stars 3 dependents 448 scripts 5.6k downloads

kernelshap - Kernel SHAP

Efficient implementation of Kernel SHAP (Lundberg and Lee, 2017, <doi:10.48550/arXiv.1705.07874>) permutation SHAP, and additive SHAP for model interpretability. For Kernel SHAP and permutation SHAP, if the number of features is too large for exact calculations, the algorithms iterate until the SHAP values are sufficiently precise in terms of their standard errors. The package integrates smoothly with meta-learning packages such as 'tidymodels', 'caret' or 'mlr3'. It supports multi-output models, case weights, and parallel computations. Visualizations can be done using the R package 'shapviz'.

Last updated

explainable-aiinterpretabilityinterpretable-machine-learningmachine-learningshapxai

9.43 score 62 stars 23 dependents 227 scripts 9.3k downloads

confintr - Confidence Intervals

Calculates classic and/or bootstrap confidence intervals for many parameters such as the population mean, variance, interquartile range (IQR), median absolute deviation (MAD), skewness, kurtosis, Cramer's V, odds ratio, R-squared, quantiles (incl. median), proportions, different types of correlation measures, difference in means, quantiles and medians. Many of the classic confidence intervals are described in Smithson, M. (2003, ISBN: 978-0761924999). Bootstrap confidence intervals are calculated with the R package 'boot'. Both one- and two-sided intervals are supported.

Last updated

bootstrapconfidence-intervalsstatistical-inferencestatistics

8.85 score 19 stars 21 dependents 144 scripts 8.2k downloads

splitTools - Tools for Data Splitting

Fast, lightweight toolkit for data splitting. Data sets can be partitioned into disjoint groups (e.g. into training, validation, and test) or into (repeated) k-folds for subsequent cross-validation. Besides basic splits, the package supports stratified, grouped as well as blocked splitting. Furthermore, cross-validation folds for time series data can be created. See e.g. Hastie et al. (2001) <doi:10.1007/978-0-387-84858-7> for the basic background on data partitioning and cross-validation.

Last updated

cross-validationmachine-learningtime-seriesvalidation

7.75 score 13 stars 7 dependents 205 scripts 754 downloads

treeshap - Compute SHAP Values for Your Tree-Based Models Using the 'TreeSHAP' Algorithm

An efficient implementation of the 'TreeSHAP' algorithm introduced by Lundberg et al., (2020) <doi:10.1038/s42256-019-0138-9>. It is capable of calculating SHAP (SHapley Additive exPlanations) values for tree-based models in polynomial time. Currently supported models include 'gbm', 'randomForest', 'ranger', 'xgboost', 'lightgbm'.

Last updated

explainabilityexplainable-aiexplainable-artificial-intelligenceexplanatory-model-analysisimlinterpretabilityinterpretable-machine-learningmachine-learningresponsible-mlshapshapley-valuexaicpp

7.30 score 98 stars 195 scripts 1.2k downloads

MetricsWeighted - Weighted Metrics and Performance Measures for Machine Learning

Provides weighted versions of several metrics and performance measures used in machine learning, including average unit deviances of the Bernoulli, Tweedie, Poisson, and Gamma distributions, see Jorgensen B. (1997, ISBN: 978-0412997112). The package also contains a weighted version of generalized R-squared, see e.g. Cohen, J. et al. (2002, ISBN: 978-0805822236). Furthermore, 'dplyr' chains are supported.

Last updated

machine-learningmetricsperformancestatistics

6.93 score 11 stars 6 dependents 81 scripts 1.1k downloads

flashlight - Shed Light on Black Box Machine Learning Models

Shed light on black box machine learning models by the help of model performance, variable importance, global surrogate models, ICE profiles, partial dependence (Friedman J. H. (2001) <doi:10.1214/aos/1013203451>), accumulated local effects (Apley D. W. (2016) <doi:10.48550/arXiv.1612.08468>), further effects plots, interaction strength, and variable contribution breakdown (Gosiewska and Biecek (2019) <doi:10.48550/arXiv.1903.11420>). All tools are implemented to work with case weights and allow for stratified analysis. Furthermore, multiple flashlights can be combined and analyzed together.

Last updated

interpretabilityinterpretable-machine-learningmachine-learningxai

6.70 score 25 stars 2 dependents 67 scripts 532 downloads

hstats - Interaction Statistics

Fast, model-agnostic implementation of different H-statistics introduced by Jerome H. Friedman and Bogdan E. Popescu (2008) <doi:10.1214/07-AOAS148>. These statistics quantify interaction strength per feature, feature pair, and feature triple. The package supports multi-output predictions and can account for case weights. In addition, several variants of the original statistics are provided. The shape of the interactions can be explored through partial dependence plots or individual conditional expectation plots. 'DALEX' explainers, meta learners ('mlr3', 'tidymodels', 'caret') and most other models work out-of-the-box.

Last updated

interactioninterpretabilitymachine-learningrstatstatisticsxai

6.04 score 34 stars 1 dependents 71 scripts 318 downloads

outForest - Multivariate Outlier Detection and Replacement

Provides a random forest based implementation of the method described in Chapter 7.1.2 (Regression model based anomaly detection) of Chandola et al. (2009) <doi:10.1145/1541880.1541882>. It works as follows: Each numeric variable is regressed onto all other variables by a random forest. If the scaled absolute difference between observed value and out-of-bag prediction of the corresponding random forest is suspiciously large, then a value is considered an outlier. The package offers different options to replace such outliers, e.g. by realistic values found via predictive mean matching. Once the method is trained on a reference data, it can be applied to new data.

Last updated

machine-learningoutlieroutlier-analysisoutlier-detectionrandom-forest

5.12 score 14 stars 19 scripts 229 downloads

effectplots - Effect Plots

High-performance implementation of various effect plots useful for regression and probabilistic classification tasks. The package includes partial dependence plots (Friedman, 2021, <doi:10.1214/aos/1013203451>), accumulated local effect plots and M-plots (both from Apley and Zhu, 2016, <doi:10.1111/rssb.12377>), as well as plots that describe the statistical associations between model response and features. It supports visualizations with either 'ggplot2' or 'plotly', and is compatible with most models, including 'Tidymodels', models wrapped in 'DALEX' explainers, or models with case weights.

Last updated

machine-learningregressionxaicpp

4.57 score 22 stars 17 scripts 581 downloads