---
title: "Exploring variable importance with varPro"
author: "John Ehrlinger"
date: today
format:
  html:
    toc: true
    toc-depth: 3
    html-math-method: mathjax
    code-fold: false
bibliography: ggRandomForests.bib
vignette: >
  %\VignetteIndexEntry{Exploring variable importance with varPro}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  message    = FALSE,
  warning    = FALSE,
  fig.width  = 7,
  fig.height = 4.5,
  cache      = TRUE,
  cache.path = "varpro_cache/"
)
options(mc.cores = 1, rf.cores = 1)
```

```{r libs}
library(ggplot2)

# Match the pattern used by the regression/survival vignettes: try the
# installed package first, fall back to pkgload::load_all() for the
# R CMD check vignette rebuild where the package isn't yet on
# .libPaths(). All varPro calls below are ::-qualified, so no
# library(varPro) is needed.
if (requireNamespace("ggRandomForests", quietly = TRUE)) {
  library(ggRandomForests)
} else if (requireNamespace("pkgload", quietly = TRUE)) {
  pkgload::load_all(export_all = FALSE, helpers = FALSE,
                    attach_testthat = FALSE)
} else {
  stop("Install ggRandomForests (or pkgload for dev builds) to render this vignette.")
}
```

# What varPro is

Random-forest variable importance has, for two decades, mostly meant
*permutation importance*: shuffle a column, measure the drop in OOB
accuracy, repeat. It works, but the score depends on artificial data — the
permuted column — and the answer is unstable when variables are correlated.

varPro [@Lu2024varpro] replaces the shuffling with *release rules*. From a
guided-splitting forest, varPro samples a collection of decision-rule
branches; for each variable in each rule, it compares the response inside
the rule's region with the response after the constraint on that variable
is "released". The size of that change, aggregated over rules and trees,
is the variable's importance. No synthetic columns, no permutation — the
contrast is between two real subsets of the data.

That core machinery feeds several views. **`gg_varpro()`** summarises the
per-tree importance distribution. **`gg_beta_varpro()`** refines those
release-rule contrasts with a per-rule lasso. **`gg_partial_varpro()`**
turns the release machinery into partial-dependence curves.
**`gg_udependent()`** reads cross-variable dependency off a `uvarpro()`
fit. **`gg_isopro()`** scores observations for anomaly using an
isolation-forest variant. **`gg_ivarpro()`** computes per-observation
local importance.

This vignette walks all six wrappers on three worked examples — a
regression problem (Boston housing), a classification problem (iris,
binary and multi-class), and a survival problem (PBC). The closing
section is a one-page reference matrix mapping each wrapper to the
forest families it supports.

# Regression: Boston housing

We start with the Boston housing data: 506 census tracts, 13 numeric
predictors, median home value as the response.

```{r boston-fit}
data("Boston", package = "MASS")
set.seed(20260527L)
v_boston <- varPro::varpro(medv ~ ., data = Boston, ntree = 50)
v_boston
```

`varpro()` builds the importance machinery — release rules, per-tree
scores, the splitweight vector — once. Every subsequent figure in this
section reads off that single fit.

## Per-tree importance with `gg_varpro()`

The first question varPro answers is the same one permutation importance
answers — which variables matter, ranked. `gg_varpro()` extracts the
per-tree importance scores and draws their distribution as a horizontal
boxplot, one row per variable, sorted by median z-score. Variables above
a configurable cutoff (default 0.79) are highlighted.

```{r boston-gg-varpro}
plot(gg_varpro(v_boston))
```

The narrow boxes near the top are the variables varPro is confident
about — every tree agrees they matter. Wide boxes that straddle the
cutoff line are the ones to look at twice; the forest disagrees with
itself.

## Partial dependence with `gg_partial_varpro()`

The ranking view tells you *which* variables matter. Partial dependence
asks the next question: *how* does the response change with a variable,
holding the others fixed? `gg_partial_varpro()` wraps
`varPro::partialpro()` and returns a tidy frame of parametric,
non-parametric, and causal partial-dependence curves.

```{r boston-gg-partial-varpro}
gg_pd <- gg_partial_varpro(object = v_boston)
plot(gg_pd)
```

Each panel is a single predictor. The three curves correspond to the
three estimators varPro carries — read them as a sensitivity analysis.
Tight agreement is reassuring; divergence is information.

## Per-rule lasso refinement with `gg_beta_varpro()`

`gg_varpro()`'s ranking is built from the release-rule contrast.
`gg_beta_varpro()` re-asks the question variable by variable inside each
rule: it fits a one-predictor lasso of the response on the released
variable, restricted to the OOB observations in that rule's region. The
fitted β̂ is the variable's *local* effect inside that rule. Aggregating
`mean(|β̂|)` across rules gives one number per variable; that number is
a regression-coefficient-flavoured importance, not a VIMP score.

Because `beta.varpro()` is expensive (a `glmnet` per rule), the wrapper
accepts a pre-computed `beta_fit` so you can iterate on selection
without re-fitting.

```{r boston-beta-fit, cache=TRUE}
b_boston <- varPro::beta.varpro(v_boston)
```

```{r boston-gg-beta-varpro}
plot(gg_beta_varpro(v_boston, beta_fit = b_boston))
```

Compare against `gg_varpro()` above. Disagreement is the diagnostic
signal: a variable that ranks high here but low there is one whose
local linear effect inside many rules is real even when the release
contrast is modest.

## Cross-variable dependency with `gg_udependent()`

The three views so far take one variable at a time. `gg_udependent()`
reads cross-variable structure off a `uvarpro()` fit and draws the
result as a network: nodes are variables, edges are dependencies above a
configurable threshold. The visual is built with `ggraph`, which is in
`Suggests` rather than `Imports` — install it if you want this view.

```{r boston-uvarpro, cache=TRUE}
u_boston <- varPro::uvarpro(Boston[, setdiff(names(Boston), "medv")],
                            ntree = 50)
```

```{r boston-gg-udependent, eval = requireNamespace("ggraph", quietly = TRUE)}
plot(gg_udependent(u_boston))
```

Clusters of mutually-connected variables are worth checking for
redundancy — they may be several views of the same underlying quantity.

## Anomaly scoring with `gg_isopro()`

Variable importance is one axis; *observation* outlierness is another.
`gg_isopro()` wraps `varPro::isopro()` — an isolation-forest variant
that scores how anomalous each training row looks — and renders the
result as a ranked elbow plus a density of the scores. The score is
on `[0, 1]`; the wrapper's convention is "higher = more anomalous"
(opposite of varPro's native polarity; the wrapper flips it for
consistency).

```{r boston-isopro, cache=TRUE}
iso_boston <- varPro::isopro(data = Boston[, setdiff(names(Boston),
                                                      "medv")],
                              method = "rnd", sampsize = 256,
                              ntree = 50)
```

```{r boston-gg-isopro}
plot(gg_isopro(iso_boston))
```

The elbow flags rows that diverge from the bulk. Pair with the
domain — anomalous in feature space is not the same as wrong, but it's
often where the most interesting cases live.

## Local importance with `gg_ivarpro()`

The wrappers so far aggregate across observations. `gg_ivarpro()` does
the opposite: it returns one value per (observation, variable) pair,
capturing how much variable *v* contributed to predicting observation
*i*. The aggregate view is a distribution of those local importances
per variable; the per-observation view is a horizontal bar of one row's
local importances across variables.

`ivarpro()` is the most expensive call in varPro, so the wrapper
accepts a pre-computed `ivarpro_fit` for reuse across views.

```{r boston-ivarpro-fit, cache=TRUE}
iv_boston <- varPro::ivarpro(v_boston)
```

```{r boston-gg-ivarpro-distribution}
plot(gg_ivarpro(v_boston, ivarpro_fit = iv_boston))
```

```{r boston-gg-ivarpro-which-obs}
plot(gg_ivarpro(v_boston, ivarpro_fit = iv_boston, which_obs = 1L))
```

The distribution view tells you which variables drive predictions
*across* observations. The per-observation view answers the same
question for a specific case — useful for explaining one prediction
back to whoever asked.

# Classification: iris

Iris is a small data set — 150 rows, four predictors, three response
classes — and that's a feature here, not a flaw: every figure renders in
under a second, and the structure is well-understood enough that any
strange behaviour stands out. Two fits: a binary problem (drop *setosa*,
positive class = *virginica*) and the full three-class problem.

```{r iris-fit-binary}
iris_binary <- iris[iris$Species != "setosa", ]
iris_binary$Species <- droplevels(iris_binary$Species)
set.seed(20260527L)
v_iris_binary <- varPro::varpro(Species ~ ., data = iris_binary, ntree = 50)
```

```{r iris-fit-multi}
set.seed(20260527L)
v_iris_multi <- varPro::varpro(Species ~ ., data = iris, ntree = 50)
```

## Class-conditional importance with `gg_varpro(conditional = TRUE)`

For a classification forest, `gg_varpro()` can split the importance
view into one facet per class. Variables keep the unconditional sort
order so rows line up across facets — read along a row to see which
class a variable is informative for.

```{r iris-multi-gg-varpro-conditional}
plot(gg_varpro(v_iris_multi, conditional = TRUE))
```

## Partial dependence: `gg_partial_varpro()` on classification

`gg_partial_varpro()` works the same way on a classification fit, but
each predictor's curve becomes per-class probabilities. The plot
panels are organised by predictor with class encoded as colour or
facet (the wrapper defaults handle this).

```{r iris-multi-gg-partial-varpro}
plot(gg_partial_varpro(object = v_iris_multi))
```

Read each panel as: "as predictor X changes from low to high, how does
the probability of each class shift?" Patterns that match the
underlying biology (e.g. petal length separating setosa from the
others) act as a sanity check on the forest.

## Per-class lasso refinement with `gg_beta_varpro()`

On a classification fit, `gg_beta_varpro()` returns one row per
(variable, class) pair. For a binary fit, `which_class = NULL` defaults
to the last factor level — the positive class — so the headline view
is a single panel of that class.[^mortality]

[^mortality]: The same code pattern applies to clinical binary outcomes
such as 30-day mortality: drop the negative class, set the event class
as the last factor level, and read the figure for the event panel.

```{r iris-binary-beta-fit, cache=TRUE}
b_iris_binary <- varPro::beta.varpro(v_iris_binary)
```

```{r iris-binary-gg-beta-varpro}
plot(gg_beta_varpro(v_iris_binary, beta_fit = b_iris_binary))
```

For a multi-class fit, the default view is faceted by class with each
class sharing the row order set by the unified ranking — same trick as
`gg_varpro(conditional = TRUE)`.

```{r iris-multi-beta-fit, cache=TRUE}
b_iris_multi <- varPro::beta.varpro(v_iris_multi)
```

```{r iris-multi-gg-beta-varpro}
plot(gg_beta_varpro(v_iris_multi, beta_fit = b_iris_multi))
```

`which_class = "<level>"` collapses the faceted view to a single class
when you want it.

# Survival: PBC

Survival is the family where the varPro toolchain shows its limits. The
forest-fitting and the family-agnostic wrappers all work; the
lasso-refined and individual-importance wrappers don't, because the
underlying `varPro::beta.varpro()` and `varPro::ivarpro()` calls don't
support survival fits in the current release.

```{r pbc-data}
library(survival)
data(pbc, package = "randomForestSRC")
pbc_small <- pbc[, c("days", "status", "age", "albumin", "bili",
                     "edema", "platelet")]
pbc_small <- na.omit(pbc_small)
set.seed(20260527L)
v_pbc <- varPro::varpro(
  Surv(days, status) ~ .,
  data = pbc_small, ntree = 50
)
```

## Variable importance: `gg_varpro()`

```{r pbc-gg-varpro}
plot(gg_varpro(v_pbc))
```

## Partial dependence: `gg_partial_varpro()` (C-path)

For survival, `gg_partial_varpro()` dispatches into the
`randomForestSRC`-backed partial-dependence machinery via the underlying
`$rf` survival forest — what the package calls the *C-path*. The output
shape is the same as the regression case (parametric / non-parametric /
causal panels); the y-axis carries an `ensemble mortality` interpretation
(Ishwaran 2008), not a survival probability.

```{r pbc-gg-partial-varpro}
plot(gg_partial_varpro(object = v_pbc))
```

## Anomaly scoring: `gg_isopro()` on the X-matrix

Because `isopro()` only sees the predictor matrix, it doesn't care
about the family. The same call from section 3 works here.

```{r pbc-isopro, cache=TRUE}
iso_pbc <- varPro::isopro(data = pbc_small[, c("age", "albumin", "bili",
                                                "platelet")],
                          method = "rnd", sampsize = 256, ntree = 50)
plot(gg_isopro(iso_pbc))
```

## Not available for survival: `gg_beta_varpro`, `gg_ivarpro`

`varPro::beta.varpro()` errors on survival fits in the current release
(it only supports `regr` and `class`). `gg_ivarpro()` for survival is
similarly deferred pending design work on the per-rule risk-scaling
story. Both are tracked for v3.1.0.

If you call either on a survival fit you'll get a clear error message
pointing at the deferred work, not a silent miscalculation. The
family-support matrix in §6 records this — and the rest of the toolkit
that *does* work on survival (`gg_varpro`, `gg_partial_varpro`,
`gg_isopro` above; `gg_udependent` shown in §3 on the X-matrix).

# Cross-cutting reference

## Family-support matrix

| Wrapper | regr | class | surv | regr+ |
|---|---|---|---|---|
| `gg_partial_varpro` | ✓ | ✓ | ✓ (C-path via `$rf`) | ✗ (not audited) |
| `gg_varpro` | ✓ | ✓ (`conditional = TRUE`) | ✓ | ✗ (errors) |
| `gg_udependent` | ✓ (uvarpro on X) | ✓ (X) | ✓ (X) | ✓ (X) |
| `gg_isopro` | ✓ (X) | ✓ (X) | ✓ (X) | ✓ (X) |
| `gg_beta_varpro` | ✓ | ✓ | ✗ (upstream stop) | ✗ (deferred) |
| `gg_ivarpro` | ✓ | ✓ | ✗ (deferred) | ✗ (deferred) |

The four wrappers in the lower-right are the v3.1.0 work surface.

## Factor-level ordering

Across `gg_beta_varpro()` and `gg_ivarpro()`, the `variable` column is
stored as a factor whose levels are set by descending aggregate
importance (`mean(|imp|)` summed across classes for classification).
The default plot inherits that ordering, so faceted views show
variables in the same row order across panels. If you re-shape the
frame downstream and want the order preserved, keep `variable` as a
factor rather than coercing to character.

This convention will be propagated to `gg_vimp` and
`plot.gg_varpro(conditional = TRUE)` in a follow-up release.

## Caching the expensive calls

`varPro::beta.varpro()` and `varPro::ivarpro()` are the two heavy
calls. Both wrappers accept a pre-computed fit (`beta_fit`,
`ivarpro_fit`) so you can iterate on selection, observation index, or
cutoff without re-fitting the lasso or the local-importance machinery.
The vignette uses this throughout — every section computes the heavy
fit once in a `cache: true` chunk and re-uses it for every figure.

Provenance carries `precomputed = TRUE` when the cached path was used,
so downstream tooling can tell the two paths apart.

## Provenance shape

`attr(., "provenance")$cutoff` is always a *named numeric vector*
across the toolkit:

- regression: length 1, named `"regr"`
- classification: length K, named with the response factor levels

Downstream code that picks a value should read it as a vector
(`prov$cutoff[[class_name]]` or `prov$cutoff[[1]]`), not as a scalar.
This contract was established in v2.7.3.9012 (PR #98).

# Further reading

The release-rule framework is laid out in @Lu2024varpro. The ensemble
mortality scale used by survival partial dependence is introduced in
@Ishwaran:2007a (the Annals of Applied Statistics methods paper for
random survival forests); the R-package side is described in
@Ishwaran:2008. Each wrapper's help page carries a "What this is
doing" section that goes one level deeper than this vignette.

# References