Muppet: A Modular and Constructive Decomposition for Perturbation-based Explanation Methods
Perturbation-based explanation methods — such as LIME, SHAP, and Anchors — have become the de facto standard for interpreting black-box machine learning models. Yet these methods are typically presented as monolithic algorithms, making it difficult to compare them systematically or combine their strengths.
In this paper, we propose Muppet, a modular decomposition framework that breaks any perturbation-based explainer into four composable blocks: Perturbation, Representation, Similarity, and Aggregation. We show that all major explanation methods can be expressed as specific configurations of these blocks — and that novel, high-performing configurations emerge from recombining them.
The four blocks
Perturbation
The perturbation block defines how input features are modified to observe their effect on the model's output. Common strategies include:
- Masking — Replacing features with a baseline value (zero, mean, or random)
- Coalitional — Sampling subsets of features as in Shapley value estimation
- Generative — Using a learned model to generate realistic perturbations
from muppet import perturbation
# Masking strategy with mean baseline
masker = perturbation.MaskPerturbation(baseline="mean", n_samples=1000)
# Coalitional strategy (Shapley-style)
coalitional = perturbation.CoalitionalPerturbation(n_samples=2048, paired=True)
# Generative strategy using a VAE
generative = perturbation.GenerativePerturbation(model=trained_vae, n_samples=500)
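For intuition, the masking strategy itself fits in a few lines of NumPy. This is an illustrative sketch of what a mean- or zero-baseline masker does, not the library's actual implementation; the function name and signature are ours:

```python
import numpy as np

def mask_perturb(x, baseline, n_samples=1000, seed=None):
    """Generate perturbed copies of x: each feature is independently
    kept or replaced by its baseline value with probability 0.5.
    Returns the perturbed samples and the boolean keep-mask."""
    rng = np.random.default_rng(seed)
    keep = rng.integers(0, 2, size=(n_samples, x.shape[0])).astype(bool)
    return np.where(keep, x, baseline), keep

# Example with a zero baseline; a mean baseline would use X_train.mean(axis=0)
x = np.array([1.0, 5.0, -2.0])
samples, keep = mask_perturb(x, baseline=np.zeros(3), n_samples=8, seed=0)
```

Each row of `samples` is the original input with a random subset of features reset to the baseline; the model is then queried on these rows to observe how the output shifts.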
Representation
The representation block determines the feature space in which explanations are computed. For tabular data, this is straightforward (individual columns). For images and text, the choice is more complex:
from muppet import representation
# Superpixel representation for images
superpixel = representation.SuperpixelRepresentation(n_segments=50, compactness=10)
# Token-level representation for text
token_rep = representation.TokenRepresentation(tokenizer="spacy", granularity="word")
# Semantic grouping (clusters of related features)
semantic = representation.SemanticGroupRepresentation(embedding_model="all-MiniLM-L6-v2")
Similarity
The similarity block determines how much weight each perturbed sample receives, based on its proximity to the original input. This is where methods diverge most significantly:
- LIME uses an exponential kernel in the representation space
- SHAP uses Shapley value weighting based on coalition size
- Anchors uses a binary precision threshold
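To make the contrast concrete, the three weighting schemes can be sketched as plain functions. These follow the standard published formulas (LIME's exponential kernel, KernelSHAP's coalition weight, Anchors' precision threshold); the function names are ours and this is not the Muppet API:

```python
import math

def lime_kernel(distance, width=0.75):
    """LIME-style exponential kernel: closer perturbations weigh more."""
    return math.exp(-(distance ** 2) / (width ** 2))

def shapley_kernel(n_features, coalition_size):
    """KernelSHAP weight for a coalition of size s out of M features:
    (M - 1) / (C(M, s) * s * (M - s)); empty and full coalitions are
    constrained exactly, conventionally given infinite weight."""
    M, s = n_features, coalition_size
    if s == 0 or s == M:
        return float("inf")
    return (M - 1) / (math.comb(M, s) * s * (M - s))

def anchors_weight(precision, threshold=0.95):
    """Anchors-style binary weighting: a candidate rule passes or it doesn't."""
    return 1.0 if precision >= threshold else 0.0
```

Note the qualitative difference: LIME's weight decays smoothly with distance, SHAP's depends only on coalition size, and Anchors' is all-or-nothing.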
Aggregation
Finally, the aggregation block combines weighted observations into a final attribution vector. Linear regression (LIME), weighted averaging (SHAP), and rule extraction (Anchors) are all specific aggregation strategies.
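The two numeric strategies can likewise be written out directly. Below is a NumPy sketch of a weighted least-squares fit (LIME-style) and a weighted average of marginal contributions (SHAP-style), assuming the weights come from the similarity block; these are illustrative, not the library's internals:

```python
import numpy as np

def linear_regression_aggregation(Z, y, weights):
    """LIME-style aggregation: fit a weighted linear model of the model
    outputs y on the perturbed samples Z (n_samples x n_features).
    The fitted coefficients are the attribution vector."""
    sw = np.sqrt(np.asarray(weights, dtype=float))
    # Weighted least squares via lstsq on the row-scaled system.
    beta, *_ = np.linalg.lstsq(sw[:, None] * Z, sw * y, rcond=None)
    return beta

def weighted_average_aggregation(contributions, weights):
    """SHAP-style aggregation: weighted average of per-sample marginal
    contributions (n_samples x n_features)."""
    weights = np.asarray(weights, dtype=float)
    return (weights[:, None] * contributions).sum(axis=0) / weights.sum()
```

With uniform weights and outputs generated by a linear model, the regression recovers that model's coefficients exactly, which is the sense in which LIME's explanation is a local linear surrogate.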
Key findings
By systematically exploring the combinatorial space of block configurations, we discovered that:
- LIME's perturbation + SHAP's aggregation yields attributions that are 15% more faithful than either method alone on tabular benchmarks
- Generative perturbations dramatically reduce out-of-distribution artifacts — the primary failure mode of masking-based approaches
- The choice of similarity kernel matters more than the aggregation method for image explanations, contradicting the common focus on the latter
The modular perspective reveals that the differences between explanation methods are less fundamental than they appear. Most of the "innovation" in recent papers amounts to changing a single block while keeping the others fixed.
The Muppet library
We release Muppet as an open-source Python library that implements this decomposition:
from muppet import Explainer, perturbation, representation, similarity, aggregation
# Compose a custom explainer
explainer = Explainer(
perturbation=perturbation.CoalitionalPerturbation(n_samples=2048),
representation=representation.SuperpixelRepresentation(n_segments=50),
similarity=similarity.ExponentialKernel(width=0.75),
aggregation=aggregation.LinearRegression(regularization="ridge")
)
# Explain a prediction
explanation = explainer.explain(model, input_sample)
explanation.visualize()
Muppet supports tabular, image, and text data out of the box, and is designed to be extended with custom blocks. The library is available on PyPI (pip install muppet-xai) and on GitHub.
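To show how the four blocks interact end to end, here is a complete toy pipeline in plain NumPy, independent of the library: masking perturbation, a binary keep-mask representation, exponential-kernel similarity, and linear-regression aggregation, applied to a known linear model. All names here are ours, chosen for illustration:

```python
import numpy as np

def explain(model, x, baseline, n_samples=2000, width=0.75, seed=0):
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Perturbation: randomly mask each feature back to the baseline.
    keep = rng.integers(0, 2, size=(n_samples, d)).astype(bool)
    Z = np.where(keep, x, baseline)
    # Representation: binary indicators of which features were kept.
    Zb = keep.astype(float)
    # Similarity: exponential kernel on the fraction of masked features.
    dist = 1.0 - Zb.mean(axis=1)
    w = np.exp(-(dist ** 2) / (width ** 2))
    # Aggregation: weighted least squares of model outputs on indicators.
    y = model(Z)
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(sw[:, None] * Zb, sw * y, rcond=None)
    return beta

# Toy linear model whose true per-feature contributions are known.
model = lambda Z: Z @ np.array([3.0, 0.0, -2.0])
attr = explain(model, x=np.array([1.0, 1.0, 1.0]), baseline=np.zeros(3))
```

Because the toy model is linear and the baseline output is zero, the recovered attributions match the model's coefficients; swapping any single block (e.g. the kernel width, or the aggregation) changes the explanation without touching the rest of the pipeline, which is the point of the decomposition.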