🔬 LLM-Microscope — A Look Inside the Black Box

Select a model, analysis mode, and input — then peek inside the black box of an LLM to see which layers matter most, which tokens carry the most memory, and how predictions evolve.

This heatmap shows how each token is processed across layers of a language model. Here's how to read it:

  • Rows: layers of the model (bottom = deeper)
  • Columns: input tokens
  • Colors: intensity of effect (depends on the selected metric)
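
If you want to reproduce that picture for your own scores, here is a toy sketch of the layout as a (layers × tokens) matrix rendered with matplotlib. The token list, layer count, and values below are placeholders, not output from the actual tool.

```python
import numpy as np
import matplotlib.pyplot as plt

tokens = ["The", " capital", " of", " France", " is"]   # example tokens
num_layers = 12                                         # e.g. a 12-layer model
scores = np.random.rand(num_layers, len(tokens))        # placeholder metric values

fig, ax = plt.subplots(figsize=(6, 4))
# With the default origin, row 0 (the first layer) is drawn at the top,
# so deeper layers end up at the bottom, matching the description above.
im = ax.imshow(scores, aspect="auto", cmap="RdYlGn_r")
ax.set_xticks(range(len(tokens)), labels=tokens, rotation=45, ha="right")
ax.set_xlabel("input tokens")
ax.set_ylabel("layer")
fig.colorbar(im, ax=ax, label="metric value")
plt.tight_layout()
plt.show()
```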

Metrics explained:

  • Layerwise non-linearity: how nonlinear the transformation at each layer is (red = more nonlinear).
  • Next-token prediction from intermediate representations: shows at which depth the model already makes good next-token predictions.
  • Contextualization measurement: tokens whose representations carry more contextual information get lower scores (green = more context).
  • Layerwise predictions (logit lens): tracks how the model’s guesses evolve from layer to layer (see the sketch after this list).
  • Tokenwise loss without i-th layer: shows how much each token depends on a specific layer. Red means prediction quality for that token drops if the layer is skipped.

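To make the logit-lens mode concrete, here is a minimal sketch of the idea using Hugging Face transformers, with GPT-2 as an example model (the demo's own implementation and model list may differ): project each layer's hidden state for the last token through the model's output head and see what it would predict at that depth.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # example model; any causal LM that exposes hidden states works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple of (num_layers + 1) tensors of shape
# [batch, seq_len, hidden]: index 0 is the embedding output, index i the output of block i.
for i, h in enumerate(out.hidden_states):
    h_last = model.transformer.ln_f(h[:, -1, :])   # final layer norm (GPT-2-specific attribute)
    token_id = model.lm_head(h_last).argmax(dim=-1)
    print(f"layer {i:2d} -> {tok.decode(token_id)!r}")
```

For GPT-2 the early layers typically guess generic continuations, and a sensible completion only appears in the upper layers, which is what the logit-lens mode visualizes.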

You can also use llm-microscope as a Python library to run these analyses on your own models and data.

Just install it with: pip install llm-microscope

More details are provided in the GitHub repo.
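The library's own functions and exact API are documented in the repo and are not reproduced here. As a taste of what one of these analyses looks like under the hood, below is a rough, library-free sketch of the "tokenwise loss without i-th layer" idea on your own text, using Hugging Face transformers with GPT-2 as an example (the model name, layer index, and loss bookkeeping are illustrative assumptions, not the demo's exact computation).

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"          # example model
skip_idx = 5                 # which transformer block to skip (arbitrary example)

tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()
ids = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt").input_ids

def per_token_loss(model, ids):
    # Cross-entropy of each next-token prediction, one value per predicted token.
    with torch.no_grad():
        logits = model(ids).logits
    return F.cross_entropy(logits[0, :-1], ids[0, 1:], reduction="none")

def skip_block(module, args, output):
    # Forward hook that passes the block's input hidden states straight through,
    # keeping the rest of the output tuple (cache entries etc.) untouched.
    return (args[0],) + tuple(output[1:])

base = per_token_loss(model, ids)
handle = model.transformer.h[skip_idx].register_forward_hook(skip_block)
ablated = per_token_loss(model, ids)
handle.remove()

# Tokens whose loss increases the most are the ones that depend on this layer.
for t, (b, a) in enumerate(zip(base, ablated)):
    print(f"{tok.decode(ids[0, t + 1].item()):>10s}  delta loss = {(a - b).item():+.3f}")
```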