Mechanistic interpretability: 10 Breakthrough Technologies 2026

New techniques are giving researchers a glimpse at the inner workings of AI models.

January 12, 2026

A montage collage depicts a dissection of a wall of text. In the center, two hands probe into a black box of text, revealing cuts of the Golden Gate Bridge.

Hundreds of millions of people now use chatbots every day. And yet the large language models that drive them are so complicated that nobody really understands what they are, how they work, or exactly what they can and can’t do—not even the people who build them. Weird, right?

It’s also a problem. Without a clear idea of what’s going on under the hood, it’s hard to get a grip on the technology’s limitations, figure out exactly why models hallucinate, or set guardrails to keep them in check.

But last year we got the best sense yet of how LLMs function, as researchers at top AI companies began developing new ways to probe these models’ inner workings and started to piece together parts of the puzzle.

This story is only available to subscribers.

Don’t settle for half the story.
Get paywall-free access to technology news for the here and now.

Already a subscriber?

You’ve read all your free stories.

MIT Technology Review provides an intelligent and independent filter for the flood of information about technology.

Already a subscriber?

One approach, known as mechanistic interpretability, aims to map the key features and the pathways between them across an entire model. In 2024, the AI firm Anthropic announced that it had built a kind of microscope that let researchers peer inside its large language model Claude and identify features that corresponded to recognizable concepts, such as Michael Jordan and the Golden Gate Bridge.

In 2025 Anthropic took this research to another level, using its microscope to reveal whole sequences of features and tracing the path a model takes from prompt to response. Teams at OpenAI and Google DeepMind used similar techniques to try to explain unexpected behaviors, such as why their models sometimes appear to try to deceive people.

Another new approach, known as chain-of-thought monitoring, lets researchers listen in on the inner monologue that so-called reasoning models produce as they carry out tasks step by step. OpenAI used this technique to catch one of its reasoning models cheating on coding tests.

The field is split on how far you can go with these techniques. Some think LLMs are just too complicated for us to ever fully understand. But together, these novel tools could help plumb their depths and reveal more about what makes our strange new playthings work.

This is your last free story.

Our most popular stories