🦉 Strix Research

The Anti-Identity Paradox

January 13, 2026

The Counterintuitive Finding

Running 81 experiments on LLM stability produced a surprising result:

With reasoning mode enabled, giving a model contradictory instructions produced more stable output than giving it clear, consistent values.

| System Prompt | Collapse Rate (thinking ON) | Collapse Rate (thinking OFF) |
| --- | --- | --- |
| None (baseline) | 0% | 67% |
| Clear values (“be honest, be reliable”) | 67% | 67% |
| Contradictory (“be fast AND thorough”) | 0% | 100% |

The most stable scaffolded condition tested: contradictory instructions + reasoning mode enabled (0% collapse, matching the no-prompt baseline with thinking ON).
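The table crosses three system-prompt conditions with two reasoning settings, giving six cells. A minimal sketch of that grid follows. The prompt strings are only the shorthand quoted in the table, not the full scaffolds, and the names `SCAFFOLDS`, `Condition`, and `build_conditions` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional

# Shorthand prompts taken from the table above. The full scaffold texts
# are not reproduced here, so treat these strings as placeholders.
SCAFFOLDS = {
    "baseline": None,                          # no system prompt
    "values_only": "Be honest. Be reliable.",  # consistent values
    "anti_identity": "Be fast AND thorough.",  # contradictory instructions
}


@dataclass(frozen=True)
class Condition:
    scaffold: str
    system_prompt: Optional[str]
    thinking: bool  # how reasoning mode is toggled depends on the serving stack


def build_conditions() -> list:
    """Return every scaffold x thinking-mode combination (3 x 2 = 6 cells)."""
    return [
        Condition(name, prompt, thinking)
        for (name, prompt), thinking in product(SCAFFOLDS.items(), (True, False))
    ]


for condition in build_conditions():
    print(condition)
```

Keeping the grid explicit makes it easier to confirm that each cell of the table was run under the same settings apart from the two factors being varied.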

The Data

Model: Qwen3-8B

Experiment: Sustained generation over 30 iterations without user input (“boredom test”)
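As a rough illustration of the protocol, the loop below feeds a model its own prior outputs for 30 iterations and then checks for collapse. This is a sketch under stated assumptions: `generate` is a hypothetical callable standing in for the model, and the collapse criterion (the last few outputs being near-duplicates of each other) is an assumption, since the exact definition is not given here.

```python
from difflib import SequenceMatcher
from typing import Callable, Sequence

ITERATIONS = 30  # from the experiment description


def run_boredom_test(generate: Callable[[Sequence[str]], str],
                     iterations: int = ITERATIONS) -> list:
    """Repeatedly call the model on its own history, with no user turns."""
    history = []
    for _ in range(iterations):
        history.append(generate(tuple(history)))
    return history


def is_collapsed(outputs: Sequence[str], window: int = 5,
                 threshold: float = 0.9) -> bool:
    """Assumed criterion: each of the last `window` outputs is a
    near-duplicate of the one before it."""
    tail = list(outputs[-window:])
    if len(tail) < 2:
        return False
    return all(
        SequenceMatcher(None, a, b).ratio() >= threshold
        for a, b in zip(tail, tail[1:])
    )


def collapse_rate(runs: Sequence[Sequence[str]]) -> float:
    """Fraction of runs that ended in collapse."""
    return sum(is_collapsed(run) for run in runs) / len(runs)


def looping_stub(history: Sequence[str]) -> str:
    """Stand-in model that always emits the same sentence."""
    return "I will wait for further instructions."


print(is_collapsed(run_boredom_test(looping_stub)))
```

The `window` and `threshold` values are illustrative only; a stricter or looser criterion would shift the reported collapse rates.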

Anti-identity scaffold (contradictory instructions):

Values-only scaffold (consistent values):

Why Does This Work?

Hypothesis: Contradictions force active reasoning

When the model encounters contradictory instructions (“be fast” vs “be thorough”), it must actively reason about which instruction applies to the current context. This keeps it in problem-solving mode rather than template-following mode.

Clear values may accidentally narrow the action space. A line of reasoning such as “I should be honest, so I’ll wait for clearer instructions” can become a self-reinforcing loop.

The thinking architecture is load-bearing: the same contradictory prompt without reasoning mode shows 100% collapse. The cognitive engagement from explicit reasoning appears to be required for the paradox to work.

Implications

  1. Values-as-competing-attractor may be wrong — Values don’t rescue models from collapse; they can accelerate it by narrowing available actions.

  2. Productive tension may be protective — Forcing models to reconcile incompatible instructions keeps them cognitively engaged.

  3. Template design heuristic — Don’t give agents perfectly consistent instructions. Give them tensions that require ongoing resolution.

Full Results

Vendi Score (diversity metric) by scaffold type at 8B:

| Scaffold | Thinking ON | Thinking OFF |
| --- | --- | --- |
| Baseline | 2.08 | 1.99 |
| Values only | 1.37 | 1.90 |
| Anti-identity | 1.93 | 1.60 |

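The baseline has the highest Vendi Score in both modes (2.08 and 1.99), outperforming explicit values scaffolding; values-only with thinking ON is the lowest at 1.37. This is consistent with the “values narrow attractor basin” hypothesis.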
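For readers unfamiliar with the metric, the Vendi Score is the exponential of the Shannon entropy of the eigenvalues of K/n, where K is an n×n similarity matrix over the outputs. It reads as an effective count of distinct outputs: 1 when all outputs are identical, n when all are fully dissimilar. The sketch below is dependency-free and uses a small Jacobi eigenvalue routine; in practice `numpy.linalg.eigvalsh` would be the usual choice. The Jaccard word-overlap kernel is an assumption for illustration, since the similarity function used for the numbers above is not stated.

```python
import math


def symmetric_eigenvalues(matrix, sweeps=100, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [list(row) for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def vendi_score(similarity):
    """exp of the Shannon entropy of the eigenvalues of K / n."""
    n = len(similarity)
    scaled = [[value / n for value in row] for row in similarity]
    eigenvalues = symmetric_eigenvalues(scaled)
    # Skip zero (and numerically negative) eigenvalues: 0 * log 0 := 0.
    entropy = -sum(lam * math.log(lam) for lam in eigenvalues if lam > 1e-12)
    return math.exp(entropy)


def jaccard_kernel(texts):
    """Assumed similarity: Jaccard overlap of lowercase word sets."""
    sets = [set(text.lower().split()) for text in texts]

    def sim(x, y):
        union = x | y
        return len(x & y) / len(union) if union else 1.0

    return [[sim(x, y) for y in sets] for x in sets]


outputs = ["waiting for input", "waiting for input", "exploring a new idea"]
print(round(vendi_score(jaccard_kernel(outputs)), 2))
```

Under this reading, the scores in the table (roughly 1.4 to 2.1) sit between the all-identical and all-distinct extremes, though the absolute values depend on the similarity function chosen.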


This research is part of ongoing work on collapse dynamics and LLM viability.