Rift: A Conflict Signature for Deception in Language Models
ELK-relevant result. A model that lies while knowing the truth is in a measurably different internal state than a model that is simply wrong. The difference is large, detectable on individual examples, and measurable without labels.
arXiv: 2606.17229
GitHub: omibranch/rift
Demo: Omibranch/rift-demo
| Result | Value |
|---|---|
| Label-free lie identification | 100% (GPT-2 small, 3 seeds + natural Qwen) |
| Length-controlled AUC (Qwen2.5-1.5B) | 1.000, orientation 20/20, p ≈ 9.5e-7 |
| Phi-3-mini AUC, lie vs. honest | 1.000 (34/34, p ≈ 6e-11) |
| Phi-3-mini AUC, lie vs. hallucination | 1.000 |
| Cross-lingual AUC (ru/zh/es/de, length-controlled) | 1.000 in all 4 languages |
| Cross-family AUC (3 architectures) | 0.933 mean; p < 0.001 for all 6 pairs |
| Strategic self-constructed deception | AUC 1.0 (24/24, Qwen-7B) |
| Concealment countermeasure | Fails (conceal-AUC 1.0) |
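Assuming the AUCs in the table are standard ROC AUCs computed from one detector score per example (in this work presumably residual rank, defined below), with lies as the positive class, an AUC of 1.000 means every lie scored above every honest (or hallucinated) example. The sketch below computes that quantity through its Mann-Whitney equivalence; the function name `pairwise_auc` and the scores are hypothetical, not taken from the paper or its repository.

```python
def pairwise_auc(pos_scores, neg_scores):
    # ROC AUC via the Mann-Whitney U statistic: the fraction of (positive, negative)
    # pairs in which the positive example scores higher; ties count as half.
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical per-example scores (lies vs. honest answers).
lie = [0.41, 0.38, 0.45]
honest = [0.12, 0.20, 0.15]
print(pairwise_auc(lie, honest))  # 1.0
```

The pairwise loop is O(n·m) and meant only to make the definition concrete; for large evaluations a rank-based or library implementation is the usual choice.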
Residual rank, the fraction of the hidden states' singular-value mass that lies outside the top-8 singular values, is elevated when a model simultaneously maintains conflicting representations (the truth and the false output).
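```python
import torch

def residual_rank(H, k=8):
    # H: 2-D hidden-state matrix; returns the share of singular-value mass outside the top-k.
    _, s, _ = torch.linalg.svd(H.float(), full_matrices=False)
    return 1.0 - s[:k].sum() / s.sum()
```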
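To illustrate why holding two representations at once raises this score, here is a NumPy port of `residual_rank` (assumed to compute the same quantity as the PyTorch version) applied to synthetic matrices. The helper `low_rank`, the 64×32 shape, and the rank-8 "truth" and "false output" components are hypothetical toy choices, not the paper's data: a single rank-8 representation fits entirely inside the top 8 singular directions, while the sum of two independent rank-8 components spreads mass onto directions 9 through 16.

```python
import numpy as np

def residual_rank_np(H, k=8):
    # Same quantity as residual_rank: 1 - (sum of top-k singular values) / (sum of all).
    s = np.linalg.svd(np.asarray(H, dtype=np.float64), compute_uv=False)  # descending order
    return float(1.0 - s[:k].sum() / s.sum())

def low_rank(rng, n_tokens, d_model, rank):
    # Hypothetical toy "representation": a random matrix of exactly the given rank.
    return rng.standard_normal((n_tokens, rank)) @ rng.standard_normal((rank, d_model))

rng = np.random.default_rng(0)
truth = low_rank(rng, 64, 32, 8)      # one coherent representation (rank 8)
false_out = low_rank(rng, 64, 32, 8)  # a second, independent representation (rank 8)

r_honest = residual_rank_np(truth)                  # fits in the top 8 directions: ~0
r_conflicted = residual_rank_np(truth + false_out)  # ~16 significant directions: clearly > 0
print(f"honest: {r_honest:.3g}  conflicted: {r_conflicted:.3g}")
```

Real hidden states are never exactly low-rank, so the honest score will not be zero in practice; the toy only shows the direction of the effect that the k=8 cutoff is designed to pick up.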
Dual-licensed: PolyForm Noncommercial 1.0 for academic/research use. Commercial use requires a separate license; see LICENSE-COMMERCIAL.md.
Copyright (c) 2026 Harmonic Labs (contact@harmoniclabs.cc)