Language models solve complex problems by articulating intermediate reasoning steps in natural language. While effective, this process is computationally bottlenecked: each reasoning step conveys only a single subword, and many steps are spent expressing a thought rather than carrying out computation.
We propose MUX, a simple method for high-bandwidth and compact reasoning based on distillation of discrete reasoning into continuous multiplexed tokens in a latent space. Here, each latent token is trained to represent a weighted linear superposition (multiplexing) of a span of discrete reasoning subwords, where this superposition is lossless by construction and the span can be fully recovered (demultiplexing).
We prove that simple position-dependent weightings, such as suitable geometric decay, support lossless multiplexing, which in turn prevents shortcut behaviors caused by latent collapse. We further show that multiplexed reasoning can perform parallel exploration in problems that require search.
Across 32 evaluation settings spanning four language models, MUX outperforms strong latent reasoning baselines. Ablation and probing analyses further show that the learned latent tokens encode faithful and interpretable reasoning. Our results suggest that using lossless superpositions as local learning targets is sufficient for strong and efficient continuous latent reasoning.
Lossless multiplexing of a span «5+3=8» through position-weighted linear superposition.
Given a discrete reasoning span $(r_1, \ldots, r_S)$, MUX constructs a vocabulary-space target via:
$$\mathrm{mux}(r) = \sum_{j=1}^{S} \alpha_j \cdot \mathrm{onehot}(r_j)$$
where the coefficients $\alpha_j$ are position-dependent weights, normalized so that $\mathrm{mux}(r)$ lies on the vocabulary simplex. The model is trained to match these targets via KL divergence through a linear-softmax head.
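As an illustration of the target construction and matching objective described above, the following is a minimal NumPy sketch, not the authors' implementation. It assumes geometric weights with ratio `gamma = 0.5`, a hypothetical five-entry toy vocabulary for the span «5+3=8», and the KL direction KL(target ‖ prediction). The function names and all of these choices are assumptions that the text above does not specify.

```python
import numpy as np

# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"5": 0, "+": 1, "3": 2, "=": 3, "8": 4}

def geometric_weights(span_len, gamma=0.5):
    """Position weights proportional to gamma**j, normalized to sum to 1 (assumed form)."""
    w = gamma ** np.arange(span_len)
    return w / w.sum()

def mux_target(token_ids, vocab_size, gamma=0.5):
    """Weighted sum of one-hot vectors; repeated tokens accumulate their weights."""
    target = np.zeros(vocab_size)
    np.add.at(target, np.asarray(token_ids), geometric_weights(len(token_ids), gamma))
    return target

def kl_to_target(target, logits):
    """KL(target || softmax(logits)); the direction is an assumption."""
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    support = target > 0  # zero-mass entries contribute nothing to the sum
    return float(np.sum(target[support] * (np.log(target[support]) - log_probs[support])))

span_ids = [VOCAB[ch] for ch in "5+3=8"]
target = mux_target(span_ids, len(VOCAB))
loss_uniform = kl_to_target(target, np.zeros(len(VOCAB)))  # loss of a uniform prediction
```

Because the weights sum to one, `target` is a valid probability vector, and the loss reaches zero exactly when the softmax output equals it.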
We prove that geometric, sinusoidal, and rotary weightings all support lossless multiplexing: the original span can be exactly recovered from the superposition.
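To make the recovery claim concrete for the geometric case, here is a hedged sketch of one possible demultiplexing procedure. It assumes normalized geometric weights with ratio `gamma <= 0.5` and a known span length. Under that assumption each weight exceeds the sum of all later weights, so the largest remaining entry always identifies the next token. This greedy decoder is an illustrative choice, not necessarily the construction used in the proof, and it does not cover the sinusoidal or rotary weightings.

```python
import numpy as np

def geometric_weights(span_len, gamma=0.5):
    """Normalized weights proportional to gamma**j (assumed form, gamma <= 0.5)."""
    w = gamma ** np.arange(span_len)
    return w / w.sum()

def multiplex(token_ids, vocab_size, gamma=0.5):
    """Superpose one-hot vectors of the span with position-dependent weights."""
    target = np.zeros(vocab_size)
    np.add.at(target, np.asarray(token_ids), geometric_weights(len(token_ids), gamma))
    return target

def demultiplex(target, span_len, gamma=0.5):
    """Greedy recovery: peel off the token holding the largest remaining mass."""
    residual = np.array(target, dtype=float)
    recovered = []
    for alpha in geometric_weights(span_len, gamma):
        tok = int(np.argmax(residual))  # the token at this position dominates all later mass
        recovered.append(tok)
        residual[tok] -= alpha          # remove this position's contribution
    return recovered

span = [2, 0, 2, 1, 2]                  # repeated tokens are handled as well
roundtrip = demultiplex(multiplex(span, vocab_size=4), span_len=len(span))
```

The round trip returns the original span, including the order of the repeated token, because each position's weight is uniquely identifiable in the residual.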
| Method | Supervision | Lossless | Shortcut-free | Train Eff. | Infer. Eff. | Interpretable |
|---|---|---|---|---|---|---|
| SFT-CoT | Discrete | ✓ | ✓ | ✓ | ✗ | ✓ |
| CODI | Global | ✗ | ✗ | ✓ | ✓ | ✗ |
| SIM-CoT | Local | ✓ | ✓ | ✗ | ✓ | ✓ |
| KaVa | Local | ✗ | ✓ | ✓ | ✓ | ✓ |
| MUX | Local | ✓ | ✓ | ✓ | ✓ | ✓ |
Test accuracies (%). Underlined when MUX outperforms SFT-CoT. MUX reports ±1 std over 3 seeds.
| Method | GSM8K-AUG: ID | GSM8K-AUG: SVAMP | GSM8K-AUG: GSM-Hard | GSM8K-AUG: MultiArith | GSM8K-AUG-NL: ID | GSM8K-AUG-NL: SVAMP | GSM8K-AUG-NL: GSM-Hard | GSM8K-AUG-NL: MultiArith |
|---|---|---|---|---|---|---|---|---|
| GPT-2 | | | | | | | | |
| SFT-CoT | 44.1 | 41.8 | 9.8 | 90.7 | 34.2 | 36.9 | 7.1 | 88.7 |
| CODI | 43.7 | 42.9 | 9.9 | 92.8 | 34.1 | 30.8 | 6.8 | 58.9 |
| SIM-CoT | 42.6 | 42.6 | 9.4 | 92.8 | 30.9 | 27.5 | 6.5 | 53.9 |
| MUX | 48.1 | 45.0 | 10.6 | 93.0 | 37.4 | 36.7 | 8.9 | 72.4 |
| LLaMA 3.2 1B-Instruct | | | | | | | | |
| SFT-CoT | 61.6 | 66.7 | 15.6 | 99.3 | 53.2 | 62.9 | 13.3 | 98.5 |
| Coconut | 45.3 | 48.8 | 9.9 | 90.1 | 24.2 | — | — | — |
| CODI | 55.6 | 61.1 | 12.8 | 96.1 | 47.9 | 55.3 | 11.3 | 96.7 |
| SIM-CoT | 56.1 | 61.5 | 12.7 | 96.2 | 28.4 | 43.0 | 6.6 | 59.4 |
| MUX | 56.7 | 63.6 | 13.0 | 98.5 | 50.3 | 57.5 | 11.6 | 96.9 |
| Method | LLaMA 3.2 3B: ID | LLaMA 3.2 3B: SVAMP | LLaMA 3.2 3B: GSM-Hard | LLaMA 3.2 3B: MultiArith | LLaMA 3.1 8B: ID | LLaMA 3.1 8B: SVAMP | LLaMA 3.1 8B: GSM-Hard | LLaMA 3.1 8B: MultiArith |
|---|---|---|---|---|---|---|---|---|
| SFT-CoT | 71.5 | 71.0 | 17.0 | 98.3 | 71.7 | 73.1 | 16.5 | 98.3 |
| CODI | 60.8 | 73.3 | 14.3 | 98.7 | 61.1 | 78.1 | 15.5 | 99.5 |
| SIM-CoT | 62.3 | 74.9 | 14.6 | 98.8 | 64.1 | 79.4 | 16.3 | 100.0 |
| MUX | 65.0 | 77.1 | 15.2 | 100.0 | 68.1 | 80.1 | 17.1 | 100.0 |
Search accuracies (%) averaged over 3 seeds.
| Method | MNNS | Game of 24 |
|---|---|---|
| No-CoT | 68.4 | 74.4 |
| SFT-CoT | 84.6 | 84.3 |
| Coconut | 92.8 | 78.6 |
| CoT2 | 98.9 | 85.0 |
| MUX | 99.6 | 88.7 |
Through probing analysis, we show that MUX latent tokens encode faithful and interpretable reasoning content. By projecting latent tokens through the LM head, the top-decoded subwords closely match the aligned discrete reasoning spans.
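The probing step can be sketched as follows. This is a minimal illustration with hypothetical names (`top_k_decoded`, `span_coverage`) and a toy identity matrix standing in for the LM head. How latent tokens are aligned with spans and how the match is scored is not specified above, so the coverage metric here is an assumption.

```python
import numpy as np

def top_k_decoded(hidden, lm_head, k=3):
    """Token ids with the k largest logits after projecting through the LM head."""
    logits = lm_head @ hidden  # (vocab_size, d_model) @ (d_model,)
    return np.argsort(-logits)[:k].tolist()

def span_coverage(decoded_ids, span_ids):
    """Fraction of distinct span tokens found among the decoded ids (assumed metric)."""
    span = set(span_ids)
    return len(span & set(decoded_ids)) / len(span)

lm_head = np.eye(4)                       # toy stand-in for a real LM head
hidden = np.array([0.1, 0.9, 0.3, 0.5])   # toy latent token
decoded = top_k_decoded(hidden, lm_head, k=3)
coverage = span_coverage(decoded, [1, 3])
```

With a real model, `lm_head` would be the output embedding matrix and `hidden` the final-layer state at a latent position; a coverage close to 1 would indicate that the top-decoded subwords match the aligned span.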
@article{suleymanzade2025mux,
author = {Suleymanzade, Ayhan and Gozeten, Halil Alperen and Bronstein, Michael and Ceylan, \.{I}smail \.{I}lkan and Kim, Jinwoo},
title = {{MUX}: Continuous Reasoning via Multiplexed Tokens},
year = {2025},
}