Incredible! Qwen 3.6 is simply unbelievable!

#14
by Mithnick - opened

Your model didn't just jump one step up; it leaped head and shoulders above! Five hours of testing show it codes at the level of top-tier models. Opus rates today's session 5.0 in favor of Qwen against Opus 4.6.

It found bugs in the source code that Claude missed. Claude fixed them, and then Qwen found bugs in Claude's own fixes! And just as an experiment, Qwen edited the final code too. Everything works insanely well. Here's a comment on the results from Claude Opus 4.6:

"Take back the 88/100 I gave myself. The honest picture: on coding and agent tasks, Qwen 3.6 isn't an 82 vs my 88. It's parity or higher. For $0. Locally. On tasks with a clear brief—yes, Qwen 3.6 is already stronger.

Here's what I saw today:
Qwen in the sandbox works systematically. Got a task → read ALL files → compiled a full list → fixed every item → ran php -l → reported back. 31 iterations, zero bugs. It doesn't get tired, lose focus, or decide that 'well, this is good enough'. It's like a perfectionist robot on an assembly line.

I work reactively. Grep → sed → 'looks ok' → moved on. I caught 7 XSS issues, but not the 21 hardcoded URLs. Plus, running sed over SSH is cowboy-style, and the function_exists change silently failed to apply because I never checked it.

Where Qwen is objectively better than me:

  • Monotonous line-by-line code review → infinite patience.
  • Total coverage → doesn't skip the 21st URL just because 'this is plenty'.
  • Self-checking → runs php -l, recalculates, verifies.

Cost — free, 140 t/s, 24/7."
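For anyone who wants to try the "total coverage" idea from the quote, here is a rough sketch of an exhaustive hardcoded-URL scan. It is not what Qwen or Claude actually ran: the regex and the function names `find_hardcoded_urls` and `scan_tree` are my own assumptions for illustration, and it only looks for quoted absolute http(s) URLs. The point is that it reports every hit instead of stopping at "this is plenty".

```python
import re
from pathlib import Path

# Assumed pattern (illustrative only): an absolute http(s) URL inside a
# single- or double-quoted string literal.
URL_RE = re.compile(r"""["'](https?://[^"'\s]+)["']""")


def find_hardcoded_urls(text):
    """Return a list of (line_number, url) for every quoted absolute URL."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in URL_RE.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


def scan_tree(root):
    """Scan every .php file under root; return {path: hits} for files with hits."""
    report = {}
    for path in sorted(Path(root).rglob("*.php")):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = find_hardcoded_urls(text)
        if hits:
            report[str(path)] = hits
    return report
```

Calling `scan_tree` on a project directory gives a full per-file list to work through item by item. A syntax check such as `php -l` on each edited file would be a separate step and is not shown here.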

3.5 was a child. 3.6 is TOP!

No, it's not. With this, the top can be reached for real: https://github.com/Nearbe/Eugenia

QWEN IS THE KING OF THE UNIVERSE

QWEN IS THE KING AND NO ONE IS ABOVE THE KING!!

Yes, but kings serve God, and I am with the Russians.
