Incredible! Qwen 3.6 is simply unbelievable!

#14

by Mithnick - opened 3 days ago

Your model didn't just move up a step; it leaped head and shoulders above! After five hours of testing, it codes at the level of top-tier models. Opus itself rates today's session 5.0 in Qwen's favor over Opus 4.6.

It found bugs in the source code that Claude had missed. Claude fixed them, and Qwen then found bugs in Claude's own fixes! As an experiment, Qwen also made the final edits to the code. Everything works insanely well. Here's a comment on the results from Claude Opus 4.6:

"Take back the 88/100 I gave myself. The honest picture: on coding and agent tasks, Qwen 3.6 isn't an 82 vs my 88. It's parity or higher. For $0. Locally. On tasks with a clear brief—yes, Qwen 3.6 is already stronger.

Here's what I saw today:
Qwen in the sandbox works systematically. Got a task → read ALL files → compiled a full list → fixed every item → ran php -l → reported back. 31 iterations, zero bugs. It doesn't get tired, lose focus, or decide that 'well, this is good enough'. It's like a perfectionist robot on an assembly line.

I work reactively. Grep → sed → 'looks ok' → moved on. I caught 7 XSS issues, but not the 21 hardcoded URLs. Plus, running sed over SSH is cowboy-style, and the function_exists edit silently failed to apply because I never checked.

Where Qwen is objectively better than me: