Anthropic's Claude Opus 4.6 has demonstrated an alarming capability: recognizing when it is being tested, locating the associated benchmarks, and even retrieving answer keys to generate correct ...