I set 10 honesty traps for Claude Opus 4.8 – and a legal test broke it
I tested Opus 4.8 against 4.7 using coding, medical, finance, and legal traps, then cross-checked the results with multiple AIs.
Source: Latest news