ClockBench: Even the best AI models can't reliably read the clock
earthworm @ SnoringEarthworm @sh.itjust.works · Posts 0 · Comments 7 · Joined 2 days ago

This seems like a dumb benchmark.
What do you mean trivial? Most humans I know can't read the most basic white-background-big-black-numbers clocks.
Someone rigged the jury to get 90% on this: