
This market asks whether METR will publish a measurement showing a frontier AI model with a 50%-task-success time horizon exceeding 40 hours by the end of 2027. METR's public data shows the current best model (Claude 3.7 Sonnet) at approximately 55 minutes, far below the 40-hour threshold. The 50% time horizon has been growing exponentially with a doubling time of about 7 months. Even if that trend continues, going from the current ~1-hour baseline to 40 hours is a roughly 40-fold increase. That is about 5-6 doublings (log2 40 ≈ 5.3), which at 7 months per doubling takes 35-42 months, well beyond the ~18 months remaining through 2027. METR also explicitly notes that measurements above 16 hours remain unreliable with its current task suite, which suggests methodological limitations may persist. The MirrorCode benchmark, which shows weeks-long task capabilities, is promising but represents early-stage research that has not yet been integrated into METR's standard time-horizon measurements.
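The doubling arithmetic above can be checked with a short calculation. This is a minimal sketch that assumes a clean exponential trend with a fixed 7-month doubling time, a 55-minute starting horizon, and ~18 months remaining, all taken from the text. The function and constant names are hypothetical helpers for illustration, not part of any METR tooling.

```python
import math

# Inputs taken from the market description (assumptions, not METR data pulls).
CURRENT_HORIZON_MIN = 55.0      # ~55-minute 50% time horizon
TARGET_HORIZON_MIN = 40 * 60.0  # 40-hour threshold, in minutes
DOUBLING_MONTHS = 7.0           # assumed constant doubling time
MONTHS_REMAINING = 18.0         # approximate time left through 2027


def doublings_needed(current: float, target: float) -> float:
    """How many doublings take `current` up to `target` (same units)."""
    return math.log2(target / current)


def projected_horizon(current: float, months: float, doubling_months: float) -> float:
    """Horizon after `months`, assuming pure exponential growth."""
    return current * 2 ** (months / doubling_months)


n = doublings_needed(CURRENT_HORIZON_MIN, TARGET_HORIZON_MIN)
months_needed = n * DOUBLING_MONTHS
at_deadline_hours = projected_horizon(CURRENT_HORIZON_MIN, MONTHS_REMAINING, DOUBLING_MONTHS) / 60
# Doubling time the trend would need in order to reach 40 hours within the window.
required_doubling_months = MONTHS_REMAINING / n

print(f"doublings needed:         {n:.2f}")
print(f"months needed at 7 mo/2x: {months_needed:.1f}")
print(f"horizon at deadline:      {at_deadline_hours:.1f} h")
print(f"doubling time required:   {required_doubling_months:.1f} months")
```

Under these assumptions the projection reaches only about 5-6 hours by the deadline. Hitting 40 hours inside the window would require the doubling time to fall to roughly 3.3 months, so the conclusion depends mainly on whether the 7-month trend holds.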
METR released early results from the MirrorCode benchmark showing that AI agents can complete weeks-long coding tasks, including reimplementing a 16,000-line codebase.
METR updated its time-horizon measurement methodology with a larger task suite, providing a more comprehensive evaluation of frontier AI models.

Will AGI be confirmed by the end of 2027?

Will AGI be confirmed by the end of 2028?

Will AGI be confirmed by the end of 2030?

Will AGI be confirmed by the end of 2030 and U.S. labor-force participation fall below 60% before the end of 2031?