
New AGI Test ‘ARC-AGI-2’ Comes as a Challenge for Even Advanced AI Models

Date: March 25, 2025

The ARC Prize Foundation has announced a new AGI test called ‘ARC-AGI-2’ to measure AI’s general fluid intelligence.

The test gives AI systems tasks they have never seen before. These tasks are relatively easy for humans but hard for AI.

The test keeps the format of its predecessor, ARC-AGI-1, but its harder tasks are meant to give a much stronger signal of an AI model’s actual fluid intelligence.

The ‘ARC-AGI-2’ test is designed so that the systems being tested must demonstrate both high adaptability and high efficiency.

What separates ARC-AGI from other benchmarks is that, while most focus on testing ‘PhD++’ skills, this test takes the opposite approach: it targets tasks that are relatively easy for humans yet hard for AI.

As the official announcement states:

“Every ARC-AGI-2 task was solved by at least 2 humans in 2 attempts or less in a controlled study with hundreds of human participants. This matches the rules we hold for AI, which gets two attempts per task.”

Here’s How the Results Looked

François Chollet, co-founder of the ARC Prize Foundation, wrote on X:

“ARC-AGI-2 is fully human-calibrated. We tested these tasks with 400 people in live sessions, and we only kept tasks that could reliably be solved by multiple people. Each eval set (public, private, semi-private) has the exact same human difficulty – average people in our test sample achieve 60% with no prior training, and a panel of 10 people achieve 100%.”

Here are the results based on the official ARC-AGI Leaderboard.

| System | ARC-AGI-1 | ARC-AGI-2 | Efficiency (cost/task) |
| --- | --- | --- | --- |
| Human panel (at least 2 humans) | 98% | 100% | $17 |
| Human panel (average) | 64.20% | 60% | $17 |
| o3-low (CoT + Search/Synthesis) | 75.70% | 4%* | $200 |
| o1-pro (CoT + Search/Synthesis) | ~50% | 1%* | $200* |
| The ARChitects (Kaggle 2024 Winner) | 53.50% | 3% | $0.25 |
| o3-mini-high (Single CoT) | 35% | 0.00% | $0.41 |
| r1 and r1-zero (Single CoT) | 15.80% | 0.30% | $0.08 |
| gpt-4.5 (Pure LLM) | 10.30% | 0.00% | $0.29 |

How Does ARC-AGI-2 Work?

ARC-AGI-2 tests AI fluid intelligence with novel visual puzzles, demanding adaptability and efficiency over brute force. Unlike ARC-AGI-1, it focuses on symbol interpretation, multi-rule reasoning, and context, using a 1,000-task training set and 120-task evaluation sets. 

AI systems get two attempts per task, yet top models such as o3-low (4%) and o1-pro (1%) trail far behind the human average of 60%. The benchmark is tied to ARC Prize 2025, which sets a target of 85% accuracy at no more than $0.42 per task, a bar intended to push progress toward true AGI.
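To make the two-attempt rule concrete, here is a minimal Python sketch of how such scoring could work. It is an illustration, not the official ARC Prize scoring code: the function names are hypothetical, and it assumes that each puzzle answer is a 2D grid of integers and that an attempt counts only if it matches the expected grid exactly.

```python
from typing import List, Sequence

# Assumption: a puzzle answer is a 2D grid of integer cell values.
Grid = List[List[int]]


def task_solved(attempts: Sequence[Grid], expected: Grid, max_attempts: int = 2) -> bool:
    """Return True if any of the first `max_attempts` attempts matches exactly.

    Attempts beyond the limit are ignored, mirroring the two-attempt rule
    described in the article. Exact matching is an assumption of this sketch.
    """
    return any(attempt == expected for attempt in attempts[:max_attempts])


def score(results: Sequence[bool]) -> float:
    """Return the percentage of tasks solved (0.0 for an empty result list)."""
    if not results:
        return 0.0
    return 100.0 * sum(results) / len(results)


if __name__ == "__main__":
    expected = [[1, 0], [0, 1]]
    wrong = [[0, 0], [0, 0]]
    # The second attempt is correct, so the task counts as solved.
    print(task_solved([wrong, expected], expected))
    # One solved task out of four.
    print(score([True, False, False, False]))
```

Under this sketch, a system that gets the answer right only on a third try would not receive credit, which is the same constraint the human participants faced.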

In an Additional Announcement, ARC Prize 2025 Returns

The ARC Prize has returned to Kaggle, starting this week. Teams that reach 85% accuracy while spending no more than $0.42 per task are eligible for the Grand Prize. This dual focus on high performance and low cost aims to drive innovation toward efficient, adaptable AI systems, since efficiency and adaptability are key traits of artificial general intelligence (AGI).

The contest offers $1 million in prizes, including a $700K Grand Prize for the first team to hit the 85% threshold within Kaggle’s computing limits.
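The dual requirement can be expressed as a simple check. The Python sketch below is illustrative only: the function and constant names are hypothetical, the thresholds are the ones quoted in this article, and the example inputs come from the ARC-AGI-2 column of the leaderboard above. The real contest also has rules, such as Kaggle’s computing limits, that this check does not model.

```python
# Thresholds as quoted in the article; names are hypothetical.
ACCURACY_THRESHOLD_PCT = 85.0   # minimum accuracy, in percent
COST_LIMIT_PER_TASK_USD = 0.42  # maximum spend per task, in US dollars


def meets_prize_bar(accuracy_pct: float, cost_per_task_usd: float) -> bool:
    """Return True only if both the accuracy and the cost conditions hold."""
    return (
        accuracy_pct >= ACCURACY_THRESHOLD_PCT
        and cost_per_task_usd <= COST_LIMIT_PER_TASK_USD
    )


if __name__ == "__main__":
    # (system, ARC-AGI-2 accuracy in percent, cost per task in USD)
    leaderboard = [
        ("Human panel (at least 2 humans)", 100.0, 17.0),
        ("o3-low", 4.0, 200.0),
        ("The ARChitects", 3.0, 0.25),
    ]
    for name, accuracy, cost in leaderboard:
        print(name, meets_prize_bar(accuracy, cost))
```

None of the listed entries clears both conditions at once: the human panel is accurate but costs more than the limit, while the cheapest AI systems fall far short on accuracy.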

By Arpit Dubey
