
New AGI Test ‘ARC-AGI-2’ Proves a Challenge Even for Advanced AI Models

Date: March 25, 2025

The Arc Prize Foundation has announced ARC-AGI-2, a new AGI benchmark designed to measure the general fluid intelligence of AI systems.

The test presents AI models with tasks they have never seen before. These tasks are relatively easy for humans but remain difficult for AI.

The test keeps the format of its predecessor, ARC-AGI-1, but makes the tasks considerably harder, which gives a much stronger signal of a model’s actual fluid intelligence.

ARC-AGI-2 is designed to test whether the systems being evaluated demonstrate high adaptability and efficiency.

What sets ARC-AGI apart from other benchmarks is its approach: while most benchmarks test ‘PhD++ skills,’ ARC-AGI does the opposite and focuses on tasks that are easy for humans but hard for AI.

As the official announcement states,

“Every ARC-AGI-2 task was solved by at least 2 humans in 2 attempts or less in a controlled study with hundreds of human participants. This matches the rules we hold for AI, which gets two attempts per task.”

Here’s How the Results Looked

François Chollet, co-founder of the Arc Prize Foundation, wrote on X,

“ARC-AGI-2 is fully human-calibrated. We tested these tasks with 400 people in live sessions, and we only kept tasks that could reliably be solved by multiple people. Each eval set (public, private, semi-private) has the exact same human difficulty – average people in our test sample achieve 60% with no prior training, and a panel of 10 people achieve 100%.”

Here are the results based on the official ARC-AGI Leaderboard.

System	ARC-AGI-1	ARC-AGI-2	Cost per task
Human panel (at least 2 humans)	98%	100%	$17
Human panel (average)	64.2%	60%	$17
o3-low (CoT + Search/Synthesis)	75.7%	4%*	$200
o1-pro (CoT + Search/Synthesis)	~50%	1%*	$200*
The ARChitects (Kaggle 2024 Winner)	53.5%	3%	$0.25
o3-mini-high (Single CoT)	35%	0%	$0.41
r1 and r1-zero (Single CoT)	15.8%	0.3%	$0.08
gpt-4.5 (Pure LLM)	10.3%	0%	$0.29

How Does ARC-AGI-2 Work?

ARC-AGI-2 tests AI fluid intelligence with novel visual puzzles that reward adaptability and efficiency over brute force. Compared with ARC-AGI-1, it places more emphasis on symbolic interpretation, compositional (multi-rule) reasoning, and contextual rule application. It includes a 1,000-task public training set and 120-task evaluation sets.

AI systems get two attempts per task, yet leading models such as o3-low (4%) and o1-pro (1%) fall far short of the 60% human average. Through ARC Prize 2025, the foundation is challenging developers to reach 85% accuracy at no more than $0.42 per task as a step toward AGI.

In a Related Announcement, ARC Prize 2025 Returns

The ARC Prize returns to Kaggle this week. Developers who reach 85% accuracy while spending no more than $0.42 per task are eligible for the Grand Prize. This dual focus on high performance and low cost is meant to push innovation toward efficient, adaptable AI systems, two traits considered key to artificial general intelligence (AGI).

The contest offers $1 million in prizes, including a $700K Grand Prize for the first team to hit the 85% threshold within Kaggle’s computing limits.

By Arpit Dubey

Arpit is a dreamer, wanderer, and tech nerd who loves to jot down tech musings and updates. He holds a Bachelor’s in Business Administration, has a knack for crafting compelling narratives, and specializes in everything from Predictive Analytics to FinTech, not to mention SaaS, healthcare, and more. Arpit crafts content that’s as strategic as it is compelling. With a Logician’s mind, he is always chasing sunrises and tech advancements while secretly preparing for the robot uprising.
