Mar 4, 2025

🎮 Super Mario: The Newest AI Benchmark? 🍄🤖

AI models are now being tested in Super Mario Bros.—because nothing says ‘cutting-edge tech’ like dodging Goombas! 😂

Rafelia


2025-03-04 05:30 +0530


Thought Pokémon was the ultimate AI test? Think again. Researchers are now using Super Mario Bros. to see if AI models can handle real-time platforming chaos. 😆

🕹️ Who’s Jumping the Best?

🏆 Claude 3.7: The Mario speedrunner of AI. Best performance! 🚀
🥈 Claude 3.5: A solid second place. Just a few Koopa missteps. 🐢
🥉 Gemini 1.5 Pro & GPT-4o: Looked at a Goomba and panicked. 🤦‍♂️💀

🎮 How AI Plays Mario

Researchers at Hao AI Lab (UC San Diego) built GamingAgent, an in-house framework that puts an AI model in control of Mario. It:
✅ Feeds the AI in-game screenshots & basic instructions 📸
✅ Lets the AI write Python code that moves Mario 🖥️
✅ Watches AI either succeed or get stomped by Bowser 😂
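For the curious, here's a toy sketch of that screenshot → prompt → Python → button-press loop. Everything in it is hypothetical. `FakeEmulator`, `fake_model`, and `run_episode` are made-up stand-ins, because GamingAgent's real code isn't shown in this post. The "screenshot" is a text summary rather than an image. The "model" is a hard-coded rule, not Claude or GPT-4o.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins only: this is NOT GamingAgent's real interface.
@dataclass
class FakeEmulator:
    """Toy 1-D 'level': Mario walks right; an enemy sits at a fixed x position."""
    mario_x: int = 0
    enemy_x: int = 5
    pressed: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real harness would return an image; this toy returns a text summary.
        return f"mario_x={self.mario_x}, enemy_x={self.enemy_x}"

    def press(self, button: str) -> None:
        self.pressed.append(button)
        if button == "right":
            self.mario_x += 1
        elif button == "jump_right":
            self.mario_x += 2  # a jump carries Mario over one tile

INSTRUCTIONS = "If an enemy is near, jump to dodge it; otherwise move right."

def fake_model(instructions: str, observation: str) -> str:
    """Stub for the AI model: reads the observation and returns Python source."""
    fields = dict(part.split("=") for part in observation.split(", "))
    gap = int(fields["enemy_x"]) - int(fields["mario_x"])
    if 0 < gap <= 1:
        return "press('jump_right')"
    return "press('right')"

def run_episode(steps: int) -> FakeEmulator:
    emu = FakeEmulator()
    for _ in range(steps):
        code = fake_model(INSTRUCTIONS, emu.screenshot())
        # Run the generated code with only press() in scope
        # (a toy sandbox, not a secure one).
        exec(code, {"__builtins__": {}}, {"press": emu.press})
    return emu

emu = run_episode(5)
print(emu.pressed)  # ['right', 'right', 'right', 'right', 'jump_right']
```

The real setup swaps the stub for an actual model call and the toy level for the game. The shape of the loop is the point here: look, ask, run code, repeat.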

🏆 Strategy vs. Speed: AI’s Biggest Problem

Reasoning models (like OpenAI’s o1), which “think” through problems step by step, tried to plan out every jump but took too long to decide and fell into pits. 🕳️😢
Non-reasoning models? Just jump & pray—and somehow did better! 🤷‍♂️

Turns out that in Mario, split-second timing matters more than careful reasoning. (Looking at you, Toad. 🍄🏃‍♂️💨)
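Why does thinking time hurt so much? Here's a quick back-of-the-envelope sketch. It assumes the emulator keeps running while the model decides, and that the NES runs at roughly 60 frames per second. The latencies are made-up illustration values, not numbers measured by Hao AI Lab.

```python
# Assumption: the NES runs at roughly 60 frames per second (NTSC is ~60.1 fps).
FPS = 60

# Hypothetical per-decision latencies, for illustration only (not measured values).
latencies_s = {"fast non-reasoning model": 0.5, "slow reasoning model": 3.0}

def frames_elapsed(latency_s: float, fps: int = FPS) -> int:
    """Frames that go by while the model is still deciding on its next input."""
    return round(latency_s * fps)

for name, latency in latencies_s.items():
    print(f"{name}: {frames_elapsed(latency)} frames pass before the next input")
# fast non-reasoning model: 30 frames pass before the next input
# slow reasoning model: 180 frames pass before the next input
```

Under these assumptions, a three-second "think" lets about 180 frames of Goombas and pits roll by before Mario reacts at all.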

🤔 Should We Really Be Measuring AI with Games?

🎲 AI has been tested on games for decades, but some researchers are skeptical:
🔹 Games ≠ real life. (Mario isn’t exactly paying rent or coding in Python for a living.)
🔹 Games offer practically unlimited training data, which the real world rarely does, so high game scores can make AI look smarter than it really is.
🔹 Even AI experts are confused about what these benchmarks really mean. 😅

Andrej Karpathy (ex-OpenAI) summed it up best:
🗣️ “I don’t really know how good these models are right now.” 😵

🚀 Final Thoughts

Mario is fun, but does stomping Goombas really prove AI intelligence? 🤔

Either way, AI learning to play video games is hilarious, and we’re here for it. 🍄🎮