Blog/Featured

Human-Aligned Reward Modeling for AI: EditReward's 200K-Pair Dataset

Data Curation Beats Scaling: Why 20K High-Quality Samples Outperform 46K Noisy Ones in AI Image Editing

NL2Repo-Bench: Why GPT-5 & Gemini Struggle with Long-Horizon Coding

Beyond Crowdsourcing: How SuperGPQA Uses PhD Experts to Solve LLM Data Leakage

Have LLMs Hit a Ceiling? Why SuperGPQA Proves the AGI Journey is Just Beginning

2077AI 2025 Annual Report: Pioneering Open Source AI Innovation

GPT-5 Series vs. Gemini 3 Pro: The Verdict from SuperGPQA

Scaling Test-Time Compute: How CriticLean Anticipated DeepSeekMath

Google Gemini 3 Sets New SOTA on OmniDocBench: The New Standard for Document AI

Meet VideoScore2: The AI Film Critic That Thinks Before It Scores

IWR-Bench: Can AI Rebuild an Interactive Website Just by Watching a Video?

Introducing EDITREWARD: The AI Judge That’s Closing the Gap in Open-Source Image Editing

Unlocking Deeper Multimodal Understanding: Introducing PIN-200M, A Massive Dataset for Next-Gen LMMs

Introducing Chain-of-Agents: A New Paradigm for Agent Foundation Models

VeriGUI: The Open-Source Benchmark Testing AI Agents' Real-World Capabilities

Unveiling GPT-5’s Two Faces: SuperGPQA Benchmark Analysis

Creative Writing Dataset with Thought Processes: Unleashing Human-like Creativity in AI

Proudly Announcing: Abaka AI Joins the 2077AI Community as a Core Founding Contributor

FormalMATH Benchmark: A Formal Mathematics Benchmark for Pushing the Limits of AI

Breaking Traditional Knowledge Dependency: KOR-Bench for Evaluating Intrinsic Reasoning Abilities of Models

A Novel Paradigm for Model Evaluation: The Innovative Multi-source Document Parsing Evaluation Framework OmniDocBench

SuperGPQA: Scaling LLM Evaluation Across 285 Graduate Disciplines

OmniHD-Scenes: A Next-Generation Multimodal Dataset for Autonomous Driving

Matrix Dataset: A Revolutionary Bilingual AI Pre-training Corpus

PIN Dataset: A Unified Paradigm for Multimodal Learning
