Insights from the frontier of AI cognitive design
Production autonomous agents evaluated on benchmarks such as GAIA and WebArena achieve success rates of 40-60%, with tool-selection errors and hallucinated argument