AI Signals Worth Watching: March 15, 2026

Three signals this week that say something real about where AI is going.

GPT-5.4 crossed the human baseline on desktop tasks

OpenAI’s GPT-5.4 scored 75% on OSWorld-V — a benchmark that tests performance on real desktop productivity tasks, not synthetic puzzles. The human baseline is 72.4%.

That number matters less as a headline than as a transition marker. Until now, AI could answer questions, summarize documents, and generate drafts. OSWorld-V tests whether a model can actually navigate software, complete multi-step workflows, and recover from errors: the same things a junior analyst or executive assistant does in a real work environment.

The model now scores above the human baseline on this benchmark, which means the question has shifted. It’s no longer “can AI do the work?” It’s “who decides what work to do, and in what order?”

That’s an orchestration and context problem, not a capability problem.

Apple chose Google — and it matters more than the model numbers

Apple signed a deal to make Google’s Gemini the default AI layer powering Siri, launching with iOS 26.4 and the iPhone 17e. Google gets $1–5B/year and default placement across 2+ billion devices. OpenAI misses out on a distribution position it never officially held but was informally competing for.

The deeper signal isn’t which model won. It’s what this arrangement reveals about where the AI platform stack is settling. Apple owns the device. Google now owns the inference layer. What neither of them owns is the user’s context — their relationships, open loops, priorities, and operational history.

Platform AI is converging on a relatively small number of infrastructure winners. The personal intelligence layer — the one that knows you, not just your device — is still wide open.

Enterprise agents are past the pilot phase

New data this week: 57% of organizations already run multi-step agent workflows in production, and 81% plan to expand into more complex agent use cases this year. Galileo also launched an open-source governance framework for AI agent behavior on March 13. The fact that governance infrastructure is being built tells you this is no longer an R&D conversation.

The interesting sub-signal: domain-specific agents are winning over generalist chatbots. Companies want agents that understand a specific function deeply — not an agent that can do anything at a shallow level. Generalism is becoming a liability. Context and specialization are becoming the actual differentiators.

These three signals point at the same thing. Raw AI capability is no longer the constraint. The remaining problem is: which system knows enough about a specific person or function — their priorities, context, timing, relationships — to use that capability well?

That’s a different problem. And it’s the one worth building toward.

Eliran Keren — Founder of Deeplica, building the coordination layer for humans who’d rather direct than operate.