Product Manager, AI Agents Testing
Today, the admins who configure and manage AI agents (CX managers, bot builders, operations leads) lack the tools to confidently test agent behavior before going live, measure quality in production, or experiment with changes safely. You'll own the end-to-end product strategy for our Testing & Observability suite, the layer that lets admins simulate conversations against their real knowledge and procedures, score agent quality across accuracy, tone, and policy adherence, run A/B experiments on agent behavior, and catch regressions before they reach end users. This is a strategic role: the work directly determines whether enterprises can trust and scale agentic AI in their customer service operations.
Key Responsibilities
Own product strategy and roadmap for AI agent testing — simulation, quality scoring, experimentation, regression detection, and conversation tracing
Ship testing as an integrated experience embedded in the builder and deployment flow
Define how simulation works end-to-end: scenario generation from real conversation patterns, automated pass/fail evaluation, and results that point admins to exactly what broke and where
Build the experimentation layer — A/B testing of agent behavior, staged rollouts with statistical rigor, safe iteration on tone and resolution strategies
Design a pre-publish readiness gate that gives admins a quantified view of risk before every deployment — specific issues, coverage gaps, comparison to current production behavior
Partner with ML, QA, and platform teams on scoring methodology, simulation infrastructure, and tracing architecture
Make all of this usable by non-technical admins — CX managers, bot builders, operations leads who need answers without writing code or filing engineering tickets
Required Qualifications
Several years of product management experience, with 2+ years building for non-technical users in complex technical domains (QA tooling, no-code platforms, admin consoles, workflow builders) in B2B SaaS
Experience shipping AI/ML products where evaluation and reliability were real concerns, not afterthoughts
You understand why traditional testing doesn't work for LLM-based systems and have opinions about what does
Ability to ship platform capabilities through user-facing product surfaces — you don't just build infrastructure, you make it usable
Experience integrating acquired or adjacent products into a unified experience — combining capabilities from different teams, codebases, or organizations into something that feels like one product
Track record coordinating across 3+ engineering teams and multiple departments to deliver one coherent product experience
Bonus Qualifications
Experience building simulation, synthetic data, or automated testing products
Background in conversational AI, chatbot platforms, or customer service technology
Familiarity with LLM evaluation approaches — human-in-the-loop scoring, automated rubrics, AI-as-judge
Experience with experimentation infrastructure — A/B testing, staged rollouts, feature flagging at scale
Experience turning internal prototypes into customer-facing products
Success in the Role
Testing becomes part of how customers build and deploy agents: a step in the flow, not something they do separately
Customers can quantify whether their agent is ready to go live, and catch regressions before end users hit them
Automated resolution rates improve because customers can actually diagnose and fix quality issues instead of guessing
The testing platform becomes a shared capability used beyond AI Agents — consumed by other product teams that need to validate AI-powered experiences
Published on: 5/25/2026
Zendesk
Zendesk is redefining customer and employee experience. Our AI-powered solutions help over 100,000 companies build better relationships and grow. We push the boundaries of what’s possible and create tech that brings people closer.