AI Quality Analyst
In this highly cross-functional role, you will be the gatekeeper of AI safety, performance, and deterministic behavior across non-deterministic multi-agent systems. You will not be filling out generic manual testing spreadsheets or operating in a vacuum.
What You Will Be Doing
Architect Automated Evaluation Frameworks: Design, implement, and maintain scalable evaluation pipelines (Evals) for LLMs and agent graphs using modern tooling like LangSmith, DeepEval, Ragas, or Opik.
Curate Ground-Truth Benchmarks: Collaborate with domain experts to build, version, and sanitize robust gold-standard datasets, synthetic evaluation profiles, and edge-case testing matrices reflecting real-world business scenarios.
Own Non-Deterministic Quality Tracking: Define, monitor, and enforce quality KPIs across multi-agent workflows, focusing specifically on tool-calling accuracy, intent-recognition safety, structured output formatting, and context-retrieval (RAG) precision.
Mitigate and Quantify Systemic Risk: Lead rigorous failure and hallucination analyses on production outputs. Implement structured LLM-as-Judge patterns, validation metrics, and guardrail heuristics while actively ensuring the judge profiles remain free of baseline evaluation bias.
Enforce CI/CD Evaluation Gates: Partner directly with MLOps and Backend Engineering teams to integrate automated testing gates into our deployment pipelines, proactively preventing regressions or behavioral drifts from reaching production runtime environments.
Drive Optimization for Latency & Cost: Regularly analyze the efficiency of prompt templates, few-shot structures, and model selections (e.g., GPT, Claude, LLaMA) to ensure a highly calibrated balance between execution throughput, sub-second latency, and platform compute costs.
Who You Are
A Data-Savvy Automation Advocate: You possess strong software engineering fundamentals and concrete Python coding experience, allowing you to seamlessly script custom evaluation routines and query multi-tenant databases.
An Analytical Thinker with an AI Lens: You understand that testing non-deterministic LLMs requires a completely different mindset than traditional QA. You possess deep intuition for token behaviors, retrieval dynamics, prompt engineering nuances, and failure states.
Radically Autonomous & Collaborative: You do not wait around for static technical specifications. You independently coordinate syncs with AI leads, domain backend engineers, and product stakeholders to identify and patch system vulnerabilities.
Rigorously Quality-Oriented: You have a low ego but hold high standards for system stability. You are deeply passionate about separating market hype from practical, measurable production metrics.
What You Will Get In Return
Make a genuine impact on the product. Join our upward trajectory and grow with us. We provide the resources and opportunities for continuous personal and professional development, empowering you to shape our evolving product.
Work in the EU. Embark on this exciting journey with us and enjoy the flexibility of traveling and working remotely or in a hybrid model across Europe.
Become a stock options holder. Unlock your inner entrepreneur and align your aspirations with ours through our Stock Options Program. This exciting opportunity is available to every team member, from junior team members to our founders.
Receive unwavering support and care. Finom stands by you at every step, with a commitment to your well-being and success that is reflected in our modern, friendly, and eco-conscious corporate culture.
Work & Swim program. Immerse yourself in our exclusive Work & Swim Program. Spend one month in a comfortable corporate apartment in enchanting Cyprus. It's the ideal opportunity to strike the perfect work-life balance while enjoying breathtaking Mediterranean views.
Published on: 6/25/2026

Finom
Finom is an online payment solution for entrepreneurs that makes it easy to open a business account and securely manage their finances.



