Lead ML Systems Engineer, Voice AI

Armenia | Hybrid | Lead

As a Lead ML Systems Engineer, you will own the architecture, performance, and scalability of Krisp Cloud’s real-time Voice AI serving infrastructure.

You will be responsible for transforming state-of-the-art research models into highly optimized, reliable, and cost-efficient production systems that power latency-sensitive, mission-critical Voice AI services.

This role sits at the intersection of machine learning, distributed systems, GPU performance engineering, and large-scale infrastructure, and requires deep systems thinking and long-term architectural ownership.

What you'll do

Model Serving & Production Performance

  • Prototype, implement, and benchmark critical components of the serving stack.

  • Architect and implement inference and serving strategies, defining how models are packaged, deployed, replicated, batched, scheduled, and optimized under real-time constraints.

  • Partner with Research and Platform teams to drive deep performance optimization across runtime, precision (FP16/INT8/FP8), batching strategies, and GPU execution.

  • Design scaling behavior under variable real-time load (burst handling, replica strategy, workload partitioning).

  • Establish observability standards across inference services (latency metrics, GPU profiling, tracing, performance regression detection).

  • Lead root cause analysis of systemic performance regressions and implement structural improvements.

  • Partner closely with MLOps and Platform teams to operationalize infrastructure while retaining architectural ownership of the serving layer.
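To make the observability and regression-detection responsibilities above concrete, here is a minimal sketch (names and thresholds are illustrative assumptions, not Krisp's actual stack) of tracking per-request inference latency and flagging a tail-latency regression against a baseline:

```python
import random

# Hypothetical sketch: collect per-request inference latencies (ms) and
# surface tail percentiles, the usual signal for catching performance
# regressions in a real-time serving stack.

class LatencyTracker:
    """Collects per-request latencies in milliseconds and reports percentiles."""

    def __init__(self):
        self.samples = []

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def percentile(self, p: float) -> float:
        # Nearest-rank percentile over the recorded samples.
        ordered = sorted(self.samples)
        k = max(0, min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1))))
        return ordered[k]

    def regressed(self, baseline_p99_ms: float, tolerance: float = 1.1) -> bool:
        """Flag a regression if current p99 exceeds baseline by more than 10%."""
        return self.percentile(99) > baseline_p99_ms * tolerance


tracker = LatencyTracker()
random.seed(0)
for _ in range(1000):
    tracker.record(random.gauss(40, 5))  # simulated ~40ms mean inference latency

p99 = tracker.percentile(99)
```

In production this would feed a metrics pipeline (histograms, tracing spans) rather than an in-process list, but the p99-versus-baseline comparison is the core of automated regression detection.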

Technical Leadership

  • Drive alignment between model design and production constraints, ensuring research translates into performant, scalable, cost-effective systems.

  • Mentor senior engineers through design reviews, deep technical discussions, and hands-on collaboration.

  • Shape the long-term architectural direction for Voice AI serving infrastructure through both implementation and strategic design.

What we are looking for

Experience

  • 5+ years building performance-critical backend or distributed systems.

  • Hands-on experience deploying and operating ML inference systems in production environments.

  • Experience working on latency-sensitive or real-time services.

  • Demonstrated ownership of significant system components or architectural decisions in production environments.

  • Track record of improving performance, scalability, or cost efficiency of production systems.

Technical Depth

  • Strong systems background (distributed systems, networking, concurrency, performance engineering).

  • Hands-on experience deploying and optimizing GPU-based inference systems in production (TensorRT or similar runtimes; graph optimization, precision tuning, memory optimization, CUDA-level profiling).

  • Strong experience working with high-performance transformer/LLM inference engines (e.g., vLLM or similar), including continuous batching, KV cache optimization, and throughput tuning.

  • Deep understanding of modern transformer inference optimizations (e.g., efficient attention mechanisms, KV caching strategies, memory-efficient attention).

  • Experience with model serving frameworks (e.g., Triton, Ray Serve, or custom high-performance serving stacks).

  • Experience with quantization (INT8/FP16/FP8), ONNX optimization, and advanced batching strategies.

  • Hands-on GPU profiling and performance tuning (memory fragmentation, utilization optimization, latency reduction).

  • Strong programming skills in Python and/or C++.

  • Experience with Docker, Kubernetes, and cloud-native deployment architectures.
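As a toy illustration of the dynamic batching idea mentioned above (not any particular framework's API): pending requests are queued and drained in groups of up to `max_batch`, trading a small queueing delay for much higher GPU throughput per forward pass.

```python
from collections import deque

# Simplified sketch of a dynamic batcher. Real serving stacks (e.g. Triton,
# vLLM) add timeouts, priorities, and continuous batching on top of this idea.

class DynamicBatcher:
    def __init__(self, max_batch: int):
        self.max_batch = max_batch
        self.queue = deque()

    def submit(self, request_id: str) -> None:
        """Enqueue an incoming inference request."""
        self.queue.append(request_id)

    def next_batch(self) -> list:
        """Drain up to max_batch pending requests into one batch."""
        batch = []
        while self.queue and len(batch) < self.max_batch:
            batch.append(self.queue.popleft())
        return batch


batcher = DynamicBatcher(max_batch=4)
for i in range(6):
    batcher.submit(f"req-{i}")

first = batcher.next_batch()   # ["req-0", "req-1", "req-2", "req-3"]
second = batcher.next_batch()  # ["req-4", "req-5"]
```

Continuous batching extends this by admitting new requests into an in-flight batch between decode steps instead of waiting for the whole batch to finish.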

Nice to Have

  • Experience optimizing ASR or TTS systems for real-time production workloads.

  • Experience with streaming inference and low-latency (<200ms) systems.

  • Experience building cost-efficient inference infrastructure at scale.

  • Familiarity with CUDA internals or custom kernel optimization.
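The streaming, low-latency shape such systems take can be sketched as follows (frame size and the per-frame processing stand-in are illustrative assumptions): audio arrives in fixed-size frames, and each frame must clear its processing budget so end-to-end latency stays under roughly 200ms.

```python
# Hypothetical streaming-inference skeleton: chunk an incoming sample buffer
# into fixed-size frames and process each frame independently, as a real-time
# ASR/TTS pipeline would.

FRAME_LEN = 20  # samples per frame; an assumption for illustration

def stream_frames(samples: list, frame_len: int):
    """Yield consecutive fixed-size frames from an incoming sample buffer."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        yield samples[start:start + frame_len]

def process_frame(frame: list) -> float:
    # Stand-in for a model forward pass on one frame.
    return sum(frame) / len(frame)

audio = list(range(100))  # 100 dummy samples
outputs = [process_frame(f) for f in stream_frames(audio, FRAME_LEN)]
# 5 frames of 20 samples each
```

The engineering work the posting describes lives in keeping `process_frame` (the model forward pass) within budget on a GPU under concurrent streams.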

Published on: 2/25/2026

Krisp

Krisp is at the forefront of voice-AI technologies, enabling developers and enterprises to integrate advanced voice capabilities — from noise cancellation and turn-taking to real-time voice translation and accent conversion.
