Lead ML Systems Engineer, Voice AI

Armenia | Hybrid | Lead

As a Lead ML Systems Engineer, you will own the architecture, performance, and scalability of Krisp Cloud’s real-time Voice AI serving infrastructure.

You will be responsible for transforming state-of-the-art research models into highly optimized, reliable, and cost-efficient production systems that power latency-sensitive, mission-critical Voice AI services.

This role sits at the intersection of machine learning, distributed systems, GPU performance engineering, and large-scale infrastructure, and requires deep systems thinking and long-term architectural ownership.

What you'll do

Model Serving & Production Performance

  • Prototype, implement, and benchmark critical components of the serving stack.

  • Architect and implement inference and serving strategies, defining how models are packaged, deployed, replicated, batched, scheduled, and optimized under real-time constraints.

  • Partner with Research and Platform teams to drive deep performance optimization across runtime, precision (FP16/INT8/FP8), batching strategies, and GPU execution.

  • Design scaling behavior under variable real-time load (burst handling, replica strategy, workload partitioning).

  • Establish observability standards across inference services (latency metrics, GPU profiling, tracing, performance regression detection).

  • Lead root cause analysis of systemic performance regressions and implement structural improvements.

  • Partner closely with MLOps and Platform teams to operationalize infrastructure while retaining architectural ownership of the serving layer.
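To make the observability and regression-detection responsibilities above concrete, here is a minimal sketch (names and thresholds are illustrative assumptions, not Krisp's actual stack) of tracking per-request inference latency and flagging a tail-latency regression against a baseline:

```python
import random

# Hypothetical sketch: collect per-request inference latencies (ms) and
# surface tail percentiles, the usual signal for catching performance
# regressions in a real-time serving stack.

class LatencyTracker:
    """Collects per-request latencies in milliseconds and reports percentiles."""

    def __init__(self):
        self.samples = []

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def percentile(self, p: float) -> float:
        # Nearest-rank percentile over the recorded samples.
        ordered = sorted(self.samples)
        k = max(0, min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1))))
        return ordered[k]

    def regressed(self, baseline_p99_ms: float, tolerance: float = 1.1) -> bool:
        """Flag a regression if current p99 exceeds baseline by more than 10%."""
        return self.percentile(99) > baseline_p99_ms * tolerance


tracker = LatencyTracker()
random.seed(0)
for _ in range(1000):
    tracker.record(random.gauss(40, 5))  # simulated ~40ms mean inference latency

p99 = tracker.percentile(99)
```

In production this would feed a metrics pipeline (histograms, tracing spans) rather than an in-process list, but the p99-versus-baseline comparison is the core of automated regression detection.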

Technical Leadership

  • Drive alignment between model design and production constraints, ensuring research translates into performant, scalable, cost-effective systems.

  • Mentor senior engineers through design reviews, deep technical discussions, and hands-on collaboration.

  • Shape the long-term architectural direction for Voice AI serving infrastructure through both implementation and strategic design.

What we are looking for

Experience

  • 5+ years building performance-critical backend or distributed systems.

  • Hands-on experience deploying and operating ML inference systems in production environments.

  • Experience working on latency-sensitive or real-time services.

  • Demonstrated ownership of significant system components or architectural decisions in production environments.

  • Track record of improving performance, scalability, or cost efficiency of production systems.

Technical Depth

  • Strong systems background (distributed systems, networking, concurrency, performance engineering).

  • Hands-on experience deploying and optimizing GPU-based inference systems in production (TensorRT or similar runtimes; graph optimization, precision tuning, memory optimization, CUDA-level profiling).

  • Strong experience working with high-performance transformer/LLM inference engines (e.g., vLLM or similar), including continuous batching, KV cache optimization, and throughput tuning.

  • Deep understanding of modern transformer inference optimizations (e.g., efficient attention mechanisms, KV caching strategies, memory-efficient attention).

  • Experience with model serving frameworks (e.g., Triton, Ray Serve, or custom high-performance serving stacks).

  • Experience with quantization (INT8/FP16/FP8), ONNX optimization, and advanced batching strategies.

  • Hands-on GPU profiling and performance tuning (memory fragmentation, utilization optimization, latency reduction).

  • Strong programming skills in Python and/or C++.

  • Experience with Docker, Kubernetes, and cloud-native deployment architectures.
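As a toy illustration of the dynamic batching idea mentioned above (not any particular framework's API): pending requests are queued and drained in groups of up to `max_batch`, trading a small queueing delay for much higher GPU throughput per forward pass.

```python
from collections import deque

# Simplified sketch of a dynamic batcher. Real serving stacks (e.g. Triton,
# vLLM) add timeouts, priorities, and continuous batching on top of this idea.

class DynamicBatcher:
    def __init__(self, max_batch: int):
        self.max_batch = max_batch
        self.queue = deque()

    def submit(self, request_id: str) -> None:
        """Enqueue an incoming inference request."""
        self.queue.append(request_id)

    def next_batch(self) -> list:
        """Drain up to max_batch pending requests into one batch."""
        batch = []
        while self.queue and len(batch) < self.max_batch:
            batch.append(self.queue.popleft())
        return batch


batcher = DynamicBatcher(max_batch=4)
for i in range(6):
    batcher.submit(f"req-{i}")

first = batcher.next_batch()   # ["req-0", "req-1", "req-2", "req-3"]
second = batcher.next_batch()  # ["req-4", "req-5"]
```

Continuous batching extends this by admitting new requests into an in-flight batch between decode steps instead of waiting for the whole batch to finish.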

Nice to Have

  • Experience optimizing ASR or TTS systems for real-time production workloads.

  • Experience with streaming inference and low-latency (<200ms) systems.

  • Experience building cost-efficient inference infrastructure at scale.

  • Familiarity with CUDA internals or custom kernel optimization.
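The streaming, low-latency shape such systems take can be sketched as follows (frame size and the per-frame processing stand-in are illustrative assumptions): audio arrives in fixed-size frames, and each frame must clear its processing budget so end-to-end latency stays under roughly 200ms.

```python
# Hypothetical streaming-inference skeleton: chunk an incoming sample buffer
# into fixed-size frames and process each frame independently, as a real-time
# ASR/TTS pipeline would.

FRAME_LEN = 20  # samples per frame; an assumption for illustration

def stream_frames(samples: list, frame_len: int):
    """Yield consecutive fixed-size frames from an incoming sample buffer."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        yield samples[start:start + frame_len]

def process_frame(frame: list) -> float:
    # Stand-in for a model forward pass on one frame.
    return sum(frame) / len(frame)

audio = list(range(100))  # 100 dummy samples
outputs = [process_frame(f) for f in stream_frames(audio, FRAME_LEN)]
# 5 frames of 20 samples each
```

The engineering work the posting describes lives in keeping `process_frame` (the model forward pass) within budget on a GPU under concurrent streams.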

Published on: 2/25/2026

Krisp

Krisp is at the forefront of voice-AI technologies, enabling developers and enterprises to integrate advanced voice capabilities — from noise cancellation and turn-taking to real-time voice translation and accent conversion.
