MLOps Engineer

Europe, Armenia, Spain, Georgia, Serbia, Relocation, Remote, Senior

Relocation support is available to our hubs in Armenia, Georgia, Serbia, and Spain, including flights, temporary accommodation, and legal setup.

Key Skills and Responsibilities

LLM Serving & Model Management:

  • Deep expertise in high-throughput serving using vLLM, NVIDIA TensorRT-LLM, and SGLang to minimize latency and maximize hardware efficiency.

  • Hands-on experience deploying and optimizing large-scale open-weights models, specifically DeepSeek V3.1/V3.2, Qwen, and GPT-OSS variants.

  • Advanced optimization and security hardening of Docker for GPU environments.

  • Managing model weights and orchestration within Kubernetes (GKE) environments.

Real-Time Data Engineering & CDC:

  • Designing and maintaining high-throughput CDC (Change Data Capture) pipelines using Apache Kafka ecosystem tooling (e.g., Debezium, Kafka) to sync data from Cloud PostgreSQL.

  • Deploying and tuning ClickHouse for real-time analytics, ML feature storage, and high-speed logging.

  • Orchestrating complex ML data workflows using Airflow (Google Cloud Composer) to ensure data reliability.

Core Infrastructure & Networking:

  • Strong Linux systems expertise including internals, networking, and performance tuning for large-scale distributed systems.

  • Experience with Istio service mesh to manage microservices communication and traffic.

  • Provisioning and maintaining dedicated GPU nodes (A100/H100/H200/B200), including driver management and OS-level tuning using Ansible.

  • Solid Kubernetes expertise: controllers, CRDs, CNI, and Ingress.

CI/CD & Tooling:

  • Implementing pipelines as code within GitLab CI, managing runners, caching, and security scanning.

  • Infrastructure as Code with Terraform and Terragrunt.

  • Proficiency in Python/Bash for building custom automation and AI Agent tooling.

Load Testing & Observability:

  • Conducting rigorous load testing for GenAI applications, focusing on metrics like TTFT (time to first token), TPS (tokens per second), and RPS (requests per second).

  • Deploying and managing LiteLLM Gateway for unified API access, load balancing, and cost tracking.

  • Experience with Datadog for monitoring GPU utilization, inference health, and log pipelines.

Soft Skills:

  • Strong ownership mindset: balancing speed, reliability, and cost.

  • Comfortable working cross-functionally with developers, security, and compliance.

  • Excellent sense of responsibility and accountability.

  • English B2 or higher.

Nice to Have:

  • Experience with PCI DSS, SOC 2, or other regulatory compliance environments.

Our Tech Stack: Linux, Docker, Kubernetes, GCP (GKE, Cloud PostgreSQL), Datadog, GitLab, CDC (Debezium, Kafka), ClickHouse, Airflow, Istio, Terraform, Terragrunt, Ansible, vLLM, TensorRT-LLM, SGLang, LiteLLM, DeepSeek, Qwen, Go, Python

What we offer

  • Full-time B2B contract

  • Fully remote setup, work from anywhere in Europe

  • Up to 20% tax allowance

  • 22 paid leave days annually

  • Stock options (ESOP) in a fast-scaling, pre-IPO company

  • Flexi benefits you can use for wellness, travel, or learning

  • Work alongside a high-performing, international engineering team in a global fintech unicorn

Published on: 1/30/2026

Tabby

Tabby is a UAE-based buy now, pay later (BNPL) service that enables customers to purchase products online or in-store and split the payment into 4 monthly installments.
