
MLOps (DevOps)

Europe · Cyprus · Remote

$5,500

  • Prohibited locations: Russia, Ukraine, Belarus

  • English: B2

  • Experience: 6+ months in ML, 3+ years of commercial DevOps experience

Key Responsibilities

  • Development and support of end-to-end ML pipelines (training, validation, deployment, monitoring, retraining)

  • Construction and operation of CI/CD for models (test automation, packaging, and deployment)

  • Design of LLM/RAG pipelines: context management, embedding quality/dynamics dashboards, index regeneration, prompt and fact-check testing (grounding/citation)

  • MLOps platform setup: experiment tracking, model registry, feature store, monitoring

  • Management of ML infrastructure and environments (GPU/CPU pools, Kubernetes/EKS, Docker)

  • Implementation of deployment strategies: canary, shadow, A/B testing

  • Ensuring model quality monitoring (accuracy drift, data drift, PSI, SLO/SLA)

  • Artifact management (data, models, metadata, versions)

  • Security compliance (encryption, access control, auditing, operation in private VPCs)

  • Integrating ML models into backend services (API, gRPC, REST)

  • Collaborating with Data Engineering and Data Science teams

  • Documenting processes and best practices for ML infrastructure

  • Managing the cost and scaling of ML infrastructure in AWS

  • Data governance: storage policies (S3 lifecycle), dataset versioning (DVC/LakeFS), data lineage (OpenLineage), quality gates in CI/CD

Requirements

ML Ops Tools

  • MLflow or Kubeflow (experiment tracking, model registry)

  • Feature Store (Feast, Tecton, or custom)

  • Airflow, Prefect, or Kubeflow Pipelines (ML workflow orchestration)

Infrastructure and Containerization

  • Docker, Kubernetes/EKS

  • AWS S3, ECR, EKS, IAM, KMS, VPC

  • Terraform or Pulumi (IaC)

  • GitHub Actions, GitLab CI, or Jenkins (CI/CD)

  • Autoscaling, AWS Batch/Step Functions for offline processing and retrieval

Monitoring and Observability

  • Prometheus, Grafana, CloudWatch, CloudTrail

  • Model quality metrics (AUC, F1, Brier score, log loss)

  • Stability metrics (drift detection, PSI)

  • LLM-specific metrics: tokens/sec, context length, prompt/response size, grounding rate, citation coverage, hallucination rate

Key Competencies

  • Building a stable and secure ML infrastructure

  • Full-cycle ML automation: from data to inference services

  • Quality control and stability of models in production

  • Effective collaboration with data science and data engineering teams

Joining Valletta Software Development means:

  • 🌍 A global, thriving team: Join 100+ specialists from 20+ countries, united by a passion for outstanding IT solutions.

  • 🚀 Diverse projects: Fintech, MedTech, AI/ML, e-commerce, and more. Switch teams or industries to broaden your skills.

  • 💡 Support at every step: client interview prep where we train you to succeed and give actionable feedback.

  • ✔️ Strategic stability: well-structured processes, strong management, and long-term vision.

  • ✔️ Core values: Honesty, flexibility, innovation, and a people-first approach.

  • 💸 Regular salary review based on your personal results

  • ✨ Paid rest days and sick leave

Published on: 10/17/2025

Valletta Software

Valletta Software is a custom mobile and web software developer serving the US and Europe.
