Senior Site Reliability Engineer

Europe · Remote · Senior

We are looking for a Senior Site Reliability Engineer to drive the design, implementation, and evolution of our Kubernetes-based platform in a multi-cloud environment (GCP/AWS). At Finom, SREs are not just executors of tasks; they are the architects of reliability.

This role requires strong ownership of reliability, scalability, and platform architecture for high-load, mission-critical systems operating 24/7.

What You Will Be Doing

  • Lead the Platform Evolution: Design and operate our Kubernetes ecosystem (GKE, multi-cluster) with a focus on high availability and zero-downtime operations.

  • Build "Paved Roads": Own and evolve our PaaS strategy, using GitOps (ArgoCD) and CI/CD (GitLab) to empower domain teams to deploy independently.

  • Architect Reliability: Define and implement our observability strategy across metrics, logs, and tracing (Prometheus, VictoriaMetrics, OpenTelemetry).

  • Drive Infrastructure-as-Code: Lead the automation of our infrastructure using Terraform, ensuring all resources are standardized and version-controlled.

  • Own the Error Budget: Partner with engineering teams to establish and manage SLOs, SLAs, and incident management frameworks.

  • Disaster Recovery Mastery: Design and participate in regular DR drills, implementing blue/green and active/passive strategies across regions to ensure service continuity.

  • Innovate Operations: Proactively apply AI-driven approaches to improve operational efficiency and automate bottleneck detection.

Who You Are

  • Production K8s Mastery: Strong hands-on experience managing Kubernetes (GKE preferred) in high-load, multi-cluster production environments.

  • Cloud Infrastructure: Deep experience with GCP (AWS is a strong plus) and Terraform for large-scale infrastructure.

  • GitOps Expertise: Solid experience with ArgoCD, GitLab CI, and the "Infrastructure as Code" philosophy.

  • Observability Expert: Deep knowledge of the Prometheus/Grafana stack and implementing tracing/logging at scale.

  • System Design: Proven ability to design highly available 24/7 systems with automated failover and rollback capabilities.

  • English Fluency: English level B2+ for effective cross-functional communication.

Nice-to-Haves

  • Compliance Knowledge: Understanding of banking-grade standards and regulations such as PCI DSS, GDPR, or ISO 27001.

  • Distributed Systems: Experience with Kafka (Confluent), RabbitMQ, or managing high-load Redis and PostgreSQL clusters.

  • AI for Ops: Experience using AI tools to improve alerting, anomaly detection, or engineering efficiency.

  • Security-Minded: Experience with Vault for secret management and credential rotation.

Our Infrastructure Landscape

  • Primary Cloud: GCP (~90%)

  • Orchestration & Deploy: GKE, ArgoCD, GitLab CI

  • Automation: Terraform

  • Data & Messaging: PostgreSQL, Kafka, Redis, RabbitMQ

  • Observability: Prometheus, Grafana, VictoriaMetrics, OpenTelemetry, Cloud Logging

  • Security: Vault

Published on: 5/27/2026

Finom

Finom is an online payment solution for entrepreneurs that makes it easy to open a business account and securely manage their finances.
