Site Reliability Engineer
We're looking for a Site Reliability Engineer (SRE) to support and optimize large-scale distributed systems, ensuring high availability, performance, and reliability across production environments. You will monitor system health, troubleshoot complex issues, and drive improvements through automation, observability, and SRE best practices.
Responsibilities
Production Support and Incident Response
Identify, analyze, and resolve issues in production and non-production systems.
Participate in incident response, root cause analysis, and follow-up actions.
Take part in an on-call rotation and support production incidents when needed, including outside regular working hours.
Observability and Monitoring
Help develop and improve the observability system.
Collect and analyze metrics from operating systems, infrastructure, and applications.
Use monitoring data to support performance tuning, fault finding, and capacity planning.
Automation and CI/CD
Implement, maintain, and improve CI/CD processes.
Create sustainable systems and services through automation and continuous improvement.
Reduce manual work and improve operational efficiency.
Collaboration and Documentation
Partner with development teams to improve service reliability, testing, deployment, and release processes.
Support platform stability, scalability, and operational readiness.
Work closely with development, QA, infrastructure, and other cross-functional teams.
Create and maintain clear technical documentation, runbooks, operational guides, and support procedures.
Requirements
Strong SQL skills (T-SQL preferred), including query optimization, performance tuning, and data integrity management.
Hands-on experience with Microsoft SQL Server, database design, migrations, and partitioning strategies.
Experience with monitoring and observability tools such as Prometheus, Grafana, and the ELK Stack.
Familiarity with cloud platforms (AWS, GCP, Azure).
Proficiency in Python and scripting (Bash/PowerShell) for automation, ETL processes, data manipulation, and API integrations.
Basic understanding of networking concepts and protocols (e.g., HTTP, DNS) and content delivery networks (CDNs).
Additional Skills
Experience with Apache Airflow, Docker, Kubernetes, Ansible or other IaC tools, and CI/CD tools (GitLab, Jenkins).
Strong communication and collaboration skills, with a proactive, problem-solving mindset.
English level: Intermediate (B1) or higher.
Benefits
Quarterly bonuses based on company performance
24 working days of annual leave
Corporate events and team building activities
Unlimited Udemy Business membership and language training courses
Professional and personal development opportunities in a fast-growing environment
Published on: 5/29/2026

Libertex
Libertex, a multi-award-winning online trading platform, enables traders to access the markets and invest in stocks or trade CFDs on underlying assets such as commodities, Forex, ETFs, and cryptocurrencies.