Site Reliability Engineer

Our client is a Silicon Valley company specialising in Vertical AI SaaS solutions, collaborating with top firms worldwide in industries such as accounting, consulting, investment banking, legal, private capital, and real assets.

They’re looking for a Site Reliability Engineer to join their growing team at their new Research & Development center in Lisbon. This role offers a multifaceted opportunity to address operational challenges, focusing on software, systems, automation, and process improvements.

Key Responsibilities

Collaborate on New Features: Work with the Development and Product teams to design and build new features.
Troubleshoot and Fix Issues: Investigate and solve reliability problems in systems; work with other software engineers across the organisation to produce and roll out fixes.
Standardise Processes: Help create consistent practices across different teams and services, working with other Site Reliability Engineers (SREs).
Improve Automation: Find ways to automate tasks such as deployment, service management, and monitoring, and build the tools to do so.
Define & Monitor Reliability Metrics: Establish and track Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) to measure system performance and ensure reliability.
Lead Post-Mortems & Incident Reviews: Conduct post-mortems to analyse outages and incidents, identify root causes, and drive improvements to prevent future failures.
Agile Work Style: Balance planned project work (sprints) with daily operational tasks.
On-Call Support: Be part of a 24/7 on-call rotation with 12-hour shifts.

What You Bring

Experience with Reliable Systems: You have worked on building systems that are fault-tolerant and scalable.
Knowledge of Databases: You’re familiar with relational databases such as SQL Server and PostgreSQL, as well as NoSQL data stores.
Expertise in Tools: You’re skilled in tools for managing configurations and deployments, such as Ansible, Jenkins, and Azure DevOps.
Cloud Experience: You have worked in a cloud-focused environment (e.g., Azure or AWS).
Scripting Skills: You can write scripts in Python, Perl, Go, or similar programming languages.
Understanding of CI/CD: You know how to set up and manage continuous integration and continuous deployment pipelines.
Windows Infrastructure Experience: You have experience managing Windows infrastructure that runs IIS (Internet Information Services).
Problem Solver: You enjoy fixing reliability issues and creating long-term solutions.
Automation Focus: You believe in automating tasks whenever possible.
Fluent in English: You can speak and write English confidently and clearly.

Why Apply Now?
With over 20 years of expertise, our client is a global leader in Vertical AI SaaS solutions, revolutionising how top firms across industries operate. They offer more than just a job: this is a chance to be part of a company that values accountability, collaboration, and growth in a truly diverse and inclusive culture. They provide a flexible, connected work environment that prioritises both work-life balance and career development, making it a place to thrive professionally and personally.

Are you ready to take the next step in your career? Send your CV to ari.kilab@robertwalters.com
