Cavendish Professionals

Site Reliability Engineering Architect

Job description:

Overview

Site Reliability Engineering (SRE) Architect – Munich (Remote/Hybrid)

I am partnered with a global consulting organisation that is expanding its engineering leadership team in Germany. They are seeking an experienced SRE Architect to define and drive the long-term reliability, scalability, and performance strategy across complex, cloud-native systems. This is a senior architectural role with wide influence across engineering, combining deep technical expertise with leadership. You will set standards, frameworks, and practices that enable teams to deliver world-class services at scale.

Key Responsibilities

  • Architecture & Strategy – Design highly scalable and fault-tolerant infrastructure on leading cloud platforms (AWS, GCP, or Azure).
  • Reliability Frameworks – Define and govern SLOs, SLIs, and error budgets across engineering teams.
  • Observability – Lead observability design for metrics, tracing, logging, and alerting.
  • Automation & IaC – Champion Infrastructure as Code (Terraform, Ansible) for secure, repeatable provisioning.
  • Resilience & Recovery – Develop disaster recovery strategies, resilience patterns, and chaos engineering practices.
  • Leadership & Mentoring – Act as a thought leader, mentoring engineers and embedding reliability best practices across the organisation.
  • Incident Evolution – Analyse major incidents, drive systemic improvements, and evolve incident management culture.

Key Requirements

  • 10+ years in software engineering, DevOps, or systems engineering, including 5+ years in senior SRE/architecture roles.
  • Expertise in at least one major cloud provider (AWS, GCP, Azure).
  • Strong hands-on experience with Kubernetes and microservices at scale.
  • Proven skills in Infrastructure as Code (Terraform, Ansible, Chef, or Puppet).
  • Solid background in observability platforms (Prometheus, Grafana, OpenTelemetry, ELK, Datadog, etc.).
  • Proficiency in Python or Go for automation and tooling.
  • Deep knowledge of distributed systems, networking, and high-availability design patterns.

Nice-to-Haves

  • Professional cloud certifications.
  • Knowledge of service mesh technologies.
  • DevSecOps/security best practices.
  • Experience leading large-scale tech transformations.

If this sounds like something you’d thrive in—or even if you’re just curious—I’d love to chat and tell you more.

Cavendish (Recruitment) Professionals Ltd are proud to be an equal opportunity employer and we believe that inclusivity begins with the candidate experience. All qualified applicants will receive consideration for employment regardless of gender, race, age, sexual orientation, religion, or belief.

Seniority level

  • Mid-Senior level

Employment type

  • Full-time

Job function

  • Information Technology

Industries

  • IT Services and IT Consulting

NOTE: Please mention Fuchsjobs as the source of your application.

Job information

  • Publication date: 02 Jan 2026
  • Location: Work from home
  • Type: Full-time
  • Work model: On-site
  • Category:
  • Experience: 2+ years
  • Employment status: Employee
