ilert

Site Reliability Engineer (f/m/x)

Job description:

ilert Cologne, North Rhine-Westphalia, Germany

Location: Hybrid, Cologne (Rheinauhafen): 3 days in the office, 2 remote (Tue + Thu)

Team: Engineering · Reports to CTO

Keep the world awake — build reliability at scale

ilert helps thousands of DevOps & IT teams detect, fix, and communicate incidents faster.

Our platform is mission-critical: customers rely on us 24/7 to keep their always-on businesses running.

As a Site Reliability Engineer at ilert, you’ll own the reliability, performance, and scalability of our core platform across AWS, Kubernetes, Kafka, and more.

Responsibilities

  • Run and evolve our AWS-based infrastructure
  • Operate and optimize our self-managed Kafka and ClickHouse clusters and our observability stack
  • Ensure resilience, disaster recovery, and capacity planning across the stack

Improve reliability & performance

  • Build and maintain SLOs, SLIs, error budgets, and observability dashboards
  • Debug production issues across layers (networking, Kubernetes, application, DB)
  • Improve performance of our ingestion pipeline

Automation & tooling

  • Automate operations with Terraform, Helm, Kubernetes operators, and internal tooling
  • Build tooling for safer deploys, blue/green rollouts, and automated verification
  • Strengthen incident response workflows through deep collaboration with our AI SRE agent team

Security & compliance

  • Implement best practices for workload isolation, secrets management, IAM, and auditability
  • Support our ISO 27001 posture by automating controls and hardening our infrastructure

Cross-functional impact

  • Partner with Backend, AI, and Product teams to design reliable services
  • Participate in on-call rotation
  • Lead post-incident reviews and drive long-term reliability improvements

Requirements

  • 3+ years of experience as an SRE, Platform Engineer, DevOps Engineer, or Infrastructure Engineer
  • Strong hands-on experience with AWS, Kubernetes, Linux internals, networking, and performance tuning
  • Experience operating self-managed distributed systems, ideally Kafka or ClickHouse
  • Strong understanding of observability
  • Experience automating infrastructure with Terraform and CI/CD systems
  • Fluent English (our working language); German optional

Benefits

Job information

  • Publication date: 15 Dec 2025
  • Location: Work from home
  • Type: Full-time
  • Work model: On-site
  • Category:
  • Experience: 2+ years
  • Employment status: Employee
