Senior Site Reliability Engineer - Neocloud Provider

Job description:

Do you want to join a leading next-generation AI cloud provider as a Senior Site Reliability Engineer?

You will be joining a Neocloud that is building one of the most advanced GPU and high-performance computing platforms in Europe.

The role offers the chance to help design and maintain the reliability, scale and performance of a growing cloud platform with real engineering challenges.

You will collaborate with highly skilled teams across software, hardware, networking, and AI infrastructure, with the autonomy to influence technical direction and build systems that support large-scale compute workloads.

If you are interested in this opportunity and want to learn more, get in touch today.

Responsibilities

  • Architect and maintain reliable, fault-tolerant, large-scale distributed systems for high-performance GPU and compute workloads.
  • Build and automate deployment, failover, monitoring, capacity planning, and incident-response workflows.
  • Develop, optimise, and maintain CI/CD pipelines to enable safe, rapid, and repeatable software delivery.
  • Drive incident response and root-cause analysis while improving system observability, performance, and long-term stability.
  • Partner with backend, hardware, and networking teams to optimise service performance, support regional expansion, scale compute clusters, and participate in on-call rotations.

Required Skills & Experience

  • Strong Linux debugging expertise, including network and system-call tracing.
  • Proficiency with Terraform and Kubernetes (network policies, scheduling, taints/tolerations).
  • Experience with Slurm job monitoring and core configuration.
  • Solid Python or Go skills, covering async/error handling, environment management, and common system/HTTP tooling.
  • Ability to automate workflows and troubleshoot distributed systems using CLI tools, logs, and scripting.

Salary & Benefits

  • Up to €130,000 gross per year
  • Bonus scheme
  • Company share scheme


Job information

  • Published: 04 Jan 2026
  • Location: Work from home
  • Type: Full-time
  • Work model: On-site
  • Category:
  • Experience: 2+ years
  • Employment status: Employee
