TEKsystems

Operations Expertise - Compute & OS (All Genders)

Job Description:

Description

Local Operations manages the on-premises production platform, which serves as the primary host for all mission‑critical business applications. Local Operations is responsible for the following core areas:

  • Platform Stability: Ensuring the high availability and performance of the on‑premises private cloud environment.
  • Application Hosting: Consulting on the seamless operation of Germany‑specific production business applications.
  • Incident Management: Resolving technical issues within standard business hours to minimize operational downtime.
  • Lifecycle Maintenance: Executing routine updates, patches, and system optimizations within the local infrastructure.

Provide Tier‑3 operational ownership for Compute & Operating System services for Local Production (DE).

Tasks

  • Handling complex incidents, deep troubleshooting, and root cause analysis; driving permanent fixes and preventive measures.
  • Ensuring compute/OS readiness for releases/changes, including monitoring/alerting coverage, performance baselines, hardening, patch strategy, rollback and recovery procedures, and runbooks.
  • Executing and improving standard operational procedures through automation (reduce toil, improve MTTR and stability).
  • Technical coordination with Kubernetes/Data and Network/Storage SMEs to resolve cross‑domain production issues.
  • Handling of operational readiness: validating deployment artifacts from an operations perspective, defining and enforcing quality assurance measures, ensuring rollback strategies and operational monitoring are in place for production deployments.
  • Monitoring, incident, problem, and change management: monitoring system health, performance metrics, and service availability across multi‑tenant environments; identifying, analyzing, and resolving incidents, minimizing service disruption; triggering root cause analysis and implementing corrective and preventive actions.
  • Automation of operationally critical standard processes: addressing operational issues by automating standard remediation procedures, and validating all automated procedures through the established software development life cycle, including staging, testing, and validation reviews.
  • Security and compliance enforcement: implementing monitoring and logging strategies to support audit and compliance requirements; performing routine security scans and remediating identified vulnerabilities.

Profile Requirements

  • 5‑10+ years in IT operations / service delivery / platform operations with demonstrated leadership in mission‑critical environments.
  • Proven experience implementing/leading Incident, Problem, Change, and Release governance in production.
  • Expertise with ITSM: Jira Service Management (JSM), Jira, Confluence.
  • Experience in core operations processes (incident management, change management, problem management, IT Service Management) as well as SRE concepts.
  • Experience gathering operational insights from monitoring or observability data, including SLI/SLO/SLA management and tracking.
  • Hands‑on experience documenting procedures properly and enforcing clear runbooks or playbooks.
  • Hands‑on observability experience with monitoring and logging tools (e.g., Prometheus, Grafana, Datadog, Mimir, Loki).
  • Familiarity with enterprise DevOps toolchains is a plus (GitLab, JFrog Artifactory, Backstage, Harness).
  • Spoken and written proficiency in English (at least C1) and German (at least C1).
  • Experience operating in regulated / high‑availability industries (banking, telco, public sector, healthcare).
  • Experience with SRE practices (SLOs/SLIs, error budgets) and reliability management.

Skills

  • Linux, Unix
  • 3‑tier architecture
  • Hybrid cloud
  • Prometheus
  • Grafana
  • Loki

Job Details

Job Title: Operations Expertise - Compute & OS (All Genders)

Location: Frankfurt am Main, Germany

Job Type: Contract

Job Information

  • Publication date: 17 Apr 2026
  • Location: Frankfurt
  • Type: Full-time
  • Work model: On-site
  • Experience: 2+ years
  • Employment relationship: Employee
