DevOps Engineer – Docker / Server Infrastructure / Clustering / Monitoring
Job Description:
    DevOps Engineer Job Description
    Project Overview

    Our company operates a multi-server infrastructure consisting of several Mac Mini M4 units within a local network. These servers run Docker-based deployments hosting a variety of services, including n8n, NocoDB, Postgres, and others, with NGINX routing traffic to multiple domains over SSL.

    The current setup is stable and supports production workloads effectively. However, as our operational requirements evolve, we are now focused on scalability, high availability, and operational transparency. Our goal is to optimize workload distribution, implement robust monitoring and alerting, strengthen security, and establish reliable backup and disaster recovery strategies.

    We are seeking an experienced DevOps Engineer to design and implement a secure, maintainable, and scalable architecture that ensures long-term operational excellence.

    Objectives

    • Scalable Architecture: Establish a robust server cluster (Docker Swarm, Kubernetes, or other suitable orchestration) to allow flexible distribution of services.
    • Monitoring & Observability: Deploy dashboards and alerts for CPU, memory, disk usage, and service health, accessible to both technical and non-technical stakeholders.
    • Load Balancing & High Availability: Distribute workloads effectively across servers and implement failover mechanisms to minimize downtime.
    • Security & Compliance: Harden externally accessible services against attacks and ensure that all routing is secured with SSL.
    • Backup & Disaster Recovery: Implement regular, automated backups for data, configurations, and containers, with a clear restoration process.
    • Operational Support & Knowledge Transfer: Provide documentation, training, and ongoing consultation for our internal team.
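
    To make the Monitoring & Observability objective more concrete, the following is a minimal sketch of the kind of threshold check an alerting setup might perform on CPU, memory, and disk usage. The threshold values, the `Alert` type, the `evaluate` function, and the host name are all hypothetical and not part of the posting; a real deployment would more likely express such rules in the chosen monitoring tool.

```python
from dataclasses import dataclass

# Hypothetical alert limits in percent; assumptions for illustration only.
THRESHOLDS = {"cpu": 85.0, "memory": 90.0, "disk": 80.0}


@dataclass
class Alert:
    host: str
    metric: str
    value: float
    limit: float


def evaluate(host, metrics, thresholds=THRESHOLDS):
    """Return one Alert per metric whose value strictly exceeds its limit."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        # Metrics that were not reported are skipped rather than treated as failures.
        if value is not None and value > limit:
            alerts.append(Alert(host, name, value, limit))
    return alerts
```

    For example, calling `evaluate("mini-1", {"cpu": 91.0, "memory": 40.0, "disk": 80.0})` would flag only the CPU reading, since the disk value is at the limit but does not exceed it.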

    Responsibilities

    1. Infrastructure Audit
      • Review the existing Docker and NGINX configuration, networking setup, and server utilization.
      • Identify areas for optimization and potential risks.
    2. Architecture Design & Implementation
      • Propose and implement a clustering and orchestration solution that fits our operational requirements.
      • Ensure seamless integration of NGINX routing or recommend improved alternatives.
    3. Monitoring & Alerting
      • Set up a centralized monitoring solution (e.g., Prometheus, Grafana, Portainer, or similar).
      • Configure performance tracking and alerts for proactive issue detection.
    4. Load Balancing & Resilience
      • Implement load distribution strategies across services and servers.
      • Ensure redundancy and failover mechanisms are in place.
    5. Security & Data Protection
      • Apply best practices for securing externally accessible services.
      • Design and implement an automated backup system with tested recovery procedures.
    6. Training & Support
      • Provide training for in-house staff on system operation and maintenance.
      • Offer ongoing support for troubleshooting and emergency interventions.

    Requirements

    • Extensive experience with Docker and container orchestration platforms (Docker Swarm, Kubernetes, or similar).
    • Strong knowledge of server clustering, networking, and load balancing strategies.
    • Hands-on experience with monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, Portainer).
    • Practical experience designing backup and disaster recovery systems.
    • Proficiency with macOS server administration; familiarity with Linux systems is an advantage.
    • Skilled in NGINX (or alternatives) configuration and SSL certificate management in containerized environments.
    • Clear and effective communication in English, with the ability to explain technical concepts to non-technical team members.
    • Availability for both planned project work and occasional urgent support requests.

    We are looking to build a long-term working relationship beyond the initial implementation.
Job Information
  • Type:

    Full-time
  • Work model:

    Remote
  • Category:

    Development & IT
  • Experience:

    Experienced
  • Employment type:

    Freelance
  • Publication date:

    19 Aug 2025