Jobgether

Machine Learning Systems Engineer (Remote - EU)

Jobgether WorkFromHome

Job description:

This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Machine Learning Systems Engineer in the European Union.

We are seeking a talented Machine Learning Systems Engineer to join a remote-first, globally distributed team working on cutting-edge AI infrastructure. In this role, you will contribute to the development of large-scale language model systems, focusing on high-performance training, inference, and self-improving AI agents. You will work at the intersection of machine learning research, distributed systems, and high-performance computing, building tools and frameworks that enable researchers and organizations worldwide to deploy advanced AI solutions. This role offers the chance to work on technically demanding, open-source projects while collaborating with a passionate international team. Your work will have a direct impact on the future of scalable AI systems.

Accountabilities

  • Contribute to the development and optimization of large-scale language model frameworks.
  • Implement high-performance distributed training algorithms using frameworks such as Megatron-LM, DeepSpeed, and vLLM.
  • Develop and optimize inference engines and tools for model deployment, fine-tuning, and AI agent self-improvement.
  • Integrate diverse machine learning ecosystems including HuggingFace and other LLM tools.
  • Optimize performance across multi-GPU, multi-node architectures, leveraging HPC and CUDA/ROCm programming.
  • Collaborate with the open-source community to enhance the codebase, implement features, and resolve issues.
  • Research and implement advanced techniques for self-improving AI agents and high-efficiency ML pipelines.

Requirements

  • 3+ years of experience in machine learning engineering or research.
  • Proficiency in Python and C/C++, with strong systems programming skills.
  • Deep understanding of high-performance computing concepts, including MPI, BSP, and distributed multi-GPU training.
  • Solid experience with transformer architectures, gradient descent, backpropagation, and deep learning training.
  • Familiarity with distributed training strategies: data parallelism, model parallelism, pipeline parallelism.
  • Experience with containerization (Docker, Kubernetes) and cluster orchestration.
  • Demonstrated experience with ML frameworks like vLLM, Megatron-LM, HuggingFace, or similar.
  • Commitment to open-source development and community collaboration.
  • Excellent problem-solving, debugging, and performance optimization skills.
  • Bonus: an advanced degree (MS/PhD), experience with SLURM, mixed-precision training, or MLOps, or prior contributions to major open-source ML projects.

Benefits

  • Competitive compensation including salary and equity participation.
  • Fully remote, work-from-anywhere flexibility.
  • Comprehensive global benefits including mental health support.
  • Open PTO policy and flexible working hours.
  • Paid parental leave and support for personal well-being.
  • Opportunities for continuous learning and professional development.
  • Regular team offsites, virtual events, and global gatherings to foster team collaboration.
  • Inclusive, transparent, and supportive culture prioritizing growth and knowledge-sharing.


Job information

  • Type: Full-time
  • Work model: On-site
  • Category:
  • Experience: 2+ years
  • Employment status: Employee
  • Publication date: 06 Nov 2025
  • Location: WorkFromHome
