Aleph Alpha

AI Inference Engineer - Large Language Models (f/m/d)

Aleph Alpha Heidelberg

Job Description:

Overview

You will join our product team in a position that sits at the intersection of artificial intelligence research and real-world solutions. We foster a highly collaborative work culture: you can expect to work closely with your teammates and communicate across teams through practices such as pair and mob programming.

Your Responsibilities

  • Model Inference: Focus on inference optimization to ensure rapid response times and efficient resource utilization during real-time model interactions.
  • Hardware Optimization: Run models on various hardware platforms, from high-performance GPUs to edge devices, ensuring optimal compatibility and performance.
  • Experimentation and Testing: Regularly run experiments, analyze outcomes, and refine the strategies to achieve peak performance in varying deployment scenarios.
  • Research: Stay up to date with the current MLSys literature.

Your Profile

  • You care about making something people want. You want to ship products that bring value to our users, delivering AI solutions end-to-end rather than stopping at a prototype.
  • Bachelor's degree or higher in computer science or a related field.
  • You understand how multimodal transformers work.
  • You understand the characteristics of LLM inference (KV caching, flash attention, and model parallelization).
  • You have hands-on experience with large language models or other complex AI architectures.
  • You have experience in system design and optimization, particularly within AI or deep learning contexts.
  • You are proficient in Python and have a deep understanding of deep learning frameworks such as PyTorch.
  • You have a deep understanding of the challenges of scaling AI models to large user bases.
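As an aside on the inference characteristics named above: KV caching avoids recomputing attention keys and values for already-generated tokens by appending each new token's key/value to a cache. This is a minimal PyTorch sketch of the idea (not Aleph Alpha's implementation; the projections are stand-ins):

```python
import torch

def attend(q, k_cache, v_cache):
    # Scaled dot-product attention of the current query over all cached positions.
    scores = q @ k_cache.transpose(-2, -1) / k_cache.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v_cache

d = 8
k_cache = torch.empty(0, d)  # grows by one row per generated token
v_cache = torch.empty(0, d)

for step in range(4):
    x = torch.randn(1, d)             # current token's hidden state
    q, k, v = x, x, x                 # stand-in for learned Q/K/V projections
    k_cache = torch.cat([k_cache, k]) # append instead of recomputing the prefix
    v_cache = torch.cat([v_cache, v])
    out = attend(q, k_cache, v_cache)
```

Each decode step thus costs attention over the cached prefix only, which is why cache memory (and its layout across devices) dominates LLM serving.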

Nice If You Have

  • Previous experience in a high-growth tech environment or a role focused on scaling AI solutions.
  • Expertise with CUDA and Triton programming and GPU optimization for neural network inference.
  • Experience with Rust.
  • Experience in adapting AI models to suit a range of hardware, including different accelerators.
  • Experience in model quantization, pruning, and other neural network optimization methodologies.
  • A track record of contributions to open-source projects (please provide links).
  • An active Twitter presence discussing MLSys topics.

What You Can Expect From Us

  • Become part of an AI revolution!
  • 30 days of paid vacation
  • Access to a variety of fitness & wellness offerings via Wellhub
  • Mental health support through nilo.health
  • Substantially subsidized company pension plan for your future security
  • Subsidized Germany-wide transportation ticket
  • Budget for additional technical equipment
  • Flexible working hours for better work-life balance and hybrid working model
  • Virtual Stock Option Plan
  • JobRad® Bike Lease
NOTE: Please mention Fuchsjobs as the source of your application.

Job Information

  • Type: Full-time
  • Work model: Hybrid
  • Category: Development & IT
  • Experience level: Experienced
  • Employment type: Permanent
  • Published: 18 Aug 2025
  • Location: Heidelberg
