LLM Inference Engineer
AI platform Germany
Job description:
    We’re building the world’s largest AI companionship platform — 30M+ users in just 18 months, with 5M+ daily messages. To keep scaling, we need an LLM Inference Engineer who can optimize our vLLM stack and make a direct impact on millions of conversations every day.

    Responsibilities:
    • Optimize LLM inference pipelines (vLLM, TensorRT-LLM, custom techniques) to reduce latency and memory usage.
    • Own fine-tuning and prompt engineering, driving improvements in conversation quality and personalization.
    • Collaborate with co-founders, web engineers, and DevOps to scale global AI chat experiences.
    • Architect and maintain distributed training systems (FSDP, DeepSpeed, accelerate) for multi-GPU optimization.
    • Lead multilingual model adaptation projects from dataset creation to production deployment.
    • Profile and debug compute/memory bottlenecks to improve system performance and lower infra costs.

    Qualifications and Technical Skills:
    • 2+ years of hands-on experience in LLM inference optimization at scale.
    • Proven track record with inference systems serving 1M+ daily messages.
    • Open-source impact (contributor to vLLM, HF Transformers, Triton, etc.).
    • Strong CS/Engineering background (degree or equivalent experience).
    • LLM Architecture Expertise: Transformers, attention, positional encodings, tokenizers, reasoning LMs.
    • Inference Optimization: Quantization, distillation, latency/memory reduction.
    • Distributed Training: Multi-GPU/multi-node with FSDP, DeepSpeed, accelerate.
    • Low-level Systems: CUDA / Triton, custom kernel optimization.

    Why join us:
    • Scale that matters: Work on inference pipelines powering 30M+ users and 5M+ daily messages.
    • Direct impact: Work closely with co-founders and a world-class engineering team.
    • Competitive package: €70–100K, fully remote, streamlined 4-step process.
    • High-impact role: Rare chance to shape inference systems for tens of millions of users.

    💡 Please note: this describes our ideal candidate, but if you’re excited about the role and feel you could grow into it, we’d love to hear from you!
NOTE:
Please mention Fuchsjobs as the source of your application.
Job information
  • Type: Full-time
  • Work model: Remote
  • Category: Development & IT
  • Experience: Experienced
  • Employment: Employee
  • Published: 23 Aug 2025
  • Location: Germany