LLM Inference Engineer

Job description:

We’re building the world’s largest AI companionship platform: 30M+ users in just 18 months, with 5M+ daily messages. To keep scaling, we need an LLM Inference Engineer who can optimize our vLLM stack and make a direct impact on millions of conversations every day.

Responsibilities:

Optimize LLM inference pipelines (vLLM, TensorRT-LLM, custom techniques) to reduce latency and memory usage.
Own fine-tuning and prompt engineering, driving improvements in conversation quality and personalization.
Collaborate with co-founders, web engineers, and DevOps to scale global AI chat experiences.
Architect and maintain distributed training systems (FSDP, DeepSpeed, accelerate) for multi-GPU optimization.
Lead multilingual model adaptation projects from dataset creation to production deployment.
Profile and debug compute/memory bottlenecks to improve system performance and lower infra costs.

Qualifications and Technical Skills:

2+ years of hands-on experience in LLM inference optimization at scale.
Proven track record with inference systems serving 1M+ daily messages.
Open-source impact (contributor to vLLM, HF Transformers, Triton, etc.).
Strong CS/Engineering background (degree or equivalent experience).

LLM Architecture Expertise: Transformers, attention, positional encodings, tokenizers, reasoning LMs.
Inference Optimization: Quantization, distillation, latency/memory reduction.
Distributed Training: Multi-GPU/multi-node with FSDP, DeepSpeed, accelerate.
Low-level Systems: CUDA / Triton, custom kernel optimization.

Why join us:

Scale that matters: Work on inference pipelines powering 30M+ users and 5M+ daily messages.
Direct impact: Work closely with co-founders and a world-class engineering team.
Competitive package: €70–100K, fully remote, streamlined 4-step hiring process.
High-impact role: Rare chance to shape inference systems for tens of millions of users.

💡 Please note: this describes our ideal candidate, but if you’re excited about the role and feel you could grow into it, we’d love to hear from you!

NOTE:

Please mention Fuchsjobs as the source of your application.

Job information

Type: Full-time
Work model: Remote
Category: Development & IT
Experience: Experienced
Employment relationship: Employee
Publication date: 23 Aug 2025
Location: Germany
