Hyphen Partners
LLM Engineer
Job Description:
    LLM Engineer Job Description
    About the Company
    We specialize in creating conversational AI experiences for roleplay, gaming, social interactions, and creative writing.

    As our LLM Engineer, you'll fine-tune and optimize large language models that power conversations for over 30 million users, processing more than 5 million messages daily. You'll be at the forefront of developing AI companionship technology that scales globally while maintaining personalized and meaningful interactions.

    Key Responsibilities
    • Interact with stakeholders (Co-founders, Web Engineers, DevOps Engineers) to bring your project to life.
    • Oversee the creation and optimization of algorithms for LLM behavior adjustments based on user interactions, focusing on fine-tuning and prompt engineering.
    • Develop features that enrich the product (multi-character chats, gamification, etc.)
    • Collaborate with the team members who own other modalities (audio, image, video) and integrate those modalities alongside chat
    • Adapt and fine-tune base models for multilingual support
    • Manage the creation and maintenance of diverse datasets critical for training and improving the performance of LLMs.
    • Assess and determine the best technological approaches, selecting between classifiers, fine-tuning, and other methods based on the specific project's needs.

    Your Qualifications
    • Python Mastery: 5+ years building production-grade, modular, maintainable codebases
    • LLM Architecture Expertise: Deep understanding of transformers and their training dynamics (attention, positional encodings, samplers, tokenizers, post-training, reasoning LMs)
    • Inference Optimization at Scale: Expert with vLLM / TensorRT-LLM (or similar); proven record of reducing latency and memory via quantization and/or distillation
    • Distributed Training: Hands-on multi-GPU / multi-node fine-tuning using FSDP, DeepSpeed, or accelerate; comfortable with mixed-precision, gradient checkpointing, and memory-aware scheduling
    • Performance Profiling & Optimization: Skilled at identifying and resolving compute or memory bottlenecks across CPU/GPU pipelines with industry-standard profiling workflows

    Nice-to-Haves
    • Concurrency & Runtime Engineering: Strong with asyncio, multiprocessing, or equivalent backend/batch-scheduling patterns
    • Low-level Systems: Practical CUDA / Triton experience; able to write or debug custom kernels
    • Open-Source Impact: Contributor to core LLM tooling (vLLM, HF Transformers, Triton, etc.)
    • Real-time Deployments: Built or maintained latency-critical, multi-user LLM services (RAG, streaming, agents, chatbots)
    • Specialized Generation Use Cases: Exposure to erotic roleplay, multi-turn instruction tuning, or non-English quality alignment
NOTE:
Please mention Fuchsjobs as the source of your application.
Job Details
  • Type: Full-time
  • Work model: Remote
  • Category: Development & IT
  • Experience level: Experienced
  • Employment type: Salaried employee
  • Date posted: 21 Aug 2025
  • Location: Remote