Hyphen Partners
LLM Engineer
Job Description:
    LLM Engineer Job Description
    About the Company
    We specialize in creating conversational AI experiences for roleplay, gaming, social interactions, and creative writing.

    As our LLM Engineer, you'll fine-tune and optimize large language models that power conversations for over 30 million users, processing more than 5 million messages daily. You'll be at the forefront of developing AI companionship technology that scales globally while maintaining personalized and meaningful interactions.

    Key Responsibilities
    • Interact with stakeholders (Co-founders, Web Engineers, DevOps Engineers) to bring your project to life.
    • Oversee the creation and optimization of algorithms for LLM behavior adjustments based on user interactions, focusing on fine-tuning and prompt engineering.
    • Develop features that enrich the product (multi-character chats, gamification, etc.)
    • Collaborate with the team members who own other modalities (audio, image, video) and integrate those modalities alongside chat
    • Adapt and fine-tune base models for multilingual support
    • Manage the creation and maintenance of diverse datasets critical for training and improving the performance of LLMs.
    • Assess and determine the best technological approaches, selecting between classifiers, fine-tuning, and other methods based on the specific project's needs.

    Your Qualifications
    • Python Mastery: 5+ years building production-grade, modular, maintainable codebases
    • LLM Architecture Expertise: Deep understanding of transformers and their training dynamics (attention, positional encodings, samplers, tokenizers, post-training, reasoning LMs)
    • Inference Optimization at Scale: Expert with vLLM / TensorRT-LLM (or similar); proven record of reducing latency and memory via quantization and/or distillation
    • Distributed Training: Hands-on multi-GPU / multi-node fine-tuning using FSDP, DeepSpeed, or accelerate; comfortable with mixed-precision, gradient checkpointing, and memory-aware scheduling
    • Performance Profiling & Optimization: Skilled at identifying and resolving compute or memory bottlenecks across CPU/GPU pipelines with industry-standard profiling workflows

    Nice-to-Haves
    • Concurrency & Runtime Engineering: Strong with asyncio, multiprocessing, or equivalent backend/batch-scheduling patterns
    • Low-level Systems: Practical CUDA / Triton experience; able to write or debug custom kernels
    • Open-Source Impact: Contributor to core LLM tooling (vLLM, HF Transformers, Triton, etc.)
    • Real-time Deployments: Built or maintained latency-critical, multi-user LLM services (RAG, streaming, agents, chatbots)
    • Specialized Generation Use Cases: Exposure to erotic roleplay, multi-turn instruction tuning, or non-English quality alignment
NOTE:
Please mention Fuchsjobs as the source of your application.
Job Details
  • Type: Full-time
  • Work model: Remote
  • Category: Development & IT
  • Experience level: Experienced
  • Employment type: Salaried employee
  • Date posted: 21 Aug 2025
  • Location: Remote