Research Engineer (LLM Training and Performance)

Job description:

Amsterdam, Netherlands; Berlin, Germany; Limassol, Cyprus; London, United Kingdom; Munich, Germany; Paphos, Cyprus; Prague, Czech Republic; Warsaw, Poland; Yerevan, Armenia

At JetBrains, code is our passion. Ever since we started back in 2000, we have been striving to make the world’s most robust and effective developer tools. By automating routine checks and corrections, our tools speed up production, freeing developers to grow, discover, and create.

The JetBrains AI team is focused on bringing advanced AI capabilities to JetBrains products, which includes supporting the internal AI platform used across JetBrains and conducting long‑term R&D in AI and machine learning. We collaborate closely with product teams to brainstorm and prioritize AI‑driven features, as well as support product marketing and release planning. Our team includes about 50 people working on everything from classical ML algorithms and code completion to agents, retrieval‑augmented generation, and more.

We’re looking to strengthen our team with an AI Evaluation Lead who will help define and execute our strategy for evaluating AI‑powered features and LLMs. In this role, you will be instrumental in ensuring our models deliver meaningful value to users by shaping evaluation pipelines, influencing model development, collaborating with product and research teams across the company, and publishing your work to open source.

We value engineers who:

  • Plan their work and make decisions independently, consulting with others if needed.
  • Follow the latest advances in AI and ML fields, think long‑term, and take ownership of their scope of work.
  • Prefer simplicity, opting for sound, robust, and efficient solutions.

In this role, you will:

  • Design and develop rigorous offline and online evaluation benchmarks for AI features and LLMs.
  • Manage the team, prioritize tasks, and mentor teammates.
  • Define evaluation methodology and benchmarks for our open‑source models and public releases.
  • Communicate your findings and best practices across the organization.
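To make the first responsibility concrete, an offline evaluation benchmark at its simplest runs a model over a fixed dataset and aggregates a metric. The sketch below is purely illustrative (the `model_fn` callable, the toy dataset, and the exact-match metric are hypothetical, not JetBrains infrastructure):

```python
def exact_match(prediction: str, reference: str) -> bool:
    """Case- and whitespace-insensitive exact-match metric."""
    return prediction.strip().lower() == reference.strip().lower()

def evaluate(model_fn, dataset):
    """Run model_fn over (prompt, reference) pairs and report accuracy."""
    correct = sum(
        exact_match(model_fn(prompt), reference)
        for prompt, reference in dataset
    )
    return correct / len(dataset)

# Toy usage: a canned lookup standing in for a real model.
dataset = [("2+2=", "4"), ("Capital of France?", "Paris")]
model = lambda prompt: {"2+2=": "4", "Capital of France?": "paris"}[prompt]
print(evaluate(model, dataset))  # 1.0
```

Real pipelines replace the metric with task-appropriate scoring (e.g. pass@k for code, LLM-as-judge for open-ended text) and add statistical reporting, but the dataset-in, score-out structure stays the same.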

We’ll be happy to have you on our team if you have:

  • A strong understanding of statistics and data analysis.
  • Excellent management and communication skills.
  • Solid practical experience with Python and evaluation frameworks.
  • Attention to detail in everything you do.

We’d be especially thrilled if you have experience with:

  • Preparing public evaluation reports for feature or model releases.
  • Managing data annotation efforts, including crowdsourcing and in‑house labeling.
  • CI systems, workflow automation, and experiment tracking.
  • The Kotlin programming language.
  • Weights & Biases and Langfuse for experiment tracking and reporting.
  • ZenML for ML workflow automation.
  • AWS and GCP for infrastructure.
  • Git for source code management.
  • TeamCity for continuous integration.

NOTE: Please mention Fuchsjobs as the source of your application.

Job details

  • Type: Full-time
  • Work model: On-site
  • Category:
  • Experience: 2+ years
  • Employment: Employed
  • Published: 24 Nov 2025
  • Location: Berlin
