Aleph Alpha Research’s mission is to deliver AI innovation that enables open, accessible, and trustworthy deployment of GenAI in enterprise applications. Our organization develops foundational models and next-generation methods that make it easy and affordable for Aleph Alpha’s customers to increase productivity in finance, administration, R&D, logistics, and manufacturing processes. We do this with a flat hierarchy and IC-driven culture: ideas come from the bottom up, and it’s our shared responsibility to deliver impactful research.
We’re looking for skilled Software Engineers to join our research team, headquartered in Heidelberg, with a focus on evaluating the capabilities, safety, and trustworthiness of our models. While we highly value in-person work, we offer flexibility to work from Berlin or elsewhere in Germany, with regular travel to onsite events.
As an AI Software Engineer in Model Evaluation, you will help design, implement, and scale the systems that measure our models’ performance at the cutting edge. You will work closely with researchers to create evaluation benchmarks, datasets, and environments that test model capabilities, safety, and reliability across tasks from multilingual understanding to mathematical reasoning and creativity.
You will own significant portions of our evaluation infrastructure, including dataset generation pipelines, automated benchmarking tools, analysis dashboards, and large-scale evaluation orchestration on our compute clusters. You’ll be building tools and experiments that drive product decisions, shape research priorities, and guide responsible deployment of our models.
This is high-scale, high-impact engineering: you’ll work with petabyte-scale data, run evaluations across large-scale distributed GPU clusters, and deliver insights that inform the direction of Aleph Alpha’s research.
Our current open-source eval-framework can be found here.
You can expect to contribute to the following areas:
We hire slowly and deliberately. We recognise that we need top talent to deliver top research, and we value ability over experience: if you think you would be a good fit for this role, we’d encourage you to apply even if you do not meet all of the following qualifications.
Basic Qualifications
Preferred Qualifications
We do not require prior experience in AI for this role, but we value eagerness to learn. If you have prior experience in AI, we will be particularly excited about your ability to translate evaluation insights into actionable improvements for models and systems.
We believe embodying these values would make you a great fit in our team:
We own work end-to-end, from idea to production: You take responsibility for every stage of the process, ensuring that our work is complete, scalable, and of the highest quality.
We ship what matters: Your focus is on solving real problems for our customers and the research community. You prioritize delivering impactful solutions that bring value and make a difference.
We work transparently: You collaborate and share your results and insights openly with the team, partners, customers, and the broader community through blog posts, papers, checkpoints, and more.
We innovate by leveraging our intrinsic motivations and talents: We strive for technical depth, balance the ideas and interests of our team with our mission-backwards approach, and draw on the interdisciplinary, diverse perspectives in our team.
Type: Full-time
Work model: On-site
Category:
Experience: 2+ years
Employment status: Employee
Publication date: 27 Nov 2025
Location: WorkFromHome
