Staff Applied AI Engineer

Job description:

Overview

The challenge

Getting AI to perform reliably at scale takes more than a successful proof of concept. Latency that looks fine in staging can break down under real traffic. Models hallucinate at the worst possible moment. Multi-agent systems introduce coordination problems (shared context, task handoffs, graceful recovery) that have no settled playbook yet. You will map these complex workflows and securely connect disparate business systems so the agents have the exact context and tools they need to operate autonomously and reliably. The domain raises the stakes further. A hallucinated output here can become a compliance incident. The systems you build need to be intelligent, auditable, and fail-safe, on a stack that will look different in six months. At this level, the work involves setting technical direction: recognising where current approaches won’t scale and shaping the architecture before the gap becomes a problem.

Why this matters

Deriv's mission is "Trading for anyone, anywhere, anytime." We serve more than 3 million traders, around the clock, across regulatory environments that don't forgive mistakes. At this scale, AI isn't a feature; it's how the business operates. And our Berlin team is building the next generation of agentic systems: co-intelligence that collaborates, debates, and decides alongside humans.

Why Deriv

At Deriv, AI is already running in production across customer ops, compliance, finance, engineering, security, hiring, and internal decision-making, with real impact on speed, accuracy, control, and scale. That includes customer service and operational workflows, AI-driven KYC, sanctions and regulatory review, finance automation like OCR and reconciliations, internal intelligence tools, incident and vulnerability analysis, and recruiting and growth systems. The opportunity here is to build and scale AI systems that matter in a regulated environment, across real business domains, with direct impact on customers, operators, and decision-makers.

Recent work includes: engineering Amy, a support AI agent that reduced median response times from several hours to under 25 seconds for 117,000 Deriv affiliates in 190 countries; replacing a 20-year-old back-office system through AI-driven spec-to-implementation workflows; and testing whether security prompts can make AI-generated code safer before the first line is written. We write about what broke, what we learned, and what’s running in production. See what we’re building.

What You’ll Do

Identify high-leverage agent workflows: Work out the future-state workflows, such as compliance reviews or complex customer routing, where agentic systems can execute tasks far faster or at much higher volumes.
Map complex data flows and human interfaces: Design how structured and unstructured data flows into the agent's context window. Determine exactly where and when human operators need to interface with the agent to ensure safe task handoffs.
Manage evals and model lifecycles: Build robust evaluation frameworks to monitor agents after major underlying model or data changes, tracking operational KPIs to ensure continuous business value.
Architect for scale: Break down ambiguous, function-wide problems into shippable technical architectures. You won't just fix the immediate failure — you'll design the system that makes it less likely to recur.
See work through to adoption: Ship a feature, then stay responsible for it. You'll track how it performs in production, catch failure modes early, and iterate until the system solves the actual business problem.
Raise engineering standards: Review code and design documents to improve how the whole team builds, not just to catch individual errors. Document what you know so it outlasts your memory.
Resolve competing constraints: When engineering, compliance, and ops each want something different, you'll facilitate the conversation and translate those constraints into a coherent architecture. When technical debt and shipping speed are in tension, you'll make the call with the information available.

Who You Are

You have a track record of shipping production AI: Seven or more years of software engineering experience, with at least three years building AI systems that handle real traffic at scale. Systems that went to production, broke in ways you didn't anticipate, and are better for having broken. This is the signal we weight most heavily.
You master the full applied AI stack: You’re fluent in agent orchestration, context engineering (beyond basic RAG), systematic evals/guardrails, and production hardening—while using AI tools (Cursor, Claude Code) to ship complex systems faster. You bridge deep technical architecture with business and process realities in regulated domains.
You've worked across paradigms: Comfortable with SQL and prompt engineering, deterministic systems and LLM-based agents. You pick the right tool for the problem, not the one you're most familiar with.
You design around LLM limitations: You understand what these systems actually are. You build guardrails before you need them, and fallbacks before they're urgent.
You lead through ambiguity: When the problem isn't fully defined, you ask the questions that define it. When the path isn't clear, you find one and bring the team with you.
You think in systems: You see the structural issue behind the symptom and ask whether the organisation is solving the right problem, not just whether the code is correct.

Tech stack

Languages: Python, TypeScript
AI/ML: Model Context Protocol (MCP), tool-calling frameworks, custom skill integrations, OpenAI APIs, Anthropic APIs, LangGraph, and custom ML pipelines
Infrastructure: AWS, PostgreSQL, Redis, Docker, LangFuse, vector databases, graph databases

The team

You'll join the Berlin AI team, working closely with engineering, security, and product across Deriv's global offices. The team operates with significant autonomy and a direct line to production systems.

The honest reality

This is demanding work. You'll make architectural decisions with incomplete information and be accountable for how they hold up under real conditions. You'll need to convince stakeholders to change how they work to accommodate systems you're building — and occasionally you'll be wrong about what the right system is.

The problems are genuinely hard. The people you'll work with have already shipped AI at scale. And you'll build systems that actually run, not ones that live in a slide deck.

Before you apply

This role is open to candidates who already have the right to live and work in Germany. We're not able to sponsor or transfer a work permit for this position.

What To Include In Your CV

Evidence of AI-native development: GitHub repos, deployed projects, or examples of what you've built with Claude Code/Cursor
Brief note: What's the most complex thing you've shipped using AI-assisted development? What broke, and what did you learn?
Bonus: Any experiments with autonomous agents (Claude Agent SDK, OpenClaw, custom setups, etc.). We'd love to see what you've tried.
