At Toloka AI we create data that powers leading GenAI models and innovations. We work with frontier labs, big tech, renowned AI startups, enterprises and non-profit research organizations worldwide. We use a combination of Experts + Crowd + Tech Platform to teach AI models to reason and evaluate their efficacy and safety. We have experts in more than 50 different domains, from doctors and lawyers to physicists and engineers, and boast one of the most diverse global crowds, representing over 100 countries and speaking 40+ languages. We are a well-funded startup with an enviable portfolio of clients including Anthropic, Amazon, Microsoft, poolside, Recraft, and Shopify.
Recently, we secured strategic investment led by Bezos Expeditions with participation from Mikhail Parakhin, CTO of Shopify and board advisor to leading GenAI companies, who now serves as our Chairman of the Board. Our remote-first team is distributed around the world: USA, UK, the Netherlands, Israel, Czech Republic, Serbia, and more. We are headquartered in Amsterdam.
We are looking for a Freelance Agent Evaluation Analyst to take ownership of quality, structure, and insight across the project. This role goes far beyond task-checking: it’s about critical thinking, systems-level analysis, and ensuring clarity, reliability, and consistency at scale.
You’ll work as both a hands-on evaluator and an analyst, collaborating with domain experts, delivery managers, and engineers. Beyond reviewing outputs, you’ll be expected to understand the “why” behind the work, identify logical gaps or inconsistencies, and propose meaningful improvements.
This is a flexible, impact-driven role where you’ll have space to grow, contribute ideas, and help shape how evaluation and quality are scaled across the project.
This role is especially well-suited for:
Analysts, researchers, or consultants with strong structuring and reasoning skills
Junior product managers or strategists curious about AI and evaluation work
Smart problem-solvers (students or early-career professionals) who enjoy digging into logic, systems, and edge cases
You do not need a coding background. What matters most is curiosity, intellectual rigor, and the ability to evaluate complex setups with precision.
Type: Full-time
Work model: Remote
Category: Development & IT
Experience: Experienced
Employment type: Employee
Publication date: 01 Nov 2025
Location: EMEA