Project Overview
    We are developing high-quality training and evaluation datasets to improve how Large Language Models (LLMs) perform on real software engineering problems. The core of this project involves identifying and curating verifiable coding tasks from public GitHub repositories, supported by a human-in-the-loop review process.
    As a contractor on this project, you will review code written by AI to solve real software tasks. Your feedback will help improve how future AI models learn to write and understand code.
    Key Responsibilities

        - Review and compare 3–4 model-generated code responses for each task using a structured ranking framework
        - Assess code changes (diffs) for correctness, quality, readability, and performance
        - Provide clear, concise explanations for your ranking decisions
        - Maintain consistency and fairness across all evaluations
        - Identify and document edge cases or unusual model behavior
        - Collaborate with the team to identify and implement improvements to the evaluation process

    Required Qualifications

    Professional Experience

        - Several years of professional experience as a software engineer (data science experience will not be considered)
        - Minimum 2 years of experience as a full-stack engineer at a leading tech product company (e.g., Google, Shopify, Microsoft, Snowflake, Meta, PayPal)

    Note: Experience from contractual or part-time roles will not be considered. Only full-time employment qualifies.

    Technical Skills

        - Strong understanding of software design, debugging, and engineering best practices
        - Familiarity with code review processes and version control systems
        - Ability to analyze and compare real-world code changes
        - Excellent written communication skills for clearly explaining technical evaluations

    Preferred Qualifications

        - Degree from a top-ranked university
        - Experience working with LLM-generated code or AI evaluation projects
        - Background in developer tools or systems automation
        - Exposure to AI research or developer agents

    Engagement Details

        - Type: Contract (no medical benefits or paid leave)
        - Commitment: 10–20 hours/week, with flexible work hours (partial overlap with Pacific Time)
        - Duration: 1 month (starting next week, with potential for extension)
        - Compensation: $50–$150/hour, based on experience, geography, and skill level

    About Turing

    Turing is one of the world’s fastest-growing AI companies, pushing the boundaries of AI-assisted software development. Our mission is to empower the next generation of AI systems to reason about and work with real-world software repositories. You’ll be working at the intersection of software engineering, open-source ecosystems, and frontier AI.
    If you have strong software engineering expertise and are interested in helping shape the next generation of AI-assisted development tools, we encourage you to apply.