Ext-dev-solution-classification
Location Not Available
Job description:

    ext-dev-solution-classification


    General Requirements:
    • Agile development with multiple reviews
    • Collaboration on Teams & GitHub, both to be provided by Frank
    • Azure OpenAI endpoint and key to be provided by Frank
    • By Friday of the first week, the accuracy of the approach must be proven in order to be eligible for payment

    PART 1
    Create a 3-level hierarchy for the existing list of 28K job titles (user personas)
    • 1st level: department names (28) from LinkedIn, incl. id (existing Excel file, without duplicates)
    • 2nd level: occupations from O*NET, incl. id (online list: https://www.onetcenter.org/taxonomy/2019/list.html)
    • 3rd level: job titles from LinkedIn, incl. id (existing Excel file)
    • Generated automatically using an LLM (Azure OpenAI), including an automatic validation step (technically valid, consistent with the broader context, and sensible content-wise)
    • The right approach (multi-step flow, agents, graph, RAG, or a combination) is still to be defined
    • Including code
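A minimal sketch of the "LLM plus automatic validation" step, assuming the Azure OpenAI call is wrapped in a plain callable (the hypothetical `ask_llm`, taking a prompt and returning text). One prompt picks a parent for a job title from a fixed candidate list; a technical check rejects answers outside that list, and a second, independent yes/no prompt checks whether the assignment makes sense. Anything still unresolved returns `None` and is routed to manual review.

```python
from typing import Callable, Optional

# Hypothetical LLM wrapper (e.g. around an Azure OpenAI deployment): prompt -> answer text.
LLM = Callable[[str], str]


def assign_with_validation(
    title: str,
    candidates: list[str],
    ask_llm: LLM,
    max_attempts: int = 2,
) -> Optional[str]:
    """Assign a job title to exactly one candidate parent, or return None for manual review."""
    rejected: list[str] = []
    for _ in range(max_attempts):
        options = [c for c in candidates if c not in rejected]
        if not options:
            break
        # Step 1: classification prompt restricted to the allowed parents.
        answer = ask_llm(
            f"Job title: {title}\n"
            "Choose exactly one parent from this list and answer with it verbatim:\n"
            + "\n".join(options)
        ).strip()
        # Technical validation: the answer must be one of the allowed options.
        if answer not in options:
            continue
        # Step 2: content-wise validation with a separate yes/no prompt.
        verdict = ask_llm(
            f"Does the job title '{title}' belong under '{answer}'? Answer yes or no."
        ).strip().lower()
        if verdict.startswith("yes"):
            return answer
        rejected.append(answer)  # do not offer a rejected parent again
    return None  # unresolved: route to manual review
```

Keeping the validator as a separate call (ideally with a different prompt or model) is what makes the check somewhat independent of the first answer.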

    Deliverables:
    • the machine-readable hierarchy/graph (which can be consumed in LLM-based classification activities)
    • the code to generate it
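One possible machine-readable form of the hierarchy is a nested JSON tree (department, then occupation, then job title) built from flat assignment rows. The sketch below assumes each classified title arrives as one row carrying all three levels with their ids; the `Assignment` type and its field names are hypothetical, and the ids in the example are placeholders, not values from the LinkedIn or O*NET files.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """One classified job title (hypothetical flat output of the LLM step)."""
    dept_id: str
    dept_name: str
    occupation_id: str
    occupation_name: str
    title_id: str
    title_name: str


def build_tree(rows: list[Assignment]) -> list[dict]:
    """Group flat rows into department -> occupation -> job title."""
    depts: dict[str, dict] = {}
    for r in rows:
        dept = depts.setdefault(
            r.dept_id, {"id": r.dept_id, "name": r.dept_name, "occupations": {}}
        )
        occ = dept["occupations"].setdefault(
            r.occupation_id, {"id": r.occupation_id, "name": r.occupation_name, "titles": []}
        )
        occ["titles"].append({"id": r.title_id, "name": r.title_name})
    # Turn the per-department lookup dicts into lists for plain, ordered JSON.
    return [{**d, "occupations": list(d["occupations"].values())} for d in depts.values()]


# Illustrative placeholder ids, not taken from the LinkedIn or O*NET files.
rows = [
    Assignment("dept-1", "Engineering", "occ-1", "Software Developers", "title-1", "Backend Engineer"),
    Assignment("dept-1", "Engineering", "occ-1", "Software Developers", "title-2", "Frontend Developer"),
]
print(json.dumps(build_tree(rows), indent=2))
```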

    Success metrics:
    • the automated quality controls/reports/logging
    • manual reviews
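The automated quality controls could start with purely structural checks before any manual review: every source title placed, none placed twice, no ids that do not exist in the source file (a sign of hallucinated titles), and no empty occupations. A sketch, assuming the tree is a list of departments shaped like `{"id", "name", "occupations": [{"id", "name", "titles": [{"id", "name"}]}]}` (a hypothetical layout), which returns a small report that can be logged per run.

```python
from collections import Counter


def quality_report(tree: list[dict], source_title_ids: set[str]) -> dict:
    """Structural checks on a department -> occupation -> job title tree."""
    # Count how often each job-title id was placed anywhere in the tree.
    placed = Counter(
        title["id"]
        for dept in tree
        for occ in dept["occupations"]
        for title in occ["titles"]
    )
    placed_ids = set(placed)
    return {
        "missing_titles": sorted(source_title_ids - placed_ids),  # never assigned
        "duplicate_titles": sorted(i for i, n in placed.items() if n > 1),
        "unknown_titles": sorted(placed_ids - source_title_ids),  # not in the source file
        "empty_occupations": sorted(
            occ["id"] for dept in tree for occ in dept["occupations"] if not occ["titles"]
        ),
        "coverage": len(source_title_ids & placed_ids) / max(len(source_title_ids), 1),
    }
```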

    PART 2
    Set up a system that provides an API for the classification.
    A short software solution description (= data set) is given as input and is to be classified.

    General

    Priorities
    1. Accuracy
      • take all available information into account, resulting in a reasonable, unbiased result
    2. Cost per request/amount of data
      • less important, as the volume, frequency and amount of data to be classified are fairly limited
    3. Performance
      • not important, as classification probably happens only “once” per solution/update and doesn’t have to be real-time

    Input
    Volume
    • 20K to 220K data sets and their recurring updates
    Format
    • Solution description with or without metadata (title, headline, categories, tags etc.)
    • Plain text (~2000 characters)
    • Could include HTML tags
    Language
    • Any language in the world
    • 1st priority: German and English
    • 2nd priority: European languages
    • 3rd priority: Any other
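Since descriptions may contain HTML tags, a preprocessing step could strip markup and normalize whitespace before the text reaches the model, so tags do not consume tokens or distract the classifier. A minimal sketch using only the standard library's `html.parser`; which tags count as block separators is an assumption.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content, drops tags, and skips <script>/<style> contents."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in ("p", "br", "li", "div"):
            self.parts.append(" ")  # keep block elements from gluing words together

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def clean_description(raw: str) -> str:
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()  # flush any buffered text
    # Collapse runs of whitespace (including non-breaking spaces) left by removed tags.
    return re.sub(r"\s+", " ", "".join(parser.parts)).strip()
```

Plain-text input passes through unchanged apart from whitespace normalization, so the same function can be applied to every data set.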

    Output
    • Format: JSON
    • Classifications
    • Reporting

    Reporting
    • Part of API response
    • Consumed cost-based units, e.g. tokens
    • Version of backend, model and settings used
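A possible shape for the JSON response, combining the classifications with the reporting fields listed above (consumed tokens; backend, model and settings versions). All field names, the default version string and the deployment name are hypothetical placeholders.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


@dataclass
class ClassificationResponse:
    classifications: list[dict]
    usage: Usage = field(default_factory=Usage)
    backend_version: str = "0.1.0"     # hypothetical backend version
    model: str = "example-deployment"  # hypothetical Azure OpenAI deployment name
    settings: dict = field(default_factory=lambda: {"temperature": 0})

    def to_json(self) -> str:
        body = asdict(self)  # nested dataclasses become plain dicts
        # asdict() skips properties, so add the derived total explicitly.
        body["usage"]["total_tokens"] = self.usage.total_tokens
        return json.dumps(body, ensure_ascii=False)
```

`ensure_ascii=False` keeps non-English titles readable in the response, which matters given the multilingual input.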

    Testing
    • Human validation of selected edge cases (to be defined)
    • Automated cross-check, GPT-based or using a different technology (to be defined)
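For the automated cross-check, one simple option is to run a second, independent classifier (another prompt, model or technique) and compare the two label sets per item; low agreement flags the item for human validation. A sketch using Jaccard similarity; the 0.5 threshold is an arbitrary assumption to be tuned.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two label sets: |A ∩ B| / |A ∪ B| (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cross_check(primary: dict[str, set[str]], secondary: dict[str, set[str]],
                threshold: float = 0.5) -> dict[str, float]:
    """Return the items whose two label sets disagree too much, with their score."""
    flagged: dict[str, float] = {}
    for item_id, labels in primary.items():
        score = jaccard(labels, secondary.get(item_id, set()))
        if score < threshold:
            flagged[item_id] = score
    return flagged
```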

    2. Business user or buyer personas of a solution
    Classify a B2B solution against all potential business user or business buyer personas (those who could benefit from the solution) from a pool of 28K standardized roles/titles.

    Classes
    • 28K standardized job roles/titles

    Requirements
    • Must take all roles into consideration!

    Output
    • Multiple classes (all matching roles from standardized list)

    Examples
    • Solutions example (source): cr2da_solutionsais.csv (attached)
    • Roles (source): LinkedIn.StandardizedData.Titles.xlsx (attached)

    Solution ideas

    Known challenges
    • Asking an LLM to come up with personas → doesn’t take all potential personas into account due to the model’s laziness
    • Retrieving personas via RAG based on the solution description → biased towards words similar to those in the description → wrong personas are prioritized
    • Example from RAG (biased list of personas): bad-example-generated-pesonas.xlsx (attached)
    • Classical NLP and distance-based clustering/matching → doesn’t account for the different weights of different parts of the text
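One way to sidestep the first two challenges (model laziness and RAG bias) is to avoid both open-ended generation and similarity-based retrieval: split the full list of 28K roles into small chunks and let the model judge every chunk against the description, so each role is explicitly considered. With a chunk size of 200 this means 140 calls per solution, which fits the stated priorities (accuracy over cost). This is a sketch of that idea, not the prescribed solution; `judge_chunk` is a hypothetical LLM wrapper that returns the roles from the given chunk it considers relevant.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical LLM wrapper: (description, roles in this chunk) -> roles it selects.
JudgeFn = Callable[[str, list[str]], Iterable[str]]


def chunks(items: list[str], size: int) -> Iterator[list[str]]:
    for start in range(0, len(items), size):
        yield items[start:start + size]


def screen_all_roles(description: str, roles: list[str], judge_chunk: JudgeFn,
                     chunk_size: int = 200) -> list[str]:
    """Ask the judge about every chunk so that no role is skipped."""
    selected: list[str] = []
    for chunk in chunks(roles, chunk_size):
        allowed = set(chunk)
        # Keep only answers that really belong to this chunk (guards against invented titles).
        selected.extend(r for r in judge_chunk(description, chunk) if r in allowed)
    return list(dict.fromkeys(selected))  # de-duplicate, keep first-seen order
```

Because each call only sees a short, fixed list, the model cannot "stop early", and the wording of the description no longer decides which roles are even shown to it.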

    The system must also be reusable for industry classification of software solutions using an existing hierarchy. See the attached industry file.
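To keep the same machinery reusable for industry classification, the classifier can be written against a generic tree instead of the persona list specifically. A sketch of a depth-first, top-down traversal, assuming a hypothetical `Node` type and a hypothetical `select_children` LLM wrapper that returns the ids of the children matching the description; unselected branches are pruned and the selected leaves are returned.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    id: str
    name: str
    children: list["Node"] = field(default_factory=list)


# Hypothetical LLM wrapper: (description, candidate nodes) -> ids of matching nodes.
SelectFn = Callable[[str, list[Node]], set[str]]


def classify_top_down(description: str, root: Node, select_children: SelectFn) -> list[Node]:
    """Descend only into selected branches; return the leaves that were reached."""
    matches: list[Node] = []
    frontier = [root]
    while frontier:
        node = frontier.pop()  # depth-first
        if not node.children:
            matches.append(node)
            continue
        chosen = select_children(description, node.children)
        frontier.extend(c for c in node.children if c.id in chosen)
    return matches
```

The trade-off: pruning keeps the number of calls small for deep hierarchies, but a wrong rejection at an upper level loses the whole branch, which is why the flat, exhaustive screening is safer for the "must take all roles into consideration" requirement.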

    Architecture

    API
    Must be accessible through an asynchronous REST API
    Minimal
    • processing one item per request
    • no authentication/authorization required
    • with documented examples for all properties
    Better
    • batch processing capability, multiple items per request
    • OAuth2 based authentication and authorization
    • with Swagger/OpenAPI schema
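Batch processing behind an asynchronous API could fan out the items of one request concurrently while capping how many model calls run at once. A framework-agnostic sketch with `asyncio`; `classify_one` is a hypothetical coroutine standing in for the real pipeline, and the same function could back an async endpoint in a web framework (not shown). The concurrency limit of 5 is an assumption.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical single-item classifier: description -> list of labels.
ClassifyFn = Callable[[str], Awaitable[list[str]]]


async def classify_batch(items: list[dict], classify_one: ClassifyFn,
                         max_concurrency: int = 5) -> list[dict]:
    """Classify all items of one request concurrently, preserving input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(item: dict) -> dict:
        async with semaphore:  # limit parallel calls to the model endpoint
            try:
                labels = await classify_one(item["description"])
                return {"id": item["id"], "classifications": labels}
            except Exception as exc:  # one failing item must not fail the whole batch
                return {"id": item["id"], "error": str(exc)}

    return list(await asyncio.gather(*(run(item) for item in items)))
```

`asyncio.gather` returns results in the order the items were passed in, so the response can be matched to the request without extra bookkeeping.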

    Cloud
    • Must run on Microsoft Azure or GCP, with AWS as an alternative
    • Proprietary Azure-specific features may be used only if they make development/deployment at least 50% faster or cheaper and there is an exit/migration plan, so that the system could be migrated to a different cloud environment, such as Google Cloud, within a day
    • Must be fully executable within EU data centers

    Programming languages
    • Python is the preferred programming language

    AI models

    Performance
    Response must be available within seconds
    Minimal
    • 10 seconds
    Better
    • 5 seconds or less
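To enforce the response-time target, each classification call can be wrapped in a time budget that returns a structured timeout result instead of leaving the request hanging. A sketch using `asyncio.wait_for`; the result format is hypothetical, and the default budget mirrors the minimal 10-second requirement (5 seconds for the better target).

```python
import asyncio
import time
from typing import Awaitable, Callable

MINIMAL_BUDGET_S = 10.0  # "Minimal" target; use 5.0 for the "Better" target


async def classify_within_budget(classify: Callable[[], Awaitable[list[str]]],
                                 budget_s: float = MINIMAL_BUDGET_S) -> dict:
    start = time.perf_counter()
    try:
        # wait_for cancels the underlying call once the budget is exceeded.
        labels = await asyncio.wait_for(classify(), timeout=budget_s)
        status = "ok"
    except asyncio.TimeoutError:
        labels, status = [], "timeout"
    elapsed = time.perf_counter() - start
    return {"status": status, "classifications": labels, "elapsed_s": round(elapsed, 3)}
```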

    Scalability
    Must be horizontally scalable to process multiple data sets in parallel
    Minimal
    • at least 10x
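Horizontal scaling mostly comes from running stateless instances behind a load balancer or queue; for bulk (re)classification of 20K to 220K data sets, the work can additionally be split deterministically across N workers. A sketch of hash-based sharding, assuming each data set has a stable string id; 10 workers mirrors the minimal 10x requirement.

```python
import hashlib
from collections import defaultdict


def shard_for(item_id: str, num_workers: int) -> int:
    """Stable shard index: the same id always maps to the same worker."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def partition(item_ids: list[str], num_workers: int = 10) -> dict[int, list[str]]:
    shards: dict[int, list[str]] = defaultdict(list)
    for item_id in item_ids:
        shards[shard_for(item_id, num_workers)].append(item_id)
    return dict(shards)
```

A cryptographic digest is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would send the same data set to different workers on each run.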

    Cost
    Must not incur costs while idle
    Minimal
    • automatically deployable; start/stop can be triggered within seconds
    Better
    • serverless
NOTE:
Please mention Fuchsjobs as the source of your application.
Job information
  • Type: Full-time
  • Work model: Remote
  • Category: Development & IT
  • Experience: Lead/senior level
  • Employment type: Freelance
  • Publication date: 20 Aug 2025
  • Location: not available