Statistical audit of aggregation for LLM‑assisted CV pipeline
Job description:
    Context

    We estimate per‑company ML headcount from CV data using keyword filters, LLM classifiers, and a log‑space debiasing + aggregation step. Code was largely LLM‑authored. We want a fast, principled sanity check focused on the aggregation and uncertainty story, plus a quick health check on keyword derivation to catch only major mistakes.

    Day 1 (max 8 hours):

    At most 4 hours of hands-on replication and aggregation deep dive, plus at most 2 hours of issue triage with short written findings.

    Follow‑on (max 1 to 2 days total):

    Implement the highest-leverage fixes and rerun the affected figures.

    Primary objectives

    Verify and stress‑test our debiasing and aggregation of per‑method estimates on the log scale, including zero handling. Reason about at least one alternative estimator with uncertainty intervals that reflect sampling noise. Run a quick diagnostic on how the keyword list was derived to rule out a glaring construction mistake. No full re‑derivation is in scope unless a red flag appears:
    • Reproduce the pipeline end-to-end in a pinned environment. Report any nondeterminism or hidden dependencies.
    • Inspect the current aggregator: log transform, per-method bias correction, consensus point estimate, and “80 percent” interval. Assess whether the approach is sound or whether better alternatives exist (the first sketch after this list illustrates the pattern under audit).
    • Propose and implement one fast alternative method.
    • Do a sanity pass over the rest of the code and methods for construction errors only: tokenization rules that drop C++ or R, over-aggressive stopwords, language mismatches. Document the impact (a minimal tokenization check is sketched after this list).
    • This also includes spotting misleading visuals that hide heavy tails or uncertainty. Propose alternatives.
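
    For orientation, the following is a minimal sketch of the kind of log-space aggregator described above, not the pipeline's actual implementation: the long-format columns (company, method, estimate), the log(x + offset) zero handling, the median-based per-method bias correction, and the cross-method 80 percent interval are all illustrative assumptions.

        import numpy as np
        import pandas as pd

        def aggregate_log_space(df: pd.DataFrame, offset: float = 1.0) -> pd.DataFrame:
            """Consensus per-company estimate from per-method estimates.

            Assumes columns ['company', 'method', 'estimate']; the offset used for
            zero handling and the form of the bias correction are illustrative
            choices, not necessarily the pipeline's.
            """
            d = df.copy()
            # Zero handling: log(x + offset) keeps zero estimates finite.
            d["log_est"] = np.log(d["estimate"] + offset)
            # Per-method bias correction: shift each method's median onto the
            # grand median (a multiplicative rescaling on the original scale).
            bias = d.groupby("method")["log_est"].transform("median") - d["log_est"].median()
            d["log_debiased"] = d["log_est"] - bias
            # Consensus point estimate plus an 80 percent interval taken across
            # methods, then mapped back to the original scale.
            out = d.groupby("company")["log_debiased"].agg(
                point="median",
                lo80=lambda s: s.quantile(0.10),
                hi80=lambda s: s.quantile(0.90),
            )
            return np.exp(out) - offset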

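    And a minimal, pytest-style construction check for the tokenization red flags named above; the tokenize function here is only a placeholder for whatever tokenizer the pipeline actually exposes.

        import re

        def tokenize(text: str) -> list[str]:
            # Placeholder standing in for the pipeline's real tokenizer. The pattern
            # keeps trailing '+' and '#' so "C++" and "C#" survive, and imposes no
            # minimum token length that would drop "R".
            return re.findall(r"\w+[+#]*", text)

        def test_fragile_keywords_survive():
            # Terms that naive \w+ splitting or min-length/stopword rules often mangle.
            tokens = tokenize("Built services in C++ and R; some C# on the side.")
            for term in ["C++", "R", "C#"]:
                assert term in tokens, f"{term} dropped by tokenization"
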
    Deliverables

    • A 1-to-3-page memo within the Day 1 cap: reproducibility notes, pass/fail on the red-flag checks, and a prioritized list of fixes with expected effect sizes.
    • One implemented alternative aggregator with uncertainty and a rank-stability plot (one candidate is sketched after this list).
    • Fixed plots for the affected figures and tables.
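
    One candidate for the alternative aggregator, sketched under assumptions (the same long-format columns as above): a bootstrap of the cross-method median with percentile intervals and a per-company rank-stability summary to feed the plot. Resampling the underlying CV-level rows instead, if they are available, would reflect sampling noise more faithfully than resampling methods.

        import numpy as np
        import pandas as pd

        def bootstrap_consensus(df: pd.DataFrame, n_boot: int = 1000, seed: int = 0) -> pd.DataFrame:
            """Bootstrap the cross-method median per company (illustrative, not prescriptive)."""
            rng = np.random.default_rng(seed)
            wide = df.pivot(index="company", columns="method", values="estimate")
            est = wide.to_numpy()  # shape: (n_companies, n_methods)
            n_companies, n_methods = est.shape
            boots = np.empty((n_boot, n_companies))
            for b in range(n_boot):
                cols = rng.integers(0, n_methods, size=n_methods)  # resample methods with replacement
                boots[b] = np.nanmedian(est[:, cols], axis=1)
            point = np.nanmedian(est, axis=1)
            # Rank stability: share of bootstrap draws in which a company's rank stays
            # within +/- 2 places of its point-estimate rank (one bar per company).
            point_rank = point.argsort().argsort()
            boot_rank = boots.argsort(axis=1).argsort(axis=1)
            stability = (np.abs(boot_rank - point_rank) <= 2).mean(axis=0)
            return pd.DataFrame(
                {
                    "point": point,
                    "lo80": np.percentile(boots, 10, axis=0),
                    "hi80": np.percentile(boots, 90, axis=0),
                    "rank_stability": stability,
                },
                index=wide.index,
            )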

    Must‑have qualifications

    • 3 to 5+ years of applied statistics or data science, including measurement and aggregation work on messy signals.
    • Python: pandas or polars, numpy, scikit‑learn, matplotlib or plotnine, pytest.

    Nice to have

    • Experience auditing LLM classifiers and prompt drift.

    How to apply

    In 3 to 5 sentences, outline your experience with methods for aggregating similar data and how you would approach the overall exercise here. Bonus points if you already have something to say about the attached code. Sharing previous work is highly appreciated.
NOTE: Please mention Fuchsjobs as the source of your application.
Job information
  • Type: Full-time
  • Work model: Remote
  • Category:
  • Experience: Senior
  • Employment type: Freelance
  • Publication date: 21 Aug 2025
  • Location: Not available