(Senior) Data Manager / Data Governance for AI (f/m/d)
mbiomics was founded in 2020 by an experienced founding team with the vision to deliver effective microbiome-based therapeutics that will revolutionize the treatment of many chronic diseases. We recently closed a Series A funding round, which enables us to start the development of game‑changing live‑bacterial therapeutics (LBT). mbiomics is leveraging its tailored microbiome analysis platform to overcome challenges in LBT development – by providing precision profiling data for improved bacterial consortia selection, patient stratification, and patient monitoring for clinical trials.
Position Summary: You will be responsible for establishing mbiomics’ data governance framework while also building the data infrastructure to make research and clinical data usable, compliant, and AI/ML‑ready. In this role you will set standards, implement cloud‑based solutions, and collaborate closely with scientists to improve the capture, organization, and accessibility of our data.
As a growing startup, this position combines research and clinical data governance leadership with hands‑on implementation. You will also play a key role in preparing mbiomics for the next generation of AI/ML applications by ensuring data is standardized, annotated, and interoperable, and by enabling teams to make effective use of enterprise AI tools.
Governance & Compliance
- Define and implement data governance policies (data lifecycle, data security, data integrity, metadata, standards, access, retention, lineage, alignment to infrastructure).
- Promote FAIR data principles across all R&D teams.
- Ensure compliance with data protection regulations (e.g., GDPR, EU AI Act, HIPAA when applicable).
- Create governance documentation and ensure adoption of good practices across the company.
Implementation & Infrastructure
- Build and maintain pipelines/workflows for ingesting, organizing, and validating data (GCP/BigQuery preferred).
- Automate repetitive tasks such as metadata capture, file organization, and ontology mapping.
- Put in place a data catalogue and documentation systems for discoverability and reusability.
AI Readiness & Enablement
- Ensure datasets are properly structured, annotated, and versioned for AI/ML model training.
- Collaborate with computational biology and data science teams to prepare training datasets.
- Evaluate and integrate enterprise AI tools (LLM copilots, agents, workflow assistants) to accelerate documentation, validation, and reporting.
- Help non‑programmers use safe, structured workflows with AI assistants for data exploration and reporting.
Collaboration & Cross‑Functional Role
- Work closely with scientists to implement structured data capture at the point of generation.
- Provide training and support to promote adoption of governance standards and AI‑ready practices.
- Build lightweight dashboards or interfaces for exploration, QC, and usability.
- Serve as a bridge between research, computational biology, IT, and leadership.
- Communicate progress, risks, and needs clearly across stakeholders.
Required Skills and Competencies
- Strong understanding of data governance frameworks, FAIR principles, and metadata standards.
- Hands‑on experience with cloud‑based data management (GCP preferred).
- Strong skills in Python/SQL for data wrangling and pipeline development.
- Experience preparing datasets for AI/ML model training.
- Knowledge of at least one data management framework (e.g., Amsterdam, DAMA-DMBOK).
- Comfort with enterprise AI tools (LLM copilots, agent frameworks, documentation assistants).
- Familiarity with modern ML tooling (e.g., PyTorch, HuggingFace, OpenAI API, RAG architectures) as well as traditional approaches.
- Comfortable working with non‑programmers and in research settings characterized by ambiguity and iterative feedback cycles.
- Excellent communication and training skills for working with non‑programmers.
- Highly organized, detail‑oriented, and comfortable balancing strategy with hands‑on execution.
- Desire to pursue professional development by acquiring new skills, presenting work at internal and external venues, and acting on feedback.
Preferred
- Background in life sciences, bioinformatics, or NGS data.
- Familiarity with workflow/containerization tools (Nextflow, Docker).
- Experience with clinical or regulatory data management.
- Knowledge of ontologies and controlled vocabularies in biology.
Education and Experience
- College degree in related field (CS / software engineering / bioinformatics).
- At least 6-12 months of industry experience.
- Experience with data analysis and workflow development.
Environment at mbiomics
- Experience the unique dynamics and spirit of a biotech start‑up.
- Our team? A colorful mix of international talents, humor and brilliant minds.
- We offer flexible working hours and 30 days of vacation.
- Benefits include our job ticket offer, free coffee, fruit, and candy, and regular team events.
Seniority level
Director
Employment type
Full‑time
Job function
Research, Science, and Management
Industries
Biotechnology Research and Pharmaceutical Manufacturing