Job Description
About bitvizo
bitvizo is an AI-powered analytics platform for cryptocurrency markets.
Our backend stack: FastAPI (async I/O), Postgres, Redis (stateless), Celery for background jobs, Docker/K8s deployments, and model storage on S3; GPU training is available via AWS.
Models are trained periodically, stored, and the best-performing one (by evaluation metric) is selected for serving.
Goal (Hard Requirement)
Raise the precision of positive forecasts to ≥ 80% under our evaluation protocol (recall may be lower).
Deliver a reproducible training & evaluation pipeline and a production-ready model.
Scope of Work (1 Week)
Model Training & Evaluation
- Implement precision-first training objective with metrics tracking: Precision, Recall, F1, PR-AUC, Confusion Matrix.
- Run a threshold sweep to find probability cut-offs that reach Precision ≥ 80% (see the sketch after this list).
- Compare LSTM, XGBoost, and CNN-LSTM; document the comparison and select the best model for production.
- Ensure compatibility with the existing periodic training (every 60 min via Celery) and model-selection logic.
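For illustration, a minimal threshold-sweep sketch built on scikit-learn. y_true, y_score, and the 0.80 target are placeholders for the hold-out labels, model scores, and precision goal, not references to the existing pipeline.

```python
# Threshold-sweep sketch: pick the lowest probability cut-off that reaches the
# precision target, then report the required metrics at that cut-off.
import numpy as np
from sklearn.metrics import (average_precision_score, confusion_matrix,
                             f1_score, precision_recall_curve,
                             precision_score, recall_score)

def sweep_threshold(y_true, y_score, target_precision=0.80):
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision/recall have one more entry than thresholds; drop the last point.
    hits = np.where(precision[:-1] >= target_precision)[0]
    if hits.size == 0:
        return None  # target precision not reachable on this hold-out set
    t = thresholds[hits[0]]  # lowest qualifying cut-off
    y_pred = (np.asarray(y_score) >= t).astype(int)
    return {
        "threshold": float(t),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "pr_auc": average_precision_score(y_true, y_score),
        "confusion_matrix": confusion_matrix(y_true, y_pred).tolist(),
    }
```

Choosing the lowest qualifying cut-off keeps recall as high as possible while still meeting the precision target.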
Labels & Thresholds
- Current label: pump_threshold = 0.8% price increase over 5/10/30/60 min windows.
- Test variations (e.g., 1.0%, 1.25%, 1.5%) and/or multi-threshold probabilities from a single forward pass (see the labelling sketch after this list).
- Expose threshold selection in the inference API/output.
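As a rough starting point, a sketch of how pump labels at several price thresholds and forecast horizons could be derived from a 1-minute price series with pandas. The close column, horizon list, and threshold values are illustrative assumptions, not the existing labelling code.

```python
# Sketch of pump-label generation at several price thresholds and horizons,
# assuming a DataFrame indexed by 1-minute timestamps with a "close" column.
import pandas as pd

HORIZONS_MIN = [5, 10, 30, 60]
PUMP_THRESHOLDS = [0.008, 0.010, 0.0125, 0.015]  # 0.8%, 1.0%, 1.25%, 1.5%

def make_labels(df: pd.DataFrame) -> pd.DataFrame:
    out = pd.DataFrame(index=df.index)
    for h in HORIZONS_MIN:
        # Maximum forward return within the next h minutes.
        fwd_max = df["close"].rolling(window=h).max().shift(-h)
        fwd_ret = fwd_max / df["close"] - 1.0
        for thr in PUMP_THRESHOLDS:
            # Rows near the end of the series have no full forward window
            # and fall back to 0 here; drop them before training.
            out[f"pump_{h}m_{thr * 100:.2f}pct"] = (fwd_ret >= thr).astype(int)
    return out
```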
Features (Model Inputs)
- Evaluate the new feature set (new_features_metrics.py).
- Run SHAP/permutation-importance and correlation/VIF checks to flag redundant features (see the sketch after this list).
- Preserve domain-critical features; deliver a feature-importance report before removing anything.
- Explore longer sequence lengths (30 → 60/120 min) and multi-scale inputs; integrate volume/liquidity metrics, time features, and cross-asset BTC returns.
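A sketch of the redundancy checks, assuming scikit-learn's permutation importance and statsmodels' variance inflation factor; model, X, and y stand in for the project's fitted estimator, feature frame, and labels.

```python
# Redundancy-check sketch: permutation importance on a fitted model plus a
# variance-inflation-factor (VIF) score per feature.
import pandas as pd
import statsmodels.api as sm
from sklearn.inspection import permutation_importance
from statsmodels.stats.outliers_influence import variance_inflation_factor

def importance_report(model, X: pd.DataFrame, y) -> pd.DataFrame:
    perm = permutation_importance(model, X, y, scoring="average_precision",
                                  n_repeats=10, random_state=0)
    exog = sm.add_constant(X)  # VIF is normally computed with an intercept term
    vif = [variance_inflation_factor(exog.values, i + 1) for i in range(X.shape[1])]
    return pd.DataFrame({
        "feature": X.columns,
        "perm_importance": perm.importances_mean,
        "vif": vif,
    }).sort_values("perm_importance", ascending=False)
```

High-VIF, low-importance features are the natural removal candidates, subject to the domain-critical list above.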
Data & Compliance
- Use only internal DB data (no external datasets).
- Ensure UI predictions never exceed the available data horizon (≤ 30 days).
- Persist the prediction MAR footer fields (creator/model ID, timestamps, reference price, methodology, score meaning, disclaimers) in the DB and expose them to the frontend (see the schema sketch after this list).
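One possible shape for that record, sketched as a SQLAlchemy model; the table and column names are assumptions, not the existing schema.

```python
# Hypothetical table for the MAR footer fields attached to each prediction.
from sqlalchemy import Column, DateTime, Integer, Numeric, String, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class PredictionMarFooter(Base):
    __tablename__ = "prediction_mar_footer"

    id = Column(Integer, primary_key=True)
    prediction_id = Column(Integer, nullable=False, index=True)
    creator = Column(String(128), nullable=False)        # creator / model ID
    model_version = Column(String(64), nullable=False)
    produced_at = Column(DateTime(timezone=True), nullable=False)
    reference_price = Column(Numeric(18, 8), nullable=False)
    methodology = Column(Text, nullable=False)            # how the score is produced
    score_meaning = Column(Text, nullable=False)          # how to read the score
    disclaimers = Column(Text, nullable=False)
```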
Production Integration
- Keep FastAPI endpoints & WebSocket topics backward compatible.
- Persist models to S3 for sharing/recovery.
- Keep training as a single-runner Celery task (see the sketch after this list).
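A sketch of how the single-runner constraint and S3 persistence might be wired with Celery beat, a Redis lock, and boto3; the task name, lock key, bucket, and run_training() stub are assumptions about the existing pipeline.

```python
# Single-runner training sketch: a Redis lock prevents overlapping runs and the
# resulting artifact is uploaded to S3.
import os

import boto3
import redis
from celery import Celery
from celery.schedules import crontab

app = Celery("training", broker="redis://redis:6379/0")
app.conf.beat_schedule = {
    "train-every-60-min": {"task": "train_and_upload", "schedule": crontab(minute=0)},
}

redis_client = redis.Redis(host="redis")
s3 = boto3.client("s3")

def run_training() -> str:
    """Placeholder for the real training pipeline; returns a local artifact path."""
    raise NotImplementedError

@app.task(name="train_and_upload")
def train_and_upload():
    # Single-runner guard: skip this run if the previous one still holds the lock.
    lock = redis_client.lock("training-lock", timeout=55 * 60)
    if not lock.acquire(blocking=False):
        return "skipped: previous training run still in progress"
    try:
        artifact = run_training()
        s3.upload_file(artifact, "models-bucket", f"models/{os.path.basename(artifact)}")
        return artifact
    finally:
        lock.release()
```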
Deliverables (End of Week)
- Reproducible training pipeline (scripts/notebooks + config) with logs & artifacts in S3.
- Evaluation report with all metrics, threshold sweeps, plots, and justification of the chosen threshold (Precision ≥ 80%).
- Feature-importance report + recommended reduced feature set with ablation results.
- CNN-LSTM implementation and comparison results.
- Inference update: multi-threshold outputs or a user-selectable threshold (see the endpoint sketch after this list).
- DB updates for MAR footer fields.
- Short handover doc (train/eval/deploy runbook + rollback steps).
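For the threshold-selection deliverable above, a minimal FastAPI sketch of a user-selectable decision threshold; the route path and predict_proba() stub are assumptions, not the service's actual API.

```python
# User-selectable decision threshold on a prediction endpoint (illustrative).
from fastapi import FastAPI, Query

app = FastAPI()

def predict_proba(symbol: str) -> float:
    """Placeholder: return the model's current pump probability for `symbol`."""
    return 0.0

@app.get("/predictions/{symbol}")
def get_prediction(symbol: str, threshold: float = Query(0.8, ge=0.0, le=1.0)):
    score = predict_proba(symbol)
    return {
        "symbol": symbol,
        "score": score,
        "threshold": threshold,
        "pump_signal": score >= threshold,  # positive call only above the chosen cut-off
    }
```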
Success Criteria
Hold-out set: Precision ≥ 80% at the operational threshold (report the exact value).
Inference latency remains acceptable in the periodic (60-min) runs.
Pipeline runs in our stack (Docker/Celery/S3) and respects the scheduler.
Nice-to-Have (If Time Allows)
- Sigmoid (Platt) calibration / class-imbalance handling (see the sketch after this list).
- Separate per-horizon models, if they outperform a multi-output model.
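A brief sketch of how these two items could combine, assuming an XGBoost base model wrapped in scikit-learn's sigmoid (Platt) calibration; the estimator and its parameters are illustrative only.

```python
# Calibration + imbalance-handling sketch.
from sklearn.calibration import CalibratedClassifierCV
from xgboost import XGBClassifier

def fit_calibrated(X_train, y_train):
    # Weight the positive (pump) class by the negative/positive ratio.
    pos_weight = (y_train == 0).sum() / max((y_train == 1).sum(), 1)
    base = XGBClassifier(scale_pos_weight=pos_weight, eval_metric="aucpr")
    # Sigmoid (Platt) calibration of the predicted probabilities.
    return CalibratedClassifierCV(base, method="sigmoid", cv=3).fit(X_train, y_train)
```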
Tech & Access
Python, PyTorch/TF, XGBoost, scikit-learn, SHAP.
FastAPI, Celery, Postgres, Redis, S3, Docker.
Collaboration via GitHub/Slack.
Dev environment with Swagger at /docs.
Timeline (Example)
Day 1: Data audit, baseline run, evaluation split, start feature importance.
Day 2: Threshold sweep, precision-first tuning, initial report.
Day 3: CNN-LSTM implementation, sequence-length tests, imbalance handling.
Day 4: Feature reduction candidate, retrain, compare models.
Day 5: MAR footer DB updates, inference threshold options, S3 artifacting.
Day 6: Final model selection, evaluation report, handover docs.
Day 7: Buffer for fixes, merge & deploy artifacts.
Budget & Contract
Fixed price for 1 week — please propose your amount with a brief effort breakdown.
A 15 h/week cap is acceptable if you use internal hour tracking; deliverables are due end of week.
Payment after weekly review/approval.
How to Apply
- Share relevant repos/notebooks from crypto time-series projects with precision targets.
- Briefly outline your approach to hit Precision ≥ 80% (which parameters/features you’d prioritize).
- Confirm experience with CNN-LSTM and threshold calibration.
- Include a short Day 1–2 plan.