Multi-Cancer Detection with Scalable MLOps
- Prasoon Prasoon
- Aug 15, 2024
A full-stack machine learning system designed to classify multiple cancer types from histopathological images using CNNs. It achieved 96% accuracy on a real-world dataset, with a modular, reproducible pipeline powered by TensorFlow, MLflow, Docker, and DVC, built to scale across medical imaging use cases.

The Problem
Early and accurate classification of cancer types can significantly improve clinical outcomes, especially when used for triaging or screening. However, ML systems in healthcare often fail to move beyond notebooks due to poor reproducibility, weak versioning, or lack of scalability.
The Solution
I built an end-to-end MLOps-enabled pipeline to detect and classify multiple types of cancer using histopathological images. Using a CNN model trained in TensorFlow, the system achieved 96% accuracy on the Multi-Cancer Kaggle Dataset, covering lung, colon, breast, and other cancer types.
What made this more than a modeling project was the focus on engineering:
- Modular pipeline: Structured stages for data ingestion, preprocessing, model training, evaluation (with stratified cross-validation), and inference
- Experiment tracking: All model runs logged and compared using MLflow
- Version control for data & models: Handled with DVC
- Containerization: The full system, including training, evaluation, and inference, was containerized using Docker for reproducibility and portability
- Deployment-ready: Built a simple Flask UI for loading models and running inference locally
- CI/CD workflows: Integrated GitHub Actions to automate key checks for retraining or rollout
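To make the "modular pipeline" idea concrete, here is a minimal sketch of how stages can be chained through a shared context. It is an illustration only, not the code from the repository: the `Pipeline` class, the stage functions, and the file names and the `data/raw` path are all hypothetical, and the stage bodies are placeholders.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# A stage takes the shared context dict and returns an updated one.
StageFn = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class Pipeline:
    """Runs named stages in order, passing a context dict between them."""
    stages: List[Tuple[str, StageFn]] = field(default_factory=list)

    def add(self, name: str, fn: StageFn) -> "Pipeline":
        self.stages.append((name, fn))
        return self

    def run(self, context: Dict[str, Any]) -> Dict[str, Any]:
        for name, fn in self.stages:
            # Shallow copy so a stage does not modify the caller's dict in place.
            context = fn(dict(context))
            # Record which stages have finished, in order.
            context["completed"] = context.get("completed", []) + [name]
        return context


def ingest(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: a real stage would list image files under ctx["data_dir"].
    ctx["samples"] = [("img_001.png", "lung"), ("img_002.png", "colon")]
    return ctx


def preprocess(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: a real stage would also resize and normalize the images.
    classes = sorted({label for _, label in ctx["samples"]})
    ctx["class_index"] = {name: i for i, name in enumerate(classes)}
    return ctx


pipeline = Pipeline().add("ingest", ingest).add("preprocess", preprocess)
result = pipeline.run({"data_dir": "data/raw"})
print(result["completed"], result["class_index"])
```

Keeping each stage as a plain function with one input and one output is what makes it easy to rerun, test, or swap a single stage without touching the others.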
Designed to be reused across other medical imaging datasets, this project balances high model accuracy with real-world deployment practices, aligning closely with modern AI-in-healthcare workflows.
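The stratified cross-validation used in the evaluation stage can be sketched with the standard library alone. This is an assumption-laden illustration rather than the project's implementation (in practice a library routine such as scikit-learn's `StratifiedKFold` is the more likely choice); the function names and the toy label list are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[Hashable], k: int, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to k folds so that each fold keeps roughly
    the same class proportions as the full dataset."""
    if k < 2:
        raise ValueError("k must be at least 2")

    # Group sample indices by class label.
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class, key=str):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class
        # stopped, so overall fold sizes stay balanced.
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)
    return folds


def train_val_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, val_indices), holding out one fold at a time."""
    for held_out in range(len(folds)):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]


# Toy labels standing in for per-image cancer types (hypothetical data).
labels = ["lung"] * 6 + ["colon"] * 6 + ["breast"] * 3
folds = stratified_folds(labels, k=3)
for train_idx, val_idx in train_val_splits(folds):
    print(len(train_idx), len(val_idx))
```

Stratifying matters for a multi-class imaging dataset because a plain random split can leave a rare class almost absent from one validation fold, which makes per-fold accuracy hard to compare.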
Tech Stack
- ML Framework: TensorFlow (CNN-based classification)
- Tracking & Versioning: MLflow, DVC
- Containerization & Automation: Docker, GitHub Actions
- Serving: Flask-based inference interface
- Workflow Design: Modular stages from ingestion to deployment
- Dataset: Multi-Cancer Kaggle dataset (images for 5+ cancer types)
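As a rough idea of what the MLflow tracking in this stack involves, the sketch below logs one training run's parameters and per-epoch metrics. It is not taken from the repository: the experiment name, the config keys, and both helper functions are hypothetical, and it assumes the `mlflow` package is installed when `log_training_run` is called.

```python
from typing import Any, Dict, List


def flatten_params(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested config into dotted keys, since MLflow stores
    params as flat key/value pairs."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


def log_training_run(
    config: Dict[str, Any],
    history: Dict[str, List[float]],
    experiment: str = "multi-cancer-cnn",  # hypothetical experiment name
) -> None:
    """Log one run. `history` maps a metric name to its per-epoch values."""
    # Imported here so flatten_params stays usable without MLflow installed.
    import mlflow

    mlflow.set_experiment(experiment)
    with mlflow.start_run():
        mlflow.log_params(flatten_params(config))
        for metric, values in history.items():
            for epoch, value in enumerate(values):
                mlflow.log_metric(metric, float(value), step=epoch)


# Hypothetical config; the real hyperparameters are not given in this post.
config = {"model": {"arch": "cnn", "dropout": 0.3}, "train": {"epochs": 10, "lr": 0.001}}
params = flatten_params(config)
print(params)
```

Calling `log_training_run(config, {"val_accuracy": [...]})` after each training job is what makes runs comparable side by side in the MLflow UI.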
For more details, check out my GitHub page: https://github.com/pparashar21/CancerDetectionMLOps