
Multi-Cancer Detection with Scalable MLOps

  • Writer: Prasoon Prasoon
  • Aug 15, 2024

Updated: 5 days ago

A full-stack machine learning system that classifies multiple cancer types from histopathological images using CNNs. It achieved 96% accuracy on a real-world dataset, with a modular, reproducible pipeline powered by TensorFlow, MLflow, Docker, and DVC, built to scale across medical imaging use cases.

The Problem

Early and accurate classification of cancer types can significantly improve clinical outcomes, especially when used for triaging or screening. However, ML systems in healthcare often fail to move beyond notebooks due to poor reproducibility, weak versioning, or lack of scalability.


The Solution

I built an end-to-end MLOps-enabled pipeline to detect and classify multiple types of cancer using histopathological images. Using a CNN model trained in TensorFlow, the system achieved 96% accuracy on the Multi-Cancer Kaggle Dataset, covering lung, colon, breast, and other cancer types.


What made this more than a modeling project was the focus on engineering:


  • Modular pipeline: Structured stages for data ingestion, preprocessing, model training, evaluation (with stratified cross-validation), and inference

  • Experiment tracking: All model runs logged and compared using MLflow

  • Version control for data & models: Handled with DVC

  • Containerization: The full system — including training, evaluation, and inference — was containerized using Docker for reproducibility and portability

  • Deployment-ready: Built a simple Flask UI for loading models and making inferences locally

  • CI/CD workflows: Integrated GitHub Actions to automate key checks for retraining or rollout
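
To make the stratified cross-validation mentioned in the evaluation stage concrete, here is a minimal sketch in plain Python. It is not the project's code: the function name `stratified_k_fold`, the label values, and the class counts are all made up for illustration, and a real pipeline would more likely rely on a library splitter such as scikit-learn's `StratifiedKFold`. The point is only to show the idea that every fold keeps roughly the same class mix as the full dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=42):
    """Yield (train_idx, val_idx) pairs whose class mix mirrors the full dataset.

    Illustrative helper (hypothetical name), not taken from the project.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's shuffled indices round-robin across the k folds,
    # so every fold receives a near-equal share of every class.
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves once as the validation set; the rest form the training set.
    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, val_idx


# Hypothetical, deliberately imbalanced labels (counts are invented).
labels = ["lung"] * 50 + ["colon"] * 30 + ["breast"] * 20
splits = list(stratified_k_fold(labels, k=5))

train_idx, val_idx = splits[0]
print(len(train_idx), len(val_idx))
```

With imbalanced classes, a plain random split can leave a fold with very few samples of a rare class, which makes per-fold accuracy noisy. Dealing each class out separately avoids that.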


Designed to be reused across other medical imaging datasets, this project balances high model accuracy with real-world deployment practices, aligning closely with modern AI-in-healthcare workflows.


Tech Stack

  • ML Framework: TensorFlow (CNN-based classification)

  • Tracking & Versioning: MLflow, DVC

  • Containerization & Automation: Docker, GitHub Actions

  • Serving: Flask-based inference interface

  • Workflow Design: Modular stages from ingestion to deployment

  • Dataset: Multi-Cancer Kaggle dataset (images for 5+ cancer types)
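
As a rough illustration of what "modular stages" can look like in code, the sketch below chains named stages through a shared context dictionary. Everything in it is an assumption made for the example: the `Pipeline` class, the stage names, the file names, and the placeholder stage bodies are hypothetical and are not taken from the repository. In a DVC-based project the stages would more likely be declared in `dvc.yaml`, so treat this in-process runner only as a way to see the structure.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

Context = Dict[str, Any]
StageFn = Callable[[Context], Context]


@dataclass
class Pipeline:
    """Runs named stages in order, passing a shared context dict between them."""

    stages: List[Tuple[str, StageFn]] = field(default_factory=list)

    def stage(self, name: str) -> Callable[[StageFn], StageFn]:
        """Decorator that registers a function as the next stage."""
        def register(fn: StageFn) -> StageFn:
            self.stages.append((name, fn))
            return fn
        return register

    def run(self, context: Context, only: Optional[Set[str]] = None) -> Context:
        """Run all stages, or just the ones named in `only`."""
        for name, fn in self.stages:
            if only is not None and name not in only:
                continue
            context = fn(context)
            context.setdefault("completed", []).append(name)
        return context


pipeline = Pipeline()


@pipeline.stage("ingestion")
def ingest(ctx: Context) -> Context:
    # Placeholder: a real stage would locate or download the image dataset.
    ctx["image_paths"] = ["img_0.png", "img_1.png", "img_2.png", "img_3.png"]
    return ctx


@pipeline.stage("preprocessing")
def preprocess(ctx: Context) -> Context:
    # Placeholder: a real stage would resize and normalize each image.
    ctx["samples"] = [{"path": p, "normalized": True} for p in ctx["image_paths"]]
    return ctx


@pipeline.stage("training")
def train(ctx: Context) -> Context:
    # Placeholder: a real stage would fit the CNN and log the run.
    ctx["model"] = {"trained_on": len(ctx["samples"])}
    return ctx


@pipeline.stage("evaluation")
def evaluate(ctx: Context) -> Context:
    # Placeholder: a real stage would compute metrics on held-out data.
    ctx["report"] = {"n_samples": ctx["model"]["trained_on"]}
    return ctx


result = pipeline.run({})
print(result["completed"])
```

Because each stage only reads from and writes to the context, a single stage can be rerun in isolation (for example `pipeline.run(saved_context, only={"evaluation"})`, where `saved_context` is a context that already holds a trained model), which is the property that makes swapping in another imaging dataset practical.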


For more details, check out my GitHub page: https://github.com/pparashar21/CancerDetectionMLOps

 
 
 
