Breast Cancer Prediction With IBM AutoAI Masterpiece

Mohamed Hashem
Apr 23, 2021

Breast cancer is the most frequent cancer among women, impacting 2.1 million women each year, and it causes the greatest number of cancer-related deaths among women.

Imagine a world where we can predict such a fatal disease before it gets complicated, preventing painful consequences, and most importantly saving precious lives!

In this study, I will share how we adopted machine learning in medical research and broadened our scope on many levels using IBM AutoAI.

Unleashing the power of IBM AutoAI.

Initial Difficulties Before IBM AutoAI

  • We didn’t have a reliable all-in-one tool to feed us with valid predictions and hints.
  • Our decision-making criteria lacked a proper abstraction of the data.
  • We didn’t have a tool capable of running several algorithms against the same dataset side by side.

What We Have Achieved With IBM AutoAI

  • We got a native and fully automated graphical tool that processes our datasets and comes up with the best possible candidates!
  • Now we have more confidence in our internal decision criteria.
  • We’ve saved critical resources, especially time, which in turn boosts our productivity.
  • Our revenue has reached a new record, with an increase of 135% compared to the previous month.

Problem Abstraction

Our target is to predict the value of the (Diagnosis) column, which tells us whether the patient is likely to be positive or not, as follows:

  • 0 indicates a low chance of breast cancer.
  • 1 indicates a high chance of breast cancer.

Dataset Mapping

As a reference for better understanding, here are some technical details about the dataset columns:

  • Mean Radius: Mean of the distances from the center to points on the perimeter.
  • Mean Texture: Standard deviation of gray-scale values.
  • Mean Perimeter: Mean perimeter of the core breast tumor.
  • Mean Area: Mean area of the core breast tumor.
  • Mean Smoothness: Mean of the local variation in radius lengths.
  • Diagnosis: Whether the patient is positive (1) or not (0).
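Before uploading a CSV file, it can help to confirm that it really has these columns and that the target only contains 0 and 1. Here is a minimal sketch of such a check using only the Python standard library. The header names in `EXPECTED_COLUMNS` are my assumption about how the columns are spelled in the file, so adjust them to match your own header row.

```python
import csv
import io

# Assumed header names (not confirmed by the article); edit to match your CSV.
EXPECTED_COLUMNS = [
    "mean_radius",
    "mean_texture",
    "mean_perimeter",
    "mean_area",
    "mean_smoothness",
    "diagnosis",
]


def check_dataset(csv_text):
    """Return how many rows carry each diagnosis label, or raise ValueError."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    counts = {"0": 0, "1": 0}
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        label = (row["diagnosis"] or "").strip()
        if label not in counts:
            raise ValueError(f"line {line_no}: unexpected diagnosis {label!r}")
        for name in EXPECTED_COLUMNS[:-1]:
            try:
                float(row[name])  # every feature must parse as a number
            except (TypeError, ValueError):
                raise ValueError(f"line {line_no}: {name} is not numeric")
        counts[label] += 1
    return counts
```

Calling `check_dataset(open("your_file.csv").read())` (the file name is a placeholder) returns the number of rows per label, which also gives a quick view of how balanced the two classes are.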

Prerequisites

I’ll assume that you have completed the following steps:

  • Create an IBM Cloud account.
  • Have your current data as a CSV file to be loaded.
  • Be ready to feel the power of IBM Cloud 🚀.

Note: Everything required to get started with IBM Cloud is completely free of charge, with no credit card required, so feel free to create your account right now!

Roadmap

During this study we’ll explore various concepts that could be confusing or overwhelming at first glance, so I’ve designed this visual roadmap to illustrate the entire process in one shot:

Step 1 — Creating Watson Studio Instance

Before we venture into IBM AutoAI we’ll need some initial preparations. First, we’ll create a Watson Studio instance.

Just type and select Watson Studio from your IBM Cloud console search bar. The first initialization may take up to one minute:

It’s essential to create a project within Watson Studio to execute machine learning experiments:

Then we need to define a storage medium for the Watson Studio project. Select an existing storage instance, or click (Add) and follow the steps below, then click (Refresh):

Select the free (Lite) plan as a starting point and click (Create):

Our Watson Studio project is now available and ready for use 🤗.

We can now load the dataset into the project and make something meaningful. Let’s move to step 2.

Step 2 — Creating AutoAI Experiment

The dataset will be imported as an asset into the Watson Studio project for later use. Now let’s proceed and create an AutoAI experiment:

Then associate a machine learning instance with your AutoAI experiment and click reload:

Link the dataset to your AutoAI experiment. You can upload it directly or select it from the project assets that we’ve previously submitted:

Choose the column that the AutoAI experiment should predict; this will be the target. In our case we’ll naturally choose the Diagnosis column.

The prediction type is automatically set to Binary Classification, as the target column contains only two values, 0 and 1.
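The idea behind that choice is easy to reproduce yourself. The helper below is only an illustration of the "exactly two distinct values" rule; it is not AutoAI's actual detection logic, which the tool does not expose here.

```python
def is_binary_target(target_values):
    """Return True when the target column holds exactly two distinct values."""
    return len(set(target_values)) == 2


# A Diagnosis column made of 0s and 1s qualifies as a binary target.
print(is_binary_target([0, 1, 1, 0, 1]))  # True
```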

During the training process, you will see a beautiful visual infographic that displays the process of building pipelines using different algorithms and estimators:

IBM AutoAI will automatically compare the pipelines to select the best-performing candidate:

Part of the beauty of IBM AutoAI is that it runs different pipelines against a single metric and comes up with the highest-ranked one, ready to be deployed and go live in the next step.
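Conceptually, the leaderboard is just a sort on one metric. The sketch below shows that idea with a hypothetical `rank_pipelines` helper. The pipeline names and scores are placeholders I made up for illustration; they are not results from this experiment.

```python
def rank_pipelines(scores, higher_is_better=True):
    """Return (name, score) pairs ordered from best to worst on one metric."""
    return sorted(
        scores.items(),
        key=lambda item: item[1],
        reverse=higher_is_better,
    )


# Placeholder values only, NOT measured results.
placeholder_scores = {
    "Pipeline 1": 0.91,
    "Pipeline 2": 0.94,
    "Pipeline 3": 0.93,
    "Pipeline 4": 0.95,
}

leaderboard = rank_pipelines(placeholder_scores)
best_name, best_score = leaderboard[0]
print(best_name)  # Pipeline 4
```

For a metric where lower is better (a log loss, for example), pass `higher_is_better=False`.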

Before we can use the pipelines to make predictions on new data, we need to save the top-ranked pipeline as a model for later deployment:

Step 3 — Creating A Deployment From The Saved Model

To deploy the trained model we need to promote it to a deployment space as follows:

Select the target space if one exists, or just create a new space and click (Promote):

Then select online deployment if you want to generate predictions online, in real-time:

Step 4 — Testing The Deployed Model Within Watson Studio

Finally, we made it to the most interesting part of the entire operation: testing our deployed model against real-life measurements obtained from a former patient.

As we can see below, the result is positive: the model returns a value of 1, which indicates a high chance of breast cancer. That means our deployed model works as expected on this sample, cheers! 🥳
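If you prefer pasting JSON into the test tab instead of filling in the form, the input follows the Watson Machine Learning `input_data` layout with `fields` and `values`. Here is a small sketch that builds that input and reads the label back from a response. The field names are my assumption about the CSV header, and the response in the usage note is a placeholder showing the expected shape, not real output.

```python
# Assumed feature names; they must match the header of the training CSV.
FIELDS = [
    "mean_radius",
    "mean_texture",
    "mean_perimeter",
    "mean_area",
    "mean_smoothness",
]


def build_payload(rows):
    """Wrap one or more rows of measurements in the scoring input layout."""
    for row in rows:
        if len(row) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} values, got {len(row)}")
    values = [[float(value) for value in row] for row in rows]
    return {"input_data": [{"fields": FIELDS, "values": values}]}


def read_predictions(response):
    """Extract the predicted label of every scored row from a response."""
    block = response["predictions"][0]
    label_index = block["fields"].index("prediction")
    return [row[label_index] for row in block["values"]]
```

A response is expected to look like `{"predictions": [{"fields": ["prediction", "probability"], "values": [[1, [0.1, 0.9]]]}]}`, in which case `read_predictions` returns `[1]`.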

Model Notebook

Click here to view this experiment’s model notebook.

We can easily create a notebook to apply further optimizations manually or interact with the model programmatically:

Web Service Application

Let’s go online and spread the impact.

Watson Studio gives you code snippets in multiple languages for calling the deployment from your own live app, including Java, JavaScript, Python, and Scala:

Check out this GitHub repository by IBM to familiarize yourself with the concept and generate your API key.
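To give a feel for what happens with that API key, here is a minimal sketch using only the Python standard library: the key is exchanged for a short-lived IAM bearer token, and the token is then sent along with the scoring request. The helper names are mine, and the scoring URL is something you copy from your own deployment page; I'm not assuming any particular region or deployment ID.

```python
import json
import urllib.parse
import urllib.request

IAM_URL = "https://iam.cloud.ibm.com/identity/token"


def token_request(api_key):
    """Build the request that exchanges an IBM Cloud API key for a token."""
    body = urllib.parse.urlencode({
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": api_key,
    }).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/json",
    }
    return urllib.request.Request(IAM_URL, data=body, headers=headers, method="POST")


def scoring_request(scoring_url, token, payload):
    """Build the request that sends a JSON payload to a deployed model."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(scoring_url, data=body, headers=headers, method="POST")


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Usage (needs network access and your own credentials):
#   token = send(token_request(API_KEY))["access_token"]
#   result = send(scoring_request(SCORING_URL, token, payload))
```

Splitting "build the request" from "send it" keeps the credentials handling easy to inspect without touching the network.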

As an example, I’ve created this simple dockerized Flask web application to communicate with the deployed model:

Don’t forget to update the fields within app.py, forms.py, and index.html to reflect your dataset columns.
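The part that usually needs changing is the mapping between form field names and dataset columns. The snippet below is not taken from the app's source; it is a framework-free sketch of that mapping with hypothetical field names. It turns submitted form strings into a row of numbers in column order and collects per-field errors.

```python
# Hypothetical form field names, listed in the same order as the CSV columns.
FORM_FIELDS = [
    "mean_radius",
    "mean_texture",
    "mean_perimeter",
    "mean_area",
    "mean_smoothness",
]


def form_to_row(form):
    """Convert submitted form strings into floats, ordered like the dataset."""
    row = []
    errors = {}
    for name in FORM_FIELDS:
        raw = (form.get(name) or "").strip()
        try:
            row.append(float(raw))
        except ValueError:
            errors[name] = "enter a number"
    if errors:
        # Report every invalid field at once so the page can flag them all.
        raise ValueError(errors)
    return row
```

Keeping the column order in one list means a new dataset only requires editing `FORM_FIELDS` and the matching inputs in the HTML template.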

The web application is now up and running at https://autoai.dadberg.com, and the dataset is publicly available on this GitHub Gist.

I’ve upgraded the machine learning instance within IBM Cloud from the (Lite) plan to the (v2 Standard) plan for production usage and heavy API requests:

Conclusion

In this study, we trained, deployed, and tested a breast cancer prediction model with IBM AutoAI, a workflow that can support the medical community and broaden data scientists’ perspective.

Other possible practical applications include wildfire prevention, medical diagnosis, fraud detection, insurance estimation, solar power efficiency, psychological studies, and much more.

IBM AutoAI is a powerful asset with a wide range of use cases. Give it a try and share your success story with the world! 🎉
