Breast Tumor Quality Prediction With The Help Of Dataiku

a Supper & Supper Use Case

Project Goal

The project goal was to develop a machine learning model to predict breast tumor quality.

Two different data sets from a Wisconsin hospital, with 570 and 700 cases respectively, were used to address the problem. The first data set was based exclusively on cell-level measurements. For each image, ten cell features were computed, each summarized by its mean, its standard error and its “worst” value (the mean of the three largest values). The second data set provided discrete values from one to ten for a set of cell attributes and the mitosis stage.
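The three summaries per feature can be sketched in a few lines of Python. This is only an illustration, not the code used to build the data set: the function name `summarize_feature` and the sample measurements are hypothetical, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of measurements.

```python
import math
import statistics


def summarize_feature(values):
    """Summarize one cell feature measured on the cells of a single image.

    Returns the mean, the standard error and the "worst" value,
    i.e. the mean of the three largest measurements.
    """
    if len(values) < 3:
        raise ValueError("need at least three measurements")
    mean = statistics.fmean(values)
    # Assumption: standard error = sample standard deviation / sqrt(n).
    std_error = statistics.stdev(values) / math.sqrt(len(values))
    # "Worst": average of the three largest values.
    worst = statistics.fmean(sorted(values, reverse=True)[:3])
    return {"mean": mean, "se": std_error, "worst": worst}


# Hypothetical measurements of one feature for the cells of one image.
measurements = [10.0, 12.0, 14.0, 16.0, 18.0]
summary = summarize_feature(measurements)
print(summary)
```

Applying such a function to each of the ten features would give 30 values per image.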

Challenges

In both data sets, the benign and malignant groups were heterogeneously (unevenly) distributed.

Applied Methods (Implementation)

Because the output groups were known, we implemented supervised learning algorithms such as random forest and logistic regression. Accuracy was chosen as the metric for comparing the algorithms' performance. These steps were carried out with Dataiku, the platform democratizing access to data and enabling enterprises to build their own path to AI.
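To make the comparison metric concrete, here is a minimal sketch of how accuracy can be computed and used to rank two models. In the project this was handled inside Dataiku; the helper `accuracy`, the labels and the predictions below are hypothetical and serve only to illustrate the metric.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label equals the known label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("need at least one case")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 1 = malignant, 0 = benign.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
predictions = {
    "random forest": [1, 0, 0, 1, 0, 1, 0, 1],
    "logistic regression": [1, 0, 1, 1, 0, 0, 0, 0],
}

# Score every model with the same metric, then pick the highest score.
scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("best:", best)
```

Accuracy treats all errors alike, so it is usually read together with a confusion matrix when the two groups are unevenly represented.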

Project Outcome

The random forest approach achieved an accuracy of 99.7% and 99.3% on the two data sets, respectively.

Using the random forest method, we obtained the importance of the variables used in both data sets. For the first data set, the means of the three largest values were critical for making predictions. Notably, the mitosis stage in the second data set played no role in the prediction.

The logistic regression approach achieved an accuracy of 99.6% in both cases. The confusion matrices based on the optimized F1-score were stored. A false prediction of a benign tumor when the tumor was actually malignant was penalized more heavily.
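The idea of penalizing a missed malignant tumor more heavily than a false alarm can be sketched as a weighted error cost on top of a confusion matrix. The function names, the example labels and the weights (5 for a false negative, 1 for a false positive) are assumptions made for illustration; the source does not state the actual weights.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes, treating malignant (1) as the positive class.

    Labels are assumed to be exactly 0 (benign) or 1 (malignant).
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1  # malignant tumor predicted as benign
    return counts


def weighted_cost(counts, fn_weight=5.0, fp_weight=1.0):
    """Total error cost; the default weights are illustrative assumptions."""
    return fn_weight * counts["fn"] + fp_weight * counts["fp"]


# Hypothetical labels and predictions: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]

counts = confusion_counts(y_true, y_pred)
cost = weighted_cost(counts)
print(counts, cost)
```

With these weights, a single missed malignant case costs more than the two false alarms combined.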

The threshold of each confusion matrix was the value beyond which a prediction was considered positive; the thresholds were set to 0.475 and 0.25.
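A threshold of this kind is typically found by scoring several candidate values and keeping the one with the best F1-score. The sketch below shows that search under stated assumptions: the probabilities and labels are made up, the candidate grid is arbitrary, and a case is counted as positive when its probability is greater than or equal to the threshold.

```python
def f1_at_threshold(y_true, probabilities, threshold):
    """F1-score when a case is called positive if probability >= threshold."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, probabilities):
        pred = 1 if prob >= threshold else 0
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, probabilities, candidates):
    """Return the first candidate threshold with the highest F1-score."""
    return max(candidates, key=lambda t: f1_at_threshold(y_true, probabilities, t))


# Hypothetical predicted probabilities of malignancy (1 = malignant).
y_true = [1, 1, 1, 0, 0, 0]
probabilities = [0.90, 0.60, 0.40, 0.45, 0.30, 0.10]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

chosen = best_threshold(y_true, probabilities, candidates)
chosen_f1 = f1_at_threshold(y_true, probabilities, chosen)
print(chosen, round(chosen_f1, 3))
```

Lower thresholds flag more cases as malignant, which trades additional false alarms for fewer missed malignant tumors.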

Technologies

Dataiku
random forest
logistic regression
