Author: Tim Salimans, Mark-Jan Harte, Gerben van Veenendaal
Repository: https://bitbucket.org/aidence/kaggle-data-science-bowl-2017/src/38c4f2f67294?at=master
The 3rd place at the Data Science Bowl 2017 on the private leaderboard.
tensorflow==1.1 opencv>=3.1 scipy==0.17.0 numpy==1.13 scikit-learn==0.19.0 pydicom==0.9.9 SimpleITK==1.0.1 pandas==0.20.3 pycuda==2017.1.1
Resampling to the isotropic resolutions of and for the final model.
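As a rough illustration of what resampling to an isotropic resolution involves, the sketch below rescales a volume so that every voxel has the same edge length. It uses plain nearest-neighbour lookup in NumPy for brevity, which is a simplification and not necessarily the interpolation the authors used. The function name, the `target_mm` parameter, and the example spacings are all assumptions made for illustration.

```python
import numpy as np

def resample_isotropic(volume, spacing_mm, target_mm):
    """Nearest-neighbour resampling of a 3D volume to an isotropic voxel size.

    volume     : 3D array indexed (z, y, x)
    spacing_mm : original voxel size per axis, in mm
    target_mm  : desired isotropic voxel size in mm (hypothetical parameter)
    """
    spacing = np.asarray(spacing_mm, dtype=float)
    # The physical extent is unchanged, so the voxel count scales by spacing / target.
    new_shape = np.maximum(
        np.round(np.array(volume.shape) * spacing / target_mm), 1
    ).astype(int)
    # For each output index, pick the nearest source index along that axis.
    idx = [
        np.minimum((np.arange(n) * target_mm / s).round().astype(int), old - 1)
        for n, s, old in zip(new_shape, spacing, volume.shape)
    ]
    return volume[np.ix_(*idx)]

# Example values are made up: 2.5 mm slices, 0.7 mm in-plane pixels, 1 mm target.
scan = np.zeros((40, 100, 100), dtype=np.int16)
resampled = resample_isotropic(scan, spacing_mm=(2.5, 0.7, 0.7), target_mm=1.0)
print(resampled.shape)
```

In practice a library routine with linear or spline interpolation (for example `scipy.ndimage.zoom`) would usually give smoother results than nearest-neighbour lookup.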
A fully convolutional ResNet is employed to detect, for each pixel, whether it lies at the center of a nodule. It was trained on the LIDC/IDRI dataset. Two such models were trained: one for normal-sized nodules and one for masses. The masses in the Kaggle training data were annotated, and the mass network was trained on masses from both LIDC/IDRI and Kaggle. The pipeline takes the logit output of that network for the whole volume and thresholds it to determine candidates. It also masks out nodules outside the lung.
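The thresholding and lung-masking step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, and the default threshold of 0.0 (a sigmoid probability of 0.5) is an assumption, since the value actually used is not stated here.

```python
import numpy as np

def find_candidates(logits, lung_mask, threshold=0.0):
    """Return (z, y, x) coordinates of voxels whose logit exceeds the
    threshold and that lie inside the lung mask.

    threshold=0.0 is an assumed default, not a value taken from the solution.
    """
    hits = (logits > threshold) & lung_mask.astype(bool)
    return np.argwhere(hits)

# Toy volume: one strong response inside the lung, one outside it.
logits = np.full((4, 4, 4), -5.0)
logits[1, 2, 3] = 3.0   # inside the lung mask
logits[0, 0, 0] = 4.0   # outside the lung mask, should be discarded
lung = np.zeros((4, 4, 4), dtype=bool)
lung[1:3, 1:4, 1:4] = True

candidates = find_candidates(logits, lung)
print(candidates)
```

A real pipeline would likely also merge neighbouring voxels above the threshold into a single candidate; that step is omitted here.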
Prediction of cancer probability
Takes the candidates and trains a multi-task model that predicts some attributes from the LIDC dataset (malignancy, etc.) together with the cancer label for the Kaggle scans.
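One common way to set up such a multi-task objective is to sum a classification loss on the cancer label and a regression loss on the attributes, skipping whichever target is missing for a given sample. The sketch below shows that idea in NumPy. The function, the `attr_weight` knob, and the choice of cross-entropy plus mean squared error are assumptions for illustration, not the loss used in the solution.

```python
import numpy as np

def multitask_loss(cancer_logit, cancer_label, attr_pred, attr_target, attr_weight=1.0):
    """Combine a cancer cross-entropy term with an attribute regression term.

    Either target may be None: LIDC samples carry attribute annotations,
    while Kaggle scans carry only the cancer label.
    attr_weight is a hypothetical weighting between the two tasks.
    """
    loss = 0.0
    if cancer_label is not None:
        # Numerically stable binary cross-entropy computed from a logit.
        loss += (max(cancer_logit, 0.0)
                 - cancer_logit * cancer_label
                 + np.log1p(np.exp(-abs(cancer_logit))))
    if attr_target is not None:
        diff = np.asarray(attr_pred, dtype=float) - np.asarray(attr_target, dtype=float)
        loss += attr_weight * float(np.mean(diff ** 2))
    return float(loss)

# A Kaggle-style sample (label only) and an LIDC-style sample (attributes only).
kaggle_loss = multitask_loss(0.0, 1, None, None)
lidc_loss = multitask_loss(0.0, None, [3.0], [1.0])
print(kaggle_loss, lidc_loss)
```

Masking out missing targets in this way lets both datasets contribute gradients to a shared network without requiring every sample to have every label.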
Usage instructions: README
Training- / prediction time
GPU: Nvidia K80 (4 for everything but the final model, 8 for the final model)
It takes about 3-5 days to run everything (infer+train) on a decent machine with 8 GPUs. Prediction time: unknown, but it must be less than about 14 min per CT, since the pipeline processes the 506 CTs within the 5 days.
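The upper bound on per-scan prediction time follows from dividing the longest reported run (5 days) by the number of scans (506). A short check of that arithmetic, with variable names chosen for illustration:

```python
total_days = 5      # upper end of the reported 3-5 day run
num_scans = 506     # CTs in the evaluation set

minutes_per_scan = total_days * 24 * 60 / num_scans
print(round(minutes_per_scan, 1))  # 14.2
```

Because the 5 days also cover training, the actual inference time per scan is lower than this bound.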
Dataset: Data Science Bowl evaluation dataset
When to use this algorithm
- The annotations for masses and nodules in the Kaggle dataset, provided by the Aidence team, can be used in further fine-tuning / retraining.
When to avoid this algorithm
- Even with GPU support, the per-voxel examination approach may consume a huge amount of time. The authors used 8 Nvidia K80 GPUs which is
Adaptation into Concept To Clinic
Porting to Python 3.5+
The solution is already compatible with Python 3.5+
Porting to run on CPU and GPU
The approach consists of two deep 3D residual networks for classification (which run through each voxel of a CT scan). Even prediction with this pipeline would require a huge amount of time on CPU only.
Improvements on the code base
The code itself looks good to me.