Reproducing CheXNet with PyTorch

John Zech
May 1, 2018 · 3 min read
[Image: predictions for a test image, run remotely in the browser with binder]

I am sharing PyTorch code on GitHub to reproduce the results of CheXNet. In the CheXNet paper, Rajpurkar et al. trained convolutional neural networks to predict 14 common diagnoses using over 100,000 NIH chest x-rays. Radiologists and other interested individuals, regardless of deep learning experience, can explore the model’s predictions and underlying code online on binder, a cloud-based service that runs in the browser. Those who are comfortable with PyTorch can clone the repo, download the NIH data, and retrain the model themselves on a GPU.
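For a sense of what retraining involves, here is a minimal sketch of the model setup CheXNet describes; it is my paraphrase rather than the repo’s exact code, and the optimizer and hyperparameters shown are illustrative choices, not the paper’s. The idea is to fine-tune an ImageNet-pretrained DenseNet-121, replacing its classifier with 14 sigmoid outputs trained with binary cross-entropy:

```python
import torch
import torchvision

# Sketch of a CheXNet-style model (paraphrased, not the repo's exact code):
# fine-tune an ImageNet-pretrained DenseNet-121 with a new multi-label head.
N_LABELS = 14  # the 14 diagnosis labels in the NIH chest x-ray dataset

model = torchvision.models.densenet121(pretrained=True)
num_features = model.classifier.in_features
model.classifier = torch.nn.Sequential(
    torch.nn.Linear(num_features, N_LABELS),
    torch.nn.Sigmoid(),  # independent probability per diagnosis
)

# Binary cross-entropy treats each diagnosis as its own yes/no task.
criterion = torch.nn.BCELoss()

# One reasonable optimizer choice (illustrative, not the paper's settings).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```

The per-label sigmoids, rather than a softmax, matter here: the 14 diagnoses are not mutually exclusive, since a single chest x-ray can show several findings at once.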

It is important that deep learning for radiology research be available to both the healthcare and machine learning communities. Like many others, I appreciated Ali Rahimi’s talk at NIPS 2017 describing deep learning as ‘alchemy’:

“If you’re building photo sharing services, alchemy is fine. But we’re now building systems that govern healthcare and our participation in civil debate. I would like to live in a world whose systems are built on rigorous, reliable, verifiable knowledge, and not on alchemy.”

In healthcare, deep learning is particularly prone to being treated as alchemy. Most healthcare datasets cannot be publicly distributed due to concerns about de-anonymization of patient data, so most deep learning models trained on healthcare data cannot be inspected and reviewed by the scientific community.

In the machine learning community, new algorithms are tested and demonstrated on standard datasets like MNIST or ImageNet. These open benchmarks have facilitated rapid progress: code can easily be shared to reproduce an algorithm’s performance, results are directly comparable, and researchers can rapidly iterate on each other’s ideas. NIH’s release of over 100,000 chest x-rays is a first step towards creating such datasets in healthcare, and thereby enabling open scientific research on diagnosing disease using deep learning.

While there is enthusiasm for adopting deep learning-based decision support in radiology, the demonstrations I have seen have come from companies whose implementations are proprietary and not available for review. I believe that broad scrutiny of these models in the healthcare context is necessary for both progress and safety: it will lead to refinements that improve real-world diagnostic performance and increase physicians’ trust in and adoption of this technology. Tools like binder from Project Jupyter can help greatly in making code reproducible by others.

Much work remains to address Rahimi’s primary criticism: that the performance properties of deep learning-based models are poorly characterized. Those of us applying deep learning to healthcare can contribute by making our work more transparent and reproducible. I appreciate that this will not always be possible due to legitimate data privacy concerns. However, I would very much like to see reproducible demonstrations on public datasets become a vital part of research in our community, and I hope this repo contributes to that.

Click here to try the code on binder

Click here to view the GitHub repo

Many thanks to the researchers and developers at PyTorch, NIH, Stanford, and Project Jupyter, on whose generous work this project relies.

