Researchers at Stanford University have created an artificial intelligence (AI) algorithm for diagnosing skin cancer that matched the performance of certified dermatologists.
The researchers compiled a database of nearly 130,000 skin disease images, trained their algorithm to visually diagnose potential cancer, and then tested the final product against 21 certified dermatologists. In its diagnoses of skin lesions, which represented the most common and deadliest skin cancers, the algorithm matched the dermatologists' performance.
There are about 5.4 million new cases of skin cancer in the United States every year, and while the five-year survival rate for melanoma detected in its earliest stages is around 97 percent, that drops to about 14 percent if it's detected in its latest stages.
Diagnosing skin cancer begins with a visual examination. A dermatologist usually looks at the suspicious lesion with the naked eye and with the aid of a dermatoscope, which is a handheld microscope that provides low-level magnification of the skin. If these methods are inconclusive or lead the dermatologist to believe the lesion is cancerous, a biopsy is the next step.
In hopes of creating better access to medical care, the researchers set out to create an algorithm to diagnose skin cancer and reported their findings in this week's issue of Nature.
"We realized it was feasible, not just to do something well, but as well as a human dermatologist," said Sebastian Thrun, an adjunct professor in the Stanford Artificial Intelligence Laboratory. "That's when our thinking changed. That's when we said, 'Look, this is not just a class project for students, this is an opportunity to do something great for humanity.'"
Bringing the algorithm into the examination process follows a trend in computing that combines visual processing with deep learning, a type of artificial intelligence modeled after neural networks in the brain. "We made a very powerful machine learning algorithm that learns from data," said Andre Esteva, co-lead author of the paper and a graduate student in the Thrun lab. "Instead of writing into computer code exactly what to look for, you let the algorithm figure it out."
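The idea Esteva describes can be illustrated with a toy sketch: instead of hand-coding a rule for what counts as suspicious, a model learns a decision boundary from labeled examples. The single-feature logistic classifier and the data below are hypothetical stand-ins for illustration only; the paper's actual system is a deep convolutional neural network trained on images.

```python
import math

# Hypothetical training data: (feature value, label), where label 1 = malignant.
# In the real system the inputs are images, not a single number.
data = [(0.1, 0), (0.2, 0), (0.35, 0), (0.6, 1), (0.7, 1), (0.9, 1)]

w, b = 0.0, 0.0   # model parameters, learned from the data
lr = 0.5          # learning rate

def predict(x):
    """Probability that x is malignant, via the logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

# Gradient descent on the logistic loss: the update nudges the parameters
# toward agreeing with each labeled example. No diagnostic rule is written in.
for _ in range(2000):
    for x, y in data:
        p = predict(x)
        w -= lr * (p - y) * x
        b -= lr * (p - y)

# After training, the learned boundary separates the two classes.
print(predict(0.15), predict(0.8))
```

The same principle scales up: the deep network in the study adjusts millions of parameters from its 130,000 training images rather than six hand-picked numbers.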
"There's no huge dataset of skin cancer that we can just train our algorithms on, so we had to make our own," Brett Kuprel, co-lead author of the paper and a graduate student in the Thrun lab, said in a news release from Stanford University in Northern California.
"We gathered images from the internet and worked with the medical school to create a nice taxonomy out of data that was very messy - the labels alone were in several languages, including German, Arabic and Latin," Kuprel said.
After going through the necessary translations, the researchers collaborated with dermatologists at Stanford Medicine, as well as Helen M. Blau, professor of microbiology and immunology at Stanford and co-author of the paper. Together, the interdisciplinary team worked to classify the hodgepodge of internet images, many of which, unlike those taken by medical professionals, varied in angle, zoom and lighting.
In the end, they amassed about 130,000 images of skin lesions representing over 2,000 different diseases.
During testing, the researchers used only high-quality, biopsy-confirmed images provided by the University of Edinburgh and the International Skin Imaging Collaboration Project that represented the most common and deadliest skin cancers - malignant carcinomas and malignant melanomas. The 21 dermatologists were asked whether, based on each image, they would proceed with biopsy or treatment, or reassure the patient.
The researchers evaluated success by how well the dermatologists were able to correctly diagnose both cancerous and non-cancerous lesions in over 370 images.
The algorithm's performance was measured through the creation of a sensitivity-specificity curve, where sensitivity represented its ability to correctly identify malignant lesions and specificity represented its ability to correctly identify benign lesions. It was assessed through three key diagnostic tasks: keratinocyte carcinoma classification, melanoma classification, and melanoma classification when viewed using dermoscopy.
In all three tasks, the algorithm matched the performance of the dermatologists with the area under the sensitivity-specificity curve amounting to at least 91 percent of the total area of the graph.
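The sensitivity-specificity analysis described above can be sketched in a few lines of Python. The scores and labels here are made-up illustrative numbers, not the study's data: sweeping a decision threshold over the classifier's scores traces out the curve, and the area under it summarizes performance across all thresholds (1.0 would be perfect).

```python
# Hypothetical classifier outputs (higher score = more likely malignant)
# and ground-truth labels (1 = malignant, 0 = benign). Illustrative only.
scores = [0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70, 0.80, 0.90, 0.95]
labels = [0,    0,    0,    1,    0,    1,    1,    1,    1,    1]

def sens_spec(threshold):
    """Sensitivity and specificity when flagging scores >= threshold."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fn = sum(s <  threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s <  threshold and y == 0 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)   # (sensitivity, specificity)

# Sweep the threshold from 0.00 to 1.00 to trace the curve, then
# integrate the area under it with the trapezoidal rule.
curve = [sens_spec(t / 100) for t in range(101)]
auc = sum((sp2 - sp1) * (se1 + se2) / 2
          for (se1, sp1), (se2, sp2) in zip(curve, curve[1:]))
print(round(auc, 3))
```

A single threshold forces a trade-off: lowering it catches more cancers (higher sensitivity) at the cost of more false alarms (lower specificity). Reporting the area under the whole curve, as the researchers did, measures the classifier independently of any one threshold choice.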
Although the algorithm currently exists on a computer, the researchers believe it will be relatively easy to transition the algorithm to mobile devices.