It’s no exaggeration to say that the development of Micronoma’s Oncobiota™ microbiome-driven liquid-biopsy technology for detecting cancer at an early stage would not be possible without extensive expertise in, and use of, machine learning (ML) tools.
Micronoma’s proprietary workflow system, increasingly clever machine learning tools, advances in high-performance computing, steady decreases in the cost of sequencing, and a large increase in the depth of data collected have all converged to enable the company to sift through the massive amounts of information that must be analyzed to reveal the relationships between microbes and cancer.
Though the terms “machine learning” and “artificial intelligence” are often used interchangeably, Micronoma Co-founder, President, and CEO Sandrine Miller-Montgomery, Pharm.D., Ph.D., said she prefers the term “machine learning” because it implies an iterative process of refinement that can be validated by human intelligence, whereas “artificial intelligence” sounds as if a machine is magically coming up with concepts or answers on its own, with little or no human input.
In short, what the machines learn depends on what we decide to teach them, and, as always, the saying “garbage in, garbage out” applies.
But not to worry: Micronoma’s co-founders have a long history of using machine learning and bioinformatics to study the microbiome. They include Rob Knight, Ph.D., whose lab has produced many of the software tools and laboratory techniques that enable high-throughput microbiome science.
Machine learning has played a pivotal role at every stage of Micronoma’s development, starting with helping Co-Founder and Chief Analytics Officer Greg Sepich-Poore, Ph.D., process the data that led to his original discovery, published in Nature, showing that a very specific pattern of microbes was present in the tumor tissue and blood samples of patients with cancer.
Without the power of machine learning, Micronoma would not have been able to get off the ground, let alone lead the way in distinguishing microbial communities in different types of cancer. Traditional methods, cumbersome and time-consuming, would have collapsed under the weight of the data being processed. For instance, it would have been impossible for Sepich-Poore to meaningfully mine The Cancer Genome Atlas (TCGA) data set; even with the help of ML and some large computing tools, it still took him six months to process the data that led to his initial discovery back in 2018.
Then, as Micronoma moved from discovery to application, its machine learning tools were refocused and refined to address early lung cancer detection: recognizing the difference between samples from someone with lung cancer and samples from someone with another form of lung disease.
While being 100% sensitive and specific at all times is the goal, the reality is that no technology is perfect, especially when dealing with biological variation. Yet, with the many incremental improvements to its machine learning tools made since the original discovery, Micronoma aims to set the gold standard for lung cancer detection at the earliest and trickiest stage of the disease: stage I. The goal is to optimize the microbial-focused algorithm to the point where it provides a more accurate diagnostic tool than anything currently available commercially.
“For us, stage I is really the ultimate goal; to get as close as possible to 100% of the stage I people that come through referred by clinicians,” Miller-Montgomery said.
In the case of lung cancer, Micronoma’s Oncobiota™ platform can be used in two key scenarios: 1) currently in development, when a low-dose CT scan has revealed a nodule, determining whether that nodule is cancerous or benign; 2) in the future, the method will be extended to patients at risk for lung cancer as part of their routine annual screenings.
Micronoma uses the same technical workflow in both scenarios. Machine learning, however, enables the company to rephrase the question and tune the analysis for each application (first scenario: is this nodule cancerous or benign? second scenario: is the patient healthy, or do they have lung cancer?). Indeed, while the two scenarios may sound similar, the first is calibrated by studying samples from patients with benign or cancerous nodules, while the second compares samples from healthy patients with samples from lung cancer patients. This demonstrates again that it is not just the question that matters, but also the information you provide the machine learning tools so they can produce the correct answers, as the sketch below illustrates.
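To make that idea concrete, here is a minimal sketch of what “asking the same workflow two different questions” can look like in code. The file names, column names, and classifier choice are purely illustrative assumptions, not Micronoma’s proprietary pipeline; only the labels change between the two scenarios.

```python
# Illustrative sketch only -- not Micronoma's proprietary pipeline.
# The same microbial feature table is used to answer two different questions
# simply by changing how the training labels are defined.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Shared feature table: rows = plasma samples, columns = microbial read counts.
features = pd.read_csv("microbial_features.csv", index_col=0)  # hypothetical file
metadata = pd.read_csv("sample_metadata.csv", index_col=0)     # hypothetical file

# Scenario 1: a nodule was already found on low-dose CT -- malignant or benign?
nodule_samples = metadata["nodule_detected"] == True
y_nodule = metadata.loc[nodule_samples, "diagnosis"].map({"malignant": 1, "benign": 0})

# Scenario 2: routine screening of an at-risk population -- cancer or healthy?
screen_samples = metadata["cohort"] == "screening"
y_screen = metadata.loc[screen_samples, "diagnosis"].map({"malignant": 1, "healthy": 0})

# Same model family, same features; only the question (the labels) changes.
for name, y in [("nodule triage", y_nodule), ("screening", y_screen)]:
    X = features.loc[y.index]  # align features to the samples in this question
    clf = GradientBoostingClassifier()
    auc = cross_val_score(clf, X, y, scoring="roc_auc", cv=5)
    print(f"{name}: mean cross-validated AUC = {auc.mean():.2f}")
```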
It is also important to realize that how you tune the machine learning process for sensitivity or specificity depends on the question. Again, while the ideal is 100% sensitivity (detecting cancer every time it is present) and 100% specificity (never flagging someone who does not have cancer as a cancer patient), in reality we have to deal with trade-offs.
For example, the population eligible for basic lung cancer screening because they are at risk (heavy smokers aged 50 and above) could be as many as 15 million people in the US alone. Statistically, most of these people would be healthy, while a small fraction would have lung cancer. Here the goal is to be as specific as possible, since you don’t want to send otherwise healthy people for unnecessary procedures or additional tests that can be inconvenient, costly, and even dangerous. If we do the math, a method that sounds great, with a specificity of 90%, would translate into roughly 10% of the screened population undergoing unnecessary tests: about 1,500,000 people who may get a false health scare, at great cost to the healthcare system. With this in mind, 90% specificity doesn’t sound that great anymore.
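Here is that back-of-the-envelope calculation spelled out. The 15 million screening population comes from the text above; the cancer prevalence is an assumed figure used only for illustration.

```python
# Back-of-the-envelope math behind the 90% specificity example above.
screened = 15_000_000   # at-risk people eligible for annual screening (from the text)
prevalence = 0.01       # assumed fraction who actually have lung cancer (illustrative)
healthy = screened * (1 - prevalence)

for specificity in (0.90, 0.99):
    false_alarms = healthy * (1 - specificity)
    print(f"Specificity {specificity:.0%}: about {false_alarms:,.0f} healthy people flagged")

# At 90% specificity, roughly 1.5 million healthy people would be sent for
# unnecessary follow-up; at 99%, that drops to about 150,000.
```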
In the initial scenario, the one we are focusing on, with patients who already have a detected nodule, Micronoma needs to provide a high degree of sensitivity, as it is crucial not to send a patient with active lung cancer home without treatment. We are targeting 97% or above sensitivity. This may come at the cost of lower specificity in this smaller population where one flag has already been raised (the presence of a nodule), but it is important to ensure the cancer does not remain undiagnosed any longer, because time is of the essence once the nodule is observed.
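As a rough illustration of what tuning for sensitivity can look like in practice, the sketch below picks a decision threshold that meets a 97% sensitivity target on a labeled validation set and reports the specificity that comes with it. The classifier, scores, and labels are made up; this shows the general technique, not Micronoma’s actual calibration.

```python
# Minimal sketch of tuning a decision threshold for sensitivity, assuming a
# trained classifier that outputs risk scores and a labeled validation set.
import numpy as np
from sklearn.metrics import roc_curve

def threshold_for_sensitivity(y_true, y_scores, target_sensitivity=0.97):
    """Pick the strictest score threshold whose sensitivity meets the target,
    and report the specificity that comes with it."""
    fpr, tpr, thresholds = roc_curve(y_true, y_scores)
    qualifying = tpr >= target_sensitivity   # ROC points that meet the target
    i = np.argmax(qualifying)                # first (strictest) qualifying threshold
    return thresholds[i], tpr[i], 1 - fpr[i]

# Made-up validation scores: 1 = cancerous nodule, 0 = benign nodule.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_scores = np.array([0.95, 0.90, 0.80, 0.40, 0.70, 0.35, 0.30, 0.20, 0.10, 0.05])
thr, sens, spec = threshold_for_sensitivity(y_true, y_scores)
print(f"threshold={thr:.2f}  sensitivity={sens:.0%}  specificity={spec:.0%}")
```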
The demands being placed on machine learning have grown more intense as Micronoma and others seek answers from larger data sets. As an example, the Human Genome Project used early computational and machine learning tools to map roughly 23,000 human genes. In contrast, there are more than 2 million microbial genes to be analyzed, a difference of roughly 100-fold. Where mapping the human genome required the machine learning equivalent of a 10-lumen night light, Micronoma’s machine learning process is shining a 1,000-lumen light on the microbiome to reveal the relationships between microbes and cancer.
Without giving everything away, Micronoma is using a proprietary workflow that applies machine learning to read through all of these previously unmined data sets of microbial origin and extract a meaningful interpretation from them.
While Micronoma is also making significant strides on its “wet lab” protocols, the machine learning done in the “dry lab” continues to be a key focus of development, keeping the company moving quickly toward its goal of making many microbiome-driven cancer applications a reality.