How can we accurately predict clinical trial terminations?
Since 2007, the Food and Drug Administration Amendments Act (FDAAA) has required clinical trials to register in an online database if they are conducted under an FDA investigation for a new drug or device, involve a drug or device made in the US but exported for research, or have one or more sites in the US.
Each year, thousands of clinical trials register with ClinicalTrials.gov, the website of the US National Library of Medicine. The website lists studies in all 50 US states and 219 countries, and it receives about 3.5 million visitors per month (as of March 2020).
Not all registered clinical trials succeed, and many terminate early. Only a few studies have tried to model early clinical trial terminations in order to predict which types of trials terminate early. Researchers at Florida Atlantic University (FAU) have developed artificial intelligence models to predict clinical trial terminations. The research team includes Magdalyn Elkin, Group Leader of Technology at Boca Biolistics and a PhD candidate at FAU. The research was partially sponsored by the National Science Foundation and was conducted under the supervision of Elkin's PhD advisor, Dr. Xingquan Zhu. The report, Predictive Modeling of Clinical Trial Terminations Using Feature Engineering and Embedding Learning, was published in Nature Scientific Reports on February 10, 2020. The following paragraphs describe the findings of the study in some detail.
What Kinds of Activities Register as Clinical Trials?
Clinical trials cover various types of health interventions. They may study pharmaceutical or biologic treatments, behavioral interventions, or experimental surgical devices. Effective studies of this kind are essential for developing safe improvements in how health care providers treat, diagnose, and understand the pathology of various diseases.
As of February 2021, there were 368,348 studies registered with the ClinicalTrials.gov site. The site breaks down the studies by location and percentage of the total. Non-US-only trials account for 185,735 (60%). US-only trials number 19,235 (34%). Trials with both US and non-US locations make up a small percentage (5%), only 3,011 trials. The site also counted 139 trials that did not state a location, a negligible percentage.
Why Study Terminated Clinical Trials?
So why is understanding clinical trial terminations important? Clinical trials are specialized studies that seek to determine the efficacy of a medical treatment by conducting tests on human subjects. Researchers consider randomized controlled trials, in which participants are assigned at random to receive one of several medical treatments, the best way to evaluate a medical treatment or other intervention.
Clinical trials, however, are expensive. They are also a primary driver of the cost of drug development, which is estimated to rise at a rate of 7.4% in the US. Terminated clinical trials represent squandered expenses that could have been allocated to other studies. A terminated clinical trial also denies the healthcare community the scientific contribution that the trial would have made.
In order to study terminated trials, the research team created a testbed of 68,999 completed and terminated samples from the ClinicalTrials.gov database. The study proposed to use machine learning in order to understand the common factors behind terminating clinical trials. The study wanted to answer two questions:
- Question #1: What common factors (or markers) relate to terminated clinical trials? The answer to Q1 would assist future trial designers to better plan their trials by understanding the traits that are common to terminated clinical trials.
- Question #2: How can scientists tell in advance (and accurately) whether a clinical trial will terminate early? The answer to Q2 would help reduce clinical trial costs by accurately estimating the chances of a clinical trial's success.
How Was the Study Carried Out?
Using feature engineering and embedding learning, the research team created three types of features to represent each clinical trial:
- Statistic features that describe the basic characteristics of the clinical trials studied;
- Keyword features that contain the keywords applicable to both the conditions inherent in the clinical trials and the healthcare interventions used; and
- Embedding features. The embedding features use neural networks to represent large blocks of text as numerical vectors. Machine learning algorithms need numerical representations of their inputs in order to find patterns in them.
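The three feature types above can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the field names, the `KEYWORDS` list, and `EMBED_DIM` are hypothetical, and the "embedding" is a hashed bag-of-words stand-in rather than a vector learned by a neural network.

```python
import re
import zlib

# Hypothetical keyword vocabulary (the study's actual keyword list is not given here).
KEYWORDS = ["cancer", "chemotherapy", "diabetes", "placebo"]
EMBED_DIM = 8  # illustrative vector size only


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def statistics_features(trial):
    """Basic characteristics: eligibility present, its word count, sponsor type."""
    eligibility = trial.get("eligibility", "")
    return [
        1.0 if eligibility.strip() else 0.0,
        float(len(tokenize(eligibility))),
        1.0 if trial.get("sponsor_type") == "industry" else 0.0,
    ]


def keyword_features(trial):
    """One indicator per keyword found in the conditions or interventions."""
    text = trial.get("conditions", "") + " " + trial.get("interventions", "")
    tokens = set(tokenize(text))
    return [1.0 if kw in tokens else 0.0 for kw in KEYWORDS]


def embedding_features(trial):
    """Stand-in for a learned embedding: normalized hashed bag of words."""
    vec = [0.0] * EMBED_DIM
    tokens = tokenize(trial.get("description", ""))
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % EMBED_DIM] += 1.0
    return [v / len(tokens) for v in vec] if tokens else vec


def featurize(trial):
    """Concatenate the three feature groups into one numerical vector."""
    return statistics_features(trial) + keyword_features(trial) + embedding_features(trial)


# Made-up trial record, for illustration only.
example_trial = {
    "eligibility": "Adults aged 18 or older with confirmed diagnosis",
    "sponsor_type": "industry",
    "conditions": "Breast Cancer",
    "interventions": "Chemotherapy",
    "description": "A randomized study of chemotherapy in breast cancer patients",
}
features = featurize(example_trial)
```

Each trial becomes a single list of numbers, which is the form a classifier can consume.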
By ranking the various features, the study found that having an eligibility requirement, the number of words in the eligibility requirement, oncology keywords, and industry versus non-industry sponsorship were related to early trial termination.
Clinical Trial Termination Prediction
The dataset suffered from the class imbalance problem: 88% of the samples were completed and 11% were terminated. In predictive modeling, class imbalance degrades models that assume roughly balanced classes. Here, completed trials far outnumber terminated ones, so a model can be overwhelmed by the majority class and misclassify the minority class. A predictive model could reach 88% accuracy without correctly predicting a single terminated sample. In general, machine learning works best when each class has about the same number of samples.
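To see why plain accuracy misleads on imbalanced data, consider a toy model that always predicts "completed". The sketch below uses made-up labels (88 completed, 12 terminated) rather than the study's data, and compares accuracy with balanced accuracy, the mean of the per-class recalls.

```python
def accuracy(y_true, y_pred):
    """Fraction of all samples that are predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy labels: 0 = completed, 1 = terminated (illustrative counts only).
y_true = [0] * 88 + [1] * 12
# A "model" that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)           # 0.88
bal = balanced_accuracy(y_true, y_pred)  # 0.5
```

The majority-only model scores 88% accuracy but only 50% balanced accuracy, because its recall on terminated trials is zero.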
To overcome the class imbalance in this study, the researchers used random under-sampling of the majority class to obtain an even ratio of samples for training the models. Because a single random subset introduces bias, the sampling was repeated 10 times to create an ensemble of models. The researchers then combined the outputs of the ensemble to produce predictions with higher balanced accuracy. The ensemble increased balanced accuracy from 50% (where only one class is correctly classified) to 67% (where more minority, or terminated, trials are correctly classified).
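The under-sampling ensemble described above might look roughly like this. It is a sketch under stated assumptions: the nearest-centroid base learner, the synthetic two-feature data, and all function names are hypothetical choices for illustration, not the models or data the researchers used.

```python
import random


def undersample(X, y, rng):
    """Keep every minority sample plus an equal-sized random draw from the majority."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    chosen = minority + rng.sample(majority, len(minority))
    return [X[i] for i in chosen], [y[i] for i in chosen]


def fit_centroids(X, y):
    """Hypothetical base learner: one mean vector (centroid) per class."""
    centroids = {}
    for cls in (0, 1):
        rows = [x for x, label in zip(X, y) if label == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_one(centroids, x):
    """Predict the class whose centroid is closest to x."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda cls: sq_dist(centroids[cls]))


def fit_ensemble(X, y, n_models=10, seed=0):
    """Train one model per independently under-sampled, balanced subset."""
    rng = random.Random(seed)
    return [fit_centroids(*undersample(X, y, rng)) for _ in range(n_models)]


def ensemble_score(models, x):
    """Fraction of ensemble members that vote 'terminated' (label 1)."""
    return sum(predict_one(m, x) for m in models) / len(models)


# Synthetic data: 88 completed (0) near (0, 0), 12 terminated (1) near (2, 2).
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)] for _ in range(88)]
X += [[data_rng.gauss(2, 0.5), data_rng.gauss(2, 0.5)] for _ in range(12)]
y = [0] * 88 + [1] * 12

models = fit_ensemble(X, y, n_models=10)
score = ensemble_score(models, [2.0, 2.0])  # share of models voting "terminated"
```

Each model sees a different random subset of the completed trials, so combining the votes uses more of the majority class than any single balanced subset would.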
The Study Results
The study found that combining all the features listed above gave the best termination predictions. Using sampling and ensemble learning, the study achieved an AUC score of 73% and a balanced accuracy of 67%, metrics that take the imbalanced class ratio into account. The predictive modeling offers insight that helps stakeholders plan clinical trials to avoid waste and improve the chances of success.
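For readers unfamiliar with the AUC score, the sketch below computes it as the probability that a randomly chosen terminated trial receives a higher predicted score than a randomly chosen completed trial, with ties counting as half. The labels and scores are made up for illustration and are not results from the study.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 0 = completed, 1 = terminated; scores are predicted termination risk.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

auc = roc_auc(y_true, scores)  # 7 of 8 pairs ranked correctly -> 0.875
```

Because AUC depends only on how trials are ranked, not on how many belong to each class, a model that scores every trial identically earns 0.5 regardless of the imbalance.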