The model uses repeated measurements of longitudinal predictors, as opposed to variables collected once at baseline, to assess the risk of HCC after SVR.
A recently developed longitudinal model for predicting hepatocellular carcinoma (HCC) risk after sustained virologic response (SVR) using repeatedly measured data in a random survival forest (RSF) algorithm yielded accurate predictions and outperformed a baseline model.
In a validation set for 1-year prediction, the longitudinal model had an area under the receiver operating characteristic curve (AUROC) of 0.9507 compared to 0.6113 for the baseline model, highlighting the impact of accounting for changes in predictor variables over time, which are typically neglected by models built solely on baseline data.1
“Most of the published predictive models were built on a few variables collected at baseline using conventional modeling strategies. These models are usually mediocre in predictive performance since the risk of HCC can fluctuate over time as patients age, portal hypertension worsens, or liver stiffness increases,” wrote investigators.1 “In contrast, longitudinal models incorporating the repeated measurements of the predictor variables are able to capture the dynamic risk of HCC occurrence post-SVR.”
An estimated 2.7 to 3.9 million people in the US have chronic hepatitis C virus (HCV) infection, with approximately 17,000 new cases recorded each year.2 HCV accounts for about 34% of HCC cases in the US, and HCV-infected patients have a 15- to 20-fold increased risk of HCC. Despite the advent of direct-acting antivirals for HCV treatment and eradication, a residual risk of HCC persists and must continue to be monitored even after SVR is achieved.3
To improve the accuracy of predictive models for HCC risk in patients with HCV-related cirrhosis, Yanzheng Zou, of the department of epidemiology at Nanjing Medical University in China, and colleagues used the fast covariance estimation method to extract informative features from longitudinal patient data and incorporated them into the random survival forest (RSF) model. The performance of this longitudinal model was compared to that of a baseline model developed using RSF based on the same predictor variables but utilizing only a single measurement taken at baseline.1
The performance of both models was tested in a cohort of patients with HCV-related cirrhosis who achieved SVR with direct-acting antivirals at the Chronic Hepatitis C Research Program of Jiangsu between July 2012 and October 2020. Cirrhosis was diagnosed based on a liver biopsy showing Metavir F4, a transient elastography score > 14 kPa, or clinical evidence. SVR was defined as a serum HCV RNA viral load below the lower limit of detection at least 12 weeks after completion of treatment.1
Patients who did not reach SVR after treatment, patients diagnosed with HCC prior to treatment, and patients who lacked the required serum biomarker values at baseline were excluded from the study. In total, 400 patients met eligibility criteria and were included in the study. Investigators randomly assigned 70% (n = 280) of participants to a training set used to develop the baseline and longitudinal models and 30% (n = 120) to a validation set for assessing the models’ performance, noting there were no significant differences in the baseline characteristics between the groups.1
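The 70/30 random assignment described above can be illustrated with a minimal sketch. The patient IDs, the function name, and the seed are hypothetical, and the study does not report how its randomization was implemented.

```python
import random

def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Randomly partition patient IDs into training and validation sets."""
    rng = random.Random(seed)            # fixed seed keeps the split reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

patient_ids = list(range(1, 401))        # hypothetical IDs for the 400 eligible patients
train_ids, validation_ids = split_cohort(patient_ids)
print(len(train_ids), len(validation_ids))   # 280 120
```

Splitting at the patient level, rather than at the visit level, keeps all repeated measurements from one patient in the same set.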
The index date of the study was the start of direct-acting antiviral (DAA) treatment. Patients were followed until HCC development, death, or November 2022, whichever came first. The primary outcome of interest was HCC occurrence after the index date.1
The predictor variables used in model development were classified into 2 categories: baseline predictors and longitudinal predictors. The baseline predictors, including age and gender, were collected at enrollment and did not change over time. The longitudinal predictors, including AFP, total bilirubin, direct bilirubin, ALT, AST, cholinesterase, ALP, GGT, total protein, and albumin, were subject to change over time and were measured multiple times after study enrollment when patients returned for medical visits during the follow-up period.1
Investigators used the fast covariance estimation method, a new covariance-based functional principal component analysis method, to extract informative features from the longitudinal data and present them as scores. These scores were included in the RSF model as time-independent covariates along with the 2 baseline variables.1
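The idea of turning each patient's repeated measurements into a single score can be sketched as follows. This is a simplified stand-in, not the fast covariance estimation method itself: it assumes every patient is measured at the same visit times, applies no smoothing, and extracts only the leading principal component by power iteration. The data are synthetic and all names are hypothetical.

```python
import random

def leading_component(curves, n_iter=200):
    """Return (mean curve, first principal component, per-patient scores)."""
    n, t = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(t)]
    centered = [[c[j] - mean[j] for j in range(t)] for c in curves]
    # Sample covariance between every pair of visit times.
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1)
            for b in range(t)] for a in range(t)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    vec = [1.0] * t
    for _ in range(n_iter):
        nxt = [sum(cov[a][b] * vec[b] for b in range(t)) for a in range(t)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    # A patient's score is the projection of the centered curve on the component.
    scores = [sum(row[j] * vec[j] for j in range(t)) for row in centered]
    return mean, vec, scores

# Synthetic biomarker trajectories: each patient has a different trend over 5 visits.
rng = random.Random(1)
visits = [0, 1, 2, 3, 4]
slopes = [rng.gauss(0, 1) for _ in range(50)]
curves = [[5.0 + s * v + rng.gauss(0, 0.1) for v in visits] for s in slopes]

mean, component, scores = leading_component(curves)
print(len(scores))   # one time-independent score per patient
```

In this toy setting the score mainly reflects how steeply a patient's biomarker is rising or falling, which is the kind of dynamic information a single baseline value cannot carry. Scores like these could then be passed to a survival model as ordinary covariates.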
During a median follow-up time of approximately 5 years, 25 (8.9%) patients in the training set and 11 (9.2%) patients in the validation set developed HCC. The AUROC of the longitudinal model for predicting HCC was 0.9507 (95% confidence interval [CI], 0.8838-0.9997) at 1 year, 0.8767 (95% CI, 0.6972-0.9918) at 2 years, and 0.8307 (95% CI, 0.6941-0.9993) at 3 years. The AUROC of the baseline model was 0.6113 (95% CI, 0.4428-0.8000) at 1 year, 0.6213 (95% CI, 0.4801-0.7575) at 2 years, and 0.6480 (95% CI, 0.4865-0.7924) at 3 years.1
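For readers less familiar with the metric, an AUROC can be read as the probability that a randomly chosen patient who developed HCC received a higher predicted risk than a randomly chosen patient who did not. The sketch below computes that quantity for a single fixed time horizon from made-up risk scores. It ignores censoring, so it is not the time-dependent estimator a survival analysis would typically use, and the function name is hypothetical.

```python
def auroc(risk_scores, had_event):
    """Probability that a random case outranks a random non-case."""
    cases = [s for s, e in zip(risk_scores, had_event) if e]
    controls = [s for s, e in zip(risk_scores, had_event) if not e]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5          # ties count as half a win
    return wins / (len(cases) * len(controls))

# Made-up predicted risks and outcomes (1 = developed HCC by the horizon).
risk = [0.9, 0.8, 0.35, 0.3, 0.2, 0.1]
event = [1, 1, 0, 1, 0, 0]
result = auroc(risk, event)
print(round(result, 4))   # 0.8889
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which puts the baseline model's values of roughly 0.61 to 0.65 in context.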
Investigators further assessed the efficacy of the longitudinal model using leave-one-out cross-validation. The AUROC of the longitudinal model fitted with the entire dataset in predicting HCC events 1, 2, and 3 years from year 3 was 0.8504, 0.7235, and 0.7173, respectively.1
Investigators pointed out that AFP contributed to the prediction of HCC with greater variable importance (VIMP) than the other predictors. Other key longitudinal predictors identified by VIMP included GGT, direct bilirubin, total bilirubin, albumin, and ALP. Age at baseline was also informative in predicting HCC development.1
“Our model could have a variety of applications in clinical practice. The model is particularly useful in resource-limited countries that do not have the capacity to offer surveillance to all cirrhotic patients, as it identifies high-risk patients based on a few simple laboratory biomarkers. Our model could also be used to identify high-risk patients for novel and relatively expensive surveillance strategies,” concluded investigators.1