Scaling Data Important for Creating Localized Influenza Forecasts

November 30, 2018

Investigators test methods of incorporating smaller, localized datasets to improve influenza outbreak forecasts and increase community preparedness.

Investigators are testing methods of forecasting influenza outbreaks based on datasets from smaller regions than are currently used by the US Centers for Disease Control and Prevention (CDC). They hope these methods will provide more precise localized predictions that can increase time for community preparedness.

Haruka Morita, MPH, Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, and colleagues anticipate the benefits of accurate forecasting on a finer geographic scale but advise that accurate prediction modeling will require addressing variations in types and amounts of data between localities.

Different types of local surveillance data can yield different, or fragmented, indicators of a future outbreak, they indicate. Tracking influenza-like illness incidents, for example, could signal the start of an epidemic, while tracking hospitalizations better reflects the virulence of a circulating strain. Because predictions represent dynamics specific to the type of surveillance data, Morita and colleagues call for additional investigation of forecasting accuracy when disparate local surveillance data are applied.

"Public health actions informed by forecasts generated from models that have not been developed and optimized with the expertise of experienced modelers could potentially damage population health and public trust," Morita told MD Magazine®. "It is essential to optimize your model using historical influenza surveillance data specific to the region or municipality for which the forecasts are being generated."

Morita and colleagues assessed whether their real-time influenza forecasting model performed equally well with different types of local surveillance data, and they tested whether adjusting different system inputs could better accommodate the differences. Their model incorporates regional ambient humidity (known to modulate survival and transmission of influenza), along with real-time observations of influenza incidence, a dynamic state-space representation of the propagation of influenza through a population, and a method of data assimilation that updates variables and parameters to match ongoing outbreak dynamics.
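To make the idea of a humidity-modulated state-space model concrete, the following is a minimal sketch and not the investigators' implementation. It assumes a simple susceptible-infected-recovered-susceptible structure in which the basic reproduction number falls as specific humidity rises. The function names, the exponential form, and every parameter value (`r0_min`, `r0_max`, `a`, the infectious and immunity periods, the population size) are illustrative assumptions that do not come from the article.

```python
import math

def r0_from_humidity(q, r0_min=1.2, r0_max=2.5, a=-180.0):
    """Map specific humidity q (kg/kg) to a basic reproduction number.

    Assumed form: R0 decays exponentially from r0_max (very dry air)
    toward r0_min (humid air). All constants are illustrative.
    """
    return r0_min + (r0_max - r0_min) * math.exp(a * q)

def sirs_step(s, i, q, n=100_000, infectious_days=3.0,
              immunity_days=3 * 365.0, dt=1.0):
    """Advance susceptible (s) and infected (i) counts by dt days."""
    beta = r0_from_humidity(q) / infectious_days   # transmission rate
    new_infections = beta * s * i / n * dt
    recoveries = i / infectious_days * dt
    waning = (n - s - i) / immunity_days * dt      # recovered become susceptible
    s_next = s - new_infections + waning
    i_next = i + new_infections - recoveries
    return s_next, i_next, new_infections

# Same starting state, dry versus humid conditions.
_, i_dry, _ = sirs_step(s=60_000, i=100, q=0.002)
_, i_humid, _ = sirs_step(s=60_000, i=100, q=0.015)
print(round(i_dry, 1), round(i_humid, 1))
```

Under these assumed values, the infected count grows over one day in dry air and shrinks in humid air, which is the qualitative behavior that humidity forcing is meant to capture.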

Three different system inputs, which can be specified by the forecaster, were tested to determine their effect on forecast accuracy: observational error variance (OEV), multiplicative inflation factor, and scaling of influenza data. OEV is described as an input for the model algorithm to account for error associated with observation. The inflation factor is applied to the variance of the observed state variable to counteract filter divergence, the departure of the model from the true trajectory that occurs when observations receive too little weight relative to the prior ensemble moments. Scaling is a factor used to map surveillance data derived from selected populations, such as those seeking medical care, to the model, which simulates per capita incidence.
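The interplay of the 3 inputs can be illustrated with a single assimilation step on the observed variable. The sketch below is a generic scalar ensemble update, not the investigators' code. It assumes that scaling multiplies the raw observation, that inflation stretches ensemble members about their mean (so variance grows by the square of the factor), and that the ensemble is shifted and shrunk deterministically toward the observation. The function names and the numbers in the example are hypothetical.

```python
import math
from statistics import mean, variance

def inflate(ensemble, factor):
    """Multiplicative inflation: stretch members about the ensemble mean."""
    m = mean(ensemble)
    return [m + factor * (x - m) for x in ensemble]

def assimilate(prior, observed, scaling, oev, inflation=1.0):
    """Update an ensemble of modelled incidence with one observation.

    prior     -- ensemble of modelled incidence values
    observed  -- raw surveillance value
    scaling   -- assumed multiplier mapping the observation to model units
    oev       -- observational error variance (must be > 0)
    inflation -- multiplicative inflation applied before the update
    """
    obs = scaling * observed
    prior = inflate(prior, inflation)
    prior_mean = mean(prior)
    prior_var = variance(prior)
    if prior_var == 0.0:
        return list(prior)  # a collapsed ensemble ignores the observation
    # Precision-weighted combination of prior and observation.
    post_var = prior_var * oev / (prior_var + oev)
    post_mean = post_var * (prior_mean / prior_var + obs / oev)
    # Shrink deviations so the ensemble variance equals post_var.
    shrink = math.sqrt(oev / (oev + prior_var))
    return [post_mean + shrink * (x - prior_mean) for x in prior]

posterior = assimilate([80, 100, 120], observed=70, scaling=2.0, oev=400.0)
print(mean(posterior), variance(posterior))
```

In this sketch, a larger OEV pulls the posterior less strongly toward the observation, a larger inflation factor keeps the prior spread from collapsing, and a poorly chosen scaling shifts the observation itself, so every update would be biased in the same direction.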

The investigators performed retrospective forecasts for all influenza seasons with available data, excluding the 2008-2009 and 2009-2010 pandemic years, for 3 sites with different geographic and population sizes: a county in Arizona, the state of Indiana, and a single county in Indiana. Each retrospective forecast was then compared with the metrics actually observed in that season. For example, a forecast of peak timing was deemed accurate if it fell within ±1 week of the observed peak, and a forecast of peak intensity was deemed accurate if it fell within 25% (±12.5%) of the observed peak intensity.
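The two accuracy criteria described above can be expressed as simple checks. This is an illustrative sketch. The function names are hypothetical, weeks are assumed to be numbered consecutively within a season, and the intensity tolerance is assumed to be relative to the observed peak.

```python
def peak_week_accurate(forecast_week, observed_week, tolerance_weeks=1):
    """True if the forecast peak week is within +/- 1 week of the observed peak."""
    return abs(forecast_week - observed_week) <= tolerance_weeks

def peak_intensity_accurate(forecast_peak, observed_peak, tolerance=0.125):
    """True if the forecast peak intensity is within +/- 12.5% of the observed peak."""
    return abs(forecast_peak - observed_peak) <= tolerance * observed_peak

# Hypothetical season: observed peak in week 18 at an intensity of 200.
print(peak_week_accurate(17, 18), peak_intensity_accurate(230, 200))
```

In this hypothetical season, a forecast peaking in week 17 counts as accurate on timing, while a forecast intensity of 230 falls outside the 175 to 225 window and counts as inaccurate.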

Morita and colleagues reported that forecast accuracy differed significantly by data type as well as by system input. They described all tested data streams as relatively smooth, with data streams from the same locations tending to peak at or around the same week. No data type was distinguished as performing best, however, as this was inconsistent across different measures of forecast accuracy.

There was also no clear optimal combination of the 3 system inputs, but the investigators identified scaling as most critical to forecast accuracy. "Uninformed scaling choices can potentially lead to public health actions that are not optimized at best and harmful at worst, negatively impacting both population health and public trust," the investigators warned.

"In our study, we found that one parameter is particularly critical to forecast accuracy," Morita said. "Retrospective forecasts allow us to calibrate such model parameters appropriately to the specific locale and type of surveillance data."

The study, “Influenza forecast optimization when using different surveillance data types and geographic scale,” was published in Influenza and Other Respiratory Viruses.