Bayesian spatio-temporal model with INLA for dengue fever risk prediction in Costa Rica Shu Wei Chou-Chen1,2*, Luis A. Barboza1,3, Paola Vásquez1, Yury E. Garćıa4, Juan G. Calvo1,3, Hugo G. Hidalgo5, Fabio Sanchez1,3 1*Centro de Investigación en Matematica Pura y Aplicada, Universidad de Costa Rica, San José, Costa Rica. 2Escuela de Estad́ıstica, Universidad de Costa Rica, San José, Costa Rica. 3Escuela de Matemática, Universidad de Costa Rica, San José, Costa Rica. 4Department of Public Health Sciences, University of California Davis, CA, USA. 5Centro de Investigaciones Geof́ısicas and Escuela de F́ısica, Universidad de Costa Rica, San José, Costa Rica. *Corresponding author(s). E-mail(s): shuwei.chou@ucr.ac.cr; Abstract Due to the rapid geographic spread of the Aedes mosquito and the increase in dengue incidence, dengue fever has been an increasing concern for public health authorities in tropical and subtropical countries worldwide. Significant challenges such as climate change, the burden on health systems, and the rise of insecticide resistance highlight the need to introduce new and cost-effective tools for develop- ing public health interventions. Various and locally adapted statistical methods for developing climate-based early warning systems have increasingly been an area of interest and research worldwide. Costa Rica, a country with microcli- mates and endemic circulation of the dengue virus (DENV) since 1993, provides ideal conditions for developing projection models with the potential to help guide public health efforts and interventions to control and monitor future dengue out- breaks. Climate information were incorporated to model and forecast the dengue cases and relative risks using a Bayesian spatio-temporal model, from 2000 to 2021, in 32 Costa Rican municipalities. This approach is capable of analyzing the spatio-temporal behavior of dengue and also producing reliable predictions. 1 Keywords: Public Health, Bayesian inference, spatio-temporal models, climate, vector-borne diseases MSC Classification: 62F15 , 62P10 , 62P12 1 Introduction Dengue is one of the most prevalent vector-borne diseases globally, affecting individuals of all ages. The infection can be asymptomatic or cause a broad spectrum of clinical manifestations that range from a non-specific and auto-limited viral syndrome to a disease with hemorrhagic manifestations and multi-systemic damage that can lead to the death of the patient (World Health Organization, 2022). The infection is caused by one of four dengue virus serotypes (DENV 1–4) transmitted to humans through the bite of infected female mosquitoes, primarily by Aedes aegypti and Aedes albopictus as a secondary vector. Asymptomatic infections occur much more often than symptomatic cases, and studies have shown that this occurrence varies by region, epidemiological context, patient status, and the circulating type of DENV. Ratios ranging from 5.5:1 to 14:1 times the reported number of infections have been reported in various studies (Chastel, 2012). The interaction of a variety of factors, including globalization, trade, travel, demo- graphic trends, and warming temperatures, have been associated with the spread of the mosquito, which is now present on all continents except Antarctica (Medlock, Avenell, Barrass, & Leach, 2006; Romi, Severini, & Toma, 2006), making it one of the 100 worst invasive species in the world (Outammassine, Zouhair, & Loqman, 2022). It has also led to the emergence of the disease in places where it was previously absent (Gubler, 2012; Lopez et al., 2016; Massad et al., 2018; Murray, Quam, & Wilder-Smith, 2013; Wang et al., 2022), putting more than half of the world’s population at risk of infection, mainly in tropical and subtropical regions (World Health Organization, 2022). In this context, and without effective prevention and control measures, dengue is expected to continue its geographical expansion (Messina et al., 2019; Yang, Quam, Zhang, & Sang, 2021). Due to the severely limited vaccine availability and antiviral drugs, the key to preventing and controlling outbreaks continues to be the reduction of breeding sites through chemical and biological interventions as well as the active involvement of the community, which is mainly dependent on available resources of the affected countries. Given that the ecology of the virus is intrinsically tied to the ecology of mosquitoes that transmit dengue (Morin, Comrie, & Ernst, 2013), climatic and environmental conditions can alter spatial and temporal dynamics of vector ecology, especially tem- perature, rainfall, and relative humidity (Campbell et al., 2015; Naish et al., 2014). Temperature affects viral amplification (Watts, Burke, Harrison, Whitmire, & Nisalak, 1986), increases vector survival, reproduction, and biting rate (Rueda, Patel, Axtell, & Stinner, 1990; Tun-Lin, Burkot, & Kay, 2000). Long-term breeding habitats for eggs, larvae, and pupae may be influenced by wastewater left by rainfall or human behavior 2 of storing water in containers (Dieng et al., 2012; Sarfraz et al., 2012). Dengue inci- dence has also been associated with vegetation indices, tree cover, housing quality, and surrounding land cover (Barrera, Amador, & Clark, 2006; Van Benthem et al., 2005). Understanding the influence of these climatic variables on disease incidence in different regions can lead to early detection of disease progression, guide resource allo- cation, and implement appropriate health intervention (World Health Organization, 2022). In this effort, several methods of surveillance systems have been developed (Eid- son et al., 2001; European Centre for Disease Prevention and Control, 2022; Mateus & Carrasquilla, 2011). However, successful early warning strategies are limited due to the complex and dynamic nature of the disease. The complex interaction of biological, socioeconomic, environmental, and climatic factors creates substantial spatio-temporal heterogeneity in the intensity of dengue. It imposes a challenge in the creation of surveillance and control systems. In Costa Rica, a Central American country with a variety of microclimates in an area of 51,179 km2, dengue has been endemic since 1993 and has represented a public health burden since then. According to the Ministry of Health, more than 398,000 cases have been reported during the last 28 years (Ministerio de Salud, 2022a). DENV- 1, DENV-2, and DENV-3 have been the main serotypes in circulation. However, it is essential to highlight that in 2022 the circulation of serotype four was identified in different municipalities in the country. Since this serotype has historically been absent in national territory, its emergence and lack of immunity in the country which can lead to an increase in the incidence of cases reinforces the need to increase prevention, detection, and timely treatment efforts and thus avoid an increase in the incidence and evolution of more severe forms of the disease. Based on dengue’s historical incidence burden and suitable environment for disease transmission, the Costa Rican health authorities have identified 32 municipalities of interest (out of the 83 municipalities the country is divided) where the study was conducted. The selection included mainly municipalities located in the coastal areas on the Pacific and Caribbean coasts and municipalities in the Great Metropolitan Area, the country’s most urban and populated region. This selection was also based on the decision-maker’s necessity to include new and cost-effective tools to guide the allocation of resources throughout the year. A series of efforts (Barboza et al., 2023; Garcia et al., 2023; Vásquez, Loŕıa, Sanchez, & Barboza, 2020) has been carried out to explore and develop an early warn- ing system to monitor dengue risk in the country. The aim is to provide guidance tools to the health authorities of Costa Rica that can be implemented and validated and to optimize and distribute resources in the prevention and control of dengue. These stud- ies considered dengue cases as time series that are registered in independent areas so that machine learning tools and statistical models can be used to predict dengue rela- tive risks separately for each area. However, this assumption is restricted since nearer areas may share similar characteristics to be modeled. A spatio-temporal approach has been employed to predict dengue outbreaks in Latin America, and the results are promising (see e.g. Akter et al., 2021; Desjardins, Eastin, Paul, Casas, & Delmelle, 2020; Lowe et al., 2011; Muñoz, Poveda, Arbeláez, & Vélez, 2021). 3 In this work, we employ a Bayesian spatio-temporal model to predict dengue risks in Costa Rica. This model incorporates climatic variables and geographic information to capture the effect of factors that modulate the spatio-temporal variation of dengue incidence in the country and whose information is not available, such as population mobility, socioeconomic and demographic information. Our method yields superior results compared to previous studies (Barboza et al., 2023; Vásquez et al., 2020) and provides valuable insights into the spatial and temporal structure of dengue risks in Costa Rica. This paper is divided as follows: Section 2 describes the data, statistical model, assumptions, and implemented methodologies. Section 3 presents the results for the 32 municipalities of interest. Finally, section 4 discusses this modeling approach’s results, limitations, advantages, and future work. 2 Methods 2.1 Data description 2.1.1 Dengue Cases For this analysis, we used monthly dengue cases for 32 municipalities (known as can- tons in Spanish) of interest to public health authorities in Costa Rica. The data covered 2000-2021 obtained from the Ministry of Health of Costa Rica (Ministerio de Salud, 2022b). A general time series of dengue cases for 32 municipalities, along with a choro- pleth map indicating population density and displaying the name of each municipality, is presented in Figure 1. We also incorporated the relative risk (RR) to quantify the relative incidence of dengue for municipality i and month t: RRi,t = Casesi,t Populationi,t CasesCR,t PopulationCR,t . 2.1.2 Climate variables 1. Daily Precipitation estimates (Pi,t) were used to index land surface rainfall. Data were obtained from the Climate Hazards Group InfraRed Precipitation with Sta- tion data (CHIRPs); see Funk et al. (2015). Due to the high-resolution spatial nature of this dataset (5km by 5km), we were able to compute monthly cumulative rainfall estimates for each municipality by adding the exact estimate over smaller administrative areas (distritos). 2. El Niño Southern Oscillation (ENSO, Si,t) variations were indexed using the Sea Surface Temperature Anomaly (SSTA) index for the region known as Niño 3.4 (5N-5S, 120W-170W). monthly data was obtained from the Climate Prediction Center (CPC) of the United States National Oceanographic and Atmospheric Administration (NOAA) (see NOAA, n.d.). 3. Normalized Difference Vegetation Index (NDVI, Ni,t), an index of the greenness of vegetation for a 16-day time resolution and 250m spatial resolution. It was obtained 4 from the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite and available through the MODISTools R package (see Tuck et al., 2014). 4. Daytime Land Surface Temperature (LST, Li,t) in degrees Kelvin for an 8-day time resolution and 1km spatial resolution were obtained using the same resources as the NDVI covariate. 5. Tropical Northern Atlantic Index (TNA, TNi,t). Anomaly index of the Sea-Surface Temperature Anomaly (SSTA) over the eastern tropical North Atlantic Ocean (see Enfield, Mestas-Nuñez, Mayer, & Cid-Serrano, 1999). Previous work in the region (Hidalgo, Alfaro, & Quesada-Montano, 2017) suggested that including SSTA information from the Caribbean/Atlantic improves the performance of the predic- tion of land surface precipitation and temperature in Central America compared to forecasts produced with only Pacific Ocean ENSO conditions. Fig. 1: Time series depicting the cumulative dengue cases across 32 municipalities in Costa Rica (displayed in the left panel), accompanied by a map illustrating these municipalities along with their respective total population. 2.2 Model We incorporate the historical exposure of the climate covariates and the behavior of the relative risks in the past by applying the Distributed Lag Non-Linear Model (DLNM) framework (Gasparrini, 2014; Gasparrini, Armstrong, & Kenward, 2010). This methodology incorporates a bi-dimensional space of functions that specifies an exposure-lag-response function f ·w(x, l), which depends on the predictor x along the time lags l in a combined way. This combination specifies a non-linear and delayed association between a climate covariate and dengue incidence. For each covariate, we consider a minimum of 3-month lag and a maximum exposure of 12 months to obtain 5 the estimates up to a maximum of a three-month ahead prediction. This combination of lags was determined based on the cross-correlation and wavelet behavior among the series (see Garcia et al., 2023). We also tested the model with different combinations of maximum and minimum lags. To compare the predictive gain of the non-linear specification against the linear ones, we compared models with a non-linear relationship of the delayed effect of expo- sure on the outcome using natural cubic B-splines function with 2 knots for each covariate and delayed effects against models with the linear relationship of the delayed effect of exposure on the outcome. After specifying the historical exposure of the covariates, we use a spatio-temporal Bayesian hierarchical model with the response variable as the monthly number of cases of dengue fever for each municipality i, i = 1, ..., 32 for t = 1, ..., 264 as follows: Yit|µit, κ ∼ NegBin(µit, κ), (1) log(µit) = log(Eit) + log(RRit), (2) and logRRit = α+ f1(RRt) + f2(Pt) + f3(St) + f4(Nt) + f5(Lt) + f6(TNt) + f7(Mt) + ϕi,(month) + θi,(year), where Eij is the expected number of cases, RRit is the relative risk of the municipality i and month t, fk, k = 1, ..., 7 is the exposure-lag-response function that applies a linear effect on each climate covariate from lag 3 to 12; ϕi,(month) is the municipality-specific monthly random effect that follows a prior according to a cyclic random walk of order 1, i.e., ϕi,(month) − ϕi,(month−1) ∼ N(0, σ2 ϕ); and θi,(year) is a random spatial effect. For the spatial effect, two types of proximity matrix W are defined: 1. The usual neighbor matrix is defined by W = {W}ij = 1 if municipality i and j are neighbors, and 0 otherwise. 2. An alternative distance matrix based on the main road distance in kilometers between the central downtown of each pair of municipalities, i.e. W = {W}ij = 1 if the distance is less than its overall median and 0 otherwise. We incorporate this distance to provide a more realistic way to measure the proximity between social dynamics. Four types of spatial structures are implemented. First, the independent case is assumed. Then, we used the intrinsic conditional auto-regressive (CAR) specification with improper prior, the CAR model with proper prior, and also the Besag-York-Mollie (BYM) model (Besag, York, & Mollié, 1991). Specifically, the CAR specification for the spatial effect for a specific year is defined by: θi,(year)|θj,(year),τθ ∼ N ( 1 ni ∑ j∼i θj,(year), 1 τθni ) , 6 where τθ is the conditional precision, j ∼ i denotes that W ij = 1, and ni is the num- bers of neighbors, according to the definition of the two types of proximity matrix. The proper CAR model is obtained by adding a positive quantity d to ni, whereas the BYM model is obtained by adding an unstructured random effect per munici- pality. For more details, see Bivand, Gómez-Rubio, and Rue (2015). The R packages dlnm (Gasparrini, 2011) and INLA (Rue, Martino, & Chopin, 2009) were used for all calculations. Finally, we performed two simple forecasting procedures to compare them with the proposed model regarding predictive skill. Both methods use the training set for the last five years (2015-2019). First, näıve forecasting uses a simple monthly mean ˆRRit for the last five years. This method is easy to compute, but the prediction interval can only be calculated by assuming strong assumptions, such as independence; thus, only NMRSE can be computed. Second, as an alternative, we estimated the model (1) with logRRit = α, as a Negative Binomial null model so that the prediction uncertainty can be computed in this case. The main objective of performing these two prediction procedures is to compare how the proposed model outperforms these two simple algorithms by using a more complex structure composed of the covariates and spatial random effect. 2.3 Model selection and prediction Previous studies (Barboza et al., 2023; Garcia et al., 2023) have shown that these cli- matic covariates are essential to predict dengue incidence. To begin the calibration, a training period is chosen to fit the model (1) using different combinations of covari- ates and spatio-temporal configurations of the model. First, the DLNM framework allows us to choose different combinations of maximum and minimum lags of histori- cal information on the covariates. Moreover, the basis was chosen to be nonlinear and linear for all covariates. Finally, four spatial structures were fitted: independent, CAR, proper CAR, and BYM models. The best model was chosen by comparing all fitted models by different criteria. The deviance information criteria (DIC) and the mean cross-validation (CV) log score are calculated for each model. The DIC is a measure that contemplates the model’s precision and complexity. At the same time, the CV log score is a criterion that measures the model’s predictive capacity, letting one data out at a time. Finally, two metrics to compare the predictive performance of each model are computed. The normalized Mean-Squared Error (NRMSE): NRMSE = √√√√ 1 mRR m∑ t=1 (RRt − R̂Rt)2, where m is the number of months in the testing period, RR is the mean relative risk over the same period, and R̂R is the estimated relative risk according to any of the two 7 models. The normalized Interval Score at α level (NISα) is the normalized version of the Interval Score (see Winkler & Murphy, 1968 and Gneiting & Raftery, 2007): NISα = 1 mRR m∑ t=1 [ (Ut − Lt) + 2 1− α (Lt −RRt) · 1RRtUt ] , where Ut and Lt are the upper and lower limits of the prediction interval, respec- tively, the latter metric is more complete in evaluating the models’ predictive capacity when the uncertainty is summarized through a predictive interval (Gneiting & Raftery, 2007). It has been used in previous predictive studies on dengue fever in Costa Rica (see Barboza et al., 2023; Vásquez et al., 2020). We use the normalized version of RMSE and IS because we can compare different locations regardless of the scale of their relative risk. 3 Results We fitted the model (1) specifying different structures to the dengue data using all climate covariates. The training period consists of monthly data from January 2000 to December 2020, and the testing period from January 2021 to March 2021. Regarding the DLNM framework, specifying the maximum and minimum lags to incorporate historical exposure of climate covariates is challenging. We set the maximum and minimum lags as 12 and 3 lags, respectively, so that we can predict the dengue data for up to three months. We also fitted the models using maximum and minimum lags of 12 and 0, respectively, and the difference in predictive precision is insignificant with respect to the first alternative. Table 1 presents all models’ DIC and CV log-scores with the smallest values per metric in bold. First, it is clear that, in terms of the goodness of fit, the models with the spatial structure are better than those assuming independent spatial structures. Then, we can see that the differences in DIC and CV log scores are minimal when we compare the linear and non-linear (B-splines) DLNM framework. Finally, to guarantee an acceptable balance between the complexity of the models and the predictive precision over all the locations using the DIC and CV log-scores, we chose as the best-fitted model the proper CAR model with linear DLNM and with neighbor proximity matrix. Table 2 summarizes the predictive metrics (NRMSE and NIS0.05) of the training and testing periods per municipality for the best model compared to the baseline model with an independent spatial structure. We can observe how spatial information can contribute to obtaining more pre- cise predictions by comparing those two modeling alternatives. The most noticeable municipality Parrita has a particular behavior. The baseline model with an indepen- dent spatial structure cannot predict training and testing periods. In contrast, our 8 Table 1: Comparison of the models according to Deviance information criterion (DIC) and mean cross-validation (CV) log-score. DLNM Proximity matrix Spatial structure DIC CV log-score Linear* Independent 57135.37 3.8710 Neighbor CAR 54256.47 3.6872 proper CAR 52628.40 3.5774 BYM 52632.24 3.5784 Distance CAR 53416.92 3.6264 proper CAR 52633.29 3.5787 BYM 52636.92 3.5787 Non-linear* Independent 53429.63 3.8756 Neighbor CAR 50640.66 3.6838 proper CAR 49438.81 3.5954 BYM 49461.01 3.5977 Distance CAR 54674.34 3.9653 proper CAR 49468.89 3.5985 BYM 49456.27 3.5971 * The best model for each DLNM specification is marked in bold. selected model with proper CAR spatial structure can substantially reduce its predic- tion metrics. Moreover, the precision of the proposed model performs better than the simple forecasting procedures (see Table A1 in Appendix A). Close to Parrita, two municipalities with moderately high predictive metrics are Garabito and Quepos, located on the country’s central pacific coast. We suspect that the difficulty of predicting these particular areas is mainly because they are touristic, and dengue cases in these areas are likely underreported. Besides the climatic covariate’s contribution to the model, temporal and spatial random effects (ϕi,(month) and θi,(year)) are also important factors in modeling dengue behavior because they provide temporal or spatial information which is not observable through the selected covariates. Figure 2 shows the behavior of municipality-specific monthly random effects. Municipalities exhibiting similar temporal behavior are grouped together. As a result, seven groups with distinct temporal random effects are obtained. There is also a final group consisting of municipalities that do not exhibit any specific behavior. Later on, the geographic location of these groups of municipalities is illustrated in Figure 3. We observe that these groups provided by the temporal ran- dom effects largely align with the temporal variations attributed to the microclimates in the country. A summary of the climatic descriptions for these groups, according to Köppen-Geiger (Geiger, 1954) climate classification zones, is presented in Table 3. 9 Table 2: Predictive metrics of training and testing data set of the selected model Training set Testing set Municipality Independent Proper CAR Independent Proper CAR NRMSE NIS95 NRMSE NIS0.05 NRMSE NIS0.05 NRMSE NIS0.05 Alajuela 0.7815 14.5297 0.3982 5.3710 0.0750 1.0905 0.0416 1.6417 Alajuelita 0.3090 23.2238 0.2175 13.1177 0.4515 22.8135 0.0515 2.0922 Atenas 4.6100 24.7736 2.5706 10.1453 5.1453 87.6173 0.3283 10.2199 Cañas 10.1008 24.3243 5.7069 12.3277 5.8803 60.7800 0.4829 6.8129 Carrillo 4.3786 15.9598 3.5404 9.1421 0.9860 13.2041 0.4175 2.5945 Corredores 5.0620 25.1830 2.5155 8.6126 1.6824 17.0115 0.5697 1.9358 Desamparados 0.2351 18.6300 0.1466 8.2922 0.0282 3.0978 0.0142 3.0085 Esparza 4.7287 17.4775 2.6296 8.4081 1.1899 16.1291 0.1849 2.1748 Garabito 38.4940 29.0191 5.2545 9.8071 28.8475 232.7807 0.4258 12.9528 Golfito 3.7245 23.5563 1.7734 8.2174 2.0078 35.7736 0.3499 8.5601 Guacimo 1.9057 13.8120 1.0844 4.4132 3.0709 18.1805 2.0633 4.8628 La Cruz 6.2484 26.6755 4.2779 14.3090 6.0351 118.6721 0.5029 14.7439 Liberia 3.6829 23.1675 2.0260 10.2626 4.0913 92.4202 0.5693 16.2507 Limon 2.9640 17.1259 1.6025 6.7593 1.3724 11.2322 0.1034 1.7042 Matina 5.1143 19.9086 2.7162 7.4890 58.0750 192.2903 0.1232 3.0869 Montes de Oro 5.9220 21.8260 4.1289 12.2332 11.4909 126.5836 1.2106 19.5139 Nicoya 2.8036 20.5059 2.0672 10.1305 3.6262 101.6524 0.1894 10.2497 Orotina 13.7161 17.5376 4.0681 6.0472 1.0599 6.7613 0.4888 1.5221 Osa 5.9295 33.7053 4.2208 17.2559 1.1235 13.0806 0.5758 1.7891 Parrita 1.24 × 109 24483.9181 13.7622 9.8340 4.4446 46.3670 0.6773 7.9705 Perez Zeledón 2.0103 30.0358 0.8733 9.4268 1.2543 20.3380 0.1783 1.3942 Pocoćı 2.2524 12.7658 1.3001 6.1328 1.0524 12.0212 0.1284 1.8534 Puntarenas 1.5802 12.8416 0.9303 4.6933 1.3263 23.9665 0.1646 1.7646 Quepos 56.8344 37.6499 10.0908 12.9868 36.9331 257.7509 1.1780 23.2153 San Jose 0.2105 12.6814 0.1328 5.2650 0.0215 1.2409 0.0155 2.3248 Santa Ana 0.7837 21.9311 0.5860 12.6213 0.6759 43.5264 0.3576 15.4854 SantaCruz 35.1198 37.3329 8.4762 12.7070 6.5905 91.5896 0.3027 3.0635 Sarapiqúı 7.9218 22.2392 2.8096 6.9807 0.3880 4.9820 0.1047 1.9942 Siquirres 2.1184 13.9371 1.4077 6.6140 2.7214 19.0646 0.3564 1.7977 Talamanca 4.9193 21.1751 2.2374 8.0522 0.1113 0.7642 0.7989 1.3152 Turrialba 2.5905 25.2386 1.9572 15.8268 1.0122 20.0694 0.2614 1.8489 Upala1 1.2744 21.7159 0.9203 12.2963 - - - - 1 NRMSE and NIS0.05 of the testing set for Upala are not shown since the observed relative risks are zero. On the other hand, the spatial random effect for each year is illustrated in Figure 4. We observe that there are clusters of municipalities for different years that present a higher or lower log of relative risk. For instance, the south pacific part of the country in 2002 and the north pacific region presented a lower contribution of log of relative risk, while the central pacific in 2009 presented a high contribution of log of relative risk. It is important to notice that this spatial random effect is the variation in log of relative risk that is not captured by the climatic covariates. Still, somehow they can be modeled by the neighbor’s spatial structure. It is known that dengue is a complex phenomenon that involves not only climatic factors but also socioeconomic components and human mobility. Furthermore, there are also outbreaks in certain municipalities during different periods that are not captured by the climatic covariates in this study. In this way, this model is more complete compared to the baseline model assuming independent spatial structure. 10 Fig. 2: Posterior mean and 95% credible interval of municipality-specific monthly random effects. The dengue relative risks prediction can be visualized by using temporal or spatial dimensions. For the temporal dimension, Figure 5 shows the 95% posterior predic- tive dengue relative risk in the training period of the best six municipalities and the three worst municipalities according to NIS0.05 during the testing period. In addition, Figure 6 presents the 95% posterior predictive dengue relative risk of the same munici- palities during the testing period. In general, the behaviors of dengue relative risks can be precisely modeled, and the prediction distribution can also capture the observed RR during the testing periods. It should be noted that the predictive uncertainty is positively asymmetric due to the asymmetric nature of relative risks and the model contemplates the log of relative risks (For dengue cases prediction, see Appendix B). For the spatial dimension, Figure 7 presents the posterior predictive dengue rela- tive risk mean and their absolute percentage error for each municipality and month (January, February, and March). These maps allow us to visualize regions with higher dengue relative risks in a specific month. Furthermore, Figure 7b shows that the absolute percentage error is uniform across the region for the testing period. 11 Fig. 3: Illustration of eight groups with similar temporal behavior. For the training set, similar behavior is visualized. Maps of the posterior mean of relative risks for three selected years (2002, 2011, 2020) and maps of absolute percentage error for the same years are shown in Appendix C and Appendix D. 4 Discussion Epidemiological models provide a crucial tool to help public health authorities and pol- icymakers to allocate limited resources effectively and efficiently. Using these predictive models helps understand disease dynamics, the potential impact on the population, and the health system’s response to an outbreak. These models can be improved by including demographics, genetics, and environmental variables to make more accurate predictions. One of the challenges in developing and using epidemiological models is obtaining high-quality data. A robust and comprehensive data collection system is essential to provide accurate input for the models. In addition, the data must be integrated and 12 Table 3: Köppen-Geiger (Geiger, 1954) climate classification zones for Costa Rica and associated groups from 1-7. Group 8 belong to all climate classifications. Climate type Description Groups Comment Af Tropical Rainforest: rainfall all months, average temperature above 18oC through- out the year. Dry season not significantly defined. Characteris- tical of low lands of the Caribbean slope, Northern-Caribbean region, South Pacific slope. 5,7 These groups show that a seasonal local minimum of precip- itation in October is associated with lower contribution to log(DIR), the wettest months are associated with larger contribu- tion Aw Tropical Savannah (dry winter): Dry sea- son in boreal winter. Average tempera- ture above 18oC throughout the year, Characteristic of low lands in the Pacific Slope, especially in the Pacific Northwest and Central Pacific slope 2,3,4,6 In this case, the shape of the contribution to log(DIR) does not show a clear sea- sonal response to the seasonal cycle of pre- cipitation. Only those municipalities in group 3 show a marked inverse relationship between the contribu- tion to log(DIR) and the seasonal cycle of precipitation Cf (without dry season) and Cw (with dry season) High and mid eleva- tions of the Pacific (Cw) and Caribbean (Cf) slopes. Wet dur- ing all seasons. The coldest month present an average tempera- ture below 18oC. It is normally found in elevations greater than 1500 m a.s.l. It is present in the Central Valley in the middle of the country. 1 Most of the munici- palities in this group exhibit a positive cor- relation between pre- cipitation and contri- bution to log(DIR) processed consistently and timely to ensure that the models are accurate and up-to- date. Data sources include health records, laboratory results, surveillance systems, and household surveys. The output of these models can help in decision-making processes concerning control purposes and surveillance methods and hopefully also as good predictive tools. Prediction forms part of surveillance systems, and more specifically in early warning systems (Racloz, Ramsey, Tong, & Hu, 2012). The effect of climate variability and climate change on dengue transmission is com- plex, nonlinear, and often delayed by several weeks to months (Naish et al., 2014; Wen, 13 Fig. 4: Contribution of year-specific spatial random effect to dengue log relative risk. Lin, Lin, King, & Su, 2006). Bringing together spatio-temporal patterns of dengue transmission compatible with long-term data on climate and other socio-ecological changes by using the Bayesian spatio-temporal models could improve projections of dengue risks and disease control in Costa Rica. Based on the results of fitting the model (1) to the dengue data, it is clear that incorporating all climate covariates greatly improved the model’s performance in accu- rately predicting the number of dengue cases. Using different structures in the model allowed for a deeper understanding of the relationships between dengue transmission and climate factors. The model’s results demonstrate its potential to be a useful tool for decision-making processes in disease control and surveillance methods in Costa Rica, comparing to previous attempts by forecasting dengue cases in each municipal- ity separately. Furthermore, the successful application of the model to monthly data from January 2000 to March 2021 highlights its potential for future predictions and early warning systems. The spatio-temporal random components in the model make us aware that more structural information, such as human mobility and socio-economic factors, could be 14 Fig. 5: Observed (black) and 95% posterior predictive dengue relative risks (red) over the training period. Upper six panels: best municipalities according to NIS metric. Lower three panels: worst municipalities according to NIS metric. Fig. 6: Observed (black) and 95% posterior predictive dengue relative risks (red) over the testing period. Upper six panels: best municipalities according to NIS metric. Lower three panels: worst municipalities according to NIS metric. 15 (a) Relative risk prediction. (b) Absolute percentage error. Fig. 7: Relative risk prediction and their absolute percentage error from January to March 2020 for 81 municipalities in Costa Rica. considered to obtain better predictive results. Access to these data is challenging, and we are working on it in future investigations. Finally, by providing accurate predictions, decision-makers can respond more effec- tively to outbreaks and implement strategies to reduce the impact of the disease on the population. Thus, helping health authorities optimize the typically scarce resources. To maximize their effectiveness, models must be based on high-quality data, continuously updated, and validated jointly with public health officials to reflect environmental changes and other factors, including social determinants, that may impact disease transmission. The continued development of these models will play a critical role in the fight against dengue and other infectious diseases. Declarations • Funding: The authors did not receive support from any organization for the submitted work. 16 • Conflict of interest/Competing interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. • Code availability: The code and datasets is available online under https://github .com/shuwei325/DengueCR Bayesian ST Prediction.git • Authors’ contributions: S.W.C. proposed the main conceptual ideas, performed formal analysis, results validations and writing original draft preparation. L.A.B. worked out in data curation, formal analysis, and writing original draft preparation. P.V. helped with the writing original draft preparation and the public health con- textualization. Y.E.G. work out in the original writing draft. J.G.C. helped write a review and editing the final version. H.G.H. work out in writing, review and editing. And F.S. supervises the research team and works out in writing, review and editing the final version. 17 https://github.com/shuwei325/DengueCR_Bayesian_ST_Prediction.git https://github.com/shuwei325/DengueCR_Bayesian_ST_Prediction.git Appendix A Näıve methods’ predictive metrics of the testing period (from Juanuary to March 2021) Table A1: Predictive metrics of testing data set of the näıve forecasting and negative binomial null model Municipality Naive method Negative Binomial null model NRMSE NRMSE NIS95 Alajuela 2.6506 1.4229 42.9843 Alajuelita 0.4765 2.1888 59.8866 Atenas 89.8765 7.3082 156.7569 Cañas 3.5234 2.9727 69.7955 Carrillo 0.8118 0.9751 26.7633 Corredores 0.3051 0.7459 20.5897 Desamparados 0.2598 22.3581 436.3609 Esparza 0.6795 0.9124 26.0958 Garabito 15.6450 9.0487 189.7400 Golfito 0.9871 3.6079 82.9118 Guacimo 3.6451 3.2932 28.8848 La Cruz 3.5519 17.4492 339.4951 Liberia 0.4927 16.3574 319.1370 Limon 3.4131 2.1836 23.9121 Matina 2.6768 0.8195 30.0298 Montes de Oro 41.8550 8.0778 160.1058 Nicoya 9.2749 17.0730 338.3457 Orotina 2.2117 0.9807 15.6226 Osa 1.2030 0.6291 13.9754 Parrita 15.7029 2.4511 54.8356 Perez Zeledón 1.5211 0.2745 7.9940 Pocoćı 1.0359 0.2630 10.7672 Puntarenas 0.7206 2.0884 52.3785 Quepos 10.8189 9.6902 192.1924 San Jose 0.0267 11.4795 237.4494 Santa Ana 0.4281 19.8108 383.3144 SantaCruz 2.3943 6.6282 144.8392 Sarapiqúı 13.7461 0.0603 3.2807 Siquirres 1.8468 0.2815 9.1988 Talamanca 13.8805 14.4673 35.1501 Turrialba 1.4088 0.3812 14.9645 Upala1 - - - 1 NRMSE and NIS95 for Upala is not shown since the observed relative risks are zero. 18 Appendix B Dengue cases modelling and prediction Fig. B1: Observed (black) and 95% posterior predictive dengue cases (red) over the training period. Upper six panels: best municipalities according to NIS metric. Lower three panels: worst municipalities according to NIS metric. 19 Fig. B2: Observed (black) and 95% posterior predictive dengue cases (red) over the testing period. Upper six panels: best municipalities according to NIS metric. Lower three panels: worst municipalities according to NIS metric. 20 Appendix C Relative risk prediction maps in 2002, 2011 and 2020 Fig. C3: Posterior mean of relative risks from January to December 2002 for 81 municipalities in Costa Rica. 21 Fig. C4: Posterior mean of relative risks from January to December 2011 for 81 municipalities in Costa Rica. 22 Fig. C5: Posterior mean of relative risks from January to December 2020 for 81 municipalities in Costa Rica. 23 Appendix D Absolute percentage error maps in 2002, 2011 and 2020 Fig. D6: Absolute percentage error from January to December 2002 for 81 munici- palities in Costa Rica. 24 Fig. D7: Absolute percentage error from January to December 2011 for 81 munici- palities in Costa Rica. 25 Fig. D8: Absolute percentage error from January to December 2020 for 81 munici- palities in Costa Rica. References Akter, R., Hu, W., Gatton, M., Bambrick, H., Cheng, J., Tong, S. (2021). Climate vari- ability, socio-ecological factors and dengue transmission in tropical queensland, australia: A bayesian spatial analysis. Environmental Research, 195 , 110285, https://doi.org/https://doi.org/10.1016/j.envres.2020.110285 Retrieved from https://www.sciencedirect.com/science/article/pii/S0013935120311828 26 https://doi.org/https://doi.org/10.1016/j.envres.2020.110285 Barboza, L.A., Chou-Chen, S.-W., Vásquez, P., Garćıa, Y.E., Calvo, J.G., Hidalgo, H.G., Sanchez, F. (2023, 01). Assessing dengue fever risk in costa rica by using climate variables and machine learning techniques. PLOS Neglected Tropical Diseases, 17 (1), 1-13, https://doi.org/10.1371/journal.pntd.0011047 Retrieved from https://doi.org/10.1371/journal.pntd.0011047 Barrera, R., Amador, M., Clark, G.G. (2006). Ecological factors influencing aedes aegypti (diptera: Culicidae) productivity in artificial containers in Salinas, Puerto Rico. Journal of medical entomology , 43 (3), 484–492, Besag, J., York, J., Mollié, A. (1991). Bayesian image restoration, with two appli- cations in spatial statistics. Annals of the Institute of Statistical Mathematics, 43 (1), 1–20, https://doi.org/10.1007/BF00116466 Retrieved 2022-08-16, from https://doi.org/10.1007/BF00116466 Bivand, R., Gómez-Rubio, V., Rue, H. (2015). Spatial data analy- sis with r-inla with some extensions. Journal of Statistical Software, 63 (20), 1–31, https://doi.org/10.18637/jss.v063.i20 Retrieved from https://www.jstatsoft.org/index.php/jss/article/view/v063i20 Campbell, K.M., Haldeman, K., Lehnig, C., Munayco, C.V., Halsey, E.S., Laguna- Torres, V.A., . . . Scott, T.W. (2015). Weather regulates location, timing, and intensity of dengue virus transmission between humans and mosquitoes. PLoS neglected tropical diseases, 9 (7), e0003957, Chastel, C. (2012). Eventual role of asymptomatic cases of dengue for the intro- duction and spread of dengue viruses in non-endemic regions. Frontiers in Physiology , 3 , , https://doi.org/10.3389/fphys.2012.00070 Retrieved from https://www.frontiersin.org/articles/10.3389/fphys.2012.00070 Desjardins, M.R., Eastin, M.D., Paul, R., Casas, I., Delmelle, E.M. (2020). Space–time conditional autoregressive modeling to estimate neighborhood-level risks for dengue fever in cali, colombia. The American Journal of Tropical Medicine and Hygiene, 103 (5), 2040 - 2053, https://doi.org/10.4269/ajtmh.20-0080 Retrieved from https://www.ajtmh.org/view/journals/tpmd/103/5/article-p2040.xml Dieng, H., Ahmad, A.H., Mahyoub, J.A., Turkistani, A.M., Mesed, H., Koshike, S., . . . others (2012). Household survey of container–breeding mosquitoes and climatic factors influencing the prevalence of aedes aegypti (diptera: Culicidae) in makkah 27 https://doi.org/10.1371/journal.pntd.0011047 https://doi.org/10.1007/BF00116466 https://doi.org/10.18637/jss.v063.i20 https://doi.org/10.3389/fphys.2012.00070 https://doi.org/10.4269/ajtmh.20-0080 city, saudi arabia. Asian Pacific journal of tropical biomedicine, 2 (11), 849–857, Eidson, M., Kramer, L., Stone, W., Hagiwara, Y., Schmit, K., Team, N.Y.S.W.N.V.A.S., et al. (2001). Dead bird surveillance as an early warning system for west nile virus. Emerging infectious diseases, 7 (4), 631, Enfield, D.B., Mestas-Nuñez, A.M., Mayer, D.A., Cid-Serrano, L. (1999). How ubiq- uitous is the dipole relationship in tropical atlantic sea surface temperatures? Journal of Geophysical Research: Oceans, 104 (C4), 7841–7848, European Centre for Disease Prevention and Control (2022, August). Vbornet– european network for arthropod vector surveillance for human public health. Retrieved from www.vbornet.eu Funk, C., Peterson, P., Landsfeld, M., Pedreros, D., Verdin, J., Shukla, S., . . . Michaelsen, J. (2015). The climate hazards infrared precipitation with stations– a new environmental record for monitoring extremes. Scientific data, 2 (1), 150066–150066, Garcia, Y.E., Chou-Chen, S.-W., Barboza, L.A., Daza-Torres, M.L., Montesinos- Lopez, J.C., Vasquez, P., . . . Sanchez, F. (2023). Common patterns between dengue cases, climate, and local environmental variables in costa rica: A wavelet approach. arXiv. Retrieved from https://arxiv.org/abs/2301.02286 Gasparrini, A. (2011). Distributed lag linear and non-linear models in R: the package dlnm. Journal of Statistical Software, 43 (8), 1–20, Gasparrini, A. (2014). Modeling exposure–lag–response associations with distributed lag non-linear models. Statistics in Medicine, 33 (5), 881-899, https://doi.org/ https://doi.org/10.1002/sim.5963 Gasparrini, A., Armstrong, B., Kenward, M.G. (2010). Distributed lag non-linear models. Statistics in Medicine, 29 (21), 2224-2234, https://doi.org/https:// doi.org/10.1002/sim.3940 Geiger, R. (1954). Klassifikation der klimate nach w. köppen. landolt-börnstein zahlen- werte und funktionen aus physik, chemie, astronomie, geophysik und technik, alte serie. Berlin: Springer , 3 , 603–607, 28 https://doi.org/https://doi.org/10.1002/sim.5963 https://doi.org/https://doi.org/10.1002/sim.5963 https://doi.org/https://doi.org/10.1002/sim.3940 https://doi.org/https://doi.org/10.1002/sim.3940 Gneiting, T., & Raftery, A.E. (2007, mar). Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association, 102 (477), 359– 378, https://doi.org/10.1198/016214506000001437 Gubler, D.J. (2012). The economic burden of dengue. The American journal of tropical medicine and hygiene, 86 (5), 743, Hidalgo, H.G., Alfaro, E.J., Quesada-Montano, B. (2017). Observed (1970–1999) cli- mate variability in central america using a high-resolution meteorological dataset with implication to climate change studies. Clim. Change, 141 (1), 13–28, Lopez, L.F., Amaku, M., Coutinho, F.A.B., Quam, M., Burattini, M.N., Struchiner, C.J., . . . Massad, E. (2016). Modeling importations and exportations of infectious diseases via travelers. Bulletin of mathematical biology , 78 (2), 185–209, Lowe, R., Bailey, T.C., Stephenson, D.B., Graham, R.J., Coelho, C.A., Sá Carvalho, M., Barcellos, C. (2011). Spatio-temporal modelling of climate-sensitive disease risk: Towards an early warning system for dengue in brazil. Computers & Geosciences, 37 (3), 371-381, https://doi.org/https://doi.org/10.1016/j.cageo.2010.01.008 Retrieved from https://www.sciencedirect.com/science/article/pii/S0098300410001445 (Geoinformatics for Environmental Surveillance) Massad, E., Amaku, M., Coutinho, F.A.B., Struchiner, C.J., Burattini, M.N., Khan, K., . . . Wilder-Smith, A. (2018). Estimating the probability of dengue virus introduction and secondary autochthonous cases in europe. Scientific reports, 8 (1), 1–12, Mateus, J.C., & Carrasquilla, G. (2011). Predictors of local malaria outbreaks: an approach to the development of an early warning system in colombia. Memórias do Instituto Oswaldo Cruz , 106 , 107–113, Medlock, J.M., Avenell, D., Barrass, I., Leach, S. (2006). Analysis of the potential for survival and seasonal activity of aedes albopictus (diptera: Culicidae) in the united kingdom. Journal of Vector Ecology , 31 (2), 292–304, Messina, J.P., Brady, O.J., Golding, N., Kraemer, M.U., Wint, G., Ray, S.E., . . . others (2019). The current and future global distribution and population at risk 29 https://doi.org/10.1198/016214506000001437 https://doi.org/https://doi.org/10.1016/j.cageo.2010.01.008 of dengue. Nature microbiology , 4 (9), 1508–1515, Ministerio de Salud (2022a). Sitio web del Ministerio de Salud de Costa rica. bienvenido. https://www.ministeriodesalud.go.cr/. Ministerio de Salud (2022b). Sitio Web del Ministerio de Salud de costa rica. Bienvenido. https://www.ministeriodesalud.go.cr/index.php/biblioteca-de- archivos-left/documentos-ministerio-de-salud/material-informativo/material- publicado/boletines/boletines-vigilancia-vs-enfermedades-de-transmision- vectorial. Morin, C.W., Comrie, A.C., Ernst, K. (2013). Climate and dengue transmission: evidence and implications. Environmental health perspectives, 121 (11-12), 1264– 1272, Muñoz, E., Poveda, G., Arbeláez, M.P., Vélez, I.D. (2021). Spatiotempo- ral dynamics of dengue in colombia in relation to the combined effects of local climate and enso. Acta Tropica, 224 , 106136, https:// doi.org/https://doi.org/10.1016/j.actatropica.2021.106136 Retrieved from https://www.sciencedirect.com/science/article/pii/S0001706X21003156 Murray, N.E.A., Quam, M.B., Wilder-Smith, A. (2013). Epidemiology of dengue: past, present and future prospects. Clinical epidemiology , 5 , 299, Naish, S., Dale, P., Mackenzie, J.S., McBride, J., Mengersen, K., Tong, S. (2014). Climate change and dengue: a critical and systematic review of quantitative modelling approaches. BMC infectious diseases, 14 (1), 1–14, NOAA (n.d.). Climate prediction center. https://www.cpc.ncep.noaa.gov/data/ indices/ersst5.nino.mth.91-20.ascii. (Accessed: 2022-05-01) Outammassine, A., Zouhair, S., Loqman, S. (2022). Global potential distribution of three underappreciated arboviruses vectors (aedes japonicus, aedes vexans and aedes vittatus) under current and future climate conditions. Transboundary and Emerging Diseases, 69 (4), e1160–e1171, Racloz, V., Ramsey, R., Tong, S., Hu, W. (2012). Surveillance of dengue fever virus: a review of epidemiological models and early warning systems. PLoS neglected tropical diseases, 6 (5), e1648, 30 https://doi.org/https://doi.org/10.1016/j.actatropica.2021.106136 https://doi.org/https://doi.org/10.1016/j.actatropica.2021.106136 https://www.cpc.ncep.noaa.gov/data/indices/ersst5.nino.mth.91-20.ascii https://www.cpc.ncep.noaa.gov/data/indices/ersst5.nino.mth.91-20.ascii Romi, R., Severini, F., Toma, L. (2006). Cold acclimation and overwintering of female aedes albopictus in roma. Journal of the American Mosquito Control Association, 22 (1), 149–151, Rue, H., Martino, S., Chopin, N. (2009). Approximate Bayesian Inference for Latent Gaussian Models by Using Integrated Nested Laplace Approximations. Journal of the Royal Statistical Society . Series B ( Methodological ), 71 (2), 319–392, Rueda, L., Patel, K., Axtell, R., Stinner, R. (1990). Temperature-dependent devel- opment and survival rates of culex quinquefasciatus and aedes aegypti (diptera: Culicidae). Journal of medical entomology , 27 (5), 892–898, Sarfraz, M.S., Tripathi, N.K., Tipdecho, T., Thongbu, T., Kerdthong, P., Souris, M. (2012). Analyzing the spatio-temporal relationship between dengue vector larval density and land-use using factor analysis and spatial ring mapping. BMC public health, 12 (1), 1–19, Tuck, S.L., Phillips, H.R., Hintzen, R.E., Scharlemann, J.P., Purvis, A., Hudson, L.N. (2014). Modistools – downloading and processing modis remotely sensed data in r. Ecology and Evolution, 4 (24), 4658–4668, https://doi.org/10.1002/ece3.1273 Tun-Lin, W., Burkot, T., Kay, B. (2000). Effects of temperature and larval diet on development rates and survival of the dengue vector aedes aegypti in north queensland, australia. Medical and veterinary entomology , 14 (1), 31–37, Van Benthem, B.H., Vanwambeke, S.O., Khantikul, N., Burghoorn-Maas, C., Panart, K., Oskam, L., . . . Somboon, P. (2005). Spatial patterns of and risk factors for seropositivity for dengue infection. The American journal of tropical medicine and hygiene, 72 (2), 201–208, Vásquez, P., Loŕıa, A., Sanchez, F., Barboza, L.A. (2020). Climate-driven Statistical Models as effective predictors of local Dengue incidence in Costa Rica: a Gen- eralized Additive Model and Random Forest Approach. Revista de Matematica: Teoria y Aplicaciones, 27 (1), 1–21, Wang, H., Zhao, S., Wang, S., Zheng, Y., Wang, S., Chen, H., . . . Chen, Y. (2022). Global magnitude of encephalitis burden and its evolving pattern over the past 30 years. Journal of Infection, 84 (6), 777–787, 31 https://doi.org/10.1002/ece3.1273 Watts, D.M., Burke, D.S., Harrison, B.A., Whitmire, R.E., Nisalak, A. (1986). Effect of temperature on the vector efficiency of aedes aegypti for dengue 2 virus (Tech. Rep.). ARMY MEDICAL RESEARCH INST OF INFECTIOUS DISEASES FORT DETRICK MD. Wen, T.-H., Lin, N.H., Lin, C.-H., King, C.-C., Su, M.-D. (2006). Spatial mapping of temporal risk characteristics to improve environmental health risk identification: a case study of a dengue epidemic in taiwan. Science of the Total Environment , 367 (2-3), 631–640, Winkler, R.L., & Murphy, A.H. (1968). “Good” Probability Assessors. Journal of Applied Meteorology and Climatology , 7 (5), 751–758, https://doi.org/10.1175/ 1520-0450(1968)007⟨0751:PA⟩2.0.CO;2 World Health Organization (2022, August). Dengue and sever dengue. Retrieved from https://www.who.int/news-room/fact-sheets/detail/dengue-and-severe- dengue Yang, X., Quam, M.B., Zhang, T., Sang, S. (2021). Global burden for dengue and the evolving pattern in the past 30 years. Journal of travel medicine, 28 (8), taab146, 32 https://doi.org/10.1175/1520-0450(1968)007<0751:PA>2.0.CO;2 https://doi.org/10.1175/1520-0450(1968)007<0751:PA>2.0.CO;2 Introduction Methods Data description Dengue Cases Climate variables Model Model selection and prediction Results Discussion Naïve methods' predictive metrics of the testing period (from Juanuary to March 2021) Dengue cases modelling and prediction Relative risk prediction maps in 2002, 2011 and 2020 Absolute percentage error maps in 2002, 2011 and 2020