---
title: "Vignette: Intro to healthiar"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
bibliography: ../inst/REFERENCES.bib
vignette: >
  %\VignetteIndexEntry{Vignette: Intro to healthiar}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=TRUE, eval=TRUE, include=TRUE)
```

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)
options(knitr.kable.max_rows = 10)
set.seed(1)
## Avoid using pacman here, as it causes an error during installation if it is not installed already
library(healthiar)
library(tibble)
library(dplyr)
library(purrr)
library(tidyr)
library(knitr)
library(terra)
library(sf)
```

Hi there! This vignette introduces `healthiar` and shows you how to use it with the help of examples.

*NOTE*: Before using `healthiar`, please read carefully the information provided in the [readme file](https://github.com/SwissTPH/healthiar?tab=readme-ov-file#readme) or the [welcome webpage](https://swisstph.github.io/healthiar/). By using `healthiar`, you agree to the [terms of use and disclaimer](https://github.com/SwissTPH/healthiar?tab=readme-ov-file#readme).

------------------------------------------------------------------------

# About `healthiar`

The `healthiar` functions allow you to quantify and monetize the health impacts (or burden of disease) attributable to exposure. The main focus of the EU project that initiated the development of `healthiar` (BEST-COST) has been on two environmental exposures: air pollution and noise. However, `healthiar` could also be used for other exposures such as green spaces, chemicals or physical activity. See below an overview of `healthiar`, which is the first page of the [cheat sheet](https://swisstph.github.io/healthiar/articles/cheatsheet.html).
The whole list of functions included in `healthiar` is linked there and available in the [reference](https://swisstph.github.io/healthiar/reference/index.html).

![Figure: Overview of `healthiar`](../man/figures/cheatsheet_healthiar_1st_page.png)

# Input & output data

## Input

You can enter data in `healthiar` functions using:

- hard-coded values or
- columns inside pre-loaded data frames or tibbles.

Let's see some examples calling the most important function in `healthiar`: `attribute_health()`.

### Hard coded vs. columns

#### Hard coded

Depending on the function argument, you will need to enter numeric or character values.

```{r eval=FALSE, include=TRUE}
results_pm_copd <- attribute_health(
  exp_central = 8.85,
  rr_central = 1.369,
  rr_increment = 10,
  erf_shape = "log_linear",
  cutoff_central = 5,
  bhd_central = 30747
)
```

#### Columns

`healthiar` comes with example data sets whose names start with `exdat_` and which allow you to test the functions. Some of these example data sets are used in this vignette. Now let's call `attribute_health()` with input data from the `healthiar` example data. Note that you can easily provide input data to a function argument using the `$` operator.

```{r, eval=FALSE, include=TRUE}
results_pm_copd <- attribute_health(
  erf_shape = "log_linear",
  rr_central = exdat_pm$relative_risk,
  rr_increment = 10,
  exp_central = exdat_pm$mean_concentration,
  cutoff_central = exdat_pm$cut_off_value,
  bhd_central = exdat_pm$incidence
)
```

### Tidy data

Be aware that `healthiar` functions are easier to use if your data is prepared in a tidy format, i.e.:

- Each variable is a column; each column is a variable.
- Each observation is a row; each row is an observation.
- Each value is a cell; each cell is a single value.

To know more about the concept of tidy data, see the article by [@Wickham2014_jss].
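To make the tidy format concrete, here is a minimal sketch of a tidy input table (the column names and all values are invented for illustration; they are not required or defined by `healthiar`):

```{r}
library(tibble)

# Hypothetical tidy input: one row per geo unit x sex combination,
# one variable per column (values are made up for illustration)
tidy_input <- tibble(
  geo_id_micro = c("a", "a", "b", "b"),
  sex          = c("female", "male", "female", "male"),
  exposure     = c(8.2, 8.2, 9.1, 9.1),
  incidence    = c(1000, 1100, 1200, 1300)
)

tidy_input
```

Each column can then be passed to the matching function argument with the `$` operator, e.g. `exp_central = tidy_input$exposure` and `bhd_central = tidy_input$incidence`.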
For example, in `attribute_health()` the length of the input vectors entered in the arguments must be either 1 or the number of combinations of the different values of:

- `geo_id_micro`
- `exp_...`
- `sex`
- `age`
- (`info` for further sub-group analysis)

## Output

#### Structure

The output of the `healthiar` functions `attribute_health()` and `attribute_lifetable()` consists of two lists ("folders"):

- `health_main` contains the main results
- `health_detailed` contains detailed results and additional info about the assessment.

Other `healthiar` functions have a similar output structure but use different prefixes, e.g., `social_` in `socialize()` and `monetization_` in `monetize()`.

#### Access

A similar structure can be found in other large functions in `healthiar`, e.g., `attribute_lifetable()`, `compare()`, `socialize()` or `monetize()`. In some functions, different elements are available in the output. For instance, `attribute_lifetable()` creates additional output that is specific to life table calculations.

There are different, equivalent ways of accessing the output:

- With the `$` operator: `results_pm_copd$health_main$impact_rounded` (as in the example above)
- By mouse: go to the *Environment* tab in RStudio, click on the variable you want to inspect, and then open the `health_main` results table
- With the `[[]]` operator: `results_pm_copd[["health_main"]]`
- With `pluck()` & `pull()`: use the `purrr::pluck()` function to select a list and then the `dplyr::pull()` function to extract values from a specified column, e.g. `results_pm_copd |> purrr::pluck("health_main") |> dplyr::pull("impact_rounded")`

---------------------------------------------------------------------------

# Function examples

The descriptions of the [`healthiar` functions](https://swisstph.github.io/healthiar/reference/index.html) provide examples that you can execute (with `healthiar` loaded) by running `example("function_name")`, e.g. `example("attribute_health")`.
In the sections below you will find additional examples and more detailed explanations.

# Relative risk

### Goal

E.g., to quantify the COPD cases attributable to PM2.5 (air pollution) exposure in a country.

### Methodology

The comparative risk assessment approach [@Murray2003_e] is applied, obtaining the population attributable fraction (the percentage of cases that are attributable to the exposure) based on the relative risk. The exposure scenario is compared with a counterfactual scenario. This approach has been extensively documented and applied [e.g., @WHO2003_report; @Steenland2006-e; @Soares2020_report; @Pozzer2023_gh; @GBD2020_tl; @Lehtomaki_2025_eh].

![Figure: Relative risk approach](../man/figures/bod_rr.png){width="700"}

#### Population attributable fraction

General integral form for the **population attributable fraction (PAF)**:

$$PAF = \frac{\int rr\_at\_exp(x) \times PE(x)dx - 1}{\int rr\_at\_exp(x) \times PE(x)dx}$$

Where:

* $x$ = exposure level
* $PE(x)$ = population distribution of exposure
* $rr\_at\_exp(x)$ = relative risk at exposure level compared to reference

##### Simplified for categorical exposure distribution

If exposure is categorical, the integrals are converted to sums:

$$PAF = \frac{\sum rr\_at\_exp_i \times PE_i - 1}{\sum rr\_at\_exp_i \times PE_i}$$

Alternatively, an equivalent form is:

$$PAF = \frac{\sum PE_i \times (rr\_at\_exp_i - 1)}{\sum PE_i\times (rr\_at\_exp_i - 1) + 1}$$

##### Simplified for single exposure value

If there is a single exposure value, corresponding to the population-weighted mean concentration, the equation simplifies to:

$$PAF = \frac{rr\_at\_exp - 1}{rr\_at\_exp}$$

#### Scaling relative risk

How do we get the relative risk at exposure level (`rr_at_exp`)? This is normally different from the relative risk published in the epidemiological literature (`rr`), which comes together with the (concentration/dose) `increment` to which it corresponds.
The equations used for scaling the relative risk depend on the chosen exposure-response function shape:

- linear [@Lehtomaki_2025_eh]

$$RRexp = 1 + \frac{rr - 1}{increment} \times (exp - cutoff)$$

- log-linear [@Lehtomaki_2025_eh]

$$RRexp = e^{\frac{\log(rr)}{increment} \times (exp - cutoff)}$$

- log-log [@Lehtomaki_2025_eh]

$$RRexp = \left( \frac{exp + 1}{cutoff + 1} \right)^{\frac{\log(rr)}{\log(increment + cutoff + 1) - \log(cutoff + 1)}}$$

- linear-log [@Pozzer2023_gh]

$$RRexp = 1 + \frac{\log(rr - 1)}{\log(increment + cutoff + 1) - \log(cutoff + 1)} \times \frac{\log(exp + 1)}{\log(cutoff + 1)}$$

The relative risk at exposure level (`rr_at_exp`) is part of the output of `attribute_health()` and `attribute_lifetable()`. `rr_at_exp` can also be calculated using `get_risk()`.

For the conversion of hazard ratios and/or odds ratios to relative risks, refer to [@VanderWeele2019_biom] and/or use the conversion tools developed by the Teaching group in EBM in 2022 for hazard ratios (https://ebm-helper.cn/en/Conv/HR_RR.html) and/or odds ratios (https://ebm-helper.cn/en/Conv/OR_RR.html).

### Function call

```{r, eval=TRUE, include=TRUE, echo=TRUE}
results_pm_copd <- attribute_health(
  approach_risk = "relative_risk", # If you do not call this argument, "relative_risk" will be assigned by default.
  erf_shape = "log_linear",
  rr_central = exdat_pm$relative_risk,
  rr_increment = 10,
  exp_central = exdat_pm$mean_concentration,
  cutoff_central = exdat_pm$cut_off_value,
  bhd_central = exdat_pm$incidence
)
```

### Main results

```{r, include=TRUE, eval=TRUE, echo=TRUE}
results_pm_copd$health_main
```

The result is a `tibble` of 3 rows and 23 columns. Be aware that this main output contains input data, some intermediate steps and the final results in different formats. Let's zoom in on some relevant aspects.
```{r}
results_pm_copd$health_main |>
  dplyr::select(exp, bhd, rr, erf_ci, pop_fraction, impact_rounded) |>
  knitr::kable() # For formatting reasons only: prints tibble in nice layout
```

Interpretation: this table shows us that the exposure was 8.85 $\mu g/m^3$ and the baseline health data (`bhd_central`) was 30747 (COPD incidence in this instance). The 1st row further shows that the impact attributable to this exposure using the central relative risk (`rr_central`) estimate of 1.369 is 3502 COPD cases, or \~11% of all baseline cases.

Some of the most relevant results columns include:

- *impact_rounded* rounded attributable health impact/burden
- *impact* raw impact/burden
- *pop_fraction* population attributable fraction (PAF) or population impact fraction (PIF)

# Absolute risk

### Goal

E.g., to quantify the number of incident cases of high annoyance attributable to (road traffic) noise exposure.

### Methodology

In the absolute risk calculation pathway, estimates are based on the size and distribution of the exposed population, rather than on baseline health data, as is the case in the relative risk pathway [@WHO2011_report].

![Figure: Absolute risk approach](../man/figures/bod_ar.png){width="700"}

$$N = \sum AR_i \times PE_i$$

Where:

* $N$ = attributed cases
* $AR_i$ = absolute risk at category $i$
* $PE_i$ = absolute population exposed at category $i$

### Function call

```{r}
results_noise_ha <- attribute_health(
  approach_risk = "absolute_risk", # default is "relative_risk"
  exp_central = c(57.5, 62.5, 67.5, 72.5, 77.5), # mean of the exposure categories
  pop_exp = c(387500, 286000, 191800, 72200, 7700), # population exposed per exposure category
  erf_eq_central = "78.9270-3.1162*c+0.0342*c^2" # exposure-response function
)
```

The `erf_eq_central` argument can digest other types of functions (see the section on user-defined ERFs).
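To see the equation $N = \sum AR_i \times PE_i$ in action, the sum can be reproduced step by step in base R. This is a rough sketch, not `healthiar` internals: it assumes that the quadratic ERF above returns the *percentage* of highly annoyed persons, hence the division by 100 to obtain a risk.

```{r}
exp_mid <- c(57.5, 62.5, 67.5, 72.5, 77.5)        # exposure category midpoints
pop_exp <- c(387500, 286000, 191800, 72200, 7700) # population exposed per category

erf <- function(c) 78.9270 - 3.1162 * c + 0.0342 * c^2 # assumed: % highly annoyed

absolute_risk <- erf(exp_mid) / 100  # percentage -> proportion (assumption)
impact <- sum(absolute_risk * pop_exp)
round(impact)  # about 174232 highly annoyed persons under these assumptions
```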
### Main results

```{r echo=FALSE}
results_noise_ha |>
  purrr::pluck("health_main") |>
  dplyr::select(erf_eq, erf_ci, impact_rounded) |>
  knitr::kable() # Prints tibble in a minimal layout
```

### Results per noise exposure band

```{r eval=FALSE, include=TRUE, echo=TRUE}
results_noise_ha$health_detailed$results_raw
```

```{r echo=FALSE, include=TRUE, eval=TRUE}
results_noise_ha[["health_detailed"]][["results_raw"]] |>
  select(exp_category, exp, pop_exp, impact) |>
  knitr::kable()
```

Remember that if the equation of the exposure-response function (`erf_eq_...`) requires taking a maximum in a vectorised context, `pmax()` must be used instead of `max()`. `pmax()` should be used whenever an element-wise maximum is required (the output will be a vector), while `max()` returns a single global maximum for the entire vector. For example:

```{r eval=FALSE, include=TRUE, echo=TRUE}
erf_eq_central <- "exp(0.2969*log((pmax(0,c-2.4)/1.9+1))/(1+exp(-(pmax(0,c-2.4)-12)/40.2)))"
```

### One exposure category

Alternatively, it is also possible to assess the absolute risk impacts for only one exposure category (e.g., a single noise exposure band).

```{r}
results_noise_ha <- attribute_health(
  approach_risk = "absolute_risk",
  exp_central = 57.5,
  pop_exp = 387500,
  erf_eq_central = "78.9270-3.1162*c+0.0342*c^2"
)
```

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_noise_ha$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_noise_ha[["health_main"]] |>
  dplyr::select(exp_category, impact) |>
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_noise_ha)
```

# Multiple geographic units

## using relative risk

### Goal

E.g., to quantify the disease cases attributable to PM2.5 exposure in multiple cities using one single command.
### Function call

- Enter unique IDs as a vector (`numeric` or `character`) to the `geo_id_micro` argument (e.g., municipality names or province abbreviations)
- Optional: aggregate unit-specific results by providing higher-level IDs (e.g., region names or country abbreviations) as a vector (`numeric` or `character`) to the `geo_id_macro` argument

Input to the other function arguments is specified as usual, either as vectors or as single values (which will be recycled to match the length of the other input vectors).

```{r}
results_iteration <- attribute_health(
  # Names of Swiss cantons
  geo_id_micro = c("Zurich", "Basel", "Geneva", "Ticino", "Jura"),
  # Names of languages spoken in the selected Swiss cantons
  geo_id_macro = c("German","German","French","Italian","French"),
  rr_central = 1.369,
  rr_increment = 10,
  cutoff_central = 5,
  erf_shape = "log_linear",
  exp_central = c(11, 11, 10, 8, 7),
  bhd_central = c(4000, 2500, 3000, 1500, 500)
)
```

In this example we aggregate the lower-level geographic units (cantons) by the higher-level language region (`"German", "French", "Italian"`).

### Main results

The main output contains the aggregated results.

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_iteration$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_iteration[["health_main"]] |>
  dplyr::select(geo_id_macro, impact_rounded, erf_ci, exp_ci, bhd_ci) |>
  knitr::kable()
```

In this case `health_main` contains the cumulative / summed number of stroke cases attributable to PM2.5 exposure in the 5 geo units, which is `r sum(results_iteration[["health_main"]]$impact_rounded)` (using a relative risk of 1.369).

### Detailed results

The geo-unit-specific information and results are stored under `health_detailed` > `results_raw`.
```{r eval=FALSE, include=FALSE, echo=TRUE}
results_iteration$health_detailed$results_raw
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_iteration[["health_detailed"]][["results_raw"]] |>
  dplyr::select(geo_id_micro, impact_rounded, geo_id_macro) |>
  knitr::kable()
```

Besides the results per geo unit, `health_detailed` also contains the impacts obtained through all combinations of the central, lower and upper estimates of the input data (as usual; not shown above).

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_iteration)
```

## using absolute risk

### Goal

E.g., to quantify high annoyance cases attributable to noise exposure in rural and urban areas.

### Function call

```{r}
data <- exdat_noise |>
  ## Filter for urban and rural regions
  dplyr::filter(region == "urban" | region == "rural")
```

```{r}
results_iteration_ar <- attribute_health(
  # Both the rural and urban areas belong to the higher-level "total" region
  geo_id_macro = "total",
  geo_id_micro = data$region,
  approach_risk = "absolute_risk",
  exp_central = data$exposure_mean,
  pop_exp = data$exposed,
  erf_eq_central = "78.9270-3.1162*c+0.0342*c^2"
)
```

*NOTE*: the lengths of the input vectors fed to `geo_id_micro`, `exp_central` and `pop_exp` must match and must be (number of geo units) x (number of exposure categories) = 2 x 5 = **10**, because we have 2 geo units (`"rural"` and `"urban"`) and 5 exposure categories.

### Main results

`health_main` contains the aggregated results (i.e. the sum of the impacts in the rural and urban areas).

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_iteration_ar$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_iteration_ar[["health_main"]] |>
  dplyr::select(geo_id_macro, impact_rounded, erf_ci, exp_ci) |>
  knitr::kable()
```

### Detailed results

Impact by geo unit, in this case the impact in the rural and in the urban area.
```{r eval=FALSE, include=FALSE, echo=TRUE}
results_iteration_ar$health_detailed$results_by_geo_id_micro
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_iteration_ar[["health_detailed"]][["results_by_geo_id_micro"]] |>
  dplyr::select(geo_id_micro, geo_id_macro, impact) |>
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_iteration_ar, data)
```

# Uncertainty

## Confidence interval

### Goal

E.g., to quantify the COPD cases attributable to PM2.5 exposure taking into account uncertainty (lower and upper bounds of the confidence interval) in several input arguments: relative risk, exposure and baseline health data.

### Function call

```{r}
results_pm_copd <- attribute_health(
  erf_shape = "log_linear",
  rr_central = 1.369,
  rr_lower = 1.124, # lower 95% confidence interval (CI) bound of RR
  rr_upper = 1.664, # upper 95% CI bound of RR
  rr_increment = 10,
  exp_central = 8.85,
  exp_lower = 8, # lower 95% CI bound of exposure
  exp_upper = 10, # upper 95% CI bound of exposure
  cutoff_central = 5,
  bhd_central = 30747,
  bhd_lower = 28000, # lower 95% CI bound of BHD
  bhd_upper = 32000 # upper 95% CI bound of BHD
)
```

### Detailed results

Let's inspect the detailed results:

```{r eval=FALSE, echo=TRUE, include=FALSE}
results_pm_copd$health_detailed$results_raw
```

```{r echo=FALSE}
results_pm_copd$health_detailed$results_raw |>
  dplyr::select(erf_ci, exp_ci, bhd_ci, impact_rounded) |>
  dplyr::slice(1:9) |>
  knitr::kable() # Prints tibble in a minimal layout
```

Each row contains the estimated attributable cases (`impact_rounded`) obtained with the input data specified in the columns ending in "\_ci" and the other calculation pathway specifications in that row (not shown).

- The 1st row contains the estimated attributable impact when using the central estimates of relative risk, exposure and baseline health data.
- The 2nd row shows the impact when using the central estimates of the relative risk and exposure in combination with the lower estimate of the baseline health data.
- ...

*NOTE*: only `r length(1:9)` of the `r length(results_pm_copd$health_detailed$results_raw$impact)` possible combinations are displayed due to space constraints.

*NOTE*: only a selection of the columns is shown.

## Monte Carlo simulation

### Goal

E.g., to summarize the uncertainty of attributable health impacts (i.e. to obtain a single confidence interval instead of many combinations) by using a Monte Carlo simulation.

### Methodology

#### General concepts

A Monte Carlo simulation is a statistical method based on repeated random sampling [@Robert2004_book; @Rubinstein2016_book]. In `healthiar`, you can use the function `summarize_uncertainty()` to simulate values for the arguments with uncertainty and estimate a single confidence interval for the results. For each entered input argument that includes a (95%) confidence interval (i.e. `_lower` and `_upper` bound values), a distribution is fitted (see the distributions below). The median of the simulated attributable impacts is reported as the central estimate. The 2.5th and 97.5th percentiles of these simulated impacts define the lower and upper bounds of the 95% summary uncertainty interval. Aggregated central, lower and upper estimates are obtained by summing the corresponding values of each lower-level unit.

#### Distributions used for simulation

`summarize_uncertainty()` assumes the following distribution shapes in the simulations:

- Relative risk: The values are simulated based on an optimized *gamma* distribution, which fits well as relative risks are positive and their distributions are usually right-skewed. The gamma distribution is parametrized such that its mean is equal to the central relative risk estimate (`rate = shape/rr_central`).
The shape parameter is then optimized using `stats::optimize()` to match the inputted 95% confidence interval bounds, with `stats::qgamma()` used to evaluate candidate distributions. Finally, `n_sim` relative risk values are simulated using `stats::rgamma()`.

- Exposure, cutoff, baseline health data and duration: The values are simulated based on a *normal* distribution using `stats::rnorm()` with `mean = exp_central`, `mean = cutoff_central`, `mean = bhd_central` or `mean = duration_central` and a standard deviation derived from the corresponding lower and upper 95% confidence interval values. The standard deviation is calculated as $(upper-lower)/(2 \times 1.96)$, since for a normal distribution the 95% CI spans approximately 1.96 standard deviations on either side of the mean.

- Disability weights: The values are simulated based on a *beta* distribution, as both disability weights and the beta distribution are bounded by 0 and 1. The beta distribution best fitting the inputted central disability weight estimate and the corresponding lower and upper 95% confidence interval values is fitted using `stats::qbeta()` (the best-fitting distribution parameters `shape1` and `shape2` are determined using `stats::optimize()`). For this purpose, we partly adapted the R function `prevalence::beta_expert` with permission of one of the authors [@Devleesschauwer2022_package]. Finally, `n_sim` disability weight values are simulated using `stats::rbeta()`.

For stability of the 95% confidence interval, a large number of simulations (e.g., 10,000) is recommended in practice. The example below uses `n_sim = 100` for brevity.

### Function call

```{r}
results_pm_copd_summarized <- summarize_uncertainty(
  output_attribute = results_pm_copd,
  n_sim = 100
)
```

### Main results

The outcome of the Monte Carlo analysis is added to the object entered in the `output_attribute` argument, which is `results_pm_copd` in our case.
Two lists ("folders") are added:

- `uncertainty_main` contains the central estimate and the corresponding 95% confidence interval obtained through the Monte Carlo assessment and
- `uncertainty_detailed` contains all `n_sim` simulations of the Monte Carlo assessment.

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_copd_summarized$uncertainty_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_pm_copd_summarized$uncertainty_main |>
  knitr::kable()
```

### Detailed results

The folder `uncertainty_detailed` contains all individual simulations. Let's look at the impact of the first 10 simulations. The columns `erf_ci`, `exp_ci`, `bhd_ci` and `cutoff_ci` indicate the uncertainty component used for each simulation (in the first 10 simulations, all use the central estimates).

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_copd_summarized$uncertainty_detailed$impact_by_sim
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_pm_copd_summarized$uncertainty_detailed$impact_by_sim |>
  knitr::kable()
```

```{r, echo=FALSE, eval=TRUE, include=FALSE}
rm(results_pm_copd_summarized)
```

# User-defined ERF

### Goal

E.g., to quantify COPD cases attributable to air pollution exposure by applying a user-defined exposure-response function (ERF), such as the MR-BRT curves from the Global Burden of Disease study.

### Function call

In this case, the function arguments `erf_eq_...` require a function as input, so we use an auxiliary function (`splinefun()`) to transform points on the ERF into an object of type `function`.
```{r}
results_pm_copd_mr_brt <- attribute_health(
  exp_central = 8.85,
  bhd_central = 30747,
  cutoff_central = 0,
  # Specify the function based on x-y point pairs that lie on the ERF
  erf_eq_central = splinefun(
    x = c(0, 5, 10, 15, 20, 25, 30, 50, 70, 90, 110),
    y = c(1.00, 1.04, 1.08, 1.12, 1.16, 1.20, 1.23, 1.35, 1.45, 1.53, 1.60),
    method = "natural")
)
```

The ERF curve created looks as follows:

```{r fig.alt="ERF curve", eval=TRUE, include=TRUE, echo=FALSE}
x <- c(0, 5, 10, 15, 20, 25, 30, 50, 70, 90, 110)
y <- c(1.00, 1.04, 1.08, 1.12, 1.16, 1.20, 1.23, 1.35, 1.45, 1.53, 1.60)

spline_func <- stats::splinefun(
  x = x,
  y = y,
  method = "natural")

## Generate finer x-values
x_dense <- seq(min(x), max(x), length.out = 200)
## Evaluate the spline function at these points
y_dense <- spline_func(x_dense)

## Plot
plot(x, y,
     main = "",
     xlab = "Exposure",
     ylab = "Relative risk",
     col = "blue",
     pch = 19)
lines(x_dense, y_dense, col = "red", lwd = 2)
legend("topleft",
       legend = c("Original points", "Spline curve"),
       col = c("blue", "red"),
       pch = c(19, NA),
       lty = c(NA, 1),
       lwd = c(NA, 2))
```

Alternatively, other functions (e.g. `approxfun()`) can be used to create the ERF.

# Sub-group analysis

## by age group

### Goal

E.g., to quantify health impacts attributable to air pollution in a country *by age group*.

### Function call

To obtain age-group-specific results, the baseline health data (and possibly the exposure) must be available by age group. If the `age` argument was specified, age-group-specific results are available under `health_detailed` in the sub-folder `results_by_age_group`.
```{r}
results_age_group <- attribute_health(
  approach_risk = "relative_risk",
  age = c("below_65", "65_plus"),
  exp_central = c(8, 7),
  cutoff_central = c(5, 5),
  bhd_central = c(1000, 5000),
  rr_central = 1.06,
  rr_increment = 10,
  erf_shape = "log_linear"
)
```

### Results by age group

```{r}
results_age_group$health_detailed$results_by_age_group |>
  dplyr::select(age_group, impact_rounded, exp, bhd) |>
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_age_group)
```

## by sex

### Goal

E.g., to quantify health impacts attributable to air pollution in a country *by sex*.

### Function call

The baseline health data (and possibly the exposure) must be entered by sex.

```{r eval=TRUE, include=TRUE, echo=TRUE}
results_sex <- attribute_health(
  approach_risk = "relative_risk",
  sex = c("female", "male"),
  exp_central = c(8, 8),
  cutoff_central = c(5, 5),
  bhd_central = c(1000, 1100),
  rr_central = 1.06,
  rr_increment = 10,
  erf_shape = "log_linear"
)
```

### Results by sex

If the `sex` argument was specified, sex-specific results are available under `health_detailed` in the sub-folder `results_by_sex`.

```{r eval=TRUE, include=TRUE, echo=TRUE}
results_sex$health_detailed$results_by_sex |>
  dplyr::select(sex, impact_rounded, exp, bhd) |>
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_sex)
```

## by other sub-groups

### Goal

E.g., to quantify attributable health impacts *stratified by a sub-group other than age and sex, e.g., education level*.

### Function call

A single vector (or a data frame / tibble with multiple columns) to group the results by can be passed to the `info` argument; in this example, information about the education level. In a second step, the results can be grouped based on one or more columns and thus summarized by the preferred sub-groups.
```{r eval=TRUE, include=TRUE, echo=TRUE}
output_attribute <- healthiar::attribute_health(
  rr_central = 1.063,
  rr_increment = 10,
  erf_shape = "log_linear",
  cutoff_central = 0,
  exp_central = c(6, 7, 8, 7, 8, 9, 8, 9, 10, 9, 10, 11),
  bhd_central = c(600, 700, 800, 700, 800, 900, 800, 900, 1000, 900, 1000, 1100),
  geo_id_micro = rep(c("a", "b", "c", "d"), each = 3),
  info = data.frame(
    education = rep(c("secondary", "bachelor", "master"), times = 4)) # education level
)
```

### Results by other sub-group

```{r eval=TRUE, include=TRUE, echo=TRUE}
output_stratified <- output_attribute$health_detailed$results_raw |>
  dplyr::group_by(info_column_1) |>
  dplyr::summarize(mean_impact = mean(impact)) |>
  dplyr::pull(mean_impact) |>
  print()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(output_attribute, output_stratified)
```

## by age, sex and other sub-groups

### Goal

E.g., to quantify attributable health impacts *stratified by age, sex and an additional sub-group, e.g., education level*.

### Function call

```{r eval=TRUE, include=TRUE, echo=TRUE}
output_attribute <- healthiar::attribute_health(
  rr_central = 1.063,
  rr_increment = 10,
  erf_shape = "log_linear",
  cutoff_central = 0,
  age_group = base::rep(c("50_and_younger", "50_plus"), each = 4, times = 2),
  sex = base::rep(c("female", "male"), each = 2, times = 4),
  exp_central = c(6, 7, 8, 7, 8, 9, 8, 9, 10, 9, 10, 11, 10, 11, 12, 13),
  bhd_central = c(600, 700, 800, 700, 800, 900, 800, 900, 1000, 900, 1000, 1100, 1000, 1100, 1200, 1000),
  geo_id_micro = base::rep(c("a", "b"), each = 8),
  info = base::data.frame(
    education = base::rep(c("without_master", "with_master"), times = 8)) # education level
)
```

### Results by all sub-groups

```{r eval=TRUE, include=TRUE, echo=TRUE}
output_stratified <- output_attribute$health_detailed$results_raw |>
  dplyr::group_by(info_column_1) |>
  dplyr::summarize(mean_impact = mean(impact)) |>
  dplyr::pull(mean_impact) |>
  print()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(output_attribute,
output_stratified)
```

# YLL & deaths with life table

## YLL

### Goal

E.g., to quantify the years of life lost (YLL) due to deaths from COPD attributable to PM2.5 exposure during one year.

### Methodology

#### Data preparation

The life table approach to obtain YLL and deaths requires population and baseline mortality data stratified by *one-year* age groups. However, in some cases these data are only available for larger age groups (e.g., 5-year data: 0-4 years old, 5-9 years old, ...). What to do?

- If your population and mortality data are *not* available by one-year age group, your data must be prepared by interpolating values. The `healthiar` function `prepare_lifetable()` makes this conversion using the same approach as the WHO tool AirQ+ [@WHO2020_report].
- If your population and death data are already stratified by one-year age group, you are lucky: you can skip this initial step.

#### General concept

The life table methodology of `attribute_lifetable()` follows that of the WHO tool AirQ+ [@WHO2020_report], which is described in more detail by @Miller2003_jech. In short, two scenarios are compared: a) a scenario with the exposure level specified in the function ("exposed scenario") and b) a scenario with no exposure ("unexposed scenario"). First, the entry and mid-year populations of the (first) year of analysis in the unexposed scenario are determined using modified survival probabilities. Second, age-specific population projections using scenario-specific survival probabilities are done for both scenarios. Third, by subtracting the populations in the unexposed scenario from the populations in the exposed scenario, the premature deaths / years of life lost attributable to the exposure are determined.

An extensive life table case study is available in a report by @Miller2010_report.

#### Determination of populations in the (first) year of analysis

##### Entry population

The entry (i.e.
start of year) populations in both scenarios (exposed and unexposed) are determined as follows:

$$entry\_population_{year_1} = midyear\_population_{year_1} + \frac{deaths_{year_1}}{2}$$

##### Survival probabilities

###### Exposed scenario

The survival probabilities in the exposed scenario from the start of year $i$ to the start of year $i+1$ are calculated as follows:

$$prob\_survival = \frac{midyear\_population_i - \frac{deaths_i}{2}}{midyear\_population_i + \frac{deaths_i}{2}}$$

Analogously, the probability of survival from the start of year $i$ to mid-year $i$:

$$prob\_survival\_until\_midyear = 1 - \frac{1 - prob\_survival}{2}$$

###### Unexposed scenario

The survival probabilities in the unexposed scenario are calculated as follows:

First, the age-group-specific hazard rate in the exposed scenario is calculated using the inputted age-specific mid-year populations and deaths.

$$hazard\_rate = \frac{deaths}{midyear\_population}$$

Second, the hazard rate is multiplied with the modification factor ($= 1 - PAF$) to obtain the age-specific hazard rate in the unexposed scenario.

$$hazard\_rate\_mod = hazard\_rate \times modification\_factor$$

Third, the age-specific survival probabilities (from the start until the end of a given age group) in the unexposed scenario are calculated as follows (cf.
Miller & Hurley 2003):

$$prob\_survival\_mod = \frac{2-hazard\_rate\_mod}{2+hazard\_rate\_mod}$$

##### Mid-year population

The mid-year populations of the (first) year of analysis (year_1) in the unexposed scenario are determined as follows:

First, the survival probabilities from start of year $i$ to mid-year $i$ in the unexposed scenario are calculated as:

$$prob\_survival\_until\_midyear_{mod} = 1 - \frac{1 - prob\_survival\_mod}{2}$$

Second, the mid-year populations of the (first) year of analysis (year_1) in the unexposed scenario are calculated:

$$midyear\_population\_unexposed_{year_1} = entry\_population_{year_1} \times prob\_survival\_until\_midyear_{mod}$$

#### Population projection

Using the age group-specific and scenario-specific survival probabilities calculated above, future populations of each age group under each scenario are calculated.

###### Exposed scenario

In the exposed scenario, the population projection differs between the two possible options of `approach_exposure` (`"single_year"` and `"constant"`). In the case of `"single_year"` exposure, the population projection for the years after the year of exposure is the same as in the unexposed scenario. In the case of `"constant"` exposure, the population projection is done as follows:

First, the entry population of year $i+1$ is calculated (which is the same as the end-of-year population of year $i$) using the entry population of year $i$.

$$entry\_population_{i+1} = entry\_population_i \times prob\_survival$$

Second, the mid-year population of year $i+1$ is calculated.

$$midyear\_population_{i+1} = entry\_population_{i+1} \times prob\_survival\_until\_midyear$$

###### Unexposed scenario

The entry and mid-year population projections in the unexposed scenario are done as follows:

First, the entry population of year $i+1$ is calculated (which is the same as the end-of-year population of year $i$) by multiplying the entry population of year $i$ and the modified survival probabilities.
$$entry\_population_{i+1} = entry\_population_i \times prob\_survival\_mod$$

Second, the mid-year population of year $i+1$ is calculated.

$$midyear\_population_{i+1} = entry\_population_{i+1} \times prob\_survival\_until\_midyear_{mod}$$

### Function call

We can use `attribute_lifetable()` combined with life table input data to determine YLL attributable to an environmental stressor.

```{r}
results_pm_yll <- attribute_lifetable(
  year_of_analysis = 2019,
  health_outcome = "yll",
  rr_central = 1.118,
  rr_increment = 10,
  erf_shape = "log_linear",
  exp_central = 8.85,
  cutoff_central = 5,
  min_age = 20, # age from which population is affected by the exposure
  # Life table information
  age_group = exdat_lifetable$age_group,
  sex = exdat_lifetable$sex,
  population = exdat_lifetable$midyear_population,
  # In the life table case, BHD refers to deaths
  bhd_central = exdat_lifetable$deaths
)
```

### Main results

Total YLL attributable to exposure (sum of sex-specific impacts).

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_yll$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_pm_yll$health_main |>
  dplyr::slice_head() |>
  dplyr::select(impact_rounded, erf_ci, exp_ci, bhd_ci) |>
  knitr::kable()
```

Use the two arguments `approach_exposure` and `approach_newborns` to modify the life table calculation:

- `approach_exposure`
  - `"single_year"` (default) population is exposed for a single year and the impacts of that exposure are calculated
  - `"constant"` population is exposed every year
- `approach_newborns`
  - `"without_newborns"` (default) assumes that the population in the year of analysis is followed over time, without considering newborns being born
  - `"with_newborns"` assumes that for each year after the year of analysis n babies are born, with n being equal to the (male and female) population aged 0 that is provided in the argument `population`

### Detailed results

Attributable YLL results

- per year
- per age (group)
- per sex (if sex-specific life table data entered)
are available.

*Note*: We will inspect the results for females; male results are also available.

#### Results per year

*NOTE*: only a selection of years is shown.

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_yll$health_detailed$results_raw |>
  dplyr::summarize(
    .by = year,
    impact = sum(impact, na.rm = TRUE)
  )
```

```{r echo=FALSE}
results_pm_yll$health_detailed$results_raw |>
  dplyr::summarize(
    .by = year,
    impact = sum(impact, na.rm = TRUE)) |>
  knitr::kable()
```

#### YLL

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_yll$health_detailed$intermediate_calculations |>
  dplyr::filter(sex == "female") |>
  dplyr::pull(impact_by_age_and_year) |>
  purrr::pluck(1)
```

```{r echo=FALSE}
results_pm_yll$health_detailed$intermediate_calculations |>
  dplyr::filter(sex == "female") |>
  dplyr::pull(impact_by_age_and_year) |>
  purrr::pluck(1) |>
  dplyr::select(age_start, age_end, impact_2019) |>
  dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |>
  knitr::kable()
```

#### Population (baseline scenario)

Baseline scenario refers to the scenario with exposure (i.e. the scenario specified in the assessment).

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_yll$health_detailed$intermediate_calculations |>
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_exposed_by_age_and_year) |>
  purrr::pluck(1)
```

```{r echo=FALSE}
results_pm_yll$health_detailed$intermediate_calculations |>
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_exposed_by_age_and_year) |>
  purrr::pluck(1) |>
  dplyr::select(age_start, midyear_population_2019, midyear_population_2020, midyear_population_2021, midyear_population_2022) |>
  dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |>
  knitr::kable()
```

#### Population (unexposed scenario)

The unexposed scenario refers to the scenario without exposure.
```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_yll$health_detailed$intermediate_calculations |>
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_unexposed_by_age_and_year) |>
  purrr::pluck(1)
```

```{r echo=FALSE}
results_pm_yll$health_detailed$intermediate_calculations |>
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_unexposed_by_age_and_year) |>
  purrr::pluck(1) |>
  dplyr::select(age_start, midyear_population_2019, midyear_population_2020, midyear_population_2021, midyear_population_2022) |>
  dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |>
  knitr::kable()
```

## Deaths

### Goal (e.g.)

E.g., to determine premature deaths from COPD attributable to PM2.5 exposure during one year.

### Function call

See the example on YLL for additional info on `attribute_lifetable()` calculations and its output.

```{r}
results_pm_deaths <- attribute_lifetable(
  health_outcome = "deaths",
  year_of_analysis = 2019,
  rr_central = 1.118,
  rr_increment = 10,
  erf_shape = "log_linear",
  exp_central = 8.85,
  cutoff_central = 5,
  min_age = 20, # age from which population is affected by the exposure
  # Life table information
  age_group = exdat_lifetable$age_group,
  sex = exdat_lifetable$sex,
  population = exdat_lifetable$midyear_population,
  bhd_central = exdat_lifetable$deaths
)
```

### Main results

Total premature deaths attributable to exposure (sum of sex-specific impacts).

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_deaths$health_main$impact
```

```{r, eval=TRUE, include=TRUE, echo=FALSE}
results_pm_deaths$health_main |>
  dplyr::slice_head() |>
  dplyr::select(impact_rounded, erf_ci, exp_ci, bhd_ci) |>
  knitr::kable()
```

### Detailed results

Attributable premature deaths results

- per year (if argument `approach_exposure = "constant"`)
- per age (group)
- per sex (if sex-specific life table data entered)

are available.

*Note*: We will inspect the results for females; male results are also available.
*Note*: because we did not set the function argument `approach_exposure` in the function call (default: `"single_year"`), results are available for one year only (the year of analysis).

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_deaths$health_detailed$intermediate_calculations |>
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_unexposed_by_age_and_year) |>
  purrr::pluck(1)
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_pm_deaths$health_detailed$intermediate_calculations |>
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_unexposed_by_age_and_year) |>
  purrr::pluck(1) |>
  dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |>
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_pm_deaths)
```

# YLD

### Goal

E.g., to quantify the years lived with disability (YLD) attributable to air pollution exposure using disability weights.

### Methodology

To quantify the YLD, you can use a prevalence-based or an incidence-based approach [@Kim2022_kspm].

* Prevalence-based: enter `1` (year) in the argument(s) `duration_...` and _prevalence_ cases in `bhd_...`.
* Incidence-based: enter a value _above_ 1 in `duration_...` and _incidence_ cases in `bhd_...`.

### Function call

```{r}
results_pm_copd_yld <- attribute_health(
  rr_central = 1.1,
  rr_increment = 10,
  erf_shape = "log_linear",
  exp_central = 8.85,
  cutoff_central = 5,
  bhd_central = 1000,
  duration_central = 10,
  dw_central = 0.2
)
```

### Main results

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_copd_yld$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_pm_copd_yld$health_main |>
  dplyr::select(erf_ci, impact) |>
  knitr::kable()
```

# DALY

### Goal (e.g.)

E.g., to obtain the disability-adjusted life years (DALY) as the sum of YLL and YLD.

### Methodology

To obtain the attributable DALY, its two components, i.e. years of life lost (YLL) and years lived with disability (YLD), must be summed [@GBD2020_tl].

$$DALY = YLL + YLD$$

### Function call

This is possible using the function `daly()`.
```{r}
results_daly <- daly(
  output_attribute_yll = results_pm_yll,
  output_attribute_yld = results_pm_copd_yld
)
```

### Main results

YLL, YLD & DALY

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_daly$health_main |>
  dplyr::select(impact_yll_rounded, impact_yld_rounded, impact_rounded)
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_daly$health_main |>
  dplyr::select(impact_yll_rounded, impact_yld_rounded, impact_rounded) |>
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_daly, results_pm_yll, results_pm_copd_yld)
```

# Modification of scenarios

### Goal

E.g., to quantify health impacts using `attribute_health()` in a scenario B that is very similar to a previous scenario A.

### Function call

```{r}
scenario_A <- attribute_health(
  exp_central = 8.85, # EXPOSURE 1
  cutoff_central = 5,
  bhd_central = 25000,
  approach_risk = "relative_risk",
  erf_shape = "log_linear",
  rr_central = 1.118,
  rr_increment = 10)
```

The function `attribute_mod()` can be used to modify one or multiple arguments of `attribute_health()` in an existing scenario, e.g. `scenario_A`.

```{r}
scenario_B <- attribute_mod(
  output_attribute = scenario_A,
  exp_central = 6
)
```

This is equivalent to building the whole scenario again (see below), but more time- and code-efficient.

```{r}
scenario_B <- attribute_health(
  exp_central = 6, # EXPOSURE 2
  cutoff_central = 5,
  bhd_central = 25000,
  approach_risk = "relative_risk",
  erf_shape = "log_linear",
  rr_central = 1.118,
  rr_increment = 10)
```

# Comparison of two health scenarios

### Goal

E.g., to compare the health impacts in the scenario "before intervention" vs. "after intervention".

### Methodology

Two approaches can be used for the comparison of scenarios:

- Delta: subtraction of the health impact in scenario 2 from that in scenario 1 (i.e. using two PAFs) [@WHO2014_book]
- Population impact fraction (PIF) [@WHO2003_report; @Murray2003_spbm; @Askari2020_ijph].
Note that the PIF comparison approach assumes the same baseline health data for scenarios 1 and 2 (e.g., comparison of two scenarios at the same time point), while in the delta comparison approach the difference between the two scenarios is obtained by subtraction. The delta approach is therefore also suited for the comparison of scenarios at different time points.

*IMPORTANT* If your aim is to quantify health impacts from a *policy intervention*, be aware that you should use the *same year of analysis* and therefore the *same baseline health data* in both scenarios. The only variable that should change in the second scenario is the exposure (which changes as a result of the intervention).

#### Population Impact Fraction (PIF)

The Population Impact Fraction (PIF) is defined as the proportional change in disease or mortality when exposure to a risk factor is changed, for instance due to an intervention.

##### General Integral Form

The most general equation describing this mathematically is an integral form [@WHO2003_report; @Murray2003_spbm]:

$$PIF = \frac{\int rr\_at\_exp(x)PE(x)dx - \int rr\_at\_exp(x)PE'(x)dx}{\int rr\_at\_exp(x)PE(x)dx}$$

Where:

* $x$ = exposure level
* $PE(x)$ = population distribution of exposure
* $PE'(x)$ = alternative population distribution of exposure
* $rr\_at\_exp(x)$ = relative risk at exposure level compared to the reference level

##### Categorical Exposure Form

If the population exposure is described as a categorical rather than continuous exposure, the integrals may be converted to sums [@WHO2003_report; @Murray2003_spbm]:

$$PIF = \frac{\sum rr\_at\_exp_{i} \times PE_{i} - \sum rr\_at\_exp_{i}PE'_{i}}{\sum rr\_at\_exp_{i}PE_{i}}$$

Where:

* $i$ = the exposure category (e.g., in bins of 1 $\mu g/m^3$ $PM_{2.5}$ or 5 dB noise exposure)
* $PE_i$ = fraction of population in exposure category $i$
* $PE'_i$ = fraction of population in category $i$ for alternative (ideal) exposure scenario
* $rr\_at\_exp_i$ = relative risk for exposure category level $i$ compared to the
reference level

##### Population weighted mean concentration form

Finally, if the exposure is provided as the population-weighted mean concentration, the equation for the PIF is reduced to:

$$PIF = \frac{rr\_at\_exp - rr\_at\_exp_{alt}}{rr\_at\_exp}$$

Where:

* $rr\_at\_exp$ = relative risk at the exposure level
* $rr\_at\_exp_{alt}$ = relative risk at the exposure level for the alternative exposure scenario

### Function call

1. Use `attribute_health()` to calculate the burden of scenarios A & B.

```{r}
scenario_A <- attribute_health(
  exp_central = 8.85, # EXPOSURE 1
  cutoff_central = 5,
  bhd_central = 25000,
  approach_risk = "relative_risk",
  erf_shape = "log_linear",
  rr_central = 1.118,
  rr_increment = 10)
```

```{r}
scenario_B <- attribute_mod(
  output_attribute = scenario_A,
  exp_central = 6
)
```

2. Use `compare()` to compare scenarios A & B.

```{r}
results_comparison <- healthiar::compare(
  approach_comparison = "delta", # or "pif" (population impact fraction)
  output_attribute_scen_1 = scenario_A,
  output_attribute_scen_2 = scenario_B
)
```

The default value for the argument `approach_comparison` is `"delta"`. The alternative is `"pif"` (population impact fraction). See the function documentation of `compare()` for more details.
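To see how the two comparison approaches relate numerically, here is a minimal base-R sketch (all numbers illustrative, not `healthiar` output) that computes the delta and the PIF by hand for a log-linear ERF and a population-weighted mean exposure:

```r
# Illustrative inputs (not from the healthiar example data)
rr <- 1.118; increment <- 10; cutoff <- 5
exp_1 <- 8.85 # exposure in scenario 1
exp_2 <- 6.00 # exposure in scenario 2
bhd <- 25000  # baseline health data, assumed identical in both scenarios

# Relative risk at a given exposure level for a log-linear ERF
rr_at_exp <- function(e) rr^((e - cutoff) / increment)

# Delta approach: impact via the PAF in each scenario, then subtract
paf <- function(e) (rr_at_exp(e) - 1) / rr_at_exp(e)
delta <- paf(exp_1) * bhd - paf(exp_2) * bhd

# PIF approach (population-weighted mean concentration form)
pif <- (rr_at_exp(exp_1) - rr_at_exp(exp_2)) / rr_at_exp(exp_1)
impact_pif <- pif * bhd

c(delta = delta, pif_impact = impact_pif)
```

The two approaches give similar but not identical attributable impacts here, which is why `compare()` lets you choose via `approach_comparison`.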
### Main results

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_comparison$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_comparison[["health_main"]] |>
  dplyr::select(
    impact, impact_rounded,
    impact_scen_1, impact_scen_2,
    bhd,
    dplyr::starts_with("exp_"),
    -dplyr::starts_with("exp_ci"), # remove col "exp_ci"
    dplyr::starts_with("rr_con")) |>
  dplyr::slice_head() |>
  knitr::kable()
```

### Detailed results

The `compare()` results contain two outputs in addition to those we have already seen:

- `health_detailed`
  - `scen_1` contains results of scenario 1 (scenario A in our case)
  - `scen_2` contains results of scenario 2 (scenario B in our case)

```{r, echo=FALSE, eval=TRUE, include=FALSE}
rm(results_comparison, scenario_A, scenario_B)
```

# Two correlated exposures

### Goal

E.g., to quantify the total health impact attributable to PM2.5 and NO2.

### Methodology

A methodological report of the EU project BEST-COST [@Strak2024_report] identified three approaches to add up attributable health impacts from correlated exposures:

- Additive approach [@Steenland2006-e]:
$$PAF_{additive} = PAF_{exposure1} + PAF_{exposure2}$$
- Multiplicative approach [@Jerrett2013-oup]:
$$PAF_{multiplicative} = \frac{\sum PE \times (rr\_at\_exp_{multiplicative} - 1)}{\sum PE \times (rr\_at\_exp_{multiplicative}-1) + 1}$$
$$rr\_at\_exp_{multiplicative} = rr\_at\_exp_{exposure1} \times rr\_at\_exp_{exposure2}$$
- Combined approach [@Steenland2006-e]:
$$PAF_{combined} = 1-[(1-PAF_{exposure1}) \times (1-PAF_{exposure2})]$$

*Attention*: To apply any of these approaches, the relative risks for one exposure must be adjusted for the second exposure and the other way round.

### Function call

For this purpose, you can use the function `multiexpose()`.
```{r}
results_pm <- attribute_health(
  erf_shape = "log_linear",
  rr_central = 1.369,
  rr_increment = 10,
  exp_central = 8.85,
  cutoff_central = 5,
  bhd_central = 30747
)

results_no2 <- attribute_mod(
  output_attribute = results_pm,
  exp_central = 10.9,
  rr_central = 1.031
)

results_multiplicative <- multiexpose(
  output_attribute_exp_1 = results_pm,
  output_attribute_exp_2 = results_no2,
  exp_name_1 = "pm2.5",
  exp_name_2 = "no2",
  approach_multiexposure = "multiplicative"
)
```

```{r, echo=FALSE, eval=TRUE, include=FALSE}
rm(results_no2, results_pm)
```

### Main results

```{r eval=FALSE, include=TRUE, echo=TRUE}
results_multiplicative$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_multiplicative$health_main |>
  dplyr::select(impact_rounded) |>
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_multiplicative)
```

# Standardization

### Goal

E.g., to obtain the age-standardized attributable health impacts of two age groups.

### Methodology

Age standardization involves adjusting the observed rates of a particular outcome to a standard population with a specific age structure. This is a technique used to allow the comparison of populations with different age structures [@GBD2020_tldemo; @Ahmad2001_report]. In `healthiar`, the function `standardize()` applies the direct method, where the age-specific rates observed in a study population are applied to a standard (reference) population distribution. The standardized health impact rate is computed as

$$ impact\_per\_100k\_inhab_{std} = \sum_{i=1}^{k} (impact\_per\_100k\_inhab_i \times ref\_prop\_pop_i) $$

where:

* $impact\_per\_100k\_inhab_{std}$ is the age-standardized health impact rate.
* $impact\_per\_100k\_inhab_i$ is the impact rate observed in age group $i$ (e.g., impact per 100,000 inhabitants).
* $ref\_prop\_pop_i$ is the proportion of the reference population in age group $i$.
* $k$ is the number of age groups.
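The direct standardization formula above can be cross-checked with plain vectors; the rates and proportions below are illustrative, not `healthiar` output:

```r
# Illustrative age group-specific impact rates per 100,000 inhabitants
impact_per_100k_inhab <- c(below_40 = 50, above_40 = 400)

# Proportion of the reference population in each age group (must sum to 1)
ref_prop_pop <- c(below_40 = 0.5, above_40 = 0.5)
stopifnot(isTRUE(all.equal(sum(ref_prop_pop), 1)))

# Age-standardized rate: weighted sum of the age-specific rates
impact_per_100k_inhab_std <- sum(impact_per_100k_inhab * ref_prop_pop)
impact_per_100k_inhab_std
```

With equal reference proportions the standardized rate is simply the mean of the two group rates (225 per 100,000 here); unequal proportions shift it toward the more heavily weighted group.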
### Function call

```{r}
output_attribute <- attribute_health(
  rr_central = 1.063,
  rr_increment = 10,
  erf_shape = "log_linear",
  cutoff_central = 0,
  age_group = c("below_40", "above_40"),
  exp_central = c(8.1, 10.9),
  bhd_central = c(1000, 4000),
  population = c(100000, 500000)
)

results <- standardize(
  output_attribute = output_attribute,
  age_group = c("below_40", "above_40"),
  ref_prop_pop = c(0.5, 0.5)
)
```

### Main results

Age-standardized impact rate:

```{r}
print(results$health_main$impact_per_100k_inhab)
```

Age group-specific impact rate:

```{r}
print(results$health_detailed$results_raw$impact_per_100k_inhab)
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results)
```

# Preparation of exposure data

### Goal

E.g., to determine the population-weighted mean PM2.5 exposure for several neighborhoods of Brussels (Belgium).

### Methodology

The `healthiar` function `prepare_exposure()` helps users who do not have the exposure data needed for `healthiar` functions, but only spatial concentration and population data. The function calculates an average concentration value in each geographic unit, weighted by the population at each location.

$$ exp = \frac{\sum_{i=1}^{n} (C_i \times population_i)}{\sum_{i=1}^{n} population_i} $$

where:

* $exp$ = population-weighted mean exposure for the geographic unit.
* $C_i$ = pollutant concentration in grid cell $i$.
* $population_i$ = population count in grid cell $i$.
* $n$ = total number of grid cells contained by the geographic unit.

If population is entered as counts by geographic sub-unit, the function calculates the mean concentration in each sub-unit and aggregates it to higher-level geographic units. If no population data are entered, the function calculates a simple spatial mean concentration as exposure value. The output of `prepare_exposure()` can be entered in the arguments `exp_mean`, `exp_lower` and/or `exp_upper` of `healthiar` functions such as `attribute_health()`.
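Before turning to `prepare_exposure()`, the weighting itself can be illustrated with plain vectors (toy grid-cell values, not the Brussels data used below):

```r
# Toy grid-cell values for one geographic unit
concentration <- c(8.2, 9.1, 10.4, 7.8) # pollutant concentration per grid cell
population    <- c(1200, 800, 300, 0)   # population count per grid cell

# Population-weighted mean exposure: cells with more inhabitants weigh more
exp_pwm <- sum(concentration * population) / sum(population)

# The base-R helper gives the same result
stopifnot(isTRUE(all.equal(exp_pwm, weighted.mean(concentration, population))))

# Without population data, a simple spatial mean would be used instead
exp_simple <- mean(concentration)

c(population_weighted = exp_pwm, simple_mean = exp_simple)
```

Note how the uninhabited cell contributes nothing to the weighted mean, pulling it below the simple spatial mean in this toy example.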
### Function call

```{r}
# exdat_pwm_1 = Pollution grid data
exdat_pwm_1 <- terra::rast(system.file("extdata", "exdat_pwm_1.tif", package = "healthiar"))

# exdat_pwm_2 = Data with the geo units and population data. This is pre-loaded in healthiar.
# If your raw data are in .gpkg format, you can use e.g. sf::st_read()

pwm <- healthiar::prepare_exposure(
  poll_grid = exdat_pwm_1, # Formal class SpatRaster
  geo_units = exdat_pwm_2, # sf of the geographic sub-units
  population = sf::st_drop_geometry(exdat_pwm_2$population), # population per geographic sub-unit
  geo_id_macro = sf::st_drop_geometry(exdat_pwm_2$region)) # higher-level IDs to aggregate
```

### Main results

Within the function output, the list `main` contains the population-weighted mean exposures for the (higher-level) geographic units in the column `exposure_mean` and the total population in each unit in the column `population_total`.

```{r echo=FALSE}
knitr::kable(pwm$main)
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(exdat_pwm_1)
rm(exdat_pwm_2)
rm(pwm)
```

# Threshold additional to cut-off

### Goal

E.g., to quantify health impacts in the exposure group 55 dB+ (calculation threshold) that are affected by exposure above the effect threshold of 45 dB (cut-off).

### Function call

The function arguments `erf_eq_...` require a function as input. Instead of using a `splinefun()`, this can also be fulfilled by a `function(c)`, i.e. any object of type `function`.

```{r}
# setting up function parameters
threshold_effect <- 45       # effect threshold (cut-off), dB
RR <- 1.055
threshold_calculation <- 55  # calculation threshold, dB
rr_increment <- 10

# define the categorical function; the ifelse() condition enables the case
# distinction: relative risk of 1 below the calculation threshold and a
# log-linear increase (counted from the effect threshold) above it
erf_function <- function(c){
  output <- ifelse(
    c < threshold_calculation,
    1,
    RR^((c - threshold_effect) / rr_increment)
  )
  return(output)
}
```

# Monetization

### Goal (e.g.)

E.g., to monetize the COPD cases attributable to PM2.5 exposure quantified with `attribute_health()`.

### Function call

```{r eval=FALSE, include=TRUE, echo=TRUE}
# Illustrative call; argument values are examples
results_monetized <- monetize(
  output_attribute = results_pm_copd,
  valuation = 50000 # monetary value per attributable case
)
```

### Main results

```{r eval=FALSE, include=TRUE, echo=TRUE}
results_monetized$health_main |>
  dplyr::select(erf_ci, monetized_impact) |>
  knitr::kable()
```

We see that the monetized impact (discounted) is more than 160 million EURO. Alternatively, you can also monetize (attributable) health impacts from a non-`healthiar` source.
```{r}
results <- monetize(
  impact = 1151,
  valuation = 100
)
```

## Cost-benefit analysis

### Goal (e.g.)

E.g., to perform an economic evaluation for an intervention by comparing its benefits and costs via Cost-Benefit Analysis (CBA).

### Methodology

The CBA is a type of economic evaluation that compares the costs and the benefits of an intervention, with both measures expressed in monetary terms. To perform a CBA, you can use the function `cba()`. This approach requires monetizing benefits so they can be directly compared with costs. Since interventions typically generate costs and benefits over multi-year time horizons, discounting is a common practice to obtain the present value of future costs and benefits. Depending on the reference guidelines, the discount rate can be specified as the same for costs and benefits or different across them.

The outputs of a Cost-Benefit Analysis can be expressed as three main indicators [@Boardman2018_book]:

- Net benefit: the difference between monetized benefits and costs
- Cost-Benefit Ratio (CBR): monetized benefits divided by costs
- Return on Investment (ROI): the return generated per unit of expenditure, relating net benefits to the intervention's costs

An intervention is recommended from a Cost-Benefit Analysis perspective if it yields a positive net benefit or a positive ROI, or equivalently, a CBR greater than one, meaning that the intervention's monetized benefits exceed its costs. These three outputs are available when running `cba()` and are calculated with the following formulas.

**Net Benefit**

$$net\_benefit = benefit - cost$$

**Cost-Benefit Ratio (CBR)**

$$cbr = \frac{benefit}{cost}$$

**Return on Investment (ROI)**

$$roi = \frac{benefit - cost}{cost} \times 100$$

### Function call

Let's imagine we design a policy that would reduce air pollution to 5 $\mu g/m^3$, which is the concentration specified in the `cutoff_central` argument in the initial `attribute_health()` call.
So we could avoid all COPD cases attributed to air pollution. Considering the cost to implement the policy (estimated at 100 million EURO), what would be the monetary net benefit of such a policy? We can find out using the `healthiar` function `cba()`.

```{r}
cba <- cba(
  output_attribute = results_pm_copd,
  valuation = 50000,
  cost = 100000000,
  discount_shape = "exponential",
  discount_rate_benefit = 0.03,
  discount_rate_cost = 0.03,
  n_years_benefit = 5,
  n_years_cost = 5
)
```

### Main results

The outcome of the CBA is contained in two list elements, which are added to the existing assessment:

- `cba_main` contains the central estimate and the corresponding 95% confidence intervals obtained
- `cba_detailed` contains additional intermediate results for both cost and benefit
  - `benefit` contains results `by_year` and raw results `health_raw`
  - `cost` contains the costs of the policy at the end of the period specified in the `n_years_cost` argument

```{r}
cba$cba_main |>
  dplyr::select(benefit, cost, net_benefit) |>
  knitr::kable()
```

```{r, echo=FALSE, eval=TRUE, include=FALSE}
rm(cba)
```

We see that the central and upper 95% confidence interval estimates of avoided attributable COPD cases result in a net monetary benefit of the policy, while the lower 95% confidence interval estimate results in a net cost!

# Social aspects

## Health impact attributable to social indicator

### Goal

E.g., to estimate the health impact that is theoretically attributable to the difference in the degree of deprivation of the exposed population.

### Methodology

Taking into account socio-economic indicators, e.g. a multiple deprivation index [@Mogin2025_ejph], the differences in attributable health impacts across the study areas can be estimated [@Renard2019_bmc; @Otavova_2022_bmc]. Social inequalities are quantified as the difference between the least deprived areas (the last n-quantile) and

- the most deprived areas or
- the population overall.

These differences can be

- absolute or
- relative.
#### Difference most deprived vs. least deprived

$$ absolute\_quantile = first - last $$

Where:

* $absolute\_quantile$ = Absolute difference between quantiles.
* $first$ = Average health impacts in _most_ deprived quantile.
* $last$ = Average health impacts in _least_ deprived quantile.

$$ relative\_quantile = \frac{absolute\_quantile}{last} $$

#### Difference overall vs. least deprived

$$ absolute\_overall = overall - last $$

Where:

* $absolute\_overall$ = Absolute difference regarding the overall average.
* $overall$ = _Overall_ average health impacts in the study area.
* $last$ = Average health impacts in _least_ deprived quantile.

If you assume that the least deprived areas are similar to counter-factual cases (no exposure to deprivation), the relative difference regarding the overall average health impact could be interpreted as some kind of relative risk attributable to social inequalities.

### Function call

First, quantify health impacts.

```{r eval=TRUE, include=TRUE, echo=TRUE}
health_impact <- healthiar::attribute_health(
  age_group = exdat_socialize$age_group,
  exp_central = exdat_socialize$pm25_mean,
  cutoff_central = 0,
  rr_central = exdat_socialize$rr,
  erf_shape = "log_linear",
  rr_increment = 10,
  bhd_central = exdat_socialize$mortality,
  population = exdat_socialize$population,
  geo_id_micro = exdat_socialize$geo_unit)
```

Second, use the function `socialize()` entering the whole output of `attribute_health()` in the argument `output_attribute`.
```{r eval=TRUE, include=TRUE, echo=TRUE}
social_t <- healthiar::socialize(
  output_attribute = health_impact,
  age_group = exdat_socialize$age_group, # They have to be the same in socialize() and in attribute_health()
  ref_prop_pop = exdat_socialize$ref_prop_pop,
  # Population already provided in output_attribute
  geo_id_micro = exdat_socialize$geo_unit,
  social_indicator = exdat_socialize$score,
  n_quantile = 10,
  increasing_deprivation = TRUE)
```

Alternatively, you can directly enter the health impact in the `socialize()` argument `impact`.

```{r eval=TRUE, include=TRUE, echo=TRUE}
social <- healthiar::socialize(
  impact = health_impact$health_detailed$results_by_age_group$impact,
  age_group = exdat_socialize$age_group, # They have to be the same in socialize() and in attribute_health()
  ref_prop_pop = exdat_socialize$ref_prop_pop,
  geo_id_micro = exdat_socialize$geo_unit,
  social_indicator = exdat_socialize$score,
  population = exdat_socialize$population, # Population has to be provided because no output_attribute
  n_quantile = 10,
  increasing_deprivation = TRUE)
```

### Main results

```{r echo=FALSE}
social$social_main |>
  dplyr::select(c(parameter, difference_type, difference_compared_with, difference_value, comment)) |>
  print()
```

## Multiple deprivation index

### Goal

E.g., to estimate the multiple deprivation index (MDI) to use it for the argument `social_indicator` in the function `socialize()`.

### Methodology

Socio-economic indicators (e.g., education level, employment status and family structure) can be condensed into a multiple deprivation index (MDI) [@Mogin2025_ejph]. For this purpose, the indicators can be normalized using min-max scaling. The reliability of the MDI can be assessed using Cronbach's alpha [@Cronbach1951_p].

$$ \alpha = \frac{k}{k - 1} \left( 1 - \frac{\sum_{i=1}^{k} \sigma^2_{y_i}}{\sigma^2_x} \right) $$

where:

* $k$ is the number of items/variables.
* $\sigma^2_{y_i}$ is the variance of the $i$-th item.
* $\sum_{i=1}^{k} \sigma^2_{y_i}$ is the sum of the variances of all items.
* $\sigma^2_x$ is the total variance of the observed total scores (the sum of all items).

To apply this approach, you should ensure that the data set is as complete as possible. Otherwise, you can try to impute missing data using:

- Time-based imputation: linear regression based on historical trends if prior years' data are complete.
- Indicator-based imputation: multiple linear regression if the missing indicator correlates strongly with others.

Imputation models should have an $R^2$ greater than or equal to 0.7. If $R^2$ is lower than 0.7, consider alternative data sources or methods.

### Function call

```{r eval=TRUE, include=TRUE, echo=TRUE}
mdi <- prepare_mdi(
  geo_id_micro = exdat_prepare_mdi$id,
  edu = exdat_prepare_mdi$edu,
  unemployed = exdat_prepare_mdi$unemployed,
  single_parent = exdat_prepare_mdi$single_parent,
  pop_change = exdat_prepare_mdi$pop_change,
  no_heating = exdat_prepare_mdi$no_heating,
  n_quantile = 10,
  verbose = FALSE
)
```

*Note*: `verbose = FALSE` suppresses any output to the console (default: `verbose = TRUE`, i.e. with printing turned on).
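The Cronbach's alpha formula above can also be computed step by step in base R; the indicator matrix below is simulated for illustration only (three correlated indicators over 100 geo units, not the `exdat_prepare_mdi` data):

```r
set.seed(42)

# Simulate 100 geo units x 3 indicators sharing a common deprivation component
common <- rnorm(100)
items  <- sapply(1:3, function(i) common + rnorm(100, sd = 0.5))

k <- ncol(items)
item_variances <- apply(items, 2, var) # sigma^2_{y_i} of each item
total_variance <- var(rowSums(items))  # sigma^2_x of the summed scores

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)
alpha <- (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
round(alpha, 2)
```

With strongly correlated indicators alpha approaches 1, while uncorrelated indicators pull it toward 0, which is what the reliability rating reported by `prepare_mdi()` reflects.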
### Main results

The function output includes:

- `mdi_main`, a tibble containing the BEST-COST MDI

```{r eval=FALSE, include=TRUE, echo=TRUE}
mdi$mdi_main |>
  dplyr::select(geo_id_micro, MDI, MDI_index)
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
mdi$mdi_main |>
  dplyr::select(geo_id_micro, MDI, MDI_index) |>
  knitr::kable()
```

The function assesses the reliability of the MDI based on the Cronbach's alpha value as follows:

- 0.9 and higher: excellent reliability
- between 0.8 (included) and 0.9: good reliability
- between 0.7 (included) and 0.8: acceptable reliability
- between 0.6 (included) and 0.7: questionable reliability
- lower than 0.6: poor reliability

### Detailed results

- `mdi_detailed`
  - DESCRIPTIVE STATISTICS
  - PEARSON'S CORRELATION COEFFICIENTS
  - CRONBACH'S α, including the reliability rating this value indicates
  - Code for boxplots of the single indicators
  - Code for a histogram of the MDIs of the geo units with a normal distribution curve

To reproduce the boxplots run

```{r fig.alt="Boxplot of Normalized Indicators and MDI", eval=TRUE, include=TRUE, echo=TRUE}
eval(mdi$mdi_detailed$boxplot)
```

Analogously, to reproduce the histogram run

```{r fig.alt="Histogram of MDI with normal curve", eval=TRUE, include=TRUE, echo=TRUE}
eval(mdi$mdi_detailed$histogram)
```

```{r, eval=TRUE, include=FALSE, echo=FALSE}
rm(mdi)
```

---------------------------------------------------------------------------

# Inside pipes

## Pipe |>

`healthiar` can be used inside the _native_ pipe `|>`. See the example below.

```{r eval=FALSE, include=TRUE, echo = TRUE}
exdat_noise |>
  (\(df) {
    healthiar::attribute_health(
      approach_risk = df$risk_estimate_type,
      exp_central = df$exposure_mean,
      pop_exp = df$exposed,
      erf_eq_central = df$erf
    )$health_main$impact_rounded
  })()
```

A shorter version makes use of the base R function `with()`.
```{r}
exdat_noise |>
  (\(df) {
    with(df,
         healthiar::attribute_health(
           approach_risk = risk_estimate_type,
           exp_central = exposure_mean,
           pop_exp = exposed,
           erf_eq_central = erf
         ))$health_main$impact_rounded
  })()
```

## Pipe %>%

`healthiar` can also be used inside _magrittr_ pipes `%>%` as follows.

```{r eval=FALSE, include=TRUE, echo = TRUE}
exdat_noise %>%
  {
    healthiar::attribute_health(
      approach_risk = .$risk_estimate_type,
      exp_central = .$exposure_mean,
      pop_exp = .$exposed,
      erf_eq_central = .$erf
    )$health_main$impact_rounded
  }
```

---------------------------------------------------------------------------

# Export and visualize

Exporting and visualizing results is out of the scope of `healthiar`. For both, you can make use of existing functions in other packages beyond `healthiar` as indicated below.

## Export results

Export as `.csv` file

```{r eval=FALSE, include=FALSE, echo=TRUE}
write.csv(x = results_pm_copd$health_main, file = "exported_results/results_pm_copd.csv")
```

Save as `.Rdata` file

```{r eval=FALSE, include=FALSE, echo=TRUE}
save(results_pm_copd, file = "exported_results/results_pm_copd.Rdata")
```

Export to Excel (as `.xlsx` file)

```{r eval=FALSE, include=FALSE, echo=TRUE}
openxlsx::write.xlsx(x = results_pm_copd$health_main, file = "exported_results/results_pm_copd.xlsx")
```

## Visualize results

You can visualize results in:

- R, using base graphics or packages such as `ggplot2` [@Wickham2016_ggplot],
- Excel (export results first) or
- other tools.

------------------------------------------------------------------------

# Abbreviations

- BHD/bhd = baseline health data
- CI = confidence interval
- CBA/cba = cost-benefit analysis
- exp = exposure
- ERF = exposure-response function
- RR/rr = relative risk
- WHO = World Health Organization
- YLL/yll = years of life lost

------------------------------------------------------------------------

# References