---
title: "Vignette: Intro to healthiar"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
bibliography: ../inst/REFERENCES.bib
vignette: >
  %\VignetteIndexEntry{Vignette: Intro to healthiar}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=TRUE, eval=TRUE, include=TRUE)
```


```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)
options(knitr.kable.max_rows = 10)
set.seed(1)

## Avoid using pacman here, as it causes error in installation if it's not installed already
library(healthiar)
library(tibble)
library(dplyr)
library(purrr)
library(tidyr)
library(knitr)
library(terra)
library(sf)
```

Hi there!

This vignette will tell you about `healthiar` and show you how to use `healthiar` with the help of examples.

*NOTE*: Before using `healthiar`, please read carefully the information provided in the [readme file](https://github.com/SwissTPH/healthiar?tab=readme-ov-file#readme) or the [welcome webpage](https://swisstph.github.io/healthiar/). By using `healthiar`, you agree to the [terms of use and disclaimer](https://github.com/SwissTPH/healthiar?tab=readme-ov-file#readme).



------------------------------------------------------------------------

# About `healthiar`

The `healthiar` functions allow you to quantify and monetize the health impacts (or burden of disease) attributable to exposure. The main focus of the EU project that initiated the development of `healthiar` (BEST-COST) has been two environmental exposures:  air pollution and noise. However, `healthiar` could be used for other exposures such as green spaces, chemicals, physical activity...

See below a an overview of the `healthiar`, which is the first page of the [cheat sheet](https://swisstph.github.io/healthiar/articles/cheatsheet.html). The whole list of functions included in `healthiar` is linked there and available in the [reference](https://swisstph.github.io/healthiar/reference/index.html). 


![Figure: Overview of `healthiar`](../man/figures/cheatsheet_healthiar_1st_page.png)

# Input & output data

## Input

You can enter data in `healthiar` functions using: 
- hard coded values or
- columns inside pre-loaded data frames or tibbles.

Let's see some examples calling the most important function in `healthiar`: `attribute_health()`.

### Hard coded vs. columns

#### Hard coded

Depending on the function argument, you will need to enter numeric or character values.

```{r eval=FALSE, include=TRUE}
results_pm_copd <- attribute_health(
  exp_central = 8.85, 
  rr_central = 1.369, 
  rr_increment = 10,  
  erf_shape = "log_linear",
  cutoff_central = 5,
  bhd_central = 30747 
)
```

#### Columns

`healthiar` comes with some example data that start with `exdat_` that allow you to test functions. Some of these example data will be used in some examples in this vignette.

Now let's `attribute_health()` with input data from the `healthiar` example data. Note that you can easily provide input data to the function argument using the `$` operator.

```{r, eval=FALSE, include=TRUE}
results_pm_copd <- attribute_health(
  erf_shape = "log_linear",
  rr_central = exdat_pm$relative_risk, 
  rr_increment = 10, 
  exp_central = exdat_pm$mean_concentration,
  cutoff_central = exdat_pm$cut_off_value,
  bhd_central = exdat_pm$incidence
)
```


### Tidy data

Be aware that `healthiar` functions are easier to use if your data is prepared in a tidy format, i.e.:

- Each variable is a column; each column is a variable.

- Each observation is a row; each row is an observation.

- Each value is a cell; each cell is a single value.

To know more about the concept of tidy format, see the article by [@Wickham2014_jss].

For example, in `attribute health()` the length of the input vectors to be entered in the arguments must be either 1 or the result of the combinations of the different values of:

- `geo_id_micro`

- `exp_...`

- `sex` 

- `age`

- (`info` for further sub-group analysis)


## Output

#### Structure

The output of the `healthiar`function `attribute_health()` and `attribute_lifetable` consists of two lists ("folders"):

-   `health_main` contains the main results

-   `health_detailed` contained detailed results and additional info about the assessment.

In other `healthiar` functions you can find a similar output structure but using different prefixes. E.g., `social_`in `socialize()` and `monetization_`in `monetitize()`.


#### Access

A similar structure can be found in other large functions in `helathiar`, e.g., `attribute_lifetable()`, `compare()`, `socialize()` or `monetize()`. In some functions, different elements are available in the output. For instance,   `attribute_lifetable()` creates additional output that is specific to life table calculations.

There exist different, equivalent ways of accessing the output:

-   With `$` operator: `results_pm_copd$health_main$impact_rounded` (as in the example above)

-   By mouse: go to the *Environment* tab in RStudio and click on the variable you want to inspect, and then open the `health_main` results table

-   With `[[]]` operator `results_pm_copd[["health_main"]]`

-   With `pluck()` & `pull()`: use the `purrr::pluck` function to select a list and then the `dplyr::pull` function extract values from a specified column, e.g. `results_pm_copd |> purrr::pluck("health_main") |> dplyr::pull("impact_rounded")`




---------------------------------------------------------------------------

# Function examples

The descriptions of the [`healthiar` functions](https://swisstph.github.io/healthiar/reference/index.html) provide examples that you can execute (with `healthiar` loaded) by running `example("function_name")`, e.g. `example("attribute_health")`. In the sections below in this vignette, you find additional examples and more detailed explanations.


# Relative risk 

### Goal

E.g., to quantify the COPD cases attributable to PM2.5 (air pollution) exposure in a country.

### Methodology
The comparative risk assessment approach [@Murray2003_e] is applied obtaining the population attributable fraction 
(percent of cases that are attributable to the exposure) based on the relative risk. 
The exposure scenario is compared with a counter-factual scenario.

This approach has been extensive documented and applied [e.g., @WHO2003_report; 
@Steenland2006-e; @Soares2020_report; @Pozzer2023_gh; @GBD2020_tl; @Lehtomaki_2025_eh]. 

![Figure: Relative risk approach](../man/figures/bod_rr.png){width="700"}

#### Population attributable fraction
General integral form for the **population attributable fraction (PAF)**:

$$PAF = \frac{\int rr\_at\_exp(x) \times PE(x)dx - 1}{\int rr\_at\_exp(x) \times pop\_exp(x)dx}$$

Where:

* $x$ = exposure level

* $PE(x)$ = population distribution of exposure

* $rr\_at\exp(x)$ = relative risk at exposure level compared to reference

##### Simplified for categorical exposure distribution
If exposure is categorical, the integrals are converted to sums:

$$PAF = \frac{\sum rr\_at\_exp_i \times PE_i - 1}{\sum rr\_at\_exp_i \times PE_i}$$

Alternatively, an equivalent form is:

$$PAF = \frac{\sum PE_i \times (rr\_at\_exp_i - 1)}{\sum PE_i\times (rr\_at\_exp_i - 1) + 1}$$

##### Simplified for single exposure value 
If there is one single single exposure value, corresponding to the population weighted mean concentration, the equation can be simplified as follows: 

$$PAF = \frac{rr\_at\_exp - 1}{rr\_at\_exp }$$
#### Scaling relative risk
How to get this relative risk at exposure level (`rr_at_exp`)? This is normally different to the relative risk published in the epidemiological literature (`rr`) together with the (concentration/dose) `increment` that corresponds to this relative risk. The equations used for scaling relative risk depend on the chosen exposure-response function shapes:

- linear [Lehtomaki_2025_eh]
$$RRexp = 1 + \frac{rr - 1}{increment} \times (exp - cutoff)$$

- log-linear [@Lehtomaki_2025_eh]
$$RRexp = e^{\frac{\log(rr)}{increment} \times (exp - cutoff)}$$

- log-log [@Lehtomaki_2025_eh]
$$RRexp = \left( \frac{exp + 1}{cutoff + 1} \right)^{\frac{\log(rr)}{\log(increment + cutoff + 1) - \log(cutoff + 1)}}$$

- linear-log [@Pozzer2023_gh]
$$RRexp = 1 + \frac{\log(rr - 1)}{\log(increment + cutoff + 1) - \log(cutoff + 1)} \times \frac{\log(exp + 1)}{\log(cutoff + 1)}$$


The relative risk at exposure level (`rr_at_exp`) and is part of the output of `attribute_health()` and `attribute_lifetable()`. `rr_at_exp` can also be calculated using `get_risk()`.


For conversion of hazard ratios and/or odds ratios to relative risks refer to [@VanderWeele2019_biom] and/or use the conversion tools developed by the Teaching group in EBM in 2022 for hazard ratios (https://ebm-helper.cn/en/Conv/HR_RR.html) and/or odds ratios (https://ebm-helper.cn/en/Conv/OR_RR.html).


### Function call

```{r, eval=TRUE, include=TRUE, echo=TRUE}
results_pm_copd <- attribute_health(
  approach_risk = "relative_risk", # If you do not call this argument, "relative_risk" will be assigned by default.
  erf_shape = "log_linear",
  rr_central = exdat_pm$relative_risk, 
  rr_increment = 10, 
  exp_central = exdat_pm$mean_concentration,
  cutoff_central = exdat_pm$cut_off_value,
  bhd_central = exdat_pm$incidence
)
```


### Main results
```{r, include=TRUE, eval=TRUE, echo=TRUE}
results_pm_copd$health_main
```

It is a table of the format `tibble` of 3 rows and 23 columns. Be aware that this main output contains input data, some intermediate steps and the final results in different formats.

Let's zoom in on some relevant aspects.

```{r}
results_pm_copd$health_main |> 
  dplyr::select(exp, bhd, rr, erf_ci, pop_fraction, impact_rounded) |> 
  knitr::kable() # For formatting reasons only: prints tibble in nice layout
```

Interpretation: this table shows us that exposure was 8.85 $\mu g/m^3$, the baseline health data (`bhd_central`) was 30747 (COPD incidence in this instance). The 1st row further shows that the impact attributable to this exposure using the central relative risk (`rr_central`) estimate of 1.369 is 3502 COPD cases, or \~11% of all baseline cases.

Some of the most results columns include:

-   *impact_rounded* rounded attributable health impact/burden
-   *impact* raw impact/burden
-   *pop_fraction* population attributable fraction (PAF) or population impact fraction (PIF)




# Absolute risk

### Goal
E.g., to quantify the number incidence cases of high annoyance attributable to (road traffic) noise exposure.

### Methodology
In the absolute risk calculation pathway, estimates are based on the size and distribution 
of the exposed population, rather than on baseline health data, 
as is the case in the relative risk pathway [@WHO2011_report]. 

![Figure: Absolute risk approach](../man/figures/bod_ar.png){width="700"}

$$N = \sum AR_i \times PE_i$$

Where:

* $N$ = attributed cases

* $AR_i$ = absolute risk at category $i$

* $PE_i$ = absolute population exposed at category $i$


### Function call

```{r}
results_noise_ha <- attribute_health(
  approach_risk = "absolute_risk", # default is "relative_risk"
  exp_central = c(57.5, 62.5, 67.5, 72.5, 77.5), # mean of the exposure categories
  pop_exp = c(387500, 286000, 191800, 72200, 7700), # population exposed per exposure category
  erf_eq_central = "78.9270-3.1162*c+0.0342*c^2" # exposure-response function
)
```

The `erf_eq_central` argument can digest other types of functions (see section on user-defined ERF).

### Main results

```{r echo=FALSE}
results_noise_ha |> 
  purrr::pluck("health_main") |>
  dplyr::select(erf_eq, erf_ci, impact_rounded) |> 
  knitr::kable() # Prints tibble in a minimal layout
```

### Results per noise exposure band

```{r eval=FALSE, include=TRUE, echo=TRUE}
results_noise_ha$health_detailed$results_raw
```

```{r echo=FALSE, include=TRUE, eval=TRUE}
results_noise_ha[["health_detailed"]][["results_raw"]] |> 
  select(exp_category, exp, pop_exp, impact) |> 
  knitr::kable()
```

Remember, that if the equation of the exposure-response function (`erf_eq_...`) 
requires taking a maximum in a vectorised context, `pmax()` must be used instead of `max()`. `pmax()` 
should be used whenever an element-wise maximum is required (the output will be a vector), 
while `max()` returns a single global maximum for the entire vector. For example:

```{r eval=FALSE, include=TRUE, echo=TRUE}
erf_eq_central <- 
  "exp(0.2969*log((pmax(0,c-2.4)/1.9+1))/(1+exp(-(pmax(0,c-2.4)-12)/40.2)))"  
```


### One exposure category

Alternatively, it's also possible to only assess the absolute risk impacts 
for one exposure category (e.g., a single noise exposure band).

```{r}
results_noise_ha <- attribute_health(
  approach_risk = "absolute_risk",
  exp_central = 57.5,
  pop_exp = 387500,
  erf_eq_central = "78.9270-3.1162*c+0.0342*c^2"
)
```

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_noise_ha$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_noise_ha[["health_main"]] |> 
  dplyr::select(exp_category, impact) |> 
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_noise_ha)
```



# Multiple geographic units

## using relative risk

### Goal 

E.g., to quantify the disease cases attributable to PM2.5 exposure in multiple cities using one single command.


### Function call

-   Enter unique ID's as a vector (`numeric` or `character`) to the `geo_id_micro` argument (e.g., municipality names or province abbreviations)

-   Optional: aggregate unit-specific results by providing higher-level ID's (e.g., region names or country abbreviations) as a vector (`numeric` or `character`) to the `geo_id_macro` argument

Input to the other function arguments is specified as usual, either as a vector or a single values (which will be recycled to match the length of the other input vectors).

```{r}
results_iteration <- attribute_health(
    # Names of Swiss cantons
    geo_id_micro = c("Zurich", "Basel", "Geneva", "Ticino", "Jura"),
    # Names of languages spoken in the selected Swiss cantons
    geo_id_macro = c("German","German","French","Italian","French"),
    rr_central = 1.369,
    rr_increment = 10, 
    cutoff_central = 5,
    erf_shape = "log_linear",
    exp_central = c(11, 11, 10, 8, 7),
    bhd_central = c(4000, 2500, 3000, 1500, 500)
)
```

In this example we want to aggregate the lower-level geographic units (municipalities) by the higher-level language region (`"German", "French", "Italian"`).

### Main results

The main output contains aggregated results

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_iteration$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_iteration[["health_main"]] |> 
  dplyr::select(geo_id_macro, impact_rounded, erf_ci, exp_ci, bhd_ci) |> 
  knitr::kable()
```

In this case `health_main` contains the cumulative / summed number of stroke cases attributable to PM2.5 exposure in the 5 geo units, which is `r sum(results_iteration[["health_main"]]$impact_rounded)` (using a relative risk of 1.369).

### Detailed results

The geo unit-specific information and results are stored under `health_detailed`\>`results_raw` .

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_iteration$health_detailed$results_raw
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_iteration[["health_detailed"]][["results_raw"]] |> 
  dplyr::select(geo_id_micro, impact_rounded, geo_id_macro) |> 
  knitr::kable()
```

`health_detailed` also contains impacts obtained through all combinations of input data central, lower and upper estimates (as usual), besides the results per geo unit (not shown above).

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_iteration)
```

## using absolute risk

### Goal 

E.g., to quantify high annoyance cases attributable to noise exposure in rural and urban areas.

### Function call

```{r}
data <- exdat_noise |> 
  ## Filter for urban and rural regions
  dplyr::filter(region == "urban" | region == "rural")
```

```{r}
results_iteration_ar <- attribute_health( 
    # Both the rural and urban areas belong to the higher-level "total" region
    geo_id_macro = "total",
    geo_id_micro = data$region,
    approach_risk = "absolute_risk",
    exp_central = data$exposure_mean,
    pop_exp = data$exposed,
    erf_eq_central = "78.9270-3.1162*c+0.0342*c^2"
)
```

*NOTE*: the length of the input vectors fed to `geo_id_micro`, `exp_central`, `pop_exp` must match and must be

(number of geo units) x (number of exposure categories) = 2 x 5 = **10**,

because we have 2 geo units (`"rural"` and `"urban"`) and 5 exposure categories.

### Main results

`health_main` contains the aggregated results (i.e. sum of impacts in rural and urban areas).

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_iteration_ar$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_iteration_ar[["health_main"]] |> 
  dplyr::select(geo_id_macro, impact_rounded, erf_ci, exp_ci) |> 
  knitr::kable()
```

### Detailed results

Impact by geo unit, in this case impact in the rural and in the urban area.

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_iteration_ar$health_detailed$results_by_geo_id_micro
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_iteration_ar[["health_detailed"]][["results_by_geo_id_micro"]] |> 
  dplyr::select(geo_id_micro, geo_id_macro, impact) |> 
  knitr::kable()
```

```{r  eval=TRUE, include=FALSE, echo=FALSE}
rm(results_iteration_ar, data)
```

# Uncertainty

## Confidence interval

### Goal 

E.g., to quantify the COPD cases attributable to PM2.5 exposure taking into account uncertainty (lower and upper bound of confidence interval) in several input arguments: relative risk, exposure and baseline health data.

### Function call

```{r}
results_pm_copd <- attribute_health(
    erf_shape = "log_linear",
    rr_central = 1.369, 
    rr_lower = 1.124, # lower 95% confidence interval (CI) bound of RR
    rr_upper = 1.664, # upper 95% CI bound of RR
    rr_increment = 10, 
    exp_central = 8.85, 
    exp_lower = 8, # lower 95% CI bound of exposure
    exp_upper = 10, # upper 95% CI bound of exposure
    cutoff_central = 5,
    bhd_central = 30747, 
    bhd_lower = 28000, # lower 95% confidence interval estimate of BHD
    bhd_upper = 32000 # upper 95% confidence interval estimate of BHD
) 
```

### Detailed results

Let's inspect the detailed results:

```{r eval=FALSE, echo=TRUE, include=FALSE}
results_pm_copd$health_detailed$results_raw
```

```{r echo=FALSE}
results_pm_copd$health_detailed$results_raw |> 
  dplyr::select(erf_ci, exp_ci, bhd_ci, impact_rounded) |> 
  dplyr::slice(1:9) |> 
  knitr::kable() # Prints tibble in a minimal layout
```

Each row contains the estimated attributable cases (`impact_rounded`) obtained by the input data specified in the columns ending in "\_ci" and the other calculation pathway specifications in that row (not shown).

-   The 1st contains the estimated attributable impact when using the central estimates of relative risk, exposure and baseline health data.

-   The 2nd row shows the impact when using the central estimates of the relative risk, exposure in combination with the lower estimate of the baseline health data.

-   ...

*NOTE*: only `r length(1:9)` of the `r length(results_pm_copd$health_detailed$results_raw$impact)` possible combinations are displayed due to space constraints.

*NOTE*: only a selection of columns are shown.


## Monte Carlo simulation

### Goal

E.g., to summarize uncertainty of attributable health impacts 
(i.e. to get a single confidence interval instead of many combinations) 
by using a Monte Carlo simulation.

### Methodology

#### General concepts
A Monte Carlo simulation is a statistical method that 
generates repeated random sampling [@Robert2004_book; @Rubinstein2016_book. 
In `healthiar`, you can use the function `summarize_uncertainty()` 
to simulate values in the arguments with uncertainty and 
estimate a single confidence interval in the results.

For each entered input argument that includes a (95%) confidence interval 
(i.e. `_lower` and `_upper` bound value) 
a distribution is fitted (see distributions below).
The median of the simulated attributable impacts is reported as the central estimate. 
The 2.5th and 97.5th percentiles of these simulated impacts define 
the lower and upper bounds of the 95% summary uncertainty interval.
Aggregated central, lower and upper estimates are obtained 
by summing the corresponding values of each lower level unit.


#### Distributions used for simulation

`summarize_uncertainty()` assumes the following shapes of the distributions 
in the simulations:

- Relative risk: The values are simulated based on an optimized *gamma* distribution, 
which fits well as relative risks are positive and their distributions are usually right-skewed.
The gamma distribution is parametrized such that its mean is equal to the central relative risk estimate 
(`rate= shape/rr_central`). The shape parameter is then optimized using 
`stats::optimize()` to match the inputed 95% confidence interval bounds, 
with `stats::qgamma()` used to evaluate candidate distributions.
Finally, `n_sim` relative risk values are simulated using `stats::rgamma()`.

- Exposure, cutoff, baseline health data and duration: 
The values are simulated based on a *normal* distribution 
using `stats::rnorm()` with 
`mean = exp_central`, `mean = cutoff_central`, `mean = bhd_central` or `mean = duration_central` 
and a standard deviation based on corresponding lower and upper 95% exposure 
confidence interval values. The standard deviation is calculated as $$(upper-lower)/(2*1.96)$$,
since for a normal distribution the 95% CI spans approximately two standard deviations on either side of the mean. 

- Disability weights: The values are simulated based on a *beta* distribution, 
as both the disability weights and the beta distribution are bounded by 0 and 1. 
The beta distribution best fitting the inputted central disability weight estimate and 
corresponding lower and upper 95% confidence interval values is fitted using 
`stats::qbeta()` (the best fitting distribution parameters 
`shape1` and `shape2` are determined using `stats::optimize()`). For this purpose, 
we partly adapted the R function `prevalence::beta_expert` 
with permission of one of the authors [@Devleesschauwer2022_package].
Finally, `n_sim` disability weight values are simulated using `stats::rbeta()`.

For stability of the 95% confidence interval, a large number of simulations 
(e.g., 10,000) is recommended in practice. 
The example below uses n_sim = 100 for brevity.

### Function call

```{r}
results_pm_copd_summarized <- 
  summarize_uncertainty(
    output_attribute = results_pm_copd,
    n_sim = 100
)
```

### Main results

The outcome of the Monte Carlo analysis is added to the variable entered as the `results` argument, which is `results_pm_copd` in our case.

Two lists ("folders") are added:

-   `uncertainty_main` contains the central estimate and the corresponding 95% confidence intervals obtained through the Monte Carlo assessment and

-   `uncertainty_detailed` contains all `n_sim` simulations of the Monte Carlo assessment.

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_copd_summarized$uncertainty_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_pm_copd_summarized$uncertainty_main |> 
  knitr::kable()
```


### Detailed results

The folder `uncertainty_detailed` contains all single simulations. 
Let's look at the impact of the first 10 simulations.

The columns `erf_ci`, `exp_ci`, `bhd_ci`, and `cutoff_ci` indicate 
the source of uncertainty component used for that simulation 
(in the first 10 simulations, all use central estimates).

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_copd_summarized$uncertainty_detailed$impact_by_sim
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_pm_copd_summarized$uncertainty_detailed$impact_by_sim |> 
  knitr::kable()
```

```{r, echo=FALSE, eval=TRUE, include=FALSE}
rm(results_pm_copd_summarized)
```



# User-defined ERF

### Goal 

E.g., to quantify COPD cases attributable to air pollution exposure by applying a user-defined exposure-response function (ERF), such as the MR-BRT curves from Global Burden of Disease study. 

### Function call
In this case, the function arguments `erf_eq_...` require a function as input, so we use an auxiliary function (`splinefun()`) to transform the points on the ERF into type `function`.
```{r}
results_pm_copd_mr_brt <- attribute_health(
  exp_central = 8.85,
  bhd_central = 30747,
  cutoff_central = 0,
  # Specify the function based on x-y point pairs that lie on the ERF
  erf_eq_central = splinefun(
    x = c(0, 5, 10, 15, 20, 25, 30, 50, 70, 90, 110),
    y = c(1.00, 1.04, 1.08, 1.12, 1.16, 1.20, 1.23, 1.35, 1.45, 1.53, 1.60),
    method = "natural")
)
```

The ERF curve created looks as follows

```{r fig.alt="ERF curve", eval=TRUE, include=TRUE, echo=FALSE}

x <- c(0, 5, 10, 15, 20, 25, 30, 50, 70, 90, 110)
y <- c(1.00, 1.04, 1.08, 1.12, 1.16, 1.20, 1.23, 1.35, 1.45, 1.53, 1.60)
spline_func <- 
  stats::splinefun(
    x = x,
    y = y,
    method = "natural")
## Generate finer x-values
x_dense <- seq(min(x), max(x), length.out = 200)
## Evaluate the spline function at these points
y_dense <- spline_func(x_dense)
## Plot
plot(x, 
     y, 
     # main = "User-defined ERF", 
     main = "", 
     xlab = "Exposure",
     ylab = "Relative risk",
     col = "blue", 
     pch = 19)
lines(x_dense, y_dense, col = "red", lwd = 2)
legend("topleft", legend = c("Original points", "Spline curve"),
       col = c("blue", "red"), pch = c(19, NA), lty = c(NA, 1), lwd = c(NA, 2))
```

Alternatively, other functions (e.g. `approxfun()`) can be used to create the ERF

# Sub-group analysis 
## by age group

### Goal

E.g., to quantify health impacts attributable to air pollution in a country *by age group*.


### Function call

To obtain age-group-specific results, the baseline health data (and possibly exposure) must be available by age group.

If the `age` argument was specified, age-group-specific results are available under `health_detailed` in the sub-folder `results_by_age_group`.
```{r}
results_age_group <- attribute_health(
        approach_risk = "relative_risk",
        age = c("below_65", "65_plus"),
        exp_central = c(8, 7),
        cutoff_central = c(5, 5),
        bhd_central = c(1000, 5000),
        rr_central = 1.06,
        rr_increment = 10,
        erf_shape = "log_linear"
      )
```

### Results by age group
```{r}
results_age_group$health_detailed$results_by_age_group |> 
  dplyr::select(age_group, impact_rounded, exp, bhd) |> 
  knitr::kable()
```


```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_age_group)
```


## by sex

### Goal

E.g., to quantify health impacts attributable to air pollution in a country *by sex*.

### Function call

The baseline health data (and possibly exposure) must be entered by sex. 

```{r eval=TRUE, include=TRUE, echo=TRUE}
results_sex <- attribute_health(
        approach_risk = "relative_risk",
        sex = c("female", "male"),
        exp_central = c(8, 8),
        cutoff_central = c(5, 5),
        bhd_central = c(1000, 1100),
        rr_central = 1.06,
        rr_increment = 10,
        erf_shape = "log_linear"
      )
```

### Results by sex

If the `sex` argument was specified, sex-specific results are available under `health_detailed` in the sub-folder `results_by_sex`.

```{r eval=TRUE, include=TRUE, echo=TRUE}
results_sex$health_detailed$results_by_sex |> 
  dplyr::select(sex, impact_rounded, exp, bhd) |> 
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_sex)
```

## by other sub-groups

### Goal 

E.g., to quantify attributable health impacts *stratified by a sub-group different to age and sex, e.g., education level*.

### Function call
A single vector (or a data frame / tibble with multiple columns) to group the results by can be entered to the `info` argument. In this example, this will be information about the education level.

In a second step one can group the results based on one or more columns and so summarize the results by the preferred sub-groups.

```{r eval=TRUE, include=TRUE, echo=TRUE}
output_attribute <- healthiar::attribute_health(
    rr_central = 1.063,
    rr_increment = 10,
    erf_shape = "log_linear",
    cutoff_central =  0,
    exp_central = c(6, 7, 8,
                    7, 8, 9,
                    8, 9, 10,
                    9, 10, 11),
    bhd_central = c(600, 700, 800,
                    700, 800, 900,
                    800, 900, 1000,
                    900, 1000, 1100),
    geo_id_micro = rep(c("a", "b", "c", "d"), each = 3),
    info = data.frame(
      education = rep(c("secondary", "bachelor", "master"), times = 4)) # education level
  )
```

### Results by other sub-group
```{r eval=TRUE, include=TRUE, echo=TRUE}
output_stratified <- output_attribute$health_detailed$results_raw |>
      dplyr::group_by(info_column_1) |>
      dplyr::summarize(mean_impact = mean(impact))|>
      dplyr::pull(mean_impact) |>
      print()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(output_attribute, output_stratified)
```


## by age, sex and other sub-groups

### Goal 

E.g., to quantify attributable health impacts *stratified by age, sex and additional sub-group e.g. education level*.

### Function call

```{r eval=TRUE, include=TRUE, echo=TRUE}
output_attribute <- healthiar::attribute_health(
    rr_central = 1.063,
    rr_increment = 10,
    erf_shape = "log_linear",
    cutoff_central =  0,
    age_group = base::rep(c("50_and_younger", "50_plus"), each = 4, times= 2),
    sex = base::rep(c("female", "male"), each = 2, times = 4),
    exp_central = c(6, 7, 8, 7, 8, 9, 8, 9,
                    10, 9, 10, 11, 10, 11, 12, 13),
    bhd_central = c(600, 700, 800, 700, 800, 900, 800, 900,
                    1000, 900, 1000, 1100, 1000, 1100, 1200, 1000),
    geo_id_micro = base::rep(c("a", "b"), each = 8),
    info = base::data.frame(
      education = base::rep(c("without_master", "with_master"), times = 8)) # education level
  )
```

### Results by all sub-groups

```{r eval=TRUE, include=TRUE, echo=TRUE}
output_stratified <- output_attribute$health_detailed$results_raw |>
      dplyr::group_by(info_column_1) |>
      dplyr::summarize(mean_impact = mean(impact))|>
      dplyr::pull(mean_impact) |>
      print()
```


```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(output_attribute, output_stratified)
```


# YLL & deaths with life table

## YLL

### Goal

E.g., to quantify the years of life lost (YLL) due to deaths from COPD attributable to PM2.5 exposure during one year.

### Methodology

#### Data preparation
The life table approach to obtain YLL and deaths requires population and baseline mortality data to be 
stratified by *one year* age groups. However, in some cases these data 
are only available for larger age groups (e.g., 5-year data: 0-4 years old, 5-9 years old, ...). What to do?

- If your population and mortality data are *not* available by one-year age group, your data must be prepared by interpolating values. The `healthiar` function `prepare_lifetable()` makes this conversion using the same approach as the WHO tool
AirQ+ [@WHO2020_report]. 

- If your population and death data are stratified by one-year age group, you are lucky, you can ignore this initial step.

#### General concept
The life table methodology of `attribute_lifetable()` follows that of the WHO tool AirQ+ [@WHO2020_report], which is described in more detail by @Miller2003_jech. 

In short, two scenarios are compared: 

a) a scenario with the exposure level specified in the function ("exposed scenario") and 

b) a scenario with no exposure ("unexposed scenario").

First, the entry and mid-year populations of the (first) year of analysis in the unexposed scenario is determined using modified survival probabilities. Second, age-specific population projections using scenario-specific survival probabilities are done for both scenarios. Third, by subtracting the populations in the unexposed scenario from the populations in the exposed scenario the premature deaths/years of life lost attributable to the exposure are determined.

An expansive life table case study for is available in a report of @Miller2010_report.


#### Determination of populations in the (first) year of analysis

##### Entry population

The entry (i.e. start of year) populations in both scenarios (exposed and unexposed) is determined as follows:

$$entry\_population_{year_1} = midyear\_population_{year_1} + \frac{deaths_{year_1}}{2}$$

##### Survival probabilities
###### Exposed scenario

The survival probabilities in the exposed scenario from start of year $i$ to start of year $i+1$ are calculated as follows:

$$prob\_survival = \frac{midyear\_population_i - \frac{deaths_i}{2}}{midyear\_population_i + \frac{deaths_i}{2}}$$

Analogously, the probability of survival from start of year $i$ to mid-year $i$:

$$prob\_survival\_until\_midyear = 1 - \frac{1 - prob\_survival}{2}$$

###### Unexposed scenario

The survival probabilities in the unexposed scenario are calculated as follows:

First, the age-group specific hazard rate in the exposed scenario is calculated using the inputted age-specific mid-year populations and deaths.

$$hazard\_rate = \frac{deaths}{mid\_year\_population}$$

Second, the hazard rate is multiplied with the modification factor ($= 1 - PAF$) to obtain the age-specific hazard rate in the unexposed scenario.

$$hazard\_rate\_mod = hazard\_rate \times modification\_factor$$

Third, the the age-specific survival probabilities (from the start until the end in a given age group) in the unexposed scenario are calculated as follows (cf. Miller & Hurley 2003):

$$prob\_survival\_mod = \frac{2-hazard\_rate\_mod}{2+hazard\_rate\_mod}$$

##### Mid-year population
The mid-year populatios of the (first) year of analysis (year_1) in the unexposed scenario are determined as follows:

First, the survival probabilities from start of year $i$ to mid-year $i$ in the unexposed scenario is calculated as:

$$prob\_survival\_until\_midyear_{mod} = 1 - \frac{1 - prob\_survival\_mod}{2}$$

Second, the mid-year populations of the (first) year of analysis (year_1) in the unexposed scenario is calculated:

$$midyear\_population\_unexposed_{year_1} = entry\_population_{year_1} \times prob\_survival\_until\_midyear_{mod}$$

#### Population projection

Using the age group-specific and scenario-specific survival probabilities calculated above, future populations of each age-group under each scenario are calculated.

###### Exposed scenario
The population projections for the two possible options of `approach_exposure` (`"single_year"` and `"constant"`) for the unexposed scenario are different. In the case of `"single_year"` exposure, the population projection for the years after the year of exposure is the same as in the unexposed scenario.

In the case of `"constant"` the population projection is done as follows:

First, the entry population of year $i+1$ is calculated (which is the same as the end of year population of year $i$) using the entry population of year $i$.

$$entry\_population_{i+1} = entry\_population_i \times prob\_survival$$

Second, the mid-year population of year $i+1$ is calculated.

$$midyear\_population_{i+1} = entry\_population_{i+1} \times prob\_survival\_until\_midyear$$

###### Unexposed scenario

The entry and mid-year population projections of in the exposed scenario is done as follows:

First, the entry population of year $i+1$ is calculated (which is the same as the end of year population of year $i$) by multiplying the entry population of year $i$ and the modified survival probabilities.

$$entry\_population_{i+1} = entry\_population_i \times prob\_survival\_mod$$

Second, the mid-year population of year $i+1$ is calculated.

$$midyear\_population_{i+1} = entry\_population_{i+1} \times prob\_survival\_until\_midyear$$


### Function call

We can use `attribute_lifetable()` combined with life table input data to determine YLL attributable to an environmental stressor.

```{r}
results_pm_yll <- attribute_lifetable(
  year_of_analysis = 2019, 
  health_outcome = "yll",
  rr_central =  1.118, 
  rr_increment = 10,
  erf_shape = "log_linear",
  exp_central = 8.85,
  cutoff_central = 5,
  min_age = 20, # age from which population is affected by the exposure
  # Life table information
  age_group = exdat_lifetable$age_group,
  sex = exdat_lifetable$sex,
  population = exdat_lifetable$midyear_population,
  # In the life table case, BHD refers to deaths
  bhd_central = exdat_lifetable$deaths
) 
```

### Main results

Total YLL attributable to exposure (sum of sex-specific impacts).

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_yll$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_pm_yll$health_main |> 
  dplyr::slice_head() |> 
  dplyr::select(impact_rounded, erf_ci, exp_ci, bhd_ci) |> 
  knitr::kable()
```

Use the two arguments `approach_exposure` and `approach_newborns` to modify the life table calculation:

-   `approach_exposure`

    -   `"single_year"` (default) population is exposed for a single year and the impacts of that exposure are calculated

    -   `"constant"` population is exposed every year

-   `approach_newborns`

    -   `"without_newborns"` (default) assumes that the population in the year of analysis is followed over time, without considering newborns being born

    -   `"with_newborns"` assumes that for each year after the year of analysis n babies are born, with n being equal to the (male and female) population aged 0 that is provided in the argument population

### Detailed results

Attributable YLL results

-   per year

-   per age (group)

-   per sex (if sex-specific life table data entered)

are available.

*Note*: We will inspect the results for females; male results are also available.

### Results per year

*NOTE*: only a selection of years is shown.

```{r}
results_pm_yll$health_detailed$results_raw |>
  dplyr::summarize(
    .by = year, 
    impact = sum(impact, na.rm = TRUE)
  )
```

```{r}
results_pm_yll$health_detailed$results_raw |>
  dplyr::summarize(
    .by = year, 
    impact = sum(impact, na.rm = TRUE)) |>
  knitr::kable()
```



#### YLL

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_yll$health_detailed$intermediate_calculations |> 
  dplyr::filter(sex == "female") |>
  dplyr::pull(impact_by_age_and_year) |>
  purrr::pluck(1)
```

```{r echo=FALSE}
results_pm_yll$health_detailed$intermediate_calculations |> 
  dplyr::filter(sex == "female") |>
  dplyr::pull(impact_by_age_and_year) |>
  purrr::pluck(1) |>
  dplyr::select(age_start, age_end, impact_2019) |> 
  dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |> 
  knitr::kable()
```

#### Population (baseline scenario)

Baseline scenario refers to the scenario with exposure (i.e. the scenario specified in the assessment).

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_yll$health_detailed$intermediate_calculations|> 
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_exposed_by_age_and_year) |>
  purrr::pluck(1)
```

```{r echo=FALSE}
results_pm_yll$health_detailed$intermediate_calculations|> 
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_exposed_by_age_and_year) |>
  purrr::pluck(1) |>   
  dplyr::select(age_start, midyear_population_2019, midyear_population_2020, 
                midyear_population_2021, midyear_population_2022) |>    
  dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |>  
  knitr::kable()
```

#### Population (unexposed scenario)

Impacted scenario refers to the scenario without exposure.

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_yll$health_detailed$intermediate_calculations|> 
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_unexposed_by_age_and_year) |>
  purrr::pluck(1)
```

```{r echo=FALSE}
results_pm_yll$health_detailed$intermediate_calculations|> 
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_unexposed_by_age_and_year) |>
  purrr::pluck(1) |>   
  dplyr::select(age_start, midyear_population_2019, midyear_population_2020, 
                midyear_population_2021, midyear_population_2022) |>    
  dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |>  
  knitr::kable()
```

## Deaths

### Goal (e.g.)

E.g., to determine premature deaths from COPD attributable to PM2.5 exposure during one year.


### Function call
See example on YLL for additional info on `attribute_lifetable()` calculations and its output.

```{r}
results_pm_deaths <- attribute_lifetable(
  health_outcome = "deaths",
  year_of_analysis = 2019,
  rr_central =  1.118, 
  rr_increment = 10,
  erf_shape = "log_linear",
  exp_central = 8.85,
  cutoff_central = 5,
  min_age = 20, # age from which population is affected by the exposure   
  # Life table information
  age_group = exdat_lifetable$age_group,   
  sex = exdat_lifetable$sex,
  population = exdat_lifetable$midyear_population, 
  bhd_central = exdat_lifetable$deaths
)
```

### Main results

Total premature deaths attributable to exposure (sum of sex-specific impacts).

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_deaths$health_main$impact
```

```{r, eval=TRUE, include=TRUE, echo=FALSE}
results_pm_deaths$health_main |> 
  dplyr::slice_head() |> 
  dplyr::select(impact_rounded, erf_ci, exp_ci, bhd_ci) |> 
  knitr::kable()
```

### Detailed results

Attributable premature deaths results

-   per year (if argument `approach_exposure = "constant"`)

-   per age (group)

-   per sex (if sex-specific life table data entered)

are available.

*Note*: we will inspect the results for females; male results are also available.

*Note*: because we set the function argument `approach_exposure = "constant"` in the function call results are available for one year (the year of analysis).

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_deaths$health_detailed$intermediate_calculations|> 
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_unexposed_by_age_and_year) |>
  purrr::pluck(1)
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_pm_yll$health_detailed$intermediate_calculations|> 
  dplyr::filter(sex == "female") |>
  dplyr::pull(projection_if_unexposed_by_age_and_year) |>
  purrr::pluck(1) |>
  dplyr::slice( ( dplyr::n() - 8 ) : dplyr::n() ) |>  
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_pm_deaths)
```



# YLD

### Goal 

E.g., to quantify the years lived with disability (YLD) attributable to air pollution exposure using disability weights.

### Methodology

To quantify the YLDs, you can use a prevalence-based or an incidence-based approach [@Kim2022_kspm].

* Prevalence-based : Enter `1` (year) in the argument(s) `dw_...` and _prevalence_ cases in `bhd_...`.

* Incidence-based: Enter a value _above_ 1 in `dw_...` and _incidence_ cases in `bhd_...`.  

### Function call

```{r}
results_pm_copd_yld  <- attribute_health(
  rr_central = 1.1, 
  rr_increment = 10, 
  erf_shape = "log_linear",  
  exp_central = 8.85,
  cutoff_central = 5,
  bhd_central = 1000,
  duration_central = 10,
  dw_central = 0.2
)
```

### Main results

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_pm_copd_yld$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_pm_copd_yld$health_main |> 
  dplyr::select(erf_ci, impact) |> 
  knitr::kable()
```

# DALY

### Goal (e.g.)

E.g., to obtain the disability-adjusted life years (DALY) as the sum of YLLs and YLDs.

### Methodology

To obtain the attributable DALY, its two components, i.e. years of life lost (YLL) and years lived with disability (YLD), must be summed [@GBD2020_tl].

$$DALY = YLL + YLD$$

### Function call

This is possible using the function `daly()`.

```{r}
results_daly <- daly(
     output_attribute_yll = results_pm_yll,
     output_attribute_yld = results_pm_copd_yld
)
```

### Main results

YLL, YLD & DALY

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_daly$health_main |> 
  select(impact_yll_rounded, impact_yld_rounded, impact_rounded)
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_daly$health_main |> 
  dplyr::select(impact_yll_rounded, impact_yld_rounded, impact_rounded) |> 
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_daly, results_pm_yll, results_pm_copd_yld)
```

# Modification of scenarios

### Goal 

E.g., to quantify health impacts using `attribute_health`in an scenario B very similar to a previous scenario A.

### Function call
```{r}
scenario_A <- attribute_health(
    exp_central = 8.85,   # EXPOSURE 1
    cutoff_central = 5, 
    bhd_central = 25000,
    approach_risk = "relative_risk",
    erf_shape = "log_linear",
    rr_central = 1.118,
    rr_increment = 10)
```

The function `attribute_mod()` can be used to modify one or multiple arguments of `attribute_health`in an existing scenario, e.g. `scenario_A`.

```{r}
scenario_B <- attribute_mod(
  output_attribute = scenario_A, 
  exp_central = 6
)
```

This is equivalent to building the whole scenario again (see below), but more time and code efficient.
```{r}
scenario_B <- attribute_health(
    exp_central = 6,     # EXPOSURE 2
    cutoff_central = 5, 
    bhd_central = 25000,
    approach_risk = "relative_risk",
    erf_shape = "log_linear",
    rr_central = 1.118,
    rr_increment = 10)
```



# Comparison of two health scenarios

### Goal 

E.g., to compare the health impacts in the scenario "before intervention" vs. "after intervention".

### Methodology

Two approaches can be used for the comparison of scenarios:

- Delta: Subtraction of health impact in scenario 1 minus in  scenarios 2 (i.e. two PAF) [@WHO2014_book]

- Population impact fraction (PIF) [@WHO2003_report; @Murray2003_spbm; @Askari2020_ijph].


Note that the PIF comparison approach assumes same baseline health data 
for scenario 1 and 2 (e.g., comparison of two scenarios at the same time point), 
while the delta comparison approach, the difference between two scenarios is obtained by subtraction. 
Therefore, the delta approach is suited for comparison of scenarios in different time points.

*IMPORTANT* 
If your aim is to quantify health impacts from a *policy intervention*, 
be aware that you should use the *same year of analysis* 
and therefore *same health baseline data* in both scenarios. 
The only variable that should change in the second scenario is the exposure 
(change as a result of the intervention).

#### Population Impact Fraction (PIF)

The Population Impact Fraction (PIF) is defined as the proportional change in disease or mortality 
when exposure to a risk factor is changed, for instance due to an intervention. 

##### General Integral Form
The most general equation describing this mathematically is an integral form [@WHO2003_report; @Murray2003_spbm]:

$$PIF = \frac{\int rr\_at\_exp(x)PE(x)dx - \int rr\_at\_exp(x)PE'(x)dx}{\int rr\_at\_exp(x)PE(x)dx}$$

Where:

* $x$ = exposure level
* $PE(x)$ = population distribution of exposure
* $PE'(x)$ = alternative population distribution of exposure
* $rr\_at\_exp(x)$ = relative risk at exposure level compared to the reference level

##### Categorical Exposure Form
If the population exposure is described as a categorical rather than continuous exposure, the integrals may be converted to sums [@WHO2003_report; Murray2003-spbm]:

$$PIF = \frac{\sum rr\_at\_exp_{i} \times PE_{i} - \sum rr\_at\_exp_{i}PE'_{i}}{\sum rr\_at\_exp_{i}PE_{i}}$$

Where:

* $i$ = the exposure category (e.g., in bins of 1 $\mu g/m^3$ $PM_{2.5}$ or 5 dB noise exposure)
* $PE_i$ = fraction of population in exposure category $i$
* $PE'_i$ = fraction of population in category $i$ for alternative (ideal) exposure scenario
* $rr\_at\_exp_i$ = relative risk for exposure category level $i$ compared to the reference level


##### Population weighted mean concentration form
Finally, if the exposure is provided as the population weighted mean concentration, the equation for the PIF is reduced to:

$$PIF = \frac{rr\_at\_exp - rr\_at\_exp_{alt}}{rr}$$

Where:

* $rr\_at\_exp$ = relative risk at the exposure level
* $rr\_at\_exp_{alt}$ = relative risk at the exposure level for the alternative exposure scenario

### Function call

1.  Use `attribute_health()` to calculate burden of scenarios A & B.

```{r}
scenario_A <- attribute_health(
    exp_central = 8.85,   # EXPOSURE 1
    cutoff_central = 5, 
    bhd_central = 25000,
    approach_risk = "relative_risk",
    erf_shape = "log_linear",
    rr_central = 1.118,
    rr_increment = 10)
```

```{r}
scenario_B <- attribute_mod(
  output_attribute = scenario_A, 
  exp_central = 6
)
```

2.  Use `compare()` to compare scenarios A & B.

```{r}

results_comparison <- healthiar::compare(
  approach_comparison = "delta", # or "pif" (population impact fraction)
  output_attribute_scen_1 = scenario_A,
  output_attribute_scen_2 = scenario_B
)
```

The default value for the argument `approach_comparison` is `"delta"`. The alterntive is `"pif"` (population impact fraction). See the function documentation of `compare()` for more details.

### Main results

```{r eval=FALSE, include=FALSE, echo=TRUE}
results_comparison$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_comparison[["health_main"]] |> 
  dplyr::select(
    impact, impact_rounded,
    impact_scen_1, impact_scen_2,
    bhd,
    dplyr::starts_with("exp_"),
    -dplyr::starts_with("exp_ci"), # remove col "exp_ci"
    dplyr::starts_with("rr_con")) |> 
  dplyr::slice_head() |> 
  knitr::kable()
```

### Detailed results

The `compare()` results contain two additional outputs in addition to those we have already seen:

-   `health_detailed`

    -   `scen_1` contains results of scenario 1 (scenario A in our case)

    -   `scen_2` contains results of scenario 2 (scenario B in our case)

```{r, echo=FALSE, eval=TRUE, include=FALSE}
rm(results_comparison, scenario_A, scenario_B)
```


# Two correlated exposures 

### Goal 

E.g., to quantify the total health impact attributable to PM2.5 and NO2.

### Methodology
 
A methodological report of the EU project BEST-COST [@Strak2024_report] identified
three approaches to add up attributable health impacts from correlated exposures:

- Additive approach [@Steenland2006-e]:

$$PAF_{additive} = PAF_{exposure1} + PAF_{exposure2}$$


- Multiplicative approach [@Jerrett2013-oup]:

$$PAF_{multiplicative} = \frac{\sum PE \times (rr\_at\_exp_{multiplicative} - 1)}{\sum PE \times (rr\_at\_exp_{multiplicative}-1) + 1}$$


$$rr\_at\_exp_{multiplicative} = rr\_at\_exp_{exposure1} * rr\_at\_exp_{exposure2}$$

- Combined approach [@Steenland2006-e]:

$$PAF_{combined} = 1-[(1-PAF_{exposure1}) \times (1-PAF_{exposure2})]$$

*Attention*: To apply any of these approaches, the relative risks for one exposure must be adjusted for the second exposure and the way round.


### Function call

For this purpose, you can use the function `multiexpose()`.

```{r}
results_pm <- attribute_health(
  erf_shape = "log_linear",
  rr_central = 1.369, 
  rr_increment = 10,
  exp_central = 8.85,
  cutoff_central = 5,
  bhd_central = 30747
) 

results_no2 <- attribute_mod(
  output_attribute = results_pm,
  exp_central = 10.9,
  rr_central = 1.031
)

results_multiplicative <- multiexpose(
  output_attribute_exp_1 = results_pm,
  output_attribute_exp_2 = results_no2,
  exp_name_1 = "pm2.5",
  exp_name_2 = "no2",
  approach_multiexposure = "multiplicative"
)
```

```{r, echo=FALSE, eval=TRUE, include=FALSE}
rm(results_no2, results_pm)
```

### Main results

```{r eval=FALSE, include=TRUE, echo=TRUE}
results_multiplicative$health_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
results_multiplicative$health_main |> 
  dplyr::select(impact_rounded) |> 
  knitr::kable()
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results_multiplicative)
```

# Standardization

### Goal

E.g., to obtain the age-standardized attributable health impacts of two age groups.

### Methodology

Age standardization involves adjusting the observed rates of a particular 
outcome to a standard population with a specific age structure. 
This is a technique used to allow the comparison of populations 
with different age structures [@GBD2020_tldemo; @Ahmad2001_report]. 
In `healthiar`, the function `standardize()` applies the direct method, 
where the age-specific rates observed in a study population 
are applied to a standard (reference) population distribution.

The standardized health impact rate is computed as 
$$ impact\_per\_100k\_inhab_{std} = \sum_{i=1}^{k} (impact\_per\_100k\_inhab_i \times ref\_prop\_pop_i) $$

where:

* $impact\_per\_100k\_inhab_{std}$ is the age-standardized health impact rate.
* $impact\_per\_100k\_inhab_i$ is the impact rate observed in age group $i$ (e.g., impact per 100,000 inhabitants).
* $ref\_prop\_pop_i$ is the proportion of the reference population in age group $i$ .
* $k$ is the number of age groups.
 

### Function call

```{r}
output_attribute <- attribute_health(
  rr_central = 1.063,
  rr_increment = 10,
  erf_shape = "log_linear",
  cutoff_central =  0,
  age_group = c("below_40", "above_40"),
  exp_central = c(8.1, 10.9),
  bhd_central = c(1000, 4000),
  population = c(100000, 500000)
  )

results <- standardize(
  output_attribute = output_attribute,
  age_group = c("below_40", "above_40"),
  ref_prop_pop = c(0.5, 0.5)
  )


```


### Main results

Age-standardized impact rate:
```{r}
print(results$health_main$impact_per_100k_inhab)  
```

Age group-specific impact rate:

```{r}
print(results$health_detailed$results_raw$impact_per_100k_inhab)  
```


```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(results)
```



# Preparation of exposure data

### Goal

E.g., to determine population-weighted mean PM2.5 exposure for several neighborhoods of Brussels (Belgium)

### Methodology
The `healthiar`function `prepare_exposure()` helps users that do not have the 
exposure data (needed for `healthiar` functions), but only spatial 
concentration and population data. 
The function calculates an average concentration value in each geographic unit, 
weighted with population at each location. 

$$ exp = \frac{\sum_{i=1}^{n} (C_i \times population_i)}{\sum_{i=1}^{n} population_i} $$

where:

* $exp$ = population-weighted mean exposure for the geographic unit.
* $C_i$ = pollutant concentration in grid cell $i$.
* $population_i$ = population count in grid cell $i$.
* $n$ = total number of grid cells contained by the geographic unit.

In case population is entered as count by geographic sub-unit, 
the function calculates the mean concentration in each sub-unit and 
aggregates it to higher-level geographic units. 
If no population data is entered, 
the function calculates a simple spatial mean concentration as exposure value.

The output of `prepare_exposure()` can be entered in the argument 
`exp_mean`, `exp_lower` and/or `exp_upper` in `healthiar` functions 
such as `attribute_health()`.

### Function call

```{r}
# exdat_pwm_1 = Pollution grid data
exdat_pwm_1 <- terra::rast(system.file("extdata", "exdat_pwm_1.tif", package = "healthiar"))

# exdat_pwm_2 = Data with the geo units and population data. This is pre-loaded in healthiar.
# If your raw data are in .gpkg format, you can use e.g.  sf::st_read() 

pwm <- healthiar::prepare_exposure(
  poll_grid = exdat_pwm_1, # Formal class SpatRaster,
  geo_units = exdat_pwm_2, # sf of the geographic sub-units
  population = sf::st_drop_geometry(exdat_pwm_2$population), # population per geographic sub-unit
  geo_id_macro = sf::st_drop_geometry(exdat_pwm_2$region)) # higher-level IDs to aggregate
```


### Main results

Within the function output, the list `main` contains 
the population-weighted mean exposures for the (higher-level) geographic units 
in the column `exposure_mean` and the total population in each unit in column `population_total`.

```{r echo=FALSE}
 knitr::kable(pwm$main)
```

```{r eval=TRUE, include=FALSE, echo=FALSE}
rm(exdat_pwm_1)
rm(exdat_pwm_2)
rm(pwm)
```




# Threshold additional to cut-off


### Goal

E.g., to quantify health impacts in the exposure group 55dB+ (calculation threshold) 
that are affected by a exposure above the effect threshold of 45 dB (cut-off).

### Function call

The function arguments `erf_eq_...` require a function as input. Instead of using a `splinefun()` this can also be fulfilled by using a 'function(c)' which is of type 'function'.

```{r}
#setting up function parameters
threshold_effect <- 45
RR <- 1.055
threshold_calculation <- 55
rr_increment <- 10

# define categorical function, the ifelse condition enables the case distinction
erf_function <- function(c){
  output <- ifelse(c<threshold_calculation, 1, exp((log(RR)/rr_increment)*(c-threshold_effect)))
  return(output)
}
# attribute_health
results_catERF_different_calc_thesh <- healthiar::attribute_health(
  approach_risk = "relative_risk",
  erf_eq_central = erf_function,
  prop_pop_exp = c(300000,200000,150000,120000,100000,70000,60000)/10000000,
  exp_central = c(47,52,57,62,67,72,77),
  cutoff_central=0,
  bhd_central=50000)$health_main$impact_rounded
```

The used function is equal to
$$
f(c) =
\begin{cases}
1, & c < \text{threshold} \\
\exp\left( \frac{\log(RR)}{rr_{increment}} (c - threshold_{effect}) \right), & c \ge \text{threshold}
\end{cases}
$$

The categorical ERF curve created looks as follows.

```{r fig.alt="ERF curve", eval=TRUE, include=TRUE, echo=FALSE}

# Evaluate exp_central points
x <- c(47,52,57,62,67,72,77) # central exposure values
y <- erf_function(x)

# Generate finer x-values
x_dense <- seq(min(x), max(x), length.out = 200)
# Evaluate exp_central points (finer version)
y_dense <- erf_function(x_dense)

## Plot
plot(x,
     y # main = "User-defined ERF"
     ,
     main = "",
     xlab = "Exposure",
     ylab = "Relative risk",
     col = "blue",
     pch = 19)

lines(x_dense,y_dense,col = "red",lwd = 2)

legend("topleft",
       legend = c("Original points", "Categorical ERF curve"),
       col = c("blue", "red"),
       pch = c(19, NA),
       lty = c(NA, 1),
       lwd = c(NA, 2))

```



# Economic dimension

## Monetization 

### Goal

E.g., to monetize the attributable health impact of a policy that will have health benefits five years from now.

### Methodology
In health economic evaluations and economic burden of disease assessments, 
health impacts may need to be converted into monetary values. 
For this purpose, you can use  `monetize()`. 

Several valuation metrics are available, 
depending on how outcomes are quantified in natural or health units 
(e.g. cases reduced, deaths prevented, reductions in mortality risk, 
life-years gained, QALYs gained, DALYs averted). 
Common metrics include the Value of a Statistical Life (VSL) [@OECD2025_book] , 
the Value of a Life-Year (VOLY) [@Hammitt2007_reep] and 
the Value of a Quality-Adjusted Life-Year (VAQALY) [@Bobinac2010_vh].

Discounting is the practice of converting future costs 
(or health impacts as previous step to valuating them) into their present value. 
The underlying rationale is that the value placed on outcomes 
declines as they occur further in the future. 
Therefore, future costs and effects are converted into present-value terms 
to make them comparable over time [@Attema2018_p]. 

Discounting is implemented by selecting a discount rate, 
which is used to compute a discount factor for each time period. 
This factor is then multiplied by the corresponding future cost (or effect) 
to express it in present-value terms. 

If you need the discounted values of a cost or health outcome, 
you can call the `healthiar` function `discount()`. 
If you just need the discount factor, you can alternatively call `get_discount_factor()` (entering `is_deflation = TRUE`).
If you just need the inflation factor, you can `get_inflation_factor()`. 


Different functional forms can be used to apply discounting. 
The most common is exponential discounting, also referred to as constant discounting,
since outcomes are discounted proportionally as time increases. 
An alternative is hyperbolic discounting, 
which tends to better capture human behavior by discounting the near present 
more heavily than outcomes further in the future [@Lipman2024_jru] 

See below the equations that are used behind these functions. 


$$monetized\_impact = impact \times valuation \times discount\_factor \times deflator\_factor \times real\_growth\_factor$$
The arguments `discount_factor`, `deflator_factor` and `real_growth_factor` 
are only used if a value is entered in the arguments 
`discount_rate`, `ingflation_rate` and `real_growth_rate` respectively  
(otherwise ignored).


#### Discount factor

*Exponential discounting*

As suggested by @Frederick2002_jel

$$discount\_factor = \frac{1}{(1 + discount\_rate)^{n\_years}}$$

*Hyperbolic discounting Harvey*

As suggested by @Harvey1986_ms

$$discount\_factor = \frac{1}{(1 + n\_years)^{discount\_rate}}$$

*Hyperbolic discounting Mazur*

As suggested by @Mazur1987_book

$$discount\_factor = \frac{1}{1 + (discount\_rate \times n\_years)}$$

#### Deflation factor

Inflation can be handled in `monetize()` by applying a deflator on future values, 
projected in nominal terms, in order to convert them into real values 
(i.e., express these values in terms of constant prices of a single base year). 
Therefore, if the user of the function provides a value for the `inflation_rate` argument,
a deflator factor [@HMTreasury2022_greenbook; @Brealey2023_book; @Samuelson1937_res]
is applied according to the following formulas 

$$deflator\_factor = \frac{1}{(1 + inflation\_rate)^{n\_years}}$$



#### Real valuation growth
If a rising societal value of health over time is required in the monetization,
unlike inflation, this represents a "real" increase in value. 
Thus, as societies become wealthier, their willingness to pay
to avoid mortality and morbidity risks tends to rise [@OECD2012_book].

For this use case, you have enter a value in the argument `real_growth_rate` in `monetize()`,
which allows you to project this growth by applying a valuation growth factor to base-year unit values:

$$real\_growth\_factor =(1 + real\_growth\_rate)^{n\_years}$$

Where $real\_growth\_rate$ represents the annual real growth rate in health valuation. 
This ensures that long-term environmental impacts are not undervalued. 





### Function call

```{r}
monetized_pm_copd <- monetize(
    output_attribute = results_pm_copd,
    discount_shape = "exponential",
    discount_rate = 0.03,
    n_years = 5,
    valuation = 50000 # E.g. EURO
)
```


### Main results

The outcome of the monetization is added to the variable entered to the `output_attribute` argument, which is `results_pm_copd` in our case.

Two folders are added:

-   `monetization_main` contains the central monetization estimate and the corresponding 95% confidence intervals obtained through the specified monetization.

-   `monetization_detailed` contains the monetized results for each unique combination of the input variable estimates that were provided to the initial `attribute_health()` call.

```{r eval=FALSE, include=FALSE, echo=TRUE}
monetized_pm_copd$monetization_main
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
monetized_pm_copd$monetization_main |> 
  select(erf_ci, monetized_impact) |> 
  knitr::kable()
```

We see that the monetized impact (discounted) is more than 160 million EURO.

Alternatively, you can also monetize (attributable) health impacts from a non-`healthiar` source.

```{r}
results <- monetize(
  impact = 1151,
  valuation = 100
)
```

## Cost-benefit analysis

### Goal (e.g.)

E.g., to perform an economic evaluation for an intervention 
by comparing its benefits and costs via Cost-Benefit Analysis (CBA).

### Methodology

The CBA is a type of economic evaluation that compares 
the costs and the benefits of an intervention, 
considering both measures expressed in monetary terms.

To perform a CBA, you can use the function `cba()`. 
This approach requires monetizing benefits so they can be directly compared with costs. 
Since interventions typically generate costs and benefits over multi-year time horizons,
discounting is a common practice to obtain the present value of future costs and benefits.
Depending on the reference guidelines, 
the discount rate can be specified as the same for costs and benefits or different across them.
The outputs of a Cost-Benefit Analysis can be expressed 
as three main indicators [@Boardman2018_book]: 
  - intervention’s net benefit: the difference between monetized benefits and costs
  - Cost-Benefit Ratio (CBR): monetized benefits divided by costs and 
  - Return on Investment (ROI): return generated per unit of expenditure by relating net benefits to the intervention’s costs.
  
An intervention is recommended from a Cost-Benefit Analysis perspective, 
if it yields a positive net benefit or a positive ROI, or equivalently, 
a CBR greater than one, meaning that the intervention’s monetized benefits exceed its costs.
These three outputs are available when running `cba()` 
and are calculated considering the following formulas.

**Net Benefit**
$$net\_benefit = benefit - cost$$

**Cost-Benefit Ratio (CBR)**
$$cbr = \frac{benefit}{cost}$$

**Return on Investment (ROI)**
$$roi = \frac{benefit - cost}{cost} \times 100$$





### Function call

Let's imagine we design a policy that would reduce air pollution to 5 $\mu g/m^3$, which is the concentration specified in the `cutoff_central` argument in the initial `attribute_health()` call. So we could avoid all COPD cases attributed to air pollution.

Considering the cost to implement the policy (estimated at 100 million EURO), what would be the monetary net benefit of such a policy? We can find out using the functions `healthiar` and `cba()` 

```{r}
cba <- cba(
    output_attribute = results_pm_copd,
    valuation = 50000,
    cost = 100000000,
    discount_shape = "exponential",
    discount_rate_benefit = 0.03,
    discount_rate_cost = 0.03,
    n_years_benefit = 5,
    n_years_cost = 5
)
```

### Main results
The outcome of the CBA is contained in two folders, which are added to the existing assessment:

-   `cba_main` contains the central estimate and the corresponding 95% confidence intervals obtained

-   `cba_detailed` contains additional intermediate results for both cost and benefit

    -   `benefit` contains results `by_year` and raw results `health_raw`

    -   `cost` contains the costs of the policy at the end of the period specified in the `n_years_cost` argument
    
```{r}
cba$cba_main |>  
  dplyr::select(benefit, cost, net_benefit) |> 
  knitr::kable()
```

```{r, echo=FALSE, eval=TRUE, include=FALSE}
rm(cba)
```

We see that the central and upper 95% confidence interval estimates of avoided attributable COPD cases result in a net monetary benefit of the policy, while the lower 95% confidence interval estimate results in a net cost!

# Social aspects

## Health impact attributable to social indicator

### Goal

E.g., to estimate the health impact that is theoretically attributable to the difference in degree of deprivation of the population exposed.

### Methodology

Taking into account socio-economic indicators, e.g. a multiple deprivation index [@Mogin2025_ejph], the differences in attributable health impacts
across the study areas 
can be estimated [@Renard2019_bmc; @Otavova_2022_bmc]. 

Social inequalities are quantified as the difference between 
the least deprived areas (the last n-quantile) and  

- the most deprived areas or 

- the population overall.


These differences can be

- absolute or 

- relative.

#### Difference most deprived vs. least deprived

$$ absolute\_quantile = first - last $$
Where:

* $absolute\_quantile$ = Absolute difference between quantiles.
* $first$ = Average health impacts in _most_ deprived quantile.
* $last$ = Average health impacts in _least_ deprived quantile.

$$ relative\_quantile = \frac{absolute\_quantile}{last} $$

#### Difference overall vs. least deprived

$$ absolute\_overall = overall - last $$
Where:

* $absolute\_overall$ = Absolute difference regarding the overall average.
* $overall$ = _Overall_ average health impacts in the study area.
* $last$ = Average health impacts in _least_ deprived quantile.



If you assume that the least deprived areas are similar to counter-factual cases (no exposure to deprivation), 
the relative difference regarding the overall average health impact 
could be interpreted as some kind of relative risk attributable to social inequalities.


### Function call

First, quantify health impacts.

```{r eval=TRUE, include=TRUE, echo=TRUE}
 health_impact <- healthiar::attribute_health(
   age_group = exdat_socialize$age_group,
   exp_central = exdat_socialize$pm25_mean,
   cutoff_central = 0,
   rr_central = exdat_socialize$rr,
   erf_shape = "log_linear",
   rr_increment = 10,
   bhd_central = exdat_socialize$mortality,
   population = exdat_socialize$population,
   geo_id_micro = exdat_socialize$geo_unit)
```

Second, use the function `socialize()` entering the whole output of `attribute_health()` in the argument `output_attribute`. 

```{r eval=TRUE, include=TRUE, echo=TRUE}
social_t <- healthiar::socialize(
  output_attribute = health_impact,
  age_group = exdat_socialize$age_group, # They have to be the same in socialize() and in attribute_health()
  ref_prop_pop = exdat_socialize$ref_prop_pop, # Population already provided in output_attribute
  geo_id_micro = exdat_socialize$geo_unit,
  social_indicator = exdat_socialize$score,
  n_quantile = 10,
  increasing_deprivation = TRUE)

```

Alternatively, you can directly enter the health impact in the `socialize()` argument `impact`.

```{r eval=TRUE, include=TRUE, echo=TRUE}
social <- healthiar::socialize(
  impact = health_impact$health_detailed$results_by_age_group$impact,
  age_group = exdat_socialize$age_group, # They have to be the same in socialize() and in attribute_health()
  ref_prop_pop = exdat_socialize$ref_prop_pop,
  geo_id_micro = exdat_socialize$geo_unit,
  social_indicator = exdat_socialize$score,
  population = exdat_socialize$population, # Population has to be provided because no output_attribute
  n_quantile = 10,
  increasing_deprivation = TRUE)

```


### Main results
```{r echo=FALSE}
social$social_main |> 
  dplyr::select(c(parameter, difference_type, 
                  difference_compared_with, difference_value, 
                  comment)) |>
  print()
```



## Multiple deprivation index

### Goal 

E.g., to estimate the multiple deprivation index (MDI) to use it for the argument `social_indicator` in the function `socialize()`.

### Methodology

Socio-economic indicators (e.g., education level, employment status and family structure) can be condensed into a multiple deprivation index (MDI) [@Mogin2025_ejph]. For this purpose, the indicators can be normalized using min-max scaling.

The reliability of the MDI can be assessed using Cronbach's alpha [@Cronbach1951_p].

$$ \alpha = \frac{k}{k - 1} \left( 1 - \frac{\sum_{i=1}^{k} \sigma^2_{y_i}}{\sigma^2_x} \right) $$
where:

* $k$ is the number of items/variables.
* $\sigma^2_{y_i}$ is the variance of the $i$-th item.
* $\sum_{i=1}^{k} \sigma^2_{y_i}$ is the sum of the variances of all items.
* $\sigma^2_x$ is the total variance of the observed total scores (the sum of all items).


To apply this approach, you should ensure that the data set is as complete as possible.
Otherwise, you can try to impute missing data using:
- Time-Based Imputation: Linear regression based on historical trends if prior years' data is complete.
- Indicator-Based Imputation: Multiple linear regression if the missing indicator correlates strongly with others.

Imputation models should have an R^2 greater than or equal to 0.7. If R^2 lower than 0.7, consider alternative data sources or methods.


### Function call

```{r eval=TRUE, include=TRUE, echo=TRUE}
mdi <- prepare_mdi(
  geo_id_micro = exdat_prepare_mdi$id,
  edu = exdat_prepare_mdi$edu,
  unemployed = exdat_prepare_mdi$unemployed,
  single_parent = exdat_prepare_mdi$single_parent,
  pop_change = exdat_prepare_mdi$pop_change,
  no_heating = exdat_prepare_mdi$no_heating,
  n_quantile = 10,
  verbose = FALSE
)
```

*Note*: `verbose = FALSE` suppresses any output to the console (default: `verbose = TRUE`, i.e. with printing turned on).

### Main results

Function output includes:

-   `mdi_main`, a tibble containing the BEST-COST MDI

```{r eval=FALSE, include=TRUE, echo=TRUE}
mdi$mdi_main |> 
  select(geo_id_micro, MDI, MDI_index)
```

```{r eval=TRUE, include=TRUE, echo=FALSE}
mdi$mdi_main |> 
  dplyr::select(geo_id_micro, MDI, MDI_index) |> 
  knitr::kable()
```


The function assesses the reliability of the MDI based on the Cronbach's alpha value as follows:
- 0.9 and higher: Excellent reliability
- between 0.8 (included) and 0.9: Good reliability
- between 0.7 (included) and 0.8: Acceptable reliability
- between 0.6 (included) and 0.7: Questionable reliability
- lower than 0.6: Poor reliability



### Detailed results

-   `mdi_detailed`

    -   DESCRIPTIVE STATISTICS

    -   PEARSON'S CORRELATION COEFFICIENTS

    -   CRONBACH'S α, including the reliability rating this value indicates

    -   Code for boxplots of the single indicators

    -   Code for histogram of the MDI's for the geo units with a normal distribution curve

To reproduce the boxlots run
```{r fig.alt="Boxplot of Normalized Indicators and MDI", eval=TRUE, include=TRUE, echo=TRUE}
eval(mdi$mdi_detailed$boxplot)
```
Analogeously, to reproduce the histogram run
```{r fig.alt="Histogram of MDI with normal curve", eval=TRUE, include=TRUE, echo=TRUE}
eval(mdi$mdi_detailed$histogram)
```


```{r, eval=TRUE, include=FALSE, echo=FALSE}
rm(mdi)
```

---------------------------------------------------------------------------

# Inside pipes

## Pipe |>

`healthiar` can be used inside the _native_ pipes `|>`. See the example below.

```{r eval=FALSE, include=TRUE, echo = TRUE}
exdat_noise |>
  (\(df) {
    healthiar::attribute_health(
      approach_risk = df$risk_estimate_type,
      exp_central = df$exposure_mean,
      pop_exp = df$exposed,
      erf_eq_central = df$erf
      )$health_main$impact_rounded
    })()
```


Shorter making used of the base R function `with()`.

```{r}
exdat_noise |>
      (\(df) {
        with(df, healthiar::attribute_health(
         approach_risk = risk_estimate_type,
         exp_central = exposure_mean,
         pop_exp = exposed,
         erf_eq_central = erf
         ))$health_main$impact_rounded
        })()
```



## Pipe %>%

`healthiar` can also be used inside _magrittr_ pipes `%>%` as follows.

```{r eval=FALSE, include=TRUE, echo = TRUE}
exdat_noise %>%
  {
    healthiar::attribute_health(
      approach_risk = .$risk_estimate_type,
      exp_central = .$exposure_mean,
      pop_exp = .$exposed,
      erf_eq_central = .$erf
    )$health_main$impact_rounded
  }
```

---------------------------------------------------------------------------

# Export and visualize

Exporting and visualizing results is out of scope of `healthiar`. To export and visualize, you can make use of existing functions in other packages beyond `healthiar` as indicated below.

## Export results

Export as `.csv` file

```{r eval=FALSE, include=FALSE, echo=TRUE}
write.csv(x = results_pm_copd$health_main, file = "exported_results/results_pm_copd.csv")
```

Save as `.Rdata` file

```{r eval=FALSE, include=FALSE, echo=TRUE}
save(results_pm_copd, file = "exported_results/results_pm_copd.Rdata")
```

Export to Excel (as `.xlsx` file)

```{r eval=FALSE, include=FALSE, echo=TRUE}
openxlsx::write.xlsx(x = results_pm_copd$health_main, file = "exported_results/results_pm_copd.xlsx")
```

## Visualize results

Visualization is out of scope of `healthiar`. You can visualize in:

-   R using base programming or packages such as `ggplot2` [@Wickham2016_ggplot],
-   Excel (export results first) or
-   Other tools.

------------------------------------------------------------------------

# Abbreviations

BHD/bhd = baseline health data 

CI = confidence interval

CBA/cba = cost-benefit analysis

exp = exposure

ERF = exposure-response function

RR/rr = relative risk

WHO = World Health Organization

YLL/yll = years of life lost

------------------------------------------------------------------------

# References