---
title: "Frequency tables and cross-tabulations in R"
description: >
  Create frequency tables and cross-tabulations with chi-squared tests,
  effect sizes, and weighted counts in R using the spicy package.
  Covers sorting, percentages, grouping, and labelled data.
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Frequency tables and cross-tabulations in R}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(spicy)
```

`freq()` and `cross_tab()` are the core tabulation functions in spicy.
They handle factors, labelled variables (from haven or labelled),
weights, and missing values out of the box. This vignette covers the
main options using the bundled `sochealth` dataset.

## Frequency tables with freq()

### Basic usage

Pass a data frame and a variable name to get counts and percentages:

```{r freq-basic}
freq(sochealth, education)
```

### Sorting

Sort by frequency with `sort = "-"` (decreasing) or `sort = "+"`
(increasing). Sort alphabetically with `sort = "name+"` or
`sort = "name-"`:

```{r freq-sort}
freq(sochealth, education, sort = "-")
```

Sort by category name in ascending order:

```{r freq-sort-name}
freq(sochealth, education, sort = "name+")
```

### Cumulative percentages

Add cumulative columns with `cum = TRUE`:

```{r freq-cum}
freq(sochealth, smoking, cum = TRUE)
```
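
For reference, a cumulative percentage is the running sum of the per-category percentages. The base R sketch below illustrates this on a made-up vector; `answers` is invented for the example and is not part of `sochealth`:

```{r cum-sketch}
# Toy data, invented for illustration
answers <- factor(c("No", "No", "Yes", "No", "Yes", "No", "No", "Yes"))

counts  <- table(answers)            # count per category
pct     <- 100 * prop.table(counts)  # share of each category, in percent
cum_pct <- cumsum(pct)               # running total of the shares

data.frame(
  value   = names(counts),
  n       = as.vector(counts),
  pct     = as.vector(pct),
  cum_pct = as.vector(cum_pct)
)
```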

### Weighted frequencies

Supply a weight variable with `weights`. By default, `rescale = TRUE`
adjusts the weighted total to match the unweighted sample size:

```{r freq-weighted}
freq(sochealth, education, weights = weight)
```

Set `rescale = FALSE` to keep the raw weighted counts:

```{r freq-weighted-raw}
freq(sochealth, education, weights = weight, rescale = FALSE)
```
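
To make the difference concrete, the base R sketch below rescales a toy weight vector by hand. It assumes the usual convention of multiplying each weight by `n / sum(w)`; the vignette does not state how spicy implements rescaling, so treat this as an illustration of the idea. The `toy` data frame is invented for the example:

```{r rescale-sketch}
# Toy data, invented for illustration
toy <- data.frame(
  education = c("Low", "Low", "Mid", "High", "High", "High"),
  w         = c(0.5, 1.5, 2.0, 1.0, 3.0, 4.0)
)

# Raw weighted counts: sum of the weights within each category
raw <- vapply(split(toy$w, toy$education), sum, numeric(1))

# Rescaled weights (assumed convention): scale so the weights sum to n
w_rescaled <- toy$w * nrow(toy) / sum(toy$w)
rescaled   <- vapply(split(w_rescaled, toy$education), sum, numeric(1))

rbind(raw = raw, rescaled = rescaled)
sum(rescaled) == nrow(toy)
```

Under this convention, rescaling changes the counts but not the percentages, because every weight is multiplied by the same constant.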

### Labelled variables

When a variable has value labels (e.g., imported from SPSS or Stata
with haven), `freq()` shows them by default with the `[code] label`
format. Control this with `labelled_levels`:

```{r freq-labelled}
# Create a labelled version of the smoking variable
sh <- sochealth
sh$smoking_lbl <- labelled::labelled(
  ifelse(sh$smoking == "Yes", 1L, 0L),
  labels = c("Non-smoker" = 0L, "Current smoker" = 1L)
)

# Default: [code] label
freq(sh, smoking_lbl)

# Labels only (no codes)
freq(sh, smoking_lbl, labelled_levels = "labels")

# Codes only (no labels)
freq(sh, smoking_lbl, labelled_levels = "values")
```

### Custom missing values

Treat specific values as missing with `na_val`:

```{r freq-naval}
freq(sochealth, income_group, na_val = "High")
```
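
Conceptually, this amounts to recoding the chosen value to `NA` before tabulating. A rough base R equivalent on made-up data (the `income` vector is invented, and how `freq()` lays out the missing row may differ):

```{r naval-sketch}
# Toy data, invented for illustration
income <- factor(c("Low", "High", "Middle", "Low", "High", "Middle", "Low"))

# Recode the chosen value to NA, then drop the now-empty level
income_na <- income
income_na[income_na %in% "High"] <- NA
income_na <- droplevels(income_na)

# Observations that were "High" are now counted under <NA>
table(income_na, useNA = "ifany")
```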

## Cross-tabulations with cross_tab()

### Basic two-way table

Cross two variables to get a contingency table with a chi-squared test
and effect size:

```{r cross-basic}
cross_tab(sochealth, smoking, education)
```
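
For readers who want to see what these two statistics are, the sketch below computes a Pearson chi-squared test and Cramér's V in base R on a made-up table. It uses the textbook formula for V; how spicy computes it internally is not shown in this vignette, and the counts in `tab` are invented:

```{r chisq-sketch}
# Toy 2 x 3 table of counts, invented for illustration
tab <- matrix(
  c(30, 20, 10,
    10, 20, 30),
  nrow = 2, byrow = TRUE,
  dimnames = list(smoking   = c("No", "Yes"),
                  education = c("Low", "Mid", "High"))
)

# Pearson chi-squared test of independence
test <- chisq.test(tab)

# Cramér's V = sqrt(chi-squared / (n * (min(rows, cols) - 1)))
n <- sum(tab)
v <- sqrt(unname(test$statistic) / (n * (min(dim(tab)) - 1)))

c(chi_squared = unname(test$statistic), p_value = test$p.value, cramers_v = v)
```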

### Row and column percentages

Use `percent = "col"` to show percentages within each column, or
`percent = "row"` to show percentages within each row, instead of raw
counts:

```{r cross-col}
cross_tab(sochealth, smoking, education, percent = "col")
```

```{r cross-row}
cross_tab(sochealth, smoking, education, percent = "row")
```
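
The two options differ only in which margin serves as the denominator. A base R sketch with `prop.table()` on a made-up table shows the distinction (the counts are invented for illustration):

```{r pct-sketch}
# Toy 2 x 3 table of counts, invented for illustration
tab <- matrix(
  c(30, 20, 10,
    10, 20, 30),
  nrow = 2, byrow = TRUE,
  dimnames = list(smoking   = c("No", "Yes"),
                  education = c("Low", "Mid", "High"))
)

row_pct <- 100 * prop.table(tab, margin = 1)  # each row sums to 100
col_pct <- 100 * prop.table(tab, margin = 2)  # each column sums to 100

round(row_pct, 1)
round(col_pct, 1)
```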

### Grouping with by

Stratify the table by a third variable:

```{r cross-by}
cross_tab(sochealth, smoking, education, by = sex)
```

For more than one grouping variable, use `interaction()`:

```{r cross-by-interaction}
cross_tab(sochealth, smoking, education,
          by = interaction(sex, age_group))
```
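
`interaction()` is a base R function that combines several factors into one, with one level per combination. A small sketch on made-up vectors (both invented for illustration) shows the strata it produces:

```{r interaction-sketch}
# Toy factors, invented for illustration
sex       <- factor(c("F", "M", "F", "M"))
age_group <- factor(c("Young", "Young", "Old", "Old"))

# One combined factor; level names join the originals with a dot, e.g. "F.Old"
strata <- interaction(sex, age_group)

levels(strata)
table(strata)
```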

### Ordinal variables

When both variables are ordered factors, `cross_tab()` automatically
switches from Cramér's V to Kendall's tau-b:

```{r cross-ordinal}
cross_tab(sochealth, self_rated_health, education)
```

You can override the automatic selection with `assoc_measure`:

```{r cross-assoc}
cross_tab(sochealth, self_rated_health, education, assoc_measure = "gamma")
```
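
Both measures are built from concordant and discordant pairs of observations; they differ in how ties are handled. The base R sketch below computes each on made-up ordinal scores using the textbook definitions, not spicy's internals. The vectors and the name `gk_gamma` are invented for the example:

```{r ordinal-sketch}
# Toy ordinal scores, invented for illustration (higher = better / more)
health    <- c(1, 1, 2, 2, 3, 3, 3, 2)
education <- c(1, 2, 1, 2, 2, 3, 3, 3)

# Kendall's tau-b: cor() with method = "kendall" corrects for ties
tau_b <- cor(health, education, method = "kendall")

# Goodman-Kruskal gamma = (C - D) / (C + D); tied pairs are ignored
pairs <- combn(length(health), 2)
s <- sign(health[pairs[1, ]] - health[pairs[2, ]]) *
  sign(education[pairs[1, ]] - education[pairs[2, ]])
concordant <- sum(s > 0)
discordant <- sum(s < 0)
gk_gamma   <- (concordant - discordant) / (concordant + discordant)

c(tau_b = tau_b, gamma = gk_gamma)
```

Because gamma drops tied pairs from its denominator, its absolute value is never smaller than that of tau-b on the same data.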

### Confidence intervals for effect sizes

Add a 95% confidence interval for the association measure with
`assoc_ci = TRUE`:

```{r cross-ci}
cross_tab(sochealth, smoking, education, assoc_ci = TRUE)
```

### Weighted cross-tabulations

Supply weights the same way as in `freq()`. The call below does not
request rescaling, so the table shows raw weighted counts:

```{r cross-weighted-raw}
cross_tab(sochealth, smoking, education, weights = weight)
```

With `rescale = TRUE`, the weighted total matches the unweighted
sample size:

```{r cross-weighted}
cross_tab(sochealth, smoking, education, weights = weight, rescale = TRUE)
```

### Monte Carlo simulation

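When expected cell counts are small, the chi-squared approximation can
be unreliable. Set `simulate_p = TRUE` to get a Monte Carlo p-value;
`simulate_B` sets the number of simulated tables: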

```{r cross-simulate}
cross_tab(sochealth, smoking, education,
          simulate_p = TRUE, simulate_B = 5000)
```
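
The same idea is available in base R through `chisq.test()`, which makes it easy to compare the asymptotic and simulated p-values on a sparse table. This sketch uses made-up counts and says nothing about how spicy runs its own simulation:

```{r simulate-sketch}
# Sparse toy table, invented for illustration
tab <- matrix(c(3, 1, 0,
                1, 2, 4), nrow = 2, byrow = TRUE)

set.seed(123)  # simulated p-values vary from run to run without a seed

# The asymptotic test warns that the approximation may be incorrect
asymptotic <- suppressWarnings(chisq.test(tab))
simulated  <- chisq.test(tab, simulate.p.value = TRUE, B = 5000)

asymptotic$expected  # every expected count is below 5
c(asymptotic = asymptotic$p.value, simulated = simulated$p.value)
```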

### Data frame output

Set `styled = FALSE` to get a plain data frame for further processing:

```{r cross-df}
cross_tab(sochealth, smoking, education,
          percent = "col", styled = FALSE)
```

## Setting global defaults

You can set package-wide defaults with `options()` so you don't have to
repeat arguments:

```{r options, eval = FALSE}
options(
  spicy.percent    = "column",
  spicy.simulate_p = TRUE,
  spicy.rescale    = TRUE
)
```
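
`options()` invisibly returns the previous values, which makes a temporary change easy to undo. A base R sketch of that pattern follows; the `"none"` fallback passed to `getOption()` is a placeholder chosen for the example, not a documented spicy default:

```{r options-sketch}
# Set an option and keep the previous value so it can be restored
old <- options(spicy.percent = "column")

# getOption() returns the stored value, or the fallback if the option is unset
current <- getOption("spicy.percent", default = "none")
current

# Restore whatever was set before
options(old)
```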

## Learn more

- `vignette("association-measures")` - choosing the right effect size
  for your contingency table.
- `vignette("table-categorical")` - building publication-ready
  categorical tables.
- `?freq` and `?cross_tab` for the full argument reference.