--- title: "Introduction to the TukeyC Package" author: "Faria, J. C., Jelihovschi, E. G., Allaman, I. B." date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to the TukeyC Package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = '#>', fig.width = 6, fig.height = 4, fig.align = 'center' ) ``` ## Overview The **TukeyC** package implements Tukey's Honestly Significant Difference (HSD) test as a multiple comparison method in the context of Analysis of Variance (ANOVA). The package follows the conventional approach widely used in experimental statistics: treatment means are compared using the Studentized range, then labelled with letters that may overlap when means are not significantly different. The result is an easy-to-read table and plot showing mean estimates, minimum significant differences (MSD), and optional matrices of pairwise differences and adjusted p-values. ```{r library} library(TukeyC) ``` --- ## 1. Quick Start - Completely Randomized Design (CRD) `CRD1` contains simulated data for a balanced CRD with **4 treatment levels** and **6 replicates** per treatment. The main function `TukeyC()` accepts a model formula, an `aov` object, or an `lm` object. The `which` argument names the factor to be compared. ```{r crd-quick} data(CRD1) tk1 <- with(CRD1, TukeyC(y ~ x, data = dfm, which = 'x')) summary(tk1) ``` The summary shows, for each level, the mean and the group letter assigned by the algorithm. Levels sharing the same letter do not differ significantly at the default 5 % level. A single call to `plot()` produces the canonical dot plot with group letters displayed above each point: ```{r crd-plot, fig.cap="CRD1: treatment means with min-max dispersion bars and TukeyC groups."} plot(tk1, dispersion = 'mm', d.col = 'steelblue') ``` --- ## 2. Accepted Input Classes `TukeyC()` dispatches on the class of its first argument. The same grouping can be obtained from a `formula`, an `aov` object, or an `lm` object. ```{r input-classes} ## From: aov av1 <- with(CRD1, aov(y ~ x, data = dfm)) tk2 <- TukeyC(av1, which = 'x') summary(tk2) ## From: lm lm1 <- with(CRD1, lm(y ~ x, data = dfm)) tk3 <- TukeyC(lm1, which = 'x') summary(tk3) ``` --- ## 3. Unbalanced Data When observations are missing, `TukeyC()` automatically adjusts the means using the Least-Squares Means methodology (via the **emmeans** package). The analysis proceeds identically to the balanced case. ```{r crd-unbalanced} ## Remove the first observation to create an unbalanced dataset u_tk1 <- with(CRD1, TukeyC(y ~ x, data = dfm[-1, ], which = 'x')) summary(u_tk1) ``` The number of replicates shown at the bottom of the plot reflects the actual (unequal) sample sizes: ```{r plot-unbal, fig.cap="CRD1 (unbalanced): adjusted means with SD bars."} plot(u_tk1, dispersion = 'sd', d.col = 'tomato') ``` --- ## 4. Randomized Complete Block Design (RCBD) `RCBD` contains simulated data for a design with **5 treatment levels** and **4 blocks**. The blocking factor `blk` is included in the formula; `which` selects the factor of interest for the comparison. ```{r rcbd} data(RCBD) tk4 <- with(RCBD, TukeyC(y ~ blk + tra, data = dfm, which = 'tra')) summary(tk4) ``` ```{r rcbd-plot, fig.cap="RCBD: treatment means with CI bars."} plot(tk4, dispersion = 'ci', d.col = 'darkgreen', d.lty = 2) ``` --- ## 5. Significance Level The default significance level is `sig.level = 0.05`. Stricter or looser levels lead to fewer or more groups, respectively. ```{r sig-level} ## alpha = 0.01 (stricter) tk_01 <- with(RCBD, TukeyC(y ~ blk + tra, data = dfm, which = 'tra', sig.level = 0.01)) ## alpha = 0.10 (looser) tk_10 <- with(RCBD, TukeyC(y ~ blk + tra, data = dfm, which = 'tra', sig.level = 0.10)) cat('--- sig.level = 0.01 ---\n') summary(tk_01) cat('--- sig.level = 0.10 ---\n') summary(tk_10) ``` --- ## 6. Factorial Experiment (FE) `FE` contains simulated data for a **3-factor factorial** design (N, P, K), each at 2 levels, in 4 blocks. `TukeyC()` supports both main-effect and nested comparisons using colon notation in `which` and the `fl1` / `fl2` arguments to select the level of the nesting factor. ```{r fe-main} data(FE) ## Main effect: factor N tk5 <- with(FE, TukeyC(y ~ blk + N*P*K, data = dfm, which = 'N')) summary(tk5) ``` ```{r fe-nested} ## Nested: levels of N within level 1 of P tk6 <- with(FE, TukeyC(y ~ blk + N*P*K, data = dfm, which = 'P:N', fl1 = 1)) summary(tk6) ## Nested: levels of N within level 2 of P tk7 <- with(FE, TukeyC(y ~ blk + N*P*K, data = dfm, which = 'P:N', fl1 = 2)) summary(tk7) ``` --- ## 7. Split-Plot Experiment (SPE) `SPE` contains simulated data for a design with **3 whole plots** (factor P) and **4 sub-plot treatments** (factor SP). When testing the whole-plot factor, the appropriate error term must be specified via the `error` argument. ```{r spe} data(SPE) ## Sub-plot factor SP (residual error, default) tk8 <- with(SPE, TukeyC(y ~ blk + P*SP + Error(blk/P), data = dfm, which = 'SP')) summary(tk8) ## Whole-plot factor P (must specify the blk:P error term) tk9 <- with(SPE, TukeyC(y ~ blk + P*SP + Error(blk/P), data = dfm, which = 'P', error = 'blk:P')) summary(tk9) ``` --- ## 8. Visualisation Options ### 8.1 Dispersion bars Four dispersion options are available for `plot.TukeyC()`: | Option | Description | |---------|--------------------------------------------| | `'mm'` | Min-max range (default) | | `'sd'` | ± 1 standard deviation | | `'ci'` | Individual 95 % confidence interval | | `'cip'` | Pooled 95 % confidence interval (uses MSE) | `CRD2` provides a more visually rich example with **45 treatment levels**: ```{r plot-crd2, fig.width=8, fig.height=5, fig.cap="CRD2: 45 treatment means with pooled CI bars."} data(CRD2) tk10 <- with(CRD2, TukeyC(y ~ x, data = dfm, which = 'x')) plot(tk10, id.las = 2, yl = FALSE, dispersion = 'cip', d.col = 'steelblue') ``` ### 8.2 Comparing all four options ```{r plot-four, fig.width=8, fig.height=7, fig.cap="The four dispersion options applied to CRD1. (A) mm: min-max range; (B) sd: standard deviation; (C) ci: individual confidence interval; (D) cip: pooled confidence interval."} op <- par(mfrow = c(2, 2), mar = c(4, 3, 4, 1)) plot(tk1, dispersion = 'mm', d.col = 'steelblue') mtext('(A)', side = 3, adj = 0, line = 2, font = 2) plot(tk1, dispersion = 'sd', d.col = 'tomato') mtext('(B)', side = 3, adj = 0, line = 2, font = 2) plot(tk1, dispersion = 'ci', d.col = 'darkgreen') mtext('(C)', side = 3, adj = 0, line = 2, font = 2) plot(tk1, dispersion = 'cip', d.col = 'purple') mtext('(D)', side = 3, adj = 0, line = 2, font = 2) par(op) ``` ### 8.3 Boxplot `boxplot.TukeyC()` extends the standard boxplot by overlaying the TukeyC group letters above the frame and drawing the treatment mean inside each box. ```{r boxplot, fig.cap="CRD1: boxplot with TukeyC group labels and means (red line)."} ## boxplot.TukeyC re-evaluates the data argument from the original call; ## pass CRD1$dfm directly so it is findable in any environment. tk1_bp <- TukeyC(y ~ x, data = CRD1$dfm, which = 'x') boxplot(tk1_bp, mean.col = 'red', mean.lwd = 2, args.legend = list(x = 'topright')) ``` --- ## 9. Tabular Output `xtable()` converts a `TukeyC` result to an `xtable` object for inclusion in LaTeX or HTML documents. ```{r xtable, results='asis'} library(xtable) tb <- xtable(tk4, caption = 'RCBD: Tukey HSD grouping of treatment means.', digits = 3) print(tb, type = 'html', html.table.attributes = 'border="1" style="border-collapse:collapse; padding:4px;"', caption.placement = 'top', include.rownames = FALSE) ``` --- ## 10. Mixed Models with lme4 `TukeyC()` also accepts `lmerMod` objects from the **lme4** package, useful when random effects need to be modelled explicitly. ```{r lmer, eval=requireNamespace('lme4', quietly=TRUE)} library(lme4) data(RCBD) lmer1 <- with(RCBD, lmer(y ~ (1|blk) + tra, data = dfm)) tk11 <- TukeyC(lmer1, which = 'tra') summary(tk11) ``` --- ## References Tukey, J. W. (1949). Comparing individual means in the analysis of variance. *Biometrics*, **5**(2), 99--114. doi:10.2307/3001913 Miller, R. G. (1981). *Simultaneous Statistical Inference*. Springer.