---
title: "Custom Infix Operators"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Custom Infix Operators}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  eval     = FALSE
)
```

## Overview

The ops module provides five infix operators that cover four common tasks:

| Task | Operators |
|---|---|
| String concatenation | `%p%` |
| Set membership | `%nin%` |
| Case-insensitive matching | `%match%`, `%map%` |
| Strict equality | `%is%` |

```{r load}
library(evanverse)
```

> **Note:** All code examples in this vignette are static (`eval = FALSE`).
> Output is hand-written to reflect the current implementation. If you modify
> the operators, re-verify the examples manually or switch chunks to `eval = TRUE`.

The three operators that require character input (`%p%`, `%match%`, and
`%map%`) validate their arguments and raise an informative error for
non-character input, `NA` values, or zero-length vectors. `%nin%` and `%is%`
are unrestricted: they follow base R behaviour for any type.

---

## 1 String Concatenation

### `%p%` — Paste with a space

Concatenates two character vectors element-wise with a single space. Equivalent
to `paste(lhs, rhs, sep = " ")` but reads more naturally in pipelines.

```{r p-basic}
"Hello" %p% "world"
#> [1] "Hello world"

c("good", "hello") %p% c("morning", "world")
#> [1] "good morning" "hello world"
```

A length-1 operand is recycled over the longer vector in the usual R fashion:

```{r p-recycle}
"Gene:" %p% c("TP53", "BRCA1", "MYC")
#> [1] "Gene: TP53"  "Gene: BRCA1" "Gene: MYC"
```
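
For comparison, the same results can be produced with base R alone. This is a
minimal sketch of the `paste()` equivalence stated above. It calls no evanverse
functions, and the variable names `greeting` and `labels` are illustrative.

```{r p-base-equivalent}
# Base R only: paste() with sep = " " is the documented equivalent of %p%.
greeting <- paste(c("good", "hello"), c("morning", "world"), sep = " ")
greeting
#> [1] "good morning" "hello world"

# A length-1 left-hand side is recycled across the longer right-hand side.
labels <- paste("Gene:", c("TP53", "BRCA1", "MYC"), sep = " ")
labels
#> [1] "Gene: TP53"  "Gene: BRCA1" "Gene: MYC"
```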

Empty strings are valid — the space is always inserted:

```{r p-empty}
"" %p% "world"
#> [1] " world"
```

`NA` values and non-character inputs are rejected:

```{r p-error}
"Hello" %p% NA
#> Error in `%p%()`:
#> ! `rhs` must be a non-empty character vector without NA values.

123 %p% "world"
#> Error in `%p%()`:
#> ! `lhs` must be a non-empty character vector without NA values.
```
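
The validation rule can be pictured with a small base R sketch.
`check_chr_sketch()` and `%p_sketch%` are hypothetical names introduced here
for illustration only. The package's actual implementation and its error
formatting may differ.

```{r p-validate-sketch}
# Hypothetical helper (not part of evanverse): reject anything that is not a
# character vector of length >= 1 with no NA values.
check_chr_sketch <- function(x, arg) {
  if (!is.character(x) || length(x) == 0L || anyNA(x)) {
    stop(
      sprintf("`%s` must be a non-empty character vector without NA values.", arg),
      call. = FALSE
    )
  }
  invisible(x)
}

# Hypothetical stand-in for %p%: validate both sides, then paste with a space.
`%p_sketch%` <- function(lhs, rhs) {
  check_chr_sketch(lhs, "lhs")
  check_chr_sketch(rhs, "rhs")
  paste(lhs, rhs, sep = " ")
}

"Hello" %p_sketch% "world"
#> [1] "Hello world"

# An empty string passes validation; a zero-length vector does not.
"" %p_sketch% "world"
#> [1] " world"

msg <- tryCatch(character(0) %p_sketch% "world", error = conditionMessage)
msg
#> [1] "`lhs` must be a non-empty character vector without NA values."
```

The distinction between `""` (a length-1 vector holding an empty string) and
`character(0)` (a vector with no elements) is what separates the valid and
invalid cases in this sketch.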

---

## 2 Set Membership

### `%nin%` — Not-in operator

Returns `TRUE` for every element of `x` that is **not** present in `table`.
A concise alternative to `!(x %in% table)`.

```{r nin-basic}
c("A", "B", "C") %nin% c("B", "D")
#> [1]  TRUE FALSE  TRUE

1:5 %nin% c(2, 4)
#> [1]  TRUE FALSE  TRUE FALSE  TRUE
```
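
Assuming `%nin%` is the plain negation of `%in%` as described, the base R
expressions below give the same results. The sketch uses no evanverse
functions, and `x`, `tbl`, `not_in`, and `no_position` are illustrative names.

```{r nin-base-equivalent}
x   <- c("A", "B", "C")
tbl <- c("B", "D")

# Negating %in% ...
not_in <- !(x %in% tbl)
not_in
#> [1]  TRUE FALSE  TRUE

# ... is the same as asking where match() finds no position.
no_position <- is.na(match(x, tbl))
identical(not_in, no_position)
#> [1] TRUE
```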

`%nin%` is the exact negation of `%in%`: it accepts any type and follows base R
semantics for `NA` and type coercion:

```{r nin-na}
# NA matches NA in the table
NA %nin% c(NA, 1)
#> [1] FALSE

# NA does not match non-NA elements
NA %nin% c(1, 2)
#> [1] TRUE
```

R coerces types before comparing, so character strings can match numeric values
by their printed representation:

```{r nin-coerce}
c("1", "2") %nin% c(1, 2)
#> [1] FALSE FALSE
```
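
One practical consequence of this coercion is that the comparison is made on
the character form of the number, so a differently formatted string does not
match. The sketch below shows the underlying base R `%in%` behaviour and then
negates it, which is what `%nin%` is described as returning.

```{r nin-coerce-base}
# Numbers are converted to character before the comparison.
as.character(c(1, 2))
#> [1] "1" "2"

# "1" matches the number 1, but "1.0" does not, because as.character(1) is "1".
is_in <- c("1", "1.0") %in% c(1, 2)
is_in
#> [1]  TRUE FALSE

# Negated form, equivalent to the described behaviour of %nin%.
!is_in
#> [1] FALSE  TRUE
```

Converting both sides to the same type explicitly before comparing avoids
surprises of this kind.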

Empty vectors are handled without error. An empty `table` yields `TRUE` for
every element of `x`, and an empty `x` yields a zero-length result:

```{r nin-empty}
c("a", "b") %nin% character(0)   # nothing to be in
#> [1] TRUE TRUE

character(0) %nin% c("a", "b")
#> logical(0)
```

---

## 3 Case-Insensitive Matching

`%match%` and `%map%` lower-case both operands before comparing, so `"tp53"`
and `"TP53"` are treated as the same string. Each operator requires non-empty
character vectors without `NA` values on both sides.

### `%match%` — Return match indices

Like `base::match()`, but case-insensitive. Returns an integer vector of
positions; unmatched elements become `NA`.

```{r match-basic}
c("tp53", "BRCA1", "egfr") %match% c("TP53", "EGFR", "MYC")
#> [1]  1 NA  2
```

When `table` contains duplicates the index of the **first** match is returned,
matching base R behaviour:

```{r match-dupe-table}
c("x") %match% c("X", "x", "X")
#> [1] 1
```

Duplicate elements in `x` are each matched independently:

```{r match-dupe-x}
c("tp53", "tp53") %match% c("TP53", "EGFR")
#> [1] 1 1
```
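
The behaviour shown above can be approximated in base R by lower-casing both
sides before calling `match()`. `match_ci_sketch()` is a hypothetical name.
This is a sketch of the idea, not the package source, and it omits the input
validation.

```{r match-base-sketch}
# Hypothetical stand-in for %match% (not part of evanverse).
match_ci_sketch <- function(x, table) {
  match(tolower(x), tolower(table))
}

match_ci_sketch(c("tp53", "BRCA1", "egfr"), c("TP53", "EGFR", "MYC"))
#> [1]  1 NA  2

# Duplicates in table: the first matching position wins.
match_ci_sketch("x", c("X", "x", "X"))
#> [1] 1

# Duplicates in x: each element is looked up independently.
match_ci_sketch(c("tp53", "tp53"), c("TP53", "EGFR"))
#> [1] 1 1
```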

Non-character inputs, `NA` values, and empty vectors are rejected on both sides:

```{r match-error}
# empty x
character(0) %match% c("TP53")
#> Error in `%match%()`:
#> ! `x` must be a non-empty character vector without NA values.

# NA in x
c("tp53", NA) %match% c("TP53")
#> Error in `%match%()`:
#> ! `x` must be a non-empty character vector without NA values.

# empty table
c("tp53") %match% character(0)
#> Error in `%match%()`:
#> ! `table` must be a non-empty character vector without NA values.

# NA in table
c("tp53") %match% c("TP53", NA)
#> Error in `%match%()`:
#> ! `table` must be a non-empty character vector without NA values.
```

---

### `%map%` — Return a named character vector

Like `%match%`, but returns a **named character vector** instead of indices.
Names are the canonical entries from `table`; values are the original elements
from `x`. Unmatched entries are silently dropped. Output order follows `x`.

```{r map-basic}
c("tp53", "brca1", "egfr") %map% c("TP53", "EGFR", "MYC")
#>   TP53   EGFR
#> "tp53" "egfr"
```

Output order follows `x`, not `table`:

```{r map-order}
c("egfr", "tp53") %map% c("TP53", "EGFR")
#>   EGFR   TP53
#> "egfr" "tp53"
```

Unmatched elements are dropped rather than returned as `NA`:

```{r map-drop}
c("akt1", "tp53") %map% c("TP53", "EGFR")
#>   TP53
#> "tp53"
```

When nothing matches, an empty named character vector is returned:

```{r map-none}
c("none1", "none2") %map% c("TP53", "EGFR")
#> named character(0)
```

Duplicate elements in `x` that match are both retained:

```{r map-dupe}
c("tp53", "tp53") %map% c("TP53", "EGFR")
#>   TP53   TP53
#> "tp53" "tp53"
```
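
The rules illustrated above (names from `table`, values from `x`, unmatched
entries dropped, order taken from `x`) can be sketched in base R as follows.
`map_ci_sketch()` is a hypothetical name used only for illustration. It is not
the package implementation and skips input validation.

```{r map-base-sketch}
# Hypothetical stand-in for %map% (not part of evanverse).
map_ci_sketch <- function(x, table) {
  idx  <- match(tolower(x), tolower(table))   # case-insensitive positions
  keep <- !is.na(idx)                         # drop unmatched elements of x
  stats::setNames(x[keep], table[idx[keep]])  # names from table, values from x
}

mapped <- map_ci_sketch(c("egfr", "akt1", "tp53"), c("TP53", "EGFR"))
mapped
#>   EGFR   TP53
#> "egfr" "tp53"

# The canonical symbols are available through names().
names(mapped)
#> [1] "EGFR" "TP53"
```

Because unmatched entries are dropped, the result can be shorter than `x`.
When positions need to stay aligned with `x`, the index form returned by
`%match%` is the better fit.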

The same error rules as `%match%` apply on both sides:

```{r map-error}
# empty x
character(0) %map% c("TP53")
#> Error in `%map%()`:
#> ! `x` must be a non-empty character vector without NA values.

# NA in x
c("tp53", NA) %map% c("TP53")
#> Error in `%map%()`:
#> ! `x` must be a non-empty character vector without NA values.

# empty table
c("tp53") %map% character(0)
#> Error in `%map%()`:
#> ! `table` must be a non-empty character vector without NA values.

# NA in table
c("tp53") %map% c("TP53", NA)
#> Error in `%map%()`:
#> ! `table` must be a non-empty character vector without NA values.
```

---

## 4 Strict Equality

### `%is%` — Identical comparison

Wraps `base::identical()`. Returns a single `TRUE` or `FALSE` with no
tolerance for type or attribute differences.

```{r is-basic}
1:3 %is% 1:3
#> [1] TRUE

"hello" %is% "hello"
#> [1] TRUE

list(a = 1) %is% list(a = 1)
#> [1] TRUE
```

Unlike `==`, `%is%` distinguishes types, names, and storage mode:

```{r is-mismatch}
1:3 %is% c(1, 2, 3)          # integer vs double
#> [1] FALSE

c(a = 1, b = 2) %is% c(b = 1, a = 2)   # same values, different names
#> [1] FALSE
```

`NULL` and the typed `NA` variants are compared by type as well as value:

```{r is-null-na}
NULL %is% NULL
#> [1] TRUE

NA %is% NA
#> [1] TRUE

NA %is% NA_real_   # logical NA vs double NA
#> [1] FALSE
```
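
Since `%is%` is described as a wrapper around `base::identical()`, the contrast
with other base R comparisons can be sketched without the package. The
variable names `x` and `y` are illustrative.

```{r is-base-compare}
x <- 1:3          # integer
y <- c(1, 2, 3)   # double

x == y                    # element-wise; ignores the type difference
#> [1] TRUE TRUE TRUE

isTRUE(all.equal(x, y))   # tolerant comparison; also ignores it
#> [1] TRUE

identical(x, y)           # strict; the types must match
#> [1] FALSE

NA == NA                  # a comparison involving NA is itself NA
#> [1] NA

identical(NA, NA)         # identical() always gives a definite answer
#> [1] TRUE
```

The last two lines show why a strict comparison is convenient in conditions:
`if (NA == NA)` raises an error, while `identical()` always returns a single
`TRUE` or `FALSE`.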

`%is%` accepts any type — there is no input restriction.

---

## 5 A Combined Workflow

The operators compose naturally in bioinformatics pipelines. The example below
finds the queries that have no canonical symbol, maps the matched queries to
their canonical form, builds display labels with `%p%`, and checks that the
canonical list is unchanged.

```{r workflow}
library(evanverse)

canonical <- c("TP53", "BRCA1", "EGFR", "MYC", "PTEN")
query     <- c("tp53", "brca1", "AKT1", "egfr", "unknown")

# 1. Which queries are not in the canonical set (case-insensitive)?
missing_idx <- which(is.na(query %match% canonical))
query[missing_idx]
#> [1] "AKT1"    "unknown"

# 2. Map matched queries to their canonical names
query %map% canonical
#>    TP53   BRCA1    EGFR
#>  "tp53" "brca1"  "egfr"

# 3. Build an annotation column using %p%
anno <- "Gene:" %p% canonical
anno
#> [1] "Gene: TP53"  "Gene: BRCA1" "Gene: EGFR"  "Gene: MYC"   "Gene: PTEN"

# 4. Check that the canonical list hasn't changed
canonical %is% c("TP53", "BRCA1", "EGFR", "MYC", "PTEN")
#> [1] TRUE
```
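
Step 1 can also be phrased as a set-membership test. Assuming `%nin%` is the
negation of `%in%` as described in section 2, the base R sketch below
reproduces the same result by lower-casing both sides first. It calls no
evanverse functions, and `not_canonical` is an illustrative name.

```{r workflow-base-check}
canonical <- c("TP53", "BRCA1", "EGFR", "MYC", "PTEN")
query     <- c("tp53", "brca1", "AKT1", "egfr", "unknown")

# TRUE where the lower-cased query has no lower-cased canonical counterpart.
not_canonical <- !(tolower(query) %in% tolower(canonical))
query[not_canonical]
#> [1] "AKT1"    "unknown"
```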

---

## Getting Help

- `?"%p%"`, `?"%nin%"`, `?"%match%"`, `?"%map%"`, `?"%is%"`
- [GitHub Issues](https://github.com/evanbio/evanverse/issues)