---
title: "Custom Infix Operators"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Custom Infix Operators}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Overview

The ops module provides five infix operators that cover four common tasks:

| Task | Operators |
|---|---|
| String concatenation | `%p%` |
| Set membership | `%nin%` |
| Case-insensitive matching | `%match%`, `%map%` |
| Strict equality | `%is%` |

```{r load}
library(evanverse)
```

> **Note:** All code examples in this vignette are static (`eval = FALSE`).
> Output is hand-written to reflect the current implementation. If you modify
> the operators, re-verify the examples manually or switch chunks to `eval = TRUE`.

The operators that require character input (`%p%`, `%match%`, `%map%`) validate
their arguments and raise an informative error for non-character, `NA`, or
zero-length inputs. `%nin%` and `%is%` are unrestricted: they follow base R
behaviour for any type.

---

## 1 String Concatenation

### `%p%`: Paste with a space

Concatenates two character vectors element-wise with a single space. For valid
inputs it is equivalent to `paste(lhs, rhs, sep = " ")`, but reads more
naturally in pipelines.

```{r p-basic}
"Hello" %p% "world"
#> [1] "Hello world"

c("good", "hello") %p% c("morning", "world")
#> [1] "good morning" "hello world"
```

A length-1 operand is recycled over the longer vector in the usual R fashion:

```{r p-recycle}
"Gene:" %p% c("TP53", "BRCA1", "MYC")
#> [1] "Gene: TP53"  "Gene: BRCA1" "Gene: MYC"
```

Empty strings are valid; the space is always inserted:

```{r p-empty}
"" %p% "world"
#> [1] " world"
```

`NA` values and non-character inputs are rejected:

```{r p-error}
"Hello" %p% NA
#> Error in `%p%()`:
#> ! `rhs` must be a non-empty character vector without NA values.

123 %p% "world"
#> Error in `%p%()`:
#> ! `lhs` must be a non-empty character vector without NA values.
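# For comparison, plain base R (not evanverse): paste() does not reject NA;
# it converts it to the string "NA".
paste("Hello", NA)
#> [1] "Hello NA"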
```

---

## 2 Set Membership

### `%nin%`: Not-in operator

Returns `TRUE` for every element of `x` that is **not** present in `table`.
A concise alternative to `!(x %in% table)`.

```{r nin-basic}
c("A", "B", "C") %nin% c("B", "D")
#> [1]  TRUE FALSE  TRUE

1:5 %nin% c(2, 4)
#> [1]  TRUE FALSE  TRUE FALSE  TRUE
```

`%nin%` is the exact negation of `%in%`. It accepts any type and follows base R
semantics for `NA` and type coercion:

```{r nin-na}
# NA matches NA in the table
NA %nin% c(NA, 1)
#> [1] FALSE

# NA does not match non-NA elements
NA %nin% c(1, 2)
#> [1] TRUE
```

R coerces types before comparing, so character strings can match numeric values
by their printed representation:

```{r nin-coerce}
c("1", "2") %nin% c(1, 2)
#> [1] FALSE FALSE
```

Empty vectors are handled without error. An empty `table` makes every element
`TRUE`, and an empty `x` returns a zero-length result:

```{r nin-empty}
c("a", "b") %nin% character(0)   # empty table: nothing can match
#> [1] TRUE TRUE

character(0) %nin% c("a", "b")
#> logical(0)
```

---

## 3 Case-Insensitive Matching

Both `%match%` and `%map%` lower-case both sides before comparing, so `"tp53"`
and `"TP53"` are treated as the same string. Both require non-`NA`, non-empty
character vectors on both sides.

### `%match%`: Return match indices

Like `base::match()`, but case-insensitive. Returns an integer vector of
positions in `table`; unmatched elements become `NA`.

```{r match-basic}
c("tp53", "BRCA1", "egfr") %match% c("TP53", "EGFR", "MYC")
#> [1]  1 NA  2
```

When `table` contains duplicates, the index of the **first** match is returned,
matching base R behaviour:

```{r match-dupe-table}
c("x") %match% c("X", "x", "X")
#> [1] 1
```

Duplicate elements in `x` are each matched independently:

```{r match-dupe-x}
c("tp53", "tp53") %match% c("TP53", "EGFR")
#> [1] 1 1
```

Non-character inputs, `NA` values, and empty vectors are rejected on both sides:

```{r match-error}
# empty x
character(0) %match% c("TP53")
#> Error in `%match%()`:
#> ! `x` must be a non-empty character vector without NA values.
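# For comparison, plain base R (not evanverse): match() accepts a zero-length x
# and returns a zero-length integer vector instead of raising an error.
match(character(0), "TP53")
#> integer(0)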
# NA in x
c("tp53", NA) %match% c("TP53")
#> Error in `%match%()`:
#> ! `x` must be a non-empty character vector without NA values.

# empty table
c("tp53") %match% character(0)
#> Error in `%match%()`:
#> ! `table` must be a non-empty character vector without NA values.

# NA in table
c("tp53") %match% c("TP53", NA)
#> Error in `%match%()`:
#> ! `table` must be a non-empty character vector without NA values.
```

---

### `%map%`: Return a named character vector

Like `%match%`, but returns a **named character vector** instead of indices.
Names are the canonical entries from `table`; values are the original elements
from `x`. Unmatched entries are silently dropped. Output order follows `x`.

```{r map-basic}
c("tp53", "brca1", "egfr") %map% c("TP53", "EGFR", "MYC")
#>   TP53   EGFR
#> "tp53" "egfr"
```

Output order follows `x`, not `table`:

```{r map-order}
c("egfr", "tp53") %map% c("TP53", "EGFR")
#>   EGFR   TP53
#> "egfr" "tp53"
```

Unmatched elements are dropped rather than returned as `NA`:

```{r map-drop}
c("akt1", "tp53") %map% c("TP53", "EGFR")
#>   TP53
#> "tp53"
```

When nothing matches, an empty named character vector is returned:

```{r map-none}
c("none1", "none2") %map% c("TP53", "EGFR")
#> named character(0)
```

Duplicate elements in `x` that match are both retained:

```{r map-dupe}
c("tp53", "tp53") %map% c("TP53", "EGFR")
#>   TP53   TP53
#> "tp53" "tp53"
```

The same error rules as `%match%` apply on both sides:

```{r map-error}
# empty x
character(0) %map% c("TP53")
#> Error in `%map%()`:
#> ! `x` must be a non-empty character vector without NA values.

# NA in x
c("tp53", NA) %map% c("TP53")
#> Error in `%map%()`:
#> ! `x` must be a non-empty character vector without NA values.

# empty table
c("tp53") %map% character(0)
#> Error in `%map%()`:
#> ! `table` must be a non-empty character vector without NA values.

# NA in table
c("tp53") %map% c("TP53", NA)
#> Error in `%map%()`:
#> ! `table` must be a non-empty character vector without NA values.
```

---

## 4 Strict Equality

### `%is%`: Identical comparison

Wraps `base::identical()`. Returns a single `TRUE` or `FALSE` with no tolerance
for type or attribute differences.

```{r is-basic}
1:3 %is% 1:3
#> [1] TRUE

"hello" %is% "hello"
#> [1] TRUE

list(a = 1) %is% list(a = 1)
#> [1] TRUE
```

Unlike `==`, `%is%` distinguishes storage types and attributes such as names:

```{r is-mismatch}
1:3 %is% c(1, 2, 3)                   # integer vs double
#> [1] FALSE

c(a = 1, b = 2) %is% c(b = 1, a = 2)  # same values, different names
#> [1] FALSE
```

`NULL` and `NA` are compared by type as well as value, so differently typed
`NA`s are not identical:

```{r is-null-na}
NULL %is% NULL
#> [1] TRUE

NA %is% NA
#> [1] TRUE

NA %is% NA_real_   # logical NA vs double NA
#> [1] FALSE
```

`%is%` accepts any type; there is no input restriction.

---

## 5 A Combined Workflow

The operators compose naturally in bioinformatics pipelines. The example below
finds the queries that are missing from a canonical gene list, maps the matched
queries to their canonical symbols, builds annotation labels, and checks that
the canonical list is unchanged.

```{r workflow}
library(evanverse)

canonical <- c("TP53", "BRCA1", "EGFR", "MYC", "PTEN")
query     <- c("tp53", "brca1", "AKT1", "egfr", "unknown")

# 1. Which queries are not in the canonical set (case-insensitive)?
missing_idx <- which(is.na(query %match% canonical))
query[missing_idx]
#> [1] "AKT1"    "unknown"

# 2. Map matched queries to their canonical names
query %map% canonical
#>    TP53   BRCA1    EGFR
#>  "tp53" "brca1"  "egfr"

# 3. Build an annotation column using %p%
anno <- "Gene:" %p% canonical
anno
#> [1] "Gene: TP53"  "Gene: BRCA1" "Gene: EGFR"  "Gene: MYC"   "Gene: PTEN"

# 4. Check that the canonical list hasn't changed
canonical %is% c("TP53", "BRCA1", "EGFR", "MYC", "PTEN")
#> [1] TRUE
```

---

## Getting Help

- `?"%p%"`, `?"%nin%"`, `?"%match%"`, `?"%map%"`, `?"%is%"`
- [GitHub Issues](https://github.com/evanbio/evanverse/issues)
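
## Appendix: Base R Sketch

The behaviour documented in this vignette can be approximated in plain base R, which may help when reasoning about edge cases. The sketch below is an illustration only: `check_chr()`, `p_sketch()`, `nin_sketch()`, `match_sketch()`, `map_sketch()`, and `is_sketch()` are hypothetical names that do not exist in evanverse, and the package's actual implementations may differ. The validation rule is an assumption inferred from the error messages shown in the examples.

```{r base-r-sketch}
# Hypothetical base R stand-ins for illustration; not the package's source.

# Assumed validation rule, inferred from the documented error messages:
# character type, length > 0, and no NA values.
check_chr <- function(x, arg) {
  if (!is.character(x) || length(x) == 0L || anyNA(x)) {
    stop(
      sprintf("`%s` must be a non-empty character vector without NA values.", arg),
      call. = FALSE
    )
  }
  invisible(x)
}

# Paste with a single space after validating both sides.
p_sketch <- function(lhs, rhs) {
  check_chr(lhs, "lhs")
  check_chr(rhs, "rhs")
  paste(lhs, rhs, sep = " ")
}

# Negated %in%; no validation, any type accepted.
nin_sketch <- function(x, table) !(x %in% table)

# Case-insensitive match(): lower-case both sides, then match.
match_sketch <- function(x, table) {
  check_chr(x, "x")
  check_chr(table, "table")
  match(tolower(x), tolower(table))
}

# Named character vector: names from `table`, values from `x`,
# unmatched elements dropped, order following `x`.
map_sketch <- function(x, table) {
  idx  <- match_sketch(x, table)
  keep <- !is.na(idx)
  stats::setNames(x[keep], table[idx[keep]])
}

# Strict equality via identical().
is_sketch <- function(x, y) identical(x, y)

match_sketch(c("tp53", "BRCA1", "egfr"), c("TP53", "EGFR", "MYC"))
#> [1]  1 NA  2

map_sketch(c("tp53", "brca1", "egfr"), c("TP53", "EGFR", "MYC"))
#>   TP53   EGFR
#> "tp53" "egfr"
```

Plain function names are used here instead of `%...%` definitions so that running the sketch alongside `library(evanverse)` does not mask the package's operators.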