---
title: "Getting started with highMLR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with highMLR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Overview

`highMLR` provides a single, unified interface for high-dimensional feature selection when the outcome is a (possibly censored) survival time. The same `highmlr()` call dispatches to one of several machine learning methods:

* `"coxnet"` -- Cox elastic net (`glmnet`)
* `"rsf"` -- random survival forest (`ranger`)
* `"aorsf"` -- accelerated oblique random survival forest (`aorsf`)
* `"xgboost"` -- gradient-boosted Cox (`xgboost`)
* `"stability"` -- stability selection (`stabs`)
* `"univariate"` -- classical univariate Cox screening
* `"pseudo"` -- pseudo-observation bridge to an arbitrary regression learner
* `"finegray"` -- Fine-Gray competing-risks selection

All methods return a `highmlr_fit` object with a common structure, so the downstream verbs (`print()`, `summary()`, `plot()`, `coef()`, `predict()`) and the companion functions (`highmlr_compare()`, `highmlr_stability()`, `highmlr_explain()`, `highmlr_screen()`, `highmlr_report()`) work identically regardless of which method produced the fit.

## A first fit

The package ships with two bundled high-dimensional survival datasets, `hnscc` and `srdata`. Both use `OS` for the survival time; the event indicator is `Death` in `hnscc` and `event` in `srdata` (1 = event, 0 = censored).

```{r, eval = FALSE}
library(highMLR)
data(hnscc)

fit <- highmlr(
  hnscc,
  time = "OS",
  status = "Death",
  method = "coxnet",
  resampling = "cv",
  folds = 5
)

print(fit)
plot(fit, top_n = 20)
```

The examples in this vignette are not evaluated at build time because the underlying learners (`glmnet`, `ranger`, `aorsf`, `xgboost`, `grf`, `survex`) can be slow on high-dimensional data. Copy the chunks into an interactive session to run them.

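To make the expected data layout concrete (one time column, one 0/1 status column, and many feature columns), the following sketch simulates such a data frame and runs a hand-rolled univariate Cox screen with the `survival` package. It illustrates the general idea behind classical univariate Cox screening only; it is not `highMLR`'s implementation, and the simulated data, the `gene*` column names, and the Benjamini-Hochberg adjustment are assumptions made for the example.

```{r, eval = FALSE}
library(survival)

set.seed(42)
n <- 200   # samples
p <- 50    # candidate features

# Simulated feature matrix; only gene1 and gene2 carry signal (hypothetical names)
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("gene", seq_len(p))))
lp <- 0.8 * x[, "gene1"] - 0.8 * x[, "gene2"]

# Exponential event times with independent exponential censoring
event_time  <- rexp(n, rate = 0.1 * exp(lp))
censor_time <- rexp(n, rate = 0.05)

sim <- data.frame(
  OS    = pmin(event_time, censor_time),
  Death = as.integer(event_time <= censor_time),  # 1 = event, 0 = censored
  x
)

# Fit one Cox model per feature and keep the Wald p-value of its coefficient
features <- setdiff(names(sim), c("OS", "Death"))
pvals <- vapply(features, function(f) {
  cox <- coxph(as.formula(paste("Surv(OS, Death) ~", f)), data = sim)
  summary(cox)$coefficients[1, "Pr(>|z|)"]
}, numeric(1))

# Adjust for multiple testing and list the strongest candidates
screen <- data.frame(
  feature = features,
  p_value = pvals,
  p_adj   = p.adjust(pvals, method = "BH"),
  row.names = NULL
)
head(screen[order(screen$p_value), ], 10)
```

A per-feature loop like this scales linearly in the number of columns and ignores correlation between features, which is one reason to compare it against penalised or tree-based methods on the same data.
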
## Comparing methods

`highmlr_compare()` runs several methods on the same data and returns a tidy side-by-side summary:

```{r, eval = FALSE}
cmp <- highmlr_compare(
  hnscc, "OS", "Death",
  methods = c("coxnet", "rsf", "univariate")
)
cmp$summary
```

## Pre-screening when p is very large

For very wide data, reduce the candidate set first:

```{r, eval = FALSE}
data(srdata)
keep <- highmlr_screen(srdata, "OS", "event", filter = "variance", keep = 500)

# Stored under its own name so the hnscc fit from "A first fit" stays available
fit_screened <- highmlr(srdata, "OS", "event", features = keep, method = "coxnet")
```

## Explaining a fit

Time-dependent SHAP values (SurvSHAP(t)) are available via `highmlr_explain()`, and a one-file biomarker report can be generated with `highmlr_report()`. The chunk below explains `fit`, the `hnscc` model from "A first fit".

```{r, eval = FALSE}
ex <- highmlr_explain(fit, new_data = hnscc, method = "survshap")
print(ex)
plot(ex)
```

## Session information

```{r}
sessionInfo()
```