Read the phenotype table and a saved RGChannelSet, estimate surrogate variables from ENmix control probes, analyze their association with Sentrix chip and position factors, and return a structured in-memory result. Legacy CSV, .RData, text-summary, and figure outputs are written only when saveOutputs = TRUE.

svaEnmix(
  phenoFile = "data/preprocessingMinfiEwasWater/phenoLC.csv",
  rgsetData = "rData/preprocessingMinfiEwasWater/objects/RGSet.RData",
  sepType = "",
  outputLogs = "logs",
  nSamples = NA,
  SampleID = "Sample_Name",
  arrayType = "IlluminaHumanMethylationEPICv2",
  annotationVersion = "20a1.hg38",
  SentrixIDColumn = "Sentrix_ID",
  SentrixPositionColumn = "Sentrix_Position",
  ctrlSvaPercVar = 0.9,
  ctrlSvaFlag = 1,
  scriptLabel = "svaEnmix",
  tiffWidth = 2000,
  tiffHeight = 1000,
  tiffRes = 150,
  figureBaseDir = "figures",
  dataBaseDir = "data",
  rBaseDir = "rData",
  display = FALSE,
  verbose = FALSE,
  logs = FALSE,
  saveOutputs = FALSE
)

Arguments

phenoFile

Character. Path to the phenotype file with cell-composition data.

rgsetData

Character. Path to a saved RGChannelSet object. Both .RData and .rds files are supported.
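Handling both file types can be sketched as follows. This is a minimal illustration, not the package's own loader: loadSavedObject() is a hypothetical helper, the stand-in object is a plain list rather than a real RGChannelSet, and the rule that an .RData file must hold exactly one object is an assumption made for the example.

```r
# Hypothetical helper: return one object from either an .rds or an .RData file.
loadSavedObject <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "rds") {
    return(readRDS(path))
  }
  # .RData files restore objects by name, so load into a private environment
  env <- new.env(parent = emptyenv())
  objNames <- load(path, envir = env)
  if (length(objNames) != 1L) {
    stop("Expected exactly one object in ", path, ", found ", length(objNames))
  }
  get(objNames[[1L]], envir = env)
}

# Usage with a stand-in object (assumption: a list instead of an RGChannelSet)
standIn <- list(id = "demo")
rdsPath <- tempfile(fileext = ".rds")
rdataPath <- tempfile(fileext = ".RData")
saveRDS(standIn, rdsPath)
save(standIn, file = rdataPath)
fromRds <- loadSavedObject(rdsPath)
fromRData <- loadSavedObject(rdataPath)
```

Loading into a private environment avoids overwriting objects of the same name in the caller's workspace.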

sepType

Character. Field separator used in phenoFile. Use "" for a comma-separated file, "\\t" for a tab-delimited file, or another separator accepted by utils::read.csv().

outputLogs

Character. Directory used for log files when logs = TRUE.

nSamples

Integer or NA. Number of rows to keep from the phenotype table. Use NA to keep all samples.

SampleID

Character. Name of the phenotype column containing sample identifiers.

arrayType

Character. Illumina array identifier assigned to the array element of Biobase::annotation(RGSet), for example "IlluminaHumanMethylation450k".

annotationVersion

Character. Annotation build assigned to the annotation element of Biobase::annotation(RGSet), for example "ilmn12.hg19".

SentrixIDColumn

Character. Name of the chip identifier column in the phenotype data.

SentrixPositionColumn

Character. Name of the chip position column in the phenotype data.

ctrlSvaPercVar

Numeric between 0 and 1. Minimum proportion of control-probe variance that the retained surrogate variables must explain; passed to ENmix::ctrlsva().

ctrlSvaFlag

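Integer. Flag passed to ENmix::ctrlsva() that controls how the number of surrogate variables is selected. See the ENmix documentation for the accepted values.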
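The variance threshold can be illustrated with a short base-R sketch. This is not ENmix's implementation: it assumes the surrogate variables behave like principal components of a control-probe intensity matrix, uses synthetic data, and the names ctrl, percVar, and nKeep are invented for the example.

```r
set.seed(1)
# Synthetic stand-in for a samples-by-control-probes intensity matrix
ctrl <- matrix(rnorm(20 * 8), nrow = 20, ncol = 8)

pca <- prcomp(ctrl, center = TRUE, scale. = TRUE)

# Cumulative proportion of variance explained by the leading components
cumVar <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Keep the smallest number of components whose cumulative share reaches percVar
percVar <- 0.9
nKeep <- which(cumVar >= percVar)[1]
svMatrix <- pca$x[, seq_len(nKeep), drop = FALSE]
```

Raising the threshold toward 1 retains more components, which leaves fewer residual degrees of freedom for any downstream model that includes them as covariates.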

scriptLabel

Character. Label used to name output folders when saveOutputs = TRUE.

tiffWidth

Integer. Width of saved TIFF plots in pixels.

tiffHeight

Integer. Height of saved TIFF plots in pixels.

tiffRes

Integer. Resolution in DPI for saved TIFF plots.

figureBaseDir

Character. Base directory used for saved figure outputs when saveOutputs = TRUE.

dataBaseDir

Character. Base directory used for saved CSV and text outputs when saveOutputs = TRUE.

rBaseDir

Character. Base directory used for saved .RData outputs when saveOutputs = TRUE.

display

Logical. If TRUE, draw plots on the active graphics device.

verbose

Logical. If TRUE, emit progress messages with message(). The default is FALSE.

logs

Logical. If TRUE, write the same progress messages to outputLogs. The default is FALSE.
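The relationship between verbose and logs can be sketched with a small helper. This is an illustration only: logProgress() and the log file name are hypothetical and do not describe how the package writes its logs.

```r
# Hypothetical helper: emit a message and/or append it to a log file.
logProgress <- function(msg, verbose = FALSE, logFile = NULL) {
  if (verbose) message(msg)
  if (!is.null(logFile)) {
    dir.create(dirname(logFile), recursive = TRUE, showWarnings = FALSE)
    cat(msg, "\n", sep = "", file = logFile, append = TRUE)
  }
  invisible(msg)
}

# Usage: silent on the console, but recorded in a log file under a temp directory
logPath <- file.path(tempfile("logs"), "svaEnmix.log")
logProgress("Reading phenotype table", verbose = FALSE, logFile = logPath)
logProgress("Estimating surrogate variables", verbose = FALSE, logFile = logPath)
logged <- readLines(logPath)
```

Keeping the two switches independent lets a batch run stay quiet on the console while still leaving a log on disk.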

saveOutputs

Logical. If TRUE, write the legacy CSV, .RData, text, and TIFF outputs to disk. The default is FALSE.

Value

A list with class "dnaEPICO_svaEnmix".

targets

Phenotype table read from phenoFile after any optional row subsetting.

RGSet

Loaded RGChannelSet with sample names realigned to targets[[SampleID]].

svaData

Object returned by estimateSvaEnmixControls() containing the surrogate-variable matrix and the control-probe settings used to estimate it.

mergedPheno

Phenotype table returned by mergeSvaTargetsEnmix() after the surrogate variables were appended as additional columns.
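Appending surrogate variables as columns can be sketched in base R. The data and the column names sv1 and sv2 are made up for illustration, and the actual output of mergeSvaTargetsEnmix() may be laid out differently.

```r
# Stand-in phenotype table; column names mirror the documented defaults
targets <- data.frame(
  Sample_Name = c("S1", "S2", "S3"),
  Sentrix_ID = c("200001", "200001", "200002"),
  stringsAsFactors = FALSE
)

# Hypothetical surrogate-variable matrix: one row per sample, one column per SV
svMatrix <- matrix(
  c(0.2, -0.1, 0.4, 1.1, 0.3, -0.7),
  nrow = 3,
  dimnames = list(targets$Sample_Name, c("sv1", "sv2"))
)

# Guard against silent misalignment before binding columns
stopifnot(identical(rownames(svMatrix), targets$Sample_Name))
mergedPheno <- cbind(targets, as.data.frame(svMatrix))
rownames(mergedPheno) <- NULL
```

Checking that the row order matches before binding matters because cbind() pairs rows by position, not by sample identifier.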

analysisData

Object returned by analyzeSvaEnmix() containing the surrogate-variable association models, ANOVA tables, and Sentrix metadata.
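One way to picture the association models is a linear model of each surrogate variable on the chip and position factors, followed by an ANOVA table. The sketch below uses simulated data and base R only. The exact model specification used by analyzeSvaEnmix() is an assumption here, as are the variable names.

```r
set.seed(42)
# Simulated design: 3 chips with 8 positions each
pheno <- data.frame(
  Sentrix_ID = factor(rep(c("chipA", "chipB", "chipC"), each = 8)),
  Sentrix_Position = factor(rep(sprintf("R%02dC01", 1:8), times = 3))
)

# Simulated surrogate variable: a chip-level shift plus random noise
chipShift <- c(chipA = 0, chipB = 1, chipC = 2)
pheno$sv1 <- unname(chipShift[as.character(pheno$Sentrix_ID)]) +
  rnorm(nrow(pheno), sd = 0.5)

# Additive model of the surrogate variable on chip and position
fit <- lm(sv1 ~ Sentrix_ID + Sentrix_Position, data = pheno)

# Sequential ANOVA table with one row per factor plus residuals
aovTable <- anova(fit)
```

With few samples relative to the number of chip and position levels, such a model can fit the data almost perfectly, which is the situation R warns about as an "essentially perfect fit".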

plotFiles

Named list describing the plot file paths requested for the SVA figures. When saveOutputs = FALSE, the entries are typically NULL.

savedFiles

Object returned by writeSvaEnmixOutputs() when saveOutputs = TRUE, otherwise NULL.

logFile

Resolved path to the optional log file, or NULL when logging was disabled.

See dnaEPICO_svaEnmix for a class-level overview.

Examples

tmp <- tempdir()
stopifnot(dir.exists(tmp))

if (requireNamespace("minfiData", quietly = TRUE)) {
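  # Internal helper returning example targets and an RGSet
  ex <- dnaEPICO:::exampleMinfiBaseDataDnaEpico()
  pheno_file <- file.path(tmp, "pheno.csv")
  rgset_path <- file.path(tmp, "RGSet.RData")
  RGSet <- ex$RGSet
  # Write both inputs to disk because svaEnmix() reads them from file paths
  utils::write.csv(ex$targets, pheno_file, row.names = FALSE)
  save(RGSet, file = rgset_path)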
  sva_result <- svaEnmix(
    phenoFile = pheno_file,
    rgsetData = rgset_path,
    SampleID = "Sample_Name",
    arrayType = "IlluminaHumanMethylation450k",
    annotationVersion = "ilmn12.hg19",
    SentrixIDColumn = "Sentrix_ID",
    SentrixPositionColumn = "Sentrix_Position",
    outputLogs = file.path(tmp, "logs"),
    figureBaseDir = file.path(tmp, "figures"),
    dataBaseDir = file.path(tmp, "data"),
    rBaseDir = file.path(tmp, "rData"),
    saveOutputs = FALSE
  )
  stopifnot(inherits(sva_result, "dnaEPICO_svaEnmix"))
}
#> 3  surrogate variables explain  100 % of 
#>     data variation
#> Warning: attempting model selection on an essentially perfect fit is nonsense
#> Warning: attempting model selection on an essentially perfect fit is nonsense
#> Warning: attempting model selection on an essentially perfect fit is nonsense
#> Warning: ANOVA F-tests on an essentially perfect fit are unreliable
#> Warning: ANOVA F-tests on an essentially perfect fit are unreliable
#> Warning: ANOVA F-tests on an essentially perfect fit are unreliable
#> Warning: ANOVA F-tests on an essentially perfect fit are unreliable
#> Warning: ANOVA F-tests on an essentially perfect fit are unreliable
#> Warning: ANOVA F-tests on an essentially perfect fit are unreliable