Read the phenotype table and a saved RGChannelSet, estimate surrogate
variables from ENmix control probes, analyze their association with Sentrix
chip and position factors, and return a structured in-memory result. Legacy
CSV, .RData, text-summary, and figure outputs are written only when
saveOutputs = TRUE.
svaEnmix(
  phenoFile = "data/preprocessingMinfiEwasWater/phenoLC.csv",
  rgsetData = "rData/preprocessingMinfiEwasWater/objects/RGSet.RData",
  sepType = "",
  outputLogs = "logs",
  nSamples = NA,
  SampleID = "Sample_Name",
  arrayType = "IlluminaHumanMethylationEPICv2",
  annotationVersion = "20a1.hg38",
  SentrixIDColumn = "Sentrix_ID",
  SentrixPositionColumn = "Sentrix_Position",
  ctrlSvaPercVar = 0.9,
  ctrlSvaFlag = 1,
  scriptLabel = "svaEnmix",
  tiffWidth = 2000,
  tiffHeight = 1000,
  tiffRes = 150,
  figureBaseDir = "figures",
  dataBaseDir = "data",
  rBaseDir = "rData",
  display = FALSE,
  verbose = FALSE,
  logs = FALSE,
  saveOutputs = FALSE
)
phenoFile
Character. Path to the phenotype file with cell-composition data.
rgsetData
Character. Path to a saved RGChannelSet object. Both .RData and .rds
files are supported.
sepType
Character. Field separator used in phenoFile. Use "" for
a comma-separated file, "\\t" for a tab-delimited file, or another
separator accepted by utils::read.csv().
outputLogs
Character. Directory used for log files when logs = TRUE.
nSamples
Integer or NA. Number of rows to keep from the phenotype
table. Use NA to keep all samples.
SampleID
Character. Name of the phenotype column containing sample identifiers.
arrayType
Character. Illumina array identifier assigned to
Biobase::annotation(RGSet).
annotationVersion
Character. Annotation build assigned to
Biobase::annotation(RGSet).
SentrixIDColumn
Character. Name of the chip identifier column in the phenotype data.
SentrixPositionColumn
Character. Name of the chip position column in the phenotype data.
ctrlSvaPercVar
Numeric. Proportion of control-probe variance explained
when running ENmix::ctrlsva().
ctrlSvaFlag
Integer. Control-probe flag passed to ENmix::ctrlsva().
scriptLabel
Character. Label used to name output folders when
saveOutputs = TRUE.
tiffWidth
Integer. Width of saved TIFF plots in pixels.
tiffHeight
Integer. Height of saved TIFF plots in pixels.
tiffRes
Integer. Resolution in DPI for saved TIFF plots.
figureBaseDir
Character. Base directory used for saved figure outputs
when saveOutputs = TRUE.
dataBaseDir
Character. Base directory used for saved CSV and text
outputs when saveOutputs = TRUE.
rBaseDir
Character. Base directory used for saved .RData outputs
when saveOutputs = TRUE.
display
Logical. If TRUE, draw plots on the active graphics device.
verbose
Logical. If TRUE, emit progress messages with message().
The default is FALSE.
logs
Logical. If TRUE, write the same progress messages to
outputLogs. The default is FALSE.
saveOutputs
Logical. If TRUE, write the legacy CSV, .RData, text,
and TIFF outputs to disk. The default is FALSE.
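The sepType and nSamples semantics described above can be illustrated with a small base-R sketch. This is not the package's code: readPhenoSketch() is a hypothetical helper, and it assumes that "" is mapped to a comma before utils::read.csv() is called and that row subsetting keeps the first nSamples rows.

```r
# Hypothetical helper, not part of dnaEPICO: mirrors the documented
# sepType and nSamples behaviour under the assumptions stated above.
readPhenoSketch <- function(phenoFile, sepType = "", nSamples = NA) {
  # Assumption: an empty sepType stands for a comma-separated file
  sep <- if (identical(sepType, "")) "," else sepType
  targets <- utils::read.csv(phenoFile, sep = sep, stringsAsFactors = FALSE)
  # Assumption: subsetting keeps the first nSamples rows; NA keeps all rows
  if (!is.na(nSamples)) {
    targets <- utils::head(targets, nSamples)
  }
  targets
}

# Small made-up phenotype table written to a temporary file
phenoPath <- tempfile(fileext = ".csv")
utils::write.csv(
  data.frame(
    Sample_Name = c("s1", "s2", "s3"),
    Sentrix_ID = c("1001", "1001", "1002"),
    Sentrix_Position = c("R01C01", "R02C01", "R01C01")
  ),
  phenoPath,
  row.names = FALSE
)

allRows <- readPhenoSketch(phenoPath)                # nSamples = NA keeps every row
firstTwo <- readPhenoSketch(phenoPath, nSamples = 2) # keeps the first two rows
```

Checking nrow() on the subset before running the full analysis is a cheap way to confirm that the separator and the row limit were interpreted as intended.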
A list with class "dnaEPICO_svaEnmix".
Phenotype table read from phenoFile after any optional row
subsetting.
Loaded RGChannelSet with sample names realigned to
targets[[SampleID]].
Object returned by estimateSvaEnmixControls() containing
the surrogate-variable matrix and the control-probe settings used to
estimate it.
Phenotype table returned by mergeSvaTargetsEnmix()
after the surrogate variables were appended as additional columns.
Object returned by analyzeSvaEnmix() containing the
surrogate-variable association models, ANOVA tables, and Sentrix metadata.
Named list describing the plot file paths requested for the
SVA figures. When saveOutputs = FALSE, the entries are typically NULL.
Object returned by writeSvaEnmixOutputs() when
saveOutputs = TRUE, otherwise NULL.
Resolved path to the optional log file, or NULL when
logging was disabled.
See dnaEPICO_svaEnmix for a class-level overview.
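Because several components of the result are NULL unless saveOutputs or logs is enabled, it can help to check which parts are populated before using them. The sketch below works on a stand-in object rather than a real svaEnmix() result: only the class name "dnaEPICO_svaEnmix" comes from this page, and the element names phenoTable, savedOutputs, and logPath are hypothetical placeholders.

```r
# Stand-in for a svaEnmix() result; element names are hypothetical placeholders.
result <- structure(
  list(
    phenoTable = data.frame(Sample_Name = c("s1", "s2")),
    savedOutputs = NULL,  # mimics saveOutputs = FALSE
    logPath = NULL        # mimics logs = FALSE
  ),
  class = "dnaEPICO_svaEnmix"
)

# Confirm the class before relying on the structure
isResult <- inherits(result, "dnaEPICO_svaEnmix")

# List the components that were left empty
isEmpty <- vapply(unclass(result), is.null, logical(1))
emptyParts <- names(result)[isEmpty]
```

On a real result, names(result) shows the actual component names to use in place of the placeholders.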
tmp <- tempdir()
stopifnot(dir.exists(tmp))
if (requireNamespace("minfiData", quietly = TRUE)) {
  ex <- dnaEPICO:::exampleMinfiBaseDataDnaEpico()
  pheno_file <- file.path(tmp, "pheno.csv")
  rgset_path <- file.path(tmp, "RGSet.RData")
  RGSet <- ex$RGSet
  utils::write.csv(ex$targets, pheno_file, row.names = FALSE)
  save(RGSet, file = rgset_path)
  sva_result <- svaEnmix(
    phenoFile = pheno_file,
    rgsetData = rgset_path,
    SampleID = "Sample_Name",
    arrayType = "IlluminaHumanMethylation450k",
    annotationVersion = "ilmn12.hg19",
    SentrixIDColumn = "Sentrix_ID",
    SentrixPositionColumn = "Sentrix_Position",
    outputLogs = file.path(tmp, "logs"),
    figureBaseDir = file.path(tmp, "figures"),
    dataBaseDir = file.path(tmp, "data"),
    rBaseDir = file.path(tmp, "rData"),
    saveOutputs = FALSE
  )
  stopifnot(inherits(sva_result, "dnaEPICO_svaEnmix"))
}
#> 3 surrogate variables explain 100 % of
#> data variation
#> Warning: attempting model selection on an essentially perfect fit is nonsense
#> Warning: attempting model selection on an essentially perfect fit is nonsense
#> Warning: attempting model selection on an essentially perfect fit is nonsense
#> Warning: ANOVA F-tests on an essentially perfect fit are unreliable
#> Warning: ANOVA F-tests on an essentially perfect fit are unreliable
#> Warning: ANOVA F-tests on an essentially perfect fit are unreliable
#> Warning: ANOVA F-tests on an essentially perfect fit are unreliable
#> Warning: ANOVA F-tests on an essentially perfect fit are unreliable
#> Warning: ANOVA F-tests on an essentially perfect fit are unreliable
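The "essentially perfect fit" warnings above are standard messages from R's stats package, not errors in svaEnmix(). They typically appear when a linear model reproduces its response almost exactly, which is plausible here on the assumption that the small example data leave little or no residual variation once the Sentrix factors are fitted. The base-R sketch below reproduces the ANOVA warning on made-up data; it does not use the package's models.

```r
# Made-up data in which y is an exact linear function of x
x <- c(1, 2, 3, 4, 5, 6)
y <- 2 * x + 1
fit <- stats::lm(y ~ x)

# Collect warnings instead of printing them, so they can be inspected
msgs <- character(0)
anovaTable <- withCallingHandlers(
  stats::anova(fit),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

# The residual sum of squares is negligible relative to the fitted values,
# which is the condition that triggers the warning
residualSS <- sum(stats::residuals(fit)^2)
```

With a larger, more heterogeneous data set the residual variation is no longer negligible and these warnings would not be expected.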