R/preprocessingMinfiEwasWater.R
preprocessingMinfiEwasWater.Rd

Run the dnaEPICO preprocessing workflow as a convenience wrapper around the
smaller minfi/ENmix/wateRmelon helper functions in this package. The wrapper
now returns a structured result object containing the in-memory outputs from
each stage. Legacy files are written only when saveOutputs = TRUE.

preprocessingMinfiEwasWater(
  phenoFile = "data/preprocessingMinfiEwasWater/pheno.csv",
  idatFolder = "data/preprocessingMinfiEwasWater/idats",
  outputLogs = "logs",
  nSamples = NA,
  SampleID = "Sample_Name",
  arrayType = "IlluminaHumanMethylationEPICv2",
  annotationVersion = "20a1.hg38",
  scriptLabel = "preprocessingMinfiEwasWater",
  baseDataFolder = "rData",
  figureBaseDir = "figures",
  sepType = "",
  tiffWidth = 2000,
  tiffHeight = 1000,
  tiffRes = 150,
  qcCutoff = 10.5,
  detPtype = "m+u",
  detPThreshold = 0.05,
  normMethods = "adjustedfunnorm",
  sexColumn = "Sex",
  pvalThreshold = 0.01,
  chrToRemove = "chrX,chrY",
  snpsToRemove = "SBE,CpG",
  mafThreshold = 0.1,
  crossReactivePath = "data/preprocessingMinfiEwasWater/12864_2024_10027_MOESM8_ESM.csv",
  plotGroupVar = "Sex",
  lcRef = "salivaEPIC",
  phenoOrder = "Sample_Name;Timepoint;Sex;PredSex;Basename;Sentrix_ID;Sentrix_Position",
  lcPhenoDir = "data/preprocessingMinfiEwasWater",
  display = FALSE,
  verbose = FALSE,
  logs = FALSE,
  saveOutputs = FALSE
)

phenoFile: Character. Path to the phenotype CSV file.
idatFolder: Character. Directory containing the IDAT files.
outputLogs: Character. Directory used for log files when logs = TRUE.
nSamples: Integer or NA. Number of rows to keep from the phenotype
  table. Use NA to keep all samples.
SampleID: Character. Name of the phenotype column containing sample identifiers.
arrayType: Character. Illumina array identifier passed to
  Biobase::annotation(), for example "IlluminaHumanMethylationEPICv2".
annotationVersion: Character. Annotation build passed to
  Biobase::annotation(), for example "20a1.hg38" or "ilmn12.hg19".
scriptLabel: Character. Label used to name output folders when
  saveOutputs = TRUE.
baseDataFolder: Character. Base directory used for saved .RData
  outputs when saveOutputs = TRUE.
figureBaseDir: Character. Base directory used for saved figure outputs
  when saveOutputs = TRUE.
sepType: Character. Field separator used in phenoFile. Use "" for
  a comma-separated file, "\\t" for a tab-delimited file, or another
  separator accepted by utils::read.csv().
tiffWidth: Integer. Width of saved TIFF plots in pixels.
tiffHeight: Integer. Height of saved TIFF plots in pixels.
tiffRes: Integer. Resolution in DPI for saved TIFF plots.
qcCutoff: Numeric. QC cutoff passed to minfi::plotQC().
detPtype: Character. Detection P-value mode passed to
  minfi::detectionP(). Common values in minfi workflows are "m+u" and
  "negative". The default here is "m+u".
detPThreshold: Numeric. Samples whose mean detection P-value exceeds this
  threshold are removed.
normMethods: Character vector or semicolon-separated string of
  normalization methods. Supported values are "adjustedfunnorm",
  "funnorm", "illumina", "quantile", and "swan".
sexColumn: Character. Name of the phenotype column containing reported sex.
pvalThreshold: Numeric. Probe-level detection P-value threshold used in the
  probe filter.
chrToRemove: Character vector or comma-separated string of chromosome
  names to remove, for example "chrX,chrY".
snpsToRemove: Character vector or comma-separated string of SNP probe
  types to remove, for example "SBE,CpG".
mafThreshold: Numeric. Minor allele frequency threshold passed to
  minfi::dropLociWithSnps().
crossReactivePath: Character. Path to a CSV file containing a ProbeID
  column of cross-reactive probes to remove.
plotGroupVar: Character. Phenotype column used for density and MDS grouping plots.
lcRef: Character. Reference panel used for cell composition estimation.
  "saliva" and "salivaEPIC" use estimateLC(). Other values are passed
  to ENmix::estimateCellProp().
phenoOrder: Character vector or semicolon-separated string describing
  which phenotype columns should appear first in the merged phenoLC table.
lcPhenoDir: Character. Directory used for the saved phenoLC.csv file
  when saveOutputs = TRUE.
display: Logical. If TRUE, draw plots on the active graphics device.
verbose: Logical. If TRUE, emit progress messages with message().
  The default is FALSE.
logs: Logical. If TRUE, write log messages to outputLogs. The
  default is FALSE.
saveOutputs: Logical. If TRUE, write the legacy .RData, figure, and
  phenoLC.csv outputs to disk. The default is FALSE, so the function can
  be used in the more traditional in-memory Bioconductor style.
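
Several arguments above (normMethods, chrToRemove, snpsToRemove, phenoOrder) accept either a character vector or a single delimited string. The sketch below shows one way such input could be split and checked in base R. The helper splitArg() is hypothetical and is not part of dnaEPICO; the package's own parsing may differ.

```r
# Hypothetical helper (not part of dnaEPICO): split a delimited string
# into a trimmed character vector and drop empty entries.
splitArg <- function(x, sep) {
  parts <- trimws(unlist(strsplit(x, sep, fixed = TRUE)))
  parts[nzchar(parts)]
}

# Supported normalization methods, as listed in the normMethods description.
supportedNorm <- c("adjustedfunnorm", "funnorm", "illumina", "quantile", "swan")

# Semicolon-separated input for normMethods.
normMethods <- splitArg("quantile;funnorm", ";")
unsupported <- setdiff(normMethods, supportedNorm)
if (length(unsupported) > 0) {
  stop("Unsupported normalization method(s): ",
       paste(unsupported, collapse = ", "))
}

# Comma-separated input for chrToRemove and snpsToRemove.
chrToRemove  <- splitArg("chrX,chrY", ",")
snpsToRemove <- splitArg("SBE,CpG", ",")

# An empty string yields an empty vector, i.e. nothing to remove.
noChr <- splitArg("", ",")
```

Because strsplit() is applied element-wise and the result is flattened, the same helper also accepts input that is already a character vector.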
A list with class "dnaEPICO_preprocessingMinfiEwasWater".
Filtered phenotype table aligned to the retained samples.
Filtered RGChannelSet used in downstream preprocessing and
available for direct interactive inspection.
Object returned by buildRawMinfiEwasWater() containing the
raw MSet, RatioSet, and genome-mapped object derived from RGSet.
Object returned by assessSamplesMinfiEwasWater()
containing detection P values, QC summaries, and failed-sample tracking.
Object returned by predictSexMinfiEwasWater() containing
predicted sex labels, mismatch summaries, and plotting data.
Object returned by normalizeMinfiEwasWater() containing
the requested normalized objects and metadata on the methods that were run.
Object returned by filterProbesMinfiEwasWater()
containing the probe-filtered methylation objects at each filtering stage.
Object returned by extractMetricsMinfiEwasWater()
containing the beta-value, M-value, and copy-number matrices used by later
workflow steps.
Object returned by estimateLCMinfiEwasWater() containing
the estimated cell-type proportions and the phenotype table augmented with
those proportions.
Resolved path to the optional log file, or NULL when
logging was disabled.
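
The sample-level rule described for detPThreshold (remove samples whose mean detection P-value exceeds the threshold) can be illustrated on a toy matrix. This is a minimal sketch in base R, assuming a probes-by-samples matrix of the kind minfi::detectionP() returns; the sample and probe names are made up, and the package's actual failed-sample tracking may record more than this.

```r
# Toy detection P-value matrix: rows are probes, columns are samples.
# matrix() fills column by column, so each row of numbers below is one sample.
detP <- matrix(
  c(0.001, 0.002, 0.003,   # sampleA: mean well below the threshold
    0.200, 0.010, 0.090,   # sampleB: mean 0.1, above the threshold
    0.040, 0.050, 0.030),  # sampleC: mean 0.04, below the threshold
  nrow = 3,
  dimnames = list(paste0("cg", 1:3), c("sampleA", "sampleB", "sampleC"))
)
detPThreshold <- 0.05

# Per-sample mean detection P-value.
meanDetP <- colMeans(detP)

# Samples above the threshold are flagged as failed; the rest are kept.
failedSamples <- names(meanDetP)[meanDetP > detPThreshold]
keptSamples   <- setdiff(colnames(detP), failedSamples)
```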
See dnaEPICO_preprocessingMinfiEwasWater for a class-level overview.
# Run only when the 450k example data and annotation packages are installed.
if (requireNamespace("minfiData", quietly = TRUE) &&
    requireNamespace("IlluminaHumanMethylation450kmanifest", quietly = TRUE) &&
    requireNamespace("IlluminaHumanMethylation450kanno.ilmn12.hg19", quietly = TRUE)) {
  # Build temporary example inputs (phenotype file, IDAT folder, paths).
  ex <- dnaEPICO:::exampleMinfiIdatInputsDnaEpico(n = 4)
  result <- preprocessingMinfiEwasWater(
    phenoFile = ex$phenoFile,
    idatFolder = ex$idatFolder,
    outputLogs = file.path(ex$tempDir, "logs"),
    nSamples = 4,
    SampleID = "Sample_Name",
    arrayType = ex$arrayType,
    annotationVersion = ex$annotationVersion,
    scriptLabel = "preprocessingMinfiEwasWater",
    baseDataFolder = file.path(ex$tempDir, "rData"),
    figureBaseDir = file.path(ex$tempDir, "figures"),
    detPThreshold = 1,
    normMethods = "quantile",
    sexColumn = "Sex",
    pvalThreshold = 1,
    chrToRemove = "",
    snpsToRemove = "SBE",
    mafThreshold = 1,
    crossReactivePath = ex$crossReactivePath,
    plotGroupVar = "Sex",
    lcRef = "saliva",
    phenoOrder = "Sample_Name;Sex;Basename;Sentrix_ID;Sentrix_Position",
    lcPhenoDir = ex$tempDir,
    saveOutputs = FALSE,
    verbose = FALSE,
    logs = FALSE
  )
  # Confirm the structured result carries the documented class.
  inherits(result, "dnaEPICO_preprocessingMinfiEwasWater")
}
#> [preprocessQuantile] Mapping to genome.
#> [preprocessQuantile] Fixing outliers.
#> [preprocessQuantile] Quantile normalizing.
#> [1] TRUE
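
The example passes phenoOrder as a semicolon-separated string. The column ordering it describes (listed columns first, remaining columns after) could look like the base R sketch below. The toy table, the Epithelial column, and the choice to silently skip requested columns that are absent are all assumptions for illustration, not documented dnaEPICO behavior.

```r
# Toy phenotype table with one hypothetical cell-type column.
pheno <- data.frame(
  Basename = c("b1", "b2"),
  Sex = c("F", "M"),
  Epithelial = c(0.7, 0.6),
  Sample_Name = c("s1", "s2"),
  stringsAsFactors = FALSE
)
phenoOrder <- "Sample_Name;Sex;Basename;Sentrix_ID"

# Split the requested order into column names.
wanted <- unlist(strsplit(phenoOrder, ";", fixed = TRUE))

# Keep requested columns that exist, in the requested order
# (assumption: missing ones such as Sentrix_ID are skipped).
first <- intersect(wanted, names(pheno))

# Append all remaining columns in their original order.
rest <- setdiff(names(pheno), first)
phenoOrdered <- pheno[, c(first, rest), drop = FALSE]

names(phenoOrdered)
```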