Run the full dnaEPICO preprocessing workflow. This function is a convenience wrapper around the smaller minfi/ENmix/wateRmelon helper functions in this package. It returns a structured result object containing the in-memory outputs from each stage; legacy files are written only when saveOutputs = TRUE.

preprocessingMinfiEwasWater(
  phenoFile = "data/preprocessingMinfiEwasWater/pheno.csv",
  idatFolder = "data/preprocessingMinfiEwasWater/idats",
  outputLogs = "logs",
  nSamples = NA,
  SampleID = "Sample_Name",
  arrayType = "IlluminaHumanMethylationEPICv2",
  annotationVersion = "20a1.hg38",
  scriptLabel = "preprocessingMinfiEwasWater",
  baseDataFolder = "rData",
  figureBaseDir = "figures",
  sepType = "",
  tiffWidth = 2000,
  tiffHeight = 1000,
  tiffRes = 150,
  qcCutoff = 10.5,
  detPtype = "m+u",
  detPThreshold = 0.05,
  normMethods = "adjustedfunnorm",
  sexColumn = "Sex",
  pvalThreshold = 0.01,
  chrToRemove = "chrX,chrY",
  snpsToRemove = "SBE,CpG",
  mafThreshold = 0.1,
  crossReactivePath = "data/preprocessingMinfiEwasWater/12864_2024_10027_MOESM8_ESM.csv",
  plotGroupVar = "Sex",
  lcRef = "salivaEPIC",
  phenoOrder = "Sample_Name;Timepoint;Sex;PredSex;Basename;Sentrix_ID;Sentrix_Position",
  lcPhenoDir = "data/preprocessingMinfiEwasWater",
  display = FALSE,
  verbose = FALSE,
  logs = FALSE,
  saveOutputs = FALSE
)

Arguments

phenoFile

Character. Path to the phenotype CSV file.

idatFolder

Character. Directory containing the IDAT files.

outputLogs

Character. Directory used for log files when logs = TRUE.

nSamples

Integer or NA. Number of rows to keep from the phenotype table. Use NA to keep all samples.
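
The following is a minimal sketch of how an nSamples-style argument could subset a phenotype table. It assumes that the first nSamples rows are kept, which this page does not state; subsetPheno() and the toy pheno table are hypothetical and not part of dnaEPICO.

```r
# Toy phenotype table (invented values, for illustration only)
pheno <- data.frame(
  Sample_Name = paste0("s", 1:5),
  Sex = c("F", "M", "F", "M", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep all rows when nSamples is NA,
# otherwise keep the first nSamples rows (an assumption).
subsetPheno <- function(pheno, nSamples = NA) {
  if (is.na(nSamples)) {
    return(pheno)
  }
  utils::head(pheno, n = nSamples)
}

allRows <- subsetPheno(pheno)         # nSamples = NA keeps every row
firstThree <- subsetPheno(pheno, 3)   # keeps rows s1, s2, s3
```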

SampleID

Character. Name of the phenotype column containing sample identifiers.

arrayType

Character. Illumina array identifier passed to Biobase::annotation(), for example "IlluminaHumanMethylationEPICv2".

annotationVersion

Character. Annotation build passed to Biobase::annotation(), for example "20a1.hg38" or "ilmn12.hg19".

scriptLabel

Character. Label used to name output folders when saveOutputs = TRUE.

baseDataFolder

Character. Base directory used for saved .RData outputs when saveOutputs = TRUE.

figureBaseDir

Character. Base directory used for saved figure outputs when saveOutputs = TRUE.

sepType

Character. Field separator used in phenoFile. Use "" for a comma-separated file, "\\t" for a tab-delimited file, or another separator accepted by utils::read.csv().

tiffWidth

Integer. Width of saved TIFF plots in pixels.

tiffHeight

Integer. Height of saved TIFF plots in pixels.

tiffRes

Integer. Resolution in DPI for saved TIFF plots.

qcCutoff

Numeric. QC cutoff passed to minfi::plotQC().

detPtype

Character. Detection P-value mode passed to minfi::detectionP(). Common values in minfi workflows are "m+u" and "negative". The default here is "m+u".

detPThreshold

Numeric. Samples whose mean detection P-value exceeds this threshold are removed.
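
As a rough illustration of this rule, the sketch below applies a mean detection P-value cutoff to a toy matrix. The detP matrix and its values are invented, and the wrapper's internal implementation may differ.

```r
# Toy detection P-value matrix: rows are probes, columns are samples.
# matrix() fills column by column, so each line below is one sample.
detP <- matrix(
  c(0.001, 0.002, 0.003,   # sampleA
    0.200, 0.100, 0.050,   # sampleB
    0.010, 0.020, 0.030),  # sampleC
  nrow = 3,
  dimnames = list(paste0("cg", 1:3), c("sampleA", "sampleB", "sampleC"))
)
detPThreshold <- 0.05

# Mean detection P-value per sample (column-wise mean)
meanDetP <- colMeans(detP)

# Samples whose mean exceeds the threshold are flagged for removal
failedSamples <- names(meanDetP)[meanDetP > detPThreshold]
keptSamples <- setdiff(colnames(detP), failedSamples)
```

Here only sampleB has a mean above 0.05, so it is the only sample flagged.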

normMethods

Character vector or semicolon-separated string of normalization methods. Supported values are "adjustedfunnorm", "funnorm", "illumina", "quantile", and "swan".
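
Because the argument accepts either form, it can help to see how a semicolon-separated string and a character vector might be reduced to the same set of method names. The helper below is a hypothetical sketch, not the package's own parser; only the list of supported values is taken from this page.

```r
# Hypothetical helper (not part of dnaEPICO): split and validate method names.
parseNormMethods <- function(normMethods) {
  supported <- c("adjustedfunnorm", "funnorm", "illumina", "quantile", "swan")
  # Split every element on ";" and trim surrounding whitespace
  methods <- trimws(unlist(strsplit(normMethods, ";", fixed = TRUE)))
  methods <- methods[nzchar(methods)]
  unknown <- setdiff(methods, supported)
  if (length(unknown) > 0) {
    stop("Unsupported normalization method(s): ",
         paste(unknown, collapse = ", "))
  }
  unique(methods)
}

fromString <- parseNormMethods("funnorm;quantile")
fromVector <- parseNormMethods(c("swan", "illumina"))
```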

sexColumn

Character. Name of the phenotype column containing reported sex.

pvalThreshold

Numeric. Probe-level detection P-value threshold used in the probe filter.

chrToRemove

Character vector or comma-separated string of chromosome names to remove, for example "chrX,chrY".
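
A minimal sketch of chromosome-based probe removal, using a toy named vector in place of a real annotation table. The probe names and the probeChr object are invented for illustration; the package's filtering step operates on methylation objects rather than plain vectors.

```r
# Toy probe-to-chromosome mapping (invented values)
probeChr <- c(cg01 = "chr1", cg02 = "chrX", cg03 = "chr7", cg04 = "chrY")

# Hypothetical helper: drop probes located on the listed chromosomes
dropChromosomes <- function(probeChr, chrToRemove) {
  chrVec <- trimws(unlist(strsplit(chrToRemove, ",", fixed = TRUE)))
  names(probeChr)[!probeChr %in% chrVec]
}

keptProbes <- dropChromosomes(probeChr, "chrX,chrY")

# An empty string splits to a zero-length vector, so nothing is removed
keptAll <- dropChromosomes(probeChr, "")
```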

snpsToRemove

Character vector or comma-separated string of SNP probe types to remove, for example "SBE,CpG".

mafThreshold

Numeric. Minor allele frequency threshold passed to minfi::dropLociWithSnps().

crossReactivePath

Character. Path to a CSV file containing a ProbeID column of cross-reactive probes to remove.

plotGroupVar

Character. Phenotype column used for density and MDS grouping plots.

lcRef

Character. Reference panel used for cell composition estimation. The values "saliva" and "salivaEPIC" are handled by estimateLC(); any other value is passed to ENmix::estimateCellProp().

phenoOrder

Character vector or semicolon-separated string describing which phenotype columns should appear first in the merged phenoLC table.
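
The reordering described here can be sketched in base R as follows. The toy pheno table is invented, and skipping requested columns that are absent from the table is an assumption rather than documented behavior.

```r
# Toy merged phenotype table with one cell-proportion column (invented values)
pheno <- data.frame(
  Basename = c("b1", "b2"),
  CD4T = c(0.2, 0.3),
  Sex = c("F", "M"),
  Sample_Name = c("s1", "s2"),
  stringsAsFactors = FALSE
)
phenoOrder <- "Sample_Name;Sex;Basename;Sentrix_ID"

# Split the semicolon-separated string into column names
first <- unlist(strsplit(phenoOrder, ";", fixed = TRUE))

# Assumption: requested columns missing from the table are silently skipped
first <- intersect(first, names(pheno))

# Requested columns first, all remaining columns afterwards in original order
reordered <- pheno[, c(first, setdiff(names(pheno), first)), drop = FALSE]
```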

lcPhenoDir

Character. Directory used for the saved phenoLC.csv file when saveOutputs = TRUE.

display

Logical. If TRUE, draw plots on the active graphics device. The default is FALSE.

verbose

Logical. If TRUE, emit progress messages with message(). The default is FALSE.

logs

Logical. If TRUE, write log messages to outputLogs. The default is FALSE.

saveOutputs

Logical. If TRUE, write the legacy .RData, figure, and phenoLC.csv outputs to disk. The default is FALSE, so the function can be used in the more traditional in-memory Bioconductor style.

Value

A list with class "dnaEPICO_preprocessingMinfiEwasWater" containing the following elements:

targets

Filtered phenotype table aligned to the retained samples.

RGSet

Filtered RGChannelSet used in downstream preprocessing and available for direct interactive inspection.

rawData

Object returned by buildRawMinfiEwasWater() containing the raw MSet, RatioSet, and genome-mapped object derived from RGSet.

assessment

Object returned by assessSamplesMinfiEwasWater() containing detection P-values, QC summaries, and failed-sample tracking.

sexData

Object returned by predictSexMinfiEwasWater() containing predicted sex labels, mismatch summaries, and plotting data.

normData

Object returned by normalizeMinfiEwasWater() containing the requested normalized objects and metadata on the methods that were run.

filterData

Object returned by filterProbesMinfiEwasWater() containing the probe-filtered methylation objects at each filtering stage.

metricsData

Object returned by extractMetricsMinfiEwasWater() containing the beta-value, M-value, and copy-number matrices used by later workflow steps.

lcData

Object returned by estimateLCMinfiEwasWater() containing the estimated cell-type proportions and the phenotype table augmented with those proportions.

logFile

Resolved path to the optional log file, or NULL when logging was disabled.

See dnaEPICO_preprocessingMinfiEwasWater for a class-level overview.
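
Since the return value is a classed list, its elements can be reached with the usual list accessors. The sketch below builds a mock object carrying only two of the documented elements to show the access pattern; the mock is a stand-in and does not reproduce the contents of a real result.

```r
# Mock result carrying the documented class and a subset of its elements.
# Real results also hold RGSet, rawData, assessment, and the other elements.
result <- structure(
  list(
    targets = data.frame(
      Sample_Name = c("s1", "s2"),
      Sex = c("F", "M"),
      stringsAsFactors = FALSE
    ),
    logFile = NULL  # NULL when logging is disabled
  ),
  class = "dnaEPICO_preprocessingMinfiEwasWater"
)

# Check the class before relying on the element names
isResult <- inherits(result, "dnaEPICO_preprocessingMinfiEwasWater")

# Elements are accessed by name, as with any list
sampleNames <- result$targets$Sample_Name
loggingOff <- is.null(result$logFile)
```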

Examples

if (requireNamespace("minfiData", quietly = TRUE) &&
    requireNamespace("IlluminaHumanMethylation450kmanifest", quietly = TRUE) &&
    requireNamespace("IlluminaHumanMethylation450kanno.ilmn12.hg19", quietly = TRUE)) {
  ex <- dnaEPICO:::exampleMinfiIdatInputsDnaEpico(n = 4)
  result <- preprocessingMinfiEwasWater(
    phenoFile = ex$phenoFile,
    idatFolder = ex$idatFolder,
    outputLogs = file.path(ex$tempDir, "logs"),
    nSamples = 4,
    SampleID = "Sample_Name",
    arrayType = ex$arrayType,
    annotationVersion = ex$annotationVersion,
    scriptLabel = "preprocessingMinfiEwasWater",
    baseDataFolder = file.path(ex$tempDir, "rData"),
    figureBaseDir = file.path(ex$tempDir, "figures"),
    detPThreshold = 1,
    normMethods = "quantile",
    sexColumn = "Sex",
    pvalThreshold = 1,
    chrToRemove = "",
    snpsToRemove = "SBE",
    mafThreshold = 1,
    crossReactivePath = ex$crossReactivePath,
    plotGroupVar = "Sex",
    lcRef = "saliva",
    phenoOrder = "Sample_Name;Sex;Basename;Sentrix_ID;Sentrix_Position",
    lcPhenoDir = ex$tempDir,
    saveOutputs = FALSE,
    verbose = FALSE,
    logs = FALSE
  )
  inherits(result, "dnaEPICO_preprocessingMinfiEwasWater")
}
#> [preprocessQuantile] Mapping to genome.
#> [preprocessQuantile] Fixing outliers.
#> [preprocessQuantile] Quantile normalizing.
#> [1] TRUE