Read the phenotype table and the preprocessed beta, M-value, and copy-number matrices; align them by sample identifier; split them by timepoint; prepare combined longitudinal objects; and build Clock Foundation export tables. The function returns a structured in-memory result, while legacy files are written only when saveOutputs = TRUE.

preprocessingPheno(
  phenoFile = "data/preprocessingMinfiEwasWater/phenoLC.csv",
  sepType = "",
  betaPath =
    "rData/preprocessingMinfiEwasWater/metrics/beta_NomFilt_MSetF_Flt_Rxy_Ds_Rc.RData",
  mPath = "rData/preprocessingMinfiEwasWater/metrics/m_NomFilt_MSetF_Flt_Rxy_Ds_Rc.RData",
  cnPath =
    "rData/preprocessingMinfiEwasWater/metrics/cn_NomFilt_MSetF_Flt_Rxy_Ds_Rc.RData",
  SampleID = "Sample_Name",
  timeVar = "Timepoint",
  timepoints = "1,2",
  combineTimepoints = "1,2",
  outputPheno = "data/preprocessingPheno",
  outputRData = "rData/preprocessingPheno/metrics",
  outputRDataMerge = "rData/preprocessingPheno/mergeData",
  sexColumn = "Sex",
  outputLogs = "logs",
  outputDir = "data/preprocessingPheno",
  verbose = FALSE,
  logs = FALSE,
  saveOutputs = FALSE
)

Arguments

phenoFile

Character. Path to the phenotype file, a delimited text file that is comma-separated by default (see sepType).

sepType

Character. Field separator used in phenoFile. Use "" for a comma-separated file, "\\t" for a tab-delimited file, or another separator accepted by utils::read.csv().

betaPath

Character. Path to the saved beta-value object. Both .RData and .rds files are supported.

mPath

Character. Path to the saved M-value object. Both .RData and .rds files are supported.

cnPath

Character. Path to the saved copy-number object. Both .RData and .rds files are supported.

SampleID

Character. Name of the phenotype column containing sample identifiers used to align phenotype and methylation data.

timeVar

Character. Name of the phenotype column containing timepoint labels.

timepoints

Character vector or comma-separated string of timepoints to retain and split into separate in-memory subsets.

combineTimepoints

Character vector or comma-separated string of timepoints to combine into the longitudinal phenotype-plus-beta object.

outputPheno

Character. Directory used for saved phenotype CSV files when saveOutputs = TRUE.

outputRData

Character. Directory used for saved metric .RData files when saveOutputs = TRUE.

outputRDataMerge

Character. Directory used for saved merged phenotype-plus-beta .RData files when saveOutputs = TRUE.

sexColumn

Character. Name of the phenotype sex column used when building Clock Foundation exports.

outputLogs

Character. Directory used for log files when logs = TRUE.

outputDir

Character. Directory used for Clock Foundation export files when saveOutputs = TRUE.

verbose

Logical. If TRUE, emit progress messages with message(). The default is FALSE.

logs

Logical. If TRUE, also write the progress messages to a log file in the outputLogs directory. The default is FALSE.

saveOutputs

Logical. If TRUE, write the legacy CSV, ZIP, and .RData outputs to disk. The default is FALSE, in which case none of these outputs are written and all results are returned in memory, as is conventional for Bioconductor functions.
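Because timepoints and combineTimepoints accept either a character vector or a comma-separated string, it can help to see how the two forms reduce to the same set of labels. The following is a minimal sketch in base R; parseTimepoints() is a hypothetical helper written for illustration and is not claimed to be the package's internal implementation.

```r
# Hypothetical helper (illustration only, not part of the package):
# normalise a timepoint argument to a character vector of labels.
parseTimepoints <- function(x) {
  # Split every element on commas so "1,2" and c("1", "2") are treated alike.
  parts <- unlist(strsplit(as.character(x), ",", fixed = TRUE))
  # Drop surrounding whitespace and any empty pieces.
  parts <- trimws(parts)
  unique(parts[nzchar(parts)])
}

fromString <- parseTimepoints("1,2")
fromVector <- parseTimepoints(c("1", "2"))

# Both forms yield the same labels.
stopifnot(identical(fromString, fromVector))
```

Under this assumption, timepoints = "1,2" and timepoints = c("1", "2") would select the same samples.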

Value

A list with class "dnaEPICO_preprocessingPheno".

pheno

Phenotype table read from phenoFile.

metricsData

Object returned by loadMetricsPreprocessingPheno() containing the beta-value, M-value, and copy-number matrices loaded from betaPath, mPath, and cnPath.

timepointData

Object returned by splitTimepointsPreprocessingPheno() containing per-timepoint phenotype tables and methylation matrices.

combinedData

Object returned by combineTimepointsPreprocessingPheno() containing the merged longitudinal phenotype-plus-beta object and the timepoint combination metadata.

clockFoundation

Object returned by buildClockFoundationInputsPreprocessingPheno() containing the beta table and phenotype table prepared for Clock Foundation export.

savedFiles

Object returned by writePreprocessingPhenoOutputs() when saveOutputs = TRUE, otherwise NULL.

logFile

Resolved path to the log file, or NULL when logs = FALSE.

See dnaEPICO_preprocessingPheno for a class-level overview.
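The alignment by sample identifier and the per-timepoint split that underlie pheno and timepointData can be pictured with plain base R. The sketch below is an assumption-laden illustration: the object names (phenoAligned, betaAligned, betaByTime) are hypothetical, and the structure of the real timepointData element may differ.

```r
# Toy phenotype table and a beta matrix whose columns are in a different order.
pheno <- data.frame(
  Sample_Name = c("S1", "S2", "S3"),
  Timepoint = c("1", "1", "2"),
  stringsAsFactors = FALSE
)
beta <- matrix(
  c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60),
  nrow = 2,
  dimnames = list(c("cg1", "cg2"), c("S3", "S1", "S2"))
)

# Keep only samples present in both objects, in phenotype order.
shared <- intersect(pheno$Sample_Name, colnames(beta))
phenoAligned <- pheno[match(shared, pheno$Sample_Name), , drop = FALSE]
betaAligned <- beta[, shared, drop = FALSE]
stopifnot(identical(phenoAligned$Sample_Name, colnames(betaAligned)))

# Split the sample identifiers by timepoint, then subset the matrix columns.
idsByTime <- split(phenoAligned$Sample_Name, phenoAligned[["Timepoint"]])
betaByTime <- lapply(idsByTime, function(ids) betaAligned[, ids, drop = FALSE])
```

Here betaByTime is a named list with one matrix per timepoint label; using drop = FALSE keeps a single-sample timepoint as a one-column matrix rather than a vector.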

Examples

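# Build a small phenotype table and matching matrices in a temporary directory.
tmp <- tempdir()
pheno <- data.frame(
  Sample_Name = c("S1", "S2", "S3"),
  Timepoint = c("1", "1", "2"),
  Sex = c(0, 1, 0),
  stringsAsFactors = FALSE
)
# Toy beta matrix: two probes (rows) by three samples (columns).
beta <- matrix(
  c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60),
  nrow = 2,
  dimnames = list(c("cg1", "cg2"), pheno$Sample_Name)
)
# Placeholder M-value and copy-number matrices with the same dimensions.
m <- beta * 10
cn <- beta * 100
# Write the phenotype table and the three matrices to disk as function inputs.
pheno_file <- file.path(tmp, "pheno.csv")
beta_path <- file.path(tmp, "beta.RData")
m_path <- file.path(tmp, "m.RData")
cn_path <- file.path(tmp, "cn.RData")
utils::write.csv(pheno, pheno_file, row.names = FALSE)
save(beta, file = beta_path)
save(m, file = m_path)
save(cn, file = cn_path)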
result <- preprocessingPheno(
  phenoFile = pheno_file,
  betaPath = beta_path,
  mPath = m_path,
  cnPath = cn_path,
  SampleID = "Sample_Name",
  timeVar = "Timepoint",
  timepoints = "1,2",
  combineTimepoints = "1,2",
  outputPheno = file.path(tmp, "data", "preprocessingPheno"),
  outputRData = file.path(tmp, "rData", "preprocessingPheno", "metrics"),
  outputRDataMerge = file.path(tmp, "rData", "preprocessingPheno", "mergeData"),
  sexColumn = "Sex",
  outputLogs = file.path(tmp, "logs"),
  outputDir = file.path(tmp, "clockFoundation"),
  saveOutputs = FALSE
)
stopifnot(inherits(result, "dnaEPICO_preprocessingPheno"))
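The betaPath, mPath, and cnPath arguments are documented as accepting both .RData and .rds files. The sketch below shows, with base R only, how the two formats differ when written and read back; readMatrixFile() is a hypothetical helper for illustration and is not the package's loader.

```r
# Hypothetical helper (illustration only): read one object from an
# .rds file or from an .RData file that holds a single object.
readMatrixFile <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "rds") {
    # .rds stores one unnamed object; readRDS() returns it directly.
    return(readRDS(path))
  }
  # .RData stores named objects; load() restores them into an environment.
  env <- new.env()
  objNames <- load(path, envir = env)
  stopifnot(length(objNames) == 1L)
  env[[objNames]]
}

demoDir <- tempdir()
betaDemo <- matrix(
  c(0.10, 0.20, 0.30, 0.40),
  nrow = 2,
  dimnames = list(c("cg1", "cg2"), c("S1", "S2"))
)
rdsPath <- file.path(demoDir, "beta_demo.rds")
rdataPath <- file.path(demoDir, "beta_demo.RData")
saveRDS(betaDemo, rdsPath)
save(betaDemo, file = rdataPath)

# Both files round-trip to the same matrix.
fromRds <- readMatrixFile(rdsPath)
fromRData <- readMatrixFile(rdataPath)
stopifnot(identical(fromRds, fromRData))
```

An .rds file carries no object name, which avoids depending on what the saved variable was called; an .RData file restores the object under its original name.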