R/preprocessingPheno.R
preprocessingPheno.Rd
Read the phenotype table and the preprocessed beta, M-value, and copy-number
matrices; align them by sample identifier; split them by timepoint; prepare
combined longitudinal objects; and build Clock Foundation export tables. The
function returns a structured in-memory result, while legacy files are
written only when saveOutputs = TRUE.
preprocessingPheno(
  phenoFile = "data/preprocessingMinfiEwasWater/phenoLC.csv",
  sepType = "",
  betaPath =
    "rData/preprocessingMinfiEwasWater/metrics/beta_NomFilt_MSetF_Flt_Rxy_Ds_Rc.RData",
  mPath = "rData/preprocessingMinfiEwasWater/metrics/m_NomFilt_MSetF_Flt_Rxy_Ds_Rc.RData",
  cnPath =
    "rData/preprocessingMinfiEwasWater/metrics/cn_NomFilt_MSetF_Flt_Rxy_Ds_Rc.RData",
  SampleID = "Sample_Name",
  timeVar = "Timepoint",
  timepoints = "1,2",
  combineTimepoints = "1,2",
  outputPheno = "data/preprocessingPheno",
  outputRData = "rData/preprocessingPheno/metrics",
  outputRDataMerge = "rData/preprocessingPheno/mergeData",
  sexColumn = "Sex",
  outputLogs = "logs",
  outputDir = "data/preprocessingPheno",
  verbose = FALSE,
  logs = FALSE,
  saveOutputs = FALSE
)
phenoFile
Character. Path to the phenotype CSV file.
sepType
Character. Field separator used in phenoFile. Use "" for
a comma-separated file, "\\t" for a tab-delimited file, or another
separator accepted by utils::read.csv().
betaPath
Character. Path to the saved beta-value object. Both .RData
and .rds files are supported.
mPath
Character. Path to the saved M-value object. Both .RData and
.rds files are supported.
cnPath
Character. Path to the saved copy-number object. Both .RData
and .rds files are supported.
SampleID
Character. Name of the phenotype column containing sample identifiers used to align phenotype and methylation data.
timeVar
Character. Name of the phenotype column containing timepoint labels.
timepoints
Character vector or comma-separated string of timepoints to retain and split into separate in-memory subsets.
combineTimepoints
Character vector or comma-separated string of timepoints to combine into the longitudinal phenotype-plus-beta object.
outputPheno
Character. Directory used for saved phenotype CSV files
when saveOutputs = TRUE.
outputRData
Character. Directory used for saved metric .RData files
when saveOutputs = TRUE.
outputRDataMerge
Character. Directory used for saved merged
phenotype-plus-beta .RData files when saveOutputs = TRUE.
sexColumn
Character. Name of the phenotype sex column used when building Clock Foundation exports.
outputLogs
Character. Directory used for log files when logs = TRUE.
outputDir
Character. Directory used for Clock Foundation export files
when saveOutputs = TRUE.
verbose
Logical. If TRUE, emit progress messages with message().
The default is FALSE.
logs
Logical. If TRUE, write the same progress messages to
outputLogs. The default is FALSE.
saveOutputs
Logical. If TRUE, write the legacy CSV, ZIP, and .RData
outputs to disk. The default is FALSE, so the function can be used in the
more traditional in-memory Bioconductor style.
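To illustrate how SampleID, timeVar, and timepoints relate to one another, the following base R sketch aligns a toy phenotype table with a beta matrix by sample identifier and then splits both by timepoint. It is not the package's internal implementation: the helper parseTimepoints() and all toy data are hypothetical, and the assumption that a comma-separated string is normalised by splitting on commas is ours.

```r
# Hypothetical helper (not part of the package): accept either a character
# vector or a comma-separated string such as "1,2".
parseTimepoints <- function(x) {
  trimws(unlist(strsplit(as.character(x), ",", fixed = TRUE)))
}

# Toy inputs; note that the phenotype rows are not in matrix column order.
pheno <- data.frame(
  Sample_Name = c("S3", "S1", "S2"),
  Timepoint = c("2", "1", "1"),
  stringsAsFactors = FALSE
)
beta <- matrix(
  c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60),
  nrow = 2,
  dimnames = list(c("cg1", "cg2"), c("S1", "S2", "S3"))
)

# Align: keep samples present in both objects, in phenotype order.
shared <- intersect(pheno$Sample_Name, colnames(beta))
phenoAligned <- pheno[match(shared, pheno$Sample_Name), , drop = FALSE]
betaAligned <- beta[, shared, drop = FALSE]

# Split: one phenotype table and one beta matrix per retained timepoint.
tp <- parseTimepoints("1, 2")
splitByTime <- lapply(stats::setNames(tp, tp), function(t) {
  keep <- phenoAligned$Timepoint == t
  list(
    pheno = phenoAligned[keep, , drop = FALSE],
    beta = betaAligned[, keep, drop = FALSE]
  )
})
```

Subsetting with drop = FALSE keeps a single-sample timepoint as a one-column matrix rather than collapsing it to a vector.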
A list with class "dnaEPICO_preprocessingPheno" and the following components:
Phenotype table read from phenoFile.
Object returned by loadMetricsPreprocessingPheno()
containing the beta-value, M-value, and copy-number matrices loaded from
betaPath, mPath, and cnPath.
Object returned by splitTimepointsPreprocessingPheno()
containing per-timepoint phenotype tables and methylation matrices.
Object returned by
combineTimepointsPreprocessingPheno() containing the merged longitudinal
phenotype-plus-beta object and the timepoint combination metadata.
Object returned by
buildClockFoundationInputsPreprocessingPheno() containing the beta table
and phenotype table prepared for Clock Foundation export.
Object returned by writePreprocessingPhenoOutputs() when
saveOutputs = TRUE, otherwise NULL.
Resolved path to the optional log file, or NULL when
logging was disabled.
See dnaEPICO_preprocessingPheno for a class-level overview.
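This page does not specify the layout of the merged longitudinal phenotype-plus-beta object. As a rough illustration only, the sketch below builds one plausible shape in base R: a data frame with one row per sample, holding the phenotype columns followed by one column per CpG. The variable names and the one-row-per-sample layout are assumptions, not the package's documented structure.

```r
# Toy phenotype table and beta matrix (CpGs in rows, samples in columns).
pheno <- data.frame(
  Sample_Name = c("S1", "S2", "S3"),
  Timepoint = c("1", "1", "2"),
  stringsAsFactors = FALSE
)
beta <- matrix(
  c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60),
  nrow = 2,
  dimnames = list(c("cg1", "cg2"), pheno$Sample_Name)
)

# Keep only the timepoints to be combined (assumed already parsed to a vector).
combineTp <- c("1", "2")
phenoKeep <- pheno[pheno$Timepoint %in% combineTp, , drop = FALSE]

# Transpose beta so samples become rows, then join on the sample identifier.
betaT <- as.data.frame(t(beta[, phenoKeep$Sample_Name, drop = FALSE]))
betaT$Sample_Name <- rownames(betaT)
merged <- merge(phenoKeep, betaT, by = "Sample_Name")
```

Joining with merge() on the identifier column, rather than binding columns by position, guards against phenotype rows and matrix columns being in different orders.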
tmp <- tempdir()
pheno <- data.frame(
  Sample_Name = c("S1", "S2", "S3"),
  Timepoint = c("1", "1", "2"),
  Sex = c(0, 1, 0),
  stringsAsFactors = FALSE
)
beta <- matrix(
  c(0.10, 0.20, 0.30, 0.40, 0.50, 0.60),
  nrow = 2,
  dimnames = list(c("cg1", "cg2"), pheno$Sample_Name)
)
m <- beta * 10
cn <- beta * 100
pheno_file <- file.path(tmp, "pheno.csv")
beta_path <- file.path(tmp, "beta.RData")
m_path <- file.path(tmp, "m.RData")
cn_path <- file.path(tmp, "cn.RData")
utils::write.csv(pheno, pheno_file, row.names = FALSE)
save(beta, file = beta_path)
save(m, file = m_path)
save(cn, file = cn_path)
result <- preprocessingPheno(
  phenoFile = pheno_file,
  betaPath = beta_path,
  mPath = m_path,
  cnPath = cn_path,
  SampleID = "Sample_Name",
  timeVar = "Timepoint",
  timepoints = "1,2",
  combineTimepoints = "1,2",
  outputPheno = file.path(tmp, "data", "preprocessingPheno"),
  outputRData = file.path(tmp, "rData", "preprocessingPheno", "metrics"),
  outputRDataMerge = file.path(tmp, "rData", "preprocessingPheno", "mergeData"),
  sexColumn = "Sex",
  outputLogs = file.path(tmp, "logs"),
  outputDir = file.path(tmp, "clockFoundation"),
  saveOutputs = FALSE
)
stopifnot(inherits(result, "dnaEPICO_preprocessingPheno"))