Source: R/methylationGLM_T1.R, methylationGLM_T1.Rd
methylationGLM_T1() is the high-level coordinator for the one-timepoint (T1)
GLM stage of the dnaEPICO workflow. It prepares the merged phenotype-plus-beta
input, optionally creates exploratory plots, fits one Gaussian GLM per CpG for
each requested phenotype, extracts CpG-level summaries, optionally collects
significant-CpG coefficient tables, generates diagnostic plots, annotates the
combined summary table, and optionally writes legacy-style outputs to disk.
By default the function works entirely in memory and stays quiet (display,
verbose, logs, and saveOutputs are all FALSE), which makes it easier to compose
with other package functions and more consistent with typical Bioconductor
usage.
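The core modeling step, one Gaussian GLM per CpG for each phenotype, can be illustrated in base R. This is a minimal sketch, not the package's implementation: it assumes (as the example's factor phenotype suggests) that the CpG beta value is the response and the phenotype plus covariates are predictors, it uses stats::glm() instead of glm2, and it assumes a numeric phenotype. The helper fitPerCpG() and the toy data are hypothetical.

```r
# Toy merged phenotype-plus-beta table (illustrative values only).
set.seed(42)
n <- 20
toy <- data.frame(
  score = rnorm(n),
  Age = round(runif(n, 20, 60)),
  cg00000029 = runif(n, 0.1, 0.4),
  cg00000108 = runif(n, 0.5, 0.7)
)

# Hypothetical helper (not part of dnaEPICO): fit CpG ~ phenotype + covariates
# with a Gaussian GLM for every column matching cpgPrefix and keep the
# phenotype coefficient. Assumes a numeric phenotype, so its row name in the
# coefficient table equals the phenotype name.
fitPerCpG <- function(dat, phenotype, covariates, cpgPrefix = "cg") {
  cpgs <- grep(paste0("^", cpgPrefix), names(dat), value = TRUE)
  rows <- lapply(cpgs, function(cpg) {
    f <- reformulate(c(phenotype, covariates), response = cpg)
    fit <- glm(f, data = dat, family = gaussian())
    s <- summary(fit)
    co <- s$coefficients
    data.frame(
      CpG = cpg,
      Estimate = co[phenotype, "Estimate"],
      SE = co[phenotype, "Std. Error"],
      P = co[phenotype, "Pr(>|t|)"],
      # Residual SD: square root of the estimated Gaussian dispersion
      ResidualSD = sqrt(s$dispersion)
    )
  })
  do.call(rbind, rows)
}

res <- fitPerCpG(toy, phenotype = "score", covariates = "Age")
print(res)
```

Fitting each CpG independently keeps the models small and makes the loop easy to split across workers, which is the kind of work that nCores and chunkSize govern in the real function.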
Usage
methylationGLM_T1(
  inputPheno = "rData/preprocessingPheno/mergeData/phenoBetaT1.RData",
  outputLogs = "logs",
  outputRData = "rData/methylationGLM_T1/models",
  outputPlots = "figures/methylationGLM_T1",
  phenotypes = c("DASS_Depression", "DASS_Anxiety", "DASS_Stress", "PCL5_TotalScore",
    "MHCSF_TotalScore", "BRS_TotalScore"),
  covariates = "Sex,Age,Ethnicity,TraumaDefinition,Leukocytes,Epithelial.cells",
  factorVars = "Sex,Ethnicity,TraumaDefinition",
  cpgPrefix = "cg",
  cpgLimit = NA,
  nCores = 32,
  plotWidth = 2000,
  plotHeight = 1000,
  plotDPI = 150,
  interactionTerm = NULL,
  libPath = NULL,
  glmLibs = "glm2",
  prsMap = NULL,
  summaryPval = NA,
  summaryResidualSD = TRUE,
  saveSignificantCpGs = FALSE,
  significantCpGDir = "preliminaryResults/cpgs/methylationGLM_T1",
  significantCpGPval = 0.05,
  saveTxtSummaries = TRUE,
  chunkSize = NULL,
  summaryTxtDir = "preliminaryResults/summary/methylationGLM_T1/glm",
  fdrThreshold = 0.05,
  padjmethod = "fdr",
  annotationPackage = "IlluminaHumanMethylationEPICv2anno.20a1.hg38",
  annotationCols = c("Name", "chr", "pos", "UCSC_RefGene_Group", "UCSC_RefGene_Name",
    "Relation_to_Island", "GencodeV41_Group"),
  annotatedGLMOut = "data/methylationGLM_T1",
  display = FALSE,
  verbose = FALSE,
  logs = FALSE,
  saveOutputs = FALSE
)
Arguments
inputPheno: Character. Path to the merged phenotype-plus-beta .RData
or .rds object created by preprocessingPheno(). The default points to
the timepoint-1 object produced by the package workflow.
outputLogs: Character. Directory for optional log files.
outputRData: Character. Directory for optional serialized model and summary outputs.
outputPlots: Character. Directory for optional TIFF plots.
phenotypes: Character vector of phenotype variables to model, or a single
comma-separated string. Models are fitted separately for each phenotype.
covariates: Character. Comma-separated covariate variables included in every GLM.
factorVars: Character. Comma-separated variables converted to factors before modeling.
cpgPrefix: Character. Prefix used to identify methylation (beta-value) columns in the
merged phenotype-plus-beta input object. The default is "cg".
cpgLimit: Integer or NA. Maximum number of CpGs to analyze. Use NA
to keep all CpGs matching cpgPrefix.
nCores: Integer. Number of worker processes used to fit models and extract summaries.
plotWidth: Integer. TIFF width in pixels when plots are written to disk.
plotHeight: Integer. TIFF height in pixels when plots are written to disk.
plotDPI: Integer. TIFF resolution in DPI when plots are written to disk.
interactionTerm: Character or NULL. Optional interaction variable. When it is
supplied and present in the input data, each model includes the phenotype
together with its interaction with this variable.
libPath: Character vector or NULL. Optional library paths forwarded to
worker processes. If NULL (the default), the current .libPaths() are used.
glmLibs: Character. Comma-separated names of packages whose availability is
checked on each worker process. The default is "glm2".
prsMap: Character or NULL. Optional phenotype-to-PRS mapping in the
form "Phenotype1:PRS_1,Phenotype2:PRS_2".
summaryPval: Numeric or NA. Optional p-value threshold applied to the
returned CpG summary tables. Use NA to keep all summary rows.
summaryResidualSD: Logical. If TRUE, include residual standard
deviations in the CpG summary tables and the residual diagnostic plots.
saveSignificantCpGs: Logical. If TRUE, collect significant-CpG
coefficient tables in the returned object and, when saveOutputs = TRUE,
also write them to disk.
significantCpGDir: Character. Directory for significant-CpG coefficient tables written to disk.
significantCpGPval: Numeric. P-value threshold used to collect or write significant-CpG coefficient tables.
saveTxtSummaries: Logical. If TRUE and saveOutputs = TRUE, write
tab-delimited summary tables to summaryTxtDir.
chunkSize: Integer or NULL. Number of CpGs processed per summary
extraction chunk. If NULL, a chunk size is chosen automatically.
summaryTxtDir: Character. Directory for optional tab-delimited GLM summary tables.
fdrThreshold: Numeric. False-discovery-rate threshold used to highlight CpGs in the residual-significance diagnostic plots.
padjmethod: Character. P-value adjustment method passed to
stats::p.adjust(). The default is "fdr".
annotationPackage: Character. Annotation package or object name passed
to minfi::getAnnotation(), for example
"IlluminaHumanMethylationEPICv2anno.20a1.hg38".
annotationCols: Character vector or comma-separated string of annotation columns to append to the combined GLM summary table. Available columns depend on the selected annotation package.
annotatedGLMOut: Character. Directory for the optional annotated GLM summary CSV file.
display: Logical. If TRUE, draw exploratory and diagnostic plots on
the active graphics device.
verbose: Logical. If TRUE, emit progress messages with message().
The default is FALSE, so the function is quiet unless requested.
logs: Logical. If TRUE, write the same progress messages to
file.path(outputLogs, "log_methylationGLM_T1.txt").
saveOutputs: Logical. If TRUE, write serialized model files,
summary tables, significant-CpG tables, annotated results, and TIFF plots to
the requested output directories. The default is FALSE, so the function
returns in-memory results without writing files.
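Several arguments (phenotypes, covariates, factorVars, annotationCols, glmLibs, prsMap) accept comma-separated strings, and interactionTerm and prsMap change which terms enter each model. The sketch below shows one plausible way these inputs could be parsed and combined into a per-CpG formula. The helpers splitArg(), parsePrsMap(), applyFactorVars(), and buildFormula() are hypothetical, and adding the mapped PRS as an extra covariate is an assumption about what prsMap does, not documented behavior.

```r
# Hypothetical: split "a, b,c" (or a character vector) into c("a", "b", "c").
splitArg <- function(x) {
  if (is.null(x) || length(x) == 0) return(character(0))
  parts <- trimws(unlist(strsplit(x, ",", fixed = TRUE)))
  parts[nzchar(parts)]
}

# Hypothetical: parse "Pheno1:PRS_1,Pheno2:PRS_2" into c(Pheno1 = "PRS_1", ...).
parsePrsMap <- function(x) {
  if (is.null(x)) return(setNames(character(0), character(0)))
  pairs <- strsplit(splitArg(x), ":", fixed = TRUE)
  setNames(vapply(pairs, function(p) trimws(p[2]), character(1)),
           vapply(pairs, function(p) trimws(p[1]), character(1)))
}

# Hypothetical: convert the listed columns (those present in dat) to factors.
applyFactorVars <- function(dat, factorVars) {
  fv <- intersect(splitArg(factorVars), names(dat))
  if (length(fv)) dat[fv] <- lapply(dat[fv], factor)
  dat
}

# Hypothetical: build CpG ~ phenotype (+ interaction) + covariates (+ PRS).
buildFormula <- function(cpg, phenotype, covariates, prsMap = NULL,
                         interactionTerm = NULL, data = NULL) {
  covs <- splitArg(covariates)
  prs <- parsePrsMap(prsMap)
  # Assumption: a mapped PRS enters as an extra covariate for that phenotype.
  if (phenotype %in% names(prs)) covs <- c(covs, prs[[phenotype]])
  main <- phenotype
  # Add the interaction only if the variable exists in the data (when given).
  if (!is.null(interactionTerm) &&
      (is.null(data) || interactionTerm %in% names(data))) {
    main <- paste0(phenotype, " * ", interactionTerm)
  }
  reformulate(c(main, covs), response = cpg)
}

f <- buildFormula("cg00000029", "DASS_Depression", "Sex,Age",
                  prsMap = "DASS_Depression:PRS_Dep,DASS_Anxiety:PRS_Anx",
                  interactionTerm = "Sex")
print(f)
```

Because reformulate() expands "DASS_Depression * Sex" into main effects plus their interaction, listing Sex again as a covariate does not duplicate the term.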
Value
A list with class "dnaEPICO_methylationGLM_T1" with the following elements:
Object returned by prepareMethylationGLM_T1Data()
containing the merged phenotype-plus-beta analysis table and modeling
metadata.
Object returned by
plotMethylationGLM_T1Distributions() describing any exploratory plots that
were generated or written.
Object returned by fitMethylationGLM_T1Models()
containing the per-phenotype CpG model fits.
Object returned by
summarizeMethylationGLM_T1Models() containing the combined CpG summary
tables used for reporting and annotation.
Object returned by
collectSignificantCpGsMethylationGLM_T1() containing optional
phenotype-specific significant-CpG tables.
Object returned by
plotMethylationGLM_T1Diagnostics() describing the diagnostic plot objects
and any written TIFF files.
Object returned by
annotateMethylationGLM_T1Summaries() containing the annotated combined
summary table.
Object returned by writeMethylationGLM_T1Outputs() when
saveOutputs = TRUE, otherwise NULL.
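The combined CpG summary tables described above are where summaryPval, padjmethod, and fdrThreshold come into play. The sketch below applies the same ideas to a toy table: p-values are adjusted with stats::p.adjust(), rows are flagged against an FDR threshold, and an optional p-value filter is applied. The helper filterSummary(), the column names P, Padj, and FDRsig, and filtering on raw (unadjusted) p-values are assumptions, not the package's documented internals.

```r
# Toy summary table (illustrative values only).
summaryTab <- data.frame(
  CpG = c("cpg1", "cpg2", "cpg3", "cpg4"),
  P = c(0.001, 0.02, 0.30, 0.049)
)

# Hypothetical helper: adjust first, then flag, then optionally filter.
filterSummary <- function(tab, summaryPval = NA, fdrThreshold = 0.05,
                          padjmethod = "fdr") {
  # Adjust across all tested CpGs so the correction sees every test
  tab$Padj <- p.adjust(tab$P, method = padjmethod)
  tab$FDRsig <- tab$Padj < fdrThreshold
  # NA keeps every row, mirroring the documented summaryPval default
  if (!is.na(summaryPval)) tab <- tab[tab$P <= summaryPval, , drop = FALSE]
  tab
}

print(filterSummary(summaryTab, summaryPval = 0.05))
```

Adjusting before filtering matters: applying p.adjust() only to the rows that survive a p-value cut would understate the number of tests and make the adjusted values too small.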
See also
dnaEPICO_methylationGLM_T1 for a class-level overview.
Examples
if (requireNamespace("IlluminaHumanMethylation450kanno.ilmn12.hg19", quietly = TRUE)) {
  tmp <- tempdir()
  toy_path <- file.path(tmp, "phenoBetaT1.RData")
  # Toy merged phenotype-plus-beta table: four samples, two CpG columns
  phenoBT1 <- data.frame(
    Sample_Name = c("S1", "S2", "S3", "S4"),
    status = factor(c("Case", "Case", "Control", "Control")),
    sex = factor(c("F", "M", "F", "M")),
    cg00000029 = c(0.20, 0.25, 0.22, 0.27),
    cg00000108 = c(0.60, 0.55, 0.52, 0.58),
    check.names = FALSE
  )
  save(phenoBT1, file = toy_path)
  # Run in memory only: no plots, messages, logs, or files
  result <- methylationGLM_T1(
    inputPheno = toy_path,
    phenotypes = "status",
    covariates = "sex",
    factorVars = "status,sex",
    cpgLimit = 2,
    nCores = 1,
    summaryPval = 1,
    annotationPackage = "IlluminaHumanMethylation450kanno.ilmn12.hg19",
    annotationCols = "Name,chr,pos",
    display = FALSE,
    verbose = FALSE,
    logs = FALSE,
    saveOutputs = FALSE
  )
  class(result)
}
#> [1] "dnaEPICO_methylationGLM_T1"
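inputPheno accepts either an .RData or an .rds file, and the example above writes its toy table with save(). A loader that handles both formats could look like the sketch below. loadPhenoBeta() is hypothetical, and requiring exactly one object inside the .RData file is an assumption rather than documented behavior.

```r
# Hypothetical loader (not the package's implementation): read a merged
# phenotype-plus-beta table from .rds or .RData/.rda.
loadPhenoBeta <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "rds") return(readRDS(path))
  if (ext %in% c("rdata", "rda")) {
    env <- new.env()
    objs <- load(path, envir = env)  # load() returns the loaded object names
    # Assumption: the file holds a single table object
    if (length(objs) != 1) stop("Expected exactly one object in ", path)
    return(get(objs, envir = env))
  }
  stop("Unsupported file extension: ", ext)
}

# Round-trip a toy table through both formats
toy <- data.frame(Sample_Name = c("S1", "S2"), cg00000029 = c(0.20, 0.25))
rdataPath <- tempfile(fileext = ".RData")
rdsPath <- tempfile(fileext = ".rds")
phenoBT1 <- toy
save(phenoBT1, file = rdataPath)
saveRDS(toy, rdsPath)
str(loadPhenoBeta(rdataPath))
```

Loading into a fresh environment keeps the object out of the caller's workspace and means the loader does not depend on the object's name (phenoBT1 here).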