panels.Rmd
CAMDAC can use a panel built from multiple normal DNA methylation BAM files as the source of the normal infiltrates or the normal cell of origin.
To create a panel, process your BAM files with the CAMDAC allele counter:
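library(CAMDAC)
# Get BAM files. The bundled test BAM is reused three times for illustration;
# in practice, point these at your own distinct normal BAM files.
b_normal1 <- system.file("testdata", "normal.bam", package = "CAMDAC")
b_normal2 <- system.file("testdata", "normal.bam", package = "CAMDAC")
b_normal3 <- system.file("testdata", "normal.bam", package = "CAMDAC")
# Run allele counter
for (bam_file in c(b_normal1, b_normal2, b_normal3)) {
  # Name the output after the BAM file and write it to the working directory
  prefix <- fs::path_file(fs::path_ext_remove(bam_file))
  outfile <- paste0(prefix, ".all.SNPs.CG.csv.gz")
  data <- cmain_count_alleles(bam_file)
  data.table::fwrite(data, outfile)
}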
The allele count files can then be merged into a single panel file containing the methylation data used for deconvolution:
panel_counts <- fs::dir_ls(".", glob="*.SNPs.CG.csv.gz")
panel <- panel_meth_from_counts(panel_counts)
data.table::fwrite(panel, "panel.m.csv.gz")
By default, panel counts are merged by summing the methylation read counts for each CpG site. You can customise the proportion of each sample that is used in the panel by specifying the `ac_props` argument in `panel_meth_from_counts()`. To get the mean across samples at each CpG site, pass equal proportions for each sample.
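The difference between the two merging strategies can be illustrated with a small base R sketch. This is not CAMDAC's implementation: the toy count matrices `m` and `cov` and the weighting step are assumptions made for illustration, and the way `ac_props` is applied internally may differ.

```r
# Toy data (hypothetical): methylated (m) and total (cov) read counts
# for 2 CpG sites (rows) across 3 normal samples (columns)
m   <- matrix(c(5, 30, 8,
                2, 10, 6), nrow = 2, byrow = TRUE)
cov <- matrix(c(10, 50, 10,
                10, 20, 10), nrow = 2, byrow = TRUE)

# Merging by summed counts: deeper samples contribute more to the panel
beta_summed <- rowSums(m) / rowSums(cov)

# Merging by proportions: weight each sample's beta value explicitly
props <- rep(1 / 3, 3)  # equal proportions give the per-site mean
beta_weighted <- as.vector((m / cov) %*% props)

print(round(beta_summed, 3))
print(round(beta_weighted, 3))
```

In this toy example the second sample has the highest coverage, so it dominates the summed estimate, while equal proportions treat all three samples alike.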
To run CAMDAC with your newly created panel, attach your panel to a `CamSample` object using the `meth` argument.
# Load test data
b_tumor <- system.file("testdata", "tumor.bam", package = "CAMDAC")
b_normal <- system.file("testdata", "normal.bam", package = "CAMDAC")
# Setup CAMDAC samples
tumor <- CamSample(id="tumor", sex="XY", bam=b_tumor)
normal <- CamSample(id="normal", sex="XY", bam=b_normal)
config <- CamConfig(outdir="./results", ref="./pipeline_files", bsseq="wgbs", lib="pe", cores=10)
# Setup panel sample
panel <- CamSample(id="panel", sex="XY")
panel_file <- system.file("testdata", "test_panel.m.csv.gz", package = "CAMDAC")
attach_output(panel, config, "meth", panel_file)
# Run CAMDAC with panel
pipeline(
tumor=tumor,
germline=normal,
infiltrates=panel,
origin=panel,
config=config
)
If you have not started from BAM files, you can create a panel using a matrix of beta values:
sample1 | sample2 | sample3 |
---|---|---|
0.5 | 0.6 | 0.7 |
0.4 | 0.5 | 0.6 |
Additionally, a data frame specifying the position of each CpG site in the beta value matrix is required. Here, `start` and `end` refer to the C and G of the CpG site respectively:
chrom | start | end |
---|---|---|
chr1 | 100 | 101 |
chr1 | 200 | 201 |
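As a minimal sketch, the two example tables can be built in R as follows. The object names `mat` and `pos` and the sanity checks are illustrative assumptions, not part of CAMDAC, and whether a plain data frame is accepted in place of a `data.table` is also an assumption.

```r
# Beta value matrix: one row per CpG site, one column per sample
mat <- data.frame(
  sample1 = c(0.5, 0.4),
  sample2 = c(0.6, 0.5),
  sample3 = c(0.7, 0.6)
)

# CpG positions, in the same row order as the beta value matrix
pos <- data.frame(
  chrom = c("chr1", "chr1"),
  start = c(100, 200),
  end   = c(101, 201)
)

# Sanity checks before building a panel
stopifnot(nrow(mat) == nrow(pos))                  # one position per CpG row
stopifnot(all(pos$end - pos$start == 1))           # C and G are adjacent
stopifnot(all(mat >= 0 & mat <= 1, na.rm = TRUE))  # beta values lie in [0, 1]
```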
The matrix and CpG locations can be passed directly to the `panel_meth_from_beta()` function, along with the panel settings.
# Load beta values and chromosome positions
ex <- system.file("testdata", "test_panel_from_beta.csv", package = "CAMDAC")
data <- data.table::fread(ex)
mat = data[, 4:ncol(data)] # Beta value matrix with 3 samples
# Create panel from beta values
panel_beta <- panel_meth_from_beta(
mat = mat,
chrom = data$chrom,
start = data$start,
end = data$end,
cov = 100,
props = c(0.1, 0.8, 0.1), # Proportions of each sample in panel
min_samples = 1,
max_sd = 1
)
As CAMDAC requires coverage at each CpG site to estimate uncertainty, the `cov` value is assigned to every CpG site when building a panel from beta values. Additionally, if a sample is missing a beta value at a site, the proportions are recalculated among the remaining samples, as these are the only data available to build the panel for that site.
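The recalculation of proportions for a site with missing values can be sketched in base R as follows. The helper `combine_site()` is hypothetical and only illustrates the idea of renormalising proportions; it is not a CAMDAC function and the package's internal handling may differ.

```r
# Hypothetical helper: combine one CpG site's beta values, renormalising
# the proportions over the samples that have a value at that site
combine_site <- function(beta, props) {
  ok <- !is.na(beta)
  if (!any(ok)) return(NA_real_)    # no sample has a value at this site
  w <- props[ok] / sum(props[ok])   # proportions sum to 1 again
  sum(beta[ok] * w)
}

props <- c(0.1, 0.8, 0.1)
complete <- combine_site(c(0.5, 0.6, 0.9), props)  # all samples present
missing2 <- combine_site(c(0.5, NA, 0.9), props)   # sample 2 missing

print(c(complete = complete, missing2 = missing2))
```

When the dominant sample is missing, the two remaining samples each receive half of the weight, so the panel value at that site can shift noticeably.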
There are two experimental arguments that can be set to filter CpG sites from the panel:
- `min_samples`: The minimum number of samples that must have a beta value at a CpG site for it to be included in the panel. With sparse data, this lets you skip sites where too few samples support the panel value. Set this to 1 to keep any site covered by at least one sample.
- `max_sd`: The maximum standard deviation of beta values across samples that a CpG site may have to be included in the panel. When combining many bulk methylomes from the same tissue, highly variable sites reflect sample-specific differences, so their averages are less reliable for use in a methylation panel.
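The effect of these two filters can be sketched in base R. This is an illustration under assumptions, not CAMDAC's code: the toy matrix is hypothetical, the sample standard deviation from `sd()` is assumed, and sites with a single value are assumed to pass the standard deviation filter.

```r
# Toy beta value matrix (hypothetical): 3 CpG sites x 3 samples
mat <- matrix(c(0.50, 0.60, 0.70,
                0.10,   NA,   NA,
                0.05, 0.95, 0.50), nrow = 3, byrow = TRUE)

min_samples <- 2
max_sd <- 0.2

n_obs   <- rowSums(!is.na(mat))             # samples with a value per site
site_sd <- apply(mat, 1, sd, na.rm = TRUE)  # NA when only one sample has a value

# Keep sites with enough samples and low variability; a site whose
# standard deviation is undefined passes the sd filter in this sketch
keep <- n_obs >= min_samples & (is.na(site_sd) | site_sd <= max_sd)

print(keep)
```

Here the second site is dropped because only one sample has a value, and the third is dropped because its beta values vary too much across samples.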