copy_number_alt.Rmd
The germline sample is optional because, even in the absence of patient-matched methylation data, you may already have an allele-specific CNA solution for your bulk tumor. For example, this could be derived from bulk WGS of the same sample.
You can provide this data in a tab-delimited text file as shown below:
chrom | start | end | major_cn | minor_cn | purity | ploidy |
---|---|---|---|---|---|---|
chr1 | 1 | 400 | 2 | 1 | 0.67 | 3.5 |
chr1 | 401 | 1000 | 1 | 1 | 0.67 | 3.5 |
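If your CNA solution already lives in an R data frame, a file in this layout can be written with base R. The following is a minimal sketch and not part of CAMDAC: the output path is a placeholder, the values are copied from the example rows above, and the sanity checks (including `major_cn >= minor_cn`) are assumptions about what a well-formed solution looks like rather than documented requirements.

```r
# Illustrative CNA solution; values mirror the example table above.
cna <- data.frame(
  chrom    = c("chr1", "chr1"),
  start    = c(1L, 401L),
  end      = c(400L, 1000L),
  major_cn = c(2L, 1L),
  minor_cn = c(1L, 1L),
  purity   = c(0.67, 0.67),
  ploidy   = c(3.5, 3.5)
)

# Basic sanity checks before writing (assumed conventions, not CAMDAC rules).
stopifnot(
  all(cna$start <= cna$end),
  all(cna$major_cn >= cna$minor_cn),
  all(cna$purity > 0 & cna$purity <= 1)
)

# Hypothetical output path; replace with your own.
cna_path <- file.path(tempdir(), "my_sample.cna.txt")

# Tab-delimited, with a header row and without quotes or row names.
write.table(cna, cna_path, sep = "\t", quote = FALSE, row.names = FALSE)
```

The resulting path can then be used wherever a CNA file is expected.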
To run CAMDAC with this CNA solution, attach the file to the tumor CamSample() object:
```r
library(CAMDAC)

# Load test data
b_tumor <- system.file("testdata", "tumor.bam", package = "CAMDAC")
b_normal <- system.file("testdata", "normal.bam", package = "CAMDAC")
cna_file <- system.file("testdata", "test.cna.txt", package = "CAMDAC")

# Set config
config <- CamConfig(outdir="./results", bsseq="wgbs", lib="pe", build="hg38", n_cores=10)

# Create tumor object and attach CNA solution
tumor <- CamSample(id="T", sex="XY", bam=b_tumor)
attach_output(tumor, config, "cna", cna_file)

# Define normal object(s) for deconvolution or differential methylation
normal <- CamSample(id="N", sex="XY", bam=b_normal)

# Run pipeline with CNA solution
pipeline(
  tumor=tumor,
  germline=NULL,
  infiltrates=normal,
  origin=normal,
  config=config
)
```
If no SNP file is present for the germline, CAMDAC will infer the copy number calls from the tumor sample alone. Here, heterozygous SNPs are selected by applying a threshold to the tumor BAF, and the LogR is calculated from the tumor coverage relative to its median. These results are not as accurate as those obtained with a germline normal sample.
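To make the tumor-only fallback concrete, the sketch below computes both quantities on toy counts in base R. It illustrates the idea rather than CAMDAC's internal code: the column names, the 0.1 to 0.9 BAF window, and the read counts are all assumptions chosen for the example.

```r
# Toy per-SNP allele counts from a tumor sample (made-up values).
snps <- data.frame(
  ref_counts = c(30, 12, 0, 25, 40),
  alt_counts = c(28, 20, 35, 1, 38)
)
snps$total_counts <- snps$ref_counts + snps$alt_counts

# B-allele frequency: fraction of reads supporting the alternate allele.
snps$BAF <- snps$alt_counts / snps$total_counts

# Keep SNPs whose tumor BAF lies away from 0 and 1 as putative heterozygous
# sites. The 0.1/0.9 cut-offs are hypothetical, not CAMDAC defaults.
snps$is_het <- snps$BAF > 0.1 & snps$BAF < 0.9

# LogR: log2 of coverage relative to the sample-wide median coverage.
snps$LogR <- log2(snps$total_counts / median(snps$total_counts))

print(snps)
```

In this sketch the tumor's own median coverage stands in for a normal reference, so any coverage bias specific to the assay stays in the LogR.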
You may already know where heterozygous SNPs lie for your sample, obviating the need for a tumor BAF threshold. In addition, you may have a proxy of the normal coverage for your platform, which is an improvement over taking the tumor median. You can provide this information by attaching a SNPs file to the germline CamSample object. The file should contain:
Field | Description |
---|---|
chrom | Chromosome name |
POS | Position of SNP |
BAF | (optional) B-allele frequency at this SNP |
total_counts | (optional) Total number of reads at this SNP |
POS and total_counts are used to derive the BAF and the LogR, respectively. We strongly recommend deriving total_counts from a normal sample sequenced with the same bisulfite-sequencing assay as the tumor; samples from unmatched patients are acceptable.
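As a sketch of preparing such a file in base R, the example below writes a gzip-compressed CSV containing the four fields. The comma delimiter and the compression are assumptions inferred from the name of the bundled test file (`test.to.norm_pos.csv.gz`); the positions, counts, and output path are made up for illustration.

```r
# Illustrative heterozygous SNP table (made-up values).
snps <- data.frame(
  chrom        = c("chr1", "chr1", "chr2"),
  POS          = c(10583L, 13273L, 45895L),
  BAF          = c(0.48, 0.52, 0.50),
  total_counts = c(31L, 27L, 35L)  # ideally from a normal on the same assay
)

# Hypothetical output path; replace with your own.
snps_path <- file.path(tempdir(), "my_sample.snps.csv.gz")

# Write through a gzfile connection so the CSV is compressed on the fly.
con <- gzfile(snps_path, open = "wt")
write.csv(snps, con, row.names = FALSE, quote = FALSE)
close(con)
```

Since BAF and total_counts are marked optional in the table above, either column can be dropped from the data frame before writing.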
CAMDAC may be run to the copy number calling stage using the external heterozygous SNP file:
```r
library(CAMDAC)

# Load test data
b_tumor <- system.file("testdata", "tumor.bam", package = "CAMDAC")
snps_file <- system.file("testdata", "test.to.norm_pos.csv.gz", package = "CAMDAC")

# Set config
config <- CamConfig(outdir="./results", bsseq="wgbs", lib="pe", build="hg38", n_cores=10)

# Create tumor object (no CNA solution is attached; it is inferred in this run)
tumor <- CamSample(id="T", sex="XY", bam=b_tumor)

# Create germline object and attach the heterozygous SNP file
germline <- CamSample(id="G", sex="XY")
attach_output(germline, config, "snps", snps_file)

# Run pipeline up to the copy number calling stage
pipeline(
  tumor=tumor,
  germline=germline,
  infiltrates=NULL,
  origin=NULL,
  config=config
)
```
After this, we recommend inspecting the CNA results. If all is well, the pipeline() function can be repeated with the infiltrates and origin CamSamples to complete deconvolution and differential methylation respectively.