Allele-specific methylation (ASM) analysis

CAMDAC can be used to detect allele-specific methylation (ASM) by phasing CpGs to heterozygous SNPs and deconvolving bulk methylation rates per allele.

This tutorial steps through the ASM analysis pipeline. Briefly:

  1. Count CpG methylation on tumor and normal at sites phased to SNP loci.
  2. Deconvolve methylation on tumor per haplotype using the normal.
  3. Assign allele-specific copy number state per CpG using the bulk tumor solution.
  4. Call allele-specific differential methylation within samples.
  5. Call allele-specific differential methylation between samples.
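
The deconvolution in step 2 can be pictured with a simple two-component mixture model. The sketch below is illustrative only and is not CAMDAC's internal code: it assumes the bulk methylation rate on one haplotype is a copy-number-weighted average of tumor and normal cells, with purity `rho`, tumor allele copy number `n_t` and one normal copy per haplotype (`n_n = 1`). The function and argument names are hypothetical.

```r
# Illustrative sketch only: per-haplotype deconvolution under a simple
# two-component mixture model. Names are hypothetical, not the CAMDAC API.
deconvolve_allele <- function(m_bulk, m_normal, rho, n_t, n_n = 1) {
  # Assumed bulk model:
  #   m_bulk = (rho*n_t*m_t + (1-rho)*n_n*m_normal) / (rho*n_t + (1-rho)*n_n)
  # A haplotype with zero tumor copies carries no tumor signal to recover.
  stopifnot(rho > 0, rho <= 1, all(n_t > 0))
  total <- rho * n_t + (1 - rho) * n_n
  m_t <- (m_bulk * total - (1 - rho) * n_n * m_normal) / (rho * n_t)
  # Sampling noise can push estimates outside [0, 1], so clamp them
  pmin(pmax(m_t, 0), 1)
}

# Haplotype with 2 tumor copies at 60% purity, unmethylated in the normal
deconvolve_allele(m_bulk = 0.6, m_normal = 0, rho = 0.6, n_t = 2)
```

In this example the bulk rate of 0.6 corresponds to a pure tumor rate of about 0.8 on that haplotype, because normal cells contribute unmethylated copies that dilute the signal.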

Results from this pipeline are written to the results directory under ‘PATIENT/AlleleSpecific’ and ‘PATIENT/Methylation’. See the ASM output file headings section below for the files and their contents.

CAMDAC-ASM from BAM files

The asm_pipeline() function runs the CAMDAC-ASM analysis. It first generates the allele-specific copy number solution and the heterozygous SNP loci, then performs deconvolution and differential ASM analysis:

b_tumor <- system.file("testdata", "tumor.bam", package = "CAMDAC")
b_normal <- system.file("testdata", "normal.bam", package = "CAMDAC")
regions <- system.file("testdata", "test_wgbs_segments.bed", package = "CAMDAC") # speed up tests

tumor <- CamSample(id = "T", sex = "XY", bam = b_tumor)
normal <- CamSample(id = "N", sex = "XY", bam = b_normal)
config <- CamConfig(
  outdir = "./results", ref = "./pipeline_files", bsseq = "wgbs", lib = "pe", cores = 10,
  min_cov = 1, # For test data
  regions = regions
)

asm_pipeline(
  tumor = tumor,
  germline = normal,
  infiltrates = normal,
  origin = normal,
  config = config
)

CAMDAC-ASM from external inputs (in development)

To run the ASM pipeline without calling SNPs and copy number from the BAM files, CAMDAC requires that:

  • Each CamSample object has SNP loci attached.
  • The tumor CamSample object has an allele-specific CNA solution attached.
  • All CamSample objects have BAM files available for phasing.

CAMDAC-ASM requires a file of heterozygous SNP loci against which CpGs will be phased. This is a tab-delimited file with a header containing four fields:

Field   Description
chrom   Chromosome name
pos     SNP loci position
ref     The reference allele (A/C/T/G)
alt     The alternate SNP allele (A/C/T/G)
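
As a rough sketch, a file in this layout can be built and sanity-checked in base R before it is attached. The positions below are made-up example values, `check_snp_loci()` is a hypothetical helper rather than a CAMDAC function, and the chromosome naming style ("1" rather than "chr1") is an assumption that should match your reference files.

```r
# Illustrative sketch: build a heterozygous SNP loci table with the four
# fields described above. Values are invented for the example.
snps <- data.frame(
  chrom = c("1", "1", "2"),
  pos   = c(10583L, 13273L, 45895L),
  ref   = c("G", "G", "A"),
  alt   = c("A", "C", "G"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of CAMDAC): basic format checks
check_snp_loci <- function(x) {
  bases <- c("A", "C", "G", "T")
  stopifnot(
    identical(names(x), c("chrom", "pos", "ref", "alt")),
    is.numeric(x$pos), all(x$pos > 0),
    all(x$ref %in% bases), all(x$alt %in% bases),
    all(x$ref != x$alt)
  )
  invisible(TRUE)
}
check_snp_loci(snps)

# Write as tab-delimited text with a header row
snps_path <- file.path(tempdir(), "het_snps.tsv")
write.table(snps, snps_path, sep = "\t", quote = FALSE, row.names = FALSE)
```

The resulting path can then be used wherever a SNP loci file is expected.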

First, attach your SNP loci file to the tumor and normal objects with attach_output():

# Setup CAMDAC samples
tumor <- CamSample(id = "tumor", sex = "XY", bam = b_tumor)
normal <- CamSample(id = "normal", sex = "XY", bam = b_normal)
config <- CamConfig(
  outdir = "./results", ref = "./pipeline_files", bsseq = "wgbs", lib = "pe", cores = 10,
  min_cov = 1, # For test data
  regions = regions
) # For rapid testing

# Add SNPs
asm_snps_file <- system.file("testdata", "test_het_snps.tsv", package = "CAMDAC")
attach_output(tumor, config, "asm_snps", asm_snps_file)
attach_output(normal, config, "asm_snps", asm_snps_file)

Next, CAMDAC requires the allele-specific copy number solution from the tumor, attached as follows:

cna_file <- system.file("testdata", "test_cna.tsv", package = "CAMDAC")
attach_output(tumor, config, "cna", cna_file)

Finally, run the allele-specific methylation pipeline:

asm_pipeline(
  tumor = tumor,
  infiltrates = normal,
  origin = normal,
  config = config
)

Using SNPs called from previous CAMDAC runs

If you have already run the CAMDAC pipeline in tumor-normal mode, then the germline object’s SNP files will be used by default. The simplest run from BAM to ASM is shown below using matched normals for infiltrates and DMPs:

b_tumor <- system.file("testdata", "tumor.bam", package = "CAMDAC")
b_normal <- system.file("testdata", "normal.bam", package = "CAMDAC")
regions <- system.file("testdata", "test_wgbs_segments.bed", package = "CAMDAC") # speed up tests

tumor <- CamSample(id = "T", sex = "XY", bam = b_tumor)
normal <- CamSample(id = "N", sex = "XY", bam = b_normal)
config <- CamConfig(
  outdir = "./test_results", bsseq = "wgbs", lib = "pe",
  build = "hg38", n_cores = 10,
  regions = regions,
  min_cov = 1, # For test data
  cna_caller = "ascat" # Battenberg is recommended; ASCAT is used here for rapid testing.
)

# Run the main CAMDAC pipeline to generate SNP files for ASM
# Deconvolution skipped here for simplicity.
pipeline(tumor, germline = normal, infiltrates = NULL, origin = NULL, config)

# Run ASM pipeline
asm_pipeline(
  tumor = tumor,
  germline = normal,
  infiltrates = normal,
  origin = normal,
  config = config
)

ASM output file headings

AlleleSpecific/

  • *asm_counts.csv.gz - The number of reads supporting each allele at each CpG
  • *asm_hap_stats.csv.gz - Summary statistics for each phased SNP
  • *asm_phase_map.csv.gz - A mapping of CpG-SNP phased pairs per read
  • *snps.txt - The heterozygous SNP loci input for ASM analysis
  • *cna.csv - For the tumor, the allele-specific copy number profile. See the format in vignette("pipeline").

Methylation/

  • *asm_meth.csv.gz - Allele-specific methylation rates for bulk samples
  • *asm_ss_dmp.csv.gz - Single sample differential allele-specific methylation
  • *asm_meth_cna.csv.gz - For the tumor, ASM rates with annotated copy number states
  • *asm_meth_pure.csv.gz - For the tumor, pure methylation rates for each allele
  • *asm_dmp.csv.gz - Differential allele-specific methylation between tumor and origin sample
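
One way to collect these files after a run is to search the results directory by file suffix. The sketch below assumes only the suffixes listed above; `find_asm_outputs()` is a hypothetical helper, not a CAMDAC function, and "./results" stands in for whatever `outdir` you configured.

```r
# Illustrative sketch: find ASM output files by the suffixes listed above.
# find_asm_outputs() is a hypothetical helper, not part of CAMDAC.
find_asm_outputs <- function(results_dir) {
  suffixes <- c(
    "asm_counts.csv.gz", "asm_hap_stats.csv.gz", "asm_phase_map.csv.gz",
    "asm_meth.csv.gz", "asm_ss_dmp.csv.gz", "asm_meth_cna.csv.gz",
    "asm_meth_pure.csv.gz", "asm_dmp.csv.gz"
  )
  files <- list.files(results_dir, recursive = TRUE, full.names = TRUE)
  # For each suffix, keep the files whose names end with it
  found <- lapply(suffixes, function(s) files[endsWith(files, s)])
  names(found) <- suffixes
  found
}

outputs <- find_asm_outputs("./results")
lengths(outputs) # number of files found per output type
```

Because the files are gzip-compressed CSVs, a path returned here can be passed directly to read.csv(), which decompresses it transparently.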