get_allele_counts: Compile allele counts at SNPs and at CpGs for bisulfite sequencing data

get_allele_counts(
  i,
  patient_id,
  sample_id,
  sex,
  bam_file,
  mq = 0,
  path,
  path_to_CAMDAC,
  build = NULL,
  n_cores,
  test = FALSE
)

Arguments

i

Integer loop index. The function must be run once for every value from 1 to 25; each value processes 1/25th of the RRBS-covered genome.
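The requirement to cover every index can be sketched as a simple loop. This is a minimal illustration, not the package's prescribed workflow: all argument values below are hypothetical placeholders, and the call assumes that get_allele_counts is exported from an installed package named CAMDAC. The guard skips the actual run when the package or the BAM file is missing.

```r
# Hypothetical inputs: replace every value with your own.
args_common <- list(
  patient_id     = "P1",
  sample_id      = "T1",
  sex            = "XX",
  bam_file       = "/path/to/T1.bam",
  mq             = 0,
  path           = "/path/to/workdir",
  path_to_CAMDAC = "/path/to/CAMDAC/",
  build          = "hg38",
  n_cores        = 4,
  test           = FALSE
)

# One argument set per subset index, i = 1..25.
subset_indices <- seq_len(25)
calls <- lapply(subset_indices, function(i) c(list(i = i), args_common))

# Assumption: the function is available as CAMDAC::get_allele_counts.
# The loop only runs when the package is installed and the BAM file exists.
if (requireNamespace("CAMDAC", quietly = TRUE) &&
    file.exists(args_common$bam_file)) {
  for (a in calls) {
    do.call(CAMDAC::get_allele_counts, a)
  }
}
```

Because each index is independent of the others, the 25 calls could also be submitted as separate jobs (for example one cluster array task per value of i), provided they all share the same path.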

patient_id

Character variable containing the patient id

sample_id

Character variable with the sample id

sex

Character variable with the patient sex expressed as "XX" for female or "XY" for male.

bam_file

Character variable with the full path and file name of the BAM file

mq

Character or numeric variable containing the mapping quality threshold to be used. For RRBS, set mq = 0: read mapping validity is assessed from the read start site and nucleotides rather than from mq.

path

Character variable with the path to the desired working directory. All output is stored here, so the value should be the same for all CAMDAC functions. This function writes its output to the sub-directory "./Allelecounts/sample_id/" of path. Do not alter this directory structure while running CAMDAC, as subsequent functions look for files in it.
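As an illustration of the layout described above, the output location for a sample can be derived from path and sample_id. The helper name allelecounts_dir and the example values are hypothetical; only the "Allelecounts/sample_id" structure comes from this page.

```r
# Hypothetical helper: build the expected output directory for one sample.
allelecounts_dir <- function(path, sample_id) {
  file.path(path, "Allelecounts", sample_id)
}

out_dir <- allelecounts_dir("/path/to/workdir", "T1")
print(out_dir)
```

Deriving the directory from the same path value in every step avoids hard-coding it in several places, which helps keep the structure consistent across CAMDAC functions.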

path_to_CAMDAC

Character variable containing the CAMDAC installation path (e.g. "/path/to/CAMDAC/").

build

Character variable naming the reference genome build used for alignment. CAMDAC is compatible with "hg19", "hg38", "GRCH37" and "GRCH38".

n_cores

Numerical value corresponding to the number of cores for parallel processing

test

Logical value indicating whether this is a quick test run with data subsampling

Value

One .fst file containing methylation information at CpGs, and BAF and depth of coverage at SNPs, for the ith subset of RRBS loci
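A quick way to inspect the result is to list the .fst files in the sample's output directory and read one of them. This is a sketch under assumptions: the directory below is a placeholder, the exact file names and columns are not specified on this page, and reading relies on the separate fst package (fst::read_fst), which is only used if it is installed.

```r
# Hypothetical helper: list the .fst files found in an output directory.
list_fst_files <- function(out_dir) {
  list.files(out_dir, pattern = "\\.fst$", full.names = TRUE)
}

# Placeholder location; use your own path and sample_id.
fst_files <- list_fst_files("/path/to/workdir/Allelecounts/T1")
message(length(fst_files), " .fst file(s) found")

# Read the first file only if one exists and the fst package is available.
if (length(fst_files) > 0 && requireNamespace("fst", quietly = TRUE)) {
  counts <- fst::read_fst(fst_files[1])
  str(counts)  # inspect the columns rather than assuming their names
}
```

Since each run produces one file per subset, comparing the number of files found with the number of indices already processed is a simple completeness check before moving on to later steps.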