Overview

Reversion mutations are secondary mutations that counteract the deleterious effects of an original pathogenic mutation, partially or fully restoring the gene’s function. They are a key mechanism by which cancer cells develop resistance to targeted therapies such as PARP inhibitors, which target DNA damage repair in cancers with BRCA1/2 mutations. Detecting reversion mutations can help explain treatment failure and predict resistance, and monitoring them in blood samples (ctDNA) during treatment can offer an early warning of acquired resistance.

The revert package detects reversions for a specific pathogenic mutation from BAM files of DNA-seq data. It locally realigns reads in flanking windows surrounding the pathogenic mutation, with permissive gap opening for soft-clipped reads and adjustments that account for the pathogenic mutation, and identifies reversion mutations that restore the open reading frame of the reference gene or the reference sequence. Examples include:

  • secondary indels that convert the original frameshift insertion or deletion into an in-frame indel

  • secondary SNVs that restore the codon altered by the original nonsense or missense SNV

  • indels or SNVs that replace the original pathogenic mutation

  • secondary SNVs that create a cryptic splice donor/acceptor site or a cryptic start/stop codon
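The frame-restoring case can be illustrated with a small base R sketch. The helper `restores_frame()` is hypothetical and is not part of the revert API; it only checks whether the net length change of the indels on a read is a multiple of three, which is the arithmetic behind "converting a frameshift into an in-frame indel".

```r
# Hypothetical helper, not part of revert: does a set of indels carried by
# the same read leave the coding frame intact?
# Each length change is (inserted bases) minus (deleted bases).
restores_frame <- function(pathogenic_change, secondary_changes = integer(0)) {
  net_change <- pathogenic_change + sum(secondary_changes)
  net_change %% 3 == 0
}

# A 2-bp frameshift deletion on its own shifts the frame.
restores_frame(-2)

# A secondary 1-bp deletion brings the net change to -3, which is in frame.
restores_frame(-2, -1)

# A secondary 2-bp insertion cancels the deletion (net change 0).
restores_frame(-2, 2)
```

This is only the length arithmetic. A real reversion call also has to consider where the mutations fall in the transcript and whether the shifted stretch between them introduces a stop codon, which this sketch ignores.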

The revert package is designed to be applicable to most types of DNA-seq data such as ctDNA, WES, WGS and targeted amplicon sequencing (TAS). To start using revert quickly, see the Examples section.

Prerequisite

  • R >= 4.4.0

Inputs

Required information for running revert

  • A BAM file containing the aligned reads to be processed (see BAM file preparation below for recommendations)

  • A path to the directory where output files are written

  • The reference genome version (hg19/hg38/mm10) or a FASTA file containing the open reading frames of reference sequences

  • Genomic position of the pathogenic mutation in HGVS-like syntax for a substitution, deletion, insertion, deletion-insertion (delins), or duplication, e.g., “chr13:g.32913778T>G”, “chr13:g.32913319_32913320delTG”, “chr17:g.41244706_41244707insT”, “chr17:g.41244936delinsAA”, “Brca2_5805_wt:117del”

  • Gene name and Ensembl transcript ID of the pathogenic mutation when the reference genome is hg19, hg38 or mm10

  • Optional parameters, listed with their default values

    | Parameter | Description | Default |
    |---|---|---|
    | detection.window | length of the flanking regions added to both ends of the pathogenic mutation locus for detecting reversion mutations | 100 |
    | splice.region | length of the splice junction region considered in introns | 8 |
    | check.soft.clipping | whether to realign soft-clipped reads | TRUE |
    | softClippedReads.realign.window | length of the flanking regions added to both ends of the pathogenic mutation locus for realigning soft-clipped reads | 1000 |
    | softClippedReads.realign.match | scoring for a nucleotide match when realigning soft-clipped reads | 1 |
    | softClippedReads.realign.mismatch | scoring for a nucleotide mismatch when realigning soft-clipped reads | 4 |
    | softClippedReads.realign.gapOpening | cost of opening a gap when realigning soft-clipped reads | 6 |
    | softClippedReads.realign.gapExtension | incremental cost incurred along the length of a gap when realigning soft-clipped reads | 0 |
    | check.wildtype.reads | whether to process wild-type reads as revertant-to-wild-type reads | FALSE |
    | is.paired.end | whether reads in the BAM file are paired-end (TRUE) or single-end (FALSE) | TRUE |
    | keep.duplicate.reads | whether duplicate reads in the BAM file are processed (TRUE) or discarded (FALSE) | TRUE |
    | keep.secondary.alignment | whether secondary alignments in the BAM file are processed (TRUE) or discarded (FALSE) | TRUE |
    | keep.supplementary.alignment | whether supplementary alignments in the BAM file are processed (TRUE) or discarded (FALSE) | TRUE |
    | minimum.mapping.quality | minimum mapping quality of reads in the BAM file to be processed | 0 |
    | verbose | whether to print progress logging to stdout | TRUE |
    | out.failed.reads | whether to write the names of failed reads to the ‘.failed_reads.txt’ file | FALSE |
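To make the HGVS-like syntax listed above concrete, here is a minimal base R sketch that splits such a string into its parts. The function `parse_mutation()`, its regular expression, and the "DUP" label are assumptions made for illustration; they do not describe how revert parses `pathog.mut` internally.

```r
# Hypothetical parser for the HGVS-like strings shown in the Inputs list.
# It is an illustration only and not revert's own parsing code.
parse_mutation <- function(x) {
  # <sequence name>:[g.]<start>[_<end>]<change>
  pattern <- "^([^:]+):(g\\.)?([0-9]+)(_([0-9]+))?([A-Za-z>]+)$"
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) == 0) stop("Unrecognised mutation string: ", x)

  start  <- as.integer(m[4])
  # A single-position mutation has no explicit end coordinate.
  end    <- if (nzchar(m[6])) as.integer(m[6]) else start
  change <- m[7]

  # Test "delins" before "del" so that delins is not misread as a deletion.
  type <- if (grepl(">", change, fixed = TRUE)) {
    "SNV"
  } else if (startsWith(change, "delins")) {
    "DELINS"
  } else if (startsWith(change, "del")) {
    "DEL"
  } else if (startsWith(change, "ins")) {
    "INS"
  } else if (startsWith(change, "dup")) {
    "DUP"
  } else {
    stop("Unsupported change: ", change)
  }

  list(seqname = m[2], start = start, end = end, type = type, change = change)
}

# The example strings from the Inputs list.
examples <- c("chr13:g.32913778T>G",
              "chr13:g.32913319_32913320delTG",
              "chr17:g.41244706_41244707insT",
              "chr17:g.41244936delinsAA",
              "Brca2_5805_wt:117del")
vapply(examples, function(s) parse_mutation(s)$type, character(1))
```

The sequence name is taken as everything before the colon, so custom reference names containing underscores, such as “Brca2_5805_wt”, are handled the same way as chromosome names.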

BAM file preparation

Many state-of-the-art NGS aligners offer clipping modes that improve alignment accuracy by focusing on the high-confidence, well-aligned part of a read and discarding (hard-clipping) or ignoring (soft-clipping) the unaligned parts. Unaligned parts can be caused by adapters, large indels or translocations; large indels or translocations may indicate large genomic rearrangements (LGRs) that partially restore the gene’s function. The revert package realigns soft-clipped reads in flanking windows surrounding the pathogenic mutation with permissive gap opening to identify such LGR reversions. To improve the sensitivity of reversion detection, it is recommended to generate BAM files with a standard aligner in soft-clipping mode, e.g., by enabling -Y for bwa mem or --local for bowtie2.
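A quick way to see whether a BAM file retains soft-clipped reads is to inspect the CIGAR strings, where soft clipping appears as an `S` operation at either end. The sketch below uses base R only; the helper `soft_clipped_bases()` and the CIGAR strings are made up for illustration. In practice the CIGAR strings would be read from the BAM file, for example with a BAM reader such as Rsamtools.

```r
# Hypothetical helper: number of soft-clipped bases at each end of a CIGAR
# string. A leading or trailing hard clip (H) may sit outside the soft clip.
soft_clipped_bases <- function(cigar) {
  left  <- regmatches(cigar, regexec("^([0-9]+H)?([0-9]+)S", cigar))[[1]]
  right <- regmatches(cigar, regexec("([0-9]+)S([0-9]+H)?$", cigar))[[1]]
  c(left  = if (length(left))  as.integer(left[3])  else 0L,
    right = if (length(right)) as.integer(right[2]) else 0L)
}

# Made-up CIGAR strings standing in for reads from a BAM file.
cigars <- c("100M", "30S70M", "60M40S", "5H20S75M")

# Flag reads with any soft-clipped bases and report their share.
has_soft_clip <- vapply(cigars,
                        function(x) sum(soft_clipped_bases(x)) > 0,
                        logical(1))
mean(has_soft_clip)
```

If no reads near the pathogenic mutation carry an `S` operation, the aligner may have been run without soft clipping, and realignment of soft-clipped reads would then have nothing to work with.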

Outputs

The function getReversions() writes the following result files to the output directory:

  • ‘.reversions.txt’ contains all reversions identified for the pathogenic mutation from the BAM file.

    | Column | Description |
    |---|---|
    | pathogenic_mutation | the original pathogenic mutation |
    | pathogenic_mutation_left_aligned | left-aligned position of the pathogenic mutation if it is an insertion or deletion |
    | reversion_id | unique identifier of the reversion |
    | reversion_frequency | number of reads carrying the reversion |
    | pathogenic_mutation_retained | whether the pathogenic locus retained the original mutation (Yes), acquired a different mutation (No), or reverted to wild type (WT) |
    | reversion | the reversion for the pathogenic mutation, consisting of one or more mutations |
    | reads_total | total number of reads aligned to the pathogenic mutation locus |
    | reads_wildtype | number of reads that are wild type at the pathogenic mutation locus |
    | reads_withPathogenicMutation | number of reads carrying the pathogenic mutation |
    | reads_withReplacementMutation | number of reads carrying a different mutation, but not the pathogenic mutation, at the pathogenic locus |
    | mutations_in_reversion | number of mutations included in the reversion |

  • ‘.split_mutations.txt’ contains information on each single mutation in a reversion.

    | Column | Description |
    |---|---|
    | reversion_id | unique identifier of a reversion, corresponding to the ‘reversion_id’ in ‘.reversions.txt’ |
    | mutation_id | unique identifier of each single mutation in a reversion |
    | mutation_type | SNV, INS, DEL, DELINS or WT (self-revertant mutation represented by MT>WT) |
    | mutation | genomic position of the mutation in HGVS-like syntax |
    | mutation_length_change | change in length of the reference sequence caused by the mutation |
    | pathogenic_mutation | the original pathogenic mutation |
    | distance_to_pathogenic_mutation | distance in the reference sequence between the mutation and the pathogenic mutation |

  • ‘.revert_assembly.bam’ contains all reads realigned to the pathogenic mutation. An RG tag is added to each realigned read, assigning it to one of two read groups, ‘Revertant’ or ‘NonRevertant’. The revert-assembled BAM file can be loaded into IGV to visualize reversions.

  • ‘.revert_assembly.bam.bai’ is the index file for ‘.revert_assembly.bam’.

  • ‘.revert_settings.txt’ contains a summary of the running parameters and processed reads.

  • ‘.failed_reads.txt’ (optional) contains the names of reads that failed reversion detection.
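As a sketch of how the ‘.reversions.txt’ columns might be used downstream, the base R snippet below ranks reversions by the fraction of reads supporting them. The rows, including the `reversion_id` values, are made up for illustration and are not revert output. Reading a real file with `read.delim()` assumes it is tab-delimited with a header row, which should be checked against an actual output file.

```r
# Made-up rows mimicking a few columns of '.reversions.txt'.
# With real output one might instead use something like
#   reversions <- read.delim(path_to_reversions_txt, stringsAsFactors = FALSE)
# where path_to_reversions_txt is a placeholder and the tab-delimited layout
# is an assumption.
reversions <- data.frame(
  reversion_id = c("rev1", "rev2", "rev3"),
  reversion_frequency = c(12L, 3L, 5L),
  reads_total = c(200L, 200L, 200L),
  pathogenic_mutation_retained = c("Yes", "No", "WT"),
  stringsAsFactors = FALSE
)

# Fraction of reads at the pathogenic locus that carry each reversion.
reversions$read_fraction <-
  reversions$reversion_frequency / reversions$reads_total

# Rank reversions from most to least supported.
ranked <- reversions[order(-reversions$read_fraction), ]
ranked[, c("reversion_id", "reversion_frequency", "read_fraction")]

# Overall fraction of reads carrying any reversion.
total_fraction <- sum(reversions$reversion_frequency) / reversions$reads_total[1]
total_fraction
```

Tracking this overall fraction across serial ctDNA samples is one simple way to follow the monitoring use case described in the Overview.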

Examples

Reversion detection for a frameshift deletion

library(revert)

getReversions(
    bam.file = system.file("extdata", "toy_data_1.bam", package="revert"),
    out.dir = tempdir(),
    reference = "hg19",
    pathog.mut = "chr13:g.32913319_32913320delTG",
    gene.name = "BRCA2",
    transcript.id = "ENST00000544455" )

Reversion detection for a frameshift insertion

getReversions(
    bam.file = system.file("extdata", "toy_data_2.bam", package="revert"),
    out.dir = tempdir(),
    reference = "hg19",
    pathog.mut = "chr17:g.41244706_41244707insT",
    gene.name = "BRCA1",
    transcript.id = "ENST00000357654" )

Reversion detection for a frameshift deletion-insertion

getReversions(
    bam.file = system.file("extdata", "toy_data_3.bam", package="revert"),
    out.dir = tempdir(),
    reference = "hg19",
    pathog.mut = "chr17:g.41244936delinsAA",
    gene.name = "BRCA1",
    transcript.id = "ENST00000357654" )

Reversion detection for a nonsense SNV

getReversions(
    bam.file = system.file("extdata", "toy_data_4.bam", package="revert"),
    out.dir = tempdir(),
    reference = "hg19",
    pathog.mut = "chr13:g.32913778T>G",
    gene.name = "BRCA2",
    transcript.id = "ENST00000544455" )

Reversion detection for a splice-acceptor SNV

getReversions(
    bam.file = system.file("extdata", "toy_data_5.bam", package="revert"),
    out.dir = tempdir(),
    reference = "hg19",
    pathog.mut = "chr13:g.32928997G>A",
    gene.name = "BRCA2",
    transcript.id = "ENST00000544455" )

Reversion detection for a targeted deletion with a customised reference sequence

getReversions(
    bam.file = system.file("extdata", "toy_data_6.bam", package="revert"),
    out.dir = tempdir(),
    reference = system.file("extdata", "toy_data_6_reference.fa", package="revert"),
    pathog.mut = "Brca2_5805_wt:117del",
    softClippedReads.realign.gapOpening = 8,
    check.wildtype.reads = TRUE )

Acknowledgements

Development of revert was supported by Breast Cancer Now.