Synonymous Constraint Track Hub

Description

These tracks represent regions of protein-coding sequences in which the rate of synonymous substitutions during mammalian evolution has been significantly lower or higher than in the rest of that gene. There are two tracks: Synonymous Constraint Elements (SCEs) have had lower than normal synonymous substitution rates while Synonymous Acceleration Elements (SAEs) have had higher than normal rates.

Synonymous Constraint Elements contain functional elements that impose constraint in addition to the constraint imposed on the amino acid sequence. Such overlapping elements could include splicing regulatory elements, dual-coding regions, RNA secondary structures, microRNA target sites, and developmental enhancers (Lin et al. 2011).

Synonymous Acceleration Elements appear to lie in highly mutable areas of genes. They are enriched for de novo mutations and for low-frequency, non-private single-nucleotide variants (SNVs).

A third track, Regions Searched, shows the coding regions that were searched for SCEs and SAEs. This track can be used to distinguish regions that lack SCEs and SAEs from those that were not studied. (This track is unavailable for hg19.)
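As a rough illustration of how the three tracks can be combined, the sketch below classifies a single coordinate. It assumes each track has already been loaded as a list of half-open (start, end) intervals on one chromosome; the function names and the interval representation are hypothetical and are not part of the hub.

```python
def overlaps(pos, intervals):
    """True if pos falls in any half-open (start, end) interval."""
    return any(start <= pos < end for start, end in intervals)


def classify_position(pos, searched, sces, saes):
    """Label a coordinate using the Regions Searched, SCE, and SAE tracks.

    All three arguments are lists of (start, end) tuples for one chromosome
    (an assumed representation, chosen for illustration only).
    """
    if not overlaps(pos, searched):
        # Outside Regions Searched: absence of an element says nothing here.
        return "not searched"
    if overlaps(pos, sces):
        return "SCE"
    if overlaps(pos, saes):
        return "SAE"
    # Searched, but no significant constraint or acceleration was found.
    return "searched, no element"


# Toy intervals, invented for this example.
searched = [(100, 400)]
sces = [(120, 150)]
saes = [(300, 330)]
labels = [classify_position(p, searched, sces, saes) for p in (50, 130, 310, 200)]
```

For hg19, where the Regions Searched track is unavailable, the first check cannot be made, so a position without an element cannot be told apart from one that was never studied.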

Methods

A full description of these tracks can be found in Wolf (2019).

The regions were determined using FRESCo (Sealfon et al. 2015) with 9-codon windows. FRESCo first fits a maximum-likelihood HKY model of nucleotide evolution to the sequence alignment of the coding region of each gene. Using the parameters from the nucleotide model, it then estimates branch lengths and codon model parameters for that gene using a Muse-Gaut 94-type model with an F3x4 estimator of equilibrium codon frequencies. Finally, it runs a scanning window across the alignment. For each window, it fits two models: an alternative model with position-specific synonymous and nonsynonymous substitution rates, and a null model with a position-specific nonsynonymous rate only, in which the synonymous rate is held at the gene-wide average. It then performs a likelihood ratio test to compare the two models. The p-value for each window was Bonferroni-corrected to allow for differences in gene length, and windows were called significant at a threshold of 0.05. We merged neighboring significant windows and designated as SCEs those regions with significantly lower synonymous rates, and as SAEs those with significantly higher synonymous rates, that is, regions significantly depleted or enriched, respectively, for synonymous substitutions relative to the gene-wide average.
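The final steps (per-window likelihood ratio test, Bonferroni correction, and merging of neighboring significant windows) can be sketched as follows. This is a minimal illustration, not the FRESCo implementation. It assumes that the test statistic is compared to a chi-squared distribution with one degree of freedom, that the correction factor is the number of windows in the gene, and that each window carries a synonymous rate expressed relative to a gene-wide average of 1.0. The dictionary keys and function names are hypothetical.

```python
import math


def lrt_p_value(lnl_null, lnl_alt):
    """P-value for a likelihood ratio test with one extra free parameter.

    Assumption: 2 * (lnL_alt - lnL_null) is chi-squared with 1 d.o.f.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-squared with 1 d.o.f. is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def call_elements(windows, alpha=0.05):
    """Return (sces, saes) as merged codon intervals for one gene.

    windows: list of dicts with hypothetical keys 'start', 'end'
    (codon coordinates, end exclusive), 'lnl_null', 'lnl_alt', and
    'syn_rate' (window synonymous rate; gene-wide average assumed 1.0).
    """
    n_tests = len(windows)  # assumed Bonferroni factor: windows per gene
    significant = {"SCE": [], "SAE": []}
    for w in windows:
        p_adj = min(1.0, lrt_p_value(w["lnl_null"], w["lnl_alt"]) * n_tests)
        if p_adj < alpha:
            kind = "SCE" if w["syn_rate"] < 1.0 else "SAE"
            significant[kind].append((w["start"], w["end"]))
    return merge(significant["SCE"]), merge(significant["SAE"])


# Toy log-likelihoods and rates, invented for this example.
windows = [
    {"start": 0, "end": 9, "lnl_null": -100.0, "lnl_alt": -90.0, "syn_rate": 0.2},
    {"start": 1, "end": 10, "lnl_null": -100.0, "lnl_alt": -91.0, "syn_rate": 0.3},
    {"start": 2, "end": 11, "lnl_null": -100.0, "lnl_alt": -99.5, "syn_rate": 0.9},
    {"start": 20, "end": 29, "lnl_null": -100.0, "lnl_alt": -88.0, "syn_rate": 3.0},
]
sces, saes = call_elements(windows)
```

In this toy input the first two windows are significant with low synonymous rates and overlap, so they merge into a single SCE; the third window is not significant; the fourth becomes an SAE. Because longer genes contribute more windows, the per-gene correction factor grows with gene length, which is one way to read the gene-length adjustment described above.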

Alignments were obtained from the UCSC Genome Browser. The hg19 tracks were computed using the 29-mammal alignments, applied to all CCDS release 9 transcripts. The hg38 tracks were computed using a subset of 24 placental mammals from the 100-vertebrate alignments (Human, Chimp, Rhesus, Bushbaby, Chinese_tree_shrew, Squirrel, Mouse, Rat, Guinea_pig, Rabbit, Pika, Alpaca, Dolphin, Cow, Horse, Cat, Dog, Megabat, Microbat, Hedgehog, Shrew, Elephant, Tenrec, and Armadillo), applied to all CCDS release 20 and GENCODE v27 protein-coding transcripts.

Credits

Questions related to the computation of SCEs and SAEs should be directed to Maxim Wolf.

Questions related to the track hub should be directed to Irwin Jungreis.

References

Lin MF, Kheradpour P, Washietl S, Parker BJ, Pedersen JS, Kellis M. Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes. Genome Res. 2011 Nov;21(11):1916-28. doi: 10.1101/gr.108753.110. Epub 2011 Oct 12.

Sealfon RS, Lin MF, Jungreis I, Wolf MY, Kellis M, Sabeti PC. FRESCo: finding regions of excess synonymous constraint in diverse viruses. Genome Biol. 2015 Feb 17;16:38. doi: 10.1186/s13059-015-0603-7.

Wolf MY. Evolutionary and structural signatures of protein-coding function: synonymous acceleration, read-through, and structural impact of mutations. Doctoral dissertation. Massachusetts Institute of Technology. 2019.