Genes and protein structural elements are delineated by DNA mirror repeats and characterized by net dinucleotide composition

  • Dorothy Lang

    Student thesis: Doctoral Thesis


    Dinucleotide composition and sequence repeats are attributes of DNA that have been used for more than 40 years to evaluate sequence identity and function, phylogeny and evolution. This thesis provides new perspectives on dinucleotide composition and DNA organization through the development and application of several novel methods of analyzing sequence composition and repeats. The key aspects of the new methods are the identification of “net dinucleotide counts”, i.e. r XY - s YX = (r - s) netXY, where r and s are the numbers of occurrences of a dinucleotide XY and of its reverse YX, and the systematic identification of imperfect mirror repeats (IMRs), i.e. sequence segments that have an axis of symmetry on a single strand and a position determined by sequential evaluation of a sequence from 5’ to 3’. These two measurements were found to be different perspectives on related components. For example, the segment ACGTGCA is a perfect mirror repeat with a net dinucleotide value of zero. Analysis of the distribution of IMRs found that they coincide with protein structural elements (PSEs). The net dinucleotide value of a sequence is therefore a measure of those dinucleotides that are not principally associated with the span of PSEs. Many sequences were found to consist of a limited number of net dinucleotides that can readily be combined, e.g. 2nAC + 2nCT + 2nTA = 2nACTA. Each of these combined values is called a circuit, and has both a qualitative and a quantitative component. If a sequence consists of several circuits, they are collectively referred to as the circuit assemblage of the sequence. Circuits and circuit assemblages were found to differentiate phylogeny and function in a manner consistent with traditional phylogenetic and functional classification at all levels of genome organization.
    Analysis of mirror repeats and PSEs reveals that genes and genomes are hierarchically ordered, that the hierarchical order is based on the distribution of reverse dinucleotide pairs and sequence repeats, that the span of PSEs (helices, sheets and turns) coincides with DNA mirror repeats, and that many key aspects of protein function can be inferred from repeat motifs in the DNA from which the protein is translated.
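The net dinucleotide count and the perfect mirror repeat described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's own implementation. The function names are hypothetical, and two choices are assumptions made only for illustration: dinucleotides are counted in overlapping windows along a single strand, and homodinucleotides (AA, CC, GG, TT), which are their own reverse, are left out of the net count. Identifying imperfect mirror repeats is not attempted here.

```python
from collections import Counter


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides along one strand (assumed counting scheme)."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def net_dinucleotide_counts(seq):
    """Return {XY: r - s} for every dinucleotide XY whose count r exceeds
    the count s of its reverse YX. Pairs that cancel exactly are omitted."""
    counts = dinucleotide_counts(seq)
    net = {}
    for xy, r in counts.items():
        yx = xy[::-1]
        if xy == yx:
            # Assumption: AA, CC, GG, TT are their own reverse and are skipped.
            continue
        s = counts.get(yx, 0)
        if r > s:
            net[xy] = r - s
    return net


def is_perfect_mirror_repeat(seq):
    """A perfect mirror repeat reads the same in both directions on one strand."""
    seq = seq.upper()
    return len(seq) > 1 and seq == seq[::-1]


if __name__ == "__main__":
    print(is_perfect_mirror_repeat("ACGTGCA"))
    print(net_dinucleotide_counts("ACGTGCA"))
    print(net_dinucleotide_counts("ACTACTA"))
```

Under these assumptions, the abstract's example ACGTGCA is reported as a perfect mirror repeat with an empty net count, because each of AC, CG and GT is cancelled by one occurrence of its reverse. The made-up sequence ACTACTA leaves a net count of 2 for each of AC, CT and TA, the same three dinucleotides that appear in the abstract's circuit example.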
    Date of Award: May 2009
    Original language: English
    Supervisors: John W. Palfreyman & Douglas H. Lester
