PGRS and MPTR are members of the PE and PPE multigene families

FIG. 1. Distribution of mapped and sequenced insertion sequence elements on the genome of Mycobacterium tuberculosis H37Rv. The positions of the previously mapped IS6110 and IS1081 are shown. Based on analysis of about 70% of the genome sequence, new insertion sequence elements and prophages were identified and their approximate location determined with respect to the integrated map (Philipp et al 1996a). Insertion sequence elements are provisionally referred to by the corresponding cosmid name and the prophages designated as φRv1 and φRv2. The locations of a few genetic markers and the direct repeat (DR) region harbouring the progenitor copy of IS6110 are indicated.

During a number of molecular epidemiological studies performed to investigate the relatedness of strains of the tubercle bacillus, two families of repetitive DNA were detected. MPTR (Hermans et al 1992) was believed to consist of numerous tandem copies of the sequence GCCGGTGTTG (or its complement) separated by 5 bp spacers, whereas PGRS (Poulet & Cole 1995a,b) comprised multiple copies of the motif CGGCGGCAA. Initial mapping work with H37Rv suggested that at least 26 loci harboured PGRS elements (Philipp et al 1996a, Poulet & Cole 1995a), whereas MPTR appeared to be even more abundant (Hermans et al 1992). Although it was initially felt that both PGRS and MPTR corresponded to families of dispersed, non-coding repeats, it is now clear that they belong to multigene families encoding proteins that are rich in Gly, Ala and Asn, and to a lesser extent Ser and Thr (data not shown). Multiple sequence alignments performed with the putative PGRS and MPTR proteins revealed that they were members of two larger protein families that are now referred to as the PE and PPE families, respectively, and that these shared a common organization.
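The repeat structures described above can be illustrated with a short scan of a DNA string. The following Python sketch is not the method used in the cited studies; it simply assumes exact matches to the two motifs quoted in the text and a fixed 5 bp spacer, and the function names (`count_motif`, `find_mptr_arrays`) and the `min_copies` threshold are hypothetical.

```python
import re

MPTR_UNIT = "GCCGGTGTTG"  # 10 bp MPTR repeat unit quoted in the text
PGRS_UNIT = "CGGCGGCAA"   # 9 bp PGRS motif quoted in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str) -> int:
    """Count exact (possibly overlapping) occurrences of a motif on both strands."""
    seq = seq.upper()
    total = 0
    # A set avoids double counting when a motif equals its own reverse complement.
    for variant in {motif, reverse_complement(motif)}:
        total += len(re.findall(f"(?={variant})", seq))
    return total


def find_mptr_arrays(seq: str, spacer: int = 5, min_copies: int = 2):
    """Find runs of the MPTR unit separated by fixed-length spacers.

    Only the given strand is scanned; pass reverse_complement(seq) to scan
    the other strand. Returns a list of (start, end, copies) tuples.
    """
    seq = seq.upper()
    pattern = re.compile(
        f"{MPTR_UNIT}(?:[ACGT]{{{spacer}}}{MPTR_UNIT}){{{min_copies - 1},}}"
    )
    period = len(MPTR_UNIT) + spacer
    hits = []
    for match in pattern.finditer(seq):
        # An array of n copies spans n * unit + (n - 1) * spacer bases.
        copies = (match.end() - match.start() + spacer) // period
        hits.append((match.start(), match.end(), copies))
    return hits
```

Real repeats are degenerate, so an exact-match scan like this would undercount them; it is meant only to make the "unit plus 5 bp spacer" layout concrete.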

Members of the PE protein family all have a highly conserved N-terminal domain of ~110 amino acid residues, followed by a C-terminal segment that varies in size, sequence and repeat copy number (Fig. 2). The name PE derives from the motif Pro-Glu (PE), which is found in almost all cases at positions 8 and 9. Based on preliminary analysis of about 70% of the genome sequence, the family is expected to contain ~80 members. Phylogenetic analysis of the relatedness of the members of the PE family shows them to fall into two groups, the larger of which contains roughly 65% of proteins carrying multiple copies of Gly-Gly-Ala repeats, corresponding to the PGRS motif, or Gly-Gly-Asn repeats, whereas members of the other group share limited sequence similarity in their C-terminal domains (Fig. 3). The predicted molecular weights of the PE proteins, which are all acidic, vary considerably because a few members contain only the ~110 amino acid N-terminal domain, while the majority have C-terminal extensions ranging in size from 100 to >500 residues.
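The PE features described above (the Pro-Glu motif at positions 8 and 9, the ~110-residue N-terminal domain, and Gly-Gly-Ala or Gly-Gly-Asn repeats in the C-terminal segment) can be checked on a protein sequence in one-letter code. The Python sketch below is illustrative only: the function names, the hard cut at residue 110, and the `min_repeats` threshold are assumptions, not criteria taken from the source.

```python
import re

N_TERMINAL_LENGTH = 110                  # approximate size of the conserved PE domain
GLY_RICH_REPEAT = re.compile(r"GG[AN]")  # Gly-Gly-Ala or Gly-Gly-Asn


def has_pe_motif(protein: str) -> bool:
    """True if Pro-Glu occupies positions 8 and 9 (1-based) of the sequence."""
    return protein[7:9].upper() == "PE"


def describe_pe_protein(protein: str, min_repeats: int = 3) -> dict:
    """Summarise PE-like features of a protein given in one-letter code.

    min_repeats is a hypothetical threshold for calling the C-terminal
    segment PGRS-like; the source gives no such cut-off.
    """
    protein = protein.upper()
    c_terminal = protein[N_TERMINAL_LENGTH:]
    # findall counts non-overlapping matches, scanning left to right.
    repeats = len(GLY_RICH_REPEAT.findall(c_terminal))
    return {
        "pe_motif": has_pe_motif(protein),
        "c_terminal_length": len(c_terminal),
        "gly_rich_repeats": repeats,
        "pgrs_like": repeats >= min_repeats,
    }
```

Because the text says the motif is found at these positions in "almost all cases", a strict positional test like `has_pe_motif` would miss some genuine family members; an alignment-based approach would be needed for real annotation.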

Like the PE family, the PPE protein family (Fig. 2) has a conserved N-terminal domain, which comprises ~180 amino acid residues and has a Pro-Pro-Glu (PPE) motif at positions 7 to 9, followed by C-terminal segments that vary considerably in sequence and length. The PPE family of acidic proteins (mean pI = 4.4) is expected to contain >20 members, and these fall into at least three groups. One group constitutes the MPTR class, which is characterized by the presence of multiple, tandem copies of the motif Asn-X-Gly-X-Gly-Asn-X-Gly. The second group contains a characteristic, well-conserved motif around position 350 (Gly-X-X-Ser-Val-Pro-X-X-Trp), whereas the third group contains proteins that are unrelated except for the presence of the common 180-residue PPE domain.
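The three PPE groups described above can be expressed as simple pattern tests, with "X" in each motif read as "any residue". The Python sketch below is a rough illustration under several assumptions: the group labels, the `min_repeats` threshold and the function name `classify_ppe` are hypothetical, repeats are counted without checking that they are strictly tandem, and the "around position 350" constraint is ignored.

```python
import re

# Motifs transcribed from the text into one-letter code; "." stands for X.
MPTR_REPEAT = re.compile(r"N.G.GN.G")     # Asn-X-Gly-X-Gly-Asn-X-Gly
CONSERVED_MOTIF = re.compile(r"G..SVP..W")  # Gly-X-X-Ser-Val-Pro-X-X-Trp


def has_ppe_motif(protein: str) -> bool:
    """True if Pro-Pro-Glu occupies positions 7 to 9 (1-based)."""
    return protein[6:9].upper() == "PPE"


def classify_ppe(protein: str, min_repeats: int = 2) -> str:
    """Assign a protein to one of the PPE groups sketched in the text.

    Returns "not PPE", "MPTR", "SVP" or "other"; these labels are
    illustrative and do not come from the source.
    """
    protein = protein.upper()
    if not has_ppe_motif(protein):
        return "not PPE"
    # Non-overlapping matches anywhere in the sequence; tandem arrangement
    # is not verified in this simplified version.
    if len(MPTR_REPEAT.findall(protein)) >= min_repeats:
        return "MPTR"
    if CONSERVED_MOTIF.search(protein):
        return "SVP"
    return "other"
```

For example, a toy sequence such as `"MSFVTTPPE" + "A" * 20 + "NAGAGNAG" * 3` would be assigned to the MPTR group, whereas the same N-terminus followed only by alanines would fall into the third, otherwise unrelated group.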

At the time of writing, there is little information concerning the significance, subcellular location and biological roles of the PE and PPE protein families which, if expressed together, would represent, numerically, about 2% of the protein species present in M. tuberculosis. Genes homologous to PGRS and

[FIG. 2, labels only: PE family; ~110; (GGAGGA)n]
