Fig. 2 Genomic arrangement of clinical HCMV strains. HCMV ORFs that are conserved in the five sequenced HCMV strains (FIX, PH, TR, Toledo and Merlin) are arranged in the conventional HCMV map organization. ORFs are represented as arrows indicating their relative orientation, and, where applicable, black caret symbols connect exons. In several cases, e.g., UL37, UL122 and UL123, numerous spliced variants are known, but only one abundant variant is shown on the map. The color codes designate ORFs that are essential (red arrows), augmenting (yellow arrows) or nonessential (green arrows) for replication within cultured fibroblasts. Gray arrows represent ORFs that have not yet been tested for function. Red, yellow, green and gray ORFs are conserved in CCMV; white ORFs are not conserved in CCMV. ORFs with an asterisk do not contain an AUG > 80 codons from a stop codon. The three blue boxes represent the repeat sequences found at the ends of the unique long and unique short regions. The orange pins designate the locations of virus-coded miRNAs. Their placement above or below the sequence line designates the strand on which they are encoded. Each tick mark on the black sequence line represents 1 kb of DNA.
isolates, but are not found in the CCMV genome. Finally, 20 microRNAs (miRNAs), predicted to be encoded by HCMV (Dunn et al. 2005; Grey et al. 2005; Pfeffer et al. 2005), are identified as orange pins. So far, expression of 14 of the miRNAs has been demonstrated (see the chapter by P.J.F. Rider et al., this volume).
Further details of the full set of 232 ORFs are presented in Table 2, which includes previously annotated ORFs that did not pass our filters for inclusion on the map. As is evident in Table 2, UL147a and UL148a are each missing from only the PH clinical isolate, and both are present in CCMV. Thus, they are likely bona fide ORFs.
The map in Fig. 2 has numerous uncertainties. Several relate to the filters used previously to qualify an ORF as a potential protein-coding sequence, and, therefore, for inclusion in the database from which we selected ORFs. First, the majority of previously annotated ORFs were required to encode polypeptides meeting a minimum size standard, often 80 amino acids or more, as is evident in Table 2. This is an arbitrary cutoff, adopted for practical reasons, but, of course, there is no reason to assume that HCMV does not encode smaller polypeptides. As a case in point, analysis of the proteins associated with HCMV virions (Varnum et al. 2004) raises the possibility that the virus encodes some very small polypeptides. In this study, mass spectrometry was employed to identify proteins in preparations of purified virus particles. The analysis identified 12 tryptic-digestion products corresponding to polypeptides encoded by ORFs that were not previously recognized. Several of the ORFs encode polypeptides of fewer than 80 amino acids, and one has a coding potential of 22 amino acids. This polypeptide might be the result of spurious transcription/translation late after infection, or the polypeptide or a portion of it could be appended to a larger protein as a consequence of splicing. Although it is not possible to conclude from this data set that the virus encodes a 22-amino acid polypeptide, the observation nevertheless reinforces the very likely possibility that the virus encodes small polypeptides that have been overlooked.
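The effect of a minimum-size filter of this kind can be illustrated with a short Python sketch. It is not the software used for any published HCMV annotation: it assumes a simplified ORF definition (an ATG followed in frame by a stop codon), scans only the forward strand, and the names `find_orfs` and `min_codons` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def find_orfs(seq, min_codons=80):
    """Return (start, end, n_codons) for forward-strand ORFs.

    An ORF here runs from the first ATG after a stop codon to the next
    in-frame stop codon. Coordinates are 0-based and half-open, and
    n_codons counts coding codons only (the stop codon is excluded).
    ORFs shorter than min_codons are discarded, mimicking a size filter.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None
    return orfs


# A toy sequence with 22 coding codons followed by a stop codon.
example = "ATG" + "GCC" * 21 + "TAA"
print(find_orfs(example))                 # [] under the 80-codon cutoff
print(find_orfs(example, min_codons=22))  # [(0, 69, 22)]
```

With the 80-codon cutoff the toy ORF is silently dropped, which is the situation the 22-amino acid example in the text describes: a short coding region never enters the candidate pool unless the threshold is relaxed.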
A second uncertainty comes from overlap restrictions placed on the pool of previously annotated ORFs. An ORF on one strand can potentially bias the sequence of the opposing strand (Silke 1997; Cebrat et al. 1998), and the high G+C content of HCMV (57%) potentially favors the presence of spurious ORFs since stop codons are A+U-rich. In past annotations, the shorter of two overlapping ORFs has been arbitrarily excluded when the overlap amounted to 60% or more, or in other cases 25% or more, of its length; alternatively, the overlap has been limited to 396 bp, the longest overlap documented for two HCMV ORFs known to encode proteins (UL76 and UL77). It is certainly possible that, in some instances, functional ORFs have evolved with longer overlaps.
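Both points in this paragraph can be made concrete with a small Python sketch. The function names are hypothetical, ORF coordinates are assumed to be 0-based and half-open, and the stop-codon estimate rests on a deliberately crude model: independent bases with A = T and G = C, which real genomes do not obey.

```python
def overlap_of_shorter(orf_a, orf_b):
    """Return (overlap_bp, fraction of the shorter ORF that is overlapped).

    Each ORF is a (start, end) pair in half-open coordinates.
    """
    overlap = max(0, min(orf_a[1], orf_b[1]) - max(orf_a[0], orf_b[0]))
    shorter = min(orf_a[1] - orf_a[0], orf_b[1] - orf_b[0])
    return overlap, overlap / shorter


def stop_codon_probability(gc_content):
    """Chance that a random codon is TAA, TAG or TGA at a given G+C fraction."""
    at = (1 - gc_content) / 2  # P(A) = P(T) under the simplified model
    gc = gc_content / 2        # P(G) = P(C) under the simplified model
    return at ** 3 + 2 * at * at * gc  # TAA + (TAG and TGA)


print(overlap_of_shorter((0, 1000), (800, 1200)))  # (200, 0.5)
print(stop_codon_probability(0.50))                # 0.046875
print(round(stop_codon_probability(0.57), 4))      # 0.0363
```

Under this model a stop codon is expected about once every 21 codons at 50% G+C but only about once every 28 codons at 57% G+C, so random open stretches run longer in a G+C-rich genome. The overlap function only reports the measurement; which threshold to apply (60%, 25% or 396 bp) and which ORF to discard remain the arbitrary choices discussed in the text.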
Another significant uncertainty in the map in Fig. 2 stems from our incomplete understanding of HCMV splicing. It is not possible to predict splice donors and acceptors with certainty. A variety of spliced mRNAs have been successfully identified (e.g., Stenberg et al. 1984; Rawlinson and Barrell 1993; Scott et al. 2002; Adair et al. 2003), but so far, there has been no exhaustive experimental search for spliced HCMV mRNAs. Splicing can, of course, combine ORFs originally assumed to be separate, or utilize small coding regions as a constituent of a larger mRNA.
The majority of the 173 ORFs that are present in all HCMV clinical isolates and in CCMV are extremely likely to encode proteins. Indeed, 130 of these ORFs have