True and potential retroviruses are found in a wide range of organisms, from plants to vertebrates. They are primarily recognized by an additional Open Reading Frame (ORF) in the internal region that encodes an envelope (env) polyprotein necessary for cell-to-cell transmission of retroviruses (see Retroviridae family). However, env-like polyproteins are extremely variable and are not readily identified from primary sequence data. Indeed, independent origins have been suggested for the capture of an env-like gene by the different lineages of retroviruses described in protostomes (Malik, Henikoff and Eickbush 2000). Nevertheless, a canonical env polyprotein contains the following elements: a signal peptide, a fusion peptide, an anchor peptide, a peptide cleavage site, glycosylation sites and a C-terminal transmembrane domain (Varmus and Brown 1989; Coffin 1990).
Envs encoded by vertebrate retroviruses are usually composed of two subunits: the surface (SU) and the transmembrane (TM) peptides.
Little is known about what is shared at the primary structure level among all retroviral env polyproteins. However, a conserved amino acid "KRG" motif (also termed "R-X-K-R") has been described preceding a region common to all retroviral env-like polyproteins (Kim et al. 1994; Leblanc et al. 1997; Lerat and Capy 1999; Malik, Henikoff and Eickbush 2000). This motif is the consensus cleavage site recognized by the cellular endopeptidase that cleaves the env precursor into the SU and TM peptides (Steiner 1998). In addition, the template "R-x(2)-R-x(5,6)-[GE]-x(5)-[LV]-x-G-x(2)-D-x(2)-D" has been suggested for the in silico detection of insect retroviral env sequences in databanks (Terzian, Pelisson and Bucheton 2001).
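As an illustration of how such a template might be applied in silico, the following minimal Python sketch converts a PROSITE-style pattern into a regular expression and scans a protein sequence for matches. It is not the procedure used by the cited authors. The function names (prosite_to_regex, find_motifs) and the test sequences are hypothetical, and the template is assumed to be written with a lowercase "x" for "any residue" and with hyphens between all elements.

```python
import re

# Template quoted in the text, written with lowercase "x" and hyphens throughout
# (assumption: "x" means any residue, "x(5,6)" means five or six residues).
ENV_TEMPLATE = "R-x(2)-R-x(5,6)-[GE]-x(5)-[LV]-x-G-x(2)-D-x(2)-D"

# Cleavage-site consensus mentioned in the text, in the same notation.
CLEAVAGE_TEMPLATE = "R-x-K-R"

# One pattern element: "x", a single residue, or a bracketed residue set,
# optionally followed by a repeat count such as "(2)" or "(5,6)".
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a regex string."""
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        token = "." if residue == "x" else residue
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


def find_motifs(sequence: str, pattern: str = ENV_TEMPLATE):
    """Return (0-based start, matched text) for every hit, including overlaps."""
    # The lookahead lets the scan report hits that overlap one another.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    # Synthetic sequence built to satisfy the template; not a real env protein.
    toy = "MKRAARAAAAAGAAAAALAGAADAADKK"
    print(prosite_to_regex(ENV_TEMPLATE))
    print(find_motifs(toy))
    print(find_motifs("AAARSKRGG", CLEAVAGE_TEMPLATE))
```

Because env-like polyproteins are highly variable, hits from a scan of this kind are best treated as candidates for further inspection rather than as confirmed env sequences.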