The human immunodeficiency virus type 1 (HIV-1) nucleocapsid protein (NC) is a small (55 amino acids) basic protein characterized by two zinc fingers and a basic N-terminal domain. NC performs numerous functions throughout the virus replication cycle, notably the selective packaging of unspliced viral genomic RNA and the chaperoning of nucleic acid strands during reverse transcription. The multiple roles of NC in virus replication are thought to result from its interplay with various target nucleic acid sequences.
At high concentrations, NC can bind nonspecifically to any DNA or RNA sequence 5–7 nt in length. In contrast, at low concentrations, NC binding depends strongly on the sequence and structure of the DNA or RNA target. Numerous in vitro studies support the notion that the NC zinc fingers are responsible for specific interactions, whereas the basic N-terminal domain is involved in nonspecific binding. A clear-cut feature of NC is its preference for single-stranded regions (bulges, loops, linear fragments, etc.) over double-stranded sequences and its ability to destabilize short double-stranded regions. Interestingly, NC exhibits higher affinity for sequences containing unpaired guanines. More precisely, sequences containing TG, UG, and GNG motifs (where N is A, C, T or U) are preferred. High-resolution structures reveal the structural basis for this specificity: in all solved complexes, an unpaired guanine is inserted into the hydrophobic platform at the top of the folded zinc fingers. This insertion is thought to be critical for discriminating guanine from the other bases.
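The motif preference described above can be illustrated with a minimal sketch that scans a sequence for TG, UG, and GNG motifs. It is a sequence-only illustration and not a method used in this work: the function name `find_candidate_motifs` and the example sequences are hypothetical, and the sketch takes no account of whether a guanine is paired, which would have to come from secondary-structure or NMR data.

```python
import re

# Motifs named in the text; N is restricted to A, C, T or U as stated there.
# Zero-width lookaheads are used so that overlapping motifs are all reported.
MOTIF_PATTERNS = {
    "TG": r"(?=(TG))",
    "UG": r"(?=(UG))",
    "GNG": r"(?=(G[ACTU]G))",
}


def find_candidate_motifs(sequence):
    """Return sorted (start, motif_name, matched_text) tuples.

    `start` is the 1-based position of the first nucleotide of the motif.
    Only the primary sequence is inspected; pairing state is ignored.
    """
    seq = sequence.upper()
    hits = []
    for name, pattern in MOTIF_PATTERNS.items():
        for match in re.finditer(pattern, seq):
            hits.append((match.start() + 1, name, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sequence chosen for illustration only.
    for start, name, text in find_candidate_motifs("ACTGGAG"):
        print(start, name, text)
```

Such a scan only lists candidate sites. Whether a candidate is actually bound depends on the accessibility of its guanine, which sequence alone does not reveal.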
In this context, an important question is the molecular basis of the selective binding of NC to particular sequences, such as the SL2 and SL3 stem-loops involved in the specific packaging of the unspliced viral RNA genome. The recent determination of the architecture and secondary structure of the entire HIV-1 RNA genome indicates that large portions are double-stranded, suggesting that the number of NC-specific sites is limited. Furthermore, data obtained with the Gag protein and the MuLV retroviral genome indicate that the local context can considerably increase the affinity of NC for particular sequences and show that a short motif (4 nt) with low information content can be discriminated and identified within the entire viral genome. However, the molecular mechanisms involved in this selection process are only partly understood and require additional studies.
Using NMR methods, we recently investigated the binding of NC(11–55) to mini-cTAR, a 26-nt model stem-loop DNA molecule that corresponds to the top half of cTAR, the sequence complementary to the TAR (trans-activating response element) RNA. The annealing of cTAR with TAR is necessary for the first strand transfer of reverse transcription. By comparison with other reported NC:nucleic acid structures, the three-dimensional structure of the mini-cTAR:NC(11–55) complex allowed us to identify the molecular determinants of the opposite binding polarity of NC on DNA as compared with RNA molecules. Interestingly, five guanines of mini-cTAR are not involved in stable base pairing in free cTAR (as judged by the presence or absence of a detectable imino proton signal at 10°C) and therefore constitute potential binding sites; nevertheless, our NMR data indicate only one major binding site in mini-cTAR, corresponding to the G26 residue of the 24TGG26 sequence at the 3′-end. Furthermore, all nucleic acid partners of NC used in previous NMR studies had only one NC binding site, and in only one case was a minor binding site identified in addition to the main site. Therefore, the nearly exclusive binding of NC to the TGG sequence at the 3′-end of mini-cTAR is intriguing, as is the absence of significant binding to the apical and internal loops, which contain unpaired guanines.
To further understand the origin of the preferential recognition of the TGG sequence by NC and the absence of significant binding to the apical and internal loops, quantitative information on the motions experienced by DNA molecules in the presence and absence of NC is needed to complement the previous structural studies and to provide insights into the role of dynamics in NC:DNA recognition. Although the top half of TAR RNA has been the subject of numerous NMR studies describing the internal dynamics and the relative motions of the stems of this hairpin, the cTAR element has been little studied by NMR. Here, using 13C spin relaxation and relaxation dispersion measurements, we quantified the dynamics of mini-cTAR DNA. Quantitative analysis of the relaxation data identified the main sites of fast dynamical processes (on the ps-ns timescale) as well as slow motions (on the µs-ms timescale). Relaxation rates and heteronuclear NOEs were measured for the C6, C8 and C1′ sites, allowing us to describe the dynamics of residues at the level of both the bases and the deoxyribose sugars. Large differences in dynamics between the various parts of the hairpin were observed. Moreover, we identified several putative transient base pairs in the apical and internal loops and investigated their role in the stability of the different parts of the hairpin. Interestingly, because of these transient base pairs, the unpaired guanines in the apical and internal loops and in the lower stem are not fully accessible for optimal interaction with NC. Therefore, only the guanines of the TGG sequence, which are not involved in transient base pairs, can constitute a strong binding site in this model sequence.