Acquired immunodeficiency syndrome (AIDS), caused by human immunodeficiency virus (HIV), is one of the deadliest diseases threatening human life because of its infectivity and high mortality. Since its recognition in 1981, more than 60 million people have been infected with HIV worldwide, and approximately 25 million people have died of AIDS. Today, more than 34 million people are living with HIV infection. Currently, the main strategies for treating AIDS disrupt one or several key steps of the HIV life cycle to control the replication rate of the virus.
HIV-1 protease, a dimeric protein composed of two identical 99-residue chains, is one of the main therapeutic targets in HIV. The protease cleaves the Gag-Pol polyprotein into structural proteins and enzymes, a necessary step for the generation of new infectious virus particles, and nine of the twenty-eight FDA-approved anti-HIV drugs in current use target the HIV-1 protease. However, mutations were found in the protease soon after the HIV protease inhibitors were introduced, and the high mutation rate of HIV-1 protease allows the virus to escape antiviral therapy. It is therefore necessary to develop a reliable method for predicting the antiviral capability of compounds against a wide spectrum of HIV variants.
To date, among experimental methods, high-throughput screening is the most widely used to identify novel compounds against all kinds of targets, including HIV mutated variants; among in silico methods, molecular docking, pharmacophore models, and quantitative structure-activity relationship (QSAR) models, among others, are widely used to virtually screen antiviral compounds against HIV mutated variants. However, these methods are limited to studying the molecular recognition of one series of ligands interacting with a single target. In addition, experimental assays are not only costly but also limited by the repertoire of available compounds. The results of these methods therefore apply only to a single variant and do not give an overall profile of a compound's activity against a series of variants. Several multi-target methods have been proposed: Liu et al. applied multi-task learning in QSAR to analyze and design novel multi-target HIV-1 inhibitors as well as HIV-HCV co-inhibitors, and Ragno et al., De Martino et al. and Sotriffer et al. used cross-docking to gain insight into the mode of action of new anti-HIV agents against both wild-type and resistant strains. However, such multi-target QSAR models contain no explicit description of the targets, and in particular no information on the interactions of target-ligand pairs. Docking, in turn, is well known to be time-consuming, and the accuracy and versatility of scoring functions remain the main issues for current docking algorithms.
More recently, proteochemometric (PCM) modeling has been widely used to study the mechanisms of molecular recognition across series of proteins, and it has been applied to multiple-variant, superfamily-wide, kinome-wide, and proteome-wide interaction studies. This method combines ligand and target descriptors and then correlates them with the activity data. PCM models can therefore be considered an extension of QSAR models, which are based only on ligand information. So far, proteochemometrics has been successfully applied to HIV-1 protease and reverse transcriptase to analyze drug resistance over the mutational space for multiple variants and multiple inhibitors.
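As an illustration of how a PCM model differs from a ligand-only QSAR model, the following minimal sketch builds one feature row per ligand-target pair by concatenating a ligand descriptor vector with a protein descriptor vector. All names (`toy_ligand_desc`, `toy_protein_desc`, `pcm_row`, `build_pcm_matrix`) and all numeric values are hypothetical and chosen only for illustration; they are not the descriptors used in this study.

```python
from typing import Dict, List, Tuple

# Hypothetical toy descriptors (illustrative values, not data from the study).
toy_ligand_desc: Dict[str, List[float]] = {
    "inhibitor_A": [0.2, 1.5, 3.0],
    "inhibitor_B": [0.9, 0.4, 2.1],
}
toy_protein_desc: Dict[str, List[float]] = {
    "wild_type": [1.0, 0.0],
    "variant_1": [0.0, 1.0],
}


def pcm_row(ligand: str, protein: str) -> List[float]:
    """Concatenate ligand and protein descriptors for one ligand-target pair."""
    return toy_ligand_desc[ligand] + toy_protein_desc[protein]


def build_pcm_matrix(pairs: List[Tuple[str, str]]) -> List[List[float]]:
    """Build one feature row per (ligand, protein) pair."""
    return [pcm_row(ligand, protein) for ligand, protein in pairs]


# Every ligand is paired with every variant; a regression model would then
# be fitted on these rows against the measured activity of each pair.
pcm_pairs = [(l, p) for l in toy_ligand_desc for p in toy_protein_desc]
pcm_matrix = build_pcm_matrix(pcm_pairs)
```

A ligand-only QSAR model would use only the first three columns of each row, so the same inhibitor would receive the same prediction for every variant; the protein block is what lets the model distinguish variants.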
However, in most previous proteochemometric models, cross-terms were derived from the Multiplication of Ligand and Protein Descriptors (MLPD). A cross-term is an additional term introduced to account for the complementarity of the properties of the interacting entities. Although it describes the two entities simultaneously, its physical meaning is not easy to interpret. In addition, MLPD generates a large number of descriptors, which makes it computationally expensive and highly redundant. To address this issue, we present here a new cross-term, the protein-ligand interaction fingerprint (PLIF), which describes the interactions of a protein's residues with its ligand. In our study, we used PLIF to construct PCM models to comprehensively analyze the bioactivity profiles of a series of inhibitors against a series of HIV-1 protease variants.
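The growth in descriptor count under MLPD can be made concrete with a short sketch. It assumes that MLPD cross-terms are formed as all pairwise products of ligand and protein descriptors, which is the usual reading of "multiplication of ligand and protein descriptors"; the function names `mlpd_cross_terms` and `mlpd_term_count` and the example sizes are hypothetical.

```python
from typing import List


def mlpd_cross_terms(ligand: List[float], protein: List[float]) -> List[float]:
    """Return all pairwise products of ligand and protein descriptors.

    Assumption: one cross-term per (ligand descriptor, protein descriptor)
    combination, ordered ligand-major.
    """
    return [l * p for l in ligand for p in protein]


def mlpd_term_count(n_ligand: int, n_protein: int) -> int:
    """Number of cross-terms produced for the given descriptor counts."""
    return n_ligand * n_protein


example_terms = mlpd_cross_terms([1.0, 2.0, 3.0], [0.5, 2.0])
# Three ligand descriptors and two protein descriptors give six cross-terms.

# With hypothetical descriptor counts of 200 (ligand) and 50 (protein),
# the cross-term block alone already holds 10,000 columns.
large_count = mlpd_term_count(200, 50)
```

Because the number of cross-terms is the product of the two descriptor counts, the cross-term block quickly dominates the feature matrix, which is the redundancy and cost referred to above. An interaction fingerprint, by contrast, has a length determined by the residues and interaction types it encodes rather than by the sizes of the descriptor sets.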
HIV-1 protease is one of the main therapeutic targets in HIV. However, a major problem in the treatment of HIV is the rapid emergence of drug-resistant strains. A single method that predicts the antiviral capability of compounds against different variants would be particularly helpful for the clinical therapy of AIDS. In our study, proteochemometric (PCM) models were created to study the bioactivity spectra of 92 chemical compounds against 47 unique HIV-1 protease variants. In contrast to other PCM models, which used the Multiplication of Ligand and Protein Descriptors (MLPD) as the cross-term, a new cross-term, the Protein-Ligand Interaction Fingerprint (PLIF), was introduced in our modeling. With different combinations of ligand descriptors, protein descriptors and cross-terms, nine PCM models were obtained, and six of them achieved good predictive ability (Q2test > 0.7). These results showed that the performance of PCM models could be improved when ligand and protein descriptors were complemented by the newly introduced cross-term PLIF. Compared with the conventional cross-term MLPD, the newly introduced PLIF had better predictive ability. Furthermore, our best model (GD & P & PLIF: Q2test = 0.8271) could identify inhibitors with broad antiviral activity. In conclusion, our study indicates that proteochemometric modeling with PLIF as the cross-term is a potentially useful way to address the HIV-1 drug-resistance problem.