Introduction of protein vaccine candidate based on AP65, AP33, and α-actinin proteins against Trichomonas vaginalis parasite: an immunoinformatics design | Parasites & Vectors

Genome extraction of AP33, AP65, and α-actinin proteins

The sequences of AP33 (accession number: Q65ZG5), AP65 (accession number: Q27093), and α-actinin (accession number: O96524) proteins were extracted from the UniProtKB database.

Prediction of B cell epitopes

B cell epitopes for all three proteins, AP33, AP65, and α-actinin, were predicted using the BepiPred and IEDB (Kolaskar and Tongaonkar method) servers (Additional file 1: Table S1).

Prediction of T cell epitopes

The IEDB and Rankpep servers were also used to predict T cell epitopes. The MHC class II alleles DRB1*0101, *0301, *0401, *0701, *0801, *1101, *1301, and *1501, which together cover the genetic background of most humans, were selected. The highest-scoring epitopes were retained (Additional file 2: Table S2).

Selection of epitope-rich regions

The regions of the AP33, AP65, and α-actinin proteins with the highest epitope abundance were taken as the target domains for vaccine design. Nine epitope-rich domains, each containing a large number of B cell and T cell epitopes, were finally selected from the three proteins to build the vaccine candidate (Table 1).
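
One way to make "highest epitope abundance" concrete is to count how many predicted epitopes cover each residue and then look for the window with the largest summed coverage. The sketch below illustrates that idea only; the paper does not state how its domains were delimited, and the function names, the window length, and the example coordinates are all hypothetical.

```python
from typing import List, Tuple


def coverage_profile(length: int, epitopes: List[Tuple[int, int]]) -> List[int]:
    """Count, for each residue, how many predicted epitopes cover it.

    Epitopes are (start, end) pairs, 1-based and inclusive.
    """
    profile = [0] * length
    for start, end in epitopes:
        for pos in range(start - 1, min(end, length)):
            profile[pos] += 1
    return profile


def densest_window(profile: List[int], window: int) -> Tuple[int, int, int]:
    """Return (start, end, score) of the window with the highest summed coverage.

    Coordinates are 1-based and inclusive; ties go to the first window found.
    """
    if window <= 0 or window > len(profile):
        raise ValueError("window must be between 1 and the sequence length")
    score = sum(profile[:window])
    best_start, best_score = 0, score
    for i in range(1, len(profile) - window + 1):
        # Slide the window by one residue: add the new one, drop the old one.
        score += profile[i + window - 1] - profile[i - 1]
        if score > best_score:
            best_start, best_score = i, score
    return best_start + 1, best_start + window, best_score


# Hypothetical epitope coordinates on a 20-residue toy protein.
toy_epitopes = [(3, 8), (5, 12), (15, 18)]
toy_profile = coverage_profile(20, toy_epitopes)
print(densest_window(toy_profile, window=6))
```

In practice B cell and T cell predictions could be given separate profiles or weights; this sketch treats every predicted epitope equally.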

Table 1 Nine epitope-rich domains selected from three proteins AP33, AP65, and α-actinin

Protein design using selected domains and different linkers

By combining selected epitope-rich domains at different positions using EAAAK, EAAAKEAAAK, and GGGGS linkers, several protein constructs were designed. The designed constructs were evaluated based on physicochemical properties, antigenicity, and secondary and tertiary structure, and finally the most suitable construct was introduced as a vaccine candidate (Additional file 3: S3) (Fig. 1a, b).
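
The combinatorial step described above can be sketched as enumerating domain orders and linkers. This is an illustration under simplifying assumptions: each construct uses a single linker type throughout (the paper may have mixed linkers within one construct), and the domain names and sequences are placeholders rather than the domains in Table 1.

```python
from itertools import permutations
from typing import Dict, Iterator, Sequence, Tuple

# The three linkers named in the text.
LINKERS = ("EAAAK", "EAAAKEAAAK", "GGGGS")


def assemble(domains: Sequence[str], linker: str) -> str:
    """Join domain sequences in the given order, separated by one linker."""
    return linker.join(domains)


def enumerate_constructs(
    domains: Dict[str, str], linkers: Sequence[str] = LINKERS
) -> Iterator[Tuple[Tuple[str, ...], str, str]]:
    """Yield (domain order, linker, construct sequence) for every combination."""
    for order in permutations(domains):
        for linker in linkers:
            yield order, linker, assemble([domains[name] for name in order], linker)


# Placeholder domains; real inputs would be the selected epitope-rich domains.
toy_domains = {"d1": "MKTAY", "d2": "LLVGA", "d3": "QERTP"}
for order, linker, sequence in enumerate_constructs(toy_domains):
    print(order, linker, len(sequence))
```

With nine domains the number of orderings grows to 9! = 362,880 per linker, so a real screen would need to restrict the search before scoring each construct.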

Fig. 1

a Schematic diagram of the final construct of the multiepitope protein. b The tertiary structure of the designed protein

Physical and chemical properties of designed structure

The physical and chemical properties of the designed structures, such as the number of amino acids, molecular weight, isoelectric point (pI), number of charged amino acids, amino acid composition, hydrophobicity, and hydrophilicity, were obtained using the EXPASY ProtParam server. The designed vaccine candidate protein consisted of 780 amino acids and had a molecular weight of 85,190.25 daltons (Table 2). An instability index below 40 classifies a protein as stable; the instability index of our vaccine candidate was 35.8, so the protein is predicted to be stable. The aliphatic index of the recombinant protein was calculated to be 86.04, indicating the stability of this protein over a range of temperatures (Table 2).
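
For readers unfamiliar with the aliphatic index, it is computed from the mole percentages of alanine, valine, isoleucine, and leucine using Ikai's formula, AI = X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)). The sketch below implements that formula and the instability-index cutoff of 40 quoted above; the function names and the toy sequence are illustrative, and the vaccine sequence itself is not reproduced here.

```python
def aliphatic_index(sequence: str) -> float:
    """Aliphatic index: Ala + 2.9*Val + 3.9*(Ile + Leu), in mole percent."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(residue: str) -> float:
        return 100.0 * seq.count(residue) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


def is_predicted_stable(instability_index: float, cutoff: float = 40.0) -> bool:
    """A protein is classified as stable when its instability index is below 40."""
    return instability_index < cutoff


# Toy sequence, not the vaccine construct.
print(aliphatic_index("MAVILKDE"))
# The instability index reported in the text is 35.8.
print(is_predicted_stable(35.8))
```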

Table 2 Evaluation of physical and chemical properties of the designed structure using the EXPASY ProtParam Server

Antigenicity, allergenicity, and solubility evaluation

The VaxiJen 2.0 server predicted the designed protein to be an antigen (score: 0.4983; threshold: ≥ 0.4). The Evaller web server was used to check the allergenicity of the designed structure and predicted the protein to be nonallergenic. The solubility of the vaccine candidate was also evaluated using the Protein-sol server, which gave a scaled solubility score of 0.555. A Protein-sol score greater than 0.45 indicates higher solubility than the average soluble E. coli protein in the experimental solubility dataset [45]; therefore, our designed protein is predicted to be highly soluble.

Secondary and tertiary structure prediction and validation

The GOR software was used to predict the secondary structure of the designed structures. The amino acids that make up these recombinant proteins form random coils, alpha helices, and extended strands. Of the 780 amino acids, 430 (55.13%) are in alpha helices, 96 (12.31%) are in extended strands, and 254 (32.56%) are in random coils (Fig. 2a). Tertiary structures of the designed protein sequences were predicted using the I-TASSER server. All structures were validated and the best structure was selected. The predicted tertiary structures were evaluated using the MolProbity, ProSA-web, and SAVES servers. The MolProbity server was used to evaluate the structural similarity of the new protein to the best-known structures of similar proteins, based on the clash score and the MolProbity score. The SAVES server was also used to check the Ramachandran plot and evaluate the placement of amino acids in the favored, allowed, and disallowed regions.
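
The secondary-structure percentages quoted above follow directly from the residue counts. The short check below recomputes them from the three counts given in the text; only the variable and function names are introduced here.

```python
from typing import Dict

# Residue counts reported in the text for the 780-residue construct.
counts = {"alpha helix": 430, "extended strand": 96, "random coil": 254}


def composition_percent(state_counts: Dict[str, int]) -> Dict[str, float]:
    """Convert per-state residue counts to percentages rounded to two decimals."""
    total = sum(state_counts.values())
    return {state: round(100.0 * n / total, 2) for state, n in state_counts.items()}


print(sum(counts.values()))        # 780
print(composition_percent(counts))
```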

Fig. 2

Predicting and validating the secondary and tertiary structure of the vaccine candidate. a Secondary structure of the designed protein. b Validation of the tertiary structure of the protein by ProSA-web. c Validation of the tertiary structure of the protein by Ramachandran plot

On MolProbity evaluation, the clash score for this protein was 2.49 (99% similar to the structures), and the MolProbity score was 2.13 (69% similar to the best structures). ProSA-web analyzed a 3D model of the vaccine candidate using an energy plot and Z-score. The Z-score of the selected protein was −3.44, which is within the range of native protein structures (Fig. 2b). The Ramachandran plot also showed that 97.4% of the amino acids were in the favored and allowed regions and 2.6% were in the disallowed regions, indicating that the predicted structure of the protein is appropriate (Fig. 2c).

Prediction of conformational B cell epitopes

The Ellipro server was used to predict conformational B cell epitopes (Fig. 3a–f). The 3D structure of the designed vaccine protein submitted to the Ellipro server was the one predicted by the I-TASSER server. The most antigenic epitopes, with scores above 0.5, are presented in Table 3.

Fig. 3

The most potent conformational epitopes of the designed vaccine candidate, predicted using the Ellipro server. a Epitope with score 0.901, b epitope with score 0.736, c epitope with score 0.723, d epitope with score 0.657, e epitope with score 0.641, f epitope with score 0.596

Table 3 Prediction of B cell conformational epitopes by Ellipro

Protein–protein molecular docking

Cluspro 2.0 was used to study protein–protein binding between the designed vaccine candidate and TLR4 and TLR2. To select the best interaction, the weighted score and number of clusters calculated by Cluspro 2.0 were evaluated. In addition, hydrogen bonds and hydrophobic interactions between the vaccine candidate and TLR4 and TLR2 were investigated using the LIGPLOT tool. Finally, we took the lowest binding energy and the lowest dissociation constant (Kd) obtained from the PRODIGY web server as the essential criteria for selecting the strongest complexes. The results showed a strong interaction between the vaccine candidate and both TLR4 and TLR2 (Table 4). Interactions between the designed vaccine candidate and TLR2 (Fig. 4) and TLR4 (Fig. 5) were visualized using PyMOL and LIGPLOT. As shown in Figs. 4 and 5, the vaccine candidate interacted strongly with the active site of the receptors; this binding involves the essential amino acids Ile319, Phe322, Phe325, Tyr326, Val348, Phe349, and Pro352 for TLR2 and the amino acids Arg434, Arg380, Lys341, Lys263, and Gln339 for TLR4.
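
Binding free energy and the dissociation constant are linked by the thermodynamic relation ΔG = RT ln Kd, so a more negative ΔG corresponds to a smaller Kd and therefore tighter binding. The sketch below applies that relation; the 25 °C default and the two ΔG values are hypothetical placeholders and are not taken from Table 4.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_delta_g(delta_g_kcal_per_mol: float, temperature_c: float = 25.0) -> float:
    """Dissociation constant (M) from binding free energy via dG = RT ln(Kd)."""
    temperature_k = temperature_c + 273.15
    return math.exp(delta_g_kcal_per_mol / (R_KCAL * temperature_k))


# Hypothetical binding energies for two complexes (kcal/mol).
for label, delta_g in (("complex A", -10.0), ("complex B", -12.0)):
    print(label, f"{kd_from_delta_g(delta_g):.2e} M")
```

Ranking complexes by lowest ΔG or by lowest Kd therefore gives the same order at a fixed temperature.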

Table 4 Evaluation of molecular binding results between protein vaccine candidate and TLR4 and TLR2

Fig. 4

a Graphic representation of the interaction of the designed vaccine candidate with the TLR2 complex. b LIGPLOT representation of the amino acids involved in the interaction between the protein vaccine candidate and TLR2. *Hydrogen bonds between receptors (blue) and the protein vaccine candidate (green) and hydrophobic interactions with receptors (black) and the protein vaccine candidate (blue) are indicated by dark green lines

Fig. 5

a Graphical representation of the interaction of the designed vaccine candidate with the TLR4 complex. b LIGPLOT representation of the amino acids involved in the interaction between the protein vaccine candidate and TLR4. *Hydrogen bonds between receptors (blue) and the protein vaccine candidate (green) and hydrophobic interactions with receptors (black) and the protein vaccine candidate (blue) are indicated by dark green lines

Molecular dynamics simulation

To verify the stability of the designed protein structure and the protein–receptor complexes, MD simulation was performed for up to 100 ns. The root-mean-square deviation (RMSD) measures how far the atoms of a protein or complex move relative to a reference structure and is used to evaluate the stability, deviation, and conformational changes of the structure during the simulation. A lower RMSD value indicates greater stability and smaller fluctuations. The RMSD analysis showed that the designed protein reached stability after about 10 ns with an average RMSD of 0.95 nm (Fig. 6a), and this stability was maintained up to 100 ns. The protein–TLR2 complex, with an average RMSD of 1.7 nm, was also stable during the simulation (Fig. 6a). The protein–TLR4 complex reached stability after about 40 ns with an average RMSD of 1.1 nm; because the fluctuations during 40–100 ns were less than 0.3 nm, the complex can be considered to have reached stability (Fig. 6a).

Another parameter examined in the MD simulations is the radius of gyration (Rg), which tracks changes in the compactness of the structure during the simulation. Rg describes the distribution of a protein's atoms around its center of mass and is widely used to characterize protein behavior. This variable therefore allows us to analyze the overall dimensions of the protein: the more constant the compactness during the simulation, the more stable the protein and the complexes. As the graph shows, the Rg values of the designed protein alone and in interaction with TLR4 and TLR2 remained stable during the simulation (Fig. 6b).
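
To make the two quantities concrete, the sketch below computes RMSD between a reference frame and a later frame, and the mass-weighted Rg of a single frame. It is a minimal illustration, not the analysis pipeline used in the paper: MD packages normally superimpose each frame on the reference before computing RMSD, which is omitted here, and the function names and toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(reference: Sequence[Coord], frame: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally sized coordinate sets.

    No superposition is performed; frames are assumed to be pre-aligned.
    """
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate sets must be non-empty and equally sized")
    squared = sum(
        (a - b) ** 2 for p, q in zip(reference, frame) for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(reference))


def radius_of_gyration(coords: Sequence[Coord], masses: Sequence[float]) -> float:
    """Mass-weighted RMS distance of the atoms from their center of mass."""
    if len(coords) != len(masses) or not coords:
        raise ValueError("coords and masses must be non-empty and equally sized")
    total_mass = sum(masses)
    center = tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / total_mass for k in range(3)
    )
    weighted = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)


# Toy two-atom system (arbitrary length units).
frame0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame1 = [(0.1, 0.0, 0.0), (1.0, 0.2, 0.0)]
print(rmsd(frame0, frame1))
print(radius_of_gyration(frame1, masses=[12.0, 14.0]))
```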

Fig. 6

a RMSD of the designed protein and the protein–TLR2 and protein–TLR4 complexes over time (ns). b Rg of the designed protein and the protein–TLR2 and protein–TLR4 complexes over time. c RMSF of the designed protein alone and in interaction with TLR2 and TLR4

The root-mean-square fluctuation (RMSF) of the amino acid residues can be used to evaluate the motion and flexibility of the structure. We therefore performed an RMSF analysis to examine the changes in the backbone atoms of the designed protein and the protein–TLR4 and protein–TLR2 complexes. In this analysis, the average fluctuation of each residue during the simulation was plotted. As shown in Fig. 6c, most amino acids show smaller RMSF values (less than 0.3 nm) in the protein–TLR4 and protein–TLR2 complexes than in the designed protein alone. These results show that the designed protein becomes more stable in interaction with the immune system receptors.
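
RMSF is computed per atom (or per residue) as the square root of the time-averaged squared deviation from that atom's mean position. The sketch below shows the calculation on a toy trajectory; it assumes the frames have already been aligned, and the function name and data are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsf(trajectory: Sequence[Sequence[Coord]]) -> List[float]:
    """Per-atom root-mean-square fluctuation.

    trajectory[frame][atom] is an (x, y, z) tuple; frames are assumed aligned.
    """
    if not trajectory or not trajectory[0]:
        raise ValueError("trajectory must contain at least one frame and one atom")
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = tuple(
            sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)
        )
        # Time-averaged squared deviation from that mean position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Toy trajectory: atom 0 oscillates along x, atom 1 stays fixed.
toy = [
    [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
]
print(rmsf(toy))  # [1.0, 0.0]
```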

Snapshots taken at 0, 50, 75, and 100 ns intervals to check the state of the vaccine during the simulation showed that the structure of the vaccine and the site of interaction of the vaccine with the receptors were stable during the simulation (Fig. 7a–c).

Fig. 7

Snapshots of 0, 50, 75, and 100 ns of MD simulation of the vaccine candidate and ligand–receptor complexes. a The vaccine candidate, b vaccine candidate–TLR2, and c vaccine candidate–TLR4 complexes. Brown: 0 ns; blue: 50 ns; purple: 75 ns; light green: 100 ns

Using the covariance matrix of Cα atoms, principal component analysis (PCA) identifies the dominant collective motions of atoms, which are often associated with vital biological functions. The first two principal components (PC1 and PC2) of the vaccine candidate and of the vaccine candidate–TLR2 and vaccine candidate–TLR4 complexes were generated by projecting the trajectories onto their respective eigenvectors. Figure 8 shows the PCA of the three structures. The plot shows that most of the common essential subspace was occupied by the vaccine candidate–TLR2 and vaccine candidate–TLR4 complexes. In the eigenvector (EV) plots, the three structures shared a common conformational subspace. The sampling of the systems demonstrates the stability of the complexes and the vaccine candidate in the simulation. In addition, the free energy landscapes (FELs) of the first and second principal components showed that the vaccine candidate, vaccine candidate–TLR2, and vaccine candidate–TLR4 complexes had global energy minima of 7.71, 7.54, and 7.15 kJ mol−1, respectively (Fig. 9). The Gibbs energy landscapes show the same energy range for all three structures, suggesting that the structures did not undergo sudden drastic changes and are stable. These results are consistent with the analysis of RMSD, Rg, and RMSF values.
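
A free energy landscape is commonly built by binning the PC1/PC2 projections and converting bin populations to energies with ΔG = −RT ln(P / Pmax), which places the most populated bin at zero. The sketch below follows that common convention; the paper does not state how its landscapes were computed, so the 300 K temperature, the bin width, the function name, and the toy projections are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def free_energy_landscape(
    pc1: Sequence[float],
    pc2: Sequence[float],
    bin_width: float,
    temperature: float = 300.0,
) -> Dict[Tuple[int, int], float]:
    """Map each populated (PC1, PC2) bin to dG = -RT ln(P / P_max) in kJ/mol."""
    if len(pc1) != len(pc2) or not pc1:
        raise ValueError("projections must be non-empty and equally sized")
    counts = Counter(
        (math.floor(x / bin_width), math.floor(y / bin_width))
        for x, y in zip(pc1, pc2)
    )
    max_count = max(counts.values())
    return {
        bin_index: -R_KJ * temperature * math.log(n / max_count)
        for bin_index, n in counts.items()
    }


# Toy projections: three frames fall in one bin, one frame in another.
toy_pc1 = [0.1, 0.2, 0.3, 1.5]
toy_pc2 = [0.1, 0.1, 0.2, 1.5]
for bin_index, energy in sorted(free_energy_landscape(toy_pc1, toy_pc2, 1.0).items()):
    print(bin_index, round(energy, 3))
```

Unvisited bins have no defined energy in this scheme, which is why only populated bins appear in the result.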

Fig. 8

Conformational sampling in principal component analysis. Two-dimensional projection of trajectories showing conformational sampling of the vaccine candidate and vaccine candidate–TLR2 and vaccine candidate–TLR4 complexes

Fig. 9

The Gibbs energy landscape plot during 100 ns of simulation. a The vaccine candidate, b vaccine candidate–TLR2, and c vaccine candidate–TLR4 complexes

Immune simulation

The C-ImmSim server was used to simulate the immune system response to the designed vaccine candidate. Figure 10 shows the simulation of the host immune response to the vaccine candidate protein. Antigen and immunoglobulin parameters, cytokine production, TH cell population, and B cell population were examined in this evaluation. An increase in IgM levels indicates the initial host response. In addition, a secondary response to the designed protein as antigen is indicated by increased levels of the B cell population (Fig. 10a), the TH cell population (Fig. 10b), and IgG1 and IgG2 (Fig. 10c). There was also a significant increase in the levels of cytokines and interleukins after immunization, especially interferon-γ (Fig. 10d). These results indicate that the vaccine candidate is capable of stimulating the immune system to produce cytokines and antibodies against T. vaginalis.

Fig. 10

In silico immunity simulation against protein antigen designed as a vaccine candidate using C-ImmSim web server. Simulations after three injections at steps 1, 336, and 672 are presented. a B cell population. b TH cell population. c Antigen and immunoglobulin. d Cytokine production

Codon optimization and in silico cloning of the designed candidate vaccine

Codon optimization was performed using the JCat tool. After codon optimization, the sequence length of the designed structure was 2352 nucleotides. The codon adaptation index (CAI) and the GC content of the nucleotide sequence before optimization were 0.311 and 66.24%, respectively; after codon optimization, they were 1.0 and 50.73%, respectively (Fig. 11a, b). In silico cloning of the optimized sequence into pET-28a(+) using the SnapGene software showed that the vaccine candidate sequence can be cloned into pET-28a(+) (Fig. 12a). Because the designed construct contains internal cleavage sites for the HindIII and BsrGI enzymes, we flanked the construct with NcoI and XhoI restriction sites at its start and end, respectively. Simulated double digestion with NcoI and XhoI showed the vaccine candidate (2346 bp) together with the pET-28a(+) vector (5231 bp) (Fig. 12b).
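
The two metrics reported by the codon optimization step can be illustrated briefly. GC content is the percentage of G and C bases, and the codon adaptation index is the geometric mean of per-codon relative adaptiveness weights, so a value of 1.0 means every scored codon is the most preferred one for its amino acid in the host. The sketch below is not JCat's implementation; the function names and the tiny weight table are hypothetical.

```python
import math
from typing import Dict


def gc_content(sequence: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def codon_adaptation_index(sequence: str, weights: Dict[str, float]) -> float:
    """Geometric mean of relative adaptiveness over the codons found in `weights`.

    Codons missing from the table (for example stop codons) are skipped.
    """
    seq = sequence.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codon in the sequence has a weight")
    return math.exp(sum(logs) / len(logs))


# Hypothetical weights for two alanine codons; real tables are host specific.
toy_weights = {"GCT": 1.0, "GCC": 0.25}
print(gc_content("ATGC"))                               # 50.0
print(codon_adaptation_index("GCTGCT", toy_weights))    # 1.0
print(codon_adaptation_index("GCTGCC", toy_weights))
```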

Fig. 11

Codon optimization using the JCat web server. a Before optimization, b after optimization

Fig. 12

a Cloning of the designed protein construct into the pET-28a vector (shown in blue). b In silico evaluation of the cloning of the designed protein by double digestion
