Prediction of Single Point Mutations in Human Coronavirus and Their Effects on Binding to 9-O-Acetylated Sialic Acid and Hidroxychloroquine

Due to the current spreading of the new disease CoViD-19, the World Health Organization formally declared a world pandemic on March 11, 2020. The present trends indicate that the pandemic will have an enormous clinical and economic impact on population health. Infections are initiated by the transmembrane spike (S) glycoproteins of human coronavirus (hCoV) binding to host receptors. Ongoing research and therapeutic product development are of vital importance for the successful treatment of CoViD-19. To contribute somewhat to the overall effort, herein, single point mutations (SPMs) of the binding site residues in hCoV-OC43 S that recognizes cellular surface components containing 9-O-acetylated sialic acid (9-O-Ac-Sia) are explored using an in silico protein engineering approach, while their effects on the binding of 9-O-Ac-Sia and Hidroxychloroquine (Hcq) are evaluated using molecular docking simulations. Thr31Met and Val84Arg are predicted to be the critical – most likely SPMs in hCoV-OC43 S for the binding of 9-O-AcSia and Hcq, respectively, even though Thr31Met is a very likely SPM in the case of Hcq too. The corresponding modes of interaction indicate a comparable strength of the Thr31Met/9-O-Ac-Sia and Val84Arg/Hcq (or Thr31Met/Hcq) complexes. Given that the binding site is conserved in all CoV S glycoproteins that associate with 9-O-acetyl-sialoglycans, the high hydrophobic affinity of Hcq to hCoV-OC43 S speaks in favor of its ability to competitively inhibit rapid S-mediated virion attachment in high-density receptor environments, but its considerably low specificity to hCoV-OC43 S may be one of the key obstacles in considering the potential of Hcq to become a drug candidate.


Introduction
A number of enveloped, positive single-stranded RNA viruses, denoted by CoV, are involved in respiratory, enteric, hepatic and neuronal infectious diseases both in animals and in humans. Among four CoV genera (alpha, beta, gamma and delta), the beta genus comprises bCoV, hCoV-OC43, MHV, SARS-CoV and MERS-CoV. 1 A new infectious disease CoViD-19, caused by a new severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2; also referred to as hCoV- 19), was first identified in 2019 in Wuhan, China, 2 and has since spread through interpersonal contacts globally. The World Health Organization officially declared a world 2019-20 coronavirus pandemic on March 11, 2020. 3 The rate of deaths per number of diagnosed cases varies on a daily basis and currently ranges from 0.2% to 15%, depending on age and other health problems. 4 If SARS-CoV-2 continues to adapt genetically, initiating person-to-person transmission, it will presum-ably spread like wildfire throughout the globe. Such pandemic might well arise by a still-undiscovered mechanism, making coronavirus fundamental research and therapeutic product development urgent. The US FDA -United States Food and Drug Administration is testing coronavirus treatments, including hidroxychloroquine and chloroquine, by looking at widespread clinical trials of the drugs. 5 As initial therapy for patients infected with SARS-CoV-2, the use of Hcq is, presumably, more preferential than the use of Cq. 6 According to the sources from the pharmaceutical industry, a vaccine will be available on the market in the next 12 to 18 months. 7 SARS-CoV-2 binds the angiotensin-converting enzyme 2 (ACE2) at the surface of respiratory cells. There are biophysical and structural evidences that the SARS-CoV-2 S protein binds ACE2 with higher affinity than does SARS-CoV S. 8 The shape of SARS-CoV-2 S and the size of the protein (SARS-CoV-2 S)/protein (ACE2) interaction interface make the bi-protein complex unsuitable as a mo-Mitrasinovic: Prediction of Single Point Mutations in Human ... lecular model for exploring competitive inhibition mechanism, by which a small ligand (such as Hcq) is supposed to interfere with the binding of beta CoV S glycoproteins to host receptors, particularly in terms of affinity and specificity. Structural and molecular modeling studies have suggested that the SARS-CoV-2 S protein displays two distinct domains, the receptor-binding domain (RBD) that interacts with ACE2 and the N-terminal domain (NTD) that interacts with the ganglioside-rich domain of the plasma membrane. 6 Amino acid residues (111-158) of the NTD of SARS-CoV-2 S define a functional ganglioside-binding domain (GBD) being completely conserved in clinical isolates worldwide. Hcq has been found to bind sialic acids (linked to gangliosides) with pronounced affinity, indicating its potential to block the interaction of GBD with lipid rafts. 6 Therefore, specificity issue underlying the dual recognition of gangliosides and ACE2 by SARS-CoV-2 S has not been raised.
One of the most representative CoV prototypes of this genus that causes common cold and pneumonia in elderly population, as well as severe lower respiratory tract infection in patients with compromised immune system is hCoV-OC43. 1,9 Based on some structural knowledge of hCoV-OC43 S, 10 in the present communication, the recognition modes of the hCoV-OC43 S protein by 9-O-Ac-Sia and Hcq, which share the same binding site (Figure 1), are investigated using molecular docking simulations. A computer-based protein engineering approach 11,12 is employed to predict single point mutations to which the virus is prone during its genetic adaptation. A question, how the most likely SPMs affect the affinity and specificity of the hCoV-OC43/9-O-Ac-Sia and hCoV-OC43/Hcq interactions, is consequently analyzed.

Methods
To obtain the initial coordinates of atoms, the experimental structure of the trimeric hCoV-OC43 S in complex with 9-O-Ac-Sia (PDB ID: 6NZK) was retrieved from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB). 10 The FoldX molecular design toolkit version 5.0 13-21 was employed to systematically perform single point mutations of each residue being in the binding site of hCoV-OC43 S. FoldX provides a fast quantitative estimation of the importance of the interactions contributing to the stability of proteins and protein complexes. The algorithm uses a full atomic description of the structure of the proteins, while different energy terms taken into account are weighted using empirical data obtained from protein engineering experiments. The code mutates one amino acid to the other 24, including phosphorylated Tyr, Ser and Thr, as well as hydroxyl Proline in addition to 20 standard amino acids, and repairs the neighbor residues. The way it functions is: it mutates the selected position to Ala and anno-tates all neighbor residues; it mutates the wild-type (wt) residue to itself, and then the neighbors to themselves followed each time by the wt residue to itself. In this way it ensures that, when mutating, any residue that has not been moved in the wt reference will not move. Once this is done, the new wt reference is mutated at the selected position to the target amino acids. To prevent problems, neighbor side chains are only optimized when creating the wt reference after self mutation, but not when making the individual mutants unless a new rotamer for the neighbor is selected. [11][12][13][14][15][16][17][18][19][20][21] The predictive power was tested on a very large set of point mutants (1088) that comprise most of the structural environments found in proteins. 22 A training database of 339 mutants in nine different proteins was initially considered and the set of parameters and weighting factors that best accounted for the changes in stability of the mutants was optimized. The predictive power was then tested using a blind test mutant database of 667 mutants, as well as a database of 82 protein-protein complex mutants. The global correlation for 95% of the entire mutant database (1030 mutants) was 0.83, with a standard deviation of 0.81 kcal mol -1 and a slope of 0.76. 22 In the present work, single point mutations of the binding site residues in the hCoV-OC43 S protein were predicted at pH 8, which was used for the experimental determination of the starting structure (PDB ID: 6NZK). 10 It was shown that pH acidification of the medium, such as the one occurring in the endosomal compartment, does not trigger hCoV-OC43 S fusogenic conformational changes. 10 Docking calculations were performed using the AScore/ShapeDock protocol from the ArgusLab 4.0.1 suite of programs. 23 AScore is based on the decomposition of the total protein-ligand binding free energy into the following contributions: the van der Waals interaction between the ligand and the protein, the hydrophobic effect, the hydrogen bonding between the ligand and the protein, the hydrogen bonding that involves charged donor and/or acceptor groups, the deformation effect, and the effects of the translational and rotational entropy loss in binding process, respectively. Flexible ligand docking was done by describing the ligand as a torsion tree. Groups of bonded atoms that do not have rotatable bonds are nodes, while torsions are connections between the nodes. Topology of a torsion tree is a determinative factor influencing efficient docking. The AScore/ShapeDock protocol is fast, reproducible and formally explores all energy minima. 23 This particular protocol was shown to be very consistent for docking ligands into the crystal structures of viral proteins, 24-28 while the calculated binding free energies were well-correlated with the experimental inhibitory concentrations. 26

Results and Discussion
Infections are mediated by the transmembrane S glycoproteins, binding to host receptors and fusing the viral and cellular membranes. 10 Cell surface components that contain 9-O-Ac-Sia are recognized by hCoV-OC43 S. 29,30 The interacting site of hCoV-OC43 S (PDB ID: 6NZK) contains the following amino acids: Asn27, Asp28, Lys29, Asp30, Thr31, Gly32, Leu80, Lys81, Gly82, Ser83, Val84, Leu85, Leu86, Trp90 and Phe95 (Figure 1). These residues are mainly located in two loops, L1 (27-Asn-Asp-Lys-Asp-Thr-Gly-32) and L2 (80-Leu-Lys-Gly-Ser-Val-Leu-Leu-86). The P1 hydrophobic pocket is defined by Leu85, Leu86 and Trp90 (Figure 1), accommodating the 9-O-Ac-Sia methyl (Figure 2, center). The P2 hydrophobic pocket is defined by Leu80, Trp90 and Phe95 (Figure 1), suiting the 5-N-acetyl methyl (Figure 2, center). The 9-O-acetyl carbonyl makes a hydrogen bond with Asn27 (2.85 Å, Figure 2, right), while the 5-N atom makes a hydrogen bond with the Lys81 backbone carbonyl (2.79 Å, Figure 2, right). The 9-O-Ac-Sia C1-carboxylate makes both a salt bridge with the Lys81 side chain amine (3.24 Å, Figure 2, right) and a hydrogen bond with the Ser83 side chain hydroxyl (2.71 Å, Figure 2, right). Thus, the specificity of the hCoV-OC43/9-O-Ac-Sia complex formation is Asn27Lys81Lys81Ser83. Evaluated strength of the hCoV-OC43/9-O-Ac-Sia interaction, seen through a dissociation constant K d = 48.7 μM (Table 1), is in agreement with an experimental observation of 49.7 ± 10.7 μM, 10 indicating rapid S-mediated virion attachment, especially in high-density receptor environments. 31 The binding site is conserved in all CoV S glycoproteins that associate with 9-O-acetyl-sialoglycans, including hCoV-OC43 S, hCoV-HKU1 S, bCoV S and PHEV S. 10 The particular topology of the residues that is similar to those of the ligand-binding pockets of CoV hemagglutinin esterases (HEs) and influenza virus C/D hemagglutinin-esterase fusion (HEF) glycoproteins means that the CoV S glycoproteins share the ligand specificity of influenza C/D HEF but are functionally more compatible to influenza A/B hemagglutinin. 10 Each of the binding site residues is mutated to the other 24 (20 standard amino acids, phosphorylated Tyr, Ser and Thr, as well as hydroxyl Proline). All the single point mutants are energetically evaluated with reference to the original receptor (PDB ID: 6NZK). SPMs that stabilize the wt receptor structure for more than 2 kcal mol -1 are extracted as likely ones (Table 1). An average level of thermochemical accuracy of 2 kcal mol -1 is acceptable for the structure-based drug (or ligand) design purposes. [32][33][34][35][36][37][38][39][40][41][42][43][44] A careful inspection of the values in Table 1 shows that the SPM Thr31Met stabilizes the wt/9-O-Ac-Sia complex to a largest extent. The introduction of Met31 ( Figure  3), instead of Thr31 (Figure 2), in the binding site is asso-   ciated with the horizontal conformational flip of 9-O-Ac-Sia, making the 9-O hydrophobic side chain face outside of the active cavity (Figure 3, left). The P2 (Leu80, Trp90 and Phe95) hydrophobic pocket (dots) accommodates the 5-N-acetyl methyl (Figure 3, left). The 9-O-Ac-Sia C1carboxylate makes a hydrogen bond with the Lys29 side chain (3.00 Å, Figure 3, right); the 9-O-acetyl carbonyl makes two hydrogen bonds with the Lys81 and Ser83 side chains (3.00 Å and 2.38 Å, Figure 3, right), while the 9-O makes a hydrogen bond with the Ser83 side chain (2.87 Å, Figure 3, right). Thus, Thr31Met changes the specificity, Asn27Lys81Lys81Ser83, of the original hCoV-OC43/9-O-Ac-Sia complex to Lys29Lys81Ser83Ser83. The specificity difference is in the substitution of an Asn by a Ser, that is, in the substitution of a small amino acid by a tiny one. Met31 is a larger and more hydrophobic amino acid than Thr31 being almost indifferent in terms of its hydrophobicity. Among aliphatic amino acids, Met is unique by having a sulphur atom in its side chain. A water molecule in bulk water can rotate in many directions. The presence of a nonpolar residue such as Met in the binding site restricts the movement of the water, causing an entropy loss that can be regained through the hydrophobic effect and the release of protein-bound water molecules. The chemical structure of Hcq (Figure 4, left) contains an aromatic core scaffold to which both a Cl atom and a large side chain are bound, indicating a pronounced hydrophobic character of the ligand. The binding mode obtained by docking Hcq in the binding site of hCoV-OC43 S (PDB ID: 6NZK) is illustrated in Figure 4 (right). The P2 (Leu80, Trp90 and Phe95) hydrophobic pocket is in interaction with the Hcq side chain, while the Hcq aromatic core is oriented outside of the active cavity. Noteworthy is that electrostatic interactions are not involved in formation of the hCoV-OC43/Hcq complex.
The values in Table 1 illustrate that the SPM Val84Arg is the most stabilizing one in the wt/Hcq complex. The introduction of Arg84 ( Figure 5), instead of Val84 (Figure 4), in the binding site causes the vertical conformational  The values in Table 1 indicate that Thr31Met, which is the most likely SPM for the binding of 9-O-Ac-Sia, is a very likely one for the binding of Hcq as well. The hydrophobic nature of the Thr31Met/Hcq interaction mode ( Figure 6) is similar to that of the Val84Arg/Hcq interaction mode ( Figure 5) qualitatively.
Comparing the binding free energies (Table 1) of the complexes, formed by docking 9-O-Ac-Sia and Hcq to the same receptor such as the Thr31Met mutant of hCoV-OC43 S, shows that Hcq has slightly higher affinity to hCoV-OC43 S than does 9-O-Ac-Sia. This standpoint can be accounted for by the almost-pure hydrophobic recognition of hCoV-OC43 S by Hcq with very low specificity. However, rapid kinetics underlying S-mediated virion attachment to 9-O-Ac-Sia is associated with recognition being hydrophobic and very specific at the same time. In view of this, it is useful to contrast the chemical structures of 9-O-Ac-Sia and Hcq and to consider entropy loss upon ligand binding. Flexible docking of ligand is based on active torsions in ligand structure, which can be conceivable as particular sp 3 bonds that are directly involved in finding the lowest energy receptor/ligand conformations. Entropy loss due to ligand binding is related to the loss of its degrees of freedom. The torsional potential indirectly takes care of the particular entropy amount by being proportional to the number of active torsions in ligand structure.
An active torsion has been estimated to cost circa 0.3 kcal mol -1 energetically. 45,46 It means that the structure of 9-O-Ac-Sia having eleven active torsions experiences a negative entropy change that roughly costs -3.3 kcal mol -1 . By having only one active torsion in its structure, the binding of Hcq is associated with a tiny entropic decrease of about -0.3 kcal mol -1 . The dissociation constants of the hCoV-OC43/Hcq complexes are in the micromolar (10 -6 M) range (Table 1), which is acceptable for hit ligand molecule but not for drug candidate. Knowing that the binding site is conserved in all CoV S glycoproteins that associate with 9-O-acetyl-sialoglycans, 10 the very low specificity of Hcq to hCoV-OC43 S does not speak in favor of the potential of Hcq to be a drug candidate.