QSAR Studies and Structure Property/Activity Relationships Applied in Pyrazine Derivatives as Antiproliferative Agents Against the BGC823

Electronic structures, substitution effects, structure-physicochemical property/activity relationships and drug-likeness of pyrazine derivatives have been studied at the ab initio (HF, MP2) and B3LYP/DFT (density functional theory) levels. The calculated values, i.e., NBO (natural bond orbital) charges, bond lengths, dipole moments, electron affinities, heats of formation and quantitative structure-activity relationship (QSAR) properties, are presented. For the QSAR studies, we used multiple linear regression (MLR) and artificial neural network (ANN) statistical modeling. The results show a high correlation between experimental and predicted activity values, supporting the validity and quality of the derived QSAR models. In addition, statistical analysis reveals that the ANN technique with a (9-4-1) architecture performs better than the MLR model. Virtual screening based on the molecular similarity method and the applicability domain of the QSAR models allowed the discovery of novel anti-proliferative candidates with improved activity.


Introduction
Pyrazine is a heterocyclic compound containing two nitrogen atoms in its aromatic ring, with molecular formula C4H4N2. 1 It is a symmetrical molecule with point group D2h.
Pyrazine is less basic than pyridine, pyridazine and pyrimidine. Tetramethylpyrazine (also known as ligustrazine) is reported to scavenge superoxide anion and decrease nitric oxide production in human polymorphonuclear leukocytes, and is a component of some herbs used in traditional Chinese medicine. Some pyrazine derivatives show various pharmacological effects, including anti-cancer, antidepressant and anxiolytic, anti-tuberculosis and anti-diabetic activity, as well as effects on pulmonary hypertension and the cardiac valve. [2][3][4][5][6][7]
Quantum chemistry methods play an important role in obtaining molecular structures and predicting various properties. To obtain highly accurate geometries and physical properties for molecules built from electronegative elements, expensive ab initio/MP2 electron correlation methods are required. 8 Density functional theory methods [9][10][11][12][13][14] offer an inexpensive computational alternative that can handle relatively large molecules. [15][16][17][18][19][20]
Quantitative structure-activity relationships (QSAR) [21][22][23][24][25] are attempts to correlate molecular structure, or properties derived from molecular structure, with a particular kind of chemical or biochemical activity. The kind of activity depends on the interest of the user. QSAR is widely used in pharmaceutical, environmental and agricultural chemistry in the search for particular properties. The molecular properties used in the correlations relate as directly as possible to the key physical or chemical processes taking place in the target activity. 26
This work aims to determine theoretically the optimized molecular geometries, MESP and NBO charges of pyrazine compounds. In addition, we calculated important quantities such as the HOMO-LUMO energy gap. 27 Lipinski's 'Rule of Five', 28 together with other parameters, is a useful tool to aid in choosing oral drug candidates.
Drug-likeness is described as encoding the balance among the molecular properties of a compound that influence its pharmacodynamics, pharmacokinetics and ADME (absorption, distribution, metabolism and excretion) in the human body. 29 These parameters allow oral absorption or membrane permeability to be estimated; good absorption or permeability is expected when the evaluated molecules obey Lipinski's rule of five. Other parameters considered are the number of rotatable bonds, molecular volume, molecular polar surface area and in vitro plasma protein binding.
The present paper deals with a specific organizational form of molecular matter. Other forms are given, for example, in the References. [30][31][32][33][34] Many different chemometric methods, such as multiple linear regression (MLR), 35 partial least squares regression (PLS), 36 different types of artificial neural networks (ANN), 37-40 genetic algorithms (GA) 41 and support vector machines (SVM), can be employed to deduce correlation models between molecular structure and properties. In the present work, we derive a quantitative structure-activity relationship (QSAR) model using multiple linear regression (MLR) as well as artificial neural network (ANN) methods for the series of pyrazine derivatives.
The goal of the present study is to validate a suitable methodology for the accurate prediction of molecular geometries and energetic properties of potentially active compounds, and to determine the best molecular descriptors to be used in conjunction with linear (MLR) and nonlinear (ANN) QSAR models to identify the best candidates for antiproliferative agents against the BGC823. The obtained QSAR models were finally employed to identify biological activities of potentially novel active compounds using in silico screening procedures.

Materials and Methods
All calculations were performed using HyperChem 8.0.6 software, 42 the Gaussian 09 program package, 43 Marvin Sketch 6.2.1 software, 44 the Molinspiration online database 45 and JMP 8.0.2 software. 46 The geometries of pyrazine and its methyl, ethyl, bromo and fluoro derivatives were fully optimized with the ab initio/HF, MP2 and DFT/B3LYP methods, using both the 6-311++G(d,p) and cc-pVDZ basis sets as implemented in the Gaussian 09 program package. QSAR properties were calculated with the QSAR Properties module of HyperChem 8.0.6, which provides several properties commonly used in QSAR studies.
Molinspiration, a web-based software tool, was used to obtain parameters such as TPSA (topological polar surface area), nrotb (number of rotatable bonds) and drug-likeness.
Multiple linear regression (MLR) analysis and artificial neural network (ANN) modeling were carried out using JMP 8.0.2 software.
The calculated results have been reported in the present work.

1. Geometric and Electronic Structure of Pyrazine
The geometrical parameters of pyrazine were optimized with the ab initio/HF, ab initio/MP2 and DFT methods using the 6-311++G(d,p) and cc-pVDZ basis sets. Bond lengths for pyrazine are listed in Table 1, bond angles in Table 2 together with the experimental results, 47 and charge densities in Table 3, following the numbering scheme given in Fig. 1. The efficiency of the DFT/B3LYP method with the cc-pVDZ basis set may be assessed by comparison with the results obtained by more elaborate calculations such as the ab initio/HF and MP2 methods. Very good agreement between the predicted geometries (bond lengths and bond angles) and the corresponding experimental data was obtained, especially with DFT/B3LYP.
From that, we can say that the DFT method is more appropriate for further study of the pyrazine ring. Charge densities calculated by DFT/B3LYP are very similar to those obtained with the ab initio/HF and MP2 methods. The geometry of pyrazine is symmetric and planar, as all the dihedral angles are either nearly 0° or 180°, which makes this conformation more stable. The total atomic charges of pyrazine obtained from NBO analysis with the DFT/B3LYP, ab initio/HF and MP2 methods with the cc-pVDZ basis set are listed in Table 3. The N atoms carry negative charges, which makes them preferential sites for electrophilic attack; the C and H atoms carry positive charges, which makes them preferential sites for nucleophilic attack.
The molecular electrostatic potential surface (MESP) is a plot of the electrostatic potential mapped onto a surface of constant electron density. In most MESP plots, the region of maximum negative potential, the preferred site for electrophilic attack, is shown in red, while the region of maximum positive potential, the preferred site for nucleophilic attack, is shown in blue. 48 MESP has been found to be a very useful tool in the investigation of the relationship between molecular structure and the physicochemical properties of molecules, including biomolecules and drugs. [49][50][51][52][53] The MESP surface and contour map of pyrazine (Fig. 2) show three kinds of region. Red (negative electrostatic potential) appears around the two ring nitrogen atoms, which explains their susceptibility to electrophilic attack; blue (positive electrostatic potential) appears around the four hydrogen atoms, which indicates that these regions are susceptible to nucleophilic attack. Green, lying between the red and blue regions, marks neutral electrostatic potential.
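For reference, the quantity mapped in an MESP plot can be written out explicitly. The expression below is the standard textbook definition in atomic units, not a formula taken from this work; the symbols (nuclear charges Z_A at positions R_A, electron density ρ) are introduced here for illustration only.

```latex
V(\mathbf{r}) \;=\; \sum_{A} \frac{Z_A}{\lvert \mathbf{R}_A - \mathbf{r} \rvert}
\;-\; \int \frac{\rho(\mathbf{r}')}{\lvert \mathbf{r}' - \mathbf{r} \rvert}\, d\mathbf{r}'
```

The first term is the repulsion a positive test charge feels from the nuclei and the second is its attraction to the electron cloud, so V is negative (red) where the electron density dominates and positive (blue) where the nuclear contribution dominates.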

2. Substitution Effect on Pyrazine Structure
Two series of substituted pyrazines were studied: in the first series, methyl and ethyl groups act as electron donors, whereas in the second series, bromo and fluoro groups act as electron acceptors at positions C2 and C3. The calculated values for the two series are given in Table 4 and Table 5; the heat of formation, dipole moment (µ), and HOMO (Highest Occupied Molecular Orbital) and LUMO (Lowest Unoccupied Molecular Orbital) energies of the pyrazine systems are presented in Fig. 3; NBO charges of the pyrazine derivatives are reported in Table 6 for the first series and in Table 7 for the second series. These calculations were performed with the DFT/B3LYP method using the cc-pVDZ basis set.
For each added methyl, ethyl or fluoro group, the heat of formation decreases by approximately 6, 12 or 39 kcal·mol⁻¹, respectively, whereas the addition of a bromo group increases the heat of formation by approximately 6 kcal·mol⁻¹.
The frontier orbitals, the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO), are important factors in quantum chemistry. 54 A molecule with a small HOMO-LUMO gap is more polarizable and is generally associated with high chemical reactivity and low kinetic stability; it is also termed a soft molecule. 55 In the first series (electron donors), compound A4 (2-ethylpyrazine) has the lowest HOMO-LUMO energy gap (0.1958), and in the second series compound B3 (2,3-dibromopyrazine) has the lowest energy gap (0.1927) (Fig. 4).
According to the HSAB (Hard Soft Acid and Base) principle, a small energy gap allows an easy flow of electrons, which makes the molecule soft and more reactive, 56 so A4 and B3 are the most reactive compounds in the two series of pyrazine derivatives. Each added alkyl substituent raises both the HOMO and LUMO energies, whereas each added fluoro or bromo substituent lowers the LUMO energy; the HOMO energy, by contrast, increases on bromo substitution and decreases on fluoro substitution. In the first series, carbon C2 of compound A4 (2-ethylpyrazine) carries the largest positive charge (0.206); in compound B3 (2,3-dibromopyrazine) of the second series, the largest positive charges are on carbons C2 (0.102) and C3 (0.102), as shown in Table 5. These positively charged C2 and C3 positions are preferential sites for nucleophilic attack. Compound B3 is predicted to be the most reactive, with the smallest HOMO-LUMO energy gap and with sites for nucleophilic attack, and more stable, with the maximum value of the heat of formation.
The contour plots of the π-like frontier orbitals for the ground state of compound B3 are shown in Fig. 4.
From the plots, we can observe that the HOMO is a π bonding molecular orbital developed on the C5 and C6 atoms, and the LUMO is a π* anti-bonding molecular orbital developed on the N1 and C2 atoms. This further demonstrates the delocalization of the conjugated π-electron system in the 2,3-dibromopyrazine molecule. The dipole moment of pyrazine is equal to zero, which confirms its D2h symmetry. Compound B5 (2,3-difluoropyrazine) shows a high dipole moment (2.2435 Debye).
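The link between the frontier-orbital gap and softness discussed in this section can be made concrete with a small calculation. The sketch below assumes that the gaps quoted in the text (0.1958 for A4 and 0.1927 for B3) are in hartree, which the text does not state, and uses one common convention for global hardness (η = gap/2) and softness (S = 1/(2η)); the function names are illustrative, not part of any package used in this work.

```python
HARTREE_TO_EV = 27.211386  # 1 hartree expressed in eV (rounded)


def gap_in_ev(gap_hartree):
    """Convert a HOMO-LUMO gap from hartree to eV."""
    return gap_hartree * HARTREE_TO_EV


def hardness(gap):
    """Global hardness, eta = (E_LUMO - E_HOMO) / 2, in the units of `gap`."""
    return gap / 2.0


def softness(gap):
    """Global softness, S = 1 / (2 * eta); other conventions exist."""
    return 1.0 / (2.0 * hardness(gap))


# Gaps quoted in the text; the unit (hartree) is an assumption.
gaps = {"A4": 0.1958, "B3": 0.1927}

for name, gap in gaps.items():
    gap_ev = gap_in_ev(gap)
    print(f"{name}: gap = {gap_ev:.3f} eV, "
          f"hardness = {hardness(gap_ev):.3f} eV, "
          f"softness = {softness(gap_ev):.4f} 1/eV")

# The compound with the smaller gap is the softer (more reactive) one.
softest = max(gaps, key=lambda k: softness(gaps[k]))
```

Under these assumptions the ranking depends only on the gap itself, so the conclusion that B3 is softer than A4 does not depend on the unit conversion or on the softness convention chosen.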

3. Structure Activity/Property Relationship for Pyrazine Derivatives
For the series of pyrazine derivatives (Fig. 8), we have studied seven physicochemical properties in relation to their anti-proliferative activity against the BGC823 (human gastric cancer cell line). 57 The properties involved are: surface area grid (SAG), molar volume (V), hydration energy (HE), octanol/water partition coefficient (log P), molar refractivity (MR), polarizability (Pol) and molecular weight (MW).
The results obtained using HyperChem 8.0.8 software are shown in Table 8. For example, Fig. 5 shows the favored 3D conformation of compound 1.
Molar refractivity and polarizability generally increase with the size and molecular weight of the studied pyrazine derivatives (Table 8 and Fig. 6). This result agrees with the Lorentz-Lorenz formula, which relates polarizability, molar refractivity and molecular size. For instance, compounds 9 and 12 show the same maximum values of polarizability (41.91 Å³) and refractivity (118.37 Å³). These compounds also have high molecular weights (424.32 amu) and differ only slightly in surface area and volume.
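The Lorentz-Lorenz relation invoked here can be stated explicitly. The form below is the standard one (Gaussian units, with α as a polarizability volume); the symbols n (refractive index), M (molar mass), ρ (density) and N_A (Avogadro constant) are not defined in the original text and are introduced only for this illustration.

```latex
MR \;=\; \frac{n^{2}-1}{n^{2}+2}\,\frac{M}{\rho} \;=\; \frac{4\pi}{3}\,N_{A}\,\alpha
```

Because the molar refractivity is directly proportional to the polarizability, and M/ρ is the molar volume, both quantities are expected to grow together with molecular size, which is the trend described for Table 8.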
In absolute value, the largest hydration energy is that of compound 17 (14.62 kcal·mol⁻¹) and the smallest is that of compound 12 (10.63 kcal·mol⁻¹). Indeed, in biological environments, polar molecules are surrounded by water molecules, with which they form hydrogen bonds.
Hydrophobic groups in pyrazine derivatives decrease the hydration energy.
Lipophilicity, however, increases in proportion to the hydrophobic character of the substituent. As seen in Table 8, compound 17 is expected to have the highest hydrophilicity, whereas compound 12 should be the most lipophilic. This implies that these compounds should have poor permeability across the cell membrane. Compound 17 possesses seven hydrogen bond acceptors (HBA) and no hydrogen bond donors (HBD); the presence of hydrophilic groups in this compound increases the hydration energy. This property explains the ability of these compounds not only to bind the receptor but also to activate it. Hydration energy measures the degree of agonist character of a potential drug molecule.
Almost all of the studied molecules have optimal log P values. For good oral bioavailability, log P must be greater than zero and less than 3 (0 < log P < 3). At very high log P, the drug has low solubility; at very low log P, the drug has difficulty penetrating lipid membranes. Thus, compound 17 has the largest hydration energy, an optimal log P value and a small molecular weight, leading to better distribution and solubility in tissues, good oral bioavailability and good permeability across cell membranes, respectively (Fig. 7).

4. Drug-Likeness Screening Applied in Pyrazine Derivatives
We have applied rules of thumb and calculated metrics to eighteen pyrazine derivatives (Fig. 8) taken from the literature together with their anti-proliferative activity against the BGC823. 57 The properties involved are: octanol/water partition coefficient (log P), molecular weight (MW), hydrogen bond donors (HBD), hydrogen bond acceptors (HBA), number of rotatable bonds (NRB) and topological polar surface area (TPSA). All the results were calculated using HyperChem 8.0.8 and Marvin Sketch 6.2.1 software and are listed in Table 9. We applied the Lipinski and Veber rules to identify "drug-like" compounds: 58,59
(1) There are fewer than 5 H-bond donors (expressed as the sum of OHs and NHs).
(2) The molecular weight is under 500 Da.
(3) The log P is under 5.
(4) There are fewer than 10 H-bond acceptors (expressed as the sum of Ns and Os).
All the compounds of the series have a MW under 500 Da, so they can pass easily through the cell membrane and are expected to be well absorbed.
All compounds have fewer than 10 H-bond acceptors and no H-bond donors, so their fat solubility will be high and the drug will be able to penetrate the cell membrane and reach the inside of the cell. If two of these rules are not satisfied, the compound will have problems with absorption and permeability. 60 The TPSA of the pyrazine derivatives was found to lie in the range 52.325-65.217 Å² and is well below 140 Å², indicating that these compounds should have good plasma membrane permeability. All the screened compounds are flexible, especially compounds 9 and 11-14, which have 5 rotatable bonds (Table 9).
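The screening logic of this section can be sketched in a few lines of code. The example below is only an illustration: the `Descriptors` container and function names are hypothetical, the Lipinski thresholds follow the wording of rules (1)-(4) above, and the Veber thresholds (at most 10 rotatable bonds, TPSA of at most 140 Å²) are the commonly cited ones rather than values stated in the text. The example compound uses placeholder values in the ranges reported here; it is not a row of Table 9.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mw: float     # molecular weight, Da
    log_p: float  # octanol/water partition coefficient
    hbd: int      # hydrogen-bond donors
    hba: int      # hydrogen-bond acceptors
    nrb: int      # number of rotatable bonds
    tpsa: float   # topological polar surface area, Å^2


def lipinski_violations(d):
    """Count how many of the four rules, as worded in the text, are violated."""
    checks = [d.hbd < 5, d.mw < 500, d.log_p < 5, d.hba < 10]
    return sum(not ok for ok in checks)


def passes_veber(d, max_nrb=10, max_tpsa=140.0):
    """Veber criteria (commonly cited thresholds, assumed here)."""
    return d.nrb <= max_nrb and d.tpsa <= max_tpsa


def is_drug_like(d):
    """Two or more Lipinski violations flag absorption/permeability problems."""
    return lipinski_violations(d) < 2 and passes_veber(d)


# Placeholder compound, NOT taken from Table 9.
example = Descriptors(mw=424.32, log_p=2.5, hbd=0, hba=7, nrb=5, tpsa=60.0)
print(lipinski_violations(example), is_drug_like(example))
```

Keeping the violation count separate from the final verdict makes it easy to apply a stricter or looser cut-off than the "two unsatisfied rules" criterion quoted in the text.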

5. Quantitative Structure-Activity Relationships Studies (QSAR) of Pyrazine Derivatives
When chemical or physical properties and molecular structures are expressed as numbers, it is often possible to propose mathematical relations connecting them, which allow quantitative predictions to be made. The resulting mathematical expressions can then be used to predict the biological response of similar structures. They are widely used in the pharmaceutical industry to identify promising compounds, especially at the early stages of drug discovery. 61 Relationships between the physicochemical properties of chemical substances and their biological activities can be derived using the QSAR (Quantitative Structure-Activity Relationships) concept. These models can also be used to predict the activities of new chemical entities and to design them. 62 The biological activity is therefore expressed quantitatively as the concentration of substance necessary to obtain a certain biological response. For that purpose, multiple linear regression (MLR) and artificial neural networks (ANNs) are used. The accuracy of such models is mainly evaluated by the correlation coefficient R². 63 The MLR and ANN models were generated using JMP 8.0.2 software.
The equilibrium geometries, the highest occupied molecular orbital energy (E HOMO), the lowest unoccupied molecular orbital energy (E LUMO) and the dipole moment (µ) of the pyrazine derivatives were determined at the B3LYP/cc-pVDZ level of theory. The Cartesian coordinates of the optimized equilibrium structures are listed in Table 10 of the supplementary material. The QSAR Properties module of HyperChem 8.0.8 was then used to calculate molecular weight (MW), surface area (SAG), volume (V), molar refractivity (MR), polarizability (Pol), octanol-water partition coefficient (log P) and hydration energy (HE).
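Since the models in this section are fitted to pIC50 rather than to raw concentrations, it may help to spell out the conversion. The sketch below uses the usual definition, pIC50 = -log10(IC50 in mol/L); that the experimental IC50 values are reported in micromolar is an assumption, as the text does not state the unit, and the function name is illustrative.

```python
import math


def pic50_from_micromolar(ic50_um):
    """Return pIC50 = -log10(IC50 in mol/L) for an IC50 given in micromolar."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)


# A tenfold drop in IC50 raises pIC50 by one unit.
for ic50 in (10.0, 1.0, 0.1):
    print(f"IC50 = {ic50:5.1f} uM  ->  pIC50 = {pic50_from_micromolar(ic50):.2f}")
```

Working on the logarithmic scale means that more potent compounds have larger pIC50 values and that errors are expressed in orders of magnitude of concentration.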

5. 1. Multiple Linear Regression (MLR)
Despite being the oldest approach, MLR 64,65 remains one of the most popular ways to build QSAR models, owing to its simplicity of use, ease of interpretation and transparency. Indeed, the key algorithm is readily available and accurate predictions can be obtained. 66 The values of the calculated descriptors are listed in Table 10. Data were randomly divided into two groups, a training set (internal validation) and a test set (external validation), at a ratio of 80:20. A correlation matrix was computed over all nine descriptors; the analysis retained six independent descriptors for the development of the model. The correlation between biological activity and descriptors is represented by the following equation:
pIC50 (BGC823) = -6.878 + 0.0115 V - 0.0134 HE + 0.1763 MR - 0.0087 SAG - 0.004355 MAG - 0.5185 Pol - 15. 46 (1)
where pIC50 is the response (dependent variable) and V, HE, MR, SAG, MAG, Pol, E HOMO, E LUMO and µ are the descriptors (features or independent variables). The regression optimizes the coefficients in front of these descriptors.
The F value (F = 11.84) was found to be statistically significant at the 95% level, since the calculated F value is higher than the tabulated value.
To validate the model, Fig. 9 plots the experimental activities against the values predicted by equation (1). The predicted pIC50 values are in acceptable agreement with the experimental ones and are regularly distributed. The correlation coefficients (R²) for the training set (R² inter = 0.955) and the test set (R² ext = 0.930) indicate a significant correlation between the independent variables and the anti-proliferative activity against the BGC823.
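The MLR workflow described in this subsection (random 80:20 split, least-squares fit, R² on both sets, overall F statistic) can be sketched as follows. This is a minimal illustration with NumPy, not the JMP procedure used in this work; the data are synthetic placeholders generated from a random seed, not the descriptor values of Table 10, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic placeholder data: 18 compounds x 6 descriptors (NOT Table 10).
n_compounds, n_descriptors = 18, 6
X = rng.normal(size=(n_compounds, n_descriptors))
true_coef = rng.normal(size=n_descriptors)
y = 5.0 + X @ true_coef + rng.normal(scale=0.1, size=n_compounds)

# Random 80:20 split into training and test sets.
idx = rng.permutation(n_compounds)
n_train = int(round(0.8 * n_compounds))
train, test = idx[:n_train], idx[n_train:]


def add_intercept(A):
    """Prepend a column of ones so the first coefficient is the intercept."""
    return np.column_stack([np.ones(len(A)), A])


# Ordinary least squares on the training set only.
coef, *_ = np.linalg.lstsq(add_intercept(X[train]), y[train], rcond=None)


def predict(A):
    return add_intercept(A) @ coef


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


r2_train = r_squared(y[train], predict(X[train]))
r2_test = r_squared(y[test], predict(X[test]))

# Overall F statistic of the regression with p descriptors and n_train points.
p = n_descriptors
f_stat = (r2_train / p) / ((1.0 - r2_train) / (n_train - p - 1))

print(f"R2 (train) = {r2_train:.3f}, R2 (test) = {r2_test:.3f}, F = {f_stat:.1f}")
```

With only a handful of test compounds, the external R² is sensitive to which compounds fall in the test set, so repeating the split with different seeds gives a useful sense of its variability.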

5. 2. Artificial Neural Networks
ANN [67][68][69][70] is a popular nonlinear model used to predict the biological activity (e.g., IC50) of datasets of therapeutic molecules. It offers several benefits, such as better prediction, adaptation and generalization capacity beyond the studied sample, and better stability of the coefficients. It is employed in complex drug design, drug engineering and medicinal chemistry. 71 In this work, the neural network is a system of fully interconnected neurons arranged in three layers. The input layer is made of nine neurons, each of which receives one of the nine descriptors selected from the correlation matrix of the model. The intermediate (hidden) layer is composed of four neurons, which capture the most significant correlations between predicted and experimental data. One neuron constitutes the output layer, which returns the value of pIC50 (Fig. 10). 72 As can be seen in Fig. 10, good agreement is observed between the experimental data and the pIC50 values predicted by the ANN model. Indeed, the statistical parameters for this model show a correlation coefficient close to 1 (0.995), indicating that the ANN model is more reliable. Furthermore, the robustness of the model was confirmed by the value obtained for the test data set (0.920).
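The (9-4-1) architecture described above can be written down compactly. The sketch below is a plain NumPy illustration, not the JMP implementation used in this work: the tanh hidden activation, the linear output, the learning rate and the plain gradient-descent training loop are all assumptions, and the inputs are synthetic placeholders rather than the descriptors of Table 10.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 9, 4, 1  # the (9-4-1) architecture

# Weights and biases of the two fully connected layers.
W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, n_out))
b2 = np.zeros(n_out)

# 9*4 weights + 4 biases + 4*1 weights + 1 bias adjustable parameters.
n_parameters = W1.size + b1.size + W2.size + b2.size


def forward(X):
    """Return hidden activations and network output for a batch of inputs."""
    H = np.tanh(X @ W1 + b1)  # hidden layer (tanh is an assumption)
    return H, H @ W2 + b2     # linear output neuron, standing in for pIC50


def mse(X, y):
    return float(np.mean((forward(X)[1] - y) ** 2))


# Synthetic standardized descriptors and a synthetic target (placeholders).
X = rng.normal(size=(14, n_in))
y = np.sin(X[:, :1]) + 0.5 * X[:, 1:2]

loss_before = mse(X, y)
lr = 0.05
for _ in range(500):
    H, y_hat = forward(X)
    err = (y_hat - y) / len(X)          # gradient of 0.5 * MSE w.r.t. output
    grad_W2 = H.T @ err
    grad_b2 = err.sum(axis=0)
    dH = (err @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
    grad_W1 = X.T @ dH
    grad_b1 = dH.sum(axis=0)
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2
loss_after = mse(X, y)

print(n_parameters, round(loss_before, 4), round(loss_after, 4))
```

Counting the parameters explicitly is useful when comparing the network with the MLR model, since the number of adjustable weights can be weighed against the number of training compounds.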

5. Virtual Screening Application
The aim of this study is to identify new structures of pyrazines 73 with improved anti-proliferative activity against BGC823 that has to be within the applicability do-

Conclusion
The present work deals with the molecular properties of pyrazine. Of the HF, MP2 and DFT methods, the DFT method is the most appropriate for further study of pyrazine rings. The geometry of pyrazine is symmetric and planar, as all the dihedral angles are either nearly 0° or 180°, which makes this conformation more stable. Compound B3 (2,3-dibromopyrazine) is predicted to be the most reactive, with the smallest HOMO-LUMO energy gap of all the pyrazine systems; its C2 and C3 positions are the preferential sites for nucleophilic attack.
Afterward, we showed that both the ANN and MLR methods provide QSAR models of similar accuracy. As can be seen in Table 11, the ANN has substantially better predictive capability than MLR, leading to pIC50 values closer to the experimental determinations. Nevertheless, both models remain satisfactory and exhibit high predictive power, thus validating their use to explore and propose new molecules as anti-proliferative agents against the BGC823.
Based on the obtained QSAR equation, we have identified a series of potential novel pyrazine compounds. This series has been used as a primary step for predicting the anti-proliferative activity against the BGC823. It is worth testing the reliability of these predictions in vitro; our work should help in identifying new compounds with anti-proliferative activity against the BGC823.