Modelling the Correlation Between Molecular Electrostatic Potential and pKa on Sets of Carboxylic Acids , Phenols and Anilines

Calculations of molecular electrostatic potential were correlated with experimental pKa values for different sets of acidic molecules (carboxylic acids, phenols, and anilines) to obtain linear relationships of variable quality. A single tri-parameter model function was constructed to describe the pKa dependence on MEP maxima together with two automatically generated molecular descriptors, namely the counts of carboxylic acid and amine functional groups.


Introduction
Any distribution of the electrical charge in a molecule creates an electrostatic potential V(r) at each point of the surrounding space r: 1

formula (1)
7] The background of the correlation is due to the fact that MEP is a quantum-mechanical descriptor influenced by electronic or stereoelectronic properties of the groups in close vicinity to a particular acidic atom.Similarly, NAO (the sum of valence p natural atomic orbitals) has also been used for such calculations and it has been proven that this descriptor is equivalent to MEP. 8 Other examples of descriptors in QSAR models that have already been successfully used for determining pKa values are topological 9 , atom type descriptors 10 , and group philicity 11 .
The correlation was established by comparing the calculated MEP values via different theoretical methods with the experimentally determined pK a values for the same sets of compounds.The success of the method consequently depends on the training set used in the program learning process, and the similarity of the studied compounds. 7 relatively good correlation between the maximum in MEP and the pK a value was found for amines, anilines, carboxylic acids, alcohols, sulfonic acids, and tioles. 12urthermore, a good correlation was established recently between pK a and the most positive value of the MEP of benzoic acids and phenols, and between the most negative value of MEP and the local ionization energies of these compounds. 13Nevertheless, all the published work has studied only one set of similar compounds and to the best of our knowledge, a model that could be applied to the wider array of differently functionalized molecules has not yet been presented.
Since the knowledge of the pK a values is vital for predicting the reactivity of the compounds in question and is therefore of broad interest in drug design, toxicology studies, chemical synthesis etc., it is desirable to extend the current knowledge to other compounds.
This paper describes the correlation between pK a values and MEP for certain tested carboxylic acids, phenols, and anilines and additional physico-chemical descriptors, which are introduced in a regression model to improve the correlation.
The starting molecular conformations were formed using the Merck molecular force field (MMFF) optimization. 14Further, semi-empirical methods AM1, and PM6, ab initio Hartree-Fock method HF 6-31G*, and a density functional theory method B3LYP 6-31G* were used to calculate the MEP. 15everal isoelectronic densities (0.2, 0.02, 0.002, and 0.0002) were examined for a linear correlation between the maximum in MEP, V max , and pK a values obtained from the literature. 16,17The best correlation was obtained for the isoelectronic value of 0.02, which was used in the further calculations.
To improve the correlation between pK a and calculated descriptors from MEP, CANVAS's ligfilter descriptors were included into the calculation of multiple linear regression analysis with CANVAS software from Schrödinger 2015 Suite.These descriptors denote the presence/absence of certain functional groups, which are important for the stability of the molecules studied.The model was constructed using V max and two best suited ligfilter descriptors, which count the number of certain functional groups.
For the validation of the model, a group of 87 compounds was divided into two sets.The first set of compounds (named training set) was composed of 70% randomly selected compounds, which were used for building a model.The second one (testing set) contained the remaining 30% of compounds used for the validation of the prediction capacity of the model.
The best subset of descriptors for the construction of the model was selected with CANVAS's simulated annealing 19 algorithm with performing maximum of 1000 MC steps, initial and final temperature which corresponds to standard deviation of y being 0.5 and 0.05 respectively.

Results and Discussion
The results for the correlation between V max in MEP for the isoelectronic density 0.02 and pK a for phenols (a), anilines (b), aliphatic (c), and benzoic acids (d) are given below.Multivariable analysis of the compounds studied is presented in subsection (e).
a) The correlation between V max and pK a for the chosen set of monosubstituted phenols is shown in Figure 1.The results obtained with different methods are shown with different symbols/colours.Regardless of the used method for calculating the MEP, the correlation coefficient, R 2 , is around 0.8.If we disregard the values for nitrophenols calculated via PM6 method, the slopes of the fitted curves are approximately the same, indicating that the results are not dependent on the method of calculation.
b) The results for the anilines are shown in Figure 2.For the determination of the correlation for the set of anilines, we modelled the V max on protonated molecules (ani-

linium cations).
As with the set of phenols, the value of R 2 being higher than 0.8 shows good correlation between calculated and experimental properties.c) In the case of aliphatic acids, a poor correlation between V max and pK a was obtained (Figure 3).The correlation coefficient R 2 ranged from 0.395 to 0.588, depending on the method of calculation, the semi-empirical PM6 method having the lowest score.Poor correlations could be the consequence of structural diversity of the test set, and its larger pK a range span. 15However, to indisputably prove this speculation, further calculations on a larger set of systematically substituted aliphatic acids would be required.
d) The results for the correlation between V max and pK a for the benzoic acids studied is shown in Figure 4.As seen, the values of V max and pK a for the set of mono-substituted benzoic acids are poorly correlated.We speculate that the solvent effect, which was not specifically accounted for in the calculations, could be the reason.An evident deviation from linearity can be observed in values for pmethoxybenzoic, which could be due to the stronger reso-nance effect from substituent in para position influencing the MEP in a non-linear fashion corresponding to pK a .The relatively large scatter in the studied pK a range, therefore, suggests a non-linear response of the V max to the electronic effects of some substituents.
As evident from the Figure 5, the obtained linear correlations for the applied molecules can be classified into three groups.By incorporating additional parameters, it should be possible to construct a single model to predict pK a values for all four molecule sets.e) To improve the obtained correlations we made an attempt to include some software-generated descriptors into the multivariable analysis.Using CANVAS, we determined ligfilter descriptors to be the parameters of choice.The inclusion of a number of carboxylic acids or carboxylates (N COO ) and a number of quaternary ammonium groups (N NH4+ ) together with calculated V max provides the possibility of unifying the correlation between chemical descriptors and pK a for different types of molecules (Figure 6).Some deviations are present in the case of dicarboxylic aliphatic acids.As a result, they were excluded from the set for model construction.Using a more complicated calculation method does not notably improve the correlation.R 2 has the same value (0.96) for the models with V max from AM1, HF or B3LYP methods.A slight distinction between models can be seen from the quality of prediction for a test set of molecules.The Q 2 parameter for the models with V max from AM1, HF or B3LYP methods has the values 0.95, 0.92 and 0.92, respectively.Calculated parameters and the equation of the model for the simplest computational method are given in Table 1.The data for models obtained via B3LYP, and HF method is presented in Table S2 (see Supporting Information) for comparison.The implemented descriptors were used as a correction factor to enable the construction of a single model for different sets of compounds.

Conclusions
Good correlations were obtained between the maxima of molecular electrostatic potential and pK a values for anilines and phenols.With the inclusion of certain physico-chemical descriptors, it was possible to unify the correlation between chemical descriptors and pK a for different types of molecules with a single model function.The correlation is satisfying even with the simplest quantum mechanical calculation methods (semi-empirical method AM1).

Figure 1 .
Figure1.The correlation between the V max calculated using different methods at IsoValue 0.02, and pK a values16,17 of phenols.

Figure 2 .
Figure 2.The correlation between the V max calculated using different methods at IsoValue 0.02, and pK a values16,17 of anilines.

Figure 3 .
Figure 3.The correlation between the V max calculated using different methods at IsoValue 0.02, and pK a values16,17 of aliphatic acids.

Figure 4 .
Figure 4.The correlation between the V max calculated using different methods at IsoValue 0.02, and pK a values16,17 of benzoic acids.

Figure 5 .
Figure 5. Experimental pK a values 16,17 as a function of V max (AM1) for all molecule sets.

Figure 6 .
Figure 6.Predicted versus experimental pK a values16,17 for training and testing sets of molecules.

Table 1 .
Coefficients of the model for pK a prediction with V max (AM1) and its statistical parameters.-0.0438 V max -5.16 N coo + 6.18 N NH + 4 + 30.2