Electrostatic and Topological Features as Predictors of Antifungal Potential of Oxazolo Derivatives as Promising Compounds in Treatment of Infections Caused by Candida albicans

The results presented in this study include the prediction of the antifungal activity of 24 oxazolo derivatives based on their topological and electrostatic molecular descriptors, derived from the 2D molecular structures. The artificial neural network (ANN) method was applied as a regression tool. The input data for ANN modeling were selected by a stepwise selection (SS) procedure. The ANN modeling resulted in three networks with outstanding statistical characteristics. The high predictivity of the established networks was confirmed by comparing the predicted and experimental data and by residual analysis. The obtained results indicate the usefulness of the formed ANNs in precise prediction of the minimum inhibitory concentrations of the analyzed compounds towards Candida albicans. The Sum of Ranking Differences (SRD) method was used to reveal possible grouping of the compounds in the space of the variables used in ANN modeling. The obtained results can be considered a contribution to the development of new antifungal drugs structurally based on the oxazole core, particularly now, when there is a lack of highly efficient antimycotics.


Introduction
The quantitative structure-activity relationship (QSAR) approach is an attempt to remove the trial-and-error element from drug design by using high-quality mathematical relationships which relate measurable physicochemical parameter(s), as independent variable(s), to a biological response (a dependent variable) [1]. These variables have been correlated in many QSAR studies applying various chemometric regression methods, such as linear regression (LR), multiple linear regression (MLR), polynomial regression (PR), artificial neural networks (ANN), partial least squares regression (PLS), principal component regression (PCR), etc. [2][3][4][5][6][7][8] Any high-quality model obtained by the aforementioned chemometric techniques may be used by the chemist in order to facilitate the synthesis of more effective drugs. A high-quality QSAR model must be based on a reasonable number of tested compounds, characterized by good values of statistical parameters, defined for a particular application domain and suitably validated by internal and external validation approaches. Using these QSAR models, it is possible to precisely calculate the theoretical activity of compounds prior to their synthesis, and thus decrease the financial expenses and time needed for the experimental work.
The selection of an appropriate regression method depends on the nature of the variables. The ANN method is suitable for correlation analysis when there is a complex relationship between the variables, as in the case of biological systems. Complex relationships between biological activity and molecular characteristics are not unusual, since many factors influence the biological effect of a compound, such as lipophilicity, dissociation, molecular weight, presence of polar/non-polar functional groups, conformation, etc. In the present paper, the electrostatic and topological characteristics of benzoxazoles and oxazolo[4,5-b]pyridines were used as predictors of their antifungal activity against Candida albicans. Topological descriptors of a compound can be calculated based on molecular graphs that are hydrogen-suppressed. In these graphs the bonds are presented by edges and the atoms by vertices [9]. Simple topological descriptors are based on the counting of some specific graph elements (Kier shape descriptors, Hosoya Z index, path/walk shape indices, self-returning walk counts), but the most common topological descriptors are obtained by using some algebraic operators [9]. In QSAR and quantitative structure-property relationship (QSPR) modeling, the graph invariants have been effectively used in characterization of the structural similarity and dissimilarity of compounds [9]. There is no need for energy minimization of the molecular structure for the calculation of topological descriptors. Electrostatic descriptors describe many of the electrical characteristics of molecules, such as polarity, dipole moment, polarizability, ionization energy, etc. These characteristics certainly influence the interactions between the molecule and its surroundings, for example interactions with cell membranes and with extra- and intercellular molecules.
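As a concrete illustration of a descriptor computed from a hydrogen-suppressed molecular graph, the following minimal Python sketch computes the Wiener index, defined as the sum of topological distances between all pairs of vertices. The adjacency-list encoding and the two small alkane graphs are illustrative assumptions; they are not compounds from this study, and the sketch does not reproduce the software used here.

```python
from collections import deque

def wiener_index(adjacency):
    """Sum of shortest-path distances over all unordered vertex pairs."""
    total = 0
    for source in adjacency:
        # Breadth-first search gives topological distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    # Each pair was counted twice (once from each endpoint).
    return total // 2

# Hydrogen-suppressed graphs: vertices are heavy atoms, edges are bonds.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

The two isomers have the same atoms but different index values, which is the sense in which such graph invariants encode branching and shape without any 3D geometry.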
The most popular classes of molecules used in the treatment of infections caused by Candida species are polyenes, azoles, analogs of nucleosides, allylamines, etc. In the treatment of Candida albicans infections, fluconazole, a member of the azoles, is one of the most popular antifungals. According to previous studies, Candida has developed high-level resistance toward some azole antifungal drugs [10]. However, some oxazole analogs, such as oxazolo[4,5-b]pyridines and benzoxazoles, showed significant antifungal activity and are considered to be a very good basis for the development of new antifungal therapeutics. This study presents our efforts to define sophisticated QSARs that would be limited to the prediction of the antifungal activity of structurally similar oxazolo[4,5-b]pyridines and benzoxazoles toward Candida albicans. 2][13][14] The ANN method with absorption, distribution, metabolism and excretion (ADME) descriptors was applied for the same purpose as well [14]. However, this study explores the importance of the topological and electrostatic characteristics of a series of benzoxazoles and oxazolo[4,5-b]pyridines in the prediction of their antifungal activity toward Candida albicans.

1. The Studied Series of Oxazolo[4,5-b]pyridines and Benzoxazoles
Structural formulae of the analyzed benzoxazoles and oxazolo[4,5-b]pyridines are presented in Figure 1. The analyzed compounds possess various types of substituents/functional groups, including tert-butylphenyl, ethylphenyl, dimethyl, chlorophenyl, nitrophenyl, fluorophenyl, methoxyphenyl, ethoxyphenyl and acetamide groups. The experimental results of the determination of the antifungal activity of the studied derivatives against Candida albicans MTCC 183 are given in the literature [15]. The antifungal activity, expressed as the minimum inhibitory concentration (MIC), was mathematically transformed into the logarithm of the reciprocal MIC value, log(1/c_MIC), which was used in further QSAR-ANN modeling.
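The transformation described above can be sketched in a few lines of Python. The base-10 logarithm is an assumption (the text only says "logarithm"), and the MIC values below are hypothetical placeholders rather than data from the study.

```python
import math

def log_inverse_mic(c_mic):
    """Return log10(1 / c_MIC); larger values mean higher potency."""
    if c_mic <= 0:
        raise ValueError("MIC must be a positive concentration")
    return math.log10(1.0 / c_mic)

# Hypothetical MIC values (illustrative only, not data from the study).
example_mics = [1e-3, 5e-4, 1e-4]
print([round(log_inverse_mic(c), 3) for c in example_mics])
```

The reciprocal makes more potent compounds (lower MIC) score higher, and the logarithm compresses concentrations that span several orders of magnitude into a range better suited to regression.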

2. Electrostatic and Topological Descriptors Calculation
The set of 35 electrostatic and 10 topological descriptors was calculated by using PreADMET online software [16]. Structural optimization and energy minimization were not required since the molecular descriptors were calculated on the basis of 2D structures. The values of the calculated descriptors are shown in Supplementary data (Table S1).

3. Chemometric Methods
The first step in the chemometric analysis was the selection of the most appropriate descriptors to be used as inputs in ANN modeling. For this purpose, the stepwise selection (SS) procedure was applied by using NCSS statistical software [17]. In the SS procedure, the minimum change in the root mean square error (RMSE) was used as the criterion for removing or adding variables. In the present analysis, the limit of the RMSE change was set at 0.05.
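The idea of adding variables only while the RMSE changes by at least a set limit can be sketched as follows. This is a simplified, forward-only illustration using ordinary least squares as the underlying model; it is an assumption made for clarity and does not reproduce the NCSS procedure (which also removes variables). The descriptor names `d1` and `d2` and all numbers are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_rmse(columns, y):
    """RMSE of a least-squares fit of y on the given columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    sq = sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
             for r, yi in zip(rows, y))
    return math.sqrt(sq / len(y))

def forward_select(descriptors, y, min_rmse_change=0.05):
    """Greedily add the descriptor that lowers RMSE most; stop below the limit."""
    selected = []
    current = fit_rmse([], y)  # intercept-only model
    while len(selected) < len(descriptors):
        candidates = {
            name: fit_rmse([descriptors[s] for s in selected + [name]], y)
            for name in descriptors if name not in selected
        }
        best = min(candidates, key=candidates.get)
        if current - candidates[best] < min_rmse_change:
            break
        selected.append(best)
        current = candidates[best]
    return selected

# Hypothetical data: y follows d1 closely, d2 carries almost no extra information.
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "d2": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}
y = [2.0, 4.1, 5.9, 8.0, 10.1, 11.9]
print(forward_select(descriptors, y))
```

With these numbers, adding `d1` lowers the RMSE by far more than 0.05, while adding `d2` afterwards changes it by much less, so the loop stops with a single descriptor.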
Artificial neural networks are a non-linear chemometric tool. They have been widely applied in modeling complex relationships between different types of variables, which is usually the case in the prediction of the biological activity of many biologically active compounds. An ANN consists of several layers: the input layer, one or more hidden layers, and one output layer [18]. The ANNs were trained by applying the feedforward multilayer perceptron (MLP) ANN function with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) learning algorithm in Statistica 10.0 software [19]. The data were normalized by the min-max normalization method [20,21]. Prior to ANN modeling, the analyzed compounds were divided into the training set (compounds [...]
The contribution of every input variable in a network was estimated by calculating global sensitivity analysis (GSA) coefficients [22]. A GSA coefficient describes the changes in the ANN's output that are caused by variations in the parameters that affect the ANN. If the GSA coefficient is higher than 1, a minor variation in the input variable produces a greater change in the ANN's performance [22,23].
The validity of the ANN models was estimated on the basis of the following statistical parameters: R (correlation coefficient), R_tr (correlation coefficient of the training set), R_t (correlation coefficient of the test set), R_v (correlation coefficient of the validation set), RMSE (root mean square error), RMSE_tr (root mean square error of the training set), RMSE_t (root mean square error of the test set), RMSE_v (root mean square error of the validation set), F-test, variation coefficient (VC) and significance level (p). The analysis of residuals and the graphical comparison of predicted and experimental data were also carried out in order to estimate the predictive ability of the ANN models.
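Two of the steps named in this section, min-max normalization and the sensitivity of a model to each input, can be illustrated with a short Python sketch. It assumes that sensitivity is expressed as the error obtained when an input is made unavailable divided by the error with all inputs present, and it emulates "unavailable" by replacing the input with its mean; both are assumptions, not a description of the Statistica implementation. The function `toy_model` is a hypothetical stand-in for a trained network and all numbers are made up.

```python
import math

def min_max(column):
    """Scale a list of values linearly to the [0, 1] range."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def rmse(y_true, y_pred):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def sensitivity_ratios(predict, rows, y):
    """Error with one input replaced by its mean, divided by the baseline error."""
    base_error = rmse(y, [predict(r) for r in rows])
    ratios = []
    for j in range(len(rows[0])):
        mean_j = sum(r[j] for r in rows) / len(rows)
        blinded = [r[:j] + [mean_j] + r[j + 1:] for r in rows]
        ratios.append(rmse(y, [predict(r) for r in blinded]) / base_error)
    return ratios

# Hypothetical stand-in for a trained network:
# the output depends strongly on input 0 and weakly on input 1.
def toy_model(row):
    return 3.0 * row[0] + 0.1 * row[1]

raw = [[10.0, 5.0], [20.0, 1.0], [30.0, 4.0], [40.0, 2.0]]
columns = [min_max([r[j] for r in raw]) for j in range(2)]
rows = [list(values) for values in zip(*columns)]
# Hypothetical targets: model output plus small fixed offsets,
# so that the baseline error is non-zero.
y = [toy_model(r) + e for r, e in zip(rows, [0.05, -0.05, 0.05, -0.05])]

print(sensitivity_ratios(toy_model, rows, y))
```

A ratio above 1 means the model fits worse without that input; the larger the ratio, the more the model relies on it. In this toy case the first input yields a much larger ratio than the second.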
The SRD method was used as a relatively new approach for comparing samples, compounds and models [24]. The purpose of the SRD analysis in this study was to reveal possible similarities or dissimilarities among the analyzed molecules on the basis of the topological and electrostatic descriptors used in ANN modeling. In the SRD analysis, the row average values were used as the reference ranking. SRD is substantially different from the hierarchical cluster analysis (HCA) and principal component analysis (PCA) approaches. 5][26] The SRD analysis was validated by comparison of ranks by random numbers (CRRN) and 7-fold cross-validation [24].
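The core SRD calculation with a row-average reference can be sketched as follows: each column of a data matrix is ranked, the row averages are ranked, and the absolute rank differences are summed per column. The layout (rows as descriptors, columns as compounds), the assumption that the descriptors are already on a comparable scale, and the matrix values are all hypothetical; the CRRN and cross-validation steps are not covered.

```python
def ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def srd(matrix):
    """SRD of each column against the row-average reference ranking."""
    reference = ranks([sum(row) / len(row) for row in matrix])
    scores = []
    for c in range(len(matrix[0])):
        column_ranks = ranks([row[c] for row in matrix])
        scores.append(sum(abs(a - b) for a, b in zip(column_ranks, reference)))
    return scores

# Hypothetical matrix: rows are scaled descriptors, columns are compounds A, B, C.
matrix = [
    [0.1, 0.2, 0.9],
    [0.5, 0.4, 0.6],
    [0.9, 0.8, 0.7],
]
print(srd(matrix))  # [0.0, 0.0, 4.0]
```

Compounds A and B order the descriptors exactly as the reference does (SRD = 0), while compound C deviates (SRD = 4). The smaller the SRD value, the closer a column is to the reference ranking.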

1. The Selection of Suitable Variables - SS Procedure
The SS analysis was conducted after the descriptor calculation procedure. A significance level of 0.05 was required for a variable to enter the equation, while a significance level of 0.20 was used as the criterion for removal of variables from the model. The number of iterations was set at 500. As a result of the SS analysis, a subset of 9 calculated descriptors was formed (Table 1). The selected descriptors, suggested by the SS analysis, are the following: RPCS (relative positive charge surface area), PNSA1 (partial negative surface area 1st type), RNCS (relative negative charge surface area), FNSA1 (fractional charged partial negative surface area 1st type), Rouvray index, FPSA1 (fractional charged partial positive surface area 1st type), WI (Wiener index), Gutman 2D MTI (Gutman 2D molecular topological index) and TNC (total negative charge). This subset of 9 descriptors was used as the set of input variables for further ANN modeling.

2. ANN Modeling and Validation of Models
The ANN modeling resulted in three statistically very good models, whose activation functions and statistical parameters are presented in Table 2. The statistical quality of the obtained ANNs was compared on the basis of these parameters. Exponential (Exp) and hyperbolic tangent (Tanh) functions were used as MLP activation functions for the hidden and output neurons. A total of 150 ANNs was obtained, but only three ANNs were chosen as the best ones. During the training of the networks, the number of neurons in the hidden layer varied in the range of 2-20. The architecture of the obtained ANNs is presented in Figure 2.
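To make the "9-14-1" notation and the role of the activation functions concrete, the following Python sketch runs a single forward pass through a one-hidden-layer perceptron with 9 inputs, 14 hidden neurons and 1 output. Which activation belongs to which layer is an assumption here (Tanh for the hidden layer, Exp for the output), and the weights are hypothetical random numbers rather than fitted values from the study.

```python
import math
import random

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out,
                hidden_act=math.tanh, output_act=math.exp):
    """Forward pass of a single-hidden-layer perceptron with one output neuron."""
    hidden = [
        hidden_act(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    return output_act(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

random.seed(0)
n_inputs, n_hidden = 9, 14  # the 9-14-1 layout named in the text

# Hypothetical small random weights; a trained network would supply fitted values.
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
            for _ in range(n_hidden)]
b_hidden = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
b_out = 0.0

# One compound described by nine min-max-scaled descriptors (made-up values).
x = [0.5] * n_inputs
print(mlp_forward(x, w_hidden, b_hidden, w_out, b_out))
```

Varying `n_hidden` between 2 and 20 and swapping `hidden_act` and `output_act` gives the family of candidate architectures from which the best networks would be chosen after training.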
Based on the data given in Table 2, it can be seen that the statistical quality of the selected ANNs is very similar. The quality of the ANNs was compared on the basis of their R and RMSE values (Figure 3). These comparisons indicate that the network MLP 9-14-1 gives the best agreement with the data (the highest R) together with the lowest RMSE values (Figure 3). The predictive ability of the established networks was tested by the graphical comparison of the predicted and experimental log(1/c_MIC) values (Figure 4). The outstanding agreement between the predicted and experimental values and the small scattering of the points around the linear relationship indicate the high quality of the obtained models. Also, the slope of this linear relationship is very close to 1 and the intercept is very close to zero, which further supports the outstanding predictive ability of the ANNs. The plots of residuals versus predicted log(1/c_MIC) values for the established networks are presented in Figure 5. The presented ANN models fit the data well, since the residuals are randomly distributed in these plots and their amplitude is in an acceptable range. The application of the external test set confirmed the quality of the established networks.
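The quantities used in this comparison (R, RMSE, the slope and intercept of the predicted-versus-experimental line, and the residuals) can be computed as in the sketch below. Regressing predicted on experimental values is an assumption about the axes of the plot, and the numbers are hypothetical, not results from the study.

```python
import math

def evaluate(experimental, predicted):
    """Return R, RMSE, slope, intercept and residuals for paired values."""
    n = len(experimental)
    mean_e = sum(experimental) / n
    mean_p = sum(predicted) / n
    s_ee = sum((e - mean_e) ** 2 for e in experimental)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_ep = sum((e - mean_e) * (p - mean_p) for e, p in zip(experimental, predicted))
    slope = s_ep / s_ee                    # predicted regressed on experimental
    intercept = mean_p - slope * mean_e
    residuals = [e - p for e, p in zip(experimental, predicted)]
    return {
        "R": s_ep / math.sqrt(s_ee * s_pp),
        "RMSE": math.sqrt(sum(r ** 2 for r in residuals) / n),
        "slope": slope,
        "intercept": intercept,
        "residuals": residuals,
    }

# Hypothetical log(1/c_MIC) values (illustrative only).
experimental = [3.0, 3.5, 4.0, 4.5]
predicted = [3.1, 3.4, 4.1, 4.4]
stats = evaluate(experimental, predicted)
print({k: v for k, v in stats.items() if k != "residuals"})
```

A slope near 1 with an intercept near 0 indicates that the predictions are neither systematically compressed nor shifted, which R alone cannot show.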
Another confirmation of the reliability of the obtained networks is given by the individual percentage deviations (IPD%) for the experimental-predicted value pairs. Figure 6 shows that, for all three ANNs, almost all IPD% values are lower than 2.0%, which indicates acceptable differences between the predicted and experimental log(1/c_MIC) values.
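A percentage deviation of this kind can be sketched as below. The exact definition of IPD% is not stated in the text, so the formula used here (absolute difference relative to the experimental value, times 100) is an assumption, and the values are hypothetical.

```python
def ipd_percent(experimental, predicted):
    """Absolute deviation of each prediction, as a percentage of the experimental value."""
    return [abs(p - e) / abs(e) * 100.0 for e, p in zip(experimental, predicted)]

# Hypothetical log(1/c_MIC) values (illustrative only).
experimental = [3.0, 3.5, 4.0, 4.5]
predicted = [3.03, 3.45, 4.06, 4.5]

deviations = ipd_percent(experimental, predicted)
print([round(d, 2) for d in deviations])
print(all(d < 2.0 for d in deviations))
```

Unlike RMSE, which summarizes a whole set, this measure is reported per compound, so a single poorly predicted compound remains visible.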

3. Global Sensitivity Analysis
As a result of the global sensitivity analysis, the GSA coefficients were calculated by the applied software for every input variable. A GSA coefficient is presented in the following form:

$$\mathrm{GSA} = \frac{\mathrm{ERR}_o}{\mathrm{ERR}_p}$$

where ERR_o is the network error when the observed input variable is omitted and ERR_p is the network error when the observed input variable is included in the model. The GSA coefficients for the MLP 9-7-1, MLP 9-13-1 and MLP 9-14-1 networks are given in Figure 7. As shown in the pie charts, each input variable is described by a GSA coefficient higher than 1. This indicates the significance of each input variable, particularly of the RPCS, FPSA1 and PNSA1 descriptors (the highest average GSA coefficients).
In comparison with the results of the QSAR analysis of oxazolo[4,5-b]pyridines and benzoxazoles previously published in the literature [13,14], the results described in the present paper are based on non-linear prediction of their antifungal activity from topological and electrostatic descriptors. In the previous studies, linear modeling (PCR and PLS) of the antifungal activity was carried out on the basis of some physicochemical and lipophilicity descriptors [13], as well as non-linear prediction (ANN) of antifungal activity based on some ADME descriptors [14]. The presented results emphasize the influence of electrostatic and topological molecular features on the antifungal activity, based on the established non-linear models. These models can be considered statistically slightly better than the models presented in the literature [13,14].

4. Sum of Ranking Differences Analysis of Oxazolo[4,5-b]pyridines and Benzoxazoles
The SRD analysis was carried out using the average row values as the reference values of the variables included in the ANN models (consensus ranking). The results of the SRD analysis (Figure 8) indicate the separation of the compounds into three main groups and the detection of one outlier. The first group (the compounds 2, 10, 13, 17, 19, 20, 21, 22 and 23) is characterized by the ranking number 0, which means that these compounds have the same ranking as the reference one. The second group (the compounds 1, 3, 4, 5, 7, 8, 11, 12, 14, 15 and 16) has the ranking number 2 and the third group (the compounds 6, 9 and 24) the ranking number 4. The second and third groups are considered to be very close to the reference ranking and contain most of the benzoxazole derivatives. However, the compound 18, with the ranking number 6, is separated from the other compounds, but can still be considered relatively close to the reference ranking in the variable space (since it fits very well in the established QSAR models, it is not considered to be an outlier in the QSAR models). Most of the molecular descriptors of this compound differ significantly from the molecular descriptors of the other compounds. Generally, oxazolo[4,5-b]pyridines are placed in the first group, indicating their distinctiveness regarding the RPCS, PNSA1, RNCS, FNSA1, Rouvray index, FPSA1, WI, Gutman 2D MTI and TNC molecular features. The compounds 9 and 24 from the third group are specific due to the presence of the -NO2 functional group in their structures. The presented results of the SRD analysis of oxazolo[4,5-b]pyridines and benzoxazoles revealed particular similarities/dissimilarities among the analyzed derivatives. This fact could be particularly interesting for further 3D-QSAR and 4D-QSAR modeling and molecular docking studies of the antifungal activity of oxazolo[4,5-b]pyridines and benzoxazoles toward Candida albicans.

Conclusions
The conducted variable selection procedure and artificial neural network modeling resulted in three reliable [...] measures, by comparisons of the experimental and predicted data including the residuals analysis. Applying the stepwise selection procedure, the most important electrostatic and topological descriptors were determined: RPCS, [...]

Figure 1. The molecular structures of the analyzed oxazole derivatives.

Figure 2. The general architecture of the established QSAR-ANN models.

Figure 3. Comparisons of R and RMSE values of the established networks.

Figure 4. Correlations between the experimental and predicted antifungal activity of the analysed compounds.

Figure 6. Individual percentage deviations (IPD%) of predicted values compared with the experimental values.

Figure 7. The GSA coefficients of the established ANNs and their average values.

Table 1. The results of stepwise selection procedure.

Table 2. The results of ST-ANN procedure.