Elements of an Universal Matrix as Topological Indices for Physicochemical Properties of Octanes

Some elements of the Universal matrix, and combinations of them, are useful topological indices of the physicochemical properties of octanes. Whereas some single elements of the Universal matrix give rise to 0.70 < |R| < 0.99, mutually optimized combinations of only four to six of the 56 elements in the Universal matrix of octanes give rise to R > 0.99, and in the worst cases to R > 0.98. A new measure of goodness of correlation, the information content in the topological index, IC (%), is also introduced. Structural interpretation is demonstrated for some of the physicochemical properties of octanes as well as for the contributions of the most useful elements of the Universal matrix.


Introduction
Mathematical methods occupy an eminent place in the prediction of properties and activities of chemical compounds, and even of materials. These methods, known under the acronym QSPR/QSAR (quantitative structure-property or structure-activity relationship), also use graph-theoretical descriptors, in which molecules are seen as chemical graphs, i.e. as a set of vertices attached to each other by a set of non-metrical connections. [1] These descriptors are proposed as topological indices. They are the simplest means of describing the structure of a molecule, characterizing it by a single number. [2] A plethora of topological indices is known. [3][4][5][6][7][8][9] Since those compilations appeared, a huge number of new ones have been described and more are still being developed, cf. e.g. [10,11]. A substantial part of topological indices is derived from one or another matrix associated with molecular topology. Ivanciuc [12,13] presented the Dval matrix and its characteristics, and we have shown [14] that this matrix represents a step towards the unification of several matrices that had been used to derive topological indices, i.e. the adjacency matrix, the distance matrix, the reciprocal distance matrix, etc., being thus a Universal matrix. The characteristics of some groups of topological indices derived by means of this generalized vertex-degree vertex-distance matrix have been studied, and the usefulness of some of those new topological indices was demonstrated. [14] The well-known topological indices W, [15] RW, [16] and χ, [17] for example, are each one half of the sum of all 56 matrix elements u_ij(a, b, c) of the Universal matrix, where for W [15] (a, b, c) = (0, 0, 1); for RW [16] (a, b, c) = (0, 0, -1); and for χ [17] (a, b, c) = (-½, -½, -∞).
The question arose whether particular elements of the Universal matrix, as well as their combinations, are good topological indices. It has been demonstrated that although particular elements of the Universal matrix are not invariant to molecular labelling, they are invariant with respect to the structural features of octanes, and that topological indices which are not invariant to molecular labelling give rise to better correlations than those which are. [18] For this reason, the elements of the Universal matrix and their mutually optimized combinations have been studied systematically, and the results are presented here.

Data and Definitions
The origin of the data on physicochemical properties (PCP), as well as the notation of octanes, has been presented elsewhere. [14] The data are given in Appendix 1. Correlations between the physicochemical properties of octanes used in the present study are given in Appendix 2.
Table 1 presents the physicochemical properties of octanes grouped by their intercorrelation in Appendix 2 and placed into subgroups according to their correlation coefficient with the best topological indices (TI) based on grid values of exponents [14] in TI(a, b, c).
To demonstrate the usefulness of elements of the Universal matrix and of their combinations, MON was chosen from subgroup 1a in Table 1 as the physicochemical property having the best correlations with previously tested topological indices. As a less good example, Tc²/Pc, representing the van der Waals parameter a_0 with constants omitted, was taken from subgroup 1b. From group 2, BP was chosen. From subgroup 2,3b, Tc was selected, and from subgroup 3a, n_D. As two of the worst cases, dc was chosen from group 4 and logVP from group 5.

1. Universal Matrix and its Elements
The Universal matrix [14] U(a, b, c) (first described by Ivanciuc [12,13] as the Dval matrix) has its elements defined here as follows:

$$u_{ij}(a, b, c) = v_i^{a} \times v_j^{b} \times d_{ij}^{c}$$

where v_i and v_j are the degrees of vertices i and j, and d_ij is the distance between them. Each element of the Universal matrix is a function of the exponents on vertex degrees and vertex distances, u_ij(a, b, c) = f(a, b, c). For easier comparison, the Universal matrix relating to 2,3-dimethylhexane and used here is presented in Appendix 3. The relation between matrix elements from the left side of the Universal matrix and from its right side is simple, for example: u_52(a, b, c) = u_25(b, a, c). The elements of the Universal matrix which contain the factor 1^a, 1^b or 1^c are given in the form demonstrated here for u_32(a, b, c) ≡ u_32(a, b, 1^c), to show that the factors 1^a, 1^b and 1^c do not influence the usefulness of the topological index.
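The construction of the matrix elements can be illustrated with a minimal sketch. It assumes the hydrogen-suppressed graph of 2,3-dimethylhexane with a hypothetical labelling (main chain first, then the two methyl carbons), which may differ from the labelling used in Appendix 3. Setting the diagonal to zero is also an assumption, chosen because it leaves 8 × 7 = 56 elements; all function names are illustrative only.

```python
from collections import deque
import math

# Hypothetical 0-based labelling of 2,3-dimethylhexane: chain 0-1-2-3-4-5,
# methyl carbons 6 and 7 attached to chain vertices 1 and 2.
EDGES_23M6 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 6), (2, 7)]


def degrees_and_distances(edges, n):
    """Return the vertex degrees and the matrix of topological distances."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    degrees = [len(neighbours) for neighbours in adj]
    dist = [[0] * n for _ in range(n)]
    for start in range(n):  # breadth-first search from every vertex
        seen = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nb in adj[cur]:
                if nb not in seen:
                    seen[nb] = seen[cur] + 1
                    queue.append(nb)
        for k, d in seen.items():
            dist[start][k] = d
    return degrees, dist


def universal_matrix(edges, n, a, b, c):
    """u_ij = v_i**a * v_j**b * d_ij**c for i != j; diagonal assumed zero."""
    v, d = degrees_and_distances(edges, n)
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                u[i][j] = float(v[i]) ** a * float(v[j]) ** b * float(d[i][j]) ** c
    return u


def half_sum(edges, n, a, b, c):
    """One half of the sum of all off-diagonal matrix elements."""
    return 0.5 * sum(sum(row) for row in universal_matrix(edges, n, a, b, c))


wiener = half_sum(EDGES_23M6, 8, 0, 0, 1)                  # W: (0, 0, 1)
randic = half_sum(EDGES_23M6, 8, -0.5, -0.5, -math.inf)    # chi: (-1/2, -1/2, -inf)

# Swapping the exponents a and b transposes the matrix.
u = universal_matrix(EDGES_23M6, 8, 0.3, -2.0, 1.0)
u_swapped = universal_matrix(EDGES_23M6, 8, -2.0, 0.3, 1.0)
```

With c = -∞ every distance greater than 1 contributes zero, so only adjacent vertex pairs remain in the half-sum, which is how the connectivity index arises from the same formula.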

2. Exponent Values
The first step in assessing the usefulness of elements of the Universal matrix is to evaluate the goodness of their correlation with the physicochemical properties of octanes.
To assess approximately where the maxima of the absolute value of the correlation coefficient R lie in the space of exponents a, b, and c, a 3D grid of exponent values was applied and the correlation coefficients at those combinations of values were derived. The exponent values -5, -4, -3, -2, -1, -0.5, -0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3, 0.5, 1, 2, 3, 4, and 5 were chosen as the grid points in all three dimensions of exponents.
The true maximum of the correlation coefficient can then be approached by exponent optimization, using two-digit and, if necessary, three-digit values of exponents in addition to the grid values. The values of exponents were limited to at most three decimals.
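The grid step can be sketched as an exhaustive search over the 19 × 19 × 19 exponent combinations for a single matrix element. The sketch below is an illustration under stated assumptions, not the procedure actually used: the function names are hypothetical, and the data are synthetic (a full 3 × 3 × 3 design with a "property" generated from known exponents), not octane data.

```python
import itertools
import math

GRID = [-5, -4, -3, -2, -1, -0.5, -0.3, -0.2, -0.1, 0,
        0.1, 0.2, 0.3, 0.5, 1, 2, 3, 4, 5]


def pearson_r(x, y):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def grid_search(vi, vj, dij, pcp, grid=GRID):
    """Return (R, (a, b, c)) with the largest |R| over the exponent grid.

    vi, vj, dij hold, for each molecule, the two vertex degrees and the
    distance entering one chosen matrix element u_ij.
    """
    best_r, best_exps = 0.0, None
    for a, b, c in itertools.product(grid, repeat=3):
        ti = [p ** a * q ** b * r ** c for p, q, r in zip(vi, vj, dij)]
        r_val = pearson_r(ti, pcp)
        if abs(r_val) > abs(best_r):
            best_r, best_exps = r_val, (a, b, c)
    return best_r, best_exps


# Synthetic check (NOT octane data): the "property" is generated with the
# exponents (1, 2, -1), which the grid search should recover.
design = list(itertools.product([1, 2, 3], repeat=3))
vi = [p for p, _, _ in design]
vj = [q for _, q, _ in design]
dij = [r for _, _, r in design]
pcp = [p * q ** 2 / r for p, q, r in design]

best_r, best_exps = grid_search(vi, vj, dij, pcp)
```

A refinement step with two-digit or three-digit exponents could reuse `grid_search` with a finer `grid` centred on `best_exps`; since several local maxima may exist, more than one starting grid point may need refining.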

3. Goodness of Topological Indices
To illustrate the potential goodness of topological indices, the correlation coefficient R and the standard error S are generally used. Here another quantity is also proposed: the information content (IC) about the physicochemical property (PCP) of octanes in question that is contained in the topological index (or index combination) in question. It is defined as follows:

$$IC = 1 - \left( \frac{\sum (PCP_{exp} - PCP_{calc})^{2}}{\sum (PCP_{exp} - PCP_{av})^{2}} \right)^{1/2}$$

where PCP_exp means the experimental PCP data of octanes, PCP_calc those calculated from the values of the topological index (index combination), and PCP_av is the average of PCP_exp.
To the experimental PCP data of octanes, PCP_exp, the information content IC = 1 is ascribed, whereas to the average of the PCP_exp data of octanes, PCP_av, the information content IC = 0 is ascribed, since PCP_av does not contain any information about the contribution of branching in octanes to the value of the PCP in question.
The values of IC contributed by particular matrix elements in an index combination are normalized in such a way that the sum of all particular IC values is equal to the IC value of the topological index combination.
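A minimal sketch of how IC could be computed is given below. It assumes the form IC = 1 − [∑(PCP_exp − PCP_calc)² / ∑(PCP_exp − PCP_av)²]^½, which reproduces the two boundary values stated above (IC = 1 for the experimental data, IC = 0 for their average); the function name and the numbers in the example are hypothetical, not octane data.

```python
import math


def information_content(pcp_exp, pcp_calc):
    """IC = 1 - sqrt(SS_res / SS_tot), under the assumed definition above."""
    pcp_av = sum(pcp_exp) / len(pcp_exp)
    ss_res = sum((e - c) ** 2 for e, c in zip(pcp_exp, pcp_calc))
    ss_tot = sum((e - pcp_av) ** 2 for e in pcp_exp)
    return 1.0 - math.sqrt(ss_res / ss_tot)


# Made-up illustrative values (not experimental data).
pcp_exp = [1.0, 2.0, 3.0, 4.0]
average = [2.5, 2.5, 2.5, 2.5]
halfway = [1.75, 2.25, 2.75, 3.25]   # residuals are half the deviations from the mean

ic_perfect = information_content(pcp_exp, pcp_exp)   # calculated == experimental
ic_average = information_content(pcp_exp, average)   # only the mean is known
ic_halfway = information_content(pcp_exp, halfway)
```

Because the residual sum of squares is scaled by the spread of the experimental data, the result does not depend on the units or magnitude of the property, which is what allows comparisons between different properties.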

4. Topological Index Combination
To assess the usefulness of a topological index combination (TI_comb) composed of two or more elements of the Universal matrix, the approach

$$TI_{comb} = \sum u_{ij}(a_i, b_j, c_{ij}) \times k_{ij}$$

was used, where ∑|k_ij| = 1 and 0 < |k_ij| < 1, and the exponents a_i, b_j, c_ij as well as the smallest k_ij have two significant digits. The exponents a_i, b_j, c_ij as well as the factors k_ij are mutually optimized to reach the highest possible value of R.
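The bookkeeping of such a combination can be sketched as follows. The sketch only shows how the factors k_ij are normalized so that ∑|k_ij| = 1 and how the combined index is formed for each molecule; the mutual optimization of exponents and factors is not shown. The function names and the numbers are hypothetical.

```python
def normalise_weights(k):
    """Rescale the factors k so that the sum of their absolute values is 1."""
    if len(k) < 2 or any(x == 0 for x in k):
        # 0 < |k_ij| < 1 requires at least two non-zero factors
        raise ValueError("need at least two non-zero factors")
    total = sum(abs(x) for x in k)
    return [x / total for x in k]


def combined_index(elements, k):
    """Weighted sum of matrix elements for every molecule.

    elements: one list per chosen matrix element, each holding the value of
              that element (with its own exponents) for every molecule.
    k:        one factor per chosen matrix element.
    """
    n_molecules = len(elements[0])
    return [sum(kj * column[m] for kj, column in zip(k, elements))
            for m in range(n_molecules)]


# Made-up values for two matrix elements over three molecules.
k = normalise_weights([2.0, -1.0, 1.0])
ti_comb = combined_index([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [0.5, -0.5])
```

Rescaling all factors by a common positive constant does not change the correlation coefficient of the combined index, so the normalization only fixes the scale and makes the relative contributions comparable.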

Results
The first step in assessing the usefulness of particular elements of the Universal matrix, u_ij(a, b, c), as topological indices is to evaluate the goodness of their correlation with the physicochemical properties of octanes.
The best correlations between the tested physicochemical properties (PCP) and the u_ij(a, b, c) elements using grid values of exponents are presented in the form |R_max grid| (PCP, u_ij).
The usefulness of particular elements of the Universal matrix increases on going from grid values of exponents to two-digit values, and again on mutual optimization of combinations of two or more matrix elements using two-digit values of exponents. This is demonstrated for octanes in Tables 2, 3, and 4 for the physicochemical properties MON, Tc²/Pc, BP, n_D, Tc, dc, and logVP.
In Tables 2 through 4 …
Extrapolation of the best observed regression data to the structure of 2,2,3,3-tetramethylbutane indicates that its missing MON value would be around 98.5, and that if 2,2,3,3-tetramethylbutane existed in the liquid state at normal pressure and 20 °C, it would have n_D of around 1.429 and logVP of around 3.51.
The illustration, which elements of the Universal matrix, values of their exponents, and their relative contribution give rise to the values presented in Table 2 … The sign of the factor k_ij defines the sign of the pro- … The contribution of particular matrix elements (u_63, u_74, u_42, u_72, u_32, u_53, and u_43) to the optimized combined topological index derived from them in the case of BP is presented in Figure 1.
The individual goodness of the elements of the Universal matrix in their best combination presented in Figure 1 is given in Table 5, whereas their goodness observed in the case of their individual best two-digit exponents is given in Table 6.
Table 3. Best observed standard errors of estimation (S) between the vertex-degree vertex-distance optimized matrix element (or their combination) and physicochemical property of octanes.
Table 5. Individual goodness of elements of the Universal matrix presented in Figure 1 for the case of BP.
Figure 1 shows that the individual contributions of particular matrix elements vary widely but their collective result is very good. Tables 5 and 6 show that their goodness is better in the best individual cases than in their contribution to the collective result, but the collective goodness is decidedly better. Such a situation has been observed in all tested cases.

Discussion
Several well-known indices, e.g. the Wiener index [15] and the Randić index [17], are in fact derived from the Universal matrix using grid values of exponents.
It has been observed that the first digit of the exponent, e.g. 2, defines in most tested cases the first three decimals of the correlation coefficient. The second digit, e.g. 2.3, improves in most tested cases the third to fifth decimal, depending on how far the one-digit grid value is from the best value of the exponent. The third digit in the exponents a, b, and c, e.g. 2.31, improves the fifth or a higher decimal of the correlation coefficient. [19] In the space of exponents a, b, and c, several local maxima of the correlation coefficient R are observed.
For our purpose, three decimals in the correlation coefficient are sufficient in the first step of assessment; therefore one-digit grid values of exponents are used in that step. For optimization, five decimals in the value of the correlation coefficient are considered sufficient, therefore …
Having this relation, the question arises which of them is more useful, S or IC. Each of them has its own type of usefulness. IC is in some ways more illustrative than S, since it directly indicates the information content contained in the tested topological index (index combination). It is an easily comprehensible direct indication of the goodness of the topological index (index combination).
S is an inverse measure. Inverse measures are in general less easy to comprehend. Moreover, S cannot be used for inter-PCP comparisons of the goodness of topological indices.
On the other hand, IC does not depend on the numerical values of the PCP in question and can therefore also be used for inter-PCP comparisons of the goodness of topological indices. In this respect its usefulness is more similar to that of the correlation coefficient R, and its use together with R is suggested. However, R depends on the number of regression parameters, and IC does not. Therefore IC is a better criterion for the goodness of a model. In order not to mistake IC data for R data, it is suggested to express IC in %. This way we have three different indications of goodness of correlation: -1 < R < 1, 0 < IC < 100 (%), and S. The parallelism of the values of R and IC is illustrated in Table 7. Thus, if |R| = 0.99 is considered the lower limit of sufficient goodness of a topological index, [2] then the corresponding lower limit would be IC = 86%. One can, of course, also reason in reverse. For example, if one decides that IC = 90%, or any other IC value, is the proper criterion, then |R| = 0.995, or the corresponding other |R| value, would result as an additional criterion.
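The parallelism between R and IC can be made concrete with a small sketch. It assumes an ordinary least-squares linear fit, for which the ratio of the residual to the total sum of squares equals 1 − R², and it assumes IC = 1 − (that ratio)^½; under these assumptions IC (%) = 100 × [1 − (1 − R²)^½]. The function names are hypothetical.

```python
import math


def ic_percent_from_r(r):
    """IC (%) corresponding to a correlation coefficient R (assumed relation)."""
    return 100.0 * (1.0 - math.sqrt(1.0 - r * r))


def abs_r_from_ic_percent(ic_percent):
    """|R| corresponding to a given IC (%), inverting the relation above."""
    residual_fraction = 1.0 - ic_percent / 100.0
    return math.sqrt(1.0 - residual_fraction ** 2)


# Pairs (|R|, IC in %) for the |R| values discussed in the text.
pairs = [(r, ic_percent_from_r(r)) for r in (0.98, 0.99, 0.995)]
r_for_ic_90 = abs_r_from_ic_percent(90.0)
```

Under these assumptions the quoted pairs are reproduced after rounding: |R| = 0.99 corresponds to IC of about 86%, and IC = 90% to |R| of about 0.995. The relation is strongly non-linear near |R| = 1, which is why IC discriminates between very good correlations more visibly than R does.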
The criterion for choosing a reasonable upper limit of our demand for |R| and IC is the uncertainty of the experimental data. For example, when the values of a physicochemical property are known to three significant digits, as e.g. for dc, and the uncertainty of the third digit is ±1, then it is reasonable to demand |R| of about 0.995 and IC of about 90%. If the uncertainty of the third digit is ±2, then it would be reasonable to demand |R| of about 0.98 and IC of about 80%.
When using IC, the question arises which quantity should be regarded as carrying no information about the differences in the physicochemical property in question among different compounds, for example among the isomers of octane. Among octanes, one could suggest the average value, as done above, but also the value for n-octane or even for cyclooctane, whose graph contains no vertices of degree one. For practical reasons, since the PCP value of a particular octane may not be known, it is suggested to ascribe zero information to the average of the available data. If a different basis is taken for the value of zero information, the IC data will be slightly different, but all will approach the value of 1 as the correlation improves.
As a rule of thumb, it can be concluded that if the correlation coefficient obtained with optimized values of exponents in an element of the Universal matrix is sufficiently good, e.g. |R| > 0.99, [2] then such a topological index can be used as a predictor of the values of that physicochemical property. If the correlation coefficient in such a case is not sufficiently good, then a combination of two or more elements of the Universal matrix, representing the mutual contribution of graph vertices to the value of the topological index, [20] should be tested, mutually optimizing their exponents and their relative contributions.
Let us look at the results from these points of view. If we present in Table 8 the IC data of individual matrix elements in the best combinations of six of them, the results of which are presented in Table 4, we can see that most of the information is contained in the mutually optimized combination of the best three or four matrix elements.
In the worst case (dc) it is contained in five of the 56 matrix elements.
The question here is how to continue the improvement. One possibility is brute-force optimization, testing all possible combinations of matrix elements. Another possibility is to look, in the graph of PCP vs. the combination of matrix elements, at which isomers depart the most from the linear regression line. An example is given in Tables 9 and 10 for the case of dc, which is one of the worst examples in Tables 2 through 4. In Table 10 we can see that the largest difference is at the octane isomers branched at the vertices No. … already one of the four best ones. Therefore we start by testing u_32 and u_42 first, and continue with other elements containing information about said vertices. The result using the optimized best combination of six matrix elements gives rise to a correlation, Table 11, of R = 0.986, which is close to R = 0.99. So, the use of mutually optimized combinations of elements of the Universal matrix is promising for reaching good correlations.
It is also necessary to distinguish which matrix element contributes the most to good correlation and which one contributes the most to the »numerical volume« of the combined index. For MON this is not expressed as evidently as for Tc²/Pc, BP, n_D, Tc, and especially dc and logVP. In the case of dc, Table 11, the matrix element u_83(1^a, -2.7, -0.134) contributes the most to the observed correlation of the combined index, whereas the matrix element u_53(-0.13, -0.38, 2^c) contributes the most to the »numerical volume« of the combined index, presented in Table 11 as ∑u_ij × k_ij.
There also arises a question of principle: whether the best combination of six matrix elements presented above is an overparametrized situation or not. Counting the number of factors k_ij and exponents a, b, c in ∑u_ij(a, b, c) × k_ij, which is 24 in the case of 18 octanes, seems to confirm overparametrization. However, one must compare this situation from the same point of view, i.e. from the point of view of the Universal matrix, also with the situation in well-known topological indices, e.g. the Wiener index.
The Wiener index is perceived as a single number (single parameter) for each isomer. From the point of view of the Universal matrix, one observes that the Wiener index, which is one half of the sum of all elements of the Universal matrix (56 in the case of octanes), contains in its derivation, in the case of octanes, 225 parameters giving rise to the single number of the Wiener index value. This is about one order of magnitude more parameters than in the best combination of six matrix elements presented above. In the case of the best combination of six out of 56 matrix elements, the result is also a single number, as in the case of the Wiener index. Compared to the situation with the Wiener index, the best combination of six matrix elements presented above is thus not overparametrized. These data demonstrate that the degeneracy of topological indices is an important criterion of their goodness, but not always a decisive one.
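One bookkeeping that reproduces both quoted counts is sketched below. It is an assumption, since the counting rule is not spelled out in the text: each matrix element is taken to carry four parameters (the exponents a, b, c and its factor k_ij), and the Wiener index is taken to carry one additional parameter, the common factor ½. The names are hypothetical.

```python
PARAMS_PER_ELEMENT = 4  # assumed: exponents a, b, c and the factor k_ij


def parameter_count(n_elements, extra=0):
    """Number of parameters for n_elements matrix elements plus extra shared ones."""
    return n_elements * PARAMS_PER_ELEMENT + extra


best_combination = parameter_count(6)         # six matrix elements
wiener_index = parameter_count(56, extra=1)   # all 56 elements and the factor 1/2
ratio = wiener_index / best_combination
```

Under this assumption the counts are 24 and 225, and their ratio of roughly 9 matches the statement that the Wiener index involves about one order of magnitude more parameters.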

2. Meaning of Exponent Values in Elements of the Universal Matrix
When the exponent values a, b and c in the equation u_ij(a, b, c) = v_i^a × v_j^b × d_ij^c are equal to 1 (one), the vertex degrees or vertex distances contribute proportionally to their values. An exponent value of >1 means that the contribution of higher vertex degrees or vertex distances is exaggerated. An exponent value between 1 and 0 means that the contribution of vertex degrees or vertex distances is diminished, i.e. the contribution of higher vertex degrees or vertex distances is less than their original value would indicate. An exponent value of 0 (zero) means that different values of vertex degrees or vertex distances contribute equally. An exponent value of <0 means that higher values of vertex degrees or vertex distances contribute less than lower ones. An exponent value of -∞ means that vertex degrees or vertex distances higher than 1 do not contribute anything.
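These cases can be checked numerically with a short sketch that raises the vertex degrees occurring in alkanes (1 to 4) to a few representative exponents. The function name and the choice of exponents are illustrative only.

```python
import math


def contributions(values, exponent):
    """Raise each vertex degree (or distance) to the given exponent."""
    return [float(x) ** exponent for x in values]


DEGREES = [1, 2, 3, 4]  # vertex degrees that occur in alkanes

proportional = contributions(DEGREES, 1)          # values contribute as they are
exaggerated = contributions(DEGREES, 2)           # higher values are amplified
diminished = contributions(DEGREES, 0.5)          # higher values are compressed
equalised = contributions(DEGREES, 0)             # all values contribute equally
reversed_order = contributions(DEGREES, -1)       # higher values contribute less
only_ones = contributions(DEGREES, -math.inf)     # values above 1 contribute nothing
```

The last case is the one used for the distance exponent of the connectivity index: only pairs of vertices at distance 1 survive.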

3. Structural Interpretation of Some of the Physicochemical Properties of Octanes Based on Elements of the Universal Matrix
The next question is whether the elements of the Universal matrix, which represent particular structural features, in our case of octanes, enable a structural interpretation of their physicochemical properties.
Structural interpretation of the Octane Number, which is a PCP governed by a series of chemical reactions, has already been performed, cf. e.g. [21,22]. Structural interpretation of the elements of the Universal matrix which give rise to the best observed correlation with MON data is presented in Appendix 4.
The van der Waals constant a_0, represented here by Tc²/Pc, is not a PCP governed by chemical reactions; it is governed by the volume of the molecules and by intermolecular attractions and collisions. It decreases fairly monotonically with increasing branching of octanes: Oct > 2M7 > 3M7 > 4M7 > 3Et6 > 25M6 > 23M6 > 34M6 > 24M6 > 22M6 > 3Et2M5 > 33M6 > 3Et3M5 > 234M5 > 233M5 > 223M5 > 224M5 > 2233M4. Oct and 233M5 are positioned above the general trend, and 24M6, 224M5, and 2233M4 below it. Structural interpretation of the elements of the Universal matrix which give rise to the best observed correlation with Tc²/Pc data is presented in Appendix 5.
The boiling point (BP) is governed by intermolecular attractions and collisions as well. It decreases with increasing branching, which gives the following sequence of BP for octanes: Oct > 3M7 > 3Et6 > 3Et3M5 > 34M6 > 4M7 > 2M7 > 3Et2M5 > 23M6 > 233M5 > 234M5 > 33M6 > 223M5 > 24M6 > 25M6 > 22M6 > 2233M4 > 224M5. It is presented in Figure 1. This sequence indicates a complex dependence of BP on branching. Obviously it depends on the number of branches, e.g. Oct > 3M7 > 34M6 > 234M5 > 2233M4. The sequence given by the number of branches is, however, modified by the position of branches, e.g. for octanes having one branch: 3M7 > 3Et6 > 4M7 > 2M7; for octanes having two branches: 3Et3M5 > 34M6 > 3Et2M5 > 23M6 > 33M6 > 24M6 > 25M6 > 22M6; for octanes having three branches: 233M5 > 234M5 > 223M5 > 224M5. These partial sequences indicate that a branch in position No. 3 gives rise to a higher BP than those in positions No. 4 or No. 2; that more centrally positioned branches give rise to a higher BP than more peripherally positioned ones; and that more symmetrical branching gives rise to a higher BP than less symmetrical branching. Structural interpretation of the elements of the Universal matrix which give rise to the best observed correlation with BP data is presented in Appendix 6.
The refractive index n_D is a volumetric PCP. The sequence of values of n_D is as follows: 3Et3M5 > 233M5 > 234M5 > 34M6 > 3Et2M5 > 223M5 > 3Et6 > 23M6 > 3M7 > 33M6 > 4M7 > Oct > 2M7 > 22M6 > 24M6 > 25M6 > 224M5. From this sequence it follows that a higher number of branches on vertex No. 3 in the structure of octanes contributes to the value of n_D more than branches on vertices in other positions, especially if vertex No. 3 is in a more central position. The vertices bearing most of the branching, i.e. vertices No. 2 and 3, are involved in the contribution to IC: vertex No. 2 together with vertex No. 5 to 43.2%, and vertex No. 3 together with vertices No. 6 and 8 to 40.3%. Structural interpretation of the elements of the Universal matrix which give rise to the best observed correlation with n_D data is presented in Appendix 7.
Several pairs of critical density (dc) data are equal or apparently equal in value. The sequence of values of dc is 223M5 > 3Et2M5 ∼ 33M6 > 3Et6 ∼ 3Et3M5 ∼ 233M5 > 234M5 ∼ 2233M4 > 3M7 > 34M6 > 23M6 ∼ 224M5 > 24M6 > 4M7 > 22M6 > 25M6 > 2M7 > Oct. It indicates that the contribution to dc of an ethyl branch is greater than that of a methyl branch, and that for methyl branches the contribution depends on the vertex No. as follows: one branch: 3 > 4 > 2 > none; two branches: 3 > 4 > 2 > 5; three branches: 3 > 4. Thus, the sequence of structures having two branches is the most illustrative for dc. Structural interpretation of the elements of the Universal matrix which give rise to the best observed correlation with dc data is presented in Appendix 9.
The sequence of the logVP values, 24M6 > 224M5 > 33M6 > 223M5 > 25M6 ∼ 22M6 > 3Et2M5 > 234M5 ∼ 233M5 > 23M6 > 3M7 ∼ 3Et3M5 > 34M6 > 3Et6 > 2M7 ∼ 4M7 > Oct, indicates some apparently conflicting conclusions. One of them is that logVP is higher for peripheral than for central substitution in octanes having two or three branches. There are also exceptions, where the branch on vertex No. 3 contributes to a higher value of logVP: in 3M7 vs. 2M7 and 4M7; in 33M6 vs. 22M6; as well as in 24M6 vs. 25M6, 23M6 and 34M6. Structural interpretation of the elements of the Universal matrix which give rise to the best observed correlation with logVP data is presented in Appendix 10.

Conclusions
Particular elements of the Universal matrix, and especially mutually optimized combinations of a few of them (four to six out of 56), can be used as good topological indices, correlating with the tested physicochemical properties to R > 0.985 even in the worst tested cases.
Besides R and S, an additional quantity useful for illustrating the potential goodness of topological indices is proposed: the information content (IC). IC is linearly and negatively correlated with S. It is an easily comprehensible direct indication of the goodness of a topological index (index combination) and does not depend on the numerical values of the PCP in question, so it can also be used for inter-PCP comparisons of the goodness of topological indices.
Structural interpretations of MON, Tc²/Pc, BP, n_D, Tc, dc, and logVP are presented, together with interpretations of the contributions of the particular matrix elements that are members of the best combined topological indices, i.e. of the mutually optimized combinations of six matrix elements.