A prediction equation is validated by how well it predicts properties of molecules that were not included in the parameterization set. Equations that do well on the parameterization set may perform poorly for other molecules for several reasons. One mistake is using a limited selection of molecules in the parameterization set. For example, an equation parameterized with organic molecules may perform very poorly when predicting the properties of inorganic molecules. Another mistake is having nearly as many fitted parameters as molecules in the parameterization set, thus fitting to anomalies in the data rather than physical trends.
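The validation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the descriptor values and property values are synthetic, the model is a plain linear least-squares fit, and the helper names (`fit_linear`, `predict`, `rmse`) are hypothetical rather than taken from any QSPR package.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y = c0 + c1*x1 + ...; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Evaluate the fitted linear equation for the descriptor rows in X."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic data (an assumption): one descriptor per molecule, noisy linear property.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(20, 1))
y = 2.0 + 0.5 * X[:, 0] + rng.normal(0.0, 0.1, size=20)

# Molecules 0-14 form the parameterization set; 15-19 are held out for validation.
X_fit, y_fit = X[:15], y[:15]
X_val, y_val = X[15:], y[15:]

coef = fit_linear(X_fit, y_fit)
rmse_fit = rmse(y_fit, predict(coef, X_fit))
rmse_val = rmse(y_val, predict(coef, X_val))
print(f"RMSE on parameterization set: {rmse_fit:.3f}")
print(f"RMSE on held-out molecules:   {rmse_val:.3f}")
```

A held-out error much larger than the error on the parameterization set is the warning sign discussed above: either the parameterization set was too narrow or too many parameters were fitted.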
The development of group additivity methods is very similar to the development of a QSPR method. Group additivity methods can be useful for properties that are additive by nature, such as the molecular volume. For most properties, QSPR is superior to group additivity techniques.
Other algorithms for predicting properties have been developed. Both neural network and genetic algorithm-based programs are available. Some arguments can be made for the use of each. However, none has yet seen widespread use. This may be partially due to the greater difficulty in interpreting the chemical information that can be gained in addition to numerical predictions. Neural networks are generally known to provide a good interpolation of data, but rather poor extrapolation.
QSAR is also called traditional QSAR or Hansch QSAR to distinguish it from the 3D QSAR method described below. This is the application of the technique described above to biological activities, such as environmental toxicology or drug activity. The discussion above is applicable, but a number of other caveats apply, which are addressed in this section. The following discussion is oriented toward drug design, although the same points may be applicable to other areas of research as well.
In order to parameterize a QSAR equation, a quantified activity for a set of compounds must be known. These are called lead compounds, at least in the pharmaceutical industry. Typically, test results are available for only a small number of compounds. Because of this, it can be difficult to choose a number of descriptors that will give useful results without fitting to anomalies in that small data set. Three to five lead compounds per descriptor in the QSAR equation are normally considered an adequate number. If two descriptors are nearly collinear with one another, then one should be omitted even though it may have a large correlation coefficient.
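Both rules of thumb in this paragraph are easy to check before fitting. The sketch below is illustrative only: the descriptor matrix is invented, the function names are hypothetical, and the 0.95 correlation cutoff is an assumption chosen for the example, not a value given in the text.

```python
import numpy as np

def max_descriptors(n_compounds, compounds_per_descriptor=5):
    """Largest descriptor count that keeps the stated compounds-per-descriptor ratio."""
    return n_compounds // compounds_per_descriptor

def collinear_pairs(D, threshold=0.95):
    """Index pairs of descriptor columns whose |Pearson r| exceeds the threshold.

    D has one row per compound and one column per descriptor.
    The threshold is an illustrative assumption.
    """
    r = np.corrcoef(D, rowvar=False)
    n = r.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(r[i, j]) > threshold]

# Invented descriptor table: column 1 is a linear function of column 0.
d0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
d1 = 2.0 * d0 + 1.0
d2 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
D = np.column_stack([d0, d1, d2])

pairs = collinear_pairs(D)
print("Nearly collinear descriptor pairs:", pairs)
print("Descriptors supportable by 12 compounds (5 each):", max_descriptors(12))
print("Descriptors supportable by 12 compounds (3 each):", max_descriptors(12, 3))
```

For each flagged pair, one of the two descriptors would be dropped before the QSAR equation is fitted.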
In the case of drug design, it may be desirable to use parabolic functions in place of linear functions. The descriptor for an ideal drug candidate often has an optimum value. Drug activity will decrease when the value is either larger or smaller than optimum. This functional form is described by a parabola, not a linear relationship.
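As a small worked illustration of the parabolic form, the sketch below fits activity = c0 + c1·x + c2·x² to synthetic data and locates the optimum descriptor value at the vertex, x_opt = −c1 / (2·c2). The data are invented, with the optimum placed at 2.5 by construction, and the descriptor is a generic placeholder rather than any particular property.

```python
import numpy as np

# Synthetic descriptor values and activities (an assumption for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
activity = -(x - 2.5) ** 2 + 6.0

# np.polyfit returns coefficients from the highest power down.
c2, c1, c0 = np.polyfit(x, activity, deg=2)

# The vertex of the parabola is a maximum only when c2 is negative.
x_opt = -c1 / (2.0 * c2)
print(f"c0={c0:.3f}, c1={c1:.3f}, c2={c2:.3f}, optimum descriptor value={x_opt:.3f}")
```

If the fitted `c2` came out positive or close to zero, the data would not support an optimum within the sampled range, and a linear term alone might be the better description.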
The advantage of using QSAR over other modeling techniques is that it takes into account the full complexity of the biological system without requiring any information about the binding site. The disadvantage is that the method will not distinguish between the contribution of binding and transport properties in determining drug activity. QSAR is very useful for determining general criteria for activity, but it does not readily yield detailed structural predictions.
30.3 3D QSAR
For drug design purposes, it is desirable to construct a method that will predict the molecular structures of candidate compounds without requiring knowledge of the binding-site geometry. 3D QSAR has been fairly successful in fulfilling these criteria. It is similar to QSAR in that property descriptors, statistical analysis, and fitting techniques are used. Beyond that, the two computations are significantly different.
As with QSAR, molecular structures must be available for compounds that have known, quantitatively defined activities. The first step is then to align the molecular structures. This alignment is based on the fact that all of them have drug activity due to docking at a particular site. Alignment algorithms rotate and translate a molecule within the Cartesian coordinate space until it matches the location and orientation of another molecule as well as possible. This can be as simple as aligning the backbones of similar molecules or as complex as a sophisticated search and optimization scheme. For conformationally flexible compounds, both alignment and conformation must be addressed. Typically, the most rigid molecule in the set is the one to which the others are aligned. There are automated routines for finding the conformer of best alignment, or this can be done manually.
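The simplest case mentioned above, superimposing matched backbone atoms of two rigid molecules, can be sketched with the Kabsch least-squares superposition. This is one common choice and not necessarily the algorithm used by any particular 3D QSAR program. The sketch assumes the atom-to-atom correspondence is already known and ignores conformational flexibility; the coordinates are invented.

```python
import numpy as np

def kabsch_align(mobile, reference):
    """Rigidly superimpose `mobile` onto `reference`.

    Both arrays are N x 3 with atoms listed in the same order.
    Returns the aligned coordinates and the RMSD after alignment.
    """
    mob_center = mobile.mean(axis=0)
    ref_center = reference.mean(axis=0)
    P = mobile - mob_center        # centered mobile coordinates
    Q = reference - ref_center     # centered reference coordinates

    H = P.T @ Q                    # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (a reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    aligned = P @ R.T + ref_center
    rmsd = float(np.sqrt(np.mean(np.sum((aligned - reference) ** 2, axis=1))))
    return aligned, rmsd

# Invented reference coordinates; the "mobile" copy is rotated 40 degrees
# about z and translated, so a perfect superposition exists.
reference = np.array([[0.0, 0.0, 0.0],
                      [1.5, 0.0, 0.0],
                      [1.5, 1.2, 0.0],
                      [0.0, 1.2, 0.9]])
theta = np.radians(40.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
mobile = reference @ Rz.T + np.array([3.0, -2.0, 1.0])

aligned, rmsd = kabsch_align(mobile, reference)
print(f"RMSD after alignment: {rmsd:.2e}")
```

For flexible molecules, a step like this would sit inside an outer search over conformers, keeping the conformer that gives the lowest RMSD against the rigid reference.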
Once the molecules are aligned, a molecular field is computed on a grid of points in space around the molecule. This field must provide a description of how each molecule will tend to bind in the active site. Field descriptors typically consist of a sum of one or more spatial properties, such as steric factors, van der Waals parameters, or the electrostatic potential. The choice of grid points will also affect the quality of the final results.
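To make the grid idea concrete, the sketch below evaluates a simple electrostatic field, a sum of point-charge terms q/r in arbitrary units, on a regular grid around one aligned molecule. Everything specific here is an assumption for illustration: the two-atom coordinates and partial charges, the 2.0 grid spacing, the 4.0 padding, and the distance clamp used to avoid singularities near the nuclei. Steric or van der Waals fields would be evaluated on the same grid with a different per-atom term.

```python
import numpy as np

def make_grid(coords, padding=4.0, spacing=2.0):
    """Regular grid of points covering the molecule plus a padding margin."""
    lo = coords.min(axis=0) - padding
    hi = coords.max(axis=0) + padding
    axes = [np.arange(a, b + 1e-9, spacing) for a, b in zip(lo, hi)]
    gx, gy, gz = np.meshgrid(*axes, indexing="ij")
    return np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])

def electrostatic_field(grid, coords, charges, min_dist=1.0):
    """Point-charge potential (arbitrary units) at each grid point."""
    # Distance from every grid point to every atom: shape (n_grid, n_atoms).
    d = np.linalg.norm(grid[:, None, :] - coords[None, :, :], axis=2)
    d = np.maximum(d, min_dist)  # clamp so points on top of atoms stay finite
    return (charges[None, :] / d).sum(axis=1)

# Invented two-atom "molecule" with opposite partial charges.
coords = np.array([[0.0, 0.0, 0.0],
                   [1.2, 0.0, 0.0]])
charges = np.array([0.4, -0.4])

grid = make_grid(coords)
field = electrostatic_field(grid, coords, charges)
print("Grid points:", grid.shape[0])
print("Field range:", field.min(), field.max())
```

Each molecule in the aligned set yields one such vector of field values on the shared grid, and those values serve as the descriptors for the statistical fitting step.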