research papers
Acta Crystallographica Section D

Biological Crystallography
ISSN 0907-4449

Protein imperfections: separating intrinsic from extrinsic variation of torsion angles

Glenn L. Butterfoss (a), Jane S. Richardson (b) and Jan Hermans (a)*

(a) Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, NC 27599-7260, USA, and (b) Department of Biochemistry, Duke University Medical School, Durham, North Carolina 27710-3711, USA

Correspondence e-mail: hermans@med.unc.edu

In this paper, the variation of the values of dihedral angles in proteins is divided into two categories by analyzing distributions in a database of structures determined at a resolution of 1.8 Å or better [Lovell et al. (2003), Proteins Struct. Funct. Genet. 50, 437–450]. The first analysis uses the torsion angle for the Cα–Cβ bond (χ1) of all Gln, Glu, Arg and Lys residues (`unbranched set'). Plateaued values at low B values imply a root-mean-square deviation (RMSD) of just 9° for χ1 related to intrinsic structural differences between proteins. Extrapolation to high resolution gives a value of 11°, while over the entire database the RMSD is 13.4°. The assumption that the deviations arise from independent intrinsic and extrinsic sources gives ~10° as the RMSD for χ1 of these unbranched side chains arising from all disorder and error over the entire set. It is also found that the decrease in χ1 deviation that is correlated with higher resolution structures is almost entirely a consequence of the higher percentage of low-B-value side chains in those structures and furthermore that the crystal temperature at which diffraction data are collected has a negligible effect on intrinsic deviation. Those intrinsic aspects of the distributions not related to statistical or other errors, data incompleteness or disorder correlate with energies of model compounds computed with high-level quantum mechanics. Mean side-chain torsion angles for specific rotamers correlate well with local energy minima of Ace-Leu-Nme, Ace-Ile-Nme and Ace-Met-Nme. Intrinsic RMSD values in examples with B ≤ 20 Å² correlate inversely with calculated values for the relevant rotational energy barriers: from a low of 6.5° for χ1 of some rotamers of Ile to a high of 14° for some Met χ3 for fully tetrahedral angles and much higher for χ angles around bonds that are tetrahedral at one end and planar at the other (e.g. 30° for χ2 of the gauche− rotamer of Phe). For the lower barrier Met χ3 rotations there are relatively more well validated cases near eclipsed values and calculated torques from the rest of the protein structure either confine or force the Cε atom into the strained position. These results can be used to evaluate the variability and accuracy of χ angles in crystal structures and also to decide whether to restrain side-chain angles in refinement as a function of the resolution and atomic B values, depending on whether one aims for a realistic distribution of values or a spread that is statistically suitable to the probable data-set errors.

Received 23 August 2004 Accepted 26 October 2004

1. Introduction
The premise underlying this study is that each dihedral angle of any given side chain in a protein has its own specific equilibrium value which is determined by the details of the packing of the rest of the protein around the side chain; the objective is to determine and analyze the distributions and means of these
Acta Cryst. (2005). D61, 88–98

© 2005 International Union of Crystallography. Printed in Denmark – all rights reserved

doi:10.1107/S0907444904027325

equilibrium values over many instances of the same side chain in many proteins. In order to determine this (intrinsic) distribution of side-chain dihedral angles from high-resolution crystal structures, it will be necessary to eliminate any part of the variation that arises from errors or disorder in the structures. As for all experimental sciences, X-ray crystallography is beset by errors in the experimental data (especially in phase determination) that in turn introduce uncertainties in the atomic positions even after exhaustive refinement. Thermal motion, static disorder and irregularities of the crystal lattice further increase the width of the distribution. In protein crystals, side chains may occupy alternative low-energy conformations, the existence of which is often not explicitly modeled, while fluctuations over the many possible solvent structures are coupled with fluctuations in the positions of protein atoms near the molecular surface. Again for proteins, the effect of errors in interpretation of the electron-density map when the conformation of a side chain has been misassigned must be included. The effect of the errors in the observed structure factors is reduced by increasing the ratio of independent observations to unknowns, i.e. by increasing the number of observations or by reducing the number of independent variables in the model used to represent the electron density, or both. The former is achieved by collecting data to higher resolution, while the latter can be achieved by using a simple model for the structure and by introducing restraining relations between variables, most often geometric restraints related to the presence of chemical bonds between atoms (Brünger et al., 1987). Typically, fluctuations in atomic position are represented with a simple model such as (isotropic or anisotropic) Gaussian distributions, whose width is measured by the atomic B value(s).
As a rule, the effect of errors and of positional uncertainty or disorder is to increase the width of these distributions, i.e. to increase the atomic B values. In addition, B values reflect errors, incompleteness and radial fall-off of the experimental data; high-resolution structures inherently have smaller overall B values than low-resolution structures. These relationships suggest that information about the effect of statistical error, positional uncertainty and data incompleteness on the distribution of a particular type of torsion angle can be obtained by comparing distributions of torsion angles defined by atoms with high and with low B values in structures of low and of high resolution. Data used to derive the rotamer library of Lovell et al. (2000) were restricted to side chains containing atoms with B values all ≤ 40 Å² (thus restricting to the relatively well defined as well as the high-resolution instances) in order to lower the noise level.

For many purposes, molecular structures are more usefully described by internal coordinates (bonds, bond angles and dihedral angles) than directly by atomic positions. In a first approximation, the bond lengths and bond angles may be thought of as fixed and the structure described in terms of the values of the torsion angles for internal rotation about single bonds. The conformations of side chains in proteins of known structure, when characterized as multi-dimensional combinations of their torsion angles, distribute into mostly well separated clusters called rotamers (Ponder & Richards, 1987; Dunbrack & Karplus, 1993; Lovell et al., 2000). The rotamers assume sets of torsion angles that correspond to low-energy conformations of small molecules. For example, torsion angles for aliphatic C–C bonds cluster near `canonical' values of +60, −60 and 180°.

Theoretical estimates for the standard deviations of coordinates in protein crystal structures at resolutions of 1.5–2 Å are about 0.1–0.2 Å (Jensen, 1997; Tickle et al., 1998). An error of 0.1 Å in an atomic coordinate corresponds to an error of up to 6° in a torsion angle dependent on the atom's position. Experimental error estimates from comparison of identical structures determined independently are much higher, in the range 0.5–0.8 Å (Kleywegt, 1999; Mowbray et al., 1999).

The principal aim of this study has been to assess and eliminate the effects of errors and positional uncertainties on distributions of torsion angles and to obtain distributions that correspond only to the intrinsic variation of torsion angles across proteins of known structure, not only in terms of relative rotamer population but also in terms of mean rotamer-angle values and the extent of deviations of torsion angles from those mean rotamer values. Our approach makes use of two different extrapolations of distributions of torsion angles in a database of high-resolution X-ray structures of proteins (Lovell et al., 2000, 2003), one to instances of low atomic B values and the other to structures of high resolution.
We also investigate the correspondence between the average rotamer structures and the corresponding low-energy structures of molecules that are sufficiently small that the structures can be carefully optimized with accurate energy functions based on high-level quantum mechanics and justify deviations of the conformations assumed in proteins from the conformations of such isolated low-energy structures in terms of interactions with other parts of the protein as a result of non-bonded forces.
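To make the link between coordinate error and torsion-angle error concrete, the following minimal sketch computes a torsion angle from four atomic positions and measures how far it moves when the terminal atom is displaced by 0.1 Å perpendicular to the rotation axis. The idealized geometry (1.53 Å bonds, 109.5° bond angles) and the function names are assumptions made for illustration and are not taken from the paper. Under these assumptions a single displaced atom shifts the angle by about 4°, the same order as the figure of up to 6° quoted above.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle in degrees for atoms p0-p1-p2-p3 (0 = cis)."""
    b1, b2, b3 = p1 - p0, p2 - p1, p3 - p2
    n2 = np.cross(b2, b3)
    y = np.linalg.norm(b2) * np.dot(b1, n2)
    x = np.dot(np.cross(b1, b2), n2)
    return float(np.degrees(np.arctan2(y, x)))

def model_atoms(theta_deg, bond=1.53, angle_deg=109.5):
    """Four atoms with idealized (assumed) geometry and torsion theta_deg."""
    a, t = np.radians(angle_deg), np.radians(theta_deg)
    p1 = np.zeros(3)
    p2 = np.array([0.0, 0.0, bond])          # rotation axis lies along z
    p0 = bond * np.array([np.sin(a), 0.0, np.cos(a)])
    p3 = p2 + bond * np.array([np.sin(a) * np.cos(t),
                               np.sin(a) * np.sin(t),
                               -np.cos(a)])
    return p0, p1, p2, p3

def torsion_shift_deg(displacement, theta_deg=60.0):
    """Change in torsion when the last atom moves tangentially by `displacement` (Å)."""
    p0, p1, p2, p3 = model_atoms(theta_deg)
    t = np.radians(theta_deg)
    tangent = np.array([-np.sin(t), np.cos(t), 0.0])  # unit vector perpendicular to the axis
    before = dihedral_deg(p0, p1, p2, p3)
    after = dihedral_deg(p0, p1, p2, p3 + displacement * tangent)
    return after - before

print(f"{torsion_shift_deg(0.1):.1f} deg")
```

Only one of the four atoms is displaced here; errors in the other three positions would add to the shift.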

2. Methods
2.1. Model molecules and energy function

We have used accurate ab initio quantum-mechanics-based methods to calculate the energies reported in this paper using Gaussian94 and Gaussian98 (Frisch et al., 1998). Three dipeptide model structures were used, namely for leucine, isoleucine and methionine. The conformation of the dipeptides was optimized at the HF/6-31G(d) level of theory, with backbone torsion angles φ and ψ kept fixed in either an α-type (φ = −60, ψ = −40°) or a β-type (φ = −120, ψ = 140°) conformation. n-Butane and ethyl methyl sulfide (EMS) were used to determine the torsional barriers. These calculations were performed at the MP2/6-311+G(d,p) level of theory. The MP2(full) option was specified and all calculations used the tight self-consistent field option.
2.2. Database of well ordered residues in high-resolution structures

The database of Lovell et al. (2003) was used to extract statistics of side-chain conformations in folded proteins. This is
based on 500 selected non-redundant protein structures of 1.8 Å or better resolution. Lovell et al. (2000) use a single-letter notation to describe conformation, t standing for trans, m for gauche− and p for gauche+, and describe a particular rotamer with several of these letters in series, each to describe a successive side-chain torsion; we have adopted this notation for this paper. Subsets of the database were extracted: a set containing all Met, Glu, Gln, Arg and Lys (23 620 residues) and a set containing all Glu, Gln, Arg and Lys (21 476 residues). Mean values and mean-square deviations of torsion angles were computed for each side-chain rotamer of every residue type studied, but using only side chains with B values ≤ 20 Å² for every atom and only if more than 95 examples of that rotamer occurred in the database. Side chains of methionine, leucine and isoleucine with specific backbone structure were selected from the entire database. Thus, for methionine a set of 680 α-helical examples was obtained by selecting all residues listed as having an `α-helix' or `α-helix ext' secondary structure as assigned by the DSSP program (Kabsch & Sander, 1983) and for which −80 < φ < −40° and −60 < ψ < −20°, and a set of 562 residues in extended conformation (`beta') was obtained by selecting all residues that had backbone conformations within 40° of (φ, ψ) = (−120, 140°) (both with disregard of B values). Most of the 500 data-set structures could be assigned to one of the two ranges of data-collection temperature: either above freezing (near 300 K) or cooled with liquid nitrogen (near 100 K). Many of the PDB file headers listed temperature; some were given in publications and in a few cases the depositor was contacted. Those cases that could not be determined were omitted from Fig. 4.
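The selection criteria of this section can be sketched as simple filters. The function names below are hypothetical, the DSSP secondary-structure test is left out, and `within 40°' is read here as a Euclidean distance in the (φ, ψ) plane with periodic wrap-around, which is an assumption; the text does not say how the distance was measured.

```python
import math

def angle_diff_deg(a, b):
    """Smallest signed difference a - b between two angles, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def is_helical(phi, psi):
    """Backbone window used for the alpha-helical methionine set (angles in degrees)."""
    return -80.0 < phi < -40.0 and -60.0 < psi < -20.0

def is_extended(phi, psi, center=(-120.0, 140.0), radius=40.0):
    """True if (phi, psi) lies within `radius` degrees of the beta reference point.

    Assumption: Euclidean distance in the (phi, psi) plane, with wrap-around.
    """
    dphi = angle_diff_deg(phi, center[0])
    dpsi = angle_diff_deg(psi, center[1])
    return math.hypot(dphi, dpsi) <= radius

def well_ordered(b_values, b_max=20.0):
    """True if every atom of the side chain has B <= b_max (in Å^2)."""
    return all(b <= b_max for b in b_values)

# Example: a residue with phi = -120, psi = 175 counts as extended.
print(is_extended(-120.0, 175.0), is_helical(-60.0, -40.0))
```

Wrapping the angular difference matters for ψ values near ±180°, which lie close to the extended reference point even though their raw numerical difference from 140° is large.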
2.3. Evaluation of non-bonded torsion potentials

The (mean) torque acting to strain a particular torsion angle χi is properly defined as

$$\langle T_i \rangle = -\frac{dG}{d\chi_i}, \qquad (1)$$

where G represents the free energy. An approximate value of such a mean torque can in principle be evaluated in a molecular-dynamics simulation of the protein. Drawbacks of this approach are the need to equilibrate the system of protein and solvent before values can be considered to be representative and the general observation of not inconsiderable shifts in atomic position during the equilibration. By using the gradient of the energy, rather than the free energy, i.e. with

$$T_i = -\frac{dE}{d\chi_i}, \qquad (2)$$

a major contribution to dG/dχi can rapidly be evaluated given a structure with experimentally determined atomic coordinates. This approach has been applied here to torsion about the Cγ–Sδ bonds of methionine residues. It proved convenient to evaluate the torque vector T with a special-purpose routine added to a molecular-mechanics program (Mann et al., 2002), according to

$$T_{C\gamma-S\delta} = (\mathbf{r}_{S\delta-C\varepsilon} \times \mathbf{F}_{C\varepsilon}) \cdot \mathbf{e}_{C\gamma-S\delta}, \qquad (3)$$

where F is the force exerted on Cε by surrounding atoms, r is the Sδ–Cε bond vector and e is the unit vector along the Cγ–Sδ bond. The non-bonded terms of the potential energy and atomic forces were evaluated in terms of Lennard–Jones 6–12 potentials with parameters from the CEDAR/GROMOS force field (Hermans et al., 1984), with use of the program's standard force routine, at the experimental value of the Cγ–Sδ torsion angle and at successive increments of this angle by 5°. In the CEDAR/GROMOS force field, methyl groups have a net charge of zero and the necessary energy terms are those for interactions of the ε-methyl group with surrounding atoms (but excluding Cγ and Sδ of the same residue). With use of the force field's `united-atom' potentials for CH3, CH2 and CH groups, the positions of H atoms are not included in this calculation. Since well defined methionine side chains tend not to be exposed to solvent, interactions with solvent have been ignored.

3. Results
3.1. Variation of χ1 of long unbranched side chains

As mentioned, for a single C–C bond with tetrahedral sp3 geometry at each end (as in an aliphatic hydrocarbon) the canonical values of the torsion angle are −60, +60 and 180° (which are also the canonical values for a C–S bond) and the values of such torsion angles of any given residue in the database can be separated into clusters (rotamers), with the clusters' centers approximating a set of these canonical values (Ponder & Richards, 1987; Dunbrack & Karplus, 1993; Lovell et al., 2000). In order to have a large data set to work with, we have chosen deviations of χ1 from the rotameric mean values for all Glu, Gln, Arg and Lys side chains in the database: the `unbranched' set. The deviations are evaluated separately for each rotamer cluster. Since the mean torsion angles of the clusters do not coincide exactly with canonical values (owing to interactions with local backbone or other parts of the protein), the deviation of each instance i of a torsion angle is evaluated relative to the mean torsion value of the rotamer cluster

$$\Delta\chi_i = \chi_i - \langle \chi_i \rangle_r, \qquad (4)$$

where the subscript r indicates the average over a particular rotamer cluster. The overall mean-square deviation of a given torsion angle ⟨(Δχi)²⟩ is evaluated by averaging over the database and measures the empirical peak width or variability of χi in these data. For these tetrahedral geometry χ angles the distributions are nearly symmetric and unskewed, so the mean values correspond closely to the modal values used in Lovell et al. (2000) while allowing definition of mean-squared deviations. Fig. 1 shows the variation of ⟨Δχ1²⟩ with resolution of the crystallographic structure for all examples in the unbranched

set and for those torsion angles defined by atoms with B values below 30 and below 20 Å². When torsion angles are considered regardless of the B values, the mean-square deviation (MSD) of χ1 plateaus at a value near 120 deg² for an RMSD of 11°. At resolution worse than 1.5 Å the mean-square deviations increase and spread over an increasingly broad range. The number of dihedral angles used to compute ⟨Δχ1²⟩ in each range of resolution is given in Table 1. The mean-square deviations of χ1 are better behaved as well as lower when the B values are limited, in spite of the smaller sample size. For the set of torsion angles with B ≤ 20 Å², ⟨Δχ1²⟩ is, within statistical variation, independent of resolution of the structures and equal to 75 deg² for an RMSD of 8.7°.
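The per-rotamer deviation of equation (4) and the resulting mean-square deviation can be sketched as follows. This is an illustrative reading, not the authors' code: each χ1 value is assigned to the nearest canonical value (−60, +60 or 180°), whereas the paper relies on the rotamer assignments of the Lovell et al. database, and the function names are hypothetical.

```python
import numpy as np

CANONICAL = np.array([-60.0, 60.0, 180.0])  # canonical chi1 values in degrees

def wrap(deg):
    """Map angles (degrees) into [-180, 180)."""
    return (np.asarray(deg, dtype=float) + 180.0) % 360.0 - 180.0

def rotamer_msd(chi):
    """Return (MSD in deg^2, per-instance deviations) from rotamer-cluster means."""
    chi = np.asarray(chi, dtype=float)
    # Assumption: cluster membership = nearest canonical value.
    offsets = wrap(chi[:, None] - CANONICAL[None, :])
    cluster = np.argmin(np.abs(offsets), axis=1)
    dev = np.zeros_like(chi)
    for k in range(len(CANONICAL)):
        sel = cluster == k
        if sel.any():
            local = offsets[sel, k]           # wrap-safe offsets from the canonical value
            dev[sel] = local - local.mean()   # deviation from the cluster mean, as in Eq. (4)
    return float(np.mean(dev ** 2)), dev

msd, dev = rotamer_msd([-70, -60, -50, 170, -170, 180, 55, 65])
print(msd, np.sqrt(msd))
```

Working with offsets from the canonical value before averaging keeps the trans cluster, which straddles ±180°, from being split by the angle wrap.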
Table 1
Distribution of Glu, Gln, Arg and Lys residues in the database into bins of decreasing resolution and RMSD of χ1 from the rotameric mean.

Resolution range (Å)    No. angles    Mean resolution (Å)    RMSD (°)
r < 1.10                       830    0.99                   10.9
1.10 ≤ r < 1.22               1034    1.16                   11.4
1.22 ≤ r < 1.40                801    1.30                   11.2
1.40 ≤ r < 1.50               1711    1.43                   12.4
r = 1.50                      2285    1.50                   12.4
1.50 < r < 1.60                870    1.55                   15.0
r = 1.60                      3032    1.60                   13.2
1.60 < r < 1.70               1827    1.65                   12.5
r = 1.70                      2960    1.70                   14.4
1.70 < r < 1.80               1452    1.75                   13.4
r = 1.80                      4674    1.80                   14.7
Entire data set              21476    -                      13.4

Figure 1

Mean-square deviations (in deg²) from mean canonical values of side-chain torsion angle χ1 for Glu, Gln, Arg and Lys residues in the database, as a function of resolution, for all instances (squares), for atoms with B ≤ 30 Å² (open circles) and for atoms with B ≤ 20 Å² (filled circles).

Fig. 2 shows the correlation of ⟨Δχ1²⟩ with B values. (Data are given in Table 2.) It can be seen that in this plot ⟨Δχ1²⟩ plateaus at a value of 80 deg² for an RMSD of 9°. The results for structures of lower and higher resolution level off at the same value at low B and differ little at higher values of B. The value of ⟨Δχ1²⟩ computed over this entire database is 180 deg². The MSD of χ1 rises to quite large values for high B values, to almost half the MSD computed for a completely random distribution of χ1 relative to three equally spaced canonical values, which equals 1200 deg². Interestingly, one overall conclusion from Figs. 1 and 2 is that the increase in variance of χ1 seen at lower resolution is entirely accounted for by the higher B values. To give a broader view, Fig. 3 shows the MSD of χ1 for all Met, Glu, Gln, Lys and Arg residues in each of a selection of structures, including some determined at lower resolution than the database. (The data of Fig. 1 for all B are included as filled circles.) A rapid rise in MSD for less well resolved structures can be noted. The wide spread of MSD values at lower resolutions may reflect differences in whether or not torsion-angle and other related restraints were imposed during crystallographic refinement and in whether rotamers were explicitly

Figure 2

Mean-square deviations (in deg²) from mean values of side-chain torsion angle χ1 for all Glu, Gln, Arg and Lys residues in the database, as a function of B value (in Å²), for the higher and lower resolution halves of the data set. Circles, resolution better than 1.63 Å. Squares, resolution between 1.63 and 1.80 Å.
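The separation of intrinsic from extrinsic variation stated in the abstract rests on the assumption that independent sources of deviation add in quadrature. A minimal numerical check, using only values quoted in the text (the function name is hypothetical):

```python
import math

def extrinsic_rmsd(total_rmsd, intrinsic_rmsd):
    """RMSD of the extrinsic part, assuming independent sources add in quadrature."""
    diff = total_rmsd ** 2 - intrinsic_rmsd ** 2
    if diff < 0:
        raise ValueError("total RMSD must not be smaller than intrinsic RMSD")
    return math.sqrt(diff)

# RMSD values from the abstract: 13.4 deg overall, 9 deg intrinsic plateau.
print(round(extrinsic_rmsd(13.4, 9.0), 1))          # 9.9

# Same estimate from the MSD values quoted for Fig. 2: 180 and 80 deg^2.
print(round(math.sqrt(180.0 - 80.0), 1))            # 10.0
```

Both routes give roughly 10°, in line with the ~10° attributed in the abstract to disorder and error over the entire set.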

Figure 3

Mean-square deviation of χ1 (in deg²) as a function of resolution. Filled circles, data from Fig. 1. Open circles, individual proteins (all Met, Glu, Gln, Lys, Arg residues).
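The value of 1200 deg² quoted in the text for a completely random distribution can be reproduced with a short calculation. It assumes that a random χ1 is uniformly distributed and that its deviation x is measured from the nearest of three canonical values spaced 120° apart, so that x is uniform on the interval from −60° to +60°; this reading of `completely random' is an assumption.

```latex
\langle (\Delta\chi_1)^2 \rangle_{\mathrm{random}}
  = \frac{1}{120} \int_{-60}^{60} x^2 \, dx
  = \frac{1}{120} \cdot \frac{2 \cdot 60^3}{3}
  = \frac{60^2}{3}
  = 1200\ \mathrm{deg}^2 ,
\qquad
\mathrm{RMSD}_{\mathrm{random}} = \sqrt{1200} \approx 34.6^\circ .
```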
Table 2
Distribution of Glu, Gln, Arg and Lys residues in the database into bins of increasing B value (in Å²) and RMSD of χ1 from the rotameric mean.
