Molecular Medicine Israel

Robust total X-ray scattering workflow to study correlated motion of proteins in crystals

Abstract

The breathing motions of proteins are thought to play a critical role in function. However, current techniques to study key collective motions are limited to spectroscopy and computation. We present a high-resolution experimental approach based on the total scattering from protein crystals at room temperature (TS/RT-MX) that captures both structure and collective motions. To reveal the scattering signal from protein motions, we present a general workflow that enables robust subtraction of lattice disorder. The workflow introduces two methods: GOODVIBES, a detailed and refinable lattice disorder model based on the rigid-body vibrations of a crystalline elastic network; and DISCOBALL, an independent method of validation that estimates the displacement covariance between proteins in the lattice in real space. Here, we demonstrate the robustness of this workflow and further demonstrate how it can be interfaced with MD simulations towards obtaining high-resolution insight into functionally important protein motions.

Introduction

Structural biology has seen remarkable advances in recent years with cryo-electron microscopy [1] and structure prediction [2,3]. However, protein crystallography remains the go-to method for obtaining structures at atomic resolution [4]. The technique is widely accessible and is still the dominant source of depositions in the Protein Data Bank. Crystallography also provides sub-angstrom coordinate precision and is therefore essential for benchmarking computational methods [5], such as simulations and structure prediction. Moreover, crystallography produces diffuse scattering [6], an untapped source of information on subtle protein motions that underlie processes such as allostery, catalysis, and signaling [7,8].

Crystal structures are often thought to represent static snapshots, but in fact, protein motions occur within the watery environment of crystals [9]. Diffuse scattering is a direct consequence of this motion, appearing as a structured, continuous signal in the background of diffraction images [10]. By studying the total scattering of a protein crystal (combining Bragg diffraction with diffuse scattering), crystallography has the potential to simultaneously provide a high-resolution average structure and information on correlated atomic displacements [5]. Until recently, diffuse scattering analysis was largely considered intractable, but with advances in room-temperature data collection [11], the widespread availability of direct X-ray detectors [12], and new data processing software [13], it has now become feasible to routinely measure highly accurate diffuse scattering maps. However, although extensive efforts have been made in understanding protein diffuse scattering [7], a general workflow for utilizing this information has not yet been realized. In order to fulfill the promise of diffuse scattering in structural biology, it is essential to establish robust workflows for data processing, set standards for model-data agreement, and provide benchmark examples in the form of simulations and high-quality experimental data.

Our recent study of lysozyme in the triclinic (P1) space group showed that it is possible to account for the total scattering from a crystal entirely and self-consistently using physically-motivated atomistic models [13], and thus it can serve as a potential roadmap for establishing a standard workflow. A key advance from this study was the characterization of intense halo-like scattering around the Bragg peaks, which arises from correlated displacements of protein chains in different unit cells. Supercell simulations demonstrated that these correlations are long-ranged and consistent with phonon-like lattice vibrations, where proteins fluctuate about their average positions. To isolate internal motions of proteins from these external motions, it was first necessary to explain the halo features in the diffuse scattering map, which we achieved by fitting halos with a crystalline elastic network model treating the protein chain as a rigid body. Once the contribution of lattice disorder was accounted for, we were able to show that the remaining diffuse scattering signal and the B-factors from Bragg refinement were consistent with internal protein motions. In developing a workflow, the next step is to establish the accuracy and generality of lattice disorder models and develop tools for model validation. Ultimately, diffuse scattering analysis must provide new insight into biochemically relevant questions, and thus, it is also important that the workflow outputs data in a form that can be directly compared with atomistic modeling such as molecular dynamics (MD) [5].

Here, we introduce computational tools and demonstrate a robust workflow to isolate the internal motion signal from total scattering data (Fig. 1). To model lattice disorder, we present GOODVIBES, a general crystalline elastic network model and optimization routine. This method allows for multiple rigid bodies per unit cell, as found in high-symmetry space groups, and accounts for symmetry in its parameterization of the elastic network. To validate the lattice disorder model, we present an independent method called DISCOBALL, which estimates rigid-body displacement covariances for pairs of protein chains in the crystal by deconvolution of the 3D pair distribution function or 3D-ΔPDF (the Fourier transform of the diffuse scattering intensities). Using simulated data and experimental datasets from three lysozyme polymorphs, we show that GOODVIBES and DISCOBALL in combination can be used to accurately model lattice disorder scattering. Finally, we show that the signal from internal protein motion can be recovered and compared quantitatively with crystalline MD simulations. The demonstration of a general workflow for diffuse scattering analysis lays the groundwork towards obtaining atomistic insight into correlated protein motions from experimental data.
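As a rough illustration of the 3D-ΔPDF mentioned above, the following Python sketch Fourier transforms a diffuse intensity map sampled on a regular reciprocal-space grid. It is not the DISCOBALL implementation and omits the deconvolution step entirely; the function names, the grid layout (origin at index n // 2 along each axis), and the synthetic test map are assumptions made for illustration.

```python
import numpy as np

def delta_pdf(diffuse_map):
    """Fourier transform a 3D diffuse map into a real-space map.

    Assumes (for this sketch) that the reciprocal-space origin sits at
    index n // 2 along each axis of the input array.
    """
    shifted = np.fft.ifftshift(diffuse_map)   # move the origin to index 0
    pdf = np.fft.fftn(shifted)                # reciprocal space -> real space
    # For a centrosymmetric (Friedel-symmetric) map the transform is real,
    # so only the real part is kept.
    return np.fft.fftshift(pdf).real          # real-space origin back at the center

def cosine_modulated_map(n=16, shift=3):
    """Hypothetical test map: a cosine modulation along the first axis."""
    h = np.arange(-(n // 2), n // 2)
    H, _, _ = np.meshgrid(h, h, h, indexing="ij")
    return 2.0 * np.cos(2.0 * np.pi * shift * H / n)

if __name__ == "__main__":
    pdf = delta_pdf(cosine_modulated_map())
    # Location of the strongest real-space feature (grid indices)
    print(np.unravel_index(np.argmax(pdf), pdf.shape))
```

In this synthetic case, a cosine modulation in reciprocal space maps to a pair of peaks displaced by plus and minus `shift` grid points from the real-space origin, which is the sense in which periodic features in the diffuse map encode inter-atomic or inter-molecular vectors.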

Results

A robust workflow to isolate internal motion signal from total scattering data

An overview of a general workflow for the analysis of total scattering from protein crystals is shown in Fig. 1. The workflow begins with data collection and reduction, the procedures for which we previously demonstrated [13]. Briefly, diffraction images are acquired from protein crystals at room temperature as well as from the background scattering using a room-temperature macromolecular X-ray crystallography (RT-MX) setup. Data reduction can then be performed with the mdx-lib software library, which we introduced previously [13]. Measurement of the background scattering and a careful scaling procedure (such as that implemented in mdx-lib) allow for a high-quality reconstruction of the total scattering (TS) on an absolute scale (electron units), which includes Bragg peak intensities and a three-dimensional diffuse scattering map. The time-averaged electron density of the unit cell is then determined by conventional structure refinement of the Bragg data, along with the mean atomic coordinates and atomic displacement parameters (ADPs) or B-factors. The ADPs represent the motion of each atom, while the diffuse scattering map contains information on how these motions are correlated.
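For orientation, an isotropic B-factor is related to the mean-square displacement of an atom along any one direction by B = 8π²⟨u²⟩, and it attenuates the structure factor amplitude at resolution d by exp(-B/(4d²)). The short Python sketch below applies these standard relations; the function names are illustrative and are not part of mdx-lib or any refinement package.

```python
import math

def b_to_msd(b_iso):
    """Mean-square displacement along one axis (Å^2) from an isotropic B-factor (Å^2)."""
    return b_iso / (8.0 * math.pi ** 2)

def msd_to_b(msd):
    """Isotropic B-factor (Å^2) from a one-axis mean-square displacement (Å^2)."""
    return 8.0 * math.pi ** 2 * msd

def debye_waller_amplitude(b_iso, d_spacing):
    """Attenuation of the structure factor amplitude at resolution d (Å).

    Uses exp(-B * s^2 / 4) with s = 1/d; the intensity is attenuated by
    the square of this factor.
    """
    return math.exp(-b_iso / (4.0 * d_spacing ** 2))

if __name__ == "__main__":
    b = 20.0  # Å^2, an arbitrary example value
    print(math.sqrt(b_to_msd(b)))          # r.m.s. displacement in Å
    print(debye_waller_amplitude(b, 2.0))  # amplitude attenuation at 2 Å
```

Because a B-factor reports only the magnitude of each atom's displacement, two models with identical B-factors can differ in how those displacements are correlated, which is the information carried by the diffuse map.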

In our previous work [13], we showed that correlated motions arise from two sources in protein crystals: the motion of atoms within a protein (internal motion) and deviations from the ideal arrangement of proteins in the crystal (external motion or lattice disorder). Lattice disorder tends to be long-ranged, and therefore it produces characteristic halos around the Bragg peaks, while correlations from internal motion are mostly short-ranged and produce smoothly varying, cloudy patterns. Although the two types of signals have distinct appearances, they cannot be simply separated in reciprocal space because the halo features are remarkably broad and overlap significantly with the much weaker and more nuanced cloudy pattern. Therefore, we aimed to develop additional tools based on a physical model of lattice disorder to subtract its contribution from the diffuse scattering map and the ADPs.
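The link between correlation length and feature width can be illustrated with a toy one-dimensional calculation, sketched here in Python under simplifying assumptions: displacement correlations between lattice sites are taken to decay exponentially with separation, and the diffuse profile is taken as the Fourier series of that correlation function. The function names and parameter values are hypothetical and are not drawn from the study.

```python
import numpy as np

def diffuse_profile(h, corr_length, n_max=200):
    """Toy 1D diffuse profile at fractional Miller index h.

    Sums exp(-|n| / corr_length) * cos(2*pi*h*n) over lattice separations
    n = -n_max..n_max (corr_length in units of the lattice spacing).
    """
    n = np.arange(-n_max, n_max + 1)
    corr = np.exp(-np.abs(n) / corr_length)         # assumed correlation function
    phases = np.cos(2.0 * np.pi * np.outer(h, n))   # one row per h value
    return phases @ corr

def peak_to_midpoint_ratio(corr_length):
    """Intensity at a Bragg position (h = 0) relative to midway between peaks (h = 0.5)."""
    profile = diffuse_profile(np.array([0.0, 0.5]), corr_length)
    return profile[0] / profile[1]

if __name__ == "__main__":
    for xi in (2.0, 20.0):
        print(xi, peak_to_midpoint_ratio(xi))
```

In this toy model, a longer correlation length concentrates intensity near integer h, giving a halo-like feature at the Bragg position, whereas a short correlation length spreads the intensity into a broad, slowly varying background.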

First, it was necessary to derive a general physical model of lattice disorder that can be refined to fit the diffuse halos (Fig. 1, green box). Based on the success of parameterized crystalline elastic network models in the case of triclinic lysozyme [13], we chose to extend those techniques to arbitrary space groups with multiple rigid bodies per unit cell. We call the parameterization and refinement method GOODVIBES for General Optimization Of Diffuse halos from VIBrational Elastic network Simulations. The GOODVIBES model for lattice disorder represents proteins as rigid bodies arranged in a supercell with periodic boundary conditions (Fig. 2a). The size of the supercell is chosen to be large enough to account for the long-ranged correlations implied by halo features in the experimental map. Rigid-body motion is enforced by adopting a generalized coordinate system for small displacements, which is a reasonable assumption for the subtle structural fluctuations we expect. The coordinate system, which is identical to that used in translation, libration, screw-axis (TLS) refinement, assigns each rigid body six degrees of freedom representing small translations and rotations. The close contacts between proteins in the lattice are modeled using a network of inter-molecular springs, whose functional form and strength can be tuned. The dynamics of the system are computed using Newtonian mechanics and equipartition of thermal energy (see Methods)….
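The equipartition step can be illustrated with a deliberately reduced toy model, sketched below in Python. It is not GOODVIBES: each rigid body has a single translational degree of freedom rather than six, the supercell is a one-dimensional ring, and all springs are identical harmonic springs. The function names, the spring constant, and the choice of units (kcal/mol, Å, K) are assumptions made for illustration. For a harmonic network with Hessian K, equipartition gives the displacement covariance as kB·T times the pseudo-inverse of K, where the pseudo-inverse discards the zero-energy uniform translation of the whole lattice.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def ring_hessian(n_bodies, spring_constant):
    """Hessian of a ring of bodies joined by nearest-neighbor springs.

    Periodic boundary conditions: the last body is connected to the first.
    spring_constant is in kcal/(mol Å^2) in this sketch.
    """
    hessian = np.zeros((n_bodies, n_bodies))
    for i in range(n_bodies):
        j = (i + 1) % n_bodies            # periodic neighbor
        hessian[i, i] += spring_constant
        hessian[j, j] += spring_constant
        hessian[i, j] -= spring_constant
        hessian[j, i] -= spring_constant
    return hessian

def displacement_covariance(hessian, temperature=300.0):
    """Equal-time displacement covariance (Å^2) from equipartition.

    The explicit rcond drops the zero-frequency mode (uniform translation).
    In this classical treatment the masses do not enter the result.
    """
    return KB * temperature * np.linalg.pinv(hessian, rcond=1e-10)

if __name__ == "__main__":
    cov = displacement_covariance(ring_hessian(8, 5.0))
    # Covariance between body 0 and bodies at increasing separation
    print(cov[0, :5])
```

The off-diagonal elements of the covariance fall off gradually with separation around the ring, a one-dimensional analogue of the long-ranged inter-molecular correlations that give rise to halos. A fuller model would add rotational degrees of freedom and a three-dimensional contact network.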
