# Understanding local fluctuations (proposed postdoc project)

**Note: This position is already filled. Project description is kept for future reference.**

## Aim of the research group

The interconversion between protein conformations, occurring at multiple timescales, is fundamental for the biological function of proteins. For example, *local fluctuations* are essential for enzyme catalysis and *conformational gating* allows ligands to enter internal binding cavities. Understanding this motion has been a long-standing goal of experimental techniques, such as nuclear magnetic resonance (NMR), and, in the last decades, molecular simulations.

Simulations offer excellent temporal and spatial resolution, but suffer from the use of an approximate energy model (force field) and from the practically obtainable simulation lengths often being insufficient for sampling all relevant conformations. Both problems need to be addressed; in fact, the success of commonly used protein force fields probably depends on their use in a restricted region of conformational space near the crystallographically determined geometry. When we leave this "comfort zone" to explore functionally important but short-lived states, we might need to employ more generally applicable force fields, e.g. those including explicit polarization.

The sampling problem can be relieved by *enhanced sampling*, which often involves a manual selection of the most important degrees of freedom, *collective variables,* for the process of interest. During the last few years, our research group has built up expertise in extracting collective variables from unbiased molecular simulations and using them to enhance the sampling of biomolecular systems. In our current work, we apply this knowledge to biological problems such as conformational gating in aspartic proteases [ref], but also try to build a framework for systematically performing enhanced-sampling simulations and validating the results towards experiments.

Our "computationally oriented" philosophy relies on two basic principles:

The assumptions underlying the calculations should be few and transparent; we should be able to clearly identify the pieces of empirical information that have been used, normally a force field and some structural information.

The results should be self-consistent. Before comparing with experimental results, we ensure that the results are statistically significant, robust in terms of modifications of the input parameters, and consistent with the initial assumptions.

By collaborating with experimental research groups (e.g. Mikael Akke), we know what can be measured in practice. Our aim is to present clear predictions, irresistible for the experimentalists to try to confirm or refute, thus contributing to explaining how proteins work.

## Proposed project

This particular project aims at understanding local fluctuations of proteins in their native states. What are the relevant conformational states of the native ensemble, what is their respective population and what are the rate constants for the transitions between these states? The computational answers to these questions can be tested experimentally, for example by comparison with NMR spectroscopy. The project involves method development in three different areas, which will be briefly discussed in the following subsections.

**1. New methods for enhanced sampling of conformational transitions**

Protein conformational fluctuations occur on multiple time scales ranging from picoseconds to days. MD simulations provide a detailed picture of the fluctuations on the picosecond and nanosecond time scales, but the simulation length is often practically limited to a few microseconds. One solution is to use *enhanced-sampling methods**, *which either enhance the fluctuations of all degrees of freedom of the system (e.g. replica exchange), or specifically enhance the fluctuations of one or a few predefined collective variables (CVs). For example, *metadynamics* [1,2] is based on the introduction of a history-dependent bias potential in the space spanned by the CVs. At convergence, the bias potential compensates the underlying free energy surface and provides an estimate of its CV dependence.

Another solution is to construct a *Markov State Model* (MSM), which describes the accessible conformational states of the system and the dynamics of moving between these states [3]. To construct an MSM, one needs to find the rate constant for all pathways leading from a given metastable state to any other metastable state. This can either be done by post-processing of long MD trajectories or by specifically starting MD simulations from less sampled states. In both cases, much simulation time is spent waiting for a transition to occur. To fully exploit the power of MSMs, one needs to perform the underlying MD simulations in a more efficient way [4].

Our approach is to combine these methods and thus use a series of enhanced-sampling simulations to construct an accurate MSM [5]. The procedure starts with the crystal structure, and proceeds to find new states by starting simulations in the known states and observing transitions to other states, discovered on-the-fly. The use of enhanced sampling will reduce the computational effort and thus allow greater accuracy and robustness. However, two main problems need to be addressed.

The first problem is to find collective variables that effectively describe the slow degrees of freedom. Fortunately, the CVs do not need to be identical in all parts of the conformational space, so the local topology of the free-energy landscape can be used to define the CVs [6]. More rigorous ways of finding the "best" linear combination of input CVs are provided by three recently proposed methods [7,8,9] all inspired by time-lagged independent component analysis (TICA). In particular, we will employ the strategy in [9], which uses an iterative procedure for refining the optimal combination using information from non-optimal metadynamics simulations. In our case, the input CVs will be descriptors of individual hydrogen bonds, and a linear combination then constitutes an "opening mechanism" that may involve breaking one or several of these hydrogen bonds.

The second problem is that the dynamical behavior of the system is normally perturbed by a biasing method like metadynamics. However, transition rates can indeed be recovered from a metadynamics simulation under certain conditions [10], namely that the collective variables can distinguish between the two energy basins and that one avoids depositing bias in the region around the transition state, as illustrated in Figure 1. In our case, we will consider each transition as a series of two stages, resembling of an earlier study [11]. The first stage will involve breaking of internal hydrogen bonds, whereas the second stage will involve solvent reorganization that provides solvent access to the particular backbone amides. By applying bias only on the CVs of the first stage, the dynamics of the second stage will be largely unaffected by the bias; thus the requirements in [10] hold.

In summary, the transition rates obtained from the carefully designed metadynamics simulations will be used to construct a MSM that reveals the relevant kinetic pathways for the local fluctuations. Particular effort will be devoted to estimating the statistical errors in the computed transition rates. Our approach differs from other combinations of MSM and metadynamics [12], in which metadynamics is only used to define the states.

**2. Testing and development of new force fields**

Several recent studies have tested the ability of current protein force fields to reproduce experimental data for the native ensemble, either using long unbiased MD simulations [13] or enhanced-sampling simulations [14]. A particular case involves intrinsically disordered proteins, for which the performance of the force fields is much worse [15]. In some sense, the open states in our investigation are more disordered than the closed state and thus can be expected to show similarly strong force-field dependence.

However, all the tested force fields in these benchmarks are of the standard two-body type. The reason for not including more physically motivated force fields, that e.g. include explicit electronic polarization, has mainly been their high computational demands. Recent technical developments have significantly reduced these demands [16], and we thus enter an era where such force fields can be included in validation studies. This will provide the possibility of more realistic simulations in the future, although of course improvement is not guaranteed. We will primarily include AMOEBA [17] in our investigation, as it is arguably the most mature polarizable protein force field.

**3. Validating the results towards experiments**

The conformational flexibility of proteins can be studied by several types of NMR spectroscopy, such as CPMG relaxation dispersion. A particularly interpretable experiment measures hydrogen exchange (HX) rates between backbone amides and solvent water. In principle, to use this data for validation, the calculations that predict HX rates should ideally involve not only the conformational transitions of the protein, but also the actual hydrogen exchange reaction, typically involving a hydroxide ion. Such reaction would have to be treated quantum-mechanically and is further complicated by quantum tunneling effects.

Therefore, it is more feasible to divide the process into a "solvent-exposure" of the amide, which can be modeled classically, and an actual exchange step, which is assumed to have the same intrinsic rate as measured for short peptides. Under typical conditions, the HX measurements then yield the free energy difference ΔG_{HX} between the exchange-competent (open) conformation and the dominant (closed) conformation. For some residues, the open state coincides with the globally unfolded ensemble, but for most residues the open state is only locally unfolded. Despite the widespread use of the HX method, the amplitude of the local fluctuations, the role of solvent penetration, and the cooperativity of the fluctuations are poorly understood [18].

When doing this simplifying division, we need a definition of what constitutes an open state. Several such definitions have been proposed [19], but only the geometrical criterion in [20] is based fully on a dynamical validation. In that study, the definition was used to determine ΔG_{HX} for each residue directly by analyzing an ultra-long MD trajectory of bovine pancreatic trypsin inhibitor (BPTI) and counting the number of configurations belonging to the open and closed state, respectively. In general, good agreement with experimental measurements of ΔG_{HX }was obtained. However, some of the amides did not visit the open conformation, and for others, the statistical uncertainty was large.

We will use the sampling strategy outlined above to provide a more rigorous testing ground for this definition. We will first study BPTI, and then investigate the transferability of the definition to other systems, primarily Staphylococcal nuclease, for which mechanisms of local fluctuations have been proposed [21]. The findings will provide a deeper understanding of local fluctuations and their role in the functionality of proteins. It is interesting to note that, even if the sampling strategy would fail to provide statistically converged results, we would anyway be able to characterize the open states. For example, the lifetime of the open states should be accessible even from simpler biased simulations, as they do not depend on the biasing scheme used to prepare the states (assuming Markovian kinetics). The more rigorous sampling strategy developed in this project could then instead be applied to other problems studied in the research group, such as conformational gating.

**References**

[1] A. Laio, M. Parrinello (2002) Proc Natl Acad Sci U S A, 99, 12562

[2] A. Barducci, G. Bussi, M. Parrinello (2008) Phys. Rev. Lett. 100, 020603

[3] F. Noé, S. Fischer (2008) Curr Opin Struct Biol 18, 154-162

[5] P. Tiwary, V. Limongelli, M. Salvalaglio, M. Parrinello (2015) Proc. Natl. Acad. Sci. USA 112, E386

[6] G. A. Tribello, M. Ceriotti, M. Parrinello (2010) Proc. Natl. Acad. Sci. USA 107, 17509

[7] P. Tiwary, B. J. Berne (2016), Proc. Natl. Acad. Sci. USA 113, 2839

[8] M. M. Sultan, V. S. Pande (2017), J. Chem. Theory Comput. 13, 2440

[12] M. Biswas, B. Lickert, G. Stock (2018) J. Phys. Chem. B, DOI: 10.1021/acs.jpcb.7b11800

[13] K. Lindorff-Larsen, P. Maragakis, S. Piana, M. P. Eastwood, R. O. Dror, D. E. Shaw (2012) PLoS ONE 7, e32131

[14] F. Palazzesi, M. K. Prakash, M. Bonomi, A. Barducci (2015) J. Chem. Theory Comput. 11, 2

[15] S. Rauscher, V. Gapsys, M. J. Gajda, M. Zweckstetter, B. L. de Groot, H. Grubmüller (2015) J. Chem. Theory Comput. 11, 5513

[16] L. Lagardere, L.-H. Jolly, F. Lipparini, F. Aviat, B. Stamm, Z. F. Jing, M. Harger, H. Torabifard, G. A. Cisneros, M. J. Schnieders, N. Gresh, Y. Maday, P. Y. Ren, J. W. Ponder, J.-P. Piquemal (2018) Chem. Sci. 9, 956

[17] Y. Shi, Z. Xia, J. Zhang, R. Best, C. Wu, J. W. Ponder, P. Ren (2013) J. Chem. Theory Comput. 9, 4046

[18] H. Maity, W. K. Lim, J. N. Rumbley, S. W. Englander (2003) Protein Sci. 12, 153

[19] J. J. Skinner, W. K. Lim, S. Bédard, B. E. Black, S. W. Englander (2012) Protein Science 21, 987

[20] F. Persson, B. Halle (2015) Proc. Natl. Acad. Sci. USA 112, 10383

[21] J. J. Skinner, W. K. Lim, S. Bédard, B. E. Black, S. W. Englander (2012) Protein Science 21, 996