# Research

Generally speaking, I study modern stochastic concepts within natural systems. These concepts include, among other things, extreme value statistics in physical systems, heavy-tailed probability distributions, long-range correlations, and first passage time problems. I apply these methods in diverse fields, such as transport in disordered systems, hydrology, meteorology, and random walk modelling. Highlights of my research are described below.

Extreme events occur in many areas of our lives and in nature. They often have an enormous impact; examples range from the fastest sperm in fertilization to stock market crashes. Hence, there is a strong desire to better understand extreme events. Unfortunately, because they are statistically rare, their future occurrence is very difficult to predict.

The big jump principle is a useful tool from mathematics. It explains how an extreme value of an aggregate quantity emerges from a single “microscopic” extreme event. In its original form, this principle relates the sum $$s_N= \sum_{n=1}^N x_n$$ and the maximum $$x_\mathrm{max} = \mathrm{max}(x_1,x_2,\ldots,x_N)$$ of a set of $$N$$ independent and identically distributed random variables $$(x_1,x_2,\ldots,x_N)$$ with a fat-tailed distribution. The tail probabilities of both quantities are asymptotically equal,

$$\mathrm{Prob}(s_N>z) \sim \mathrm{Prob}(x_\mathrm{max}>z),$$

for large values of $$z$$. The important point is that only a single summand, namely the maximum, is responsible for large values of the sum; all other summands are negligible. Hence, an extreme value of $$s_N$$ stems from an extreme $$x_\mathrm{max}$$. Take, for example, the position $$s_N$$ of a bumblebee or any other animal, which can be modelled by a random walk with fat-tailed step sizes $$x_n$$ (see the picture below). Most of the time, the bee's position fluctuates locally while it visits nearby flowers. Once in a while, however, it makes a single long, straight movement (the big jump $$x_\mathrm{max}$$) to a new feeding ground. On large scales, the bee's position is therefore governed by the big jump principle: it is determined by the jumps from one feeding ground to another, while the local fluctuations can be neglected. Note that this example only serves to illustrate the big jump principle. It is not a rigorous description of bee movement.
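The principle is easy to check numerically. The following sketch is a hedged illustration with assumed parameters and is not taken from the original work. It uses $$N=10$$ and Pareto-distributed $$x_n$$ with $$\mathrm{Prob}(x_n>y)=y^{-3/2}$$ for $$y\ge 1$$. At a threshold $$z$$ far above the mean of $$s_N$$, it estimates both tail probabilities. It also estimates the share of the sum carried by the maximum when the sum is large.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
alpha, N, n_samples = 1.5, 10, 200_000   # assumed illustrative parameters

# Classical Pareto samples: P(x > y) = y**(-alpha) for y >= 1.
# Generator.pareto draws the Lomax (Pareto II) law; adding 1 shifts it.
x = rng.pareto(alpha, size=(n_samples, N)) + 1.0

s_N = x.sum(axis=1)     # the sum s_N of each set
x_max = x.max(axis=1)   # the maximum x_max of each set

z = 300.0               # a "large" threshold, far above the mean of s_N (= 30)
p_sum = np.mean(s_N > z)
p_max = np.mean(x_max > z)
ratio = p_sum / p_max   # tends to 1 as z grows

# Share of the sum carried by the single largest summand, given s_N > z
large = s_N > z
share = np.mean(x_max[large] / s_N[large])

print(f"Prob(s_N>z)={p_sum:.2e}  Prob(x_max>z)={p_max:.2e}  ratio={ratio:.2f}")
print(f"mean x_max/s_N given s_N>z: {share:.2f}")
```

Since $$x_\mathrm{max}>z$$ implies $$s_N>z$$, the ratio can never drop below one. At this moderate $$z$$ it is still somewhat above one and approaches one only slowly as $$z$$ grows. Even so, the largest summand already carries most of the sum.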

However, most physical systems in nature exhibit correlations, so the original big jump principle must be modified. In my DFG research project at Bar-Ilan University, I investigate these modifications for a large class of physical models (from random walk theory, transport in disordered media, time series models, and more).

The goal of this project is twofold: 1) to establish the big jump principle for many systems, and 2) to provide users with the tools they need to deal with the consequences of extreme events. The big jump principle provides the basis for these tools and therefore improves risk management.

Classical extreme value theory is textbook material. It studies the maximum $$x_\text{max}=\text{max}(x_1,x_2,\ldots,x_N)$$ of a set of $$N$$ independent and identically distributed random variables $$(x_1,x_2,\ldots,x_N)$$. The cumulative distribution function (CDF) of each random variable is $$P(x)$$. An important starting point is the observation that the distribution of the maximum obeys

$$\mathrm{Prob}(x_\mathrm{max} \le z)=[\mathrm{Prob}(x_n \le z)]^N,$$

where $$\mathrm{Prob}(x_n \le z)=P(z)$$ is the probability that a single random variable is less than or equal to $$z$$. The product form holds because the maximum stays below $$z$$ only if all $$N$$ independent variables do. However, the assumption of independent and identically distributed random variables is violated in many physical systems.
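As a quick sanity check of the product formula, the sketch below compares the empirical distribution of the maximum with $$[P(z)]^N$$. It assumes i.i.d. exponential variables with $$P(z)=1-e^{-z}$$ and an arbitrary $$N=20$$; both choices are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
N, n_samples = 20, 100_000     # assumed illustrative values

# i.i.d. Exponential(1) variables, whose CDF is P(z) = 1 - exp(-z)
x = rng.exponential(1.0, size=(n_samples, N))
x_max = x.max(axis=1)

zs = np.array([2.0, 3.0, 4.0, 5.0])
empirical = np.array([np.mean(x_max <= z) for z in zs])
exact = (1.0 - np.exp(-zs)) ** N          # [P(z)]^N
for z, emp, ex in zip(zs, empirical, exact):
    print(f"z={z}: empirical {emp:.4f}  [P(z)]^N {ex:.4f}")
```

For large $$N$$ this expression is close to the Gumbel form $$\exp(-N e^{-z})$$.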

With my group at Bar-Ilan University, I study extreme value theory for a large class of so-called constrained physical models. These models include renewal processes, mass transport models (the zero range process), and long-range interacting spin systems (the truncated inverse distance squared Ising model). Their common property is a global constraint on the dynamics. For example, for renewal processes the measurement time is fixed, and for the zero range process the total number of particles is fixed. This global constraint leads to correlations among the random variables. A special feature is that the distribution of the maximum exhibits a non-analytic point in the middle of its support. We established a method to decouple the problem when the extreme value lies beyond this midpoint, where at most one variable can exceed the threshold.
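The origin of the midpoint singularity can be illustrated with a toy model of our own choosing; it is not one of the models listed above. Take $$N$$ i.i.d. exponential variables conditioned on a fixed total $$T$$. This makes $$(x_1,\ldots,x_N)/T$$ uniformly distributed on the simplex. Inclusion-exclusion then gives

$$\mathrm{Prob}(x_\mathrm{max}>z)=\sum_{k\ge 1}(-1)^{k+1}\binom{N}{k}\left(1-kz/T\right)_+^{N-1}.$$

For $$z>T/2$$ only the $$k=1$$ term survives, because two variables cannot both exceed half of the total. The $$k=2$$ term switches on exactly at $$z=T/2$$, which is where the distribution changes its analytic form. The sketch compares this formula with a Monte Carlo estimate.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(seed=3)
N, T, n_samples = 5, 1.0, 200_000   # assumed toy parameters

# Normalized i.i.d. exponentials are uniform on the simplex {x_n >= 0, sum = T},
# i.e. i.i.d. exponentials conditioned on the global constraint sum = T.
e = rng.exponential(1.0, size=(n_samples, N))
x = T * e / e.sum(axis=1, keepdims=True)
x_max = x.max(axis=1)

def tail_exact(z):
    """Prob(x_max > z) by inclusion-exclusion over k variables exceeding z."""
    return sum((-1) ** (k + 1) * comb(N, k) * max(1.0 - k * z / T, 0.0) ** (N - 1)
               for k in range(1, N + 1))

def tail_single(z):
    """Only the k = 1 term: exact for z > T/2, where at most one x_n exceeds z."""
    return N * max(1.0 - z / T, 0.0) ** (N - 1)

for z in (0.3, 0.45, 0.6, 0.75):
    mc = np.mean(x_max > z)
    print(f"z={z}: MC {mc:.4f}  exact {tail_exact(z):.4f}  k=1 term {tail_single(z):.4f}")
```

Below the midpoint the single-term expression fails; at $$z=0.3$$ it even exceeds one. This is why the regime beyond $$T/2$$ decouples so cleanly.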

Beyond the midpoint of the support, we found exact relationships between two classical fields: extreme value theory (for constrained models) and stochastic dynamics. For example, for a renewal process observed up to time $$t$$, the distribution of the maximum waiting time $$\tau_\text{max}$$ for $$z>t/2$$ is exactly

$$\mathrm{Prob}(\tau_\text{max} \le z)=1-S(z)\langle N(t-z)\rangle$$

with the survival probability $$S(z)$$ and the mean number of renewal events $$\langle N(t-z) \rangle$$. Both quantities on the right-hand side are well studied in renewal theory. Our result allows us to measure the mean number of renewal events and thereby obtain the distribution of the maximum indirectly, at a lower computational cost.
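The identity can be checked numerically. The sketch below assumes unit-mean exponential waiting times (a Poisson process), for which $$S(z)=e^{-z}$$. It also assumes a specific bookkeeping: the unfinished last waiting time counts as an interval, and $$N(s)$$ counts the intervals started in $$[0,s]$$, including the one starting at time $$0$$. This convention is our assumption; with a different one, $$\langle N\rangle$$ shifts by one. Under these assumptions $$\langle N(s)\rangle = 1+s$$, and the formula becomes $$1-e^{-z}(1+t-z)$$. The direct estimate needs full paths up to $$t$$, whereas the indirect one needs only the shorter paths up to $$t-z$$.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
t, z, n_paths = 2.0, 1.5, 100_000   # assumed values; z > t/2 is required

def intervals_up_to(T, rng):
    """Waiting times of one renewal path on [0, T] (Exponential(1), an assumption).
    The last interval is cut at T, so the intervals add up exactly to T."""
    intervals, elapsed = [], 0.0
    while True:
        tau = rng.exponential(1.0)
        if elapsed + tau >= T:
            intervals.append(T - elapsed)   # unfinished last interval
            return intervals
        intervals.append(tau)
        elapsed += tau

# Direct estimate of Prob(tau_max <= z): simulate full paths up to t.
direct = np.mean([max(intervals_up_to(t, rng)) <= z for _ in range(n_paths)])

# Indirect estimate: N(t - z) = number of intervals started in [0, t - z],
# which only needs the shorter paths up to t - z.
mean_N = np.mean([len(intervals_up_to(t - z, rng)) for _ in range(n_paths)])
survival = np.exp(-z)                 # S(z) for Exponential(1) waiting times
indirect = 1.0 - survival * mean_N

exact = 1.0 - np.exp(-z) * (1.0 + t - z)   # closed form for this Poisson case
print(f"direct {direct:.4f}  indirect {indirect:.4f}  exact {exact:.4f}")
```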

For the zero range process and the long-range interacting spin model, we found similar relationships. Furthermore, our theory is a helpful tool for investigating the thermodynamic limit; in particular, both typical and rare events can be studied. An interesting observation is that rare events are described within the framework of non-normalizable states (infinite densities).

Data analysis is concerned with the correct interpretation of data. Unfortunately, data scientists face many difficulties: statistical outliers, extreme values, background noise, non-stationarities, external trends, or random breaks. A ubiquitous example is long memory (also called long-range correlations), which has been found in a large number of data sets, such as air temperature, stock prices, heart rate variability, and many more. Long memory can be described by a power-law decaying autocorrelation function $$C(\tau)\sim \tau^{-\gamma}$$ with the correlation parameter $$0<\gamma<1$$ for large values of the time lag $$\tau$$.

A major problem for the data analysis of long memory is the conflict with non-stationarities (for example, an external linear trend): classical statistical methods mistake long-range correlations for non-stationarities, and vice versa. A popular way to estimate long memory correctly is to use detrending methods of fluctuation analysis; detrended fluctuation analysis (DFA) and detrending moving average (DMA) are famous examples. These methods are based on random walk theory. The data points are interpreted as increments of a random walk, and one estimates the mean squared displacement (MSD) of this “random walk”, which scales as

$$\text{MSD}(s) \sim s^\alpha$$

where $$s$$ is the time lag. The fluctuation parameter $$\alpha$$ is directly related to the correlation parameter: for $$0<\gamma<1$$, one has $$\alpha = 2-\gamma$$. The key property behind the success of these methods is that they filter out external non-stationarities. It is straightforward to verify with Monte Carlo simulations that these methods work for a large class of non-stationarities. It is, however, far from obvious how and why they work at all.
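A minimal DFA sketch follows; it was written for illustration and not taken from the original work. The data are integrated into a profile, and the profile is cut into windows of length $$s$$. A polynomial of order $$q$$ is fitted in each window, and the mean squared residual is recorded; this quantity scales like $$\text{MSD}(s)$$, and uncorrelated data give $$\alpha=1$$. The example assumes Gaussian white noise plus a strong linear trend. The helper names `dfa` and `exponent` are hypothetical. A linear trend in the data becomes a quadratic trend in the profile, so a quadratic fit ($$q=2$$) removes it while a linear fit ($$q=1$$) does not.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def dfa(x, scales, order):
    """Squared DFA fluctuation function: mean squared residual of the profile
    after fitting a polynomial of degree `order` in non-overlapping windows."""
    profile = np.cumsum(x - np.mean(x))        # the "random walk" built from the data
    result = []
    for s in scales:
        n_win = len(profile) // s
        windows = profile[: n_win * s].reshape(n_win, s)
        t = np.arange(s)
        coeffs = P.polyfit(t, windows.T, order)    # one fit per window (column)
        trend = P.polyval(t, coeffs)               # shape (n_win, s)
        result.append(np.mean((windows - trend) ** 2))
    return np.array(result)

def exponent(scales, F):
    """Slope of log F(s) versus log s."""
    return np.polyfit(np.log(scales), np.log(F), 1)[0]

rng = np.random.default_rng(seed=5)
n = 2 ** 16
noise = rng.standard_normal(n)                 # uncorrelated data: alpha = 1
trended = noise + 0.01 * np.arange(n)          # same data plus a linear trend

scales = 2 ** np.arange(4, 11)                 # s = 16, 32, ..., 1024
alpha_noise = exponent(scales, dfa(noise, scales, order=2))
alpha_dfa2 = exponent(scales, dfa(trended, scales, order=2))
alpha_dfa1 = exponent(scales, dfa(trended, scales, order=1))
print(f"DFA2 noise {alpha_noise:.2f}, DFA2 trended {alpha_dfa2:.2f}, DFA1 trended {alpha_dfa1:.2f}")
```

With $$q=2$$ the trended and trend-free series give essentially the same exponent. With $$q=1$$ the trend leaks into the residuals and inflates the apparent exponent.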

In my PhD thesis at the Max Planck Institute for the Physics of Complex Systems, I developed a general theory for detrending methods of fluctuation analysis. For a given data set, these methods estimate the so-called fluctuation function $$F(s)$$. We found an exact relationship between $$F(s)$$ and the autocovariance function $$C(\tau)$$ of the time series

$$F(s) = \sum_{\tau=0}^{s-1}C(\tau)L(\tau,s)$$

with the detrending kernel $$L(\tau,s)$$. Based on this formula, we formulated two basic principles:

1. The fluctuation function $$F(s)$$ scales like the mean squared displacement $$\text{MSD}(s)$$.
2. The estimator of the fluctuation function is unbiased.

The first principle is necessary for the indirect estimation of the correlations. The second principle is the mathematical formulation of “removing external trends”. Hence, we showed what actually happens when data are detrended. Finally, we calculated $$L(\tau,s)$$ for DFA and DMA and showed that these methods are specific examples of our general theory.
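The structure $$F(s)=\sum_{\tau} C(\tau)L(\tau,s)$$ can be reproduced numerically in a simple setting. The sketch below is our own construction and not necessarily the kernel definition used in the thesis. Within one window of length $$s$$, DFA of order $$q$$ maps the data $$x$$ to the residual $$(I-\Pi)Ax$$. Here $$A$$ is the cumulative-sum matrix and $$\Pi$$ is the least-squares projection onto polynomials of degree $$q$$. This gives $$F(s)=x^\mathsf{T}Mx$$ with $$M=A^\mathsf{T}(I-\Pi)A/s$$. Taking the expectation and collecting the entries of $$M$$ along its diagonals yields a kernel $$L(\tau,s)=\sum_{|i-j|=\tau}M_{ij}$$. The helpers `dfa_matrix` and `kernel` are hypothetical names. The example makes two checks. First, for an assumed AR(1) test process with $$C(\tau)=\phi^\tau/(1-\phi^2)$$, the kernel prediction matches a Monte Carlo average. Second, for $$q=2$$, adding a linear trend to the data leaves $$F(s)$$ unchanged.

```python
import numpy as np

def dfa_matrix(s, order):
    """M with F(s) = x @ M @ x for one window of length s (DFA of given order)."""
    A = np.tril(np.ones((s, s)))                          # profile = A @ x (cumulative sum)
    V = np.vander(np.arange(s, dtype=float), order + 1)   # polynomial basis
    Pi = V @ np.linalg.pinv(V)                            # least-squares projection
    R = (np.eye(s) - Pi) @ A                              # residual after detrending
    return R.T @ R / s

def kernel(M):
    """L(tau, s): sum of the entries of the symmetric M with |i - j| = tau."""
    s = M.shape[0]
    return np.array([np.trace(M)] +
                    [2.0 * np.trace(M, offset=tau) for tau in range(1, s)])

s, order, phi = 16, 2, 0.5                       # assumed parameters
M = dfa_matrix(s, order)
L = kernel(M)
C = phi ** np.arange(s) / (1.0 - phi ** 2)       # AR(1) autocovariance C(tau)
F_kernel = np.sum(C * L)                         # sum_tau C(tau) L(tau, s)

# Monte Carlo: many independent stationary AR(1) windows of length s
rng = np.random.default_rng(seed=6)
n_win = 50_000
x = np.empty((n_win, s))
x[:, 0] = rng.standard_normal(n_win) / np.sqrt(1.0 - phi ** 2)   # stationary start
for i in range(1, s):
    x[:, i] = phi * x[:, i - 1] + rng.standard_normal(n_win)
F_mc = np.mean(np.einsum("wi,ij,wj->w", x, M, x))

# A linear trend in the data is removed exactly when order >= 2
trend = 3.0 + 0.7 * np.arange(s)
y = x[0] + trend
shift = y @ M @ y - x[0] @ M @ x[0]
print(f"kernel {F_kernel:.4f}  Monte Carlo {F_mc:.4f}  trend effect {shift:.1e}")
```

For stationary data every window has the same expectation. The single-window kernel therefore also describes the average over many windows.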