Back

ⓘ Stochastic chains with memory of variable length are a family of stochastic chains of finite order in a finite alphabet, such as, for every time pass, only one ..




                                     

ⓘ Stochastic chains with memory of variable length

Stochastic chains with memory of variable length are a family of stochastic chains of finite order in a finite alphabet, such as, for every time pass, only one finite suffix of the past, called context, is necessary to predict the next symbol. These models were introduced in the information theory literature by Jorma Rissanen in 1983, as a universal tool to data compression, but recently have been used to model data in different areas such as biology, linguistics and music.

                                     

1. Definition

A stochastic chain with memory of variable length is a stochastic chain X n ∈ Z {\displaystyle X_{n}_{n\in Z}}, taking values in a finite alphabet A {\displaystyle A}, and characterized by a probabilistic context tree τ, p {\displaystyle \tau,p}, so that

  • τ {\displaystyle \tau } is the group of all contexts. A context X n − l, …, X n − 1 {\displaystyle X_{n-l},\ldots,X_{n-1}}, being l {\displaystyle l} the size of the context, is a finite portion of the past X − ∞, …, X n − 1 {\displaystyle X_{-\infty },\ldots,X_{n-1}}, which is relevant to predict the next symbol X n {\displaystyle X_{n}} ;
  • p {\displaystyle p} is a family of transition probabilities associated with each context.
                                     

2. History

The class of stochastic chains with memory of variable length was introduced by Jorma Rissanen in the article A universal system for data compression system. Such class of stochastic chains was popularized in the statistical and probabilistic community by P. Buhlmann and A. J. Wyner in 1999, in the article Variable Length Markov Chains. Named by Buhlmann and Wyner as" variable length Markov chains” VLMC, these chains are also known as" variable order Markov models" VOM," probabilistic suffix trees” and" context tree models”. The name" stochastic chains with memory of variable length” seems to have been introduced by Galves and Locherbach, in 2008, in the article of the same name.

                                     

3.1. Examples Interrupted light source

Consider a system by a lamp, an observer and a door between both of them. The lamp has two possible states: on, represented by 1, or off, represented by 0. When the lamp is on, the observer may see the light through the door, depending on which state the door is at the time: open, 1, or closed, 0. such states are independent of the original state of the lamp.

Let X n ≥ 0 {\displaystyle X_{n}_{n\geq 0}} a Markov chain that represents the state of the lamp, with values in A = 0, 1 {\displaystyle A={0.1}} and let p {\displaystyle p} be a probability transition matrix. Also, let ξ n ≥ 0 {\displaystyle \xi _{n}_{n\geq 0}} be a sequence of independent random variables that represents the doors states, also taking values in A {\displaystyle A}, independent of the chain X n ≥ 0 {\displaystyle X_{n}_{n\geq 0}} and such that

P ξ n = 1 = 1 − ε {\displaystyle \mathbb {P} \xi _{n}=1=1-\varepsilon }

where 0 < ϵ < 1 {\displaystyle 0 \,C\log n\right\}\,}"> ℓ ^ n X 0 n − 1 = max { i = 1, …, K n: Λ n X n − i n − 1 > C log ⁡ n } {\displaystyle {\hat {\ell }}_{n}X_{0}^{n-1}=\max \left\{i=1,\ldots,Kn:\Lambda _{n}X_{n-i}^{n-1}\,> \,C\log n\right\}\,}

where C {\displaystyle C} is any positive constant. At last, by Rissanen, theres the following result. Given X 0, …, X n − 1 {\displaystyle X_{0},\ldots,X_{n-1}} of a finite probabilistic context tree τ, p {\displaystyle \tau,p}, then

P ℓ ^ n X 0 n − 1 ≠ ℓ X 0 n − 1) ⟶ 0, {\displaystyle P\left{\hat {\ell }}_{n}X_{0}^{n-1}\neq \ell X_{0}^{n-1}\right)\longrightarrow 0,}

when n → ∞ {\displaystyle n\rightarrow \infty }.



                                     

3.2. Examples Bayesian information criterion BIC

The estimator of the context tree by BIC with a penalty constant c > 0 {\displaystyle c> 0} is defined as

τ ^ B I C = arg ⁡ max τ ∈ T n { log ⁡ L τ X 1 n − c d f τ log ⁡ n } {\displaystyle {\hat {\tau }}_{\mathrm {BIC} }={\underset {\tau \in {\mathcal {T}}_{n}}{\arg \max }}\{\log L_{\tau }X_{1}^{n}-c\,{\textrm {d}}f\tau\log n\}}
                                     

3.3. Examples Smallest maximizer criterion SMC

The smallest maximizer criterion is calculated by selecting the smallest tree τ of a set of champion trees C such that

lim n → ∞ log ⁡ L τ X 1 n − log ⁡ L τ ^ X 1 n = 0 {\displaystyle \lim _{n\to \infty }{\frac {\log L_{\tau }X_{1}^{n}-\log L_{\hat {\tau }}X_{1}^{n}}{n}}=0}