This disclosure relates to systems and methods for analyzing stochastic processes and, more particularly, to providing a device that may infer a probabilistic finite state automaton.
Stochastic processes are random processes, the output of which may be quantized and represented using an alphabet of symbols. For example, the outcome of a coin flip may be represented by a binary “1” for heads and a binary “0” for tails. A trace, which, in some examples, may be referred to as a word or a sentence, may be used to represent multiple occurrences of a stochastic process. For example, five observed coin flips may be represented as “10011.” Further, stochastic processes may be dynamic, and the probability of a particular outcome may change over time and/or based on past outcomes. That is, the probability of an outcome may be based on history (e.g., the probability that a temperature will increase after a number of previous consecutive days having a temperature increase). Systems with such stochastic dynamics and complex causal structures are of emerging interest, e.g., the flow of wealth in the stock market, geophysical phenomena, the ecology of evolving ecosystems, genetic regulatory circuits and even social interaction dynamics.
Dynamic stochastic processes may be described using a probabilistic finite state automaton, where a probabilistic finite state automaton may include a number of states, a number of letters in an alphabet, a state transition function, arc probabilities, and an initial state. Current techniques for inferring dynamic stochastic processes may be less than ideal: they may be computationally inefficient and may not properly infer the gradient of dynamic stochastic processes.
In general, this disclosure describes techniques for analyzing stochastic processes. In particular, the techniques described herein may be used to infer a probabilistic finite state automaton. In one example, this disclosure describes a device that may infer a probabilistic finite state automaton from an observed trace. In one example, a device, such as a computing device, may be configured to infer a probabilistic finite state automaton in order to predict the distribution of future symbols based on the recent past. Devices implementing the techniques described herein may be useful in analyzing one or more of the following: the probability of errors in signal transmission, the flow of wealth in the stock market, geophysical phenomena (e.g., climate changes and seismic events, such as earthquakes), the ecology of evolving ecosystems (e.g., populations), genetic regulatory circuits and even social interaction dynamics (e.g., traffic patterns).
According to one example of the disclosure, a method for inferring a probabilistic finite state automaton comprises receiving an observed string generated by a quantized stochastic process, constructing a derivative heap using the observed string, identifying a string mapping to a vertex of a convex hull of the derivative heap, detecting a structure of a transition function for an inferred probabilistic finite state automaton associated with the identified string, and determining arc probabilities for the inferred probabilistic finite state automaton associated with the identified string.
According to another example of the disclosure, a device for inferring a probabilistic finite state automaton comprises one or more processors configured to receive an observed string generated by a quantized stochastic process, construct a derivative heap using the observed string, identify a string mapping to a vertex of a convex hull of the derivative heap, detect a structure of a transition function for an inferred probabilistic finite state automaton associated with the identified string, and determine arc probabilities for the inferred probabilistic finite state automaton associated with the identified string.
According to another example of the disclosure, a non-transitory computer-readable storage medium has instructions stored thereon that upon execution cause one or more processors of a device to receive an observed string generated by a quantized stochastic process, construct a derivative heap using the observed string, identify a string mapping to a vertex of a convex hull of the derivative heap, detect a structure of a transition function for an inferred probabilistic finite state automaton associated with the identified string, and determine arc probabilities for the inferred probabilistic finite state automaton associated with the identified string.
According to another example of the disclosure, an apparatus for inferring a probabilistic finite state automaton comprises means for receiving an observed string generated by a quantized stochastic process, means for constructing a derivative heap using the observed string, means for identifying a string mapping to a vertex of a convex hull of the derivative heap, means for detecting a structure of a transition function for an inferred probabilistic finite state automaton associated with the identified string, and means for determining arc probabilities for the inferred probabilistic finite state automaton associated with the identified string.
The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Automated model inference is a topic of wide interest, and reported techniques for automated model inference range from query-based concept learning in artificial intelligence to classical system identification, to more recent symbolic regression based reverse-engineering of nonlinear dynamics. This disclosure describes techniques for inferring the dynamical structure of quantized stochastic processes (QSPs), e.g., stochastic dynamical systems evolving over discrete time, and producing quantized time series.
A simple example of a quantized stochastic process is the standard random walk on the line.
As illustrated in
The techniques described herein may include unsupervised learning algorithms to infer the causal structure of quantized stochastic processes, defined as stochastic dynamical systems evolving over discrete time and producing quantized observations. The techniques described herein may infer models that are generative, i.e., models that predict the distribution of future symbols from knowledge of the recent past. Causal structures may formally be known as probabilistic finite state automata (PFSA), and the techniques described herein may infer a PFSA from a sufficiently long sequence of observed symbols, with no a priori knowledge of the number, connectivity, or transition rules of the hidden states.
In some examples, the techniques described herein assume ergodicity and stationarity and infer probabilistic finite state automata models from a sufficiently long observed trace. Further, the techniques described herein may be abductive, attempting to infer a simple hypothesis consistent with the observations and with a modelling framework that essentially fixes the hypothesis class. In some examples, the probabilistic automata that are inferred have no initial or terminal states, have no structural restrictions, and are shown to be probably approximately correct-learnable.
In some examples, the techniques described herein use a symbolic representation of data and quantize relative changes between successive physical observations to map the gradient of a continuous time series to a symbolic sequence (with each symbol representing a specific increment range). It should be noted that in some instances using a symbolic approach may cause information loss due to quantization. However, advantages of a symbolic approach may include unsupervised inference, computational efficiency, deeper insight into the hidden causal structures, the ability to deduce rigorous performance guarantees, and the ability to better predict stochastic evolution dictated by many hidden variables interacting in a structured fashion.
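As a concrete illustration of this symbolization step, the following sketch quantizes the relative change between successive samples of a continuous time series into a four-symbol alphabet. The threshold values, alphabet, and function name are illustrative assumptions and are not taken from this disclosure.

```python
import numpy as np

def symbolize_gradient(series, thresholds=(-0.01, 0.0, 0.01), alphabet="0123"):
    """Map relative changes between successive observations to symbols.

    Each symbol stands for a specific increment range; the thresholds and
    four-letter alphabet here are illustrative choices, not values from the
    disclosure.
    """
    x = np.asarray(series, dtype=float)
    rel = (x[1:] - x[:-1]) / np.abs(x[:-1])   # relative change between successive samples
    bins = np.digitize(rel, thresholds)       # bin index 0..3 for each increment
    return "".join(alphabet[i] for i in bins)

# Example: a noisy upward trend becomes a symbolic trace over a four-letter alphabet.
rng = np.random.default_rng(0)
trace = symbolize_gradient(np.cumsum(rng.normal(0.5, 1.0, 200)) + 100.0)
print(trace[:40])
```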
In some examples, the modelling framework described herein assumes that, at any given instant, a QSP is in a unique hidden causal state, which has a well-defined probability of generating the next quantized observation. This fixes the rule (i.e., the hypothesis class) by which sequential observations are generated, and the techniques described herein seek the correct hypothesis, i.e., the automaton that explains the observed trace. Inferring the hypothesis, given the observations and the generating rule, is abduction (as opposed to induction, which infers the generating rule from knowledge of the hypothesis and the observation set).
It should be noted that learning from symbolic data is essentially grammatical inference, i.e., learning an unknown formal language from a presentation of a finite set of example sentences. In the case of the techniques described herein, every subsequence of the observed trace is an example sentence. Inference of probabilistic automata is well studied in the context of pattern recognition.
However, learning dynamical systems with PFSA has received less attention, and the following issues have been observed: (1) Initial and terminal conditions: Dynamical systems evolve over time, and are often termination-free. Thus, inferred machines should lack terminal states. Reported algorithms often learn probabilistic automata with initial and final states, and with special termination symbols. (2) Structural restrictions: Reported techniques make restrictive structural assumptions, and pre-specify upper bounds on inferred model size. For example, some techniques infer only probabilistic suffix automata, some techniques further require synchronizability, i.e., the property by which a bounded history is sufficient to uniquely determine the current state, and some techniques further restrict models to be acyclic and aperiodic. (3) Restriction to short-memory processes: Often memory bounds are pre-specified as bounds on the order of the underlying Markov chain, or as synchronizability; such techniques thus only learn processes with short-range dependencies. A time series possesses long-range dependencies (LRDs) if it has correlations persisting over all time scales. In such processes, the auto-correlation function follows a power law, as opposed to an exponential decay. Such processes are of emerging interest, e.g., internet traffic, financial systems and biological systems. In some examples, the techniques described herein may be used to learn LRDs.
The additional following issue has also been observed: (4) Inconsistent definition of probability space: Reported approaches often attempt to define a probability distribution on Σ*, i.e. the set of all finite but unbounded words over an alphabet Σ, and then additionally define the probability of the null-word λ to be 1. This is inconsistent, since the latter condition would demand that the probability of every other finite string in Σ* be 0. Some authors use λ for string termination, which is in line with the grammatical picture where the empty word is used to erase a non-terminal. However, this is inconsistent with a dynamical systems model. In some examples, the techniques described herein address this via rigorous construction of a σ-algebra on strictly infinite strings.
Further, some techniques use initial states (superfluous for stationary ergodic processes) and special termination symbols, and require a pre-specified bound on the model size. Models inferred using the techniques described herein may be significantly more compact.
To summarize, the techniques described herein may (i) formalize PFSAs in the context of QSPs, (ii) remove inconsistencies with the definition of the probability of the null-word via a σ-algebra on strictly infinite strings, and (iii) show that PFSAs arise naturally via an equivalence relation on infinite strings. Also, the techniques described herein may characterize the class of QSPs with finite probabilistic generators and establish probably approximately correct (PAC)-learnability. Further, the techniques described herein may be used to learn PFSA models with no a priori restrictions on structure, size and memory. Further, models generated from the techniques described herein have been tested against rigorous performance guarantees and data requirements, and have been shown to correctly infer long-range dependencies.
Each of processor(s) 202, memory 204, input device(s) 206, output device(s) 208, and network interface 210 may be interconnected (physically, communicatively, and/or operatively) for inter-component communications. Operating system 212, applications 214, and QSP analysis application 216 may be executable by computing device 200. It should be noted that although example computing device 200 is illustrated as having distinct functional blocks, such an illustration is for descriptive purposes and does not limit computing device 200 to a particular hardware architecture. Functions of computing device 200 may be realized using any combination of hardware, firmware and/or software implementations.
Processor(s) 202 may be configured to implement functionality and/or process instructions for execution in computing device 200. Processor(s) 202 may be capable of retrieving and processing instructions, code, and/or data structures for implementing one or more of the techniques described herein. Instructions may be stored on a computer readable medium, such as memory 204. Processor(s) 202 may be digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry.
Memory 204 may be configured to store information that may be used by computing device 200 during operation. As described above, memory 204 may be used to store program instructions for execution by processor(s) 202 and may be used by software or applications running on computing device 200 to temporarily store information during program execution. For example, memory 204 may store instructions associated with operating system 212, applications 214, and QSP analysis application 216 or components thereof, and/or memory 204 may store information associated with the execution of operating system 212, applications 214, and QSP analysis application 216.
Memory 204 may be described as a non-transitory or tangible computer-readable storage medium. In some examples, memory 204 may provide temporary memory and/or long-term storage. In some examples, memory 204 or portions thereof may be described as volatile memory, i.e., in some cases memory 204 may not maintain stored contents when computing device 200 is powered down. Examples of volatile memories include random access memories (RAM), dynamic random access memories (DRAM), and static random access memories (SRAM). In some examples, memory 204 or portions thereof may include non-volatile storage elements. Examples of such non-volatile storage elements may include magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories.
Input device(s) 206 may be configured to receive input from a user operating computing device 200. Input from a user may be generated as part of a user running one or more software applications, such as QSP analysis application 216. Input device(s) 206 may include a touch-sensitive screen, track pad, track point, mouse, a keyboard, a microphone, video camera, or any other type of device configured to receive input from a user.
Output device(s) 208 may be configured to provide output to a user operating computing device 200. Output may be tactile, audio, or visual output generated as part of a user running one or more software applications, such as applications 214 and/or QSP analysis application 216. Output device(s) 208 may include a touch-sensitive screen, sound card, a video graphics adapter card, or any other type of device for converting a signal into an appropriate form understandable to humans or machines. Additional examples of output device(s) 208 may include a speaker, a cathode ray tube (CRT) monitor, a liquid crystal display (LCD), or any other type of device that can provide output to a user.
Network interface 210 may be configured to enable computing device 200 to communicate with external devices via one or more networks. Network interface 210 may be a network interface card, such as an Ethernet card, an optical transceiver, a radio frequency transceiver, or any other type of device that can send and receive information. Network interface 210 may be configured to operate according to one or more communication protocols.
Operating system 212 may be configured to facilitate the interaction of applications, such as applications 214 and QSP analysis application 216, with processor(s) 202, memory 204, input device(s) 206, output device(s) 208, network interface 210 and other hardware components of computing device 200. Operating system 212 may be an operating system designed to be installed on laptops and desktops. For example, operating system 212 may be a Windows operating system, Linux, or Mac OS. In another example, if computing device 200 is a mobile device, such as a smartphone or a tablet, operating system 212 may be one of Android, iOS or a Windows mobile operating system.
Applications 214 may be any applications implemented within or executed by computing device 200 and may be implemented or contained within, operable by, executed by, and/or be operatively/communicatively coupled to components of computing device 200, e.g., processor(s) 202, memory 204, and network interface 210. Applications 214 may include instructions that may cause processor(s) 202 of computing device 200 to perform particular functions. Applications 214 may include algorithms which are expressed in computer programming statements, such as, for loops, while-loops, if-statements, do-loops, etc.
As described above, real-world systems, such as, for example, errors in signal transmission, the flow of wealth in the stock market, geophysical phenomena, the ecology of evolving ecosystems, genetic regulatory circuits and even social interaction dynamics, may be modelled as quantized stochastic processes (QSPs). QSP analysis application 216 is an example of an application that may implement the techniques described herein in order to analyze QSPs. In one example, QSP analysis application 216 may include unsupervised learning algorithms that, when executed by processor(s) 202, cause processor(s) 202 to infer a causal structure of quantized stochastic processes.
It is useful to formalize probabilistic generators for stochastic dynamical systems in order to provide a framework for algorithms that may be included in QSP analysis application 216 to infer a causal structure of quantized stochastic processes (i.e., to describe the mathematical connection of QSPs to PFSA generators). This disclosure provides a series of definitions, lemmas, and theorems (i.e., 2.1 through 2.18) below in order to provide a framework for algorithms that may be included in QSP analysis application 216. It should be noted that, for the sake of clarity, some of the proofs with respect to the definitions, lemmas, and theorems are provided in Appendix A instead of the text below.
Throughout this disclosure, Σ denotes a finite alphabet of symbols, and the set of all finite but possibly unbounded strings on Σ is denoted by Σ*, the Kleene* operation. The set of finite strings over Σ forms a concatenative monoid, with the empty word λ as identity. In this disclosure, concatenation of two strings x, y ∈ Σ* is written as xy. Thus, xy = xλy = xyλ = λxy. In this disclosure, the set of strictly infinite strings on Σ is denoted as Σω, where ω denotes the first transfinite cardinal. For a string x ∈ Σ*, |x| denotes the length of x, and for a set A, |A| denotes its cardinality.
Definition 2.1 (QSP).
A QSP H is a discrete time Σ-valued strictly stationary, ergodic stochastic process, i.e.
H = {Xt : Xt is a Σ-valued random variable for t ∈ ℕ ∪ {0}}.
A stochastic process is ergodic if moments can be calculated from a single, sufficiently long realization, and strictly stationary if moments are not functions of time.
Next the connection of QSPs to PFSA generators is formalized.
Definition 2.2 (σ-Algebra on Infinite Strings).
For the set of infinite strings on Σ, B is defined to be the smallest σ-algebra generated by {xΣω : x ∈ Σ*}.
Lemma 2.3.
Any QSP induces a probability space (Σω, B, μ).
Proof. Using stationarity of the QSP H, a probability measure μ: B→[0, 1] can be constructed by defining, for any sequence x ∈ Σ*\{λ}, and a sufficiently large number of realizations NR of H, with fixed initial conditions:
and extending the measure to the remaining elements of B via at most countable sums. It should be noted that μ(Σω) = Σσ∈Σ μ(σΣω) = 1, and for the null word λ, μ(λΣω) = μ(Σω) = 1. For notational brevity, μ(xΣω) is denoted as Pr(x). Classically, states are induced via the Nerode equivalence, which defines two strings to be equivalent if and only if any finite extension of the strings is either both accepted or both rejected by the language under consideration. For the example techniques described herein, a probabilistic extension is used.
Definition 2.4 (Probabilistic Nerode Relation).
(Σω, B, μ) induces an equivalence relation ∼N on the set of finite strings Σ* as
For x ∈ Σ*, the equivalence class of x is denoted as [x]. It follows that ∼N is right-invariant, i.e.
x ∼N y ⟹ ∀z ∈ Σ*, xz ∼N yz. (2.2)
A right-invariant equivalence on Σ* always induces an automaton structure.
Definition 2.5 (Initial-Marked PFSA).
An initial-marked PFSA is a 5-tuple (Q, Σ, δ, π̃, q0), where Q is a finite state set, Σ is the alphabet, δ: Q×Σ→Q is the state transition function, and π̃: Q×Σ→[0, 1] specifies the conditional symbol-generation probabilities. δ and π̃ are recursively extended to arbitrary y = σx ∈ Σ* as δ(q, σx) = δ(δ(q, σ), x) and π̃(q, σx) = π̃(q, σ)π̃(δ(q, σ), x). q0 ∈ Q is the initial state. If the next symbol is specified, the resultant state is fixed, similar to probabilistic deterministic automata. However, unlike the latter, the models described herein may lack final states. Additionally, the techniques described herein may assume the underlying graphs to be strongly connected. Definition 2.5 has no notion of a final state, and later the initial-state dependence will be removed using ergodicity. First, the notion that a PFSA arises from a QSP is formalized.
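To make the 5-tuple of definition 2.5 concrete, the following minimal sketch represents an initial-marked PFSA as plain dictionaries and generates an observed trace from it; the class name and the toy two-state machine are illustrative assumptions rather than structures taken from this disclosure.

```python
import random

class InitialMarkedPFSA:
    """Minimal sketch of the 5-tuple (Q, Sigma, delta, pi~, q0) of definition 2.5."""

    def __init__(self, states, alphabet, delta, pi, q0):
        self.states, self.alphabet = states, alphabet
        self.delta = delta          # (state, symbol) -> next state
        self.pi = pi                # (state, symbol) -> emission probability (rows sum to 1)
        self.q0 = q0                # initial state

    def generate(self, n, seed=None):
        """Emit a trace of length n; the hidden state sequence is never exposed."""
        rng = random.Random(seed)
        q, out = self.q0, []
        for _ in range(n):
            sigma = rng.choices(self.alphabet,
                                weights=[self.pi[(q, a)] for a in self.alphabet])[0]
            out.append(sigma)
            q = self.delta[(q, sigma)]
        return "".join(out)

# Toy two-state machine over {0, 1}: state A favours '0', state B favours '1'.
toy = InitialMarkedPFSA(
    states=["A", "B"], alphabet=["0", "1"],
    delta={("A", "0"): "A", ("A", "1"): "B", ("B", "0"): "A", ("B", "1"): "B"},
    pi={("A", "0"): 0.8, ("A", "1"): 0.2, ("B", "0"): 0.3, ("B", "1"): 0.7},
    q0="A")
s = toy.generate(10000, seed=1)   # an observed trace for the inference sketches below
```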
Lemma 2.6 (from QSP to a PFSA).
If the probabilistic Nerode relation has a finite index, then there exists an initial-marked PFSA generator.
Proof. Every QSP represented as a probability space (Σω, B, μ) induces a probabilistic automaton (Q, Σ, δ, π̃, q0), where Q is the set of equivalence classes of the probabilistic Nerode relation (definition 2.4), Σ is the alphabet, and
q0 is identified with [λ], and the finite index of ∼N implies |Q| < ∞.
The above construction yields a minimal realization unique up to state renaming.
Corollary 2.7 (to Lemma 2.6: Null-Word Probability).
For the PFSA (Q, Σ, δ, π̃) induced from a QSP H:
∀q ∈ Q, π̃(q, λ) = 1. (2.5)
Proof. For q ∈ Q, let x ∈ Σ* be such that [x] = q. From equation (2.4),
Next, canonical representations are defined to remove initial-state dependence. Π̃ is used to denote the matrix representation of π̃, i.e., Π̃ij = π̃(qi, σj), qi ∈ Q, σj ∈ Σ. Further, the notion of transformation matrices Γσ is needed.
Definition 2.8 (Transformation Matrices).
For an initial-marked PFSA G = (Q, Σ, δ, π̃, q0), the symbol-specific transformation matrices Γσ ∈ {0, 1}|Q|×|Q| are
States in the canonical representation are identified with probability distributions over the states of the initial-marked PFSA. Here, the state reached by a string x ∈ Σ* is the distribution realized by x, beginning from the stationary distribution on the states of the initial-marked representation. Since such a state corresponds to an equivalence class of strings, the string x realizing it is not unique.
Definition 2.9 (Canonical Representations).
An initial-marked PFSA G = (Q, Σ, δ, π̃, q0) uniquely induces a canonical representation (QC, Σ, δC, π̃C), with QC being the set of probability distributions over Q, δC: QC×Σ→QC, and π̃C: QC×Σ→[0, 1], as follows.
For a QSP H, the canonical representation is denoted as CH.
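The displayed equations of definitions 2.8 and 2.9 are not reproduced above; the sketch below shows one plausible realization, assuming Γσ has a 1 in entry (i, j) exactly when δ(qi, σ) = qj and that the canonical state update is the usual Bayesian belief update over hidden states. Treat the update rule as an assumption consistent with, but not quoted from, this disclosure.

```python
import numpy as np

def transformation_matrices(states, alphabet, delta):
    """Assumed form of definition 2.8: Gamma_sigma[i, j] = 1 iff delta(q_i, sigma) = q_j."""
    idx = {q: i for i, q in enumerate(states)}
    gammas = {}
    for sigma in alphabet:
        G = np.zeros((len(states), len(states)))
        for q in states:
            G[idx[q], idx[delta[(q, sigma)]]] = 1.0
        gammas[sigma] = G
    return gammas

def canonical_step(dist, sigma, gammas, Pi, alphabet):
    """Assumed canonical update: probability of emitting sigma from the current state
    distribution, and the updated distribution over hidden states after observing sigma."""
    j = alphabet.index(sigma)
    emit = float(dist @ Pi[:, j])                         # pi~_C(dist, sigma)
    new_dist = (dist * Pi[:, j]) @ gammas[sigma] / emit   # delta_C(dist, sigma)
    return emit, new_dist

# Two-state example (same toy machine as the earlier generation sketch).
states, alphabet = ["A", "B"], ["0", "1"]
delta = {("A", "0"): "A", ("A", "1"): "B", ("B", "0"): "A", ("B", "1"): "B"}
Pi = np.array([[0.8, 0.2], [0.3, 0.7]])                   # matrix representation of pi~
gammas = transformation_matrices(states, alphabet, delta)
dist = np.array([0.5, 0.5])                               # a state of the canonical representation
for sym in "101":
    p, dist = canonical_step(dist, sym, gammas, Pi, alphabet)
```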
Ergodicity of QSPs, which makes the distribution corresponding to λ independent of the initial state in the initial-marked PFSA, implies that the canonical representation is initial-state independent, and subsumes the initial-marked representation in the following sense: let E = {ei ∈ [0, 1]|Q|, i = 1, . . . , |Q|} denote the set of distributions satisfying
The following is noted:
Consequently, the initial-marked PFSA induced by a QSP H, with the initial marking removed may be denoted as PH, and referred to simply as a ‘PFSA’ (dropping the qualifier ‘initial-marked’). States in PH are representable as states in CH as elements of E. Next, a key result is established: a state arbitrarily close to some element in E in the canonical construction starting from the stationary distribution λ is always encountered.
Theorem 2.10 (ε-Synchronization of Probabilistic Automata).
For any QSP H over Σ, the PFSA PH satisfies
∀ε′ > 0, ∃x ∈ Σ*, ∃θ ∈ E, ‖x − θ‖ ≤ ε′, (2.11)
where the norm used is unimportant.
Proof. See Appendix A.
Theorem 2.10 induces the notion of ε-synchronizing strings, and guarantees their existence for arbitrary PFSA.
Definition 2.11 (ε-synchronizing strings). An ε-synchronizing string x ∈ Σ* for a PFSA is one that satisfies
∃θ ∈ E, ‖x − θ‖ ≤ ε. (2.12)
The norm used is unimportant.
It should be noted that Theorem 2.10 does not itself yield an algorithm for computing ε-synchronizing strings (that is addressed by theorem 2.18); it simply shows that such a string always exists. As a corollary, the techniques herein may estimate an asymptotic upper bound on the effort required to find one.
Corollary 2.12 (to Theorem 2.10).
At most O(1/ε) strings need to be analysed to find an ε-synchronizing string.
Proof. See appendix A.
Next, the basic principle of an inference algorithm is described. PFSA states are not observable; symbols generated from hidden states may be observed. This leads to the notion of symbolic derivatives, which are computable from observations.
The set of probability distributions over a set of cardinality k is denoted as D(k). First, a count function is specified.
Definition 2.13 (Symbolic Count Function).
For a string s over Σ, the count function #s: Σ*→ℕ∪{0} counts the number of times a particular substring occurs in s. The count is overlapping, i.e., in the string s = 0001, the two overlapping occurrences of 00 (beginning at the first and second positions) are both counted, implying #s(00) = 2.
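A direct implementation of the overlapping count of definition 2.13 may look as follows; the helper name is an illustrative assumption.

```python
def count_overlapping(s, x):
    """#_s(x): number of possibly overlapping occurrences of substring x in s."""
    return sum(1 for i in range(len(s) - len(x) + 1) if s[i:i + len(x)] == x)

assert count_overlapping("0001", "00") == 2   # the two overlapping occurrences of 00
```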
Definition 2.14 (Symbolic Derivative).
For a string s generated by a QSP over Σ, the symbolic derivative is a function φs: Σ*→D(|Σ|−1) as
Thus, ∀x ∈ Σ*, φs(x) is a probability distribution over Σ. φs(x) is referred to as the symbolic derivative at x. For qi ∈ Q, π̃ induces a distribution over Σ as [π̃(qi, σ1), . . . , π̃(qi, σ|Σ|)]. This is denoted as π̃(qi, ·). It can be shown that the symbolic derivative at x can be used to estimate this distribution for qi = [x], provided x is ε-synchronizing.
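Building on the count_overlapping helper sketched above, the symbolic derivative at x can be estimated as the normalized counts of each single-symbol extension of x; the normalization shown is an assumption consistent with the text, since the displayed equation is not reproduced here.

```python
def symbolic_derivative(s, x, alphabet):
    """phi_s(x): empirical next-symbol distribution following occurrences of x in s."""
    counts = [count_overlapping(s, x + sigma) for sigma in alphabet]
    total = sum(counts)
    if total == 0:
        return None                      # no symbol was ever observed following x in s
    return [c / total for c in counts]

# If x is epsilon-synchronizing, phi_s(x) estimates the row pi~([x], .) (theorem 2.15).
print(symbolic_derivative("100110101101", "1", ["0", "1"]))
```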
Theorem 2.15 (ε-Convergence).
If x ∈ Σ* is ε-synchronizing, then
Proof. The proof follows from the Glivenko-Cantelli theorem on uniform convergence of empirical distributions. See Appendix A for details.
Next, identification of ε-synchronizing strings given a sufficiently long observed string s is described. Theorem 2.10 guarantees existence, and corollary 2.12 establishes that O(1/ε) substrings need to be analyzed until an ε-synchronizing string is encountered. It should be noted that Theorem 2.10 and corollary 2.12 do not provide an executable algorithm, which arises from an inspection of the geometric structure of the set of probability vectors over Σ, obtained by constructing φs(x) for different choices of the candidate string x.
Definition 2.16 (Derivative Heap).
Given a string s generated by a QSP, a derivative heap Ds: 2Σ*→D(|Σ|−1) is the set of probability distributions over Σ calculated for a given subset of strings L ⊂ Σ* as follows:
Ds(L) = {φs(x) : x ∈ L ⊂ Σ*}. (2.15)
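Using the symbolic_derivative helper sketched above, a derivative heap over all strings up to a chosen depth can be assembled as follows; representing the heap as a dictionary keyed by string is an implementation convenience, not a requirement of the definition.

```python
from itertools import product

def derivative_heap(s, alphabet, max_depth):
    """D_s(L) for L = all strings over the alphabet of length at most max_depth.

    Strings whose extensions never occur in s are skipped, since phi_s is
    undefined for them on a finite trace.
    """
    heap = {}
    for depth in range(max_depth + 1):
        for tup in product(alphabet, repeat=depth):
            x = "".join(tup)
            phi = symbolic_derivative(s, x, alphabet)
            if phi is not None:
                heap[x] = phi
    return heap
```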
Lemma 2.17 (Limiting Geometry).
Let D∞ = lim|s|→∞ limL→Σ* Ds(L), and let U∞ be the convex hull of D∞. If u is a vertex of U∞, then
∃q ∈ Q such that u = π̃(q, ·). (2.16)
Proof. Recalling theorem 2.15, the result follows from noting that any element of D∞ is a convex combination of elements from the set {π̃(q1, ·), . . . , π̃(q|Q|, ·)}.
It should be noted that Lemma 2.17 does not claim that the number of vertices of the convex hull of D∞ equals the number of states, but that every vertex corresponds to a state. It should be noted that D∞ cannot be generated in practice since there is a finite observed string s, and only φs(x) for a finite number of x can be calculated. Instead, it can be shown that choosing a string corresponding to the vertex of the convex hull of the heap, constructed by considering O(1/ε) strings, gives an ε-synchronizing string with high probability.
Theorem 2.18 (Derivative Heap Approximation).
For s generated by a QSP, let Ds(L) be computed with L = ΣO(log(1/ε)), and let x0 be a string mapping to a vertex of the convex hull of Ds(L). Then
Prob(x0 is not ε-synchronizing) ≤ e−|s|εO(1). (2.17)
Proof. The result follows from Sanov's theorem for convex sets of probability distributions. See Appendix A for details.
Based on the framework described above with respect to definitions, lemmas, and theorems 2.1 through 2.18, QSP analysis application 216 and computing device 200 may implement one or more algorithms for inferring a PFSA. In one example, an algorithm may be referred to as ‘Generator Extraction Using Self-similar Semantics’, or GenESeSS. In one example, for an observed sequence s, an algorithm may include the following: identification of an ε-synchronizing string x0, identification of the structure of PH, i.e., the transition function δ, and identification of arc probabilities, i.e., the function π̃.
In one example, computing device 200 may identify an ε-synchronizing string x0 by constructing a derivative heap Ds(L) using the observed trace s (definition 2.16), with the set L consisting of all strings up to a sufficiently large, but finite, depth. In one example, an initial choice of depth for L may be log|Σ|(1/ε). If L is sufficiently large, then the inferred model structure will not change for larger values. Computing device 200 may then identify a vertex of the convex hull of Ds(L) via an algorithm for computing the hull. For example, Barber, C. B., Dobkin, D. P., and Huhdanpaa, H., “The Quickhull Algorithm for Convex Hulls,” ACM Trans. Math. Softw. 22, 469-483 (1996), which is incorporated by reference herein, provides an algorithm for computing the hull. Computing device 200 may choose x0 as the string mapping to this vertex.
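A simple way to pick one such string from a derivative heap is sketched below. Instead of computing the full hull (e.g., with quickhull as in the citation above), it selects the heap point farthest from the centroid, which is an extreme point of the convex hull; the helper names are illustrative assumptions.

```python
import numpy as np

def pick_synchronizing_string(heap):
    """Return a string whose derivative lies at a vertex of the heap's convex hull.

    The point of a finite set farthest from the centroid is an extreme point of the
    hull, so it serves to select a single vertex; a full hull computation would
    enumerate every vertex (i.e., every candidate state) instead.
    """
    strings = list(heap)
    pts = np.array([heap[x] for x in strings])
    centroid = pts.mean(axis=0)
    dists = np.linalg.norm(pts - centroid, axis=1)
    return strings[int(np.argmax(dists))]

# Illustrative pipeline, reusing the helpers sketched earlier:
# heap = derivative_heap(s, ["0", "1"], max_depth=5)
# x0 = pick_synchronizing_string(heap)
```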
In one example, computing device 200 may generate δ as follows. For each state q, computing device 200 may associate a string identifier xqid ∈ x0Σ*, and a probability distribution hq on Σ (which is an approximation of the π̃ row corresponding to state q). Computing device 200 may extend the structure recursively as follows:
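The recursion itself is not reproduced above; the following sketch shows one plausible realization, in which string identifiers are extended one symbol at a time from x0 and a newly computed derivative is merged with an existing state whenever the two distributions agree within ε. The merge criterion is an assumption consistent with the ε-convergence result of theorem 2.15, not a verbatim restatement of this disclosure.

```python
import numpy as np

def infer_structure(s, x0, alphabet, eps):
    """Sketch of recursive structure extension from the epsilon-synchronizing string x0.

    Each state q carries a string identifier in x0 Sigma* and a distribution h_q that
    approximates the pi~ row for q; transitions are added by extending identifiers one
    symbol at a time, merging states whose symbolic derivatives agree within eps.
    """
    ids = [x0]                                         # string identifier per state
    hq = [np.array(symbolic_derivative(s, x0, alphabet))]
    delta = {}                                         # (state index, symbol) -> state index
    frontier = [0]
    while frontier:
        q = frontier.pop()
        for sigma in alphabet:
            phi = symbolic_derivative(s, ids[q] + sigma, alphabet)
            if phi is None:                            # this extension never occurs in s
                continue
            phi = np.array(phi)
            match = next((i for i, h in enumerate(hq)
                          if np.max(np.abs(h - phi)) <= eps), None)
            if match is None:                          # no existing state is close enough
                ids.append(ids[q] + sigma)
                hq.append(phi)
                match = len(ids) - 1
                frontier.append(match)
            delta[(q, sigma)] = match
    return ids, hq, delta
```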
In one example, computing device 200 may identify arc probabilities as follows:
It should be noted that although other competing techniques may use a similar recursive structure extension, these techniques have no notion of ε-synchronization, i.e., they are restricted to inferring only synchronizable or short-memory models, or produce large approximations for long-memory ones. In this manner, computing device 200 represents an example of a computing device configured to infer a probabilistic finite state automaton.
Theorem 3.1 below provides a complexity analysis of the techniques described herein. While hq, described above with respect to identification of the transition function δ, approximates rows of π̃, arc probabilities may be found via normalization of traversal counts. hq only uses sequences in x0Σ*, while traversal counting uses the entire sequence s, and is therefore more accurate. GenESeSS imposes no upper bound on the number of states, which is a function of the complexity of the process itself.
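A sketch of the traversal-count normalization described above follows, assuming the structure returned by the previous sketch. The trace is synchronized once on x0 and then run through δ while counting the symbols emitted from each state; row-normalized counts give the arc probabilities. The small smoothing constant is an implementation convenience, not part of the disclosure.

```python
import numpy as np

def arc_probabilities(s, x0, delta, n_states, alphabet, smoothing=1e-6):
    """Estimate pi~ by counting symbol emissions per state along the whole trace s."""
    counts = np.full((n_states, len(alphabet)), smoothing)   # avoids all-zero rows
    sym_idx = {a: i for i, a in enumerate(alphabet)}
    start = s.find(x0)
    if start < 0:
        raise ValueError("synchronizing string not found in trace")
    q = 0                                                    # index of the state reached by x0
    for sigma in s[start + len(x0):]:
        counts[q, sym_idx[sigma]] += 1                       # state q emitted sigma
        if (q, sigma) not in delta:                          # arc absent from inferred structure
            break
        q = delta[(q, sigma)]
    return counts / counts.sum(axis=1, keepdims=True)
```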
Theorem 3.1 (Time Complexity).
Asymptotic time complexity of GenESeSS is
Proof. See appendix A for details.
Theorem 3.1 shows that GenESeSS is polynomial in O(1/ε), size of input s, model size |Q| and alphabet size |Σ|. In practice, |Q|<<1/ε, implying that
An identification method is said to identify a target language L* in the PAC sense if it always halts and outputs a hypothesis Lk such that
∀ε, δ > 0, P(d(L*, Lk) ≤ ε) ≥ 1 − δ, (3.3)
where d(•, •) is a metric on the space of target languages. A class of languages is efficiently PAC-learnable if there exists an algorithm that PAC-identifies every language in the class, and runs in time polynomial in 1/ε, 1/δ, the length of sample input, and inferred model size. The PAC-learnability of QSPs can be proven by first establishing a metric on the space of probabilistic automata over Σ.
Lemma 3.2 (Metric for Probabilistic Automata).
For two strongly connected PFSAs G1 and G2, and strings s1 and s2 generated by them respectively, denote the symbolic derivatives at x ∈ Σ* as φs1G1(x) and φs2G2(x), respectively. Then,
defines a metric on the space of probabilistic automata on Σ.
Proof. Non-negativity and symmetry follow immediately. The triangle inequality follows from noting that ‖φs1G1(x) − φs2G2(x)‖∞ is upper bounded by 1, and therefore, for any chosen order of the strings in Σ*, we have two ℓ∞ sequences, which satisfy the triangle inequality under the sup norm. The metric is well defined since, for any sufficiently long s1 and s2, the symbolic derivatives at arbitrary x are uniformly convergent to some linear combination of the rows of the corresponding Π̃ matrices.
Theorem 3.3 (PAC-Learnability).
QSPs for which the probabilistic Nerode equivalence has a finite index are PAC-learnable using PFSAs, i.e., for any ε, η > 0, and for every sufficiently long sequence s generated by a QSP H, P′H can be computed as an estimate for PH such that
Prob(Θ(PH, P′H) ≤ ε) ≥ 1 − η. (3.4)
The algorithm runs in time polynomial in 1/ε, 1/η, input length |s| and model size.
Proof. The GenESeSS construction implies that, once the initial ε-synchronizing string x0 is identified, there is no scope for the model error to exceed ε. Hence
Prob(Θ(PH, P′H) ≤ ε) = 1 − Prob(‖φs(x0) − x0Π̃‖∞ > ε) ≥ 1 − e−|s|εO(1) (using equation (A 12)).
Thus, for any η > 0, if |s| = O((1/ε) log(1/η)), then the required condition of equation (3.4) is met. The polynomial runtimes are established in theorem 3.1.
It should be noted that some of the example techniques described herein are immune to Kearns' hardness result, since ε > 0 enforces state distinguishability.
The application of GenESeSS to simulated data is described below.
Switching between these repeating distributions on right concatenation of symbols induces the structure shown in (d) for ε = 0.05. The inferred model is already strongly connected, and the input data is run through it to estimate the Π̃ matrix (see the inset of (d)). The model is recovered with correct structure, and with deviation in the π̃ entries smaller than 0.01. This example infers a non-synchronizable machine (recall that M2 exhibits LRDs), which often proves to be too difficult for reported unsupervised algorithms (see the comparison with other techniques in FIG. 7; those techniques can match GenESeSS performance only at the expense of a large number of states, and still fail to infer true LRD models). Hidden Markov model learning with latent transitions can possibly be applied; however, such algorithms do not come with any obvious guarantees on performance.
The application of GenESeSS to model and predict experimental data is described below. In the example below, GenESeSS is applied to two experimental datasets: the photometric intensity (brightness) of a variable star for a succession of 600 days, and a twice-daily count of a blow-fly population in a jar. While the photometric data are quasi-periodic, the population time series has non-trivial stochastic structure. Quantization schemes, with four symbols, are shown in Table 1.
The inferred PFSAs are respectively illustrated in
In both cases, ARMAX models are learned for comparison. ARMAX models include auto-regressive and moving average components, assume additive Gaussian noise, and assume linear dependence on history. The ARMAX model order (p, q) is chosen to be the smallest value-pair providing an acceptable fit to the data (see Table 1), such that increasing the model order does not significantly improve prediction accuracy. The respective models are learned with a portion of the time series (
As illustrated, ARMAX does well in the first case, even for longer prediction horizons (see
In
In some examples, the techniques described herein may be based on the following intuition: if the recent pattern of changes was encountered in the past, at least approximately, and then often led to a positive change in the immediate future, then a positive change in the immediate future is expected. Causal states simply formalize the different classes of patterns that need to be considered, and are inferred to specify which patterns are equivalent.
This allows quantitative predictions to be made. For example, state q0 is the equivalence class of histories with the last two consecutive years of increasing temperature change (two 1s), and the inferred model indicates that this induces a 42 percent probability of a positive rate of change next year (that such a pattern indeed leads uniquely to a causal state is inferred from data, and not manually imposed). Additionally, the maximum deviation between the inferred and the true hidden model is limited by a user-selectable bound. For comparison, a standard auto-regressive moving-average (ARMAX) model is learned, which actually predicts the mean quite well. However, GenESeSS captures stochastic changes better, and provides deeper insight into the causal structure in terms of the number of equivalence classes of histories, their inter-relationships and the model memory.
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
Various examples have been described. These and other examples are within the scope of the following claims.
This application claims the benefit of U.S. Provisional Application No. 61/791,171, filed on Mar. 15, 2013, which is incorporated by reference in its entirety.
This invention was made with government support under grant ECCS 09141561 awarded by NSF CDI. This invention was made with government support under grant HDTRA 1-09-1-0013 awarded by DTRA. The government has certain rights in the invention.
International Application: PCT/US14/29333, filed Mar. 14, 2014 (WO).
Related U.S. Provisional Application: 61/791,171, filed Mar. 2013 (US).