This invention generally relates to data correlation, data association, signal processing, and, more particularly, to systems and methods for encoding signals into SSM Models, decoding signals from encoded SSM Models, and matching signals to a plurality of SSM Models.
In many contexts, it is helpful to associate a data input with a previously received or encoded data input in order to perform a responsive action or address an error. Additionally, it may be helpful to encode a data input for a first time so that it can be used with future associations. These associations are used in a variety of fields, including pattern and sequence recognition, robotics, artificial intelligence, machine learning, etc. However, many conventional methods of performing these encoding and decoding operations have limitations in terms of computational complexity, sequence length, discrete vs. continuous signal operation, and robustness to noise.
Embodiments of the present disclosure address the limitations associated with conventional methods of encoding, decoding, and matching data inputs. These and other advantages of the invention, as well as additional inventive features, will be apparent from the description of the invention provided herein.
This disclosure describes a biologically-inspired representation for associating data inputs and a family of algorithms that encode and decode this representation. After encoding, this representation can be used to recall one data input given another data input, even if the second data input is not identical to the one used during encoding. This representation can also be used for matching of data inputs to previously encoded models based on the length of the decoded sequence or based on the similarity of the decoded output to one of the data inputs. This representation generalizes and extends the SSM Sequence Model (SSM) that was described in U.S. Pat. No. 10,007,662, entitled “Systems and Methods for Recognizing, Classifying, Recalling and Analyzing Information Utilizing SSM Sequence Models,” filed on Jan. 9, 2015, the entirety of which is hereby incorporated by reference thereto.
The extended SSM model described here generalizes the SSM model to work with weighted sequences. This generalization is done for both discrete-time and continuous-time signals. The properties of the model are both explained and proved using the theory behind the z-transform and the Laplace transform. Emphasis is placed on deriving sufficient conditions for accurate decoding. Two new families of algorithms are introduced: the ZUV family for discrete sequences and the SUV family for continuous spike trains. The ZUV family of algorithms utilizes the unilateral z-transform with parameter z and weighting functions u and v, and the SUV family of algorithms utilizes the Laplace transform with parameter s and weighting functions u and v.
As will be described more fully in the paragraphs below, present herein is an overview of the encoding and decoding algorithms for discrete sequences that were introduced in U.S. Pat. No. 10,007,662, including aspects of the present disclosure that build upon the previous disclosure. Also provided herein is a theoretical model for the discrete-time representation that follows from the concatenation theorem for the unilateral z-transform, which is stated and proven in the present disclosure. The discrete-time model is then extended to work with weighted sequences and the ZUV family of algorithms is introduced. It also proves sufficient conditions under which the ZUV decoding algorithm can decode SSM Models for sequences of arbitrary length. The discrete model is then applied to sequences that may contain gaps, and the ZUV algorithms are extended to work with these types of sequences.
The present disclosure also proves the concatenation theorem for the Laplace transform and uses it to describe a continuous-time model that works with spike trains. In the continuous-time model, the timing of the spikes is not constrained to be at discrete intervals, i.e., spikes can come in at any time. The continuous-time model is also extended to work with weighted spike trains, particularly in the form of the SUV family of algorithms. The properties of the SUV decoding algorithm are described, and its robustness to noise is demonstrated. This model is then generalized to work with functionals. That is, the spike-based model becomes a special case of the general functional-based model when the functionals are set to shifted Dirac's deltas.
The properties of the ZUV and SUV models allow both the encoding and the decoding to be performed in parallel on multiple computational units. This enables embodiments in which the encoding and decoding time is commensurate with the duration of the signals.
In further embodiments, the representations described herein can be distributed and replicated over a plurality of computational units so that each of these units holds only a subset of the SSM model. Thus, the encoding or decoding process can continue even if some computational units fail.
This disclosure enables using weighting functions to encode collections of signals of arbitrary length into SSM models and decode collections of signals of arbitrary length from SSM models. In embodiments, the decoding process may end early or become quiescent if the collection of signals used to decode does not fit the model sufficiently well. In other embodiments, the signals decoded from a model can be compared to the signals available during the decoding and a match can be detected if there is sufficient similarity between them. In other embodiments, pattern matching is implemented by analyzing the lengths of decoded collections of signals, wherein the lengths are used as a similarity measure. These properties of the extended SSM model enable a new class of distributed systems and representations. In embodiments, this is used to implement pattern matching in a way that does not require comparing the elements of SSM matrices.
Contexts and applications for these various algorithms and models are presented herein. Embodiments of the present disclosure use weighting functions during encoding and decoding. The models and algorithms can be utilized for approximate pattern matching, pattern completion, and pattern association. In these embodiments, the patterns can be represented using collections of signals. Additionally, the models and algorithms can be used in robotics, speech and sound recognition, and computer vision. In robotics, embodiments of the present disclosure perform interactive object recognition, learn affordances of objects and detect these affordances across sensory modalities. In the field of computer vision, further embodiments perform object recognition, including recognition of partially occluded objects, and face recognition. In other embodiments, the models and algorithms can be used to build, search, and update an associative memory using a collection of signals. In addition, the models can be used for predicting, completing, and correcting biological sequences, which may include both DNA sequences and protein sequences. It should be noted that these contexts and applications for use of the algorithm and models are exemplary only, and the algorithms and models are not limited strictly thereto.
Other aspects, objectives, and advantages of the invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings.
The accompanying drawings incorporated in and forming a part of the specification illustrate several aspects of the present invention and, together with the description, serve to explain the principles of the invention. The following paragraphs provide a brief description of each figure.
The area under the curve is still equal to 1.
While the invention will be described in connection with certain preferred embodiments, there is no intent to limit it to those embodiments. On the contrary, the intent is to cover all alternatives, modifications and equivalents as included within the spirit and scope of the invention as defined by the appended claims.
2 Review of Encoding and Decoding Algorithms
This section provides a quick overview of the encoding and decoding algorithms for discrete SSM Sequence Models. The focus of this disclosure is on identifying the shortcomings of these algorithms that motivated the extensions and generalizations described in this disclosure.
2.1 Sequences
The encoding algorithms work with a pair of sequences. In other words, they take two different sequences as input arguments. Because the order of the two sequences matters, we will use S′ to denote the first sequence and S″ to denote the second sequence. To further distinguish between these two sequences, we will use Greek letters to spell the sequence S′ and English letters to spell the sequence S″. This convention will be used throughout this disclosure.
A sequence of letters can be easily converted into a sequence of numbers, and vice versa. One way to perform this conversion is to use a lookup table. For example, the ASCII table is one commonly used method in computer applications.
2.2 The Histogram of a Sequence
By counting the number of times that each character appears in a given sequence, we can compute the histogram for the sequence. For example, the sequence S′=βαγβ has one α, two β, and one γ.
This bar chart is useful for visualizing the histogram, but it is not very convenient for working with it. Instead, the same information will be represented with a vector. For the sequence S′=βαγβ this vector is h′=[1, 2, 1]. In other words, the values of the histogram bin counters become the elements of the vector h′. In general, this vector is of size M′, where M′ is the size of the alphabet Γ′.
Similarly, the histogram for the second sequence S″=ABAB can be represented with the vector h″=[2, 2] because the sequence contains two A's and two B's. This vector is of size M″, which is the size of the abbreviated English alphabet Γ″ in this example.
2.3 The Histogram of a Sequence Over Time
By definition, the histogram of a sequence is computed for the entire sequence. For some applications, however, it may be useful to compute a histogram only for a prefix of the sequence. The encoding algorithm described later in this chapter incrementally computes the histograms for all possible prefixes of the Greek sequence.
2.4 Open Bigrams
Two characters that occur one after another in a sequence form a bigram. For example, the sequence S′=βαγβ has three bigrams: βα, αγ, and γβ. In general, a sequence of length T has T−1 bigrams. Bigrams have a long history in machine learning and artificial intelligence, but we will not use them. Instead, we will use open bigrams.
An open bigram can be formed between any two characters as long as the first character occurs temporally before the second one. In other words, it is no longer required for the two characters to be adjacent in the sequence. For the sequence S′=βαγβ the open bigrams are: βα, βγ, ββ, αγ, αβ, and γβ. In a further extension of this idea, we allow each character to form an open bigram with itself. The reasons for this will become clear later, but for now this adds four additional open bigrams to the list: ββ, αα, γγ, and ββ. Thus, for this sequence there are 10 open bigrams. In general, for a sequence of length T, there are T(T+1)/2 open bigrams. Therefore, a list of open bigrams is a much more dense sequence representation than a list of regular bigrams.
2.5 Cross-Sequence Open Bigrams
If we have two different sequences that unfold in parallel over time, then we can generalize the concept of open bigram to cross-sequence open bigram. The principles for forming one of these are similar to the previous case, but now the first character in the cross-sequence open bigram can only come from the first sequence and the second character can only come from the second sequence. The temporal restriction still applies, i.e., the second character cannot be temporally before the first character. A character is no longer allowed to form an open bigram with itself, but it can form an open bigram with the character in the same position in the second sequence. For a pair of sequences, each of which is of length T, there are T(T+1)/2 cross-sequence open bigrams. The rest of this document uses only cross-sequence open bigrams, which will be called open bigrams for the sake of brevity.
2.6 The SSM Matrix
Given two sequences S′ and S″, the open bigrams formed between their characters can be organized in a matrix M, which is called the SSM matrix.
When constructing a matrix, the order of the two sequences matters. To illustrate this,
2.7 Encoding Example
The encoding algorithm is an efficient way of counting the open bigrams in a pair of sequences and arranging the resulting counts in a matrix format. This section gives a quick overview of this computational procedure.
The last row of
The number of open bigrams that need to be counted grows with each iteration. There is only 1 during the first iteration, 2 during the second, 3 during the third, and 4 during the fourth. In total there are 10 of them in this example. This begs the question: How can the algorithm keep up with this ever increasing number of open bigrams and still maintain its computational complexity?
The last row of
In other words, the algorithm uses the vector h′ to perform the computation more efficiently. It uses the fact that, no matter how many open bigrams need to be counted at each iteration, there will be at most M′ unique ones. That is, the first alphabet has a finite and fixed size, and therefore there will be at most that many unique open bigrams at each iteration (recall that the second character in each of these open bigrams is always the same). Furthermore, the value of the histogram h′ can be reused from one iteration to the next, after incrementing only one of its bin counters. In other words, the histogram is computed incrementally—it does not need to be recomputed from scratch during each iteration.
To summarize, during each encoding iteration, the current character from S′ indicates which bin counter of h′ will be incremented by one. The current character from S″ selects the matrix column to which h′ must be added. The current character from S″ also determines which bin of h″ should be incremented. Thus, during each iteration, the algorithm needs to update only one bin of h′, only one column of the matrix, and only one bin of h″. Note that the vector h″ is computed by the encoding algorithm, but it is not used to update the matrix. Instead, it is used at a later time by the decoding algorithm.
2.8 The SSM Model
This section defines the SSM model, which is used by the decoding algorithm and the evaluation script described later in this chapter.
The encoding SSM model for the sequence pair (S′, S″) is defined as the matrix M and the vectors h′ and h″ that are computed by the encoding algorithm. To give a concrete example, we will use the sequences S′=βαγβ and S″=ABAB from the previous section. The matrix in this case is of size 3×2. The vector h′, which represents the histogram for the sequence S′, is a column vector of size 3. The histogram vector h″ for the sequence S″ is a row vector of size 2.
In general, the size of the computed matrix is M′×M″, where M′ is the alphabet size for the sequence S′ and M″ is the alphabet size for the sequence S″. The vectors h′ and h″ are of size M′ and M″, respectively. Once again, all three of these are computed by the encoding algorithm. However, the decoding algorithm, which is described in the next section, needs only the matrix and the second histogram. That is, the decoding algorithm does not need h′ in order to decode S′ from the matrix. Therefore, the first histogram can be discarded after the encoding is done.
The decoding SSM model for the sequence pair (S′, S″) is defined as the matrix M and the vector h″ that are computed by the encoding algorithm.
2.9 Decoding Example
This section gives an example that illustrates the decoding algorithm.
The last row of
The computational complexity of this algorithm is O(TM″). This is comparable to the complexity of the encoding algorithm, which is O(TM′).
2.10 Decoding Limitations of Regular Matrices
This section analyzes the decoding properties of SSM matrices. The analysis shows that for some pairs of sequences of length three these matrices are not uniquely decodable. The analysis also shows that these decoding limitations increase as the sequence length increases.
Without loss of generality, all examples in this section use pairs of sequences that are constructed from an abbreviated Greek alphabet with only two letters and an abbreviated English alphabet with only two letters as well.
2.10.1 Sequences of Length 1
When the two sequences are of length one, there are only four pairs of sequences from which a matrix can be constructed. These sequence pairs are: (α, A), (α, B), (β, A), and (β, B).
2.10.2 Sequences of Length 2
There are only four possible Greek sequences of length two that can be constructed from a two-letter alphabet: αα, αβ, βα, and ββ. Similarly, there are only four possible English sequences: AA, AB, BA, and BB. Thus, in this case, there are 16 possible combinations of a Greek sequence and an English sequence. All 16 matrices for these sequence pairs are shown in
It is relatively easy to verify that all 16 matrices are uniquely decodable. Once again, it is assumed that the English sequence that was used to encode the matrix is available at run time. The histograms for the English sequences are shown in
2.10.3 Sequences of Length 3
If the input sequences are at least three characters long, then the mapping from sequence pairs to matrices is no longer unique. In other words, when T=3 there are at least two different pairs of sequences that map to the same matrix. For example, both (αββ, ABA) and (βαα, ABA) map to the matrix shown in
Visual inspection shows that there are four groups of aliased models (their matrices are highlighted in gray in
To summarize, for T=3 there are 64 possible sequence pairs. Each pair maps to a matrix and a histogram vector h″. Thus, there are 64 models. Of these, 56 are unique and 8 are aliased. The aliased models can be split into 4 groups of 2, such that in each group both M and h″ are the same. For the first sequence pair in each group, the decoding algorithm returns the correct S′ sequence because it picks the Greek letter that is first in alphabetical order. For the second pair the decoding algorithm returns the aliased S′ sequence, i.e., the one that belongs to the other pair. For the 56 non-aliased sequence pairs the decoding algorithm always returns the correct S′ sequence. Thus, for T=3 there are 4 (or 6.25%) wrong decoding outcomes (i.e., an aliased S′ is decoded) and 60 (or 93.75%) correct outcomes. The correct outcomes, however, can be split into two groups. The first group contains 56 sequence pairs that are uniquely mapped to an SSM model. The second group contains 4 sequence pairs that have an aliased mapping, but for which the decoding algorithm returns the correct S′ because of the way it does tie breaking.
2.10.4 Sequences of Length Up to 10
This section extends the decoding evaluation from the previous section to sequences of length up to 10. As the sequences become longer, the number of sequence pairs grows exponentially. If the sizes of the two alphabets are M′ and M″, then there are (M′)T×(M″)T possible sequence pairs of length T. For example, when M′=M″=2 and T=10 there are 210×210=1, 048, 576 possible sequence pairs.
At this point it should be obvious that evaluation by hand is neither feasible nor desirable. To get around this problem, we used a computer script. The script takes M′, M″, and T as parameters and then exhaustively enumerates all possible (M′)T×(M″)T sequence pairs. For each pair, the script runs the encoding algorithm and computes a matrix and a histogram for the second sequence. The script then evaluates both the encoding outcomes and the decoding outcomes as described below.
To characterize the encoding outcomes, the script compares the SSM model of each sequence pair against the models of all other sequence pairs. If there is no match, then the encoding is counted as unique. If it finds a match, then the encoding is counted as aliased. That is, there are at least two different sequence pairs that map to the same M and h″. Once this check is done for all pairs, the script reports the percentage of unique and aliased sequence pairs.
For the decoding outcomes, the script attempts to decode each model after it is encoded. To explain this process, let (S1, S2) be a sequence pair for which a model was computed. The script then calls the decoding algorithm with S2 as a parameter and compares the decoded sequence to S1 (i.e., the same one that was used to encode the matrix). There are four possible outcomes of this process: 1) the decoded sequence is the same as the sequence S1 that was used during encoding; 2) the decoded sequence is different from S1, but it is equal to one of the sequences in the aliased pairs; 3) the decoded sequence is of length T, but it is neither the correct sequence nor an aliased sequence; and 4) the decoded sequence is wrong and its length is shorter than T. The last case corresponds to a decoding process that got “stuck”, i.e., the algorithm reached a point at which it couldn't subtract h″ from any row of the matrix without some matrix elements becoming negative in the process.
Thus, there are two encoding outcomes and four decoding outcomes. Combining these outcomes leads to eight different cases that are summarized in
Consider the diagram in the upper-left corner in
The diagrams in the bottom row of the figure are for aliased encoding. In this case two (or more) sequence pairs map to the same model. This is indicated with the sequence pairs (S1, S2), . . . , (Sp, Sq) that are connected with double arrows to the model. As described above, these aliasing effects are detected by the script using exhaustive enumeration. For evaluation purposes, however, only one of these sequence pairs is considered the main one during a particular testing iteration. For the sake of explanation, let (S1, S2) be that pair. Thus, when testing the decoding outcome, the script will provide the sequence S2 as input and compare the output sequence to S1. If the decoded sequence is equal to S1 (see the 1-st column in the figure), then the decoding is considered correct. In some cases the decoded sequence is from one of the aliased pairs (e.g., Sp as in the 2-nd column). In the remaining two cases the decoded sequence is wrong (indicated with Sw in the 3-rd column) or wrong and short (indicated with Sws in the 4-th column).
There are several interesting things to note here. First, the performance drops quite rapidly as T increases. The plot in the upper-left cell in
These results indicate that the performance of the decoding algorithm drops quite rapidly as the sequence length increases. The next two sections describe one possible extension of the SSM model, and its associated algorithms, that performs better according to these metrics.
2.11 Encoding Example with Exponential Decay
The exponential decay also affects the vector h″. In this case, however, the decay affects only what is added to this vector. In other words, the elements of h″ don't decay from one iteration to the next. What decays is the increment value, which is added to only one element. Note that for the vector h′ the increment value is implicitly set to 1 and it remains the same for all iterations. In this case the increment value is {circumflex over (z)}, which is initially set to 1 and decays in half (divided by z) from one iteration to the next.
A side effect of the exponential decay is that the elements of the matrix are no longer integer numbers. In fact, all three components—h′, M, and h″—now have real values. Also, just like h″, the contents of the matrix don't decay during the encoding process. In other words, what is added to the matrix stays in the matrix.
Note that in the exponential model the word histogram is no longer an accurate description for h′ or h″. A better word might be ‘history’, i.e., the history of each character in the corresponding input sequence. In any case, we will continue to use h′ and h″ to denote these two vectors. For lack of a better word sometimes we may refer to them as histograms. In should be noted, however, that these two vectors reduce to proper histograms only if z=1 (i.e., when there is no exponential decay or growth as in the previous sections). When z=1 the vector h′ is indeed the histogram of the characters in the first sequence S′ and the vector h″ is indeed the histogram of the second sequence S″. Once again, this is no longer the case when z≠1.
2.12 Decoding Example with Exponential Decay
The elements of M and h″ that are modified during the last iteration of the algorithm are highlighted in red and green in the last row of
At the end of the decoding process both the matrix and the vector h″ should contain only zeros. If this is not the case, then the decoding process probably got stuck. The next section analyzes how the decodability properties of the exponential model depend on the sequence length.
2.13 Decoding Limitations of Exponential Matrices
This section shows that the same limit of T=3 holds for the deterministic decoding of exponential matrices as well. One difference in this case is that the mapping from sequence pairs to models is now one-to-one, i.e., due to the exponential decay there is no aliasing. This section also analyzes the decodability properties of the model for sequences of length up to 10.
The example from
2.14 Summary
This chapter provided a quick overview of the encoding and decoding algorithms for discrete sequences that were described in our previous document. Emphasis was added on evaluating the decodability properties of the model in each case. Section 2.10 showed that for sequences of length 3 the mapping from sequence pairs to models is aliased (i.e., many-to-one) and that the decoding process is no longer deterministic. Section 2.13 showed that the exponential version of the algorithms eliminates aliasing effects but the decodability limit is still equal to 3.
These results prompted the search for a new class of algorithms that could overcome this decodability limit and for a set of conditions under which deterministic decoding is always possible. This led to the development of the ZUV family of algorithms, which are described in the next couple of chapters. The algorithms described in this chapter can be viewed as special cases of the ZUV algorithms.
3 Discrete-Time Formulation
This chapter shows that the value of each element in a dual exponential SSM matrix is equal to the value of the unilateral z-transform, evaluated at a specific z, of the cross-correlation of the two right-sided sequences that correspond to the row channel and the column channel. This result is justified by the concatenation theorem and its corollaries, which are stated in Sections 3.4 and 3.5. Chapter 4 provides examples that complement the theory described here.
3.1 Infinite Sequences
A sequence is a collection of numbers that are arranged in a specific order. An infinite sequence is a sequence that has infinitely many numbers. In this chapter it is assumed that, by default, sequences consist of complex numbers. The cases in which the elements of the sequences are restricted to real numbers are explicitly indicated in the text.
There are two types of infinite sequences: right-sided sequences and two-sided sequences. A right-sided sequence is a collection of complex numbers that is indexed by nonnegative integers. A two-sided sequence is a collection of complex numbers that is indexed by the set of all integers, which consists of the positive integers, the negative integers, and zero.
3.2 The Z-Transform
This section introduces the z-transform of a sequence. If the sequence is right-sided, then only the unilateral z-transform can be obtained from it. If the sequence is two-sided, then both the unilateral z-transform and the bilateral z-transform can be derived. The formal definitions are given below.
The domain of a+, which is also called the region of convergence (ROC), consists of all complex numbers for which the series converges. More formally,
Definition 3.5. Let y=( . . . , y−1, y0, y1, . . . ) be a two-sided infinite sequence. The bilateral z-transform of y is the function y(z) that maps a complex scalar z to the value of the bilateral power series derived from y and evaluated at z−1. More formally,
The domain of y, i.e., its region of convergence, consists of all complex scalars z for which the power series converges. More formally,
3.3 The Cross-Correlation Theorem for the Z-Transform
Cross-correlation is an operation on a pair of sequences that is similar to convolution. Unlike convolution, however, cross-correlation is not a commutative operation. That is, the order of the two sequences is important for cross-correlation. Therefore, it makes sense to talk about the first and the second sequence for cross-correlation, but not for convolution. To distinguish between these two operations we will use ★ for convolution and ★ for cross-correlation.
3.3.1 Cross-Correlation: Definitions and Properties
Some problems require only the right tail of the cross-correlation sequence, e.g., calculating the unilateral z-transform of a★b. In these special cases n is a positive integer or zero, which implies that max(0, −n)=0. Therefore, the sum in formula (3.6) can start from 0, which leads to the following simplified expression for the elements of the cross-correlation sequence:
Definition 3.8. Let a=(a0, a1, . . . , aT−1) and b=(b0, b1, . . . , bT−1) be two right-sided finite sequences of length T. Then, the discrete cross-correlation of a and b is a two-sided finite sequence of length 2T−1, i.e.,
(a★b)=((a★b)−(T−1), (a★b)−(T−2), . . . , (a★b)−1, (a★b)0, (a★b)1, . . . , (a★b)T−2, (a★b)T−1). (3.8)
Furthermore, the n-th element of this sequence is given by the following formula:
If only the right tail of the cross-correlation sequence is needed, then (3.9) can be simplified as follows:
(u+v)★(x+y)=u★x+u★y+v★x+v★y. (3.15)
(αx)★y=
x★(αy)=α(x★y), (3.17)
for each α ∈ . Note that the complex scalar α is conjugated only in the first equation.
(αu+βv)★(φx+ψy)=
x★y=
*y, (3.19)
where denotes the elementwise complex conjugate of the reverse of x, i.e.,
i
=
x★y=★. (3.21)
Similarly, let y=( . . . , y−2, y−1, y0, y1, y2, . . . ) be a two-sided sequence obtained by padding b with infinitely many zeros on the left. More formally,
Then, the cross-correlation of a and b is equal to the cross-correlation of x and y, i.e.,
(a★b)n=(x★y)n, for each n ∈ ={ . . . , −2, −1, 0, 1, 2, . . . }. (3.24)
3.3.2 The Cross-Correlation Theorem
The cross-correlation theorem, which is stated below, gives a formula for the bilateral z-transform of the cross-correlation of a pair of two-sided infinite sequences. It is similar to the convolution theorem, but because cross-correlation is not commutative there are some differences. In this case, the value of the z-transform of the cross-correlation at z can be obtained by multiplying the complex conjugate of the z-transform of the first sequence evaluated at the reciprocal of the complex conjugate of z by the z-transform of the second sequence evaluated at z.
Then, the value of the bilateral z-transform of the cross-correlation of x and y at z is equal to the product of the complex conjugate of the value of the bilateral z-transform of x at 1/
x★y(z)=
If the two sequences are right-sided, then there is another version of the cross-correlation theorem. This version states that the value of the bilateral z-transform of the cross-correlation at z is equal to the product of the complex conjugate of the value of the unilateral z-transform of the first sequence evaluated at 1/
Then, the value of the bilateral z-transform of the cross-correlation of a and b at z is equal to the product of the complex conjugate of the value of the unilateral z-transform of a at the reciprocal of the complex conjugate of z and the value of the unilateral z-transform of b at z. More formally,
a★b(z)=
Notice that in the second version of the theorem there is an asymmetry, i.e., in the formula
a★b(z)=
the bilateral z-transform is used in the left-hand side, but the unilateral z-transform is used in the right-hand side. This is due to the fact that the cross-correlation of two right-sided sequences is a two-sided sequence.
3.4 The Concatenation Theorem for the Z-Transform
This section states two versions of the concatenation theorem. The first version is for two-sided sequences. The second version is for right-sided sequences and its proof relies on the proof of the first theorem. The following lemma is used in the proof of the theorem.
Let x′ be a two-sided sequence that is obtained from x by replacing xT and all elements that follow it with zeros. More formally, x′n=H(T−1−n)xn, for each n ∈ ={ . . . , −2, −1, 0, 1, 2, . . . }, where H(n) denotes the Heaviside function, which is defined as follows:
In other words,
Similarly, let y′ be a two-sided sequence that is derived from y using the same procedure that was used to derive x′ from x, i.e., y′n=H(T−1−n)yn, for each n ∈ . In other words,
Also, let x″ be a two-sided sequence that is obtained from x by replacing all elements up to and including xT−1 with zeros and keeping the remaining elements unchanged. More formally, x′n′=H(n−T)xn for each n ∈ . In other words,
Similarly, let y″ be a two-sided sequence that is derived from y using the same procedure that was used to derive x″ from x, i.e., y″n=H(n−T)yn for each n ∈ . That is,
x★y
+(z)=(x′+x″)★(y′+y″)+(z)=x′★y′+(z)+x′★y″+(z)+x″★y′+(z)+x″★y″+(z), (3.42)
and each of the four terms in the right-hand side of (3.42) is well-defined and finite.
Let x′ be a two-sided sequence that is obtained from x by replacing xT and all elements that follow it with zeros. More formally, x′n=H(T−1−n)xn, for each n ∈ ={ . . . , −2, −1, 0, 1, 2, . . . }, where H(n) denotes the Heaviside function, which is defined as follows:
In other words,
Similarly, let y′ be a two-sided sequence that is derived from y using the same procedure that was used to derive x′ from x, i.e., y′n=H(T−1−n)yn for each n ∈ . In other words,
Also, let x″ be a two-sided sequence that is obtained from x by replacing all elements up to and including xT−1 with zeros and keeping the remaining elements unchanged. More formally, x′n′=H(n−T)xn for each n ∈ . In other words,
Similarly, let y″ be a two-sided sequence that is derived from y using the same procedure that was used to derive x″ from x, i.e., y′n=H(n−T)yn for each n ∈ . That is,
Then, the value of the unilateral z-transform at z of the cross-correlation of x and y can be expressed as
x★y
+(z)=x′★y′+(z)+x″★y″+(z)+
When the two input sequences are right-sided (i.e., causal), there is another version of the concatenation theorem, which is stated below.
Let a′ be a right-sided sequence that is obtained from the sequence a by replacing aT and all elements that follow it by zeros. More formally, a′n=H(T−1−n)an for each n ∈ +={0, 1, 2, . . . }, where H(n) denotes the Heaviside function, i.e.,
In other words, the elements of the sequence a′ are defined as follows:
Similarly, let b′ be a right-sided sequence that is derived from b using the same approach that was used to derive a′ from a, i.e., b′n=H(T−1−n)bn for each n ∈ +={0, 1, 2, . . . }. That is, the elements of b′ are given by:
Also, let a″ be a right-sided sequence that is obtained from a by replacing all elements up to and including aT−1 with zeros and keeping the remaining elements unchanged. More formally, a″n=H(n−T)an for each n ∈ . In other words,
Similarly, let b″ be a right-sided sequence that is derived from the sequence b such that b″n=H(n−T)bn for each n ∈ . That is,
Then, the value of the unilateral z-transform at z of the cross-correlation of a and b can be expressed in the following form:
a★b
+(z)=a′★b′+(z)+a″★b″+(z)+
3.5 Special Cases of the Concatenation Theorem
This section states as corollaries several special cases of the concatenation theorem for right-sided sequences that have finite length. These corollaries are the mathematical foundation for both the encoding and the decoding algorithm.
Similarly, let v′ be a finite sequence of length K that is obtained from v by replacing vT and all elements that follow it with zeros, i.e., v′n=H(T−1−n)vn for n ∈ {0, 1, 2, . . . , K−1} so that
Also, let u″ be a finite sequence of length K that is obtained from u by replacing all of its elements up to and including uT−1 with zeros, i.e., u″n=H(n−T)un for each n ∈ {0, 1, 2, . . . , K−1}. In other words,
Similarly, let v″ be a finite sequence of length K that is obtained from v by replacing all of its elements up to and including vT−1 with zeros, i.e., v″n=H(n−T)vn for each n ∈ {0, 1, . . . , K−1} so that
n★v
+(z)=u′★v′+(z)+u″★v″+(z)+
Note that Corollary 3.20 does not have any convergence conditions, unlike some of the previous theorems, because the sequences u and v are finite. Thus, both series derived from these finite sequences converge and they also converge absolutely.
u★v
+(z)=u′★v′+(z)+(z)vT, (3.66)
where is a finite sequence of length T+1 that is obtained by reversing and conjugating the elements of u.
u★v
+(z)=
4 Discrete-Time Examples
This chapter provides examples that illustrate the z-transform theorems and the properties of exponential SSM matrices. Most of the examples in this chapter use right-sided finite sequences to illustrate the essence of a theorem or to visualize how an algorithm works. The theorems, however, are more general and apply to infinite sequences as well.
4.1 Types of Sequences
The z-transform theorems that were described previously use four different types of sequences. Some theorems are true for all four types. Others are valid for only a subset of them. The following examples illustrate the differences between these sequence types.
4.2 Bilateral Z-Transform Example
4.2.1 Calculating the Z-Transform For One Specific Value of z
To illustrate the bilateral z-transform we will give an example using the decimal number system, which should be familiar to everyone. Every number in the decimal system can be viewed as a sequence of digits. For example, the number 2147.514 can be viewed as a two-sided finite sequence d=(d−3, d−2, d−1, d0, d1, d2, d3), the elements of which are: 2, 1, 4, 7, 5, 1, and 4.
The same number can also be represented with an infinite two-sided sequence as shown in
The magnitude of this number in the decimal system is equal to the value of the bilateral z-transform of the two-sided digit sequence d, evaluated at z=10. In other words,
The notation {d} is typically used for the bilateral z-transform of the sequence d. This notation, however, is for the entire z-transform, i.e., for all possible values of z. In this case, however, we need the value of the z-transform at one specific z, e.g., z=10. This requires an extra set of brackets to specify that, i.e., {d}(z), which makes the notation too cumbersome. Therefore, we will simplify the notation by putting the part in the curly brackets in a subscript. Thus, the value of the bilateral z-transform, evaluated at z, of the sequence d will be denoted with d(z). Similarly, the value of the unilateral z-transform at z of the sequence a will be denoted with a+(z).
This value is not the z-transform of the sequence d. Instead, this is the value of the z-transform of d evaluated at the specific point z=10. To get the entire z-transform we need to perform similar calculations for all possible values of z.
4.2.2 Calculating the Z-Transform for All Values of z
As another example, consider the number 1101.101, which can be represented as a finite two-sided sequence of digits as shown in
To calculate the bilateral z-transform of b at a specific point we need to pick some z and plug it into the formula for the z-transform. For example, if we pick z=2 we get the following result:
This result could be interpreted as the value of this number in the binary number system. If we pick z=10, then we get the value in the decimal number system:
In fact, we could pick any other value of z and perform a similar calculation. For example, if we pick z=0.4 or z=−2.5, then we get:
To plot the z-transform of this sequence we need to perform similar calculations for all possible values of z.
In general, z can be a complex number. Visualizing the transform in that case is not easy as it requires a four-dimensional plot.
In this example, the digit sequence was finite and for a finite sequence the value of the z-transform is always bounded. For infinite sequences, however, it is possible that for some values of z the value of the z-transform will diverge to either positive infinity or negative infinity. For example, the z-transform of the infinite digit sequence 0.333(3) evaluated at z=1 is equal to infinity because evaluating this value requires adding an infinite number of 3's.
4.3 Unilateral Z-Transform Example
The unilateral z-transform is similar to the bilateral z-transform, but in this case only the sequence elements at nonnegative indices are used in the calculations. Therefore, the unilateral z-transform is typically used with right-sided or causal sequences. If for some reason the sequence is two-sided, then its left tail is simply ignored.
Let b=(b0, b1, b2) be a right-sided finite sequence of length three. The elements of this abstract sequence are shown in
b
+(z)=b0z0+b1z−1+b2z−2. (4.6)
In other words, the unilateral z-transform of b is a function of z that maps the elements of b and the value of z to the value of b+(z).
To make this example more concrete, let b0=1, b1=4, and b2=2, i.e., let b=(1, 4, 2).
b
+(z)=1z0+4z−1+2z−2. (4.7)
Using this formula, the value of the transform can be calculated for any z. For example, for z=4, we have:
If we perform similar calculations for all possible values of z, then we can plot the unilateral z-transform. This is shown in
So far in this example z was restricted to be a real number. In general, however, z can be a complex number. If we allow z to be complex, then the value of the z-transform can also be a complex number. Visualizing the z-transform in that case is a challenge as it requires a four-dimensional plot. In the most general case both z and the elements of b can be complex numbers. Visualizing the z-transform in this case would require a four-dimensional plot as well.
4.4 Convolution Examples
The discrete convolution of two infinite right-sided sequences a and b is defined as follows:
The outcome of this operation is a sequence, which is called the convolution sequence. Sometimes the resulting sequence is also called the Cauchy product of a and b.
To illustrate this operation we will use two right-sided sequences of length three: a=(a0, a1, a2) and b=(b0, b1, b2). As with some of the earlier examples, we can write each sequence on a separate tape. In this visualization each tape has equally-sized boxes and each box contains only one element of the sequence that the tape represents.
The convolution sequence, which is denoted by (a★b)=((a★b)0, (a★b)1, . . . ), is computed iteratively such that only one element of this sequence is computed during each iteration. Which element? That depends on the offset between the two tapes, where the offset is defined as the number of boxes in the horizontal direction that separate a0 and b0. For example, to compute the n-th element (a★b)n of the convolution sequence the a-tape and the b-tape must be placed at an offset n relative to each other. Once this is done, the value of (a★b)n can be computed by multiplying all vertically aligned elements from a and b and then adding all such pairwise products. If a sequence element is not aligned with an element from the other sequence, then that specific product is assumed to be zero.
For n=0 the two tapes are aligned such that a0 is directly above b0 (see the top part of
For n=1 the b-tape is shifted one position to the right, relative to the configuration for n=0. Now a0 is directly above b1 and a1 is directly above b0. By multiplying the elements that line up vertically and adding the two partial results we get: (a★b)1=a0b1+a1b0. This is the value of the element at index 1 in the convolution sequence.
For n=2 the b-tape is shifted one position to the right, relative to the previous step. Now the three elements of a overlap with the three elements of b. Because the temporal order of b is reversed, however, the result is: (a★b)2=a0b2+a1b1+a2b0. This is the element at index 2 in the resulting sequence.
Continuing in the same way we can compute all elements of the convolution sequence. Because both a and b are finite sequences, however, at some point the two tapes will no longer overlap. In our example this occurs when n=5. In this case the resulting product is assumed to be 0. Thus, (a★b)5=0. The same is true for all n>5, but these iterations are not shown in the figure.
To make this a bit more concrete,
4.4.1 The Unilateral Z-Transform of the Convolution Sequence
Let a=(a0, a1, a2) and b=(b0, b1, b2) be two right-sided sequences of length three. The convolution of a and bis a sequence, which is denoted by (a★b)=((a★b)0, (a★b)1 . . . ).
The unilateral z-transform of the convolution sequence can be computed from
a★b
+(z)=(a0b0)z0+(a0b1+a1b0)z−1+(a0b2+a1b1+a2b0)z−2+(a1b2+a2b1)z−3+(a2b2)z−4. (4.10)
If we perform the multiplications and arrange the resulting terms such that the first half of the rows are left justified while the second half are right justified, then we will get the following:
By grouping the terms in each of the columns of this expression we get:
This result is the essence of the convolution theorem for the unilateral z-transform. This theorem is true even if a and b are infinite right-sided sequences.
4.4.2 The Bilateral Z-Transform of the Convolution Sequence
For the right-sided sequences a=(a0, a1, a2) and b=(b0, b1, b2) used in the previous example the bilateral z-transform of a★b is equivalent to the unilateral z-transform of a★b. In other words, a★b(z)=a★b+(z)=a+(z)b+(z). This is true for all right-sided sequences because the convolution of two right-sided sequences is itself a right-sided sequence.
For a pair of two-sided sequences x and y there is another version of the convolution theorem, which states that
x★y(z)=x(z)y(z). (4.13)
In other words, the value of the bilateral z-transform at z of the convolution of x and y is equal to the product of the bilateral z-transform of x at z and the bilateral z-transform of y at z.
4.5 Cross-Correlation Examples
The discrete cross-correlation of two infinite right-sided sequences a and b is a two-sided sequence, the elements of which are defined as follows:
Alternatively, the formula for the n-th element of the cross-correlation sequence can be stated as:
assuming that the product
To illustrate this operation,
From this example it should be clear that cross-correlation is similar to convolution, but also that there are some key differences. First, we don't need to reverse the second sequence. Its elements appear on the tape in their original order. Thus, the temporal order of both sequences is preserved by this operation. Second, we now need negative indices to index all elements of the cross-correlation sequence. In other words, the cross-correlation of two right-sided sequences is a two-sided sequence. This was not the case for convolution. Third, each element of the first sequence must be conjugated before it is multiplied by its corresponding element of the second sequence because this is how the operation is defined. If the first sequence is a real sequence, then the conjugation can be dropped as it has meaning only for complex numbers. For complex sequences, however, the conjugation is required. Finally, to distinguish between these two operations, we will use ★ for cross-correlation and ★ for convolution.
To make this example a bit more concrete, let's consider the case when a=(2, 2, 1) and b=(1, 2, 3).
4.5.1 Cross-Correlation is Not Commutative
Unlike convolution, cross-correlation is not a commutative operation. In other words, swapping the order of the two sequences leads to a different result. To demonstrate this,
4.5.2 The Bilateral Z-Transform of the Cross-Correlation Sequence
Let c be a two-sided sequence. The bilateral z-transform of c is defined as:
This definition is true for any two-sided sequence c. In particular, if we set cn=(a★b)n, then we can calculate the bilateral z-transform of the cross-correlation of a and b. In other words, because the cross-correlation of the sequence a and the sequence b is itself a sequence it is possible to compute the bilateral z-transform of that sequence as well. That is,
As in the previous examples, let a=(a0, a1, a2) and b=(b0, b1, b2). Because in this case both a and b are finite right-sided sequences formula (4.17) simplifies to:
To compute the bilateral z-transform of a★b we can expand the double sum in (4.18). Alternatively, we can multiply each element (a★b)n of this cross-correlation sequence by z−n and then add all products. In other words,
a★b(z)=(a★b)−2z2+(a★b)−1z1+(a★b)0z0+(z★b)1z−1+(a★b)2z−2. (4.19)
Substituting the values for (a★b)n from
a★b(z)=(
If we perform the multiplications in each row of (4.20) and arrange the terms in columns, such that they are grouped by their common elements of b, then the following pattern emerges:
If we “push up” the columns of the previous expression so that they lineup on top, then we get:
a★b(z)=
Swapping the order of the rows, such that the first becomes last and the last becomes first, leads to the following expression:
a★b(z)=
After factoring out the common terms and powers of z in each row we get:
a★b(z)=
Finally, this can be expressed as:
In other words, the value of the bilateral z-transform, evaluated at z, of the cross-correlation of a and b can be expressed as the product of the complex conjugate of the unilateral z-transform of a evaluated at 1/
4.5.3 The Unilateral Z-Transform of the Cross-Correlation Sequence
Similar to the previous section, let c be a complex sequence. The unilateral z-transform of c is defined as:
Note that in this case the lower-bound of the sum starts from 0 and not from −∞ as it was the case with the bilateral z-transform. If c is a two-sided sequence, then the elements in its left tail are simply ignored for the purposes of calculating c+(z).
Let a and b be two right-sided sequences. The cross-correlation of a and b is a two-sided sequence denoted by (a★b)=( . . . , (a★b)−2, (a★b)−1, (a★b)0, (a★b)1, (a★b)2, . . . ). If we set cn=(a★b)n in formula (4.26), then we can calculate the unilateral z-transform of this cross-correlation sequence as follows:
Once again, only the elements in the right tail of the cross-correlation sequence are needed to calculate b+(z). These elements are indexed by n, which is either positive or zero. Therefore, the inner sum in (4.27) can start from m=0, because max(0, −n)=0 when n≥0.
To give a concrete example, let a=(a0, a1, a2) and b=(b0, b1, b2). The cross-correlation sequence for a and b was already calculated and is shown in
Alternatively, we could compute only the elements of the cross-correlation sequence for which n≥0. This is shown in
From
Unlike formulas (4.12), (4.13), and (4.25), the expression in (4.28) cannot be factored into a product of two z-transforms. However, as described in Section 4.6, this expression can be rewritten as the sum of three different terms, each of which can be computed incrementally as the two sequences unfold in time.
4.5.4 An Alternative Formula for a★b+(z) that Uses the Heaviside Function
This section states the formula for the unilateral z-transform of the cross-correlation of two sequences in an alternative form. To derive the new formula, we will start with formula (4.23) for the bilateral z-transform, which is replicated below:
a★b(z)=
The terms of this formula are arranged in a grid pattern such that each row contains the same element of the sequence a and each column contains the same element of the sequence b. If we index the rows with j and the columns with k, then each of these terms will have the form
Our goal is to derive a similar expression for the unilateral z-transform of a★b. To do this, we will start with formula (4.29) and highlight in gray all terms that don't appear in a★b+(z), i.e.,
All of these terms are in the lower-triangular part of the grid. Note that these same terms are also highlighted in gray in
Note that in
If we remove the highlighted terms from formula (4.31), then we get:
which is an expression for the value of the unilateral z-transform, evaluated at z, of the cross-correlation of a and b.
The formula for a★b+(z) can also be expressed in the following alternative form:
In this formulation, H is the Heaviside function, which is defined as:
In other words, if the argument n of H (n) is greater than or equal to 0, then the function is equal to 1. If the argument is less than 0, then the function is equal to 0. This function is often called the unit step function and sometimes it is denoted with u(n).
It is worth spending some time to study formulas (4.29) and (4.30) and how they relate to formulas (4.32) and (4.33). From these it should be clear that the ‘+’ in a★b+(z) ignores the left tail of the cross-correlation sequence. It should also be clear that the same effect can be achieved with the Heaviside function. That is, the double sum in (4.33) enumerates all possible combinations of the j and k indices, but the H function multiplies some of them by 0 and others by 1. The ones that are multiplied by zero are the shaded elements in the lower-triangular part of (4.31), which come from the left tail of the cross-correlation sequence.
Formula (4.33) offers a compact way to express the value of the unilateral z-transform at z of a★b. From an algorithmic point of view, however, this expression is not computationally efficient. The reason for this is that the double sum in (4.33) explicitly enumerates all possible combinations of the two indices j and k. In other words, even though almost half of all terms are multiplied by the zeros generated by the Heaviside function they are still enumerated by the formula. Section 4.6 describes another way to calculate the same value that is much faster. Nevertheless, it is worth remembering formula (4.33) as it will be used in some sections below.
4.5.5 Six Different Formulas for Computing a★b+(z)
Let a=(a0, a1, . . . , aT−1) and b=(b0, b1, . . . , bT−1) be two complex sequences of length T. The formula for the unilateral z-transform of the cross-correlation of a and b can be expressed as a double sum in six different ways. More specifically, each formula contains elements of a, elements of b, and powers of z. Thus, there are three choices for the index of the outer sum and two choices for the index of the inner sum. In total, there are 3×2=6 possible combinations. Each combination leads to a formula, but all six formulas compute the same result, i.e., a★b+(z).
4.6 Concatenation Theorem Example
To illustrate the concatenation theorem we will use two abstract right-sided sequences of length five: a=(a0, a1, a2, a3, a4) and b=(b0, b1, b2, b3, b4), which are shown in
Without loss of generality, we will represent both a′ and a″ with sequences of length five that are padded with the appropriate number of zeros. In other words, a′=(a0, a1, a2, 0, 0) and a″=(0,0, 0, a3, a4). Now the original sequence a can be represented as the elementwise sum of the “prefix” and the “suffix,” i.e., a=a′+a″. Similarly, the sequence b can be represented as b=b′+b″, where b′=(b0, b1, b2, 0, 0) and b″=(0, 0, 0, b3, b4).
If we set a′=(a0, a1, a2) and a″=(a3, a4), then it is also possible to represent a as the concatenation of a′ and a″, i.e., a=a′∥a″. A similar representation can also be used for b such that b=b′∥b″. The theorem was first proved in this way, which is why we called it the concatenation theorem. Mathematically speaking, however, the notation and the proof are simpler if the prefixes and the suffixes are padded with zeros.
The concatenation theorem states that the value of the unilateral z-transform at z of the cross-correlation of a and b can be expressed as the sum of three terms. The first of these terms is the unilateral z-transform of a′★b′ evaluated at z. The second term is the unilateral z-transform of a″★b″ also evaluated at z. Finally, the third term is the product of the complex conjugate of the unilateral z-transform of a′ evaluated at 1/
In other words, the concatenation theorem states that:
a★b
+(z)=a′★b′(z)+a″★b″(z)+
The rest of this section starts with the left-hand side of this expression and shows that by rearranging and grouping its terms we can derive the right-hand side.
If we perform the multiplications in (4.36) and arrange the resulting expression such that the terms from the first row are now placed along the main diagonal of an imaginary grid, the terms from the second row are placed on the superdiagonal, and so on until the only term from the last row is placed in the upper-right corner, then we will get the following:
The terms of the previous expression can be split into three groups as shown below:
Thus, the unilateral z-transform of a★b can be expressed as the sum of P, Q, and R, i.e.,
a★b
+(z)=P|Q|R. (4.39)
The expression for P contains only elements of a′ and b′, i.e., elements from the prefixes of the two sequences. Furthermore, P can be expressed as the unilateral z-transform of the cross-correlation of a′ and b′ (see also
Similarly, Q contains only elements from the suffixes of the two sequences and can be expressed as the unilateral z-transform of a″★b″ (see also
The expression for R contains terms from both a′ and b″. In other words, this is the only expression that does not respect the prefix-suffix boundary shown in
By adding the expressions for P, Q, and R we can get the equation for the concatenation theorem:
To summarize, the concatenation theorem splits the computation of the unilateral z-transform of the cross-correlation of a and b into three expressions. The first of these expressions depends only on the elements of a′ and b′. Thus, it does not depend on the suffix of a and the suffix of b. The second expression depends only on the elements of a″ and b″. Thus, it does not depend on the prefix of a and the prefix of b. The third expression depends only on a′ and b″. In other words, it depends on the prefix of the first sequence and on the suffix of the second sequence. Luckily, this third expression can be expressed as the product of the unilateral z-transform of a′ evaluated at 1/
Finally, it is worth stating explicitly something that should be clear from the previous discussion, but may be lost in all of the details. The concatenation theorem is not about a fast way of computing the cross-correlation of two sequences. It is about a fast way of computing the unilateral z-transform, evaluated at one specific z, of the cross-correlation of two sequences. There is a difference between these two. For example, to compute the cross-correlation sequence we need to compute each and every one of its elements. To compute the unilateral z-transform at z of the cross-correlation sequence, however, we don't need to compute the individual elements of this sequence explicitly. Instead, each element is computed implicitly as a set of product terms, the sum of which is equal to that element. This sum, however, is never performed. Instead, the product terms for each element of the cross-correlation sequence are multiplied by their corresponding power of z and are then added in a specific order with the product terms for the other elements of the cross-correlation sequence to the large sum that constitutes the unilateral z-transform. At the end, the result for a★b+(z) is still the same, but the freedom to add these terms in this specific order allows for a fast incremental computation. The encoding algorithm uses this property, which gives it its nice computational complexity. The algorithm is illustrated in one of the following sections.
4.6.1 Alternative Derivation
This subsection gives an alternative derivation of the concatenation theorem using the same two sequences, a=(a0, a1, a2, a3, a4) and b=(b0, b1, b2, b3, b4), as in the previous example. Once again, the sequence a is split into a prefix a′ and a suffix a″ such that a=a′|a″. The sequence b is split in a similar way such that b=b′|b″. This representation is illustrated in
Because both a and b can be expressed as the sum of two sequences, the cross-correlation of a and b can be expressed as follows:
a★b=(a′+a″) * (b′+b″). (4.44)
Using the properties of cross-correlation this can be further expanded as:
a★b=a′★b′+a′★b″+a″★b′+a″★b″. (4.45)
Furthermore, because the z-transform is a linear operation we can take the unilateral z-transform of both sides of the previous equation to obtain:
a★b
+(z)=a′★b′+(z)+a′★b″+(z)+a″★b′+(z)+a″★b″+(z). (4.46)
This expression has four terms. The first term, a′★b′+(z), is equal to P as derived above in equation (4.40). Because this term appears in the formula for the concatenation theorem we don't need to modify it any further. Similarly, the fourth term, a″★b″+(z), is equal to Q, which was derived in equation (4.41), and thus it also does not need to be modified any further.
The second term in (4.46) is a′★b″+(z). This term is equal to the expression for R that was derived in (4.42). To see why this is the case, consider
a′★b″
+(z)=a′★b″+(z). (4.47)
Using the cross-correlation theorem for the bilateral z-transform, i.e., formula (4.25), this derivation can be continued in the following way:
a′★b″
+(z)=a′★b″+(z)=
As expected, the right-hand side of this expression is equal to the expression for R.
The third term in equation (4.46) is a″★b′+(z). This term, however, is equal to 0 and thus it can be dropped.
By combining all of these results we can express formula (4.46) in the following way:
If we express R in terms of (4.48) and drop the third term because it is equal to zero, then we will get:
a★b
+(z)=a′★b′+(z)+a″★b″+(z)+
which is the familiar expression for the concatenation theorem.
4.7 Two Special Cases of the Concatenation Theorem
This section illustrates two special cases of the concatenation theorem. In the first case the two sequences a and b are split such that the two suffixes are both of length 1. In the second case the sequences are split such that the two prefixes are of length 1. The reason why these two special cases are interesting is because they lay the mathematical foundations for the encoding and the decoding algorithm, respectively.
4.7.1 When Both Suffixes are of Length One
The two sequences in this example are a=(a0, a1, a2, . . . , ak−1, ak) and b=(b0, b1, b2, . . . , bk−1, bk). Both sequences are of length k+1. In this special case the sequences are split as shown in
Once again, it is mathematically more convenient if we represent both the prefix and the suffix with sequences of length k+1 that are padded with the appropriate number of zeros. In this case, a′=(a0, a1, a2, . . . , ak−1, 0) and a″=(0, 0, 0, . . . , 0, ak). The sequence a can be obtained from the elementwise sum of a′ and a″, i.e., a=a′+a″. Similarly, the sequence b is the sum of b′ and b″, where b′=(b0, b1, b2, . . . , bk−1, 0) and b″=(0, 0, 0, . . . , 0, bk).
The concatenation theorem applied to these sequences states that:
a★b
+(z)=a′★b′+(z)+a″★b″+(z)+
Because in this special case a″=(0, 0, 0, . . . , 0, ak) and b″=(0, 0, 0, . . . , 0, bk) it is easy to see that a″★b″(z)=
a★b
+(z)=a′★b′+(z)+
By factoring out the common term bk, the previous expression can be further simplified to:
a★b
+(z)=a′★b′+(z)+(āk+z−k
Furthermore, the term in the brackets simplifies to the value of the unilateral z-transform of the reversed and conjugated sequence a (the entire sequence a, not just the prefix a′), evaluated at z. In other words,
Thus, in this special case, the concatenation theorem simplifies to:
a★b
+(z)=a′★b′+(z)+(z)bk (4.55)
This expression justifies the encoding algorithm. In this notation a★b+(z) can be interpreted as the value of an element of the SSM matrix at the end of some iteration. This matrix element corresponds to the row associated with a and the column associated with b. Similarly, a′★b′+(z) is the value of the same matrix element at the beginning of the iteration. Finally, (z) can be interpreted as the value of the element of the vector h′ that corresponds to the a-channel. For binary sequences, the value of bk determines if the addition should be performed during this iteration. For example, if bk=1, then the a-th element of h′ is added to the corresponding matrix element. On the other hand, if bk=0, then the matrix element remains the same. It is worth emphasizing, once again, that this formula is for just one matrix element.
4.7.2 When Both Prefixes are of Length One
To explain the second special case we will use the same two sequences as in the previous example. In this case, however, the sequences a and b are split such that the two prefixes are of length 1 and the two suffixes are of length k. This split is shown in
Using the convention from the previous section, the two prefixes a′ and b′ can be represented with sequences that contain one element followed by k zeros, i.e., a′=(a0, 0,0, . . . , 0) and b′=(b0, 0, 0, . . . , 0). Similarly, the two suffixes a″ and b″ can be represented with sequences that have one leading zero followed by k elements, i.e., a″=(0, a1, a2, . . . , ak−1, ak) and b″=(0, b1, b2, . . . , bk−1, bk). As before, the sequence a can be represented as the elementwise sum of a′ and a″, i.e., a=a′+a″. Similarly, b=b′+b″.
Applying the concatenation theorem to the sequences in this special case we get:
a★b
+(z)=a′★b′+(z)+a″★b″+(z)+
Because both a′ and b′ contain only one non-zero element we can simplify this formula by noting that a★b+(z)=
a★b
+(z)=
By factoring out the common term
a★b
+(z)=
Using the properties of the z-transform, the expression in the brackets can be simplified to b+(z), i.e., the unilateral z-transform of the entire sequence b, not just the suffix b″. That is,
Using this result, the concatenation theorem simplifies to:
a★b
+(z)=
By rearranging the terms, the formula can also be stated in this form:
a″★b″
+(z)=a★b+(z)−
This expression is the mathematical justification for the decoding algorithm. The term a″★b″+(z) can be interpreted as the value of an element of the SSM matrix after the first decoding iteration. The term a★b+(z) can be interpreted as the value of the same matrix element before the decoding starts. The term b+(z) can be interpreted as the value of the b-th element of the vector h″, i.e., the one that corresponds to the b-channel. For binary sequences, the value of
4.8 Three Different Ways to Calculate the Same Sum
By definition, the value of the unilateral z-transform, evaluated at z, of the cross-correlation of two right-sided sequences a and b is a sum. Each term of this sum is equal to the product between an element of the sequence (a★b) and a corresponding negative power of z. Each element of the cross-correlation sequence, however, is also expressible as a sum. Thus, the z-transform expression can be viewed as a sum of sums. If all terms of this expression are expanded, then certain regularities emerge that make it possible to compute the value of a★b+(z) in three different ways.
To give a concrete example, let a=(a0, a1, a2, a3, a4) and b=(b0, b1, b2, b3, b4) be two right-sided sequences. The value of the unilateral z-transform, evaluated at z, of the cross-correlation of a and b is given by formula (4.37), which is replicated below:
This formula expresses the value of a★b+(z) as a sum and arranges the individual terms of this sum in a specific grid pattern. Each term of this sum has the following form:
4.8.1 Summing Along the Diagonals
The first method of computing a★b+(z) starts by adding the terms in each diagonal of formula (4.62) and then adds all partial results.
The five diagonal sums in this example can be expressed in the following form:
diag0=(
diag1=(
diag2=(
diag3=(
diag4=(
The terms in the parentheses are equal to the elements of the cross-correlation sequence (a★b). Thus, the sum in (4.62) can be expressed as:
which is equal to the value of the unilateral z-transform of a★b, evaluated at z.
4.8.2 Summing Along the Columns
The second method calculates the same value, a★b+(z), but it groups the terms of formula (4.62) based on their common element from the sequence b. As shown in
4.8.3 Summing Along the Rows
The third way of calculating the sum factors out the common element
4.8.4 Summary
All three of these methods produce the same result, namely the value of the unilateral z-transform, evaluated at z, of the cross-correlation of a and b. This should not be surprising as all three methods add the same terms, they just add them in different order. The first method is the traditional method of computing a★b+(z). It groups the terms of formula (4.62) along the diagonals and then adds all diagonal sums. If the elements of the cross-correlation sequence (a★b) are known, then this should be the preferred way to calculate a★b+(z). However, if the cross-correlation sequence is not known in advance, then one of the other two methods should be used as they can be implemented to run faster by reusing partial results from the previous iterations, which is not possible with this method.
The second method groups the terms by columns and then adds the values of all column sums. This method can be further optimized as the value of the next column sum can be efficiently computed using the value of the previous column sum. This is the method that the encoding algorithm uses.
The third method is used by the decoding algorithm. Instead of computing the value of a★b+(z), however, the decoding algorithm starts with this value and subtracts the values of the row sums from it, one by one. Computational efficiency can be achieved in this case as well, because it is possible to quickly calculate the value of row k+1 given the value of row k.
5 ZUV Algorithms
This chapter extends the encoding and decoding algorithms to work with exponentially weighted sequences. These extensions were designed to overcome the decoding limitations described in Chapter 2. The modified algorithms can decode the matrices for sequence pairs of arbitrary length.
The names of the new algorithms start with the prefix ZUV. The three letters in this prefix correspond to three parameters of the algorithms that have the following meaning: z is the point at which all unilateral z-transforms in the formulas are evaluated; u is a parameter that determines the rate of exponential decay (or growth) of the elements of the first sequence; and v is another parameter that determines the rate of exponential decay (or growth) of the elements of the second sequence.
In all previously described encoding algorithms the input character sequences S′ and S″ were represented with a collection of binary sequences. The ZUV encoding algorithm also works with a pair of character sequences, each of which is represented with a set of binary sequences. Before these binary sequences are processed, however, the encoding algorithm scales each of them using exponentially decaying (or exponentially growing) weights. The resulting scaled sequences are no longer binary. The parameter u controls the exponential weights for the sequences that jointly represent S′.
In
The mapping shown in
To explain the mathematical justifications behind the ZUV algorithms we will extend the derivations from Chapter 4 to exponentially weighted sequences. Using this previous methodology, we will focus on only one element of the matrix. Without loss of generality, we will pick the element in the a-th row and b-th column. This element, which will be denoted with Ma,b, is equal to the value of the unilateral z-transform at z of the cross-correlation of the sequences a and b. In other words Ma,b=a★b+(z). So far this is similar to the derivations in Chapter 4. The difference is that the sequences a and b are now exponentially weighted as described below.
Let a=(a0, a1, . . . , aT−1) and b=(b0, b1, . . . , bT−1) be two binary sequences, i.e., each of their elements is equal to either 0 or 1. Also, let a be an exponentially weighted version of â. In other words, a=(a0u0, a1u−1, . . . , aT−1u−(T−1)), where u is a parameter that determines the rate of decay (or growth) of the weight assigned to each element of â. If the sequences are represented with vectors, then a will be equal to the element-by-element product of â and u, where u=(u0, u−1, . . . , u−(T−1)). Note that the sequence a is no longer binary. Similarly, let b be an exponentially weighted version of {circumflex over (b)}. That is, b=(b0v0, b1v−1, . . . , bT−1v−(T−1)), where v is a parameter that determines the rate of decay (or growth) of the weight for each element of {circumflex over (b)}. If the sequences are treated as vectors, then b will be equal to the element-by-element product of {circumflex over (b)} and v, where v=(v0, v−1, . . . , v−(T−1)). The sequence b is not binary either.
Using the methodology described in Chapter 4, we can express the value of the unilateral z-transform at z of the cross-correlation of the exponentially weighted sequences a and b as follows:
All terms in this sum have the following form:
This pattern suggests that the terms of formula (5.1) can be grouped in three different ways, i.e., there are three different ways to compute this sum (see Section 4.8). First, the terms can be grouped by their common power of z. This corresponds to adding the terms along each diagonal and then adding all partial sums, which is the traditional way to compute a★b+(z). Second, the terms can be grouped based on their common bkv−k factor. In this case, computing the overall sum is done by adding the terms in each column and then adding all column sums. Because each column sum can be computed very quickly using the value of the previous column sum this leads to a nice optimization that is used by the ZUV encoding algorithm. Finally, the terms of (5.1) can be grouped by their common
For the sake of convenience, the encoding formulas are shown below:
The decoding formulas from are also shown below:
M
a,b
[k+1]=Ma,b[k]−
h″
b
[k+1]=(h″b[k]−bk)z. (5.7)
The ZUV algorithms use these same formulas, but replace all instances of ak and bk with aku−k and bkv−k, respectively. The derivations and optimizations are discussed in the next two sections.
5.1 ZUV Encoding Algorithm
This section describes the ZUV encoding algorithm. This is done in three steps. First, the update formulas are derived for individual elements of the matrix and the two vectors. Next, the algorithm is described. Finally, four numerical examples of encoding are given for different values of the parameters z, u, and v.
Let a=(a0u0, a1u−1, . . . , aT−1u−(T−1)) and b=(b0v0, b1v−1, . . . , bT−1v−(T−1)) be two exponentially weighted sequences of length T. Let a′=(a0u0, a1u−1, . . . , aT−2u−(T−2), 0) and a″=(0, 0, . . . , 0, aT−1u−(T−1)) be two other sequences of length T such that the sequence a can be obtained from the elementwise sum of a′ and a″, i.e., a=a′+a″. Also, the sequence b can be represented as the sum of b′ and b″, where b′=(b0v0, b1v−1, . . . , bT−2v−(T−2), 0) and b″=(0, 0, . . . , 0, bT−1v−(T−1)). The concatenation theorem for the unilateral z-transform applied to the sequences a and b states that
a★b
+(z)=a′★b′+(z)+a″★b″+(z)+
Because in this special case both a″ and b″ contain only one element that is not explicitly set to zero, the previous formula can be simplified as shown in Section 4.7.1, i.e.,
a★b
+(z)=a′★b′+(z)+(z)bkv−k, (5.9)
where it is assumed that k=T−1. The individual terms of this expression can be interpreted as follows:
In other words, a★b+(z) is the value of the matrix element Ma,b in the a-th row and b-th column after the k-th iteration. Similarly, a′★b′+(z) is the value of the same matrix element after the (k−1)-st iteration. The term (z) is the value of the a-th element of the vector h′ during the k-th iteration. Finally, bkv−k is the k-th element of the exponentially weighted sequence b. A similar reasoning can be used to extend this formula to any k between 0 and T−1. This turns this formula into an iterative update formula.
Formula (5.10) requires the value of v−k. To avoid computing this value from scratch during each iteration we will use a helper variable {circumflex over (v)}. This variable is initially set to 1. It is updated using the following recurrence: {circumflex over (v)}[k]={circumflex over (v)}[k−1]/v, where v is one of the parameters of the ZUV algorithm. Thus, {circumflex over (v)}[k]=v−k. Using this helper variable we can rewrite the bottom row of (5.10) as follows:
M
a,b
[k]=M
a,b
[k−1]+bkh′a[k]{circumflex over (v)}[k]. (5.11)
Formula (5.11) uses the value of h′a[k], i.e., the value of the a-th element of the vector h′ during the k-th iteration. This value can be computed with the following iterative formula
which is derived from formula (5.3). In other words, to compute the new value of h′a at iteration k, this formula uses the old value of h′a at iteration k−1, and divides it by z. It also adds the conjugate of the k-th element of a, i.e.,
Formula (5.12) needs the value of u−k, which must be computed for each iteration. To avoid doing extra work, we will introduce another helper variable û such that û[k]=u−k. Initially this variable is set to 1 and it is updated as follows: û[k]=û[k−1]/u. Substituting û[k] into (5.12) we get the following update formula:
The ZUV encoding algorithm also needs to compute the vector h″. Adapting formula (5.5) to the exponentially weighted sequence b we get
h″
b
[k]=h″
b
[k−1]+(bkv−k)z−k. (5.14)
Using the helper variables {circumflex over (v)}[k]=v−k and {circumflex over (z)}[k]=z−k, this expression can be rewritten as
h″
b
[k]=h″
b
[k−1]+bk{circumflex over (v)}[k]{circumflex over (z)}[k]. (5.15)
To get a better understanding of how formula (5.15) works, recall that if the sequence length is T=k+1 the value of h′b[k] is equal to b″+(z). In other words, the b-th element of the vector h″ is equal to the value of the unilateral z-transform at z of the exponentially weighted sequence b. Since b=(b0v0, b1v−1, . . . , bkv−k) we can express h′b[k] as follows:
h″
b
[k]=(b0v0)z0+(b1v−1)z−1+ . . . +(bk−1v−(k−1))z−(k−1)+(bkv−k)z−k. (5.16)
Using the helper variables {circumflex over (v)} and {circumflex over (z)}, we can rewrite this formula as follows:
Because the sum of all terms except the last one is equal to h″b[k−1], it should be easy to see why this is equivalent to (5.15).
To summarize, the ZUV encoding algorithm uses the following iterative formulas:
The algorithm also uses three helper variables: {circumflex over (z)}, û, and {circumflex over (v)} such that {circumflex over (z)}[k]=z−k, û[k]=u−k, and {circumflex over (v)}[k]=V−k . These are also computed iteratively.
The ZUV encoding algorithm has five input arguments. The first two are the two input sequences S′ and S″. It is assumed that these are integer sequences, such that each integer maps to a character from the corresponding alphabet. Also, it is assumed that the sizes of the two alphabets are M′ and M″, respectively. The other three input arguments are z, u, and v. Their meaning was described above. In this implementation these three arguments are assumed to be real numbers.
The algorithm starts by initializing the matrix M, which is of size M′ by M″, with zeros. It zeros the vector h′, which is a vector of size M′. It initializes the vector h″, a vector of size M″, with zeros as well. The three helper variables {circumflex over (z)}, û, and {circumflex over (v)} are initialized to 1.
The main loop of the algorithm goes from 1 to T, where T is the length of the two input sequences. If the sequence length is unknown, then the algorithm can read the sequences one character at a time until a timeout occurs or until a terminating character is reached.
The algorithm has two independent inner loops. The first inner loop divides the values of all elements of the vector h′ by z. This implements the division by z in formula (5.18). Because this algorithm works with real numbers, the conjugation in this formula can be dropped. Also, the multiplication by ak may not be performed explicitly since ak is binary (see the discussion below).
The incoming characters from both sequences are assumed to be integers. Formula (5.20) updates the value of one element of the h″ vector The multiplication by bk can be implicit in this algorithm because bk is either 0 or 1. In other words, the formulas described in this section use the exponentially weighted sequence b, but the underlying sequence {circumflex over (b)}=(b0, b1, . . . , bk) is binary. If bk=1 then the multiplication by bk can be skipped as anything multiplied by 1 is equal to itself. On the other hand, if bk=0, then the entire product is equal to zero so there is no need to perform the multiplication either. The algorithm can use the mutual exclusivity between the binary sequences that correspond to each element of the vector h″. For example, in the representation for the sequence S″=ABA shown in
The second inner loop of the algorithm updates the matrix by implementing formula (5.19). The value of h′i can be scaled by the current value of {circumflex over (v)} before the product is added to the corresponding element of the matrix. The value of h′i, however, is not modified. The multiplication by bk can be implicit here as well.
The helper variables {circumflex over (z)}, û, and {circumflex over (v)} can be updated by dividing each variable by the corresponding parameter z, u, or v. In other words, each update implements an exponential decay (or growth).
At the end of all iterations the algorithm returns the computed value of the matrix M, the vector h′, and the vector h″.
The computational complexity of this algorithm is O(TM′). This is the same complexity as with all previous encoding algorithms that are not performed on a parallel machine. In other words, the outer loop is executed T times and each of the two inner loops, which are independent of each other, is executed M′ times.
When studying these figures, recall that the contents of h′ decay with each iteration and that the rate of decay is controlled by z. Alternatively, if z=1, then the values in h′ don't decay (see
5.2 ZUV Decoding Algorithm
The decoding algorithm is justified by another special case of the concatenation theorem. In this case the prefixes of the two sequences are one character long. Let a=(a0u0, a1u−1, . . . , aT−1u−(T−1)) and b=(b0v0, b1v−1, . . . , bT−1v−(T−1)) be two exponentially weighted sequences of length T. Furthermore, let a′=(a0u0, 0, 0, . . . , 0) and a″=(0, a1u−1, a2u−2, . . . , aT−1u−(T−1)) be two other sequences of length T such that the sequence a is equal to the element-by-element sum of these two sequences, i.e., a=a′+a″. Similarly, let b=b′+b″, where b′=(b0v0, 0, 0, . . . , 0) and b″=(0, b1v−1, b2v−2, . . . , bT−1v−(T−1)) are two exponentially weighted sequences. The concatenation theorem, applied to these sequences, states that
a★b
+(z)=a′★b′+(z)+a″★b″+(z)+
As described in Section 4.7.2, for this special case, in which a′ and b′ have only one element that is not explicitly set to zero, the formula can be simplified to the following form
a″★b″
+(z)=a★b+(z)−
where a0u0 is the zeroth element of the sequence a.
The individual terms of formula (5.22) can be interpreted as follows:
In this case, Ma,b[0] is the value of the matrix element in row a and column b at the start of decoding. This is the same value that the encoding algorithm computed at the end of encoding. Ma,b[1] is the value of the same matrix element at the start of the next iteration. The term b+(z) can be interpreted as the value of the b-th element of the vector h″ at the start of decoding. For an arbitrary iteration, formula (5.23) can be stated as:
M
a,b
[k+1]=Ma,b[k]−
The decoding algorithm also needs to update the vector h″. Adapting formula (5.7) to exponentially weighted sequences we get
h″
b
[k+1]=(h″b[k]−bkv−k)z. (5.25)
To optimize the computation of the negative powers of u and v, we will use two helper variables û and {circumflex over (v)} such that û[k]=u−k and {circumflex over (v)}[k]=v−k. These variables are initially set to 1. During each iteration û is updated as follows: û[k+1]=û[k]/u. Similarly, {circumflex over (v)} is updated using the following recurrence: {circumflex over (v)}[k+1]={circumflex over (v)}[k]/v. Using these helper variables, the update formulas for the ZUV decoding algorithm can be stated as follows:
M
a,b
[k+1]=Ma,b[k]−
h″
b
[k+1]=(h″b[k]−bk{circumflex over (v)}[k])z. (5.27)
Note that the update of the matrix element is first, followed by the update of the vector h″.
The ZUV decoding algorithm has six input arguments. The first three arguments are the matrix M, the vector h″, and the character sequence S″. The other three arguments are the parameters z, u, and v, which were described above and after which the algorithm is named. All three arguments can be real numbers.
The algorithm can use two helper variables û and {circumflex over (v)} to compute the negative powers of u and v. Both of these can be initially set to 1. Their values can updated at the end of each iteration.
The main loop of the algorithm performs T iterations, where T is the length of the second sequence S″. To find the next character to decode, the algorithm iterates over all M′ rows of the matrix. For each row it also iterates over all M″ columns. In each of these iterations, the algorithm checks whether the elements of the vector h″, scaled by the current value of û, can be subtracted from their corresponding elements of the matrix without any of the matrix elements becoming negative. This condition must be true for all elements in the row. In other words, a single row element has veto power, which is suggested by the variable with the same name. If all elements in some row satisfy this condition, then the algorithm decodes the character that corresponds to this row. If no rows satisfy this condition, then the algorithms breaks out of its main loop and returns the partial sequence that has been decoded up to this point. If the elements of h″ are all zeros while the algorithm is searching for the next character to decode, the algorithm exits as well. In a way, this approach implicitly checks if T or the length of S″ is longer than the length of the sequences that were used to encode the matrix. If that were the case, then the vector h″ would be depleted before the last iteration and would contain only zeros.
Next, the algorithm performs the subtraction in formula (5.26). More specifically, it multiplies h″ by û and subtracts the resulting vector from the selected row of the matrix. This can be done in a loop that iterates over all elements of the row. Just in case, the algorithm may check if the new value of each row element is still positive. Finally, the algorithm appends the index of the decoded row to the output sequence S′. This process is repeated T times.
The incoming character from the second character sequence S″ can be stored in the variable b. Once again, it is assumed that the characters are uniquely mapped to the integers from 1 to M″. The value of the b-th element of the h″ is reduced by {circumflex over (v)}, as described by formula (5.27). The second part of this update, i.e., the multiplication by z that completes the left shift, is performed for all elements of h″. That is, formula (5.27) can be implemented by the algorithm in two parts; first the subtraction and then multiplication by z. Once again, because bk is binary, the multiplication by bk can be implicit. The same is true for the multiplication by ak in formula (5.26). This optimization can also be used during encoding and was explained in Section 5.1.
The algorithm also checks whether the element of h″ from which {circumflex over (v)} was subtracted becomes negative. If yes, then the algorithm exits and returns what was decoded up to that point. This condition should not be triggered if the same S″ is used for decoding as the one that was used during encoding.
After the last iteration the algorithm returns the decoded sequence S′. Note that the output sequence is not exponentially weighted. It is just a character sequence that is mapped to an integer sequence.
The computational complexity of this algorithm is O(TM′M″). If the search for the next character to decode is implemented to run in parallel, then the complexity can be reduced to O(TM″).
Finally,
5.3 ZUV Evaluation
Two sufficient conditions for deterministic ZUV decoding were derived. They depend only on the values of the parameters z, u, and v. These conditions are:
u·v≥2 or u≥2z. (5.28)
Because these two conditions are independent of each other, there are four possible cases depending on which one of them is satisfied or not satisfied. These four cases are listed in
6 Encoding and Decoding Algorithms for Sequences with Gaps
The algorithms described so far assumed that their input sequences are like words, i.e., that they contain no spaces. This chapter modifies the algorithms so that they can work with input sequences that are more like sentences, or strings, which may contain spaces. We will refer to these spaces as gaps. In the examples the gaps will be denoted with the underscore character, i.e., ‘_’. The algorithms introduced in this chapter are special cases of the ZUV algorithms when u=v=1. The ZUV algorithms for sequences with gaps are described in Chapter 7.
A gap can be modeled in several ways. One way is to treat the gap as yet another letter in the alphabet. In this case the algorithms do not have to be modified. The drawback of this approach is that the dimensions of the matrix have to be increased, i.e., both M′ and M″ have to be incremented by one, which requires additional storage for the matrices and also increases the amount of computation. This chapter models the gaps in a different way that keeps the alphabet size the same (much like the space symbol is not part of the English alphabet). As a result of this the matrix size remains the same, but the algorithms have to be modified. Understanding these changes and their effects on the encoding and decoding process could provide some valuable insights for understanding the continuous-time algorithms described in Chapter 8.
Character sequences can be represented with a collection of binary sequences.
6.1 The Encoding Algorithm (with Gaps)
The algorithm is similar to the previous encoding algorithms, but this one can handle sequences with gaps, while the previous ones cannot. The new modifications here are two if statements. The first one checks the incoming character on the sequence S′. If it is a gap, then the update of the vector h′ is skipped. The exponential decay of h′, however, is still performed at each iteration. The second if statement checks whether the incoming character on the sequence S″ is a gap, and if that is the case the updates of the vector h″ and the matrix M are skipped. The update of the helper variable {circumflex over (z)}, however, is performed during all iterations. In other words, a gap in S″ will suppress the update of h″, but the magnitude of {circumflex over (z)}, which will be added to h″ during the next iteration, will be properly updated.
6.2 The Decoding Algorithm (with Gaps)
Another feature of the algorithm is demonstrated on the third row of
The decoding algorithm is similar in structure to other decoding algorithms. The new things here are two if statements. The first one checks if the candidate character for decoding is a gap. If it is, then the vector h″ is not subtracted from any row of the matrix during this iteration. The second if statement checks whether the incoming character on the sequence S″ is a gap. If that is the case, then no element of h″ is decremented during this iteration. The location of the gaps in S″, however, does not affect the multiplication of all elements of h″ by z, which is always performed in the main loop.
The matrix row from which to subtract h″ is selected similarly to the other decoding algorithms. However, this algorithm is modified to return a null if no suitable row can be identified. This null character is treated as a gap, which is appended it to the output sequence. Another modification checks if the vector h″ contains only zeros. This case is also treated as a gap by the main algorithm. This condition is added in order to handle sequences that end with gaps more uniformly. Thus, if for some reason h″ is depleted and contains only zeros, the algorithm will output only gaps until the length of the output sequence reaches T. An alternative implementation is also possible in which the algorithm terminates immediately and returns the sequence decoded so far.
The computational complexity of this version of the decoding algorithm is O(TM′M″). In other words, the main loop runs for T iterations and during each one of them it calls the helper function, which runs in O(M′M″) time. The extra check during the search for the next decoded character does not affect the overall complexity because summing the elements of h″ takes only O(M″) time. If this search is implemented to run in parallel, then the overall complexity of the algorithm can be reduced to O(TM″).
7 ZUV Algorithms for Sequences with Gaps
This chapter describes modifications to the ZUV algorithms, which were introduced in Chapter 5, that enable them to work with sequences with gaps. These modifications are similar to the modifications that were added to the algorithms described in Chapter 6. In this case, however, gaps are introduced in input sequences that are exponentially weighted.
7.1 ZUV Encoding Algorithm (with Gaps)
The ZUV encoding algorithm with gaps is similar to the original version. The difference is that the encoding algorithm now checks if the current character in either of the two sequences is empty, i.e., if it is a gap. If this is the case for the character from the sequence S′, then the update for the vector h′ is skipped. If the character from the sequence S″ is empty, then both the update of the vector h″ and the update of the matrix M are skipped.
7.2 ZUV Decoding Algorithm (with Gaps)
The ZUV decoding algorithm with gaps is similar to the non-gap version. In this version, however, the character that is decoded during the current iteration can be a gap. Also, the incoming character from the second sequence can be a gap as well. The corresponding updates of the matrix M and the vector h″ are skipped in these cases.
7.3 Evaluation Results for ZUV with Gaps
This section describes an evaluation of the ZUV decoding algorithm that focuses on the case when the sequences may contain gaps.
In
7.4 Another Evaluation with Suffix Filtering of S″
This section repeats the exhaustive enumeration analysis from the previous section, but now the sequence S″ cannot end with a gap (or several gaps in a row).
To verify that this is indeed the reason for the aliasing reported in
All results in this chapter are for M′=2 and M″=2. It is sufficient to perform this analysis only for 2×2 matrices, because if a ZUV model is aliased for 2×2, then it will also be aliased for larger matrices, given that the values of z, u, and v remain the same. To summarize, if u>2z and either vz>2 of vz<1/2, then the ZUV decoding will be perfect, provided that the S″ sequence does not end with a gap.
7.5 Example for T=3
This section gives an example with sequences of length three that shows how a condition for unambiguous decoding of the ZUV model can be derived. This example uses a pair of binary channels a and b, instead of using character sequences. These channels can be viewed as representations of sequences drawn from alphabets that consist of only one character. That is, zeros in a and b correspond to gaps and ones correspond to characters. This example covers only the initial iteration and shows that the first element of a is decoded correctly. The parameters z, u, and v are assumed to be non-zero real numbers.
Let T=3 and let a, b ∈ {0, 1}T be two binary sequences of length T. In this case the ZUV decoding algorithm performs three iterations. For each iteration, the value of dh″b is given by the following equations:
d
h″
b[0]=b0v0z0+b1v−1z−1+b2v−2z−2, (7.1)
d
h″
b[1]=b1v−1z0+b2v−2z−1, (7.2)
d
h″
b[2]=b2v−2z0. (7.3)
The value of dMa,b[0] can be expressed as follows:
If a0=1, then the decoding constraint dMa,b[0]−dh″b[0]≥0 is satisfied because dh″b[0] is already a part of the sum in equation 7.4. Thus, we only need to focus on the case when a0=0 and derive the conditions for which the constraint dMa,b[0]−dh″b[0]≥0 is satisfied. That is, we need to find values for z, u, and v such that if a0=0, then the following inequality must hold:
dMa,b[0]<dh″b[0]. (7.5)
Expanding the value of dMa,b[0] as specified by (7.4) transforms (7.5) into the following inequality:
a
0
u
0 d
h″
b[0]+a1u−1 dh″b[1]+a2u−2 dh″b[2]−dh″b[0]<0. (7.6)
Subsequently, we can plug a0=0 and expand dh″b[0], dh″b[1], and dh″b[2] as follows:
a
1
u
−1(b1v−1z0+b2v−2z−1)+a2u−2(b2v−2z0)−(b0v0z0+b1v−1z−1+b2v−2z−2)<0. (7.7)
The terms in the previous inequality are grouped by the elements of a. Instead, we can regroup them by the elements of b as shown below:
−b0v0z0+b1v−1(a1u−1z0−z−1)+b2v−2(a1u−1+a2u−2z0−z−2)<0. (7.8)
Furthermore, in each of the terms it is possible to rearrange the powers of u, v, and z so that the inequality is expressed using integer powers of (uv) and u/z:
Multiplying both sides by −1, leads to the following alternative form:
This inequality is easier to express if we let w=u v and x=u/z:
b
0
w
0(x0)+b1w−1(x1−a1x0)+b2w−2(x2−a1x1−a2x0)>0. (7.11)
Furthermore, because a1, a2 ∈ 0, 1], a lower bound can be derived for the left-hand side of inequality (7.11). That is, if we set all a's to 1, then we get:
b
0
w
0(x0)+b1w−1(x1−x0)+b2w−2(x2−x1−x0)>0.
Therefore, a sufficient condition for (7.11) to be satisfied is for the left-hand side of the previous inequality to be positive.
Assuming that at least one of b0, b1, or b2 is nonzero, which is required to have a nonzero Ma,b, the previous inequality holds if each of the three expressions in the parentheses is positive. In other words, a sufficient condition for (7.12) to hold is that the following system of inequalities holds:
x0>0, (7.13)
x
1
−x
0>0, (7.14)
x
2
−x
1
−x
0>0. (7.15)
All three inequalities are satisfied if x>1.618. This constant, however, depends on the sequence length. If we let T→∞, then we can use an argument based on the formula for the sum of a geometric progression to prove that the larger system of inequalities is satisfied for each x≥2. In other words, a0 will be correctly decoded provided that uv>0 and u/z≥2 and at least one bi is 1 for i ∈ {0, 1, . . . , T−1}. This argument is generalized below to decoding of all elements of a using mathematical induction.
7.6 Decoding Theorems for the ZUV Model with Gaps
The following theorem proves that ū/z>2 is a sufficient condition for the correct decoding of the first element a0 of the binary sequence a, given the matrix element Ma,b and the vector element h″b. The proof examines two cases depending on the value of a0. If a0=1, then the proof shows that Ma,b−h″b≥0. On the other hand, if a0=0, then the proof shows that Ma,b−h″b<0. This is accomplished by using the formulas for the sum of a geometric progression to derive an upper bound for that difference.
The rest of this section uses a★b(u,v)(z) to denote the value of the matrix element Ma,b and b(v)(z) to denote the value of the vector element h″b. This notation captures all parameters that affect these values, which makes it more convenient to use in the proofs. The superscripts capture the exponential weighting applied to the elements of the binary sequences a and b. This notation also uses instead of + to denote the unilateral z-transform. More formally,
Theorem 7.1. Sufficient conditions for decoding of the first element of a binary sequence. Let a and b be two binary sequences of length T, i.e.,
a=(a0, a1, a2, . . . , aT−1) ∈ {0, 1}T, (7.18)
b=(b0, b1, b2, . . . , bT−1) ∈ {0, 1}T. (7.19)
In other words, if Ma,b−h″b is non-negative, then a0=1. On the other hand, if Ma,b−h″b is negative, then a0=0.
Theorem 7.1 implies that the ZUV decoding algorithm always decodes the first element of S′ correctly whenever ū/z≥2 and vz>0. This is true even if the S” sequence given to the algorithm at run time is not identical to the S″ sequence used for encoding.
The following theorem generalizes Theorem 7.1 to all elements of the binary sequence a.
for all t ∈ {0, 1, 2, . . . , T−1}. In this formula a[t, T−1] and b[t, T−1] denote the suffixes of a and b that start from at and bt, i.e.,
a[t, T−1]=(at, at+1, at+2, . . . , aT−1), (7.22)
b[t, T−1]=(bt, bt+1, bt+2, . . . , bT−1). (7.23)
Note that the unilateral z-transform formulas in (7.21) map to the values of Ma,b and h″b at at iteration t during decoding. That is, formula (7.21) is similar to (7.20), but it covers the general case. In other words, it covers not just the case when t=0, but also the cases when t=1, 2, . . . , T−1.
In general, the problem of decoding a given b, h″b, Ma,b, z, u, and v is ill-posed. There may be many solutions. However, under the conditions of Theorem 7.2 the decoding problem is well-posed, i.e., there is a unique solution and the decoding of a is perfect.
The next theorem generalizes Theorem 7.2 to a complete ZUV matrix. It states that if ū≥2z, vz>0, and the last character of S″ is not a gap, then the decoding is perfect, given that S″ is provided at run time. That is, under these conditions, there is an unique decoding path and there is no need for additional constraints, e.g., row constraints, because each element is always in agreement with all other elements in the same matrix row.
S′ ∈ {Γ′ ∪ ϵ}T. (7.24)
Let S″ be a sequence of length T drawn from Γ″ such that each element of S″ may be a gap, except for the last element, which is not a gap. More formally,
S″ ∈ {Γ″ ∪ ϵ}T−1×Γ″. (7.25)
Let u, v, and z be three non-zero complex numbers such that the following two conditions are satisfied:
(i) ū/z≥2, (7.26)
(ii) vz>0. (7.27)
Let M be an SSM matrix and let h″ be its corresponding vector computed by the ZUV encoding algorithm. Let Ŝ′ be a sequence computed by the ZUV decoding algorithm from M, h″, and S″. Then, Ŝ′=S′.
7.7 Distributed ZUV Encoding
This section states distributed versions of the ZUV algorithms. The encoding version is distributed by the elements of the matrix. The decoding version is distributed by the rows of the matrix.
The distributed ZUV encoding algorithm encodes just one matrix element, which is denoted with m to distinguish it from the entire matrix M. To encode the whole matrix one needs to run a separate instance of this algorithm for each matrix element. This distributed encoding possibility was mentioned several times. In fact, all encoding formulas were derived for a channel pair, where the two channels were called a and b. In this implementation the binary channel pair is (s′, s″). Note that these are labeled with small letters to distinguish them from S′ and S″, which denote character sequences. The complexity of this algorithm is O(τ), where T is the length of both s′ and s″.
7.8 Distributed ZUV Decoding
The computation in the ZUV decoding algorithm can be distributed by rows. To decode the entire matrix the distributed ZUV decoding algorithm decodes each row in parallel.
The algorithm has 6 inputs. The first input is m, which is an array that holds the values of the matrix elements in one row of the matrix. The second argument is the vector h″. The third argument is S″, which is the English sequence represented as a set of binary channels. Note that it is 2D in this case and the indexing is S″j,t, where j is one of the M″ channels and t is the current index into all channels. The last three arguments are z, u, and v, which control the exponential decay as usual. In this case, however, these can be arrays, not just numbers. Thus, this algorithm makes it possible to handle the case in which each element of the matrix has a different z, u, and v.
8 Continuous-Time Formulation for Spike Trains
This chapter derives the mathematical expressions for the SSM representation when the inputs are spike trains instead of discrete sequences. In this continuous-time formulation the temporal distances between the spikes are represented with real numbers and not with integers as in the discrete case. The proofs are analogous to the proofs in the discrete-time case, but now the formulas use functions instead of sequences.
8.1 The Continuous Cross-Correlation
The continuous cross-correlation has similar properties to the discrete cross-correlation, but it works with functions of time instead of discrete sequences. This section defines this operation and states some of its basic properties.
(f★g)(t)=∫−∞∝
where
The result of the continuous cross-correlation is a function, which is called the cross-correlation function (CCF). Formula (8.1) gives the value of the CCF at only one point. To get all values of this function we need to evaluate this formula for all real t.
If both f and g are real-valued functions, then the definition simplifies to:
(f★g)(t)=∫−∞∞f(τ)g(τ+t)dτ. (8.2)
In other words, the value of the function f no longer has to be conjugated. In fact, only f needs to be a real function; g can still be a complex-valued function.
(x+y)★(u+v)=x★u+x★v+y★u+y★v, (8.3)
where x, y, u, and v are complex functions with one real argument. This expression is true if the four cross-correlations in the right-hand side are well defined, i.e., given that the following four inequalities hold:
|∫−∞∞
|∫−∞∞
8.2 The Laplace Transform
This section defines the Laplace transform and states some of its properties that are relevant to the topic of this chapter.
f(s)=∫0∞f(t)e−stdt, (8.6)
where s is a complex number.
If f(t)=0 for t<0, then the Laplace transform can also be defined as follows:
f(s)=∫0∞f(t)e−stdt. (8.7)
The notation {f(t)} or {f} is typically used to denote the Laplace transform of the function f(t). This notation, however, is for the entire transform, which includes all values of s. In many of our formulas, however, we need only one value of the transform at one specific s, e.g., s=1. To specify that we can add an extra set of parentheses, i.e., {f}(s). We will use this notation in some of the formulas, but it is somewhat cumbersome. More often we will use a simpler notation in which the curly brackets are omitted and the function name is used as a subscript of . That is, f(s) will be used to denote the value of the Laplace transform of the function f(t), where the transform is evaluated at s.
f(s)=∫−∞∞f(t)e−stdt, (8.8)
where s is a complex number. The lower limit of the integral is now −∞ instead of 0−.
In the discrete case we used two different symbols, and +, to denote the bilateral and the unilateral z-transform of a sequence. In the continuous case the accepted notation is to use for the unilateral Laplace transform, which is typically called simply the Laplace transform. The bilateral Laplace transform is rarely used, but in order to distinguish between the two the symbol is often used. In other words, to complete the analogy with the discrete case, corresponds to + and corresponds to .
f+g(s)=f(s)+g(s), (8.9)
provided that f(s) and g(s) are well defined. Also, if c is a complex scalar, then
c f(s)=c f(s). (8.10)
Property 8.6. The Laplace transform of the Heaviside function H(t) is equal to 1/s. That is,
where H(t) is defined using the following formula:
Then, for each s in the domain of the Laplace transform of f, the value of the Laplace transform of g at s can be obtained by multiplying the value of the Laplace transform of f at s by e−as. More formally,
g(s)=e−asf(s), for each s ∈ domain(f). (8.14)
Theorem 8.8. The left-Shift Theorem for the Laplace Transform.
g(t)=f(t+a). (8.15)
Then, for each s in the domain of the Laplace transform of f the value of the Laplace transform of g at s can be computed using the following formula:
8.3 Dirac's Delta
The delta function, which is also often called Dirac's delta, is the standard way to model an impulse. Dirac's delta is usually modeled as the limit of a sequence of template functions of decreasing width and increasing height. The following definition introduces one such sequence.
Property 8.11. The Laplace transform of Dirac's delta is equal to 1 for any s, i.e., {δ(t)}(s)=1.
Note that Property 8.11 is true only if the lower limit of the integral is 0−, which is how the Laplace transform is defined. If that limit is set to 0, then only the right half of the template δn will be included in the region of integration and the result will be ½ instead of 1, i.e.,
The formulation described so far can be used to model a single spike and only if this spike is at time t=0. To model a spike at t=t0 we can shift the template function δn(t) by t0 to the right, i.e., we can use δn(t−t0). This shifted template function, which is shown in
Definition 8.13. The Laplace transform of δ shifted by t0 is defined as the function obtained by taking the limit of the sequence of Laplace transforms of each function in the model sequence for shifted δ as defined in Definition 8.12. More formally,
Property 8.14. The Laplace transform of a shifted Dirac's delta is equal to:
Theorem 8.15. Let f(t) be a complex function of a real argument and let t0 ∈ R be a real number such that the limit L of f(t) as t→t0 exists and is finite, i.e.,
provided that the limit in the left-hand side of (8.24) is well defined.
8.4 Modeling Spikes and Spike Trains
A spike is an event that has a limited temporal extent. We will model a spike that occurs at time t0 with a shifted Dirac's delta. The model for approximating the shifted Dirac's delta was defined in Section 8.3 as a sequence of progressively narrowing and peaking template functions δn(t−t0) as n→∞, where each shifted template function is defined as:
A spike train is a collection of spikes that are generated on the same channel. We will use the notation b=(b1, b2, . . . , bK) to denote a spike train b that has K spikes that occur at times b1, b2, . . . , bK. This notation assumes that the spike times are sorted in increasing order and that there are no duplicates in this list. We will model the spike train b as a sequence of functions b(n)(t), where each function is obtained by summing K shifted template functions δn(t−bk). The following definition states this more formally.
By analogy we can define the spike train a=(a1, a2, . . . , aJ) that contains J spikes that occur at times a1, a2, . . . , aJ as the sum of J shifted template functions, where the shifts are equal to the times at which the spikes occur. In other words,
In this case the number of spikes is J and the shifted template function is δm, which is defined as
Note that this chapter uses 1-based indexing for the spikes in the spike train, while the previous chapters used 0-based indexing for the elements of a sequence. Another difference is that in the discrete case there is a one-to-one correspondence between the index of an element and its temporal location in the sequence. In the continuous case the index of the spike does not correspond to the time at which the spike occurs. It is just an index into a list of times that don't occur at regular intervals and there is no formula for converting from spike indices to spike times. In other words, aj is the time at which the j-th spike occurred on channel a and j is just the index of that spike in the list of times that specify the spike train a=(a1, a2, . . . , aJ).
8.5 Operations on Spike Trains
This section defines some operations on spike trains and pairs of spike trains. These operations are used and extended in later sections.
8.5.1 The Laplace Transform of a Spike Train
As described above, a spike train can be approximated with a sum of shifted template functions. For example, the spike train a=(a1, a2, . . . , aJ), which has J spikes that occur at times a1, a2, . . . , aJ, can be approximated with the function a(m)=(a1, a2, . . . , aJ) in which each spike is modeled with the shifted template function δm(t−aj) that was defined in formula (8.29). For each m<<∞ the template δm has a nonzero width and a(m) can be treated just like any regular function. In particular, the Laplace transform of a(m) can be evaluated using the standard formula. As m approaches infinity, however, the Laplace transform of the spike train is defined as shown below.
In other words, the Laplace transform of the spike train a can be obtained from its approximation a(m), in which shifted templates δm of height m and width 1/m are used to model the spikes, and then taking the limit as m→∞. This derivation is shown below:
The Heaviside function is used to change the lower bound of the integral from 0− to −∞ in the fourth line of formula (8.31). The value of the integral remains the same because everything in the interval −∞ to 0− will be multiplied by 0, i.e., H(t−0−)=0 for t<0−. Note that 0− is used in both the integral and H to prevent cutting the δ-templates in half if aj=0.
To summarize, the value of the Laplace transform at s of the spike train a=(a1, a2, . . . , aJ), which has J spikes, is equal to:
If we assume that aj≥0 for all j=1, 2, . . . , J (i.e., if we assume that the spike train is causal), then the Heaviside function always evaluates to 1 and the previous expression simplifies to:
That is, a(s) is equal to the sum of J exponentials of the form e−sa
By analogy, the value of the Laplace transform at s of the spike train b=(b1, b2, . . . , bK), which has K spikes, is equal to:
Once again, if bk≥0 for all k=1, 2, . . . , K, then the Heaviside function is equal to 1 and this formula simplifies to:
8.5.2 The Cross-Correlation of Two Spike Trains
This section gives a mathematical formulation for the cross-correlation of two different spike trains. Let a=(a1, a2, . . . , aJ) be the first spike train, which consists of J spikes that occur at times a1, a2, . . . , aJ. Similarly, let b=(b1, b2, . . . , bK) be the second spike train, which has K spikes that occur at times b1, b2, . . . , bK.
The spikes on the first spike train will be modeled with the template function δm, which is defined as:
The spikes on the second spike train will be modeled with a different template function, δn, which is defined as:
In this case, n determines the height of the template for the second spike train, which may be different from the height m of the template for the first spike train.
As described above, the notation a(m)=(a1, a2, . . . , aJ) will be used to denote the approximation for the spike train a that is modeled with the template δm. The value of a(m)(t) is given by:
Similarly, the notation b(n)=(b1, b2, . . . , bK) denotes an approximation for the spike train b that uses the template δn. This approximation can be represented as follows:
The cross-correlation of a(m) and b(n) is formally defined below. Note that in (8.40) the conjugation in
(a(m)★b(n))(t)=∫−∞∞
For any m<<∞ and any n<<∞ the templates δm and δn have some temporal extent and the integral in (8.40) can be evaluated for a specific value of t in the usual way. For the cross-correlation of idealized spike trains, however, a different approach is needed that can be applied when there are two limits, i.e., when m→∞ and n→∞. In this case, both δm and δn tend to the delta function δ, but they do this independently of each other. This is addressed more formally in the next section in the context of the Laplace transform.
8.5.3 The Laplace Transform of the Cross-Correlation of Two Spike Trains
The Laplace transform of the cross-correlation of two spike trains a and b is defined using iterated limits of the Laplace transform of the cross-correlation of a(m) and b(n) as the width of the template δm and the width of the template δn tend to zero. A formal definition is stated below.
Using this definition we can derive a closed-form formula for the value of the Laplace transform of the cross-correlation of two causal spike trains, evaluated at s. This derivation is shown below.
Note that the conjugation in
The last step in formula (8.42) used the following substitution:
We will show that for each t0 ∈ the limit of fk(τ) as τ→t0 exists and is finite. This is done by deriving a closed-form expression for its value. Using the variable substitution {circumflex over (τ)}=bk−τ we can express this limit as follows:
Finally, we can substitute the result from (8.44) into (8.42) and then use Theorem 8.15 to express the value of a★b(s) in closed form as follows:
Thus, the formula for the Laplace transform of the cross-correlation of two causal spike trains is:
This formula filters spike pairs for which the spike in the second train precedes the spike in the first train. This filtering is done using the Heaviside function, which acts as an open bigram filter. Because the value of H(bk−aj) can be only 0 or 1, this expression reduces to a sum of exponentials. Each exponential in this sum is of the form e−s(b
There is a clear analog between formula (8.46) and the formula for discrete sequences. The main difference is that here the iterations are over spikes, where aj and bk are the times at which they occur, while in the discrete case the iterations are over sequence elements that are assumed to occur at fixed time intervals. Another difference is that here the number of spikes on the a and b channels don't have to be the same. In the sequence domain, however, the two sequences are usually assumed to have the same number of elements.
There is an interesting relationship between the argument s of the Laplace transform and the argument z of the unilateral z-transform. If we assume that the time is discretized, then the two are related as follows: s=ln z. Under these conditions, formula (8.46) produces identical results to the formula for the discrete case.
To better understand what formula (8.46) does we will take a closer look at two special cases. In the first special case s=ln 2. This is analogous to encoding with z=2 in the discrete case. In this case the formula simplifies to:
In the second special case, s=ln 1=0, which corresponds to encoding with z=1 in the discrete case. Now, formula (8.46) simplifies to:
The inner sum in this expression adds the number of spikes in the spike train b that occur after the j-th spike in the spike train a. The outer sum adds up these results for all j. Another way to interpret (8.48) is as follows:
In other words, the decaying exponential in this case reduces to a constant function f(t)=1.
It is worth mentioning that while formula (8.46) is the most compact way to state the value of the Laplace transform of the cross-correlation of a and b, this formula may obscure the elegance of the encoding algorithm that is described below. This expression computes the value of one matrix element, but does it very inefficiently. The reason is that the double sum iterates over all spikes on the first channel a and over all spikes on the second channel b. The encoding algorithm computes the same result but it does not need to enumerate all possible pairs of spikes. Instead, it performs the computation incrementally using a single pass through both spike trains. This results in a fast and elegant algorithm.
The interplay between the Heaviside function and the indices j and k is not easy to decouple. Nevertheless, the Heaviside function simplifies the expressions so that they are easier to manipulate. Some of the following sections will use this formula. But, once again, from an algorithmic point of view a direct implementation of formula (8.46) is not advisable.
8.6 Operations on Truncated Spike Trains
In some cases it is necessary to work with only a subsection of some spike train. We will use the notation b[t1, t2] to denote a truncated spike train that is derived from the spike train b by keeping only the spikes that occur in the temporal interval [t1, t2] and removing all other spikes. For example, if b=(b1, b2, . . . , bK), then b[t1, t2]=(bp, bp+1, . . . , bq), where p=min{k:bk≥t1} and q=max{k:bk≤t2}. This notation can be extended to open intervals as well. Note that the truncated spike train may have fewer spikes, but the remaining spikes are not shifted in time.
8.6.1 Modeling Truncated Spike Trains
A truncated spike train is defined similarly to a regular spike train (see Definition 8.17), but now the train is truncated using two Heaviside step functions. The first function cuts all spikes that occur before time t1. The second function cuts all spikes that occur after time t2. The following definition states this more formally.
b
[t
, t
]
(n)(t)=Π(t−t1−)Π(t2+−t)b(n)(t). (8.50)
To ensure that spikes that occur exactly at t1 or exactly at t2 are included in the truncated train, the definition uses left and right limits for these two boundaries. That is, it uses t1− as the left boundary and t2+ as the right boundary in the Heaviside functions. Because b(n)(t) is modeled as a sum of shifted template functions δn, this is needed to include the entire region where δn(t−τ) is non-zero into the region of integration as n approaches infinity even if τ=t1 or τ=t2.
To understand how the truncation process works, it is useful to study the interaction of two Heaviside step functions.
Using the properties of the limit, formula (8.50) can also be stated in the following alternative form:
Furthermore, by combining Definition 8.21 and Definition 8.17, which defines the value of b(n)(t), we get:
For open-ended intervals this formula can be adjusted as follows:
Note that the superscript pluses and minuses in these formulas are useful only when each formula is embedded in the limit of an integral as n→∞. Also, note that in these cases
is the innermost limit. This is illustrated in the following sections.
8.6.2 The Laplace Transform of a Truncated Spike Train
This operation on truncated trains is defined similarly to the Laplace transform of regular spike trains (see Definition 8.18). In this case, however, the Laplace integral is extended with two Heaviside functions that perform the truncation.
If we combine Definition 8.22 and Definition 8.21 we can expand (8.56) and derive an explicit formula for the Laplace transform of the truncated spike train b[t1, t2]. This derivation, which uses some properties of the Heaviside function, is shown below.
This derivation is similar to (8.31). The main difference is that, because the train is truncated, there are now three Heaviside functions instead of just one. Finally, we can apply Theorem 8.16 because e−st is a continuous function.
Next, we will derive three special cases of formula (8.57). The first special case computes the Laplace transform of the truncated spike train b[0, t]. In this case it is assumed that the original spike train b=(b1, b2, . . . , bK) is causal (i.e., bk≥0 for all k) and that t1=0 and t2=t. That is, only the tail of the spike train is cut after time t. Under these conditions H(bk−t1)=H(bk−0)=1 and H(bk)=1. Thus, formula (8.57) simplifies as follows:
The second special case computes the Laplace transform of the truncated spike train b[t, T]. In this case it is assumed that the original spike train b=(b1, b2, . . . , bK) is causal and that bk≤T for all k={1, 2, . . . , K}, i.e., all spikes occur no later than time T. Under these assumptions formula (8.57) simplifies as follows:
The third special case is similar to the second case, but now both sides of (8.59) are multiplied by est. This leads to the following expression:
This formula can be viewed as a special case of the left-shift theorem, i.e., Theorem 8.8, when the shifted function is a spike train. To see this, we can represent b[t, T] as the following difference:
b[t, T]=b[0, T]−b[0, t). (8.61)
Taking the Laplace transform of both sides we get:
{b[t, T]}(s)=b[0, T]}(s)−{b[0, t)}(s). (8.62)
Finally, we can multiply both sides by est and use the fact that {b[0, T]}(s)=b(s) to derive:
e
st
{b[t, T]}(s)=est(b(s)−{b[0, t)}(s)). (8.63)
Note that the right-hand side is similar to the right-hand side of (8.16). Thus, the left-hand side can be viewed as the Laplace transform of the truncated spike train b[t, T] that has been shifted to the left by t. In other words, assuming that the integration variable for the Laplace transform is τ, we get:
8.6.3 The Laplace Transform of the Cross-Correlation of Two Truncated Spike Trains
The definition for the cross-correlation of two truncated spike trains is similar to Definition 8.19, but uses the truncation notation. Once again, the conjugation can be dropped because the truncated spike trains are also modeled with shifted template functions, which are real-valued. The following definition states this more formally.
(a[t
The next definition, which is similar to Definition 8.20, formalizes the Laplace transform of the cross-correlation of two truncated spike trains.
This definition is used below to derive a closed-form formula for the Laplace transform the cross-correlation of two truncated spike trains. To reduce the length of the formulas, however, we will introduce two shortcut functions Fm and G n that are defined as follows:
F
m(t1, t2, aj, t)=H(t−t1−)H(t2+−t)δm(t−aj), (8.67)
G
n(τ1, τ2, bk, t)=H(t−τ1−)H(τ2+−t)δn(t−bk). (8.68)
Using the functions Fm and Gn, the first step of this derivation is to express the Laplace transform of the cross-correlation of a[t1, t] and b[τ1, τ2], which will be denoted with L, as follows:
The last step above used the shorthand notation gk(τ), which can be expressed as:
The value of
exists and is finite. The value of this limit in closed form is:
Now we can derive the final formula:
Next, we will derive two special cases of formula (8.72) that will be used in the following sections. In the first case it is assumed that t1=τ1=0 and t2=τ2=t and that aj≥0 and bk≥0 for all j and for all k. In other words, the two original spike trains a and b are causal and they are truncated at the same ending time t. Under these conditions formula (8.72) simplifies as follows:
In the second special case it is assumed that t1=τ1=t and t2=τ2=T and that all spikes in the original trains a=(a1, a2, . . . , aJ) and b=b2, . . . , bK) occur no later than time T, i.e., aj≤T and bk≤T for all j and for all k. Under these conditions formula (8.72) simplifies as shown below:
8.6.4 Modeling Reversed Spike Trains
In some cases the spikes in a spike train need to be temporally reversed. This section defines this operation and also the Laplace transform of a reversed spike train.
for each m ∈+={1, 2, . . . }. The time of the n-th spike in is given by the following formula:
()n=T−aJ+1−n, for each n ∈ {1, 2, . . . , J}. (8.76)
The reversed spike train can also be expressed as follows:
=(()1, ()2, . . . , ()J)=(T−aJ, T−aJ−1, . . . , T−a2, T−a1). (8.77)
It should be noted that the notation is useful only if the interval for reversal is specified. By default, it will be assumed that this interval is [0, T], i.e., =[0, T]. If that is not the case, then the interval must be explicitly provided. Also, if the right bound of the interval is not equal to T, then the original spike train must be truncated before it can be reversed (see below).
{[0, T]}(s)=e−sTa(−s). (8.78)
Definition 8.27. Truncated and reversed spike train. Let a=(a1, a2, . . . , aJ) be a causal spike train and let t≥0 be a real number. The truncated and reversed spike train [0, t] is obtained from a by reversing the times of the spikes on the interval [0, t]. The model for is a sequence of functions ([0, t](1)(τ), [0, t](2)(τ), . . . , [0, t](m)(τ), . . . ), where each function [0, t](m)(τ) is obtained by truncating and reversing a(m)(τ) on [0, t]. More formally,
for each m ∈+={1, 2, . . . }.
{[0, t]}(s)=e−st{a[0, t]}(−s). (8.80)
8.7 The Concatenation Theorem for Spike Trains
This section derives the concatenation theorem for spike trains. The derivation uses the result from Section 8.5.3, which derived the Laplace transform of the cross-correlation of two spike trains with finite number of spikes. This section also states several corollaries of the concatenation theorem for pairs of spike trains that meet certain conditions.
a′=(a1, a2, . . . , ap), (8.81)
a″=(ap+1, ap+2, . . . , aJ), (8.82)
where
p=max{j:aj≤C}. (8.83)
Note that by combining the list of spike times in a′ and a″ we can get back the original list of spike times in a, i.e., a=a′∥a″, where denotes concatenation.
b′=(b1, b2, . . . , bq), (8.84)
b″=(bq+1, bq+2, . . . , bK), (8.85)
where
q=max{k:bk<C}. (8.86)
a′←a[0, C], a″←a(C, √),
b′←b[0, C), b″←b[C, ∞). (8.87)
a★b(s)=a′★b′(s)+a″★b″(s)+
It is worth mentioning that the splitting of the two trains as stated in the theorem is designed to reduce the number of special cases that have to be considered if there are spikes that occur exactly at time C, which is the time of the split. This split reduces these cases from 4 to 1.
The concatenation theorem is stated with conjugations to keep the final formula similar to the formula for the discrete case. Another reason is that conjugations are needed in Chapter 9, which extends the theory to weighted spike trains. In this chapter, however, each spike is modeled with a shifted template function that always returns a real number. Therefore, the conjugations can be dropped. Thus, another way to state the concatenation theorem for spike trains is:
a★b(s)=a′★b′(s)+a″★b″(s)+a′(−s)b″(s). (8.89)
Note that the concatenation theorem implicitly binds two types of abstractions. The first abstraction is a list that contains the spike times for some spike train. Lists can be concatenated and truncated. The second abstraction is a sequence of functions that models a spike train, where each spike is modeled with a shifted template function δn(t−t0). Instead of concatenation, this abstraction allows for addition, which can be used to combine models. A spike train modeled in this way can also be truncated, but this requires the use of left- or right-limits with the Heaviside functions to correctly handle spikes that fall on one or both of the truncation boundaries. Without these limits the two types of abstractions lead to different results under some conditions. The list abstraction is used in the algorithms that are described later. The second abstraction is used to derive the theory and its mathematical formulas, which use the properties of the Laplace transform.
8.7.1 Special Cases of the Concatenation Theorem for Spike Trains
This section states two special cases of the concatenation theorem for spike trains as corollaries of Theorem 8.29.
The first corollary is a special case of the concatenation theorem when the two spike trains are split such that the suffix a″ is empty and the suffix b″ contains just one spike.
a★b(s)=a′★b′(s)+(s), (8.90)
where denotes the spike train obtained by reversing the spikes in a in the interval [0, T] (see Definition 8.25). More formally, the time of the n-th spike in is given by
()n=T−aJ+1−n, for n=1, 2, . . . , J, (8.91)
and the reversed spike train is given by
=(T−aJ, T−aJ−1, . . . , T−a2, T−a1). (8.92)
The second corollary is a special case of the concatenation theorem when the two spike trains are split such that the prefix a’ contains just one spike and the prefix b′ is empty.
a★b(s)=es a
8.7.2 Special Cases of the Concatenation Theorem for Truncated Spike Trains
The concatenation theorem also applies to truncated spike trains. The next two corollaries form the mathematical basis for the algorithms that are described later in this chapter.
{x[0, yn]★y[0, yn]}(s)={x[0, yn−1]★y[0, yn−1]}(s)+{[0, yn]}(s), (8.94)
where [0, yn] denotes the spike train obtained by reversing the truncated spike train x[0, yn] in the interval [0, yn]. In other words,
[0, yn]=(yn−xp, yn−xp−1, . . . , yn−x1), (8.95)
where p=max{j:xj≤yn}.
{x[0, y1]★y[0, y1]}(s)={[0, y1]}(s). (8.96)
Corollary 8.33. Let x=(x1, x2, . . . , xJ) and y=(y1, y2, . . . , yK) be two causal spike trains such that their spikes occur no later than time T, i.e., 0≤xj≤T and 0≤yk≤T for all j ∈ {1, 2, . . . , J} and for all k ∈ {1, 2, . . . , K}. Then, for any integer m ∈ {1, 2, . . . , J−1} the following formula holds:
{x[xm, T]★y[xm, T]}(s)=es x
Furthermore, in the special case when m=J, it has the following form:
{xJ, T]★y[xJ, T]}(s)=es x
The following two properties show that, under some conditions, the Laplace transform of the cross-correlation of two truncated spike trains is identical to the Laplace transform of the cross-correlation of two slightly different truncated spike trains. These properties were used to prove Corollary 8.32 and Corollary 8.33, which were stated above.
{x[0, t]★y[0, τ]}(s)={x[0, τ]★y[0, τ]}(s). (8.99)
{x[t, T]★y[τ, T]}(s)={x[t, T]★y[t, T]}(s). (8.100)
In other words, the Laplace transform of the cross-correlation of x[t, T] and y[τ, T] is equal to the Laplace transform of the cross-correlation of x[t, T] and y[t, T]. Another way to state this is that the spikes from y that fall in the temporal interval [τ, t) don't affect the result.
8.8 The SSM Model
The SSM model consists of three components: a matrix M, a vector h′, and a vector h″. In general, the matrix is of size M′×M″, h′ is a column vector of size M′, and h″ is a row vector of size M″. To make this more concrete, we will assume that M′=M″=2.
In this example the model is computed from four causal spike trains that are denoted with α, β, A, and B. Each element of the matrix is computed from two spike trains where the first train is denoted with a Greek letter and the second train is denoted with an English letter. The first element of h′ is computed from the spike train α and its second element is computed from the spike train β. Similarly, the first element of h″ is computed from the spike train A and its second element is computed from the spike train B.
8.8.1 The Model at the End of Encoding
At the end of encoding each element of the matrix is equal to the value of the Laplace transform of the cross-correlation of the corresponding pair of spike trains. Each element of the vector h′ is equal to the value of the Laplace transform of the corresponding spike train, which is denoted with a Greek letter, after this spike train has been reversed in the interval [0, T]. Finally, each element of the vector h″ is equal to the value of the Laplace transform of the corresponding spike train that is denoted with an English letter.
Using the formulas derived in the previous sections, the values of the three components of the SSM model at the end of encoding can also be stated as:
8.8.2 The Model at a Specific Time During Encoding
In some cases it is useful to know the value of a specific element of the model at a specific time during the encoding process. The previous formulas do not provide this information because they express the value of each element at the end of encoding.
Because the encoding algorithm does a single pass through all spike trains, going forward in time, the new formulas are given in terms of truncated spike trains. The truncation interval is [0, t] for all spike trains. To distinguish that these values are not the final values, but only the values during the encoding process, we will use a superscript e on the left, i.e., eh′, eh″, and eM. Because t may vary, the expression for each element is now a function of time.
Using the Laplace transform notation, the state of the model at time t during encoding can be expressed as follows:
Using the Heaviside function each of these formulas can be stated in an alternative form:
It is worth pointing out that all formulas in this section give the correct values for the elements at time t, but this is not how these elements are computed by the encoding algorithm. The algorithm uses iterative versions of these formulas, which are derived in Section 8.10.
8.8.3 The Model at a Specific Time During Decoding
The decoding process starts with the matrix M and the vector h″ and gradually depletes both of them down to zero. The initial values of M and h″, which are the same as their final values at the end of the encoding process, are shown in
The first set of formulas expresses the elements of h″ in terms of the Laplace transform of the corresponding truncated spike train and the elements of M in terms of the Laplace transform of the cross-correlation of two truncated spike trains. It is assumed that the spikes in all spike trains occur no later than time T. Thus, the truncation interval is [t, T] for all spike trains. More formally,
These formulas can also be stated in the following alternative form (see (8.74) and (8.60)):
8.8.4 The Formulas for an Abstract Element
For the sake of completeness, we will also state the formulas for an abstract element of the matrix and the two vectors. Using our convention, the matrix element will be called Ma,b, where a stands for any Greek letter and b stands for any English letter. Its corresponding elements in the two vectors will be denoted with h′a and h″b. Without loss of generality it will be assumed that the spike train a=(a1, a2, . . . , aJ) contains J spikes and the spike train b=(b1, b2, . . . , bK) contains K spikes. Two sets of formulas are given below. The first set uses notation that is based on the Heaviside function. The second set uses the Laplace transform notation.
At the end of encoding (i.e., at time T):
At time t during encoding:
At time t during decoding:
The encoding and decoding formulas for an abstract element can also be stated using the Laplace transform notation. These formulas are given below and also shown in
At the end of encoding (i.e., at time T):
h′
a=eh′a(T)={[0, T]}(s)=e−sT{a}(−s), (8.122)
h″
b=eh″b(T)={b}(s), (8.123)
M
a,b=eMa,b(T)={a★b}(s). (8.124)
At time t during encoding:
e
h′
a(t)={[0, t]}(s)=e−st{a[0, t]}(−s), (8.125)
e
h″
b(t)={b[0, t]}(s), (8.126)
e
M
a,b(t)={a[0, t]★b[0, t]}(s). (8.127)
At time t during decoding:
d
M
a,b(t)={a[t, T]★b[t, T]}(s), (8.128)
d
h″
b(t)=est{b[t, T]}(s). (8.129)
8.9 Duality of the Matrix Representation
This section shows that the values stored in the matrix at the end of encoding can be interpreted in two different ways. The first interpretation suggests how the matrix can be encoded. The second interpretation suggests how the matrix can be decoded.
To motivate the discussion we will start by repeating the formulas for the value of h′a during encoding and the value of h″b during decoding:
Also, recall that, at the end of encoding the value of the matrix element in row a and column b is given by the following formula:
8.9.1 Encoding View of the Matrix
Because each spike train contains a finite number of spikes, we can swap the order of the two sums in (8.132) to get the following result:
In other words, the element Ma,b of the matrix can be computed by adding the values of h′a at the times of the spikes on channel b.
This expression generalizes to all elements of the matrix. For example, the elements of a 2×2 matrix that is encoded from the spike trains α, β, A, and B can be expressed as follows:
8.9.2 Decoding View of the Matrix
Formula (8.132) can also be factored in another way that leads to the decoding view of the matrix. This derivation is shown below:
In other words, the value of the element Ma,b can also be computed by adding the values of dh″b at the times of the spikes on channel a. Because dh″b is computed during decoding, however, this is not how the matrix can be computed. Instead, this suggests how the matrix can be decoded. That is, if the value of dh″b is subtracted from the value of Ma,b at the times of the spikes in a, then the matrix can be depleted down to zero.
Once again, this view of the matrix requires knowing the spike times on channel a. In general, these times are not available during decoding as that spike train is not provided. Thus, any decoding algorithm will have to infer these times.
The expression in (8.135) generalizes to all elements of the matrix. For example, the elements of a 2×2 matrix can be expressed using the following formula:
8.10 Derivation of the Iterative Encoding Formulas
This section derives the iterative formulas that are used by the encoding algorithm, which is described in Section 8.11. These formulas are for the a-th element of the vector h′, the b-th element of the vector h″, and the element in the a-th row and b-th column of the matrix M. By analogy, these formulas can be extended to cover all elements of the three components of the SSM model.
8.10.1 Computing the a-th Element of the Vector h′
We would like to derive an iterative formula for computing the value of h′a at the time of the m-th spike on channel a in terms of its value at the time of the (m−1)-st spike on a. That is, we would like to express eh′a(am) in terms of eh′a(am−1). To do this we will start by splitting the truncated spike train a[0, am] into two segments:
a[0, am]=a[0, am−1]+a(am−1, am], (8.137)
where the second segment contains just one spike at time t=am. Recall that, at time t during encoding the value of the a-th element of the vector h′ is given by formula (8.125), which is replicated below:
e
h′
a(t)={[0, t]}(s)=e−st{a[0, t]}(−s). (8.138)
This formula is valid for any time t. In particular, if we set t=am and use (8.137), then the value of eh′a(am) can be expressed in the following way:
A similar approach can be used to express the value of eh′a at t=bn, i.e., at the time of the n-th spike on channel b. Let p be the index of the last spike on a that occurs no later than the time of the n-th spike on b, i.e., p=max{j:aj≤bn}. Then, a[0, bn] can be expressed as:
a[0, bn]=a[0, ap]+a(ap, bn], (8.140)
where a(ap, bn] is empty. Therefore,
To summarize, the two formulas for updating eh′a during the encoding process are:
e
h′
a(am)=eh′a(am−1)e−a(
e
h′
a(bn)=eh′a(ap)e−s(b
Note that these formulas are used at different times. The first one is used at the times of the spikes on channel a. The second one is used at the spike times on channel b. Because this is somewhat cumbersome, Section 8.10.4 combines these two into a single formula by using a combined timeline that includes the spike times from both channels.
8.10.2 Computing the b-th Element of the Vector h″
Let b=(b1, b2, . . . , bK) be a causal spike train that has K spikes. At time t during encoding the value of eh″b is given by formula (8.126), which is replicated below:
e
h″
b(t)={b[0, t]}(s). (8.144)
To derive an iterative formula for computing the value of eh″b(bn) in terms of eh″b(bn−1) we will start by representing the truncated spike train b[0, bn] as follows:
b[0, bn]=b[0, bn−1]+b(bn−1, bn]. (8.145)
The additivity of the Laplace transform implies that the Laplace transform of b[0, bn] is equal to the sum of the Laplace transform of b[0, bn−1] and the Laplace transform of b(bn−1, bn]. Furthermore, b(bn−1, bn] contains just one spike at t=bn and reduces to the delta function shifted by bn. Using these properties and formula (8.144), we can derive the following expression:
To summarize, the value of eh″b is updated only at the times of the spikes on channel b and the iterative update formula is:
e
h″
b(bn)=eh″b(bn−1)+e−sb
In other words, during encoding the value of the b-th element of the vector h″ at the time of the n-th spike on channel b is equal to the value of the same element at the time of the (n−1)-st spike plus e−sb
8.10.3 Computing the Matrix Element in the a-th Row and b-th Column
Let a=(a1, a2, . . . , aJ) and b=(b1, b2, . . . , bK) be two causal spike trains. The value of the matrix element Ma,b at time t of the encoding process is given by formula (8.127), which is replicated below:
e
M
a,b(t)={a[0, t]★b[0, t]}(s). (8.148)
In other words, at time t this element is equal to the value of the Laplace transform at s of the cross-correlation of the spike train a and the spike train b, both of which are truncated at time t.
This formula is valid for any time t. In particular, if we set t=bn−1, i.e., the time of the (n−1)-st spike on channel b, then we will get the following expression:
e
M
a,b(bn−1)={a[0, bn−1]★b[0, bn−1]}(s). (8.149)
Similarly, if we evaluate the same formula at the time of the n-th spike on channel b, i.e., at t=bn, then we will get:
e
M
a,b(bn)={a[0, bn]★b[0, bn]}(s). (8.150)
Corollary 8.32 implies that formula (8.150) can be expressed as follows:
The last term in the right-hand side is equal to the value of the a-th element of h′ at the time of the n-th spike on b, i.e., at time t=bn (see formula (8.141)).
In the special case when t=b1 (i.e., the time of the first spike on b), formula (8.151) reduces to:
which also follows from Corollary 8.32.
To summarize, during encoding, the value of the matrix element Ma,b is updated at the times of the spikes on channel b and it is computed using the following iterative formula:
e
M
a,b(bn)=eMa,b(bn−1)+eh′a(bn). (8.153)
In other words, the value of eMa,b at the time of the n-th spike on b is equal to the value of that same element at the time of the (n−1)-st spike on b plus the value of the a-th element of the vector h′ at the time of the n-th spike on b.
8.10.4 The Iterative Encoding Formulas for a Common Timeline
The previous sections derived iterative formulas for eh′a, eh″b, and eMa,b. This section rewrites these formulas and states them for a common timeline that includes all spikes on a and all spikes on b. The resulting formulas form the mathematical foundation for the encoding algorithm that is described in Section 8.11.
Let a=(a1, a2, . . . , aJ) and let b=(b1, b2, . . . , bK) be two causal spike trains from which the values of eh′a, eh″b, and eMa,b are computed. Also, let c=(c1, c2, . . . , cJ+K) be a list of spike times that combines all spikes from a and all spikes from b such that the resulting list c is sorted in increasing order.
It is possible to construct the array c from the elements of a and b. Because both a and b are initially sorted, the merging of the two spike trains can be accomplished in O(J+K) time. By definition, the original lists a and b contain no duplicates. It is possible, however, that an element of a may be equal to an element of b (e.g., two simultaneous spikes on two different channels). In that case the precedence is given to the spike from a, i.e., it will be listed before the spike from b in the list c.
In addition to the array c, it is also possible to generate another array â, which is a binary array of length J+K. The purpose of this array is to indicate the channel from which the spike in c came from. If âi=1, then the i-th spike in the combined timeline came from a. On the other hand, if âi=0, then the i-th spike came from b. In other words, for each i ∈ {1, 2, . . . , J+K} the value of the i-th element of â is defined as:
The encoding algorithm, which is described in the next section, computes both â and c implicitly. The array â is replaced with one boolean variable that is called spikeOnA, which keeps the origin of the most recent spike, i.e., spikeOnA=âi. The array c is not constructed either. Instead, the algorithm keeps only the two most recent elements in the variables t and tprev.
To make the formulas more amenable to an algorithmic implementation, we will use i as an index for the elements of c. We will also use square brackets instead of round brackets, e.g., eh′a[i] instead of eh′a(ci). We will use the value of âi to check if an element of c comes from a or b.
At the start of the encoding process all variables are initialized to zero. In other words,
eh′a[0]=0, (8.155)
eh″b[0]=0, (8.156)
eMa,b[0]=0. (8.157)
Note that in this case the 0-th iteration counter is used to capture the initial conditions. This index is not used for actual spikes because the first spike in any spike train has an index of 1.
If a spike from a and a spike from b coincide, then â and c must be constructed to ensure that the spike from a has a lower index than the spike from b in the common timeline. In other words, in addition to (8.154) the values in â and c also satisfy the following two conditions:
1)
c
i≤ci+1, for each i ∈ {1, 2, . . . , J+K−1}, (8.158)
2)
if ci=ci+1, then âi=1 and âi+1=0. (8.159)
If the pair (c, â) satisfies conditions (8.158) and (8.159), then the iterative formula for updating eh′a can be stated as follows:
for each i ∈ {1, 2, . . . , J|K}. This formula combines (8.142) and (8.143).
The update formula for the value of eMa,b is based on (8.153). It follows a similar logic:
for each i ∈ {1, 2, . . . , J|K}. Note that there is an implicit order dependency between formula (8.160) and formula (8.161). That is, the value of eh′a must be computed first before it is used to update the value of eMa,b.
The iterative update formula for the value of eh″b is based on (8.147). It can be stated as:
for each i ∈ {1, 2, . . . , J+K}.
If aj=bk, i.e., if two spikes on different channels coincide, then precedence is given to the spike from a (see the formulas in the first column of
The formulas in the previous subsections were stated as functions of time and applied to only one spike train, in which, by definition, there are no coincidences. When the formulas are restated for a common timeline, however, it is possible to have ambiguities (e.g., eh″b(aj)≠eh″bb(bk) even though aj=bk). Then, eh″b(t) is no longer a proper mathematical function because a function can have only one value for each point in its domain. The square bracket notation resolves this issue by assigning different values of the iteration counter to the two coincident spikes, i.e., it performs two iterations for each pair of coincident spikes.
8.11 The Encoding Algorithm
Given two spike trains a and b and a value for the parameter s, the algorithm returns the value of the matrix element Ma,b and the values of h′a and h″b. To encode the entire matrix, the algorithm can be run in parallel, i.e., one instance of the algorithm for each matrix element. This is possible because each element can be computed independently of all other elements.
If aj=bk for some j and some k, then the algorithm gives preference to the spike from a, but then performs another iteration to process the spike from b. During this second iteration h′a does not change because l=lprev due to the coincidence of the two spikes.
The computational complexity is O(J+K), where J is the number of spikes on channel a and K is the number of spikes on channel b.
8.12 Derivation of the Iterative Decoding Verification Formulas
This section derives iterative formulas for decreasing the value of the matrix element Ma,b and the vector element h″b down to zero. These formulas rely on knowing the times of the spikes on a, which are not available at run time. The goal of a proper decoding algorithm would be to estimate these values. Assuming that these estimates are correct, the formulas given here can be used to ensure that both the matrix element and the vector element will be depleted down to zero. In other words, this section states the formulas for verifying the solution obtained by a decoding algorithm.
8.12.1 Updating the Matrix Element in the a-th Row and b-th Column
Let a=(a1, a2, . . . , aJ) and b=(b1, b2, . . . , bK) be two causal spike trains such that all of their spikes occur before time T. The value of the matrix element Ma,b at time t during the decoding process is given by formula (8.128), which is replicated below:
d
M
a,b(t)={a[t, T]★b[t , T]}(s). (8.163)
If we evaluate this expression at the time of the m-th spike on channel a (i.e., at time t=am), then we will get the following formula:
d
M
a,b(am)=a[am, T]★b[am, T]}(s). (8.164)
Similarly, if we evaluate the same expression at the time of the (m+1)-st spike on a, then we will get another similar formula:
d
M
a,b(am+1)=a[am+1, T]★b[am+1, T]}(s). (8.165)
Using Corollary 8.33 we can express (8.164) as follows:
where the first term in the right-hand side is equal to the value of the b-th element of h” at the time of the m-th spike on a. At the very last iteration, i.e., at time t=aj, the following holds:
which also follows from Corollary 8.33.
After rearranging the three terms in (8.166), we get the following iterative formula for updating dMa,b from am to am+1:
d
M
a,b(am+1)=dMa,b(am)−dh″b(am). (8.168)
8.12.2 Updating the b-th Element of the Vector h″
During decoding the value of the b-th element of the vector h″ is given by formula (8.129), which is replicated below:
d
h″
b(t)=est{b[t, T]}(s). (8.169)
This formula is valid for any time t, but we would like to derive its iterative version. That is, we would like to express the value of dh″b at the time of the (n+1)-st spike on channel b in terms of its value at the time of the n-th spike on b.
This can be done by expressing the truncated spike train b[bn, T] as follows:
b[b
n
, T]=b[b
n
, b
n+1)+[bn+1, T]. (8.170)
By setting t to bn and by combining (8.169) and (8.170), we get:
After rearranging the terms, we can express dh″b(bn+1) using dh″b(bn) as follows:
d
h″
b(bn+1)=[dh″b(bn)−1]es(b
A similar approach can be used to derive the formula for dh″b at the time of the m-th spike on channel a. In this case, the idea is to set t=bp in (8.169) and to express b[bp, T] as:
b[b
p
, T]=b[b
p
, a
m)+b[am, T], (8.173)
where p=max{k:bk<am}. In other words, p is the index of the last spike on channel b that occurs strictly before the m-th spike on channel a. Using the properties of the Laplace transform we can express dh″b(bp) in terms of dh″b(am) as follows:
By rearranging the terms in the previous expression we get the following formula:
d
h″
b(am)=[dh″b(bp)−1]es(a
To summarize, the iterative decoding verification formulas for the b-th element of h″ are:
d
h″
b(bn+1)=[dh″b(bn)−1]es(b
d
h″
b(am)=[dh″b(bp)−1]es(a
The first formula is used to update this element at the time of the spikes on channel b. The second one is used at the spike times on channel a. Section 8.12.3 combines both of these into a single iterative update formula for a common timeline, which is denoted by c. Note that formula (8.177) is not strictly iterative because bp is temporally before am but it may also be temporally before am−1 and other spikes on a. This formula will be modified in the next section to make it truly iterative (i.e., updated at the current spike time using the result from the previous spike time in the common timeline). This modification also resolves the ambiguities that arise when spikes from a and b coincide.
The initial value of dh″b can be computed from formula (8.169) by setting t to zero, i.e.,
That is, the initial value of h″b during decoding is equal to the final value of h″b during encoding.
A small technical detail has to be addressed during the very first iteration. Let t1 be the time of the first spike on either channel a or channel b, i.e., t1=min(a1, b1). Then, by definition, the truncated spike train b[0, t1) contains no spikes. Thus, the Laplace transform of b[0, T] reduces to the Laplace transform of b[t1, T], i.e.,
Therefore, setting t=t1 in equation (8.169) leads to the following formula:
d
h″
b(t1)=est
In other words, the initial value dh″b(0), which is equal to eh″b(τ), is multiplied by est
8.12.3 The Iterative Decoding Verification Formulas for a Common Timeline
This section states the decoding verification formulas for a common timeline. Let a=(a1, a2, . . . , aJ) be the list of spikes that we want to verify. Let b=(b1, b2, . . . , bK) be the list of spikes on channel b that are available at run time. Finally, let c=(c1, c2, . . . , cJ+K) be another list that is derived from a and b by combining and sorting the spike times of these two lists in increasing order.
At the start of this process dh″b and dMa,b are equal to the values computed at the end of encoding, i.e., at the (J+K)-th encoding iteration. In other words, the initial conditions are:
d
h″
b[0]=eh″b[J+K]. (8.181)
d
M
a,b[0]=eMa,b[J+K]. (8.182)
Once again, the 0-th iteration counter is used to capture the initial conditions. Also, in keeping with the previous convention, we will use i instead of ci and the square bracket notation, e.g., dh″b(ci)=dh″b[i].
As in the encoding case, it is assumed that c and â are ordered in a way that ensures correct processing of coincident spikes (i.e., in the common timeline, spikes from a are processed before their coincident counterparts from b). More formally, (c, â) must satisfy the following two conditions, which are identical to (8.158) and (8.159):
1)
c
i
≤c
i+1, for each i ∈ {1, 2, . . . , J+K−1}, (8.183)
2)
if ci=ci+1, then âi=1 and âi+1=0. (8.184)
where â is a binary indicator array defined by (8.154).
Combining formulas (8.171) and (8.174) leads to the following formula for dh″b[i]:
This expression, however, works backward in time. To get the iterative update formula we need to rearrange the terms as follows:
where i ∈ {0, 1, 2, . . . , J+K−1}. This formula combines formulas (8.176) and (8.177). It states that the value of dh″b is multiplied by the exponential es(c
The value of the matrix element is updated as follows:
for each i ∈ {0, 1, 2, . . . , J+K−1}. This formula performs the updates specified by formula (8.168). Note that the updates are performed only at the spike times from a, otherwise the matrix element remains unchanged.
Processing the first spike in c requires special attention. As described in formula (8.180), the value of h″b is is updated as follows in this case:
dh″b[1]=dh″b[0]esc
The algorithm handles this case implicitly by augmenting this formula as follows:
d
h″
b[1]=dh″b[0]es(c
where c0=0. In other words, it augments the array c with an implicit 0-th spike at time t=0.
Similarly to the encoding case, the verification algorithm does not construct the array c explicitly. Instead, it uses the variables t and tprev to keep only its two most recent elements. The algorithm does not construct the array â either. Instead, it uses the boolean variable spikeOnA to track only its most recent element, i.e., spikeOnA is equal to âi+1. This ensures that coincident spikes are processed in the correct order.
8.12.4 Deriving the Update Formulas for dh″b from the Model
This section derives the common-timeline version of the iterative decoding verification formulas for dh″b shown in
The formulas in
The rest of this section applies these formulas to the four cases shown in
Case aa: In this case, both ci and ci+1 come from a. Therefore, only the first case of (8.190) and the first case of (8.191) apply. Also, in this case, ci<ci+1 because spikes on a don't coincide. Moreover, the truncated spike train b[ci, ci+1) is empty. This leads to the following expression:
Case ab: In this case, ci originates from a and ci+1 originates from b. Therefore, we need to use the first case of (8.190) and the second case of (8.191). Using the linearity of the Laplace transform and the fact that b[ci, ci+1] contains only one spike at ci+1, we can derive the following:
This is the only case in which there could be a coincidence, i.e., it is possible that ci=ci+1. However, formula (8.193) holds even for coincident spikes. That is, the truncated spike train b[ci, ci+1] would contain only one spike at ci+1, even if ci=ci+1.
Case ba: In this case, âi=0 and âi+1=1. This implies that only the second case of (8.190) and the first case of (8.191) apply. Using the fact that the truncated spike train b(ci, ci+1) contains no spikes, we can derive the following formula:
By the construction of c and á, the two spikes cannot coincide in this case, i.e., ci<ci+1. If they did coincide, then the spike from a would be listed first, which would be handled by the case ab.
Case bb: In this case, both ci and ci+1 originate from b. Thus, we can use the second case of (8.190) and the second case of (8.191) to derive the following update formula:
In this case, ci is strictly less than ci+1, because, by definition, the spike train b does not contain duplicate spikes. Thus, b(ci+1, T] is different from b(ci, T].
Even though there are four cases, they reduce to only two update formulas that depend only on the origin of the most recent spike. If the most recent spike is from a, then the previous value of dh″b is multiplied by es(c
8.12.5 Deriving the Update Formulas for dMa,b from the Model
This section shows how the update formulas for dMa,b from
d
M
a,b
[i]=
{a(ci, T]★b[ci, T]}(s), (8.196)
d
M
a,b
[i+1]={a(ci+1, T]★b[ci+1, T]}(s). (8.197)
Once again, we need to consider four cases, which depend on the origin of the two most recent spikes. These four cases are visualized in
Case aa: In this case, both ci and ci+1 originate from a. Thus, the truncated spike train b[ci, ci+1) is empty. Moreover, ci<ci+1, because the spikes in a cannot coincide. Therefore,
Rearranging the terms leads to the following update formula:
d
M
a,b
[i+1]=dMa,b[i]−dh″b[i+1]. (8.199)
Note that (8.198) used a property of the Laplace transform of the cross-correlation of a single spike and a spike train, which implies that:
{(ci+1)★b[ci+1, T]}(s)=esc
Case ab: Suppose that ci<ci+1, i.e., there is no coincidence. Then, the truncated spike trains a(ci, ci+1] and b[ci, ci+1) are empty. Therefore,
By the construction of c and a, this is the only case in which there can be a coincidence, i.e., it is possible that ci=ci+1. Even if ci=ci+1, however, it would still be true that dMa,b[i]=dMa,b[i+1], because the truncation intervals in (8.196) and (8.197) would be the same.
Case ba: In this case, ci originates from a, ci+1 originates from b, and there can be no coincidences, i.e., ci is strictly less than ci+1. Due to the construction of the common timeline, all coincidences are handled by the case ab. Thus, we can derive the following expression, which leads to the same update formula as in (8.199):
The two cancellations in this derivation can be explained as follows. Each spike in the interval a(ci+1, T] follows every spike in the interval b[ci, ci+1). However, only spike pairs in which the spike from a(ci+1, T] precedes or coincides with a spike from b[ci, ci+1) contribute to the value of {a(ci+1, T]★b[ci, ci+1)}(s). This implies that its value is zero. Similarly, ci+1 follows every spike in b[ci, ci+1), which implies that {(ci+1)★b[ci, ci+1)}(s) is also equal to zero. This derivation uses the property {(ci+1)★b[ci+1, T]}(s)=esc
Case bb: In the fourth case, both spikes come from b and there can be no coincidences. Using the fact that in this case a(ci, ci+1] is empty, we can derive the following update formula:
The second cancellation in this derivation is justified because the two truncated spike trains a(ci+1, T] and b[ci, ci+1) don't overlap and therefore the Laplace transform of their cross-correlation is equal to zero.
Similarly to the update formulas for dh″b, the four cases for dMa,b collapse to just two update formulas. These formulas depend only on the origin of the most recent spike, i.e., they depend on âi+1. The two formulas match the update formulas shown in
8.12.6 At the End of Decoding Verification dh″b and dMa,b are Equal to Zero
This section shows that at the end of the verification process the value of dh″b and the value of dMa,b are equal to zero. In other words, this section shows that all four formulas in
If âJ+K=1, then cJ+K comes from a. Therefore, the last spike on b occurs strictly before cJ+K (otherwise, the coincidence would be handled by the case ab). Therefore, the truncated spike train b[cJ+K, T] is empty. Thus,
d
h″
b
[J+K]=e
sc
{b[c
J+K
, T]}(s)=0. (8.204)
Moreover, the truncated spike train a(cJ+K, T] is also empty. Thus,
d
M
a,b
[J+K]=
{a(cJ+K, T]★b[cJ+K, T]}(s)=0. (8.205)
If âJ+K=0, then b(cJ+K, T] is empty. Therefore,
d
h″
b
[J+K]=e
sc
{b(cJ+K, T]}(s)=0. (8.206)
The truncated spike train a(cJ+K, T] is empty in this case too. Thus,
d
M
a,b
[J+K]=
{a(cJ+K, T]★b[cJ+K, T]}(s)=0. (8.207)
8.13 The Decoding Verification Algorithm
The decoding verification procedure that was described in Section 8.12 can be implemented by an algorithm for which the run-time complexity is O(J+K), where J is the number of spikes in a and K is the number of spikes in b. If the verification is successful, then the two values returned by this algorithm should be equal to zero.
Once again, this is a verification algorithm, not a decoding algorithm, because the spike train a is given to the algorithm. A decoding algorithm would have to infer the spike train a. Also, this algorithm verifies only one element of the matrix. Because the computation is local and does not depend on any other matrix element, however, different instances of this algorithm can be run in parallel. For example, to verify the entire matrix, one instance of the algorithm could be run for each matrix element.
9 Continuous-Time Formulation for Weighted Spike Trains
This chapter extends the theory described in Chapter 8 so that it can be applied to weighted spike trains. These extensions are then used to state the SUV family of algorithms, which can be viewed as the continuous-time counterparts to the ZUV algorithms for discrete sequences.
9.1 Modeling Weighted Spikes and Weighted Spike Trains
Section 8.4 described how to model spikes and spike trains. In that case all spikes were alike. This section extends the theory so that it can handle spikes that are weighted differently.
In the previous case each spike was modeled with a shifted template function δn(t−t0), which was defined as follows:
In this chapter we will use the same template function, but it will be weighted differently for different spikes.
Let c be a complex scalar. Then, the weighted and shifted template function cδn(t−t0) is defined as follows:
Note that in this definition the height of the template is scaled by c, but the width is not scaled. Therefore, the area under the curve is no longer equal to 1 if c≠1. Also, note that the scaled template could be complex, while the original one is always real.
To model a weighted spike train, we first need to introduce a notation for the weights that will be associated with each spike. Let v(t) be a weighting function and let b(n)(t) be the model for the spike train b=(b1, b2, . . . , bK). We will use the notation (vb(n))(t) to denote the spike train obtained after weighting b(n) by v(t). The superscript n indicates that the template function δn(t−bk) is used to model each spike before it is scaled by v(t).
for each n ∈ {1, 2, . . . }. In this notation b1, b2, . . . , bK denote the times at which the individual spikes occur. It is assumed that the list of spikes is sorted in increasing order and that this list does not contain any duplicates.
Using a similar approach the spike train a=(a1, a2, . . . , aJ) that is weighted by the function u(t) can be defined as:
In this case the shifted template function is δm and it is defined as follows:
9.2 Operations on Weighted Spike Trains
This section defines some operations on weighted spike trains. These are similar to the operations on spike trains defined in Section 8.5, but now the spike trains are weighted. As a result of this weighting, the notation and the formulas are slightly different. By default it will be assumed that all weighting functions are continuous functions.
9.2.1 The Laplace Transform of a Weighted Spike Train
Let a=(a1, a2, . . . , aJ) be a spike train and let u(t) be a complex function of a nonnegative real argument, i.e., u: 0+→. The value of the Laplace transform of the spike train a weighted by the function u(t) will be denoted by a(u)(s). That is, the superscript is the weighting function, the subscript is the spike train, and s is the argument of the Laplace transform. Note that u(t) is not the unit step function (a.k.a., Heaviside function), which is denoted with H(t) in this document.
If the weighting function is continuous, then the value of the Laplace transform of the weighted spike train can be expressed in terms of the values of the weighting function at each of the spike times. This derivation is shown below.
If the spike train a is causal (i.e., if aj≥0 for all j), then the Heaviside function in the previous expression will always be equal to 1 and the formula can be simplified as follows:
Furthermore, if the spike train a is causal and contains just one spike that occurs at time a1, i.e., a=(a1), then the formula reduces to the formula for the Laplace transform of a weighted Dirac's delta that is shifted to the right by a1. In this case the expression is:
Similarly, if v(t) is a continuous weighting function and b=(b1, b2, . . . , bK) is a spike train, then the Laplace transform of b weighted by v(t) is given by:
If the spike train b is causal, then this simplifies as follows:
Finally, if b is causal and has just one spike at time b1, i.e., b=(b1), then the formula reduces to:
b
(v)(s)=(v){δ(t−b1)}(s)=v(b1)e−sb
9.2.2 The Cross-Correlation of Two Weighted Spike Trains
This section defines the cross-correlation of two weighted spike trains. In extends both the theory and the notation described in Section 8.5.2.
Let a=(a1, a2, . . . , aJ) be a spike train that has J spikes that occur at times a1, a2, . . . , aJ. Also, let u(t) be a weighting function. As described in Section 9.1 the weighted spike train can be expressed as the sum of weighted and shifted template functions δm. In other words,
Similarly, if b=(b1, b2, . . . , bK) is another spike train that is weighted by the function v(t), then the weighted spike train can be expressed as
where δn is also a shifted template function.
The following definition formally states the model for the cross-correlation of two weighted spike trains.
Note that in (9.15) the conjugation over a(m)(τ) can be dropped because it is modeled with a template function δm that is a real-valued function of a real argument. The conjugation over
9.2.3 The Laplace Transform of the Cross-Correlation of Two Weighted Spike Trains
The Laplace transform of the cross-correlation of the spike train a=(a1, a2, . . . , aJ) weighted by the function u(t) and the spike train b=(b1, b2, . . . , bK) weighted by the function v(t) will be denoted with (u,v){a★b}(s) or with a★b(u,v)(s) for short. As shown below, the result of this operation is defined as the iterated limit of the Laplace transform of the cross-correlation of ua(m) and vb(n) as the width of the template δm and the width of the template δn tend to zero.
Starting with this definition, we can derive a closed-form formula for the Laplace transform of the cross-correlation of two causal weighted spike trains. The first step is shown below:
The previous expression uses the short-hand notation fk(τ), which is defined as:
To continue the derivation we will show that the limit of fk(τ) as τ→t0 exists and is finite for each t0 ∈. To do this, we will use the variable substitution {circumflex over (τ)}=bk−τ to derive the following closed-form expression for the value of this limit:
Formula (9.19) moves the Heaviside function out of the integral and the two limits. It also uses Theorem 8.16, which can be applied to the inner limit because v(bk−{circumflex over (τ)}+t)e−st is continuous. Finally, it evaluates the limit as τ→(bk−t0) of the result from Theorem 8.16. This limit exists and is finite.
To get the formula for the value of a★b(u,v)(s) we can combine (9.19) and (9.17) as shown below:
To summarize, the formula for the Laplace transform of the cross-correlation of two causal weighted spike trains is:
In the special case when u(t)=1 and v(t)=1 this formula reduces to formula (8.46), which defines a★b(s), i.e., the Laplace transform of the the cross-correlation of two unweighted spike trains. By extension, this formula also reduces to all the other special cases described in Section 8.5.3.
9.2.4 Some Additional Properties
v(t)=es
where s0 is a complex constant. Then,
(v)
{b}(s)={b}(s−s0). (9.23)
u(t)=e−e−
v(t)=es
where s0 is a complex constant. Then,
(u,v)
{a★b}(s)={a★b}(s−s0). (9.26)
u(t)=Ue−
v(t)=Ves
where s0, U, and V are complex constants. Then,
(u,v)
{a★b}(s)=(U,V){a★b}(s−s0). (9.29)
9.3 Operations on Weighted and Truncated Spike Trains
This section extends the theory described in Section 8.6. The new formulas can be used with truncated spike trains that are also weighted. The truncation is still performed using two Heaviside functions and left- and right-limits for the bounds.
9.3.1 The Laplace Transform of a Weighted and Truncated Spike Train
where
(vb[t
for each n ∈ {1, 2, . . . }.
Because each spike train is modeled as a sum of shifted and weighted template functions we can use Definition 9.9 to derive a closed-form expression for the Laplace transform of a weighted and truncated spike train in which the integral reduces to a sum. More formally,
Formula (9.32) is for a general case in which t1 and t2 can have arbitrary values. In the special case when the spike train b is causal and t1=0 and t2=t this formula simplifies as follows:
Another special case is when the spike train b is causal, the truncation interval is [t, T], and all spikes in b occur no later than time T. Now t1=t and t2=T and formula (9.32) reduces to:
Yet another special case can be derived from (9.34) after multiplying both sides by est. In other words,
As described in Section 8.6.2 this formula can be viewed as a special case of the left-shift theorem for the Laplace transform when the input function is a weighted spike train. The shift in this case is equal to t.
9.3.2 The Laplace Transform of the Cross-Correlation of Two Weighted and Truncated Spike Trains
This section gives the formal definition for the Laplace transform of the cross-correlation of a pair of weighted and truncated spike trains. It extends the theory that was presented in Section 8.6.3 to handle weighted and truncated spike train as well.
where m and n are two positive integers. Each of these functions is defined by the following equation:
Definition 9.11. The Laplace transform of the cross-correlation of two weighted and truncated spike trains. Let a=(a1, a2, . . . , aJ) be a spike train that has J spikes and let b=(b1, b2, . . . , bK) be another spike train that has K spikes. Let u(t) and v(t) be two weighting functions. Let t1 and t2 be two real numbers such that t1≤t2. Also, let τ1 and τ2 be a pair of real numbers such that τ1≤τ2. The Laplace transform of the cross-correlation of a[t1, t2] weighted by u(t) and b[τ1, τ2] weighted by v(t) is defined as the iterated limit of Laplace transforms of the functions in the model for the cross-correlation as m and n approach infinity. More formally,
The previous definition can be used as a starting point to derive a closed-form formula for the Laplace transform of the cross-correlation of two weighted and truncated spike trains. To keep the formulas manageable, we will use Fm and Gn to denote two helper functions that are defined as follows:
F
m(t1, t2, aj, t)=H(t−t1−)H(t2+−t)δm(t−aj), (9.38)
G
n(τ1, τ2, bk, t)=H(t−τ1−)H(τ2+−t)δm(t−bk). (9.39)
Let L denote the value of the Laplace transform of the cross-correlation of a[t1, t2] and b[τ1, τ2] where the spike trains are weighted by u(t) and v(t), respectively. Using the helper functions Fm and Gn we can express L as follows:
The inner integral in the previous formula was replaced with gk(τ), which can be expanded as follows:
To continue the derivation we need to show that the limit of gk(τ) as τ→aj exists and is finite, which is done below:
Finally, we can derive the following formula:
Before we move on to the next topic we will derive two special cases of formula (9.43). In the first special case it is assumed that both a and b are causal spike trains that are truncated to the interval [0, t]. That is, t1=τ1=0 and t2=τ2=t. Then, formula (9.43) simplifies as shown below:
In the second special case it is assumed that all spikes in a and b occur no later than time T and that both trains are truncated to the interval [t, T]. In other words, t1=τ1=t and t2=τ2=T. Then, formula (9.43) simplifies as follows:
9.3.3 Modeling Reversed and Weighted Spike Trains
A reversed spike train can be obtained from a causal spike train by reversing the temporal order of its spikes in some interval. This was formally defined in Definition 8.25. This definition is slightly adjusted below to handle spike trains that are truncated to the interval [0, t] before they are reversed.
for each m ∈ ={1, 2, . . . }.
(τ)=u(t−τ), (9.47)
for each t such that t−τ ∈ domain(u).
(τ)=(τ). (9.48)
The notation described below assumes that both the spike train and its weighting function are reversed on the same interval, i.e., [0, t]. This is necessary because the weights of the spikes need to be preserved after the reversal. The next definition states this more formally.
Definition 9.15. Weighted, truncated, and Reversed Spike Train.
([0,t](m))(τ)=H(t+−τ)((m))(τ)=H(t+−τ)u(t−τ)a(m)(t−τ), (9.49)
for each m ∈ {1, 2, . . . }.
{[0, t]}(s)=e−st(u){a[0, t]}(−s). (9.50)
Next, we will derive two special cases of Property 9.16. The first special case conjugates the weighting function, i.e., it uses ū instead of u. Thus,
Extending this derivation leads to the following alternative expression:
{[0, t]}(s)=e−st(ū){a[0, t]}(−s). (9.52)
The second special case assumes that all spikes in a occur before time T, i.e., aj≤T for all j. Under these conditions, formula (9.51) can be evaluated at time t=T to get the following result:
Furthermore, using the fact that a=a[0, T] the expression in (9.52) can be simplified as follows:
{[0, T]}(s)=e−sT(ū){a[0, T]}(−s)=e−sTa(ū)(−s). (9.54)
9.4 The Concatenation Theorem for Weighted Spike Trains
This section states the concatenation theorem for weighted spike trains. The theorem is an extension of the theorem for unweighted spike trains that was stated in Section 8.7.
a′=(a1, a2, . . . , ap), (9.55)
a″=(ap+1, ap+2, . . . , aJ), (9.56)
where
p=max{j:aj≤C}. (9.57)
Using this formulation, the original spike train a can be recovered from the two slices a′ and a″ by concatenating the two lists of spike times, i.e., a=a′∥a″, where ∥ denotes concatenation.
b′=(b1, b2, . . . , bq), (9.58)
b″=(bq+1, bq+2, . . . , bK), (9.59)
where
q=max{k:bk<C}. (9.60)
a′←a[0, C], a″←a(C, ∞), (9.61)
b′←b[0, C), b″←b[C, ∞). (9.61)
a★b
(u,v)(s)=a′★b′(u,v)(s)+a″★b″(u,v)(s)+
Note that in the statement of the theorem the two spike trains a and b are split slightly differently. The prefix a′ includes all spikes from a that fall in the closed interval [0, C] and the suffix a″ includes the remaining spikes from a that fall in (C, ∞). The train b, however, is split such that the prefix b′ includes all spikes from b that fall in the interval [0, C) and the suffix b″ includes all spikes from b that fall in the interval [C, ∞). This difference in the splits is intentional and its purpose it to eliminate the special cases that have to be considered otherwise if there are one or more spikes that occur exactly at time C.
9.4.1 Two Special Cases of the Concatenation Theorem
This section states two corollaries of the concatenation theorem for weighted spike trains. These corollaries cover special cases in which the splits of the two spike trains have certain properties.
Corollary 9.18 is a special case of the concatenation theorem for weighted spike trains when the two trains are split such that the suffix a″ is empty and the suffix b″ contains just one spike.
a★b
(u,v)(s)=a′|b′(u,v)(s)+v(bK)(s), (9.63)
where denotes a spike train obtained by reversing the spikes in a in the interval [0, T]. In other words, the time of the n-th spike in is given by
()n=T−aJ+1−n, for n=1, 2, . . . , J, (9.64)
and the reversed spike train is specified as follows:
=(T−aJ, T−aJ−1, . . . , T−a2, T−a1). (9.65)
Note that in formula (9.63) the notation (s) denotes the value of the Laplace transform at s of the reversed spike train that is weighted by the reversed and conjugated function u, where (t)=
Corollary 9.19 is a special case of the concatenation theorem for weighted spike trains when the two trains are split such that the prefix a′ contains just one spike and the prefix b′ is empty.
a★b
(u,v)(s)=
9.4.2 Special Cases of the Theorem for Weighted and Truncated Spike Trains
The concatenation theorem for weighted spike trains also applies to weighted and truncated spike trains. The following two corollaries of Theorem 9.17 are the mathematical justification for the SUV algorithms, which are described later in this chapter.
(u,v)
{x[0, yn]★y[0, yn]}(s)=(u,v){x[0, yn−1]★y[0, yn−1]}(s)+v(yn){[0, yn]}(s), (9.67)
where [0, yn] is a spike train that is obtained by reversing x[0, yn] in the interval [0, yn]. That is,
[0, yn]=(yn−xp, yn−xp−1, . . . , yn−x1), (9.68)
where p=max{j:x1<yn}. Also, (t)=
(u,v)
{x[0, y1]★y[0, y1]}(s)=v(y1){[0, y1]}(s). (9.69)
Corollary 9.21. Let x=(x1, x2, . . . , xJ) and y=(y1, y2, . . . , yK) be two causal spike trains that are weighted by the functions u(t) and v(t), respectively. It is assumed that the spikes in x and y occur no later than time T, i.e., 0≤xj≤T and 0≤yk≤T for all j ∈ {1, 2, . . . , J} and for all k ∈ {1, 2, . . . , K}. Then, for any integer m ∈ {1, 2, . . . , J−1} the following is true:
(u,v)
{x[x
m
, T]★y[x
m
, T]}(s)=
Also, in the special case when m=J, the following equation holds:
(u,v)
{x[x
J
, T]★y[x
J
, T]}(s)=
Finally, we will prove two properties that were used to prove Corollary 9.20 and Corollary 9.21, respectively.
(u,v)
{x[0, t]★y[0, τ]}(s)=(u,v){x[0, τ]★y[0, τ]}(s). (9.72)
(u,v)
{x[t, T]★y[τ, T]}(s)=(u,v){x[t, T]★y[t, T]}(s). (9.73)
In other words, the Laplace transform of the cross-correlation of x[t, T] weighted by u(t) and y[τ, T] weighted by v(t) is equal to the Laplace transform of the cross-correlation of x[t, T] weighted by u(t) and y[t, T] weighted by v(t). That is, the spikes from y that occur in the interval [τ, t) don't influence the result.
9.5 The SUV SSM Model
This section describes the SSM model for weighted spike trains, which is an extension of the model for non-weighted spike trains described in Section 8.8. To distinguish between the two models the new one will be called the SUV SSM model or simply the SUV model. This name comes from the three parameters of the model—s, u, and v—where s is the argument of the Laplace transform and u and v are two weighting functions. This is analogous to the ZUV model in the discrete-time case, which also had three parameters: z, u, and v.
The SUV model has three components: a matrix M and two vectors h′ and h″. The matrix is of size M′×M″, h′ is a column vector of size M′, and h″ is a row vector of size M″. Without loss of generality, the examples in this section will assume that M′=M″=2.
9.5.1 The Model at the End of Encoding
The formulas in
These three expressions follow from formulas (9.53), (9.11) and (9.21), respectively.
9.5.2 The Model at a Specific Time During Encoding
The formulas described in the previous section express the elements of the model at the end of the encoding process, which is assumed to be at time T. This section presents another set of formulas that express these values at any time t prior to that.
Using the notation for weighted and truncated trains the components of the model can be expressed as follows:
Note that all spike trains are truncated in the interval [0, t]. This suggests that the encoding can be accomplished with a single pass through all trains, which is what the SUV encoding algorithm does (see Section 9.8).
By adapting formulas (9.51), (9.33), and (9.44) the encoding formulas can also be stated in the following form:
where
All of these formulas are mathematically correct, but from a computational point of view they are not very efficient. Section 9.7 derives iterative versions of these formulas that are used by the SUV encoding algorithm.
9.5.3 The Model at a Specific Time During Decoding
Using formulas (9.45) and (9.35) the elements of M and h″ can also be stated as follows:
where
9.5.4 The Formulas for an Abstract Element
For the sake of convenience and completeness, this section states the formulas for an abstract element of the matrix and the two vectors. Let a=(a1, a2, . . . , aJ) and b=(b1, b2, . . . , bK) be two causal spike trains that contain J and K spikes, respectively. The matrix element that corresponds to this pair of spike trains will be denoted with Ma,b. The elements of the two vectors that correspond to Ma,b will be denoted with h′a and h″b. Using this convention, the encoding and decoding formulas are stated below in two different ways. The first approach uses the Heaviside function. The second approach uses the Laplace transform notation.
At the end of encoding (i.e., at time T):
At time t during encoding:
At time t during decoding:
The same set of formulas can also be stated using the Laplace transform notation for weighted and truncated spike trains.
At the end of encoding (i.e., at time T):
h′
a=eh′a(T)={[0, T]}(s)=e−sT (9.103)
h″
b=eh″b(T)=(v){b}(s), (9.104)
M
a,b=eMa,b(T)=(u,v){a★b}(s). (9.105)
At time t during encoding:
e
h′
a(t)={[0, t]}(s)=e−st (9.106)
e
h″
b(t)=(v){b[0, t]}(s), (9.107)
e
M
a,b(t)=(u,v){a[0, t]★b[0, t]}(s). (9.108)
At time t during decoding:
d
M
a,b(t)=(u,v){a[t, T]★b[t, T]}(s), (9.109)
d
h″
b(t)=est(v){b[t, T]}(s). (9.110)
9.6 Duality of the Matrix Representation
This section shows that the matrix values in the SUV model can be viewed in two different ways. The first view suggests how the matrix can be encoded. The second view suggests how it can be decoded. This duality is analogous to the one described in Section 8.9, but now the spike trains are weighted and as a result of this the formulas are slightly different.
At the end of the encoding process the value of the matrix element Ma,b is equal to:
The two views of the matrix factor this expression in two different ways and express it in terms of eh′a(t) and dh″b(t), which were defined as follows:
9.6.1 Encoding View
The matrix element Ma,b is encoded from two causal spike trains a and b. Because these spike trains contain a finite number of spikes we can change the order of the two sums in (9.111). This swap allows us to factor the expression as follows:
That is, the value of Ma,b can be represented as a weighted sum of the values of eh′a at the times of the spikes in b. The corresponding weights in this sum are given by the values of the weighting function v, also at the spike times in b.
This type of factorization applies to any element of the matrix. If the matrix is of size 2×2, then it has the following form:
9.6.2 Decoding View
Formula (9.111) can also be factored as a weighted sum of the values of dh″b(t) at the times of the spikes in a. That is,
In this case, the weights are equal to
Once again, this factorization applies to all elements of the matrix. In particular, a 2×2 matrix can be expressed as follows:
9.7 Derivation of the Iterative Encoding Formulas
This section derives the formulas that are used by the SUV encoding algorithm, which is described in Section 9.8. These formulas are iterative versions of formulas (9.106), (9.107), and (9.108).
9.7.1 Computing the a-th Element of the Vector h′
The iterative formula for computing the value of eh′a(am) in terms of the value of eh′a(am−1) can be derived using the additivity of the Laplace transform. Let a[0, am] be a truncated spike train that contains the first m spikes from a. Then, a[0, am] can be represented as the following sum:
a[0, am]=a[0, am−1]+a(am−1, am]. (9.118)
Also, recall that the value of eh′a at time t during encoding is given by the following formula:
e
h′
a(t)={[0, t]}(s)=e−st (9.119)
By setting t=am into the previous formula and then using (9.118), the value of eh′a(am) can be expressed in the following way:
The second formula expresses the value of eh′a at the time of the n-th spike in b in terms of its value at the previous spike in a. Let p be the index of that previous spike on channel a, i.e., p=max{j:aj≤bn}. In this case, the truncated spike train a[0, bn] can be expressed as follows:
a[0, bn]=a[0, ap]+a(ap, bn]. (9.121)
Note that the slice a(ap, bn] is empty because by definition there are no spikes after ap and before bn. Thus, by setting t=bn into (9.119) we get:
To summarize, the formulas for updating eh′a during encoding are:
e
h′
a(am)=eh′a(am−1)e−s(a
e
h′
a(bn)=eh′a(ap)e−s(b
The reason for stating two formulas is that each of them is used for a different purpose. The first one is used to update eh′a at the times of the spikes on channel a. The second one updates eh′a at the times of the spikes on channel b. Section 9.7.4 merges these two formulas into a single formula by using a common timeline for the spikes on both channels.
9.7.2 Computing the b-th Element of the Vector h″
The value of eh″b at time t during the encoding process is given by formula (9.107), which is replicated below:
e
h″
b(t)=(v){b[0, t]}(s). (9.125)
Our goal is to compute the value of eh″b incrementally, i.e., to compute eh″b(bn) in terms of its previous value eh″b(bn−1). To derive an iterative formula we can start by cutting the spike train b[0, bn] into two parts at time bn−1, i.e.,
b[0, bn]=b[0, bn−1]+b(bn−1, bn]. (9.126)
Note that the second slice contains just one spike that is at time bn, and thus it can be represented with a shifted delta function. Then, formula (9.125) and the additivity of the Laplace transform can be used to derive the following:
Thus, the iterative formula is:
e
h″
b(bn)=eh″b(bn−1)+v(bn)e−sb
That is, the value of the b-th element of h″ at time bn during encoding is equal to the value of the same element at time bn−1 plus the product between the value of the weighting function v(t) at time bn multiplied by e−sb
9.7.3 Computing the Matrix Element in the a-th Row and b-th Column
Let a=(a1, a2, . . . , aJ) and b=(b1, b2, . . . , bK) be two causal spike trains that are weighted by the functions u(t) and v(t), respectively. The matrix element that is encoded from these two spike trains is denoted with Ma,b. Its value at time t during encoding is equal to:
e
M
a,b(t)=(u,v){a[0, t]★b[0, t]}(s). (9.129)
That is, its value is equal to the Laplace transform at s of the cross-correlation of a and b, both of which are truncated at time t and weighted by u and v.
Because formula (9.129) is valid for any time t, we can set t to the time of the (n−1)-st spike in b, i.e., t=bn−1, to get the following:
e
M
a,b(bn−1)=(u,v){a[0, bn−1]★b[0, bn−1]}(s). (9.130)
Also, by setting t to the time of the n-th spike in b leads to:
e
M
a,b(bn)=(u,v){a[0, bn]★b[0, bn]}(s). (9.131)
Corollary 9.20 implies that (9.131) can be represented as the following sum:
Corollary 9.20 also implies that, in the special case when t=b1, the formula has the following form:
Therefore, the value of the matrix element eMa,b can be updated iteratively at the times of the spikes on channel b. The iterative update formula follows from (9.132) and (9.133), i.e.,
e
M
a,b(bn)=eMa,b(bn−1)+v(bn)eh′a(bn). (9.134)
In other words, at the time of the n-th spike on channel b during encoding, the value of the matrix element Ma,b is equal to its previous value at the time of the (n−1)-st spike in b plus the value of the weighting function at the time of the n-th spike in b multiplied by the value of the a-th element of the vector h′ at the time of the n-th spike in b.
9.7.4 The Iterative Encoding Formulas for a Common Timeline
This section expresses the encoding formulas for a common timeline that combines the spike times from a and b. The formulas derived here are more suitable for an algorithmic implementation. The encoding algorithm is described in the next section.
Let a=(a1, a2, . . . , aJ) and b=(b1, b2, . . . , bK) be two spike trains and let u(t) and v(t) be their corresponding weighting functions. Also, let c=(c1, c2, . . . , cJ+K) be a list of spike times that combines the spike times from a and b and sorts them in increasing order. Finally, let â=â1, â2, . . . , âJ+K) be a binary array such that its i-th element is defined as follows:
This array represents the origin of each spike. The elements of c and â can be computed using the algorithm described in Section 8.10.4. That algorithm merges the spike times in a and b to produce c. If an element of a is equal to an element of b, then it gives precedence to the spike from a.
The encoding algorithm assumes that the variables eh′a, eh″b, and eMa,b, which store the SSM model during encoding, are initialized to zero. More formally, the initial conditions are:
eh′a[0]=0, (9.136)
eh″b[0]=0, (9.137)
eMa,b[0]=0. (9.138)
These formulas use the 0-th iteration counter to capture the initial conditions, i.e., they use an implicit c0 that is at time t=0. Also, the formulas use square brackets instead of round brackets to indicate the iteration counter. This notation is useful for an algorithmic implementation, but it does not mean that time is discretized at regular intervals. Instead, the time is indexed at the times of the spikes. That is, the i-th index corresponds to an update that will be performed at time ci (this is different from the ZUV model, in which the time is discretized).
The a-th element of the vector h′ is updated using the following formula:
for each i ∈ {1, 2, . . . , J+K}.
The matrix element eMa,b is updated using the following rule:
for each i ∈ {1, 2, . . . , J+K}. The two formulas imply that eh′a is updated before its value is used to update the matrix element Ma,b.
The b-th element of the vector h″ is updated as follows:
for each i ∈ {1, 2, . . . , J+K}.
If two spikes occur at the same time, i.e., when aj=bk for some j and k, then they are processed individually in two separate iterations. The spike from channel a is processed first; the spike from channel b is processed in the next iteration. Because in the case of coincidence ci=ci−1, the second update does not affect eh′a, which remains unchanged since e−s(c
The formulas in the previous subsections were formulated for split timelines, i.e., separately for a and b. When a and b are merged into c, however, there can be ambiguities when spikes on a and b coincide (e.g., eh″b(aj)≠eh″b(bk) even though aj=bk). Defining the common timeline mapping as shown in
9.8 The SUV Encoding Algorithm
This algorithm is based on the formulas for a common timeline given in Section 9.7.4. The common timeline c, however, is not explicitly computed by the algorithm. Instead, it is constructed by implicitly merging the spikes from a and b, which is possible because the spike times in both a and b are sorted. Only the two most recent spike times are preserved and stored in the variables tprev and t. The boolean array â is not generated either because the algorithm needs only the relevant element of â, which is stored in the boolean variable spikeOnA. The computational complexity of the SUV encoding algorithm is O(J+K), where J is the number of spikes in a and K is the number of spikes in b.
The structure of the algorithm is similar to the algorithm for non-weighted spike trains, which was described in Section 8.11. Because the trains are now weighted, however, this requires some additional bookkeeping. The algorithm uses the helper variables û, {circumflex over (v)}, and ĝ to incrementally update the values of the exponential weights from their previous values. These updates use the following property of the exponential function: ex+y=exey or, in this case, e−st
In the previous sections, the weighting functions were stated in an abstract form, i.e., u(t) and v(t). The algorithm, however, needs to use concrete weighting functions. In this implementation, these functions are u(t)=Uc−ut and v(t)=Vc−ut, where the scaling constants U and V are assumed to be equal to 1. The parameters u and v determine the decay rates of the exponentials. Together with the parameter s, they form the three main arguments of the SUV encoding algorithm. The remaining arguments are two lists that represent the spike trains a and b. The algorithm returns the value of the matrix element Ma,b and the elements h′a and h″b of the two vectors. If both u and v are real, then all conjugations in the formulas can be dropped and the algorithm does not need to handle them.
The algorithm computes only one element of the matrix. Because the encoding of each matrix element is independent of the other elements, the entire matrix can be computed in parallel by running one instance of the algorithm for each matrix element.
A small technical detail that is worth mentioning is how the algorithm handles coincident spikes on a and b. Suppose that aj=bk for some j and k. In this case, the encoding formulas described in the previous section process the spike from a before the spike from b. More specifically, when aj=bk, the spike on a at time aj is processed in the first iteration and the spike on b at time bk is processed in the second iteration. This order can be enforced by the condition aj≤bk. Because aj=bk, however, this implies that t−tprev=0 when bk is processed. In this case, all updates that use t−tprev in the exponent have no effect, i.e., they reduce to multiplication by 1. That is, the updates of Ma,b and h″b will use the previous values of h′a, {circumflex over (v)}, and ĝ.
9.9 Derivation of the Iterative Decoding Verification Formulas
This section derives iterative formulas for verifying the solution obtained by a decoding algorithm. The formulas suggest how the values of Ma,b and h″b can be gradually depleted down to zero. The formulas assume that the spike train a is available, which is not the case for decoding. Thus, these are verification formulas and not decoding formulas. A decoding algorithm has to infer the times of the spikes on a.
9.9.1 Updating the Matrix Element in the a-th Row and b-th Column
Let a=(a1, a2, . . . , aJ) and b=(b1, b2, . . . , bK) be two causal spike trains such that all of their spikes occur no later than time T. At time t during decoding the matrix element Ma,b has the following value:
d
M
a,b(t)=(u,v){a[t, T]★b[t, T]}(s). (9.142)
This equation follows from formula (9.109) and is valid for any time t ∈ [0, T]. In particular, if we set t to the time of the m-th spike in a, i.e., t=am, then we will get:
d
M
a,b(am)=(u,v){a[am, T]★b[am, T]}(s). (9.143)
The same expression can also be evaluated at t=am+1, i.e., the time of the (m+1)-st spike in a, which leads to:
d
M
a,b(am+1)=(u,v){a[am+1, T]★b[am+1, T]}(s). (9.144)
Corollary 9.21 allows us to rewrite (9.143) as the following sum:
That is, the value of Ma,b at the time of the (m+1)-st spike in a is equal to its previous value at the time of the m-th spike in a minus the value of h″b at the time of the m-th spike in a, times the conjugated value of the weighting function u, also at t=am. In the special case when t=aJ, i.e., at the last iteration, Corollary 9.21 also implies that:
The iterative formula follows from (9.145) after rearranging its terms:
d
M
a,b(am+1)=dMa,b(am)−
9.9.2 Updating the b-th Element of the Vector h″
At time t during decoding the value of the b-th element of h″ is given by formula (9.110), which is replicated below:
d
h″
b(t)=est(v){b[t, T]}(s). (9.148)
This formula is mathematically correct, but it does not show how to iteratively compute the value of h″b. To derive an iterative formula, we will start by representing the truncated spike train b[bn, T] as the following sum:
b[b
n
, T]=b[b
n
, b
n+1)+b[bn+1, T]. (9.149)
In other words, we will split it into two non-overlapping pieces, where the cut is at the time of the (n+1)-st spike on channel b.
Now, if we set t=bn in (9.148) and then use (9.149) we will get the following expression:
Rearranging the terms in (9.150) leads to the following iterative formula:
d
h″
b(bn+1)=[dh″b(bn)−v(bn)]es(b
Using a similar approach, we can derive another update rule for dh″b, in this case, for the times of spikes on channel a. Let p be the index of the last spike in b that is strictly before the m-th spike on channel a, i.e., p=max{k:bk<am}. Then, the truncated spike train b[bp, T] can be split into two spike trains by a cut at time am as follows:
b[b
p
, T]=b[b
p
, a
m)+b[am, T]. (9.152)
Note that the first slice contains just one spike so its Laplace transform reduces to the Laplace transform of a shifted and weighted delta function.
Setting t=bp in (9.148) and then using (9.152) leads to the following expression:
After rearranging the terms, we get the following iterative formula:
d
h″
b(am)=[dh″b(bp)−v(bp)]es(a
To summarize, the two iterative decoding verification formulas for the the value of the b-th element of the vector h″ are:
d
h″
b(bn+1)=[dh″b(bn)−v(bn)]es(b
d
h″
b(am)=[dh″b(b)−v(bp)]es(a
The next section combines both of these formulas into one formula that uses a common timeline. This common timeline is denoted with c and it combines the spike times from a and b.
Finally, it is worth mentioning two special cases of formula (9.148). In the first case the value of t is equal to 0. This leads to the following expression:
In other words, the initial value of h″b at time 0 during decoding is equal to the final value of h″b at time T during encoding.
The second special case evaluates (9.148) at time t1=min(a1, b1). In other words, t1 is the time of the first spike on either channel a or channel b. In this case the formula reduces to:
Therefore, the value of h″b should be multiplied by est
9.9.3 The Decoding Verification Formulas for a Common Timeline
This section restates the formulas for Ma,b and dh″b that were derived previously in a more suitable form for an algorithmic implementation. The rewritten formulas use a common timeline c, which combines the spike times from a and b.
Let a=(a1, a2, . . . , aJ) be a causal spike train that is weighted by the function u(t). Also, let b=(b1, b2, . . . , bK) be another spike train that is weighted by the function v(t). Furthermore, let c=(c1, c2, . . . , cJ+K) be a list of spike times that combines the spike times from a and b and sorts them in increasing order.
The common timeline c can be constructed using the algorithm described in Section 8.10.4. That algorithm merges the sorted lists a and b. If two spikes coincide, then the spike from a precedes the spike from b. The algorithm also computes a binary array â=(â1, â2, . . . , âJ+K) that is used to trace each spike in c to its original spike train, which is either a or b. If a spike in c came from a, then the corresponding element of â is 1. On the other hand, if a spike in c came from b, then the value of the corresponding element of â is 0.
The initial conditions for the verification process can be stated as follows:
d
h″
b[0]=eh″b[J+K], (9.159)
d
M
a,b[0]=eMa,b[J+K]. (9.160)
In other words, the initial values during decoding are equal to the final values during encoding.
Formula (9.150) and formula (9.153) can be combined into one formula as follows:
In this case, the updates are performed only for the the spikes in b. The temporal order in formula (9.161), however, is reversed, i.e., dh″b[i] is expressed in terms of dh″b[i|1]. To fix this, we can rearrange it to get the following formula:
where i ∈ {0, 1, 2, . . . , J+K−1}.
Equation (9.147) leads to the following update rule:
for each i ∈ {0, 1, 2, . . . , J+K−1}. As expected, the matrix element dMa,b is updated only at the times of the spikes in a.
At the end of this process, the verification is successful if both the matrix element Ma,b and the vector element h″b contain zeros, i.e., dMa,b[J+K]=0 and dh″b[J+K]=0.
The ZUV decoding algorithm does not decay h″ before the initial iteration because the sequence indexing in the ZUV case is 0-based. Also, the multiplication in the ZUV algorithm is by z, but it can be viewed as a multiplication by z(t−t
9.10 The Decoding Verification Algorithm
The SUV decoding verification algorithm can be stated for only one element of the matrix, which is encoded from a pair of spike trains that are denoted with a and b. The computational complexity of this algorithm is O(J+K), where J and K are the number of spikes in a and b, respectively. Thus, the complexity is the same for both the encoding algorithm and the verification algorithm.
The algorithm uses the weighting functions u(t)=Ue−ut and v(t)=Ve−vt. The scaling constants U and V are both equal to 1 in this case. Furthermore, if the parameters u and v are real, then there is no need for conjugation.
As in the encoding algorithm, if aj=bk for some j and k, the algorithm performs two iterations to process this case. Precedence is given to the spike from a. During the second iteration, however, t−tprev=0. Thus, the value of h″ from the previous iteration is used in the second update.
10 SUV Decoding Algorithms
The verification algorithm described in Chapter 9 showed that it is possible to ‘roll back’ the SUV model from its encoded state back to zero in linear time. The SUV verification algorithm, however, relies on knowing the times of the spikes in S′. This chapter states a decoding algorithm that does not use S′. It also analyzes some of the conditions under which S′ is decoded accurately.
10.1 The SUV Decoding Problem
The main difference between ZUV decoding and SUV decoding is that the SUV model decouples the times at which spikes are emitted from the times when h″ is decreased. The spikes are emitted at the times of the spikes on dS′, while the updates to h″ happen at the times of the spikes on dS″. In other words, the time at which a spike can be emitted is not constrained in the SUV model, in contrast to the ZUV model where it can only happen at integer multiples of the discretization interval. Thus, the SUV decoding algorithm has to operate in a space with more “degrees of freedom” compared to the space of discrete sequences in which the ZUV decoding algorithm operates.
Let α(1), α(2), . . . , α(M′) be the M′ spike trains in S′. Let A(1), A(2), . . . , A(M″) be the M″ spike trains in S″. The SUV decoding algorithm takes the encoded matrix M, the vector h″, and the spike trains A(1), A(2), . . . , A(M″) as its inputs and computes the spike trains α(1), α(2), . . . , α(M′).
A useful property of the SUV decoding algorithm is its ability to handle cases when the spike trains in dS″ at decoding time may differ from their counterparts used for encoding the SUV model. It turns out that under certain constraints the SUV decoding algorithm can be robust to changing the spike times or deleting spikes in dS″.
The SUV decoding algorithm solves the problem “in real time”, i.e., it consumes the spikes in the dS″ channels in their chronological order and emits spikes on the dS′ channels as they are being decoded. That is, the algorithm does not use the spikes “from the future” in order to emit the spikes “in the present”. Only the data derived from the spikes “in the past” is used. In other words, there is no spike train buffering in the SUV decoding algorithm.
10.2 SUV Decoding for one Row of the Matrix
The SUV decoding algorithm uses a sequence of candidate spike times ψ=(ψ1, ψ2, . . . , ψL). The algorithm filters the times of candidate spikes to select only those that can be emitted in the decoded version of S′. This implies that the sequence ψ has to include all spike times in S′ to make accurate decoding possible. The sequence ψ can also include other candidate spike times.
This algorithm iterates through spike times similarly to the SUV verification algorithm. In contrast to the verification algorithm, however, the SUV decoding algorithm uses the ability to subtract h″ from m as a condition for selecting among candidates in the sequence ψ. In other words, unlike the verification algorithm, the decoding algorithm doesn't have a as one of its inputs. Instead, the decoding algorithm seeks to reconstruct a by filtering the possible spike times in ψ.
The computational complexity of the algorithm is O({circumflex over (K)}+LM″). In this formula, L is the number of candidate spike times in the sequence ψ and {circumflex over (K)} is the number of all spikes in the collection of spike trains S″. Similarly to other formulas in this document, M″ is the size of the second alphabet. In this case, this is the number of spike trains in S″.
When the algorithm is described in this form, it is necessary to build the vector ψ, which can be done using a merge sort of the N lists (i.e., spike trains), each of which is already sorted. The computational complexity is O(LN), where L is the total number of spikes in all spike trains in S. In distributed implementations or implementations in which the spikes come from an external input or device, this function may not be necessary.
10.3 SUV Decoding for a Complete Matrix
The SUV algorithm that decodes an entire matrix can use the same values of s, u, and v for all elements of the matrix. The computational complexity of this version of the algorithm is O({circumflex over (K)}+LM′M″), where L is the total number of candidate spike times in the list ψ.
10.4 Alternative Version of SUV Decoding for a Complete Matrix
In an alternative version of the decoding algorithm, the parameters s, u, and v can be customized for each element of the matrix. In this case, the computational complexity is O(({circumflex over (K)}+{circumflex over (L)})M′M″), because the algorithm needs to update each matrix element for each incoming spike or condidate spike time. If all updates run in parallel, then the run-time complexity of this algorithm reduces to O({circumflex over (K)}+{circumflex over (L)}).
10.5 Some Special Cases in which the Decoding is Accurate
This section focuses on several special cases of SUV decoding. The main goal is to describe when the SUV decoding algorithm can emit a spike. This decision is formalized using the sign of the difference between the value of the matrix element Ma,b and the value of dh″b weighted by the function u(t) at time t during decoding. This difference can be viewed as a strictly monotone function g(t). The proofs use the intermediate value theorem (from Calculus) to show that g(t) has only one zero in the decoding interval and that this zero is located at the correct decoding time. Theorem 10.1 covers the case when both spike trains consist of only one spike, i.e., a=(a1) and b=(b1). Theorem 10.2 extends this proof to the case when a=(a1) and b=(b1, b2, . . . , bK). In both cases it is assumed that a1<b1.
f(t)=u(t)dh″b(t)=u(t)H(b1−t)v(b1)e−s(b
Suppose that f(t) is strictly monotone on [0, b1], i.e., for each t1 & t2 such that 0≤t1<t2≤b1, f(t1)<f(t2) or f(t1)>f(t2). Then, f(t)=dMa,b(0) on t ∈ [0, b1] if and only if t=a1.
In other words, if there is only one spike on a and only one spike on b, and if the spike on b comes after the spike on a, then the function f(t)=u(t)dh″b(t) crosses the value of dMa,b(0) at exactly the right time during decoding, i.e., when t=a1. The spike on b doesn't even have to be present during decoding in order to decode a1 correctly. That is, if a spike on b arrives after a1 or even if it doesn't arrive at all, then the decoding will still be accurate.
The following theorem extends the condition imposed on the function f(t) by Theorem 10.1 to the case when K>1. Once again, if a1≤b1, then the equation f(t)=dMa,b(0) has only one solution on [0, b1] at t=a1.
Suppose that f(t) is strictly monotone on [0, b1] . That is,
f(t1)>f(t2) or f(t1)<f(t2), for each t1, t2 ∈ [0, b1] s.t. t1<t2. (10.4)
Then, the following equation
d
M
a,b(0)−f(t)=0 (10.5)
has a unique solution on the interval t ∈ [0, b1], which is located at t=a1.
The SUV decoding algorithm uses the function u(t)=e−ut. The algorithm allows spiking when both f(t) and dMa,b(0) are positive and when it is possible to subtract f(t) from dMa,b without making its value negative. This restricts (10.4) to only decreasing f(t), which maps to the inequality u>s in this case.
10.6 A Special Case of Incorrect Decoding
The following theorem describes a family of spike decoding problems where the function f(t) does not intersect the line dMa,b (0) at t=a1. In other words, this theorem can be viewed as a counter-example that shows that there are cases when the decoding algorithm can't decode the first spike accurately, even if f(t) is strictly decreasing.
d
M
a,b(0)−f(t)=0, (10.6)
has at most one solution on t ∈ [0, b1] . Furthermore, if t1 is the value of this solution, then t1<a1.
10.7 Definitions for Interleaving
That is, Kn denotes the number of spikes in the spike train A(n) for each n ∈ {1, 2, . . . , N}. As usual, it is assumed that the spikes in each spike train are ordered in increasing order based on their arrival time and that there are no duplicate spikes, i.e., A1(n)<A2(n)< . . . <AK
αj ∈ [0, A1(n)] ∪ (AK
for each j ∈ {1, 2, . . . , J} and each n ∈ {1, 2, . . . , N}.
The following theorem restates Definition 10.5 from the point of view of the collection A. This transforms (10.10) into three mutually exclusive inequalities.
0≤AK
αj≤A1(n)<AK
αJ≤A1(n). (10.13)
The next lemma proves that the inequality A1(n)<αj≤AK
To summarize, interleaving means that for each spike train A(n) in the collection A there is an inter-spike interval in αthat contains all spikes from A(n). Sufficient interleaving extends interleaving by also requiring each inter-spike interval in αto contain all spikes from at least one spike train in the collection A and also requiring at least one train in A to occur after the last spike in α.
Note that Definition 10.9 does not require any spikes in A to occur before a1. That possibility, however, is not explicitly excluded by this definition. Definition 10.11 rules out this possibility.
r=Sort(A1(1), A2(1), . . . , AK
10.8 Examples of Interleaving
This section gives several examples that help clarify the definitions from Section 10.7. In all examples the spike train α is shown in red; the collection of spike trains A is shown in blue.
10.8.1 Interleaving Between Collections of Spike Trains
In all of the previous examples, α was a single spike train. In this subsection α is a collection of two spike trains, i.e., α=(α(1), α(2)). The examples given in
10.9 Interleaving and Projections of Collections of Spike Trains
10.10 Sufficient Conditions for Accurate SUV Decoding
The following theorem states the conditions for accurate decoding of the first spike that apply when S′ consists of only one spike train α.
∃m ∈ {1, 2, . . . , M″} such that α1≤A1(m)<A2(m)< . . . <AK
The following lemma shows that other spike trains in A, i.e., spike trains that don't satisfy Condition 2 of Theorem 10.16, can't prevent the decoding of α1 at the correct time. More specifically, the lemma shows that the decoding constraints associated with these trains become satisfied for some t<α1. This lemma is used in the proofs of Theorem 10.16 and also in the proof of Theorem 10.19.
where u(t)=e−ut and v(t)=e−vt. Also, let Ma,b be the matrix element computed from a and b by the SUV encoding algorithm, i.e.,
f(a1)≤dMa,b(0). (10.20)
The following theorem uses mathematical induction to generalize Theorem 10.16 from α1 to all spikes in α. In other words, if α1 is decoded correctly, then the next stage of the decoding algorithm can be viewed as decoding the first spike in the segment of α that starts with α2. After α2 is decoded correctly, this reasoning can be applied to α3, then to α4, and so forth until all of the spikes in α are decoded.
The following theorem shows that even if a subset of a collection of spike trains A is sufficiently interleaved by α, then the decoding will still be accurate. In other words, there can be some redundancy in A, provided that a subset of A is sufficiently interleaved by α.
In this formulation there could be spikes in A at t=0. These spikes don't affect the decoding, so they can be removed form A and the results proven in Theorem 10.18 and Theorem 10.19 still apply. This follows from the continuity of u(t) at zero. Depending on the shape of u(t), there could be other intervals within [0, α1] on which spikes in A don't interfere with the decoding of α.
The following theorem generalizes Theorem 10.19 to collections of spike trains. More specifically, it states that if a subset of A is sufficiently interleaved by each train in the collection eα, then eα will be correctly decoded by the SUV decoding algorithm.
The following theorem extends the inductive argument used in Theorem 10.19 to a more general class of problems where the collection of sequences dA given to the SUV decoding algorithm can deviate from the collection eA used for encoding. It turns out that accurate decoding is possible even in this case, provided that the spikes in dA don't occur too early with respect to eα. This theorem shows that sufficient interleaving makes the SUV decoding algorithm robust to delaying or even deleting spikes in dA.
m=(m1, m2, . . . , mM″), (10.21)
h″=(h″1, h″2, . . . , h″M″), (10.22)
where
m
p=(u,v){eα★A(p)}(s), (10.23)
h″
p=(v){eA(p)}(s), (10.24)
for each p ∈ {1, 2, . . . , M″}. Finally, suppose that the list of candidate spike times ψ includes the times of all spikes in eα.
L(eα, eA1(p))=max({eα, ∈ eα:eαj≤eA1(p)} ∪ {0}). (10.25)
dα=eα, (10.26)
Theorem 10.21 can be generalized to decoding collections of spike trains. That is, if the two conditions of Theorem 10.21 are satisfied for the projection e{circumflex over (α)} derived from a collection eα, then each spike train in the collection eα will be decoded accurately. The following theorem formally states this generalization.
M
p,q
(u,v){eα(p)★eA(q)}(s), (10.27)
h″
q=e
L(e{circumflex over (α)}, eA1(p))=max({e{circumflex over (α)}j ∈ e{circumflex over (α)}: e{circumflex over (α)}j≤eA1(p)} ∪ {0}). (10.29)
Then, the SUV decoding algorithm accurately decodes eα from the model (M, h″), i.e.,
dα=eα. (10.30)
10.11 Examples of Robust Decoding in the Presence of Noise
This section gives several decoding examples in which dS″≠eS″ but dS′=eS′.
10.12 Summary
This chapter showed that the SUV decoding algorithm can accurately decode certain types of spike trains. More specifically, a sufficient interleaving condition was formulated and it was proven that it implies accurate decoding for certain combinations of the SUV model parameters. It was also shown that the sufficient interleaving condition can be generalized from decoding individual spike trains to decoding collections of spike trains.
Moreover, a theoretical investigation of the case in which the spikes used for the decoding differ from the spikes used for encoding suggests that the sufficient interleaving condition makes the SUV decoding algorithm robust to certain types of perturbations. That is, if the interleaving is sufficient and if the spikes used during decoding are delayed or even deleted with respect to their encoding counterparts, then the SUV decoding algorithm will decode the correct result.
Other extensions of SUV decoding are also possible. For example, instead of listing specific times, ψ can be a probability distribution for spiking within a specific time window.
11 Discrete- and Continuous-Time Theory Using Functionals
This chapter gives a theory from which the properties of single, dual, and exponential SSM matrices can be derived. The theory is built using real functions and functionals.
11.1 Functional Coupling and its Properties
As we will see in Section 11.3, the elements of SSM matrices can be expressed as applications of functionals, which are derived from the input sequences, to the arguments of a bivariate kernel function. The type of the resulting SSM matrix depends on the choice of the kernel function. This framework makes it possible to prove results that apply to single, dual, regular, and exponential SSM matrices by simply changing the kernel function, while the mapping from sequences to functionals remains the same.
Before we can derive these results, however, we need to introduce a higher-order function, Φ, that maps two functionals and a bivariate kernel function to a scalar. This scalar is obtained by applying the functionals to the two arguments of the kernel function. We will use the term functional coupling to refer to the higher-order function Φ and the term coupling value to refer to the scalar. These terms are formally defined below.
Φ(φ, ψ, k)=φ[x](ψ[y]k(x, y)), where φ, 104 ∈ {{}} and k ∈ {2}. (11.1)
The value attained by Φ for a specific triple of its arguments is called a coupling value. The bivariate function k is called a kernel.
In other words, Φ is a function from DΦ to , where DΦ is the Cartesian product of the set of all functionals (repeated twice) and the set of all bivariate functions. More formally,
D
Φ={{}}×{{}}×{2}. (11.2)
The domain of Φ consists of all triples (φ, ψ, k) in DΦ such that the right-hand side of (11.1) is well-defined. More formally,
domain(Φ)={(φ, ψ, k) ∈ DΦ s.t. ψ(k ∘ x,2)=f(x) ∈ domain(φ)}, (11.3)
where x,2 is the adapter function x,2(y)=(x, y) for each y ∈ .
The remainder of this section gives sufficient conditions for certain invariants of a functional coupling. In particular, it states the sufficient conditions for invariance with respect to reflection and the sufficient conditions for invariance with respect to translation.
k(x, y)=k(−y, −x); (11.4))
φ[x](ψ[y]k(x, y))=ψ[y](φ[x]k(x, y)). (11.5)
Then, the value of Φ(φ, ψ, k) is not affected if the functionals are reflected and their positions in the functional coupling are swapped. More formally,
Φ(ψ, φ, k)=Φ(φ, ψ, k). (11.6)
The following proposition states that a translation invariant kernel makes the functional coupling invariant with respect to the translation operator on functionals.
k(x, y)=k(x+u, y+v) for all (x, y) ∈ domain(k). (11.7)
Then, the functional coupling Φ is invariant with respect to the corresponding translation operators on functionals, i.e.,
Φ(uφ, vψ, k)=Φ(φ, ψ, k). (11.8)
11.2 Representing Discrete Sequences Using Functionals
Let S=S1S2 . . . ST be a sequence of length T drawn from the alphabet Γ={c1, c2, . . . , cM}, i.e., Si ∈ Γ for each i ∈ {1, 2, . . . , T}. To simplify the notation we will assume that the characters c1, c2, . . . , cM are sorted in alphabetical order or in some other fixed order. Using this assumption we can refer to the i-th character in the alphabet, ci, simply by using its alphabetical index, which is equal to i. Thus, the character sequence S can also be represented as an integer sequence or as a vector s=(s1, s2, . . . , sT) of length T that consists of the alphabetical indices of all characters in S. More formally,
s=(s1, s2, . . . , sT) ∈ {1, 2, . . . , M}T, such that Sj=cs
Let ω(s, φ) denote a function that maps a vector s to a vector of M functionals such that the i-th functional in the resulting vector is derived from all occurrences of the i-th alphabet character in the sequence S by adding shifted instances of the “template” functional φ. In other words,
ω(s, φ)=(ω(s, φ)1, ω(s, φ)2, . . . , ω(s, φ)M), (11.10)
where each element ω(s, φ)i is a functional that is defined using the following formula:
In the previous expression, δs
Also, in (11.11) it is assumed that zero times a functional evaluates to zero. More formally,
The properties of discrete SSM matrices can be derived from the special case in which φ=δ. In this case, the Dirac's delta is used to represent an instance of each character in the sequence. For this special case, the definition of ω(s, φ)=ω(s, δ) has the following form:
ω(s, δ)=(ω(s, δ)1, ω(s, δ)2, . . . , ω(s, δ)M), (11.13)
where
Note that there are two deltas now: the first one is the Dirac's delta, which is denotes with δ and is a functional that returns the value of its argument function at 0. The second one is the Kronecker's delta, which is denoted with δ and is a function that returns either zero or one, depending on its two arguments that are traditionally placed in the subscript. We will use different fonts to distinguish between these two deltas.
In the vector ω(s, δ), the value of the i-th functional ω(s, δ)i when applied to an argument function f is equal to the sum of the values of f evaluated at specific points that correspond to the indices of the character ci in the sequence S. More formally,
11.3 Expressing SSM Matrices Using Functional Coupling
This section shows that the elements of an SSM matrix can be expressed as functional coupling values using the sequence representation shown in (11.13). Different types of matrices (e.g., regular versus exponential) can be obtained by simply using a different kernel function.
Let S′=S′1S′2 . . . S′T be a sequence of length T drawn from the alphabet Γ′={a1, a2, . . . , aM′} and let S″=S″1S″2 . . . S″T be a sequence of length T drawn from the alphabet Γ″={b1, b2, . . . , bM″}. Let s′ and s″ be two integer vectors that contain the alphabetical indices of the characters in S′ and S″, respectively. These two vectors are defined similarly to (11.9), i.e.,
s′=(s′1, s′2, . . . , s′T) ∈ {1, 2, . . . , M′}T such that S′i=as′
s″=(s″1, s″2, . . . , s″T) ∈ {1, 2, . . . , M″}T such that S″j=bs″
Two vectors of functionals, ω(s′, δ) and ω(s″δ), will be used to represent the two sequences. To shorten the formulas, ω′ will be used as a shorthand notation for ω(s′, δ) and ω″ will be used as a shorthand notation for ω(s″, δ). In other words,
ω′=ω(s′, δ) and ω′i=ω(s′, δ)i for each i ∈ {1, 2, . . . , M′}, (11.18)
ω″=ω(s″, δ) and ω″j=ω(s″, δ)j for each j ∈ {1, 2, . . . , M″}. (11.19)
A similar shorthand notation will be used when there is only one sequence S:
w=w(s, (5) and wj=w(s, (5)1 for each i ∈ {1, 2, . . . , M}. (11.20)
The coupling value Φ(ω′i, ω″j, k) obtained for a pair of these functionals can be expressed in terms of the values of the kernel function on the grid {1, 2, . . . , T}×{1, 2, . . . , T} as follows:
where i ∈ {1, 2, . . . , M′} and j ∈ {1, 2, . . . , M″}.
11.3.1 Regular SSM Matrices
The following proposition shows that each element of a dual SSM matrix is equal to the coupling value of the functionals that correspond to its row and column. For regular matrices (i.e., ones with integer elements) the kernel function selects pairs of indices (p, q) from the two sequences such that p≤q.
D
ij=Φ(ω′i, ω″j, kD) for each i ∈ {1, 2, . . . , M′} and j ∈ {1, 2, . . . , M″}, (11.22)
where
The following corollary is a special case of Proposition 11.4 for “single-band” SSM matrices with integer elements.
X
ij=Φ(ωi, ωj, kD), for each i, j ∈ {1, 2, . . . , M}. (11.24)
11.3.2 Exponential SSM Matrices
This section derives the analogs of (11.22) and (11.24) for exponential SSM matrices. In this case, we will use the kernel kE, which is defined as follows:
The following proposition shows how dual exponential SSM matrices can be expressed using functional coupling.
D
(E)(S′, S″)ij=Φ(ω′i, ω″j, kE), for each i ∈ {1, 2, . . . , M′} and j ∈ {1, 2, . . . , M″}. (11.26)
By swapping S′ and S″ a similar result can be obtained for the other dual matrix D(E)(S″, S′). In other words, the element in row j and column i of that matrix can be represented with the following coupling value:
D
(E)(S″, S′)ji=Φ(ω″j, ω′i, kE), (11.27)
for each j ∈ {1, 2, . . . , M″}, and each i ∈ {1, 2, . . . , M′}.
The following corollary is a special case of Proposition 11.6 for the single-band case.
X
ij
(E)=Φ(ωi, ωj, kE), for i, j ∈ {1, 2, . . . , M}. (11.28)
11.4 Functionals for Reversed Sequences
Let f ∈ {} be a function such that domain(f) ⊆ [1, T]. Also, let g ∈ {→} be a function that is obtained by reversing f on [1, T], i.e., g(x)=f (T+1−x) for each x such that T+1−x ∈ domain(f). The function g can be expressed as follows: g=(T+1 ∘)f. In other words, reversing a function is equivalent to reflecting and translating it appropriately. This section shows that this idea can be extended to functionals that are derived from discrete sequences using (11.10) if the “template” functional is invariant with respect to reflection.
Let S=S1S2 . . . ST be a sequence of length T and let denote the sequence obtained by reversing the sequence S. In other words, j ∈ Γ={c1, c2, . . . , cM} such that
=STST−1 . . . S1, where j=ST+1−j, for each j ∈ {1, 2, . . . , T}. (11.29)
Similarly, let denote a vector obtained by reversing the vector s=(s1, s2, . . . , sT) , which was defined in (11.9). In other words,
=(sT, sT−1, . . . , s1), where j=sT+1−j, for each j ∈ {1, 2, . . . , T}. (11.30)
Given a “template” functional φ, we can use (11.11) to derive the following formula for the functional that represents occurrences of ci in :
Using (11.30) as an index conversion formula, the right-hand side of (11.31) can be rewritten in terms of the elements of s instead of the elements of :
Changing the index variable from p to q=T+1−p changes the previous equation as follows:
Because T+1−q=T+1 ∘−q, it is possible to “pull” T+1 out of the sum, i.e.,
The following proposition gives the formula for w(s, y), in terms of 9−, R., and ω(s, φ)i, which is possible because −q ∘= ∘ q and because is its own inverse.
ω(, φ)i=(T+1 ∘ )ω(s, φ)i, for each i ∈ {1, 2, . . . , M}. (11.35)
Corollary 11.9. Let φ be a functional that is invariant under reflection, i.e., φ=φ. Then,
ω(, φ)i=(T+1 ∘)ω(s, φ)i, for each i ∈ {1, 2, . . . , M}. (11.36)
For the special case of Corollary 11.9 in which φ=δ, equation (11.36) can be rewritten as:
ω(, δ)i=(T+1 ∘)ω(s, δ)i, for each i ∈ {1, 2, . . . , M}. (11.37)
11.5 Expressing ZUV Matrices Using Coupled Functionals
This section shows how to express the elements of a ZUV matrix using coupled functionals. The kernel function k(zuv) that can be used to do this mapping is shown below:
k
(zuv)(x, y)=H(y−x)u−xv−yz−(y−x), (11.38)
where H(y−x) denotes the Heaviside function, i.e.,
The following theorem uses k(zuv) as a kernel function for coupling functionals derived from character sequences to denote the elements of a ZUV matrix.
M
a
,b
(zuv)
=
a
★b
(u,v)(z)=Φ(ω′i,ω″j, k(zuv)),
where a(i)=(a0(i), a1(i), a2(i), . . . , aT−1(i)) is a binary sequence that indicates the occurrences of c′i in S′ and b(j)=(b0(i), b1(i), b2(i), . . . , bT−1(i)) is a binary sequence that indicates the occurrences of c″j in S″. That is,
11.6 Expressing SUV Matrices Using Coupled Functionals
The elements of an SUV matrix can also be expressed using coupled functionals. Because spike times may not be integer, however, this requires deriving both a suitable functional representation of spike trains and stating the kernel function for the SUV model.
The following definition shows how to map spike trains to functionals. This is accomplished by representing each spike in the train using shifted Dirac's deltas and summing over all spikes in the train. The resulting linear functional is a suitable mathematical representation of a spike train for the SUV mapping.
where δ is Dirac's delta.
If ψa is applied to a function f(t), then the result is equal to the sum over the values of f at the times of the spikes in a. More formally,
Let k(suv)(x, y) be the following kernel function:
k
(suv)(x, y)=H(y−x)e−uxe−vye−x(y−x), (11.45)
which is parametrized by the real numbers u, v, and s. In this formula, the term H(y−x) is the Heaviside function. That is,
Using the Heaviside function ensures that k(suv)(x, y)=0 when y<x.
The following theorem shows how to derive the formulas for the matrix element Ma,b(suv) in the SUV model using the coupling operator Φ with functionals that represent the spike trains a and b and the kernel k(suv).
M
a,b
(suv)=a★b(u,v)(s)=Φ(ψa, ψb, k(suv)). (11.48)
11.7 Summary
This chapter introduced a framework for studying and analyzing the properties of SSM matrices. It also introduced a notation that makes it possible to prove the properties of different types of SSM models. Using functionals enables the exploration of models in which the sampling process is imperfect, e.g., models where the sampling may require a certain amount of time to complete and concurrent samples can interfere or overlap with each other. These processes can be modeled using integral operators with narrow gaussian kernels or other narrowly localized kernels. This approach can also work with continuous-time sequences and can be applied to spike trains.
The template functional determines how each item in the sequence is represented. For example, using Dirac's delta as a template functional leads to a spike-based representation. Dirac's delta, however, is not the only possible template functional that can be used in this model. Certain properties of the templates may be used as necessary conditions for specific features of the model. For example, some features of the SSM model are preserved if the template functional is symmetric (i.e., invariant with respect to reflecting its argument function). Moreover, some properties of the SSM model are retained even if different template functionals are used for the two encoded sequences, if these templates commute.
The kernel function determines how each element of the encoded matrix is calculated. For example, using the kernel function k(zuv) with a spike-based representation derived from character sequences leads to ZUV matrices. Similarly, using the kernel function k(suv) with functionals derived from spike trains leads to SUV matrices.
All references, including publications, patent applications, and patents cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.
The use of the terms “a” and “an” and “the” and similar referents in the context of describing the invention (especially in the context of the following claims) is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “comprising,” “having,” “including,” and “containing” are to be construed as open-ended terms (i.e., meaning “including, but not limited to,”) unless otherwise noted. Recitation of ranges of values herein are merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”) provided herein, is intended merely to better illuminate the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the invention.
Preferred embodiments of this invention are described herein, including the best mode known to the inventors for carrying out the invention. Variations of those preferred embodiments may become apparent to those of ordinary skill in the art upon reading the foregoing description. The inventors expect skilled artisans to employ such variations as appropriate, and the inventors intend for the invention to be practiced otherwise than as specifically described herein. Accordingly, this invention includes all modifications and equivalents of the subject matter recited in the claims appended hereto as permitted by applicable law. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the invention unless otherwise indicated herein or otherwise clearly contradicted by context.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
©2017, 2018 Alexander Stoytchev and Vladimir Sukhoy. All Rights Reserved.
This patent application claims the benefit of U.S. Provisional Patent Application No. 62/550,223, filed Aug. 25, 2017, the entire teachings and disclosure of which are incorporated herein by reference thereto.
Number | Date | Country | |
---|---|---|---|
62550223 | Aug 2017 | US |