The S-transform (ST) can be regarded as a hybrid of Gabor and continuous wavelet transforms, providing a “Time Frequency Representation” (TFR) of a signal by localizing with a Gaussian window that depends on the frequency. It has advantages over other TFR transforms, and its discrete 1D version has applications in signal processing, such as analysis of climatic, atmospheric and geophysical data, fault detection and diagnostics, exploration seismology, power system disturbance, as well as medical signals like ECG and EEG. In addition, its 2D extension is useful in image processing, such as analysis of medical images for differential diagnosis, also known as virtual biopsy.
Regarding 1D ST, there is a fast frequency-domain formula with complexity O(N2 log N) that finds the entire set of discrete ST, but it is inefficient if only a single ST value (called “local spectrum”) at some selected time-point, or a set of local spectra over a time interval, is needed. Also, the storage complexity is O(N2), which is impractical when N is large.
There have been attempts for faster ST, but the attempts are mainly approximations or simplifications, finding a subset of ST values (usually with frequency being a negative power of 2 like ½, ¼, ⅛, . . . ), and interpolating for the missing ones. The interpolation, by spline or other ways, is time-consuming. Moreover, these attempts may incur large error for a time series whose prominent frequency does not belong to the subset, for example, a sinusoidal signal of frequency 0.3.
Similarly, regarding the 2D ST, in an Nx×Ny monochrome image, each pixel has a totality of NxNy/4 complex ST values. A fast frequency-domain formula takes O(NxNy log NxNy) time to find these values for a pixel. The time taken to find all ST values over a region of interest in a 256×256 medical image is very long, for example. There have been attempts of speeding it up but these attempts are mainly approximations or simplifications.
An image processing device and methods for performing an ST are provided herein. The ST is a TFR that is popularly used in a variety of applications, but prohibitive in both storage and computation for large data size. Provided herein are methods for performing a Fast Time Frequency Transform for 1-dimensional data (i.e., FTFT-1D). For example, provided herein is a fast and accurate method to generate a highly compressed form of the values of ST directly, when N is so large that it is prohibitive to find and store the ST values first. For example, the method encodes the TRF information uniformly and so can then be used for analyzing the TRF correctly and processing the data efficiently and effectively. The compression can help storage, transmission and visualization of ST. Also provided herein is a method to find the values of ST at individual points, called local spectra, instantaneously and accurately. This is useful for real-time monitoring, control, manipulation and filtering. The methods provided herein are memory-efficient, robust and adaptive.
Also provided herein are fast and highly accurate methods to find the 2D ST magnitudes at each pixel in an image, called 2D Fast Time Frequency Transform (i.e., FTFT-2D Pixel). These methods do not strain memory requirements much and are robust for different medical images.
Additionally, provided herein are fast and highly accurate methods to compute the 2D ST magnitudes and their statistics for a region of interest (ROI) in an image or for the entire image, called 2D Fast Time Frequency Transform (i.e., FTFT-2D ROI). With the FTFT-2D Pixel algorithm above, it is possible to find the magnitude of the 2D ST, also called local spectrum, at each pixel in the image. In many applications, a user may not be interested in the spectral content at a single pixel, as it varies from one pixel to the next especially for high frequencies. Instead, the user may be more concerned with the spectral distribution and statistics over an ROI in the image. To obtain the spectral distribution and statistics over an ROI, the local spectrum is individually found for each pixel in the image and then accumulated to obtain the statistics. The FTFT-2D ROI methods discussed herein make this computation more efficient. For example, the matrix multiplication for the pixels in the ROI is made more efficient by first computing a set of left matrices for the x coordinates or a set of right matrices for the y coordinates. Additionally, the ST magnitudes at low frequencies measure the global spectral information and so do not change much, especially where the high frequency content is small.
An example method of generating a compressed form of values of a one-dimensional S-transform (ST) for a time series in an image processing device and generating an approximate form of ST is provided herein (i.e., FTFT-1D). For example, the method can include setting primary parameters, setting a data size N, determining basis values for the data size N and inputting a time series of data size N. The method can also include determining a set of prominent frequency indexes, expanding and accumulating the basis values for pure complex sinusoids (PCS) with frequencies in the set of prominent frequency indexes to form compressed ST values using the primary parameters, decompressing accumulated basis values for a high set and copying the ST values for a low set.
An example method of determining local spectrum at a pixel in an image processing device is provided herein (i.e., FTFT-2D Pixel). The method can include setting parameters, receiving an input image, determining a low band, a medium band and a high band of frequency components and preparing basis values for each of the low band, the medium band and the high band. The method can further include determining a two-dimensional Fourier Transform (FT) of the image as a matrix H, receiving an input coordinate of the pixel and determining S-transform (ST) magnitudes at the input coordinate of the pixel using the matrix H and the basis values.
An example method of determining S-transform (ST) magnitudes and statistics in a region of interest (ROI) in an image processing device is provided herein (FTFT-2D ROI). The method can include setting parameters, receiving an input image, determining a low band, a medium band and a high band of frequency components and preparing basis values for each of the low band, the medium band and the high band. The method can further include determining a two-dimensional Fourier Transform (FT) of the image as a matrix H, receiving an indication of the region on interest (ROI) and determining the S-transform (ST) magnitudes and the statistics in the ROI using the matrix H and the basis values.
It should be understood that the above-described subject matter may also be implemented as a computer-controlled apparatus, a computer process, a computing system, or an article of manufacture, such as a computer-readable storage medium.
Other systems, methods, features and/or advantages will be or may become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features and/or advantages be included within this description and be protected by the accompanying claims.
The components in the drawings are not necessarily to scale relative to each other. Like reference numerals designate corresponding parts throughout the several views.
a)-1(g) are diagrams illustrating the various ST, SC, TT and Offset TT-Transform (OTT) values for a multi-sinusoid example;
a)-(g) are graphs illustrating accuracy (for magnitude and phases) for ST by FTFT-1D methods provided herein as compared to exact ST;
h)-(k) are graphs illustrating accuracy for ST by FTFT-1D methods provided herein as compared to exact ST;
a)-(g) are diagrams illustrating the various ST, TT and OTT values for a random time series or signal h[n];
a)-(e) are graphs illustrating results of ST by FTFT-2D Pixel methods provided herein as compared to exact ST;
a)-(b) are diagrams illustrating an example general-shaped ROI for skipping strategies discussed herein;
a)-(b) are diagrams illustrating another example general-shaped ROI for skipping strategies discussed herein;
a)-(e) are graphs illustrating results of ST by FTFT-2D ROI methods provided herein as compared to exact ST;
a)-(e) are graphs illustrating results of ST by FTFT-2D ROI methods provided herein as compared to exact ST; and
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. Methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present disclosure. As used in the specification, and in the appended claims, the singular forms “a,” “an,” “the” include plural referents unless the context clearly dictates otherwise. The term “comprising” and variations thereof as used herein is used synonymously with the term “including” and variations thereof and are open, non-limiting terms. While implementations will be described for performing an S-transform in the context of performing image processing techniques, it will become evident to those skilled in the art that the implementations are not limited thereto, but are applicable to other types of signal processing, such as analysis of climatic, atmospheric and geophysical data, fault detection and diagnostics, exploration seismology, power system disturbance, as well as medical signals like ECG and EEG.
Provided herein is a fast and accurate method to generate a highly compressed form of ST directly (without the need to find the ST values first). The compression reduces the original size of O(N2) to O(N log N). It encodes the TRF information uniformly and so can then be used for analyzing the TRF correctly and processing the data efficiently and effectively. The compression can help storage and transmission of ST.
Additionally, provided herein is a method to decompress the compressed values to generate the ST values for the entire series for visualization, analysis or process (e.g. filtering). If N is too large for all the ST values to be held in memory, the ST values can be saved in auxiliary storage as they are being generated, or discard an ST value immediately after it has been used.
Alternatively or additionally, provided herein is a method to visualize, analyze or process the local spectrum of ST at a single time. For example, it is possible to find the ST at individual points instantaneously and accurately by generating directly the compressed form of the local spectrum at the desired time only and then decompressing it to obtain the local spectrum there. This is useful for real-time monitoring, control, manipulation and filtering.
FTFT-1D
Definition of Transforms
Fourier Transform
The 1-dimensional discrete Fourier Transform (FT) of a (complex) time series h[n] of size N is a “Frequency Representation” and is shown in (1) below.
where h[n]=h(nI) and H[k]=H(k/NI), I is the sampling interval and n, k are the time and frequency indices respectively. As used herein, [ ] denotes discrete functions of integers, while ( ) denotes continuous functions of real or complex numbers. Additionally, the term “time series” is used herein and is intended to encompass a “signal”.
The 1-dimensional Inverse discrete Fourier Transform (IFT) is shown in (2) below.
where h[n]=h(nI) and H[k]=H(k/NI), I is the sampling interval and n, k are the time and frequency indices respectively. It should be understood that FT and IFT can be computed efficiently by Fast Fourier Transform (FFT), which is especially fast for the radix-2 case where N is a power of 2.
S-Transform
The 1-dimensional Continuous S-Transform of a (complex) function of time h(t) is a joint function of time t and frequency f and is shown in (3) below.
The 1-dimensional Continuous S-Transform can be regarded as a hybrid of Gabor Transform and Wavelet Transform, depending on near neighbours for high frequencies, and far ones for low frequencies.
The 1-dimensional discrete S-Transform (ST) is given by the frequency-domain formula shown in (4) below.
where n and k go from 0 to N−1. This is different in the summation endpoints in the frequency-domain formula, where the summation goes from N−1. Better results can be obtained by summing from −N/2 to N/2−1. It should be understood that the upper equation in (4) is for the non-zero voice, with time complexity of O(N2 log N) because the summation is actually an IFT, which can be found by FFT in O(N log N). Additionally, the lower equation in (4) provides that the zero voice S[n, 0] is simply the mean intensity.
Although FFT is relatively fast, (4) is only useful in finding all the ST for n=0, . . . , N−1. However, even using FFT, (4) is slower than the methods discussed below. Additionally, (4) is inefficient in finding a single local spectrum at some n, or a few a few local spectra. In practice, by Nyquist Theorem, it may only be necessary to find ST for frequency f from 0 to ½, i.e., for k=0, 1, . . . N/2−1. Thus, it takes O(N2) to hold the entire set of ST.
If the time series h[n] is real, then FT satisfies the conjugate symmetry property: H[N−k]=
for k=0, 1, . . . , N−1. As discussed herein, Re( ) denotes the real part and over-line (i.e.,
TT-Transform
The TT-transform (TT) is the IFT of ST. In the continuous case, it is shown in (6) below.
T(t, τ)=∫−∞∞S(t, f)ei2πfτdf (6)
One of ordinary skill would understand that there are many ways to discretize the TT. One example discrete TT, which is called T1 herein, has summation endpoints symmetric about 0 and is shown in (7) below.
where each of n, m is the time index going from 0 to N−1.
Additional discretization strategies for TT-transform are discussed below. For example, T2 differs from the T1 of (7) in the summation endpoints and is shown below in (8).
T2 is preferred as to T1 because its FT will bring back the original ST values as the endpoints are the standard 0 and N−1. Additionally, as discussed below, T2 actually gives more accurate results for ST than the other forms.
T3, which is shown below in (9), is the IFT of SC.
As discussed above, SC satisfies the conjugate symmetry property, hence its IFT, T3[n, m], is real for all m=0, 1, . . . , N−1.
Referring now to
b) illustrates the magnitude of the ST of the signal, with frequency index k along the vertical axis. (White=0, Black=0.5).
As shown in
To eliminate the wrapping and to offset the vertical shift of distance n at time index n, it is possible to use the displaced forms of T2 and T3, called Offset TT-transform (OTT) and denoted by O2 and O3, which are shown in (10) and (11) below.
O
2
[n, m]=T
2
[n, (m+n+N)modN], and (10)
O
3
[n, m]=T
3
[n, (m+n+N)modN], (11)
where m goes from −N/2 to N/2−1. Then these OTT will cluster around the horizontal line m=0, without the wrapping problem, as shown in
Pure Complex Sinusoids
A complex sinusoid of frequency f, magnitude A and phase θ is a periodic function of time t:
u
f,A,θ(t)=Aei(2πft+θ)=A(cos(2πft+θ)+i sin(2πft+θ)).
If A is unity and θ is zero, it is a Pure Complex Sinusoid (PCS) of frequency f. In the discrete case, the PCS of frequency index q (with q=0, 1, . . . , N−1) is defined as a time series in time index n:
for n=0, 1, . . . , N−1.
The formula for the IFT shown in (2) can be written in terms of PCS, which is shown in (12) below.
Fourier Transform of Pure Complex Sinusoid
The FT of a PCS of frequency index q is equal to the Kronecker delta, which is shown by (13) below.
where q and k range from 0 to N−1. (14) below shows the case when q or k take values outside the above range.
where Z is the set of integers.
S-Transform of Pure Complex Sinusoid
The ST of a PCS of frequency index q, written as Sq[n, k], can be computed by the frequency-domain formula shown in (4) and is shown below in (15).
By (13), only the term with κ+k=q, i.e., κ=q−k, is non-zero in the summation. Thus, (15) becomes (16) below.
S
q
[n, k]=e
−2π
(q−k)
/k
e
i2π(q−k)n/N (16)
However, there is a problem with (16) if (4) is strictly sued: κ ranges from −N/2 to N/2−1 in the summation, but in the methods discussed below, q will go from 0 to some number greater than N/2, and k from 0 to N/2−1. So, κ=q−k may not always hold, e.g. when k is N/4 and q is 3N/4. Therefore, the more general (14) can be used to get a more correct form of (16), which is shown below as (17).
The second alternative of (17) sets κ=q−k−N to ensure that κ=q−k lies within the summation range. It is possible to put q as the subject of the if-clause, because q will be varied for a given k in the methods discussed below. But, (17) is inconvenient to use. However, when q≧k+N/2 in the second alternative of (17), the expressions shown in (18) and (19) below result.
(q−k)2/k2>(N/2)2/k2>1 (18)
because k is less than N/2, and
(q−k−N)2/k2=(k+N−q)2/k2>1 (19)
because q<N. Hence, the expression in (17) for the two alternatives are small, and it is possible to use (16) at all times.
ST is linear, in the sense that given a number of time series, the ST of their linear combination is equal to the linear combination of their ST. So, by (12), the ST of the IFT in terms of the PCS is shown in (20) below.
showing that the set of Sq[n, k] with q=0, 1, . . . , N−1 is a basis for any ST.
Properties of S-Transform of Pure Complex Sinusoid
Provided below are properties of the ST of PCS that are useful in the methods discussed below.
Periodic:
For a given k, the magnitude of Sq[n, k], written as f[k], is: e−2π
Gaussian:
For a fixed q, f[k], is bell-shaped, and looks like a Gaussian with SD=q/2π. To prove it, note that f[k] has a single maximum at k=q, and is small when k is far from q. The graph of (q−k)/k against k is a hyperbola cutting the k axis at k=q with slope −1/q. So the straight-line graph of (q−k)/q is tangential to it. This implies that f[k] and e−2π
The multi-sinusoid example discussed with regard to
Symmetry:
For fixed q and m, Sq[n, k] is conjugate-symmetric about n=N/2. This is obvious from the fact that (16) becomes its conjugate when n is replaced by N−n.
Support Interval:
Given a small positive number ε, the “support interval” for ε, defined as {k: f[k]≧ε}, is bounded below by lq and above by uq, where (21) below defines lq and uq.
l
q
=q/(1+α)and uq=q/(1−α) (21)
and α=√{square root over (−log ε/2π2)}. This can be proved easily. For the upper bound to be positive, ε need be greater than e−2π
TT-Transform of Pure Complex Sinusoid
As discussed above, T2 and T3 are provided by (8) and (9). Thus, the respective TT of a PCS of frequency index q are denoted by Tq2[n, m] and Tq3[n, m]. According to (8):
Additionally, according to (16), Tq2[n, m] is given by:
From (10), Oq2[n, m], the OTT of Tq2[n, m], is given by:
(23)
Tq3[n, m] and Oq3[n, m] are similar but using SqC[n, k] instead of Sq[n, k], where SqC[n, k] is derived from Sq[n, k] by (5).
Since ST and IFT are linear and TT is their composition, TT is also linear. Hence, from (12):
showing that the set of Tq2[n, m] with q=0, 1, . . . , N−1 is a basis for T2[n, m]. The same holds for T3[n, m], O2[n, m] and O3[n, m].
Properties of TT-Transform of Pure Complex Sinusoid
Periodic:
For a fixed q, Tq2[n, m] is periodic n and m simultaneously, with period N/q, i.e.,
This can be proved easily from (22).
For Oq2[n, m], the summation in (23) is independent of n, so it is periodic in n only, with period N/q, i.e.,
Tq3[n, m] is also periodic in n and m simultaneously, and Oq3[n, m] periodic in n, both with period N/q. It's because they have the same sinusoidal factors as in Tq2[n, m] and Oq2[n, m]. Note that the period is generally a non-integer as q may not divide into N exactly. As discussed below, it is possible to get around this issue by making use of the periodicity of OTT.
Gaussian:
For fixed q and fixed n, the magnitude of Tq2[n, m] or Tq3[n, m], treated as a function of m, is near-Gaussian in shape, centred at m=n with SD=N/q.
To prove it for Tq2[n, m], examine (22), for example. As explained above, f[k] is near-Gaussian with SD q/2π near k=q. Now, the IFT of a Gaussian is also a Gaussian with SD=N/2π times the reciprocal of that of the original Gaussian. Therefore, the magnitude of Tq2[n, m] is also near-Gaussian with SD N/q. It is centred about n since the IFT is evaluated at m−n.
For Tq3[n, m], and using (9) and (5), it can be written as the sum of two “truncated” terms:
Similarly to (22), Tq3[n, m] is formed by the sum of the IFT of “truncated” near-Gaussian functions, which is also near-Gaussian in shape, centred at m=n, but with some oscillatory characteristics due to the abrupt jump in the truncation, (as the IFT of a rect function is a sin c function.)
Now, each Tq2[n, m] or Tq3[n, m], treated as function of m, is near-Gaussian functions centred at m=n. So, their aggregate, Tq2[n, m] or Tq3[n, m], treated as function of m and n, is concentrated about the diagonal m=n.
For fixed q and n, Oq2[n, m] and Oq3[n, m] are also near-Gaussian functions of m. Hence, when treated as function of m and n, they cluster around the horizontal line m=0 (since the IFT is evaluated at m) with SD=N/q.
The TT and OTT characteristics of PCS's are discussed above with regard to
Symmetry:
For fixed q and m, Oq2[n, m] and Oq3[n, m] are conjugate-symmetric about n=N/2. This is obvious from the fact that (23) becomes its conjugate when n is replaced by N−n.
Properties of TT-Transform in General
From (24), the basis Tq2[n, m] spans all T2[n, m]. Now, each Tq2[n, m] is near-Gaussian centred around the diagonal, tailing off moving away from the diagonal. Therefore, T2[n, m] of a general time series, being their linear combination, should have this property. Moreover the high-frequency components of a time series contribute to the places near the diagonal, and low-frequency components take a role in far away places.
And the same holds for T3[n, m], except that it carries the oscillatory characteristic from each Tq3[n, m]. Likewise, Oq[n, m] and Oq8 n, m] cluster about m=0, as each of Oq2[n, m] or Oq3[n, m] does.
Process Flow
Referring now to
At 202, the primary parameters P are optionally set, if the primary parameters P are to be different from the default values. There can be seven primary parameters, for example, which are discussed in detail below. At 204, the data size N is set. At 206, a determination is made as to whether a Basis File for the values of N and the primary parameters P already exists. If no, at 208, the essential basis values for these values of N and the primary parameters P are prepared and saved into the Basis File for future use. At 210, the essential basis values can be retrieved from the Basis File before proceeding to 214 (inputting a time series). If yes, at 212, the essential basis values can be retrieved from the Basis File before proceeding to 214 (inputting a time series).
At 214, a time series of data size N is input. At 216, the secondary parameter S for this particular time series can optionally be set, if the secondary parameter S is to be different from the default values. At 218, the time series is pre-processed to determine the set F of prominent frequency indexes, using the secondary parameter S. At 220, a determination is made as to whether the application is Use Case I, IA or II. For Use Cases I and IA, at 222, the basis values are expanded and accumulated for those PCS's with frequencies in F, for each time index n to form the compressed ST values, using the primary parameters P. If it is determined at 224 that the application is Use Case IA, then at 226, the compressed values can be decompressed to find the local spectra at each time index n. For Use Case II, at 228, the basis values can be expanded and accumulated for those PCS's with frequencies in F, for the desired time index n, to form the compressed ST values there, using the primary parameters P. At 230, the compressed values can be decompressed to find a single local spectrum at n.
At 232, to find local spectrum at another time index, the operations beginning at 228 are repeated. At 234, if there is another time series of the same data size for the same application to process, repeat the operations beginning at 214. At 236, if there is another application with different data size, then repeat the operations beginning at 202.
The terms used in the above process flow are discussed below. The basis produced in 208 is a collection of some ST and OTT values of the PCS's. For a particular application, sub-operations 240 shown in
To use the basis for an input time series, first expand the subset into a complete basis, and then “accumulate” the basis values to obtain some ST and OTT values of that time series. (Accumulation here means finding the linear combination of the basis values.) Those accumulated ST and OTT values are actually the compressed form of the ST of the time series. For Use Cases I and IA, the compressed ST for all time index n=0, 1, . . . , N−1 are obtained, whereas in Use Case II, only the compressed ST at a single time index n obtained, decompressing it afterwards to obtain the local spectrum there.
There are altogether eight user-adjustable parameters, named by Greek alphabets. They are ε, η, γ, μ, υ, λ, φ, β. The first seven are primary parameters and the last one secondary. They are all independent of the data size, and are usually fixed. Users can at all times simply use the default values suggested herein. Optionally, the users may adjust the parameters to make them work best for a particular application. The basis depends on the data size and the primary parameters, by the assumptions discussed below, a single basis file created in 208 should always work in that application. The secondary parameter is used in the accumulation task in Use Cases I, IA and II, and the users may want to tune it to suit an individual time series.
Preparation of Basis Values
Preparation of basis values is discussed with regard to
Finding Support Intervals for each Pure Complex Sinusoid
At 242, support intervals for each PCS are found. Many of the ST values for a given time series are relatively small in magnitude. By ignoring them, it is possible to speed up the process. For example, use parameter ε as the minimum magnitude of the ST values to keep. Then find the “support interval” for each PCS of frequency index q, i.e. the range of frequency index values for which the magnitude of the ST of that PCS is not less than ε. The lower bound lq and upper bound uq of this support interval can be found by applying (21). For each q, only consider the frequency band whose frequency index lies between lq and uq.
Finding the Range of Pure Complex Sinusoids needed
At 244, a range of PCS's can be found. As users may only be interested in ST values for k=0, 1, . . . , N/2−1, only use those PCS's whose support interval overlaps with the interval [0, N/2−1]. Let's define qmax as the largest values of q for which the lower bound lq is below N/2. Then confine to those PCS's with q=0, 1, . . . , qmax; the other q's do not contribute significantly to the results. qmax is roughly proportional to N because lq is proportional to q. In general, qmax is somewhat greater than N/2.
Identifying the Low Set
At 246, the low set of PCS's can be identified. Use the fact that the ST of a PCS with frequency index q is concentrated around k=q, having smaller extent for smaller q. Hence it is economic to just copy the ST values of those PCS's with small q into the basis. This set of such PCS's is called the “Low Set”.
It is possible to specify some threshold copy frequency index a such that if the support interval of a PCS of some q contains a (which is true when lq<a), then put that PCS into the Low Set for copying purposes.
It is also possible to minimize the sum of the extents of the Low Set and the High Set (defined below); this would give the most TRF information for a given basis size. Now, the former is a and the latter is roughly proportional to N/a, (by Section 5.5 as the smallest q in the High Set is close to a). By calculus, their sum is minimal when a is a multiple of √{square root over (N)}. It has been confirmed from experiments that the best value of a to be used is somewhat proportional to √{square root over (N)}, so declare a data-size-independent parameter η so that a=η√{square root over (N)}.
Identifying the High Set
At 246, the high set of PCS's can be identified. It is possible to use OTT instead of TT for the High Set, because OTT does not have the wrapping effect and has a nicer clustering property: Dq2[n, m] (and also Dq3[n, m]) is concentrated around the horizontal line m=0, for given q and n. Contrary to the characteristics of ST discussed herein, the OTT of a PCS with frequency index q has smaller extent for larger q. Hence it is economic to use the OTT values of those PCS's with large q into the basis. The set of such PCS's is referred to as the “High Set”.
Likewise, it is possible to put those PCS's of some q whose support interval contains a (which is true when uq>a) into the High Set for using their OTT.
It is clear that the High Set and Low Set are overlapping; there are many instances of q that are contained in both sets. This is intentional: the cropping of OTT described in the next section may cause some errors or ringing artifacts in the decompressed ST at low frequencies. Fortunately, the copied ST values at those places will replace the incorrect ST there.
Determining Crop Limits for each Pure Complex Sinusoid in the High Set
At 248, the crop limits for each PCS in the high set can be determined. Because of the near-Gaussian and clustering property of OTT, it is possible to crop off those OTT with large values of |m| without much degradation in accuracy.
Let cq be the crop limit for the PCS of some q, i.e. only those m satisfying |m|≦cq are considerably large for use. Due to the Gaussian property discussed above for OTT, it is known that cq is inversely proportional to q. Let γ be a parameter such that cq=γ N/q for all q. (If cq exceeds N/2, then let it be N/2.)
For the given parameter γ, it is possible to determine the value of cq for each q≦qmax. The maximum of all these cq is referred to as “maximum crop limit”, denoted by cmax. It is used to identify the basis nodes, which is discussed below.
But when q is small, cq will become large. This in turn will cause the maximum crop limit cmax exceedingly large, and the basis nodes given by (30) too far apart, hence taxing on the accuracy. It is possible to employ a strategy to limit the size of cq: Let q* be such that the lower bound of its support interval lq* is equal to the copy frequency index a. For all q<q*, set cq to be cq*. In this way, cmax will take a reasonable value of cq*. This choice of q* is quite arbitrary, but proves to give good results.
Identifying Basis Nodes for each Pure Complex Sinusoid in the High Set
At 252, a basis node for each PCS in the high set can be identified. Even though m is restricted to be within [−cq, cq], there will still be a lot of values of Oq2[n, m] (or Oq3[n, m]) for each q and n, since cq is quite a large number for large N. It is possible to subsample them to keep the basis size low. As the OTT values are concentrated about the line m=0, with large values near it and tailing off into small values moving away from the line, there should be more subsampled points near the line. Therefore, a plausible subsampling method for m would be a geometric progression (GP).
The subsampled m's are referred to as the “basis nodes”. As discussed below, the OTT for all PCS's will be accumulated to form their linear combination, so ensure that the basis nodes are common to all PCS's. At the same time, it is desirable to satisfy our UIE requirement for compression: the number of encoded values should the same for all PCS's. Obviously, it is not possible to have both. Therefore, it is possible to make the order of the number bq of basis nodes, O(bq), independent of q. It can easily be proved that this holds if and only if bq is a linear function of log q. It is possible to reason that using a GP for the common basis nodes can make bq proportional to log q.
Let b be the maximum of bq for all q≦qmax, so it is the number of common basis nodes for all PCS's. Then the set B of common basis nodes will be given by:
B={m: m=0 or |m|=round(rj), j=0, 1, . . . , b−1}, (30)
B
q
={m ε B: |m|≦c
q}. (31)
Thus, only the OTT values of that PCS for those m in Bq need to be found.
How to set the value of b is discussed below. It has been discovered that b is proportional to log N. This may also be argued from bq being proportional to log q and qmax being proportional to N. So, define a parameter μ such that b=μ log N.
If b is chosen too small, r will be close to 1, say 1.2. Then round(rj) will be 1 for j=0, 1, 2. To avoid repetition in set B, allow the first few numbers in B to take consecutive integer values 1, 2, 3, . . . , etc. until j is so large that the repetition does not occur.
Subsampling along the Time Axis
At 254, subsample along the time axis. Here, by time axis, the axis for time index n is meant, not the m in TT and OTT, (which happens to be also in time unit). As discussed below, there are a lot of redundancies in ST and OTT. Subsampling will select the essential subsets to be put in the basis file. There are three ways to reduce the basis size:
Time Interval
A PCS of frequency index q consists of waves with wavelength N/q. Hence its ST, TT and OTT all have time resolution that varies with q, being smaller for lower frequencies. Therefore, it is not necessary to compute low-frequency ST and OTT for every time index n. It is possible to skip by some time interval iq, taking every iq th element along the time axis.
For a PCS of some q, the subsampling time interval width iq is inversely proportional to q. So use parameter μ, defined as the number of intervals for the PCS with q=√{square root over (N)}. √{square root over (N)} is chosen as it is somewhat the geometric centre of the frequency domain; it has been used in as discussed below to define the bounds of Low and High Sets. That is, make i√{square root over (N)} equal
where [ ] means integral part. By inverse proportion, the time interval width iq for PCS of frequency index q is given by:
A large value for υ is generally chosen. Then, the expression for iq would become 0 for most q except in the very low frequency band. When iq is 0, take it to be 1, meaning that simply find the values at all n without skipping.
Some of the known ST approximation techniques also exploited the characteristics of the small frequency resolutions of ST at high frequencies. In the methods discussed herein, a less aggressive approach is adopted by setting a fairly large υ, so the time interval is much shorter than in the know ST approximations, guaranteeing more accurate results.
Some moderate interval subsampling is justified, especially for those applications where the time series is analyzed by averaging ST over some time interval.
Symmetry
By symmetry property of both ST and OTT discussed above, it is possible to compute only the ST and OTT values for n≦N/2, the other half being obtained by conjugate symmetry.
Periodicity
Although both ST and OTT of a PCS exhibit periodicity, as discussed above, it is possible to only make use of the latter for the High Set, as the period of the former is too large to be useful for the Low Set. (q and k are small there, and so is q−k.)
By (26), OTT is periodic in n with period N/q for some q in the High Set, so in theory it is possible to only need to find the OTT values for n≦N/q, copying them for the ensuing range of n. But N/q is not an integer unless q is a factor of N. For most cases of q, the sequence of OTT never repeats itself after some discrete time interval.
To make use of this non-integral period, the OTT not just for the first period with n=0, 1, . . . , [N/q], but for a longer “initial interval” n=0, 1, . . . , jq. jq is called the “initial interval limit” needs to be computed. Then the OTT values in the ensuing periods can be copied from the OTT values in the initial interval. By the symmetry property discussed below, it is possible to extrapolate the initial values up to N/2.
The “copying” operation is not simple. Suppose that it is necessary to find the OTT value at some time index n between jq+1 and N/2. The smallest time index v0 within the initial interval can be determined such that n and v0 differ by an integral multiple of the period. Specifically,
Then, all real numbers v=v0, v0+N/q, v0+2N/q, . . . , can be scanned over until jq to see which v is closest to an integer, i.e. the rounding error |v-round(v)| is smallest. Choose the OTT at that round(v) as the OTT for n. This minimum-rounding-error will be denoted by eq,n, and that round(v) denoted by vq,n.
Although the above process is time-consuming, it is not an issue because basis values are prepared only once. It may also be possible to speed up finding vq,n in the preparation stage by first running the process for the first n=jg+1. Next, noting that when n is incremented by 1, the scanning set is shifted by 1 so it is very likely that vq,n+1=vq,n+1, (unless the new v0 introduced into the set has a rounding error less than those old members of the set, but this may beignoree as the improvement should be small). Then, continue adding 1 to vq,n until it reaches jq; then run the above process for the next n.
jq is now discussed. It should be large enough to reduce error, but not too large to undermine the use of the periodicity. The following sophisticated algorithm attempts to find the best jq for each q in the High Set, based on some parameter λ>1:
The optimization in step (a) above ensures that the initial interval limit chosen can produce the smallest rms error in the copied OTT values. It has been found from experiments that (a) is unnecessary when q≧2√{square root over (N)}.
If q is large, the time-consuming step (a) is not of much use as the range of j is short, so simply use the ceiling of λ×period as the initial interval limit, unless q happens to share a common factor with N. λ can therefore be understood as the maximum number of periods in the initial interval.
To explain step (b), note that N/gcd(N, q) is a multiple of the period N/q since gcd(N, q) is a factor of q. Moreover, N/gcd(N, q) is an integer since gcd(N, q) is a factor of N too. So, N/gcd(N, q)+1 can legitimately serve as the initial interval limit. If λ gcd(N, q)>q, then N/gcd(N, q) will be less than λ N/q, so it is better than the one using step (c).
Now, for each PCS of q in the High Set, use (32) and the method there to find the best time index vq,n within the initial interval for copying into each ensuing n.
To avoid doing the above procedure again for every time series in the application, it is possible to save into the basis file the set of jq for all q in the High Set, as well as the set of vq,n for all q in the High Set and all n that lie after the initial interval. They will be retrieved from the basis file along with the basis values. But for very large N, this would make the basis file extremely large. In this case, it is then possible to resort to finding on the fly during the expansion phase discussed below, which should be quite fast employing the speedup idea given above. Note that this is applicable to Use Cases I and IA but not Use Case II.
Note that it is assumed that the time interval iq of discussed above is 1. If it is greater than 1, then the above paragraphs remain valid, if the time series is treated as containing N/iq data with period N/iq q.
Computing Basis Values for each Pure Complex Sinusoid in High and Low Sets
At 258, the basis value for each PCS in the high and low sets can be computed. Equipped with the cropping and subsampling schemes discussed above, it would now be straightforward to form the essential basis for the High Set and Low Set.
For the High Set, there is the option of either using O2 or O3. The former is accurate at very high frequencies close to ½. The latter nearly halves the size of the basis, as O3[n, m] is real for all n and m, so it is not necessary to store their imaginary parts. The times to prepare the basis files, and to extract and accumulate the basis values are all reduced. It is possible to use a Boolean parameter φ for the user to choose between using O2(φ=FALSE) and using O3(φ=TRUE). This choice does not affect the Low Set.
The graphs of O2 and O3 in
For each PCS of q in the High Set, go over all time index n from 0 to N/2 (due to the symmetry) or from 0 to jq whichever is shorter, at time interval iq. For that n, perform the IFT for Oq2[n, m] as in (22) if φ is FALSE or a similar IFT for Oq3[n, m] if φ is TRUE. From the IFT, it is possible to get the O2 or O3 values at all m. But it is only necessary to get those O2 or O3 values at the basis nodes of B identified above, and save them into the basis file.
Next, for each PCS of q in the Low Set, go over all time index n from 0 to N/2, at time interval iq. For that n, use (16) to get the ST values and put them into the basis file.
As discussed above, the PCS of the same q may appear in both the high and low set. This overlap is necessary to ensure completeness and continuity in the result.
The basis is not huge in size and can likely be held in memory. If memory space is an issue, it is possible to compute the basis values for each q and immediately save them into the basis file. Afterwards, the memory can be reused for the next q.
Generating Compressed ST Values
To generate the compressed ST values for all time index n=0, 1, . . . , N−1 and frequency index k=0, 1, . . . , N/2−1, it is not necessary to first compute all the ST values, as explained in above. Instead, retrieve the basis values from the basis file, expand them to cover the entire time range, and then accumulate them to form the compressed values. Before taking these steps, perform pre-processing on the given time series as discussed below.
Pre-Processing to Find Prominent Frequencies
From (20) and (24), the ST and OTT values for the time series is a linear combination of those of the PCS's, weighted by the corresponding “Fourier coefficients” H[q]. So, find these Fourier coefficients by applying FFT on the given time series. Now, note that in most cases, H[q] is significant only for a subset of frequency index q, so just accumulate those “prominent” PCS's with large H[q] into the linear combination.
Next, determine how many of the prominent Fourier coefficients to select. One way is to set a fixed threshold, say 0.01, and ignore all those q whose H[q] falls below the threshold. This approach is good when the time series being processed are all similar in nature and their frequency contents are being compared.
For widely varying time series, use a robust way that is unaffected by noise or outliers such as like the percentile. For example, sort the set of Fourier coefficients {H[q]: 1≦q≦qmax} in ascending order, and its βth percentile Pβ as the threshold. Hence, form the set S of prominent frequency indexes:
S={q: H[q]≧Pβ}. (33)
Consequently, the most prominent (100−β) % of the Fourier coefficients are used. This parameter does not affect the creation of the basis and so is regarded as a secondary parameter. The user may adjust it for an individual time series.
As S depends on the distribution of frequencies in a time series, this method is “adaptive” in the sense that the best way is chosen to compress the time series. This is superior to other techniques that apply a rigid process regardless of the spectral characteristics of the time series.
This strategy has an important advantage besides reducing accumulation time. In many practical applications, like segmentation and classification, it is only necessary to look at the prominent frequencies, as the less prominent ones may be due to noise or minor artifacts. This is discussed in further detail below.
Restricting S for an Application
When only interested in some frequency subset K, then set S can further be made smaller:
S={q: H[q}≧Pβ and [lq, uq]∩ K≠Ø}. (34)
For example, using the midrange frequency band K={k: k1≦k≦k2}, assuming that effects by global trends or by noise are not important, which are reflected in frequencies lower than k1, or in frequencies higher than k2 respectively. Then, S={q: H[q}≧Pβ and [lq≧k1 and uq≦k2}.
Rather than using β as discussed where there is no clue as to which frequencies are ignored, it is more controllable for the user to set the band limits K. Then, S={q: lq≧k1 and uq≦k2}.
Expanding the Basis into Entire Time Range
Example operations 260 for expanding and accumulating basis values for the entire time range are discussed below. At 262, it is possible first to get all the basis values for each PCS in the intersection of the High Set and S. But the values saved in the basis file only covers the initial interval from n=0 to jq, and at time intervals of iq. So, it is then possible to expand them to cover the entire time range for use in the next section.
First, copy the OTT values inside the initial interval onto the ensuing time indexes n, up to N/2. For each w, the position vq,n in the initial interval to copy from is already saved in the basis file. If vq,n is not included in the basis file, they are found on the fly as discussed above.
After extending the initial interval values into the whole range up to N/2+iq, fill in the skipped OTT values if the time interval iq is not 1. It is possible to simply use fast linear interpolation with good accuracy because the time interval is mostly small. (+iq is needed because to find OTT for n around N/2, an interpolation interval that extends beyond N/2 is used.)
Finally, obtain the OTT values for n>N/2 by finding the complex conjugate of the corresponding value at N−n.
The basis values for PCS's in the intersection of the Low Set and S, are expanded in the similar fashion, except that the initial interval copying is not needed. Just linearly interpolate over each time interval.
Accumulating the Basis Values for an Input Time Series
At 264, it is now possible to accumulate the expanded High- and Low-Set basis values only for those frequency indexes in the set S. The linear combination so formed for all the n will together form the compressed ST values of the given time series, and this job is done. When using the O3 option with φ=TRUE, the compressed form has half the size as the normal one.
Again, if memory is an issue, just load the basis values from the file only for those PCS's belonging to the set S, one by one. For each PCS in S, expand and accumulating the basis values into the partial sum of the linear combination. The memory can then be released before loading the basis values of the next PCS in S.
The strategy is taken that in the compressed form, all the compressed data for each time index n exist, so that the TFR at each n individually can be analyzed. The linear interpolation over the time intervals for the Low Set discussed above is done purposefully.
Generating Local Spectrum at Desired Time
Example operations 270 for generating local spectrum at desired time are discussed below.
Pre-Processing to Find Prominent Frequencies
As discussed above, it is possible to perform pre-processing to find prominent frequencies. It needs be done only once for a given time series. It can be reused for as many local spectra as needed. It is also possible to use the band limits discussed above.
Getting the Basis Values for the Specific Time
At 272, find the basis values at time n. The operation is similar to operations discussed above for generating basis values for the entire time series, except only expanding for a single time index n, not for the whole time range. Hence, the steps can be simplified. For instance, if it lies on the initial interval, or if the margins of time intervals, or if n≦N/2, then the extension, interpolation or conjugates, respectively, are not performed. The position vq,n in the initial interval to copy from is already saved in the basis file. If the position is not included in the basis file, find them on the fly. The speedup idea discussed above is of no use here as vq,n for a single n is needed.
Accumulating the Basis Values
At 274, accumulate the basis values, the operations of which are similar to those discussed above with regard to the entire time series, except that it is only performed for a single time index n. For convenience in finding local spectra at several time indexes, place in memory all the basis values for all those PCS's in S.
Decompressing the Basis Value
Example operations 280 for decompressing to find ST at each time n are discussed below. The basis value obtained above for a particular n is the compressed form of ST. To decompress it into the uncompressed ST values, decompress the High Set and Low Set separately.
Decompressing for the High Set
At 282, the basis values for the High Set is the OTT values at selected basis nodes of Bare found as discussed above. It is possible to fill the missing ones between the nodes. The simplest and fastest way is by linear interpolation between adjacent nodes. Again, in the O3 option (φ=TRUE), it is only necessary to do it for the real parts.
This may cause a slight increase in low frequency contents because of the interpolated straight lines, but the error is minimal. The error would be significant when the OTT values are large; luckily they are large only near the diagonal m=n and the nodes are very close there so the needed interpolation is small, if at all. The nodes are far apart when they are away from the diagonal; luckily the OTT values there are small.
Next, perform the inverse of the displacement of (11) on the complete set of OTT to get back the corresponding TT values. Note that in the O3 option, only perform it for the real parts. Finally, perform the FFT on the TT to generate the ST values.
There is an abrupt truncation of OTT at the crop limits discussed above, causing a very small sin c ringing effect in its FFT. Optionally, use a tailing technique like linear extrapolation to avert it, but the benefits are outweighed the overhead of extrapolation.
Referring to
Copying for the Low Set
The ST values formed as discussed above cover the upper frequency band with good values for k greater than the threshold copy frequency index a. The results below a are poorer. At 284, simply cover up the poor results by inserting the basis values for the Low Set into this lower frequency band. In this way, obtain a complete, accurate local spectrum covering the upper (k>a) and lower (k≦a) parts of the frequency range from 0 to N/2−1.
Generating S-Transform Values for Entire Time Series
To generate ST values over the entire time series, i.e. Use Case IA, first obtain the compressed values for the time series as discussed above. Then perform the decompression steps for each time unit n in the time series as discussed above. If N is too large for all the ST values to be held in memory, it is possible to either save them in auxiliary storage as they are being generated, or discard an ST value immediately after it has been used.
Experimental Results
Parameters
Experiments have been performed with different parameter settings to find out how they may affect the speed, compression ratio and accuracy of the methods discussed herein. For example,
ε: (ST parameter discussed above) It determines the size of the support interval for each PCS, and depends on the data size N. It has been discovered that using 0.05 gives good results for all N. Increasing it will make the support intervals smaller, so there are fewer PCS's involved in the computation. As the result, the process runs faster with lower accuracy.
η: (Low-high margin parameter discussed above) Increasing it will move the threshold copy frequency index a downwards, giving exact values for the ST just below it. In
γ: (Cropping parameter discussed above) This parameter acts as a tradeoff between speed and accuracy. By lowering it, the ST values at low frequency band will become poor. There will be some ripples in the ST diagram at low frequencies just above the threshold a, especially for those time series with significant high-pitch components.
μ: (High node parameter discussed above) Though this parameter directly dictates the size of the basis file and of the compressed form, its effect on accuracy is less apparent than that of γ. Nevertheless, high values do improve accuracies especially at the peaks and troughs.
υ: (Time interval parameter discussed above) This mainly affects the basis file size and accuracy. Lowering it decreases the size but increases the time to expand the basis, and introduces artifactual patterns at low frequencies. Using large values for it to improve accuracy is preferred.
λ: (Periodicity parameter discussed above) Larger value of λ will make the initial interval longer and the basis file larger, with diminishing return on accuracy, and without reducing the time taken to expand the basis.
φ: (OTT parameter discussed above) Setting it to TRUE (O3 option) almost halves the basis size and reduces the process time of the FALSE option, at a mild expense of poorer accuracy in ST at high frequencies close to ½. Preferably, always keep this option unless more regular high frequency values are desired.
β: (Prominent FT parameter discussed above) This does not affect the basis formation, and can easily be adjusted by the user. Decreasing it lowers the threshold and puts more Fourier coefficients into play, improving the accuracy and consuming more time. The effects depending very much on the degree of uniformity in the distribution of frequencies in the input time series.
Table 1 below shows the default values of the parameters.
Speed and Size
Tables 2 and 3 below show the times taken to perform the main steps in the process, the size of the basis and of the compressed ST values, for different data sizes N. The time taken to find a single local spectrum at time t fluctuates with t, so it is possible to take an average over all t for the columns “Local Spectrum” and “Compressed Local Spectrum”. Four-byte floats for all real values, and eight bytes for complex values are used.
The time series being used to form this table is the vertical position of an IPHONE strapped onto the back of a person walking on a treadmill at the speed of 4.5 mph, recorded by the accelerometer inside the IPHONE.
In Table 2, the default values suggested in Table 1 for the parameters in all trials are used.
667
1249
It is inevitable that the times expended in each step depend on the input data; simple time series with little variation take shorter times to process, while those spanning a wide frequency band would take longer times. The walking example belongs to the latter, so it takes longer to process than the average case. Similarly, the size of the compressed values depend on the particular example.
Using the same treadmill walking example above, the time series are different due to different data sizes. Moreover, it is hard, if not impossible, to determine an exact complexity for compression time, basis size and compressed size, as they involve various processes conditionally. Nevertheless some numerical analysis of the numbers were performed in the table to estimate the complexities. They are reported in the following paragraphs, with plausible explanations:
From Table 2, it can be seen that the times taken for preparation is approximately O(N2), because there are O(N) PCS's and each PCS takes O(N) to compute its basis values. The retrieval time is negligible, being dependent on file access and transmission rate and the position of the file on the media. The pre-processing is made up of an FFT on the original data and a quick sort on the FFT, each taking O(N log N) time. The quick sort may take O(N2) in the worst case; the unexpectedly long preprocessing times for large N, shown in bold italic in Table 2, are due to a bad case performance in quick sort.
It takes approximately O(N2) to generate compressed values, because it is done by running over a fraction of the N PCS's and performing linear-time expansion and accumulation for each PCS. The time taken to generate the uncompressed ST values is only slightly greater, because the decompression by FFT is very fast.
The time for finding a single compressed local spectrum, which involves expansion and accumulation, is O(N) as the two processes are linear. To generate the local spectrum, decompression by FFT is needed, so the time is somewhere between O(N) and O(N log N).
The size of the Basis File is O(N log N), since it contains O(N) PCS's, each having O(log N) numbers. But if vq,n values discussed above are included into the basis file, it will get close to O(N2) for large N, making the file huge. So when N is large, they should preferably not be included, causing a drop in the trend of file size increase, shown in bold in the table.
For the compressed entire ST values, its size is somewhat greater than O(N log N). There are N time units, each encoded by O(log N) nodes and some low-frequency ST values. This is high compression compared to the original size 8N(N/2−1) bytes of entire ST for n=0, 1, . . . , N−1 and k=0, 1, . . . , N/2−1.
In Table 3 below, the crucial parameters ε, η, γ, μ, φ, β are adjusted for the treadmill example with data size of 4096. The first row uses the default values, while each of the other rows has only the indicated parameter changed, the rest being defaults.
Accuracy
As shown in
The treadmill example discussed above with the window size of 256, running on IMAC at 2.8 GHz is used. The results for those parameters that affects the accuracy considerably: ε, η, γ, μ, φ, β are only compared.
Based on these results, a proposed strategy for improving accuracy depends on the objective: To achieve low-frequency accuracy, try larger η like 1.2, and to get accuracy just below f=0.5, set φ to FALSE. If the time series have complicated spectral decomposition, then choose β=0 to involve every Fourier coefficients; for simple cases, use larger β like 30 to only use the most prominent 70%.
Referring now to
Referring now to
Accuracy Measurement by Inverse S-Transform
By Inverse S-Transform, the original time series from the ST values are obtained. This can be done by virtue of the theorem that the mean of ST voice at a frequency is equal to the FT at that frequency:
so the time series can be recovered by IFT.
a)-(g) show that if Inverse S-Transform is applied to the ST values generated by FTFT-1D methods provided herein, then the time series values obtained are almost identical to the original time series, with very small errors. This again confirms that FTFT-1D is highly accurate.
Conclusion—FTFT-1D
As discussed herein, FTFT-1D methods provide a fast and direct technique to generate the compressed form of ST values reducing the size from O(N2) to O(N log N), which is good for analysis because the time-frequency information is uniformly encoded. It also presents the instantaneous real-time computation of local spectra for real-time visualization and processing. Both are accurate, robust, adaptive, and using little memory, especially good for long time series.
The methods do not strain memory requirements much, and are robust with the same parameter settings that work for all cases. It is adaptive, maximizing its efficacy according to the distribution of frequencies in the data. It employs a number of techniques to increase its speed, including the preparation of some non-redundant basis values, saved in a basis file, that needs be done only once for each particular applications. Unlike the subset approaches, it can determine the ST value for any frequency accurately.
The methods rely on special compression techniques that achieves Uniform Information Encoding (UIE): less important information is compressed more so that the amount of useful information per byte is uniform in the compressed data. The actual ST values are non-uniform, in the sense that they tend to have smaller time resolution at lower frequencies and smaller frequency resolution at high frequencies. Hence, in some applications like feature extraction and segmentation, it might be better to use for analysis the UIE-compressed than the original ST values.
FTFT-2D Pixel
Methods
Transforms
Transforms are defined above with regard to FTFT-1D methods provided herein. It should be understood that a number of these transforms are again provided below with regard to discussion of FTFT-2D methods.
Discrete S-Transform
The discrete ST of a time series h[n] of size N is a TFR given by the frequency-domain formula (36):
where n and k are the time and frequency indexes (k=Nf, where f is the frequency). H[k] is the Fourier Transform (FT) of h[n]. As discussed above, square brackets are used herein to represent discrete functions of integers.
Discrete TT-Transform
The TT-transform (TT) is the Inverse Fourier Transform (IFT) of ST. The TT is shown in (37):
where each of n, m is the time index going from 0 to N−1.
Referring now to
c) illustrates that the TT values are concentrated along the diagonal m=n, attenuating moving away from it. These characteristics are discussed in detail below. There is also the wrapping effect, as manifested near the top left and bottom right corners. This comes from TT[n, m+N]=TT[n, m]. The wrapping imposes some burden and overhead in the implementation of the methods discussed herein. To eliminate the wrapping and to offset the vertical shift of distance n at time index n, the Offset TT-transform (OTT) is introduced. They are defined by:
O[n, m]=T[n, (m+n+N)modN], (38)
where m goes from −N/2 to N/2−1. Then these OTT will cluster around the horizontal line m=0 as shown in
Let B be the frequency band between frequency indexes a and b inclusive, given by the set B={a, a+1, a+2, . . . , b}. It has size N′=b−a+1. Then the Banded TT-Transform (BIT) of a time series with respect to B is obtained by first finding the ST of the time series, forming a subset of ST within B, and translating it by −a so that its frequency index now goes from 0 to N′. Then BTT is the IFT (of size N′) of this new ST:
where m goes from 0 to N′−1.
where m goes from −N′/2 to N′/2−1.
Pure Complex Sinusoid
A discrete Pure Complex Sinusoid (PCS) of size N and frequency index q (with q=0, 1, . . . , N−1) is defined as a complex time series uq[n]=ei2πqn/N, for n=0, 1, . . . , N−1.
The formula for IFT can be written in terms of PCS's:
The FT of uq[n] is obviously equal to: (Z is the set of integers)
S-Transform of Pure Complex Sinusoid
The discrete ST of a PCS of frequency index q, written as Sq[n, k], can be computed by (36):
By (42),
The second alternative sets κ=q−k+N to make sure that κ lies within the summation range.
Let fq[k] be the magnitude of Sq[n, k]. For a fixed q, fq[k] is clearly bell-shaped, resembling a Gaussian with standard deviation (“SD”) SD=q/2π. It can easily be proved that given a small positive number ε, the support interval for ε, defined as {k: fq[k]≧ε}, is bounded below by lq and above by uq, where
l
q
=q/(1+d)and uq=q/(1−d), (45)
with d=√{square root over (−log ε/2π2)}. So, the width of this support interval is proportional to q.
ST is linear, in the sense that given a number of time series, the ST of their linear combination is equal to the linear combination of their ST. The linearity applied to (41) gives:
showing that the set of Sq[n, k] with q=0, 1, . . . , N−1 is a linear basis for any ST.
TT-Transform of Pure Complex Sinusoid
The TT and OTT of a PCS of frequency index q are denoted by Tq[n, m] and Oq[n, m]. By (37) and (44), it can be proven that
T
q
[n, m]=e
i2πqn/N(IFT of fq[k] evaluated at m−n), (47)
and
O
q
[n, m]=e
i2πqn/N(IFT of fq[k] evaluated at m) (48)
For fixed q and fixed n, the magnitude of Tq[n, m], treated as a function of m, is near-Gaussian in shape, centered at m=n with SD=N/q. To prove it, fq[k] is near-Gaussian with SD q/2π near k=q. Now, the IFT of a Gaussian is also a Gaussian with SD=N/2π divided by the SD of the original Gaussian. Therefore, the magnitude of Tq[n, m] is near-Gaussian with SD=N/q, and is centered about m=n since the IFT is evaluated at m−n. It follows that Oq[n, m] is also near-Gaussian functions of m clustering around the horizontal line m=0 with SD=N/q.
Since ST and IFT are linear and TT is their composition, TT is also linear. So by (41):
showing that the set of Tq[n, m] with q=0, 1, . . . , N−1 is a basis for T[n, m]. Now, each Tq[n, m] is near-Gaussian centered around the diagonal, tailing off moving away from the diagonal. Therefore, T[n, m] of a general time series, being their linear combination, should have this property.
Banded TT-Transform of Pure Complex Sinusoid
The BTT of a PCS of frequency index q with frequency band B from a to b is:
It can be proved that:
Here, g[k]=e−2π
The BTT of a general time series is a linear combination of the BTT of PCS's for different q:
As each TB; q[n, m] is near-Gaussian around m=nN′/N, TB[n, m] should have the same property.
Discrete 2D S-Transforms
The Discrete 2D S-Transform (2D ST) of an Nx×Ny date set or image h[x, y] is a space-frequency representation]. It can be expressed as a nesting of two 1D ST:
Here, H[kx, ky] is the 2D FT. So, S[nx, ny, kx, ky]measures the content of the 2D frequency index [kx, ky] at the pixel [nx, ny]. By Nyquist Theorem, a user may only be interested in finding 2D ST for frequencies fx and fy from 0 to ½, i.e. for frequency indexes kx=0, 1, . . . , Nx/2−1, and ky=0, 1, . . . , Ny/2−1.
The above equations work when kx, ky are positive. If kx is positive and kx is zero:
For kx zero and ky positive, S[nx, ny, 0, ky] like (54) is given, with subscripts x, y interchanged. If both kx and ky are zero, the zero voice is simply the overall mean of h:
It can be seen that S[nx, ny, kx, 0] is independent of nx, S[nx, ny, 0, ky] is independent of ny, whereas S[nx, ny, 0, 0] is constant for all nx, ny.
Although FT can be found by Fast FT (FFT), these equations are good only when all the 2D ST for all pixels need to be found: It still takes O(Nx2Ny2 log NxNy) to find them and too much to store all of them. FTFT-2D methods discussed herein avoid using these equations directly.
Local Spectrum and Texture Curve
For a given pixel [nx, ny] in the image, equations (53) to (55) give the complex values of 2D ST. In practice, users may only be interested in their magnitude (modulus), called local spectrum, at the pixel. It is a discrete real function of kx, ky in the 2D k-space, so it consists of NxNy/4 real values. The local spectrum can be plotted in the k-space using some gray scale, as shown in
In some applications with Nx=Ny (=N), it is possible to average the radial components of the magnitudes in polar coordinate system:
Here, the summation is over all the points (kx, ky) in the k-space with distance r from the origin, i.e. within the circular arc of radius r in the quadrant of
2D Pure Complex Sinusoids
A 2D discrete Pure Complex Sinusoid (2D PCS) of x- and y-frequency indexes qx and qy (with qx=0, 1, . . . , Nx−1 and qy=0, 1, . . . , Ny−1) is defined as a 2D Nx×Ny data set:
u
q
, q
[n
x
, n
y
]=e
i2πq
n
/N
e
i2πq
n
/N
, (57)
for nx=0, 1, . . . , Nx−1 and ny=0, 1, . . . , Ny−1. It is clearly a product of 1D PCS in x and 1D PCS in y:
uq
From the formula for 2D IFT:
and from (57):
Finding Local Spectrum at a Pixel
Process Flow
Referring now to
At 602, parameters are set, if the parameters are to be different than defaults. At 604, a monochromic Nx×Ny image is input. At 606, a determination is made as to whether Nx=Ny=power of 2 (i.e., if Nx=Ny=2M for some integer M, then set N=2M). If yes, operations proceed to 610, which is discussed below. If no, at 608, the image is expanded. For example, operations 660 in
At 610, basis is prepared. For example, Low, Medium and High Bands and the corresponding PCS Sets can be identified, and the basis nodes can be determined, and the basis values can be prepared for each band. Then, at 612, the image is pre-processed to find the Fourier Matrix H. At 616, the coordinates (nx, ny) of the pixel whose local spectrum is to be found are input. At 618, the ST magnitudes at (nx, ny) can be found using the basis values and Fourier Matrix H. For example, the ST magnitudes can be found according to sub-operations 680 and 690 are shown in
For example, at 692, Interpolate Fn
Returning to
Some of the above operations are discussed in more detail below. Note that the basis preparation (i.e., step 610), which is complicated and time consuming need be done once for all images of the same size. Also, the preprocessing need be done only once for all pixels in the same images.
Optimized Hanning Window
Preferably, the input image is square so that it is possible to form its Texture Curve, which is discussed below. Moreover, if it is not square, basis values can be prepared twice, for Nx and for Ny. Preferably, the image size is a power of 2, because FFT, which is used throughout the algorithm, is efficient for powers of 2.
If these two preferences are not both satisfied, then it is possible to form a blank square image of constant intensity ho and size N (step 608) and place the input image centrally in it. To eliminate the ringing artifacts caused by the sudden change in intensity across the edges, the intensities can be modified near those edges by an optimized form of Hanning Window.
First use a parameter α between 0 and 1 to specify the extent of windowing. The new intensity at each pixel [nx, ny] is computed as:
h*[n
x
, n
y
]=h
o+(h[nx, ny]−ho)wx[nx]wy[ny], (61)
where wx[nx] and wy[ny] are Hanning Window along x and y directions:
(Similarly for wy[ny].) So, if α=0.5, only half of the input image along each direction (i.e. a quarter on each side) will be modified. The condition Nx<N is because windowing along x is not needed if Nx=N.
The windowing is optimized by choosing ho to minimize the squared difference between the original input image and the modified one, i.e.
By partial differentiation with respect to ho, obtain:
By this step, it can be assumed that Nx=Ny=N, and that N is a power of 2.
Determining Basis Nodes and Preparing Basis Values
Example operations 630 for determining basis nodes and preparing basis values are shown in
The 1D frequency space will be divided into Low, Medium and High Bands: L, M, H. These bands may be disjoint or overlapping, but their union must cover the range of frequency index k from 0 to N/2−1. A slight overlap may help cover the poor values at the end of one band by the better values at the end of the adjacent one. Preferably, make Low and Medium Bands overlapping, and Medium and High ones disjoint.
Corresponding to each band, we can find the PCS Sets QL, QM, QH of those values of q whose PCS contributes to that band. Three integer parameters μL, μM, μH can be specified, and the numbers mL=μL, mM=2μ
Finding Support Intervals for each Pure Complex Sinusoid
At 632, support intervals for each PCS can be found. Many of the Sq[n, k] values are relatively small in magnitude. By ignoring them, it is possible to speed up the process. Use parameter ε as the minimum magnitude to keep. For each q, only consider those frequency indexes lying between lq and ug given by (45).
Finding Useful Pure Complex Sinusoids
At 634, useful PCSs are found. As users are interested in ST values for k=0, 1, . . . , N/2−1, it is only necessary to have to use those PCS's whose support interval overlaps with the interval [0, N/2−1]. Define gmax as the largest values of q for which the lower bound lq is below N/2. Then confine to those PCS's with q=0, 1, . . . , qmax; the other q's do not contribute significantly to the results. The number of useful PCS's is denoted as P=qmax+1.
Low Band
At 636, the low band is identified, as well as its set of PCSs. First, identify the range of frequency index k from 0 to υL as the Low Band. Its PCS Set is the set of q such that the support interval of q intersects with the Low Band:
QL={q: [lq, uq]∩[0, υL]≠Ø}={q: lq≦υL}. (64)
At 638, determine basis nodes and prepare basis nodes for the Low Band. For the Low Band, simply use the discrete 1D ST values because as discussed above, ST has narrow support interval for small q (with SD=q/2π), and hence only a small number of basis values are needed there. If the node count mL is equal to υL+1, then simply put the jth node at frequency index kj=j. Otherwise, subsample the ST values along the frequency axis. It has been found that the parabolic subsampling works better: the jth node is at frequency index kj=c j2+j, where c is such that the last node is at frequency index υL.
In both cases, let K={kj: j=0, 1, . . . , mL−1}. For each q in QL, it is only necessary to find the 1D ST values for those kj in K lying between lqand uq.
Medium Band
At 640, identify the medium band, as well as its set of PSCs. The Medium Band has frequencies just above the Low Band. Set ρM and υM as the lower and upper limits of the Medium Band. Its PCS Set is:
QM={q: [lq, uq]∩[ρM, υM]≠Ø}={q: lq≦υM and uq>ρM} (65)
For this Band, use the OBTT, OM; q[n, m]. As discussed above, let a=ρM, b=υM and N′=υM−ρM+1. For FFT, its range N′ is better a power of 2.
At 642, find crop limits for the Medium Band. Because of the near-Gaussian and clustering properties of OBTT, it is possible to crop off those OBTT with large values of |m|. Let cq be the crop limit for the PCS of some q, i.e. only those m satisfying |m|≦cq are considerably large for use. As discussed above, cq is inversely proportional to q. Let γM be the cropping parameter such that cq=γM N′/q for all q. (If cq exceeds N′/2, then we let it be N′/2.)
For the given parameter γM, it is possible to determine the value of cq for each q≦qmax. The maximum of all these can be called cq as “maximum crop limit”, denoted by cmax.
At 644, determine basis nodes for the Medium Band. Even though m is restricted to be within [−cq, cq], there may still be many values of OM; q[n, m] for each q and n. Sometimes it is desirable to subsample them to keep the basis size low. As the OBTT values are concentrated about the line m=0, there should be more subsampled points near the line. Therefore, a plausible subsampling method for m would be a geometric progression (GP).
The node count for the Medium Band, mM, is common to all PCS's. So the common set C of the mM values of m for the nodes is given by:
C={m: m=0 or |m|=round(rj), j=0, 1, . . . , mM−1}, (66)
where round( ) means the nearest integer of a real number. The radix r is such that rm
Then for a PCS of some q≦qmax, determine a subset Cq of C given by:
C
q
={m εC: |m|≦c
q}. (67)
Then, finding only the OBTT values of that PCS for those m in Cq is needed.
If mM is chosen too small, r will be close to 1, say 1.2. Then round(rj) will be 1 for j=0, 1, 2. To avoid repetition in set C, allow the first few numbers in C to take consecutive integer values 1, 2, 3, . . . , etc. until j is so large that the repetition does not occur.
Finally, at 646, the basis values for the Medium Band are OM; q[n, m] over all nodes m in C.
High Band
At 648, identify the High Band, as well as its set of PCSs. For the High Band, compute the OTT, Oq[n, m], because as discussed above, TT has narrow support interval for large q (with SD=N/q), and hence only a small number of basis values are needed there. For example, it is possible to only set its lower limit ρH, as its upper limit must be N/2−1. Usually, let ρH=υM+1. Its PCS Set is:
Next, and similarly as discussed above for the Medium Band, in 650-654, find crop limits for the High Band, determine basis nodes for the High Band and compute basis values for the High Band. Cropping and subsampling for the High Band can be performed similarly as for the Medium Band discussed above. The only differences are that OTT is used instead of OBTT, as the IFT for the entire frequency range is being found, and that γH and mH are used, which play the similar roles as γM and mM. Then, at 656, form Basis Matrix Bn
Preprocessing to Find Fourier Matrix for the Image
Example operations 670 for finding a Fourier Matrix for the image shown in
The complex elements H[qx, qy] in H measures the content of 2D frequency (qx, qy) in the image. At 674, to speed up the matrix multiplication, specify a threshold β so that only those prominent H[qx, qy] whose magnitude greater than or equal to β will take part in the multiplication. In other words, set small entries in H to zero. This parameter does not affect the creation of the basis.
Generating Compressed Local Spectrum Values for a Pixel
Compressed values can be computed by matrix multiplications as shown below.
Given a time series h[n], define a 1D Hybrid Transform (1D HT) Fh[n, m] by concatenating the basis values for the Low, Medium and High Bands in tandem along m:
where n=0, 1, . . . , N−1 and m=0, 1, . . . , M−1. The m in (69) is not the same as the m discussed above.
In particular, if h[n] is uq[n], a PCS of some q, then Fh[n, m] becomes Fu
Fh[n, m] is intrinsically linear in h: if h1, h2, . . . are time series and if a1, a2, . . . are constants, then:
F
a
h
+a
h
+ . . .
[n, m]=a
1
F
h
[n, m]+a
2
F
h
[n, m]+. . . (70)
This is because its constituents ST, OBTT and OTT are linear, and the cropping and subsampling operations are also linear.
Now, operations are performed to go from 1 dimension to 2 dimension: Given Nx×Ny data set or image h[x, y], the 2D HT of h can be defined by nesting:
F
h
[n
x
, n
y
, m
x
, m
y]=1D HT along x of (1D HT along y of h), (71)
where nx is fixed during 1D HT along y, and ny and my are fixed during 1D HT along x.
In particular, if h[nx, ny] is a 2D PCS, uq
Fu
This is obvious because by (58), uq
The linearity (70) for 1D HT can be extended into 2D HT since nesting preserves linearity:
F
a
h
−a
h
+ . . .
[n
x
, n
y
, m
x
, m
y
]=a
1
F
h
[n
x
, n
y, mx, my]+a2Fh
Now, by (60), h is a linear combination of uq
Using the B notation introduced above, (74) can be rewritten as:
Let Fn
Fn
All matrices are complex-valued. From (69), the mth row in the matrix corresponds to only one band (Low, Medium or High Band), and only several PCS's (of frequencies q) are in the corresponding PCS Sets QL, QM, QH. So, there are only a few non-zero elements in each row, making the Basis Matrix sparse. Moreover, those elements are row-contiguous in the sense that they cluster together in a row.
(76) is done as two matrix multiplications as discussed below. It is possible to make use of the row-contiguous sparsity of Bn
Forming Intermediate Matrix
First find the Intermediate Matrix Ln
Initialize all elements of Ln
For each qx from 0 to qmax
The (mx, qy)th element of Ln
Forming Compressed Matrix
Then multiply Ln
Initialize all elements of Fn
Referring now to
Generating Local Spectrum for an Image
As discussed above, the compressed form Fn
Interpolation to Form Semi-Compressed Local Spectrum Values
It is possible to undo the subsampling in the nesting manner: First undo along x direction, and then undo the result along the y direction.
The subsampling is undone by filling in the missing points by simple linear interpolation, assuming that the variation is linear along the gap. Optionally, it is possible to use more sophisticated methods, like sin c interpolation to give slightly better results, but the benefits are outweighed by the extra processing time.
The result is an I×I Semi-compressed Matrix Gn
For Medium and High Bands, interpolate separately the real and imaginary parts of OBTT or OTT. For the Low Band, only interpolate the magnitude of ST, because the real and imaginary parts oscillate along the frequency axis. If μL is equal to υL+1, the interpolation is not needed.
Decompressing to Form Local Spectrum
Example operations 611 for decompressing to form local spectrum are shown in
To do this, note that the interpolated values for Medium and High Bands are the OBIT and OTT values respectively, with some cropping. First undo the offsetting to get back the BTT or TT values, which are the IFT of the ST or a band limited ST. So, to get back the ST values, perform 2D FT for the Medium and High Bands at 613. Since (71) is formed from the semi-compressed values by nesting the OBIT or OTT operation, perform FFT in the nesting manner: First apply FFT along x direction, and then apply FFT to the result along the y direction. For the Low Band, at 615, simply copy the magnitudes obtained in the above section.
Results
There are 11 parameters. The default parameter setting that provides good combination of accuracy and speed for image sizes N=256 and N=512 (which are most common in medical images) are shown in Table 4. The values of ε, γM, μM, γH, μH and β are independent of N, while those of υL, μL, ρM, υM and ρH are proportional to N. They work well for all medical images we tested.
Table 5 shows the times in seconds taken in different processes for image sizes 256 and 512:
The preparation of basis takes longer time but it is only done once for images of the same N. The preprocessing of an image is instantaneous. The time to find the local spectrum at a pixel by FTFT-2D is much shorter than that by the original frequency-domain formula.
Going from N=256 to N=512, the local spectrum computation time by the formula increases 8-fold, but that by FTFT-2D only increases 3-fold. This shows the efficiency of the algorithm.
Referring now to
Conclusions—FTFT-2D Pixel
As discussed above, a fast and accurate technique, called FTFT-2D Pixel, to generate 2D ST magnitudes at each pixel in an image is presented. It facilitates medical image analysis especially for virtual biopsy.
FTFT-2D ROI
In addition to the FTFT-2D Pixel methods discussed above, it is possible to compute the 2D ST magnitudes and their statistics for a region of interest (ROI) in an image or for the entire image (i.e., FTFT-2D ROI). In many applications, a user may not be interested in the spectral content at a single pixel, as it varies from one pixel to the next especially for high frequencies. Instead, the user may be more concerned with the spectral distribution and statistics over an ROI in the image. To obtain the spectral distribution and statistics over an ROI, the local spectrum is individually found for each pixel in the image and then accumulated to obtain the statistics.
As discussed herein, the ROI may be one of the following types. The ROI can be a discrete set of points. For example, a pathology is manifested as isolated pixels with abnormal texture scattered over a medical image. Additionally, the ROI can be a curve along the fracture in a bone or a boundary of an anatomical structure. The ROI can also be a standard shape. For example, the ROI can be an area of standard shape like rectangle or circle. As discussed below, matrices are first computed for the bounding a rectangle of the ROI. So, it will be most efficient if the ROI is itself rectangular. The ROI can also be an irregular shape. The most common form of ROI is a general irregular shape, such as a tumor seen in a medical image. It is possible to find the statistics for the interior of tumor, or the complete tumor including the border. Additionally, the ROI can have holes, such as the penumbra of a lesion or the thick border of a tumor. In addition, the entire image can be treated as a particular case of ROI, but in this cases, its statistics are seldom found as it may include the untextured region outside the human body. The ROI can be a union of disjoint regions, e.g. for diseases where the abnormalities appear in different parts of the body.
In addition, the ST statistical results for an ROI may include mean, standard deviation (SD), minimum and maximum of ST magnitudes evaluated over the pixels in the ROI, etc.
Also, for each radius in the frequency space, it is possible to determine the radial components of the magnitudes at each pixel in the ROI, and hence obtain the statistics (mean, SD, minimum and maximum of ST magnitudes, etc.) of the radial component for each radius. The graph of mean radial components against the frequency is called the mean texture curve of the ROI. It should be understood that radial components are discussed above. For example, it is possible to find the distribution of ST along a line that runs from the interior to the exterior of a pathology to see how the texture changes as it crosses the border.
Process Flow
Referring now to
At 902, parameters are set, if the parameters are to be different than defaults. At 904, a monochromic Nx×Ny image is input. At 906, a determination is made as to whether Nx=Ny=power of 2 (i.e., if Nx=Ny=2M for some integer M, then set N=2M). If yes, operations proceed to 910, which is discussed below. If no, at 908, the image is expanded. As discussed above, example operations 660 in
At 910, basis is prepared. For example, Low, Medium and High Bands and the corresponding PCS Sets can be identified, and the basis nodes can be determined, and the basis values can be prepared for each band. As discussed above, example operations 630 in
Referring now to
Alternatively, at 9502, if the x-length projection of ROI is less than its y-length, form intermediate matrix product H Bi
Then, at 960, find ST magnitudes at pixel (ix, iy) from the Compressed Matrix Ci
Returning to
Determining Bounding Rectangle
Let R be the set of pixels in an ROI, and let:
a
x=min{x: (x, y)ε R}, ay=min{y: (x, y)ε R}; (77)
b
x=max{x: (x, y)ε R}, by=max{y: (x, y)ε R}. (78)
Then, the minimal bounding rectangle of an ROI is defined as the set of pixels: B=[ax, bx]×[ay, by], where [a, b] represents the set of integers lying between a and b inclusive.
As discussed below, it is possible to form 2×2 squares at intervals of 2 along x and y directions if Skipping Strategy 1 or 2 are used. It is also possible to form 4×4 squares at intervals of 4 if Skipping Strategy 3 is used. The squares must start from the bottom left corner of the bounding rectangle, (or top left corner depending on the convention).
It may be useful to maximize the number of squares contained wholly in the ROI, by shifting the grid of squares. This can provided a greater chance of having more frequency-homogeneous squares. The border of the ROI usually have higher frequencies than the interior since the texture changes across the border. The maximization may prevent mixing the border and interior pixels in the same square, which is shown in
To this end, define the set T0
T
o
, o
={(ix, iy): <ax−ox+cix, ay−oy+ciy> ⊂ R, ix=0, 1, 2, . . . , iy=0, 1, 2, . . . }, (79)
where <x, y> stands for the square of size c with (x, y) at the bottom-left corner. Here, c is 2 for Strategies 1 and 2, and is 4 for Strategy 3. Then choose the values of offsets ox and oy between 0 and c−1 to make To
Hence the bounding rectangle is translated by the negative of the offsets so that it starts at the corner (ax−ox, ay−oy), with squares marching from there.
Finding Local Spectra for all Pixels
To find the ST statistics of an ROI, the local spectrum at each pixel (ix, iy) in the ROI need to be determined. As discussed above with regard to
Fi
(80) can be computed either using the left intermediate matrix product Li
Li
or using the right intermediate matrix product Ri
Ri
Let T1 be the time to perform the first matrix multiplication in (81) or (82) to produce Li
Let prxR={ix:(ix, iy)ε R} and pryR={iy:(ix, iy)ε R} be the x- and y-projections of R. In most cases, prxR=[ax, bx], and pryR=[ay, by]. Let | | denote set cardinality.
If (81) or (82) are executed individually for each pixel in R, the total time taken is |R|(T1+T2). There are faster ways to do that:
With (81), it is possible to find Li
With (82), it is possible to find Ri
The methods above are faster. So, choose (81) or (82) according as |prxR| is less than or greater than |pryR|.
The first and second matrix multiplications for (81) are discussed above. Those for (82) are similar. Finally, as discussed above with regard to FTFT-2D Pixel methods, processes to obtain the local spectra for all the pixels in the ROI are disclosed.
Skipping Low Band or Low and Medium Bands
Since the ST values at Low or Medium Bands have low spatial resolution, it does not change much as we move from a pixel to the neighboring ones. Therefore, it is possible to skip computing them for a pixel if it has been done for an adjacent pixel. Skipping schemes are discussed in detail below.
The first step of getting the intermediate matrix product Li
The second step of finding the final product can be made faster by skipping. Suppose that it is desirable to skip the Low Bands for a particular pixel, only finding the ST values there for Medium and High Bands. Then in the second step of (81), use the following modification:
As compared to the algorithm for forming a compressed matrix discussed above, here it is possible to spare the time of accumulating for mx in the Low Bands. If (82) is used instead of (81), the method is similar. After the compressed matrix Fi
To skip the Low and Medium Bands for a particular pixel, only find the ST values there for the High Band, then the process is similar.
Skipping Strategies
The example strategies focus on the examples of 256×256 images with one Low, one Medium and one High Band, where the ROI is a single region of standard or general shape. In particular, three Skipping Strategies to skip computing some ST values for adjacent pixels are provided.
To implement the skipping, computing and saving the ST values for those Bands at a pixel in memory can be performed, and then retrieving and copying them into the adjacent pixels instead of computing them again. This would cause headache in housekeeping the saved values and in releasing the memory for other data. Alternatively, a “pixel forest” structure is adopted so that where the forest is traversed by depth-first scheme, then the adjacent pixels can use the ST values for those Bands in the most recently traversed pixel. In this way, it is possible able to evade the housekeeping need.
For a neighborhood of pixels with similar ST values, the particular pixels skipped are not important. In the strategies, the pixels with smallest x and y coordinates in the neighborhoods are used as representatives. This would cause some bias in the representation, which should be fine.
Skipping Strategy 1
This strategy only has little skipping so it takes longest time but is most precise. It is suitable for ROI with complex and varying texture, such as the rectangular ROI 1700, which is shown in
Here a forest of quad-trees is built with two levels, with occasional all-Band skipping. In graph theory, a forest means a set of disjoint trees. Here, at the top level, pixels are taken at every other x position and every other y position, i.e. belonging to the set:
{(ax−ox+2ix, ay−oy+2iy)ε R: ix=0, 1, 2, . . . , iy=0, 1, 2, . . . , },
where (ax, ay) were defined in (77) and (ox, oy) are those values that make |To
The first two leaves of each tree are taken, corresponding to a pair of diagonally opposite pixels in the square, and their ST values are computed for all Bands.
Then the “upper-difference” is found, defined as the norm-1 (mean-absolute) difference between the ST values of these two pixels at each (kx, ky) in the upper quadrant [N/4, N/2]×[N/4, N/2] of the 2-dimensional frequency index space. If this upper-difference is less than some threshold δ, then it is contended that the ST values are quite uniform over the square, so it is possible to skip computing All-Band ST values for the other two leaves in that tree. Otherwise, proceed to find the ST values for these leaves.
This kind of skipping will be called “Diagonal Skipping”. To reduce the time of finding the upper-difference, it is optionally possible to take (kx, ky) at intervals of, for example, 6 for kx and for ky. Then only 1/36 of the points in [N/4, N/2]×[N/4, N/2] are used.
The reason for arranging the four children diagonally can be seen in
Skipping Strategy 2
This strategy only finds the Low-Band ST values once for each 2×2 square, and performs occasional skipping for other Bands. So it is faster but slightly less precise than Strategy 1. It is suitable for ROI with fairly complex texture, somewhere between the textures of ROI's in
The forest in the same way as in Strategy 1, except that the diagonal order is not imposed. So, the children can be arranged in any order in the square, like: (ix, iy), (ix, iy+1), (ix+1, iy) and (ix+1, iy+1).
It is possible to traverse each time starting from the root. When a node that corresponds to a pixel inside the ROI is reached that has not been computed before, compute its ST values in one of two ways: if that node is the first one dealt with in that tree, then find its ST values for all Bands; otherwise, only find its Medium- and High-Band ST values, as the Low-Band values are already in memory found for that first one.
Further skipping for the Medium and High Bands can be provided. For example, for the first node (pixel) computed in each tree, obtain the following two quantities:
Again to find these quantities more efficiently, it is possible to optionally take, for example, every 6th values of kx or ky.
Now, if High-X-ST-magnitude is less than some threshold ξ, then the x-components of the high frequency ST values are quite small, implying that there is little change in texture along the x direction. So, it is possible to safely skip computing All-Band ST values for (ix+1, iy) and (ix+1, iy+1).
Similarly if High-Y-ST-magnitude is less than ξ, it is possible to skip computing All-Band ST values for (ix, iy+1) and (ix+1, iy+1). And if both are less than ξ, it is possible to not find ST values for the other pixels in the square.
This type of skipping is referred to as “Spectral Skipping”.
Skipping Strategy 3
This strategy only finds the Low-Band ST values once for each 4×4 square, and Medium-Band ST values once for each 2×2 square. It performs occasional skipping for other Bands. So it is faster but slightly less precise than the above strategies. It is suitable for ROI with fairly simple texture, such as the brain tumor example shown in
Here a forest of quad-trees with three levels is built. At the top level, it is possible to take pixels at every fourth x position and every fourth y position, i.e. the set:
{(ax−ox+4ix, ay−oy+4iy)ε R: ix=0, 1, 2, . . . , iy=0, 1, 2, . . . , }.
Each node or pixel (ix, iy) at the top level has four children in the middle level in any order, such as (ix, iy), (ix, iy+2), (ix+2, iy) and (ix+2, iy+2). Each node (jx, jy) in the middle level has four children in the bottom level: (jx, jy), (jx, jy+1), (jx+1, jy) and (jx+1, jy+1). So, each tree corresponds to a 4×4 square.
Referring now to
It is possible to traverse each time starting from the root by depth-first scheme. When a node that corresponds to a pixel inside the ROI is reached that has not been computed before, compute its ST values in one of three ways: if that node is the first one dealt with in that tree, then find its ST values for all Bands; otherwise if it is in the middle level, only find its Medium- and High-Band ST values, as the Low-Band values are already in memory found for that first one, otherwise it must be in the bottom level. Then only find its High-Band ST values, as the Low- and Medium-Band values are already in memory found for the previous ones. The memory holds the values for most recently visited node; thanks to the depth-first traversal, that node is the one to use as it is in the same square.
Like Skipping Strategy 2 above, it is possible to have Spectral Skipping for the Medium and/or High Bands: For the first node (pixel) computed in each tree, obtain four quantities: High-X-ST-magnitude and High-Y-ST-magnitude defined above, as well as two new ones:
Likewise, it is possible to optionally find these quantities more efficiently. It is possible to optionally take, for example, every 3rd values of kx or ky for [ρM, υM], and every 6th values of kx or ky for [0, N/2].
Now, if High-X-ST-magnitude is less than some threshold ζ, then there is little change in texture along the x direction. So, it is possible to safely skip computing All-Band ST values for every second pixel in the x direction (i.e. pixels c, b, k, j; g, f, o, n in
Similarly for the y direction, and for the case when these quantities in both directions are small.
Automatic Selection of Skipping Strategy
The default Skipping Strategy for medical images can be Skipping Strategy 3, for example, assuming that the ROI for a pathology is quite large, like the interior of a tumor. On some occasions, when a small ROI (e.g. a tiny lesion), or thin (e.g. the ROI is the border around a tumor), the little or no skipping should be performed.
While the users are allowed to manually choose the skipping strategy, the FTFT-2D methods are capable to doing this for them, based on the size and breadth of the ROI. It first computes the number |R| of pixels in the ROI, and its average x- and y-breadths defined by: Lx=|R|/(by−ay) and Ly=|R|/(bx−ax), using the values found in (77) and (78).
If |R| is less than some threshold, for example, 150, or if Lx or Ly is less than some threshold, for example, 5, then, using Skipping Strategy 1 can be suggested. Else, if |R| is less than, for example, 600, or if Lx or Ly is less than, for example, 10, then using Skipping Strategy 2 can be suggested. It should be understood that the above criteria upon which suggestions are based are only provided as examples, and that other values can be used.
Optionally, it is possible to look at x- and y-breadths separately. For instance, an ROI is a thin strip parallel to the y-axis, then it may have fewer levels along the x-direction than along the y-direction. But this case is quite rare and may require more complicated logic.
Weighting of Cells
As discussed above, it may be desirable to determine the ST statistics for a given ROI, primarily the mean and SD of ST magnitudes over all pixels in the ROI. If there is no skipping, it is possible to use the standard formulae to calculate them. These formulae work well even when computing the Low- or Medium-Band ST values are skipped for some nodes in Skipping Strategy 2 or 3, because these values have been computed previously and it is possible to simply take them from memory.
But there is an issue for Diagonal Skipping in Skipping Strategy 1 (where pixels are skipped in a 2×2 square if the first two diagonally opposite pixels are similar in ST) and for Spectral Skipping in Strategies 2 and 3 (where pixels in a square are skipped if the first pixel computed in that square has small high-frequency ST values). In these kinds of skipping, there is no copying from memory and special weights are used to ensure correct statistics.
Let C be the set of “computed pixels”, i.e. pixels in the ROI whose ST values have been found by computing and perhaps retrieving from memory. For each pixel (ix, iy) in C, count the total number n(ix, iy) of pixels (in the pixel's square and in ROI) represented by it. n(ix, iy) is actually the number of skipped pixels plus one.
Then, the mean and variance of ST of the ROI at frequency index (kx, ky) are given by:
The weights in (83) are simply the count n(ix, iy). Note that the denominators in (83) should be the total number of pixels in the ROI.
The count should not simply be used as the weights in (84), as this would pertain to the assumption that the skipped pixels take the same ST values as the representative. This would make the SD smaller than it should be. Actually there is some slight variation over those pixels. From experiments, it has been discovered that using the square-root of count as weight gives a more realistic result, closer to the one obtained without any skipping.
Pseudo-Code of Finding ROI Statistics
The computer implementation if the pixel forest structure with node traversing, skipping, weighting and statistics updating is quite complicated.
Experiments on Regions of Interest
Referring now to
The default values for the diagonal-difference threshold δ in Skipping Strategy 1 is set to be 0.05. The default values for the spectral thresholds ξ and ζ in Strategies 2 and 3 are 0.2 and 0.4 respectively.
In
a) illustrates a 256×256 MRI image of a diseased brain, with an ROI 1600 drawn.
a) illustrates a 256×256 MRI image of a diseased brain, with an ROI 1700 drawn.
In the mean texture curves of
The respective process times are listed in Table 6 below, which also includes the time for computing the whole image, treating it as the ROI.
Better than 99% overall accuracy is obtained, with about 440-fold and 240-fold speedup for the tumor (i.e.,
Skipping Strategies 1 and 2 produce more accurate results but take longer time to run, but it is really hard to see any difference in accuracies among the three strategies. The speed increases from Strategy 1 to 2 and from Strategy 2 to 3 are more remarkable.
It is observed that the SD found by FTFT-2D using the weighting method discussed above agree well with the true values, confirming that the square-root weighting reflects the distribution well. But the maximum ST over all pixels in the ROI is slightly underestimated by the skipping techniques, which is due to having missed/skipped some pixels.
Conclusion—FTFT-2D ROI
As discussed above, a fast and accurate technique to calculate ST statistics of an ROI is provided. The method is also robust, adaptive, and uses little memory, and especially useful for large ROI's.
Numerous other general purpose or special purpose computing system environments or configurations may be used. Examples of well known computing systems, environments, and/or configurations that may be suitable for use include, but are not limited to, personal computers, server computers, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, network personal computers (PCs), minicomputers, mainframe computers, embedded systems, distributed computing environments that include any of the above systems or devices, and the like.
Computer-executable instructions, such as program modules, being executed by a computer may be used. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Distributed computing environments may be used where tasks are performed by remote processing devices that are linked through a communications network or other data transmission medium. In a distributed computing environment, program modules and other data may be located in both local and remote computer storage media including memory storage devices.
With reference to
Computing device 1800 may have additional features/functionality. For example, computing device 1800 may include additional storage (removable and/or non-removable) including, but not limited to, magnetic or optical disks or tape. Such additional storage is illustrated in
Computing device 1800 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by device 1800 and includes both volatile and non-volatile media, removable and non-removable media.
Computer storage media include volatile and non-volatile, and removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Memory 1804, removable storage 1808, and non-removable storage 1810 are all examples of computer storage media. Computer storage media include, but are not limited to, RAM, ROM, electrically erasable program read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 1800. Any such computer storage media may be part of computing device 1800.
Computing device 1800 may contain communications connection(s) 1812 that allow the device to communicate with other devices. Computing device 1800 may also have input device(s) 1814 such as a keyboard, mouse, pen, voice input device, touch input device, etc. Output device(s) 1816 such as a display, speakers, printer, etc. may also be included. All these devices are well known in the art and need not be discussed at length here.
It should be understood that the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination of both. Thus, the methods and apparatus of the presently disclosed subject matter, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium wherein, when the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the presently disclosed subject matter. In the case of program code execution on programmable computers, the computing device generally includes a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. One or more programs may implement or utilize the processes described in connection with the presently disclosed subject matter, e.g., through the use of an application programming interface (API), reusable controls, or the like. Such programs may be implemented in a high level procedural or object-oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language and it may be combined with hardware implementations.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
This application claims the benefit of U.S. Provisional Patent Application No. 61/562,516, filed on Nov. 22, 2011, entitled “FTFT-1D Patent Detailed Description,” and U.S. Provisional Patent Application No. 61/562,486, filed on Nov. 22, 2011, entitled “FTFT-2D Patent Detailed Description,” and U.S. Provisional Patent Application No. 61/562,498, filed on Nov. 22, 2011, entitled “FTFT-2D Patent Detailed Description,” the disclosures of which are expressly incorporated herein by reference in their entireties.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2012/066403 | 11/21/2012 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
61562516 | Nov 2011 | US | |
61562486 | Nov 2011 | US | |
61562498 | Nov 2011 | US |