The invention relates to a method and system for data analysis and, in particular, to a method and system for extracting intrinsic information from any data, stationary or non-stationary, in order to define a probability distribution referred to as the intrinsic probability distribution.
Probability distribution is one of the most powerful tools in the study of stochastic phenomena. It can be applied to any non-deterministic measurable quantity under very general conditions, from continuous to categorical data. The concept of probability distribution is so fundamental that, from a quantum mechanical view, the very existence of electrons can only be described in terms of a probability distribution. For macro-scale physical phenomena, probability distribution is the main tool for studying everything non-deterministic. Specific examples are turbulence in flows ranging from blood vessels to the galactic scale, electroencephalogram (EEG) signals, and many socio-economic problems.
In probability studies, there is a powerful theorem, the Central Limit Theorem, which states that if random variables are independent and identically distributed, have a well-defined arithmetic mean and variance, and the sample is large enough, then the distribution of their normalized sum approaches a Gaussian distribution irrespective of the underlying distribution. The theorem is often mentioned together with the law of large numbers. It is so powerful that the conditions may be only loosely satisfied and the results are still approximately Gaussian. Consequently, the Gaussian distribution is also known as the normal distribution, which covers most of the phenomena we observe and measure. It offers a global view of the phenomena. The question here is how much information is buried in this amalgam of large numbers. How can we find out the underlying driving mechanisms for the distribution and the nuances of the intrinsic probabilistic structures?
Furthermore, a critical condition for the Central Limit Theorem is the existence of a well-defined arithmetic mean and variance. This imposes an obvious and inherent limitation on its validity: the method can only be applied to a homogeneous population. For physical measurable quantities expressed as a time series, the probability distribution is then valid only for stationary processes. Many natural as well as man-made phenomena are not stationary. Even for a stationary process, the measurement period might not be long enough to cover all the possible time scales involved, which renders the measured sample locally non-stationary. At any rate, the classical view of probability distribution cannot be applied to non-stationary processes. However, there is still a need for a meaningful measure of the intrinsic probabilistic structure to reveal the statistical properties of such non-stationary or locally non-stationary processes.
The present invention provides a method for data analysis in a data analyzing processor, comprising: receiving data; decomposing the data into a plurality of intrinsic mode functions (IMFs) by utilizing an empirical mode decomposition (EMD) method, wherein the intrinsic mode functions represent the changes of the data values over time at different frequencies; obtaining a plurality of probability density functions based on accumulating the distribution of each IMF according to a longest mean time scale; and generating an intrinsic probability distribution function (iPDF) component spectrum, wherein the iPDF component spectrum comprises the distribution of the probability density functions between a frequency dimension and a standard deviation dimension. The result of the method can be used as a diagnostic tool implemented in a system.
In an embodiment of the invention, a system for data analysis comprises a measurement processor, an analyzing processor and an outputting processor.
The measurement processor receives data.
The analyzing processor is connected to the measurement processor, decomposes the data into a plurality of IMFs by utilizing an EMD method, and obtains a plurality of probability density functions based on accumulating the distribution of each IMF according to a longest mean time scale.
The outputting processor is connected to the analyzing processor and generates an iPDF component spectrum, wherein the iPDF component spectrum comprises the distribution of the probability density functions between a frequency dimension and a standard deviation dimension.
Other systems, methods, features, and advantages of the present disclosure will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present disclosure, and be protected by the accompanying claims.
Many aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views. The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Having summarized various aspects of the present disclosure, reference will now be made in detail to the description of the disclosure as illustrated in the drawings. While the disclosure will be described in connection with these drawings, there is no intent to limit it to the embodiment or embodiments disclosed herein. On the contrary, the intent is to cover all alternatives, modifications and equivalents included within the spirit and scope of the disclosure as defined by the appended claims.
The present invention discloses a method implemented in a data analyzing device. It is understood that the method provides merely an example of the many different types of functional arrangements that may be employed to implement the operation of the various components of a system for data analysis, a computer system, a multiprocessor computing device, and so forth. The execution steps of the present invention may include application-specific software, which may be stored in any portion or component of the memory including, for example, random access memory (RAM), read-only memory (ROM), hard drive, solid-state drive, magneto-optical (MO) disc, IC chip, USB flash drive, memory card, optical disc, or other memory components.
For some embodiments, the system comprises a display processor, a processing processor, a memory, an input processor and a storage medium. The input processor is used to provide data such as images, text or control signals to an information processing system such as a computer or other information appliance. In accordance with some embodiments, the storage medium is, by way of example and without limitation, a hard drive, an optical processor or a remote database server coupled to a network, and stores software programs. The memory is where information is encoded, stored, and retrieved. The processing processor performs data calculations, data comparisons, and data copying. The display processor is an output processor that visually conveys text, graphics, and spectra. Information shown on the display processor is called soft copy because the information exists electronically and is displayed for a temporary period of time. The display processor includes CRT monitors, LCD monitors and displays, gas plasma monitors, and televisions. In accordance with such embodiments of the present invention, the software programs are stored in the memory and executed by the processing processor when the computer system executes the method for data analysis. Finally, information provided by the processing processor is presented on the display processor or stored in the storage medium.
The measurement processor 110 receives data, wherein the data can be nonlinear and non-stationary. The analyzing processor 120 receives the data from the measurement processor 110 and decomposes the data into a plurality of intrinsic mode functions (IMFs) by utilizing an empirical mode decomposition (EMD) method, wherein the IMFs represent the changes of the data values over time at different frequencies.
Please refer
Therefore, the IMFs 210 provide a compact support to the distribution of the original data up to the overall trend scale. When the EMD is applied to Gaussian white noise, each IMF has a mean frequency separated dyadically from that of its neighboring components, and its distribution is still Gaussian. With such properties of the EMD expansion, the intrinsic probability distribution of any data, stationary or non-stationary, can be analyzed to gain a physical understanding of the underlying distribution properties.
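By way of illustration only, and without limiting the embodiments, the sifting idea behind the EMD may be sketched as follows. The sketch is a simplification under stated assumptions: the envelopes are piecewise linear (cubic splines are the usual choice in practice), the number of sifting passes is fixed, and all function names, parameters and the test signal are hypothetical and do not appear in the disclosure.

```python
import math

def local_extrema(x):
    """Return the indices of interior local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def envelope(x, idx):
    """Piecewise-linear envelope through (i, x[i]) for i in idx (assumption:
    linear instead of spline), held flat out to both ends of the record."""
    n = len(x)
    knots = [(0, x[idx[0]])] + [(i, x[i]) for i in idx] + [(n - 1, x[idx[-1]])]
    env, k = [], 0
    for i in range(n):
        while k + 2 < len(knots) and i > knots[k + 1][0]:
            k += 1
        (i0, v0), (i1, v1) = knots[k], knots[k + 1]
        env.append(v0 + (v1 - v0) * (i - i0) / (i1 - i0))
    return env

def sift(x, n_sift=10):
    """Repeatedly subtract the mean of the upper and lower envelopes."""
    h = list(x)
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
    return h

def emd(x, max_imfs=8, n_sift=10):
    """Return (imfs, residual); imfs[0] is the highest-frequency component."""
    residual, imfs = list(x), []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residual)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few oscillations left: treat the rest as the trend
        imf = sift(residual, n_sift)
        imfs.append(imf)
        residual = [r - c for r, c in zip(residual, imf)]
    return imfs, residual

# Hypothetical test signal: fast wave + slow wave + linear trend.
t = [i / 512 for i in range(512)]
signal = [math.sin(2 * math.pi * 40 * s) + 0.5 * math.sin(2 * math.pi * 5 * s) + s
          for s in t]
imfs, residual = emd(signal)
```

By construction, the components and the residual add back to the input, which is the property the partial sums rely on.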
The analyzing processor 120 obtains a plurality of probability density functions based on accumulating the distribution of each IMF according to a longest mean time scale. In one embodiment, the partial sums, formed starting from the highest frequency component, are calculated by the analysis processor 220 according to the following expression:
wherein c_j(t) are the intrinsic mode functions and the trend.
As each IMF is zero mean, each partial sum should also be zero mean. The time scale of a partial sum is then limited by the time scale of its IMF component having the longest mean time scale. The sum of all the IMFs recovers the full data, which gives the global distribution of the full data minus the overall trend. The probability density functions are obtained for each successive partial sum until the full data set is reached.
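By way of illustration only, the successive partial sums and their probability density functions may be sketched as follows. The sketch assumes that the components are already available and ordered from the highest frequency downward, that each partial sum is non-constant, and that the density is a simple histogram expressed in units of the standard deviation of the partial sum; the function names, the bin layout and the synthetic components are hypothetical.

```python
import math

def partial_sums(components):
    """Cumulative sums of the components, starting from the first
    (highest-frequency) one: c1, c1 + c2, c1 + c2 + c3, ..."""
    sums, running = [], [0.0] * len(components[0])
    for c in components:
        running = [r + v for r, v in zip(running, c)]
        sums.append(running)
    return sums

def pdf_in_std_units(x, n_bins=16, span=4.0):
    """Histogram density of x after removing the mean and dividing by the
    standard deviation; the bins cover [-span, span) in equal widths."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    width = 2.0 * span / n_bins
    counts = [0] * n_bins
    for v in x:
        k = math.floor(((v - mean) / std + span) / width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return [c / (n * width) for c in counts]

# Hypothetical components standing in for IMFs (fast to slow).
t = [i / 1024 for i in range(1024)]
components = [
    [math.sin(2 * math.pi * 64 * s) for s in t],
    [0.5 * math.sin(2 * math.pi * 16 * s) for s in t],
    [0.25 * math.sin(2 * math.pi * 4 * s) for s in t],
]
sums = partial_sums(components)
densities = [pdf_in_std_units(p) for p in sums]
```

Each entry of `densities` corresponds to one partial sum, so stacking the entries gives one row per cut-off time scale.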
Those distributions from the partial sums are presented as a two-dimensional contour plot to reveal the intrinsic probability distributions of the data detrended or cut off at various time scales. To make the results easy to interpret, the analyzing processor 120 accentuates the departure of each distribution from the pervasive Gaussian bell curve and plots the contour of the difference between the distribution of the data and the corresponding model Gaussian values.
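By way of illustration only, the departure of a distribution from the Gaussian bell curve may be sketched as the difference between a histogram density in standard deviation units and the standard normal density averaged over the same bins. The bin layout, the function names and the two synthetic samples are assumptions made for the sketch and are not taken from the disclosure.

```python
import math
import random

def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def departure_from_gaussian(x, n_bins=16, span=4.0):
    """Empirical density (in standard deviation units) minus the standard
    normal density averaged over each bin; bins cover [-span, span)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    width = 2.0 * span / n_bins
    counts = [0] * n_bins
    for v in x:
        k = math.floor(((v - mean) / std + span) / width)
        if 0 <= k < n_bins:
            counts[k] += 1
    departure = []
    for k, c in enumerate(counts):
        lo = -span + k * width
        model = (normal_cdf(lo + width) - normal_cdf(lo)) / width
        departure.append(c / (n * width) - model)
    return departure

# Hypothetical samples: one Gaussian, one uniform (sub-Gaussian).
rng = random.Random(1)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]
uniform = [rng.uniform(-1.0, 1.0) for _ in range(20000)]
dep_gauss = departure_from_gaussian(gaussian)
dep_uniform = departure_from_gaussian(uniform)
```

For the Gaussian sample the departure stays close to zero in every bin, whereas the uniform sample shows a deficit near the center and an excess toward the shoulders.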
The system 100 provides a probability distribution for each IMF component. The density functions of the last few IMF components may look irregular because those IMFs contain only a limited number of oscillations, which leaves the distribution with insufficient degrees of freedom. The probability distributions of the partial sums and of the components can be drastically different, especially when the process involves multi-scale interactions, in particular nonlinear multiplicative processes. The resulting partial sum and component probability density functions are designated as the intrinsic probability distribution function (iPDF).
The outputting processor 130 generates an intrinsic probability distribution function (iPDF) component spectrum, wherein the iPDF component spectrum comprises the distribution of the probability density functions between a frequency dimension and a standard deviation dimension.
The outputting processor 130 generates an iPDF partial sum spectrum, wherein the iPDF partial sum spectrum comprises the distribution of the probability density function at the first highest frequency and of the summing probability density functions between the frequency dimension and the standard deviation dimension.
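By way of illustration only, a component spectrum of this kind may be assembled as a list of rows, one per component, ordered along the frequency dimension, with each row holding a density over the standard deviation dimension. The sketch assumes that the mean period of a component is estimated from its zero crossings, which the disclosure does not specify; the function names and the synthetic components are likewise hypothetical.

```python
import math

def mean_period(c):
    """Mean period in samples, estimated (assumption) as
    2 * length / (number of zero crossings)."""
    crossings = sum(1 for a, b in zip(c, c[1:]) if a * b < 0)
    return float("inf") if crossings == 0 else 2.0 * len(c) / crossings

def density_in_std_units(c, n_bins, span):
    """Histogram density of c in units of its own standard deviation."""
    n = len(c)
    mean = sum(c) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in c) / n)
    width = 2.0 * span / n_bins
    counts = [0] * n_bins
    for v in c:
        k = math.floor(((v - mean) / std + span) / width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return [k / (n * width) for k in counts]

def ipdf_component_spectrum(components, n_bins=16, span=4.0):
    """Rows of (mean period, density), sorted from the shortest period
    (highest frequency) to the longest."""
    rows = [(mean_period(c), density_in_std_units(c, n_bins, span))
            for c in components]
    rows.sort(key=lambda row: row[0])
    return rows

# Hypothetical components, deliberately given out of order.
t = [i / 1024 for i in range(1024)]
components = [
    [math.sin(2 * math.pi * 4 * s + 0.3) for s in t],
    [math.sin(2 * math.pi * 64 * s + 0.3) for s in t],
    [math.sin(2 * math.pi * 16 * s + 0.3) for s in t],
]
spectrum = ipdf_component_spectrum(components)
periods = [row[0] for row in spectrum]
```

The same row layout can hold partial sums instead of components, which would give the partial sum counterpart of the spectrum.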
In addition to non-stationary processes, the iPDF can be used for stationary processes. Because the mean and variance values are dominated by the most energetic variations, the global distribution is dominated by those components, and the less energetic components are obscured. The intrinsic probability distribution alleviates this condition and makes it possible to examine the probabilistic structure of components of all scales in great detail.
In some embodiments, before the analyzing processor 120 examines any data from natural phenomena, the method is calibrated with white noise of 10,000 samples with unit standard deviation, and the interactions of the white noise with a deterministic wave are calculated by the analysis processor 220 according to the following expression:
However, some probability density functions of the component could deviate drastically from the Gaussian as shown in
The outputting processor 130 examines the results of the partial sums and provides the iPDF partial sum spectrum 700. In
The analyzing processor 120 chooses a non-continuous distribution part 708 of the iPDF partial sum spectrum 700, such as T = 1/32 to T = 1/8, and compares the non-continuous distribution part of the iPDF partial sum spectrum 700 with the iPDF component spectrum 600 to determine a variation probability density function (T = 1/f = 1/16).
The modulation effect on all the other IMFs is to make the next three IMFs slightly super-Gaussian.
In
Linear additive processes are simply a superposition without any interaction between the Stokes wave and the white noise. The influence of the deterministic wave shows up only when the scale reaches the wave period. The multiplicative process, in contrast, influences all the IMF components. This calibration exercise shows that, even for stationary processes, the iPDF provides more information on the constituent components and the driving mechanisms involved in the data generation processes.
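By way of illustration only, the contrast between additive and multiplicative interactions may be sketched with synthetic data. The sketch assumes a plain cosine of unit amplitude and a period of 16 samples as the deterministic wave, and a modulation of the form noise times (1 + wave); these forms, the seed and all names are hypothetical stand-ins, since the expressions used in the embodiment are not reproduced here. Excess kurtosis is used as a simple indicator: positive values are super-Gaussian and negative values are sub-Gaussian.

```python
import math
import random

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / (m2 * m2) - 3.0

N = 10_000                      # sample count stated in the calibration
rng = random.Random(0)          # fixed seed so the sketch is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]

# Hypothetical deterministic wave: unit cosine with a 16-sample period.
wave = [math.cos(2 * math.pi * i / 16) for i in range(N)]

# Additive case: plain superposition, no interaction.
additive = [n + w for n, w in zip(noise, wave)]

# Multiplicative case (assumed form): the wave modulates the noise amplitude.
multiplicative = [n * (1.0 + w) for n, w in zip(noise, wave)]

k_noise = excess_kurtosis(noise)
k_add = excess_kurtosis(additive)
k_mul = excess_kurtosis(multiplicative)
```

Under these assumptions the modulated noise is clearly super-Gaussian, while the additive mixture stays close to, and slightly below, the Gaussian value.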
The analyzing processor 120 chooses a non-continuous distribution part 1108 of the iPDF partial sum spectrum 1100, such as T = 1/32 to T = 1/8, and compares the non-continuous distribution part of the iPDF partial sum spectrum 1100 with the iPDF component spectrum 1000 to determine the probability density function at T = 1/16 as a variation probability density function.
The iPDF can reveal the detailed dynamics of wave-turbulence interactions. Here, the probability density functions for each IMF component are examined between a frequency dimension 1202 and a standard deviation dimension 1204; the results are shown in
In one embodiment,
The analyzing processor 120 examines EEG signals from a healthy control and from AD patients at various stages of progression. As shown in the iPDF component spectra 1400, 1402, 1404, and 1406, as the disease progresses the iPDF becomes increasingly super-Gaussian, a condition indicating a lack of variation in the brain responses to any stimuli, that is, a mental state that is increasingly rigid and non-responsive. Of particular interest is the difference between the healthy control and the initial AD case. The highest frequency component for the healthy control is bimodal, indicating that there is a rich signal in the highest frequency range fluctuating as 3-point waves with values at either maxima or minima. As shown in the iPDF partial sum spectra 1500, 1502, 1504, and 1506, as soon as AD initiates, this highest frequency component immediately disappears in exactly the same way as for the deterministic wave modulating the white noise given in
In an embodiment, the analyzing processor 120 sums the distributions of the probability density functions at a first highest frequency and a second highest frequency to obtain a first summing probability density function. The analyzing processor 120 further sums the distributions of the first summing probability density function and the probability density function at a third highest frequency to obtain a second summing probability density function, and repeats this step, summing the distribution of an n-th summing probability density function and the probability density function at an (n+2)-th highest frequency to obtain an (n+1)-th summing probability density function. An iPDF partial sum spectrum is generated by the outputting processor 130, wherein the iPDF partial sum spectrum comprises the distribution of the probability density function at the first highest frequency and of the summing probability density functions between the frequency dimension and the standard deviation dimension.
In another embodiment, the outputting processor 130 classifies the data into a single mode, an additive mode or a product mode according to the distributions of the iPDF partial sum spectrum.
In another embodiment, the analyzing processor 120 chooses a non-continuous distribution part of the iPDF partial sum spectrum and compares the non-continuous distribution part of the iPDF partial sum spectrum with the iPDF component spectrum to determine a variation probability density function.
The iPDF provided by the invention adds a new dimension for examining phenomena separated by scale, so that the probability structure can be studied in great detail even in the presence of a dominant component. The probability distribution reveals the details of the constituent components and the underlying processes. In particular, the iPDF provides a more detailed and nuanced presentation of the probability density as a function of the intrinsic scales of the data, both component-wise and in the partial sums.
The iPDF offers a powerful tool for studying data embedded with multi-scale variations, as in the LOD data. The iPDF reveals the intrinsic probabilistic characteristics of all the scales involved, even if the data is stationary. Furthermore, the iPDF can be applied to any non-stationary data with a strong trend, because the EMD detrends the data into various IMF components and partial sums; it thus provides not only the probability distribution but also its nuances. The iPDF is not limited to stationary processes but extends the study of probability to non-stationary processes. Therefore, the method and system for data analysis make it possible to examine the intrinsic probabilistic properties in great detail for both stationary and non-stationary processes.
It should be emphasized that the above-described embodiments of the present disclosure are merely possible examples of implementations set forth for a clear understanding of the principles of the disclosure. Many variations and modifications may be made to the above-described embodiment(s) without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.