This application claims priority from Korean Patent Application No. 10-2021-0051577 filed on Apr. 21, 2021 in the Korean Intellectual Property Office, and all the benefits accruing therefrom under 35 U.S.C. 119, the contents of which are herein incorporated by reference in their entirety.
The present inventive concepts relate to a data processing method of tensor stream data.
Tensors are representations of multi-way data in a high-dimensional array. Data modeled as tensors are used in various fields, such as machine learning, urban computing, chemometrics, image processing, and recommender systems.
As the dimension and/or size of data increase, extensive research has been conducted on tensor factorization, which can effectively analyze tensors.
Real data may include outliers due to causes such as network disconnection and system errors, or some of the data may be lost. Tensor factorization based on tensors contaminated with such outliers and lost data is relatively inaccurate, and it is not easy to recover tensors contaminated with such outliers and lost data after tensor factorization.
Aspects of the present inventive concepts provide a tensor data processing method of detecting outliers and restoring missing values on the basis of the characteristics of the tensor stream data.
Aspects of the present inventive concepts also provide a tensor data processing method capable of predicting future tensor data.
Example embodiments of the present inventive concepts provide a tensor data processing method, the method comprising receiving an input tensor including at least one of an outlier and a missing value, the input tensor being input during a time interval between a first time point and a second time point, factorizing the input tensor into a low rank tensor to extract a temporal factor matrix, calculating a trend and a periodic pattern from the extracted temporal factor matrix, detecting the outlier which is out of the calculated trend and periodic pattern, updating the temporal factor matrix except the detected outlier, combining the updated temporal factor matrix and a non-temporal factor matrix of the input tensor to calculate a real tensor, and recovering the input tensor by setting data corresponding to a position of the outlier or a position of the missing value of the input tensor, taken from the data of the real tensor, as an estimated value.
Example embodiments of the present inventive concepts provide a tensor data processing method, the method comprising receiving an input tensor, applying the input tensor to initialize a static tensor factorization model of temporal characteristics, factorizing the input tensor into a temporal factor matrix and a non-temporal factor matrix on the basis of the static tensor factorization model, calculating a trend and a periodic pattern of the temporal factor matrix on the basis of a temporal prediction model, updating the temporal factor matrix and the non-temporal factor matrix in accordance with a dynamic tensor factorization model, combining the updated temporal factor matrix and the non-temporal factor matrix to calculate a real tensor, and detecting and repairing an outlier tensor and a missing value of the input tensor on the basis of the real tensor.
Example embodiments of the present inventive concepts provide a tensor data processing method, the method comprising receiving an input tensor including at least one of an outlier and a missing value, the input tensor being input during a time interval between a first time point and a second time point, factorizing the input tensor into a low rank tensor to extract factor matrices, calculating a data pattern from an extracted first factor matrix, detecting the outlier which is out of the calculated data pattern from the first factor matrix, updating the first factor matrix on the basis of the calculated data pattern except the detected outlier, combining the updated first factor matrix with a remaining second factor matrix of the input tensor to calculate a real tensor, and recovering the input tensor by setting data corresponding to a position of the outlier or a position of the missing value of the input tensor, taken from the data of the real tensor, as an estimated value.
However, aspects of the present inventive concepts are not restricted to the one set forth herein. The aspects of the present inventive concepts will become more apparent to one of ordinary skill in the art to which the present inventive concepts pertain by referencing the detailed description of the present inventive concepts given below.
Unless otherwise specified, all terms used herein (including technical and scientific terms) may be used in the meaning that may be commonly understood by those having ordinary knowledge in the art to which the present inventive concepts belong. Also, terms commonly used and predefined are not ideally or excessively interpreted unless explicitly specifically defined.
Prior to specific explanation of example embodiments of the present inventive concepts, notations described in the present specification will be described.
Codes and indexes described herein are based on Table 1. For example, u is a scalar, u is a vector, U is a matrix, and χ is a tensor. For an N-dimensional tensor χ, xi1 . . . iN indicates its (i1, . . . , iN)-th entry, and a subscript t (e.g., yt) denotes a value at the time point t.
U⊛W indicates a Hadamard product of matrices U and W of the same size. The sequence U(N)⊛ . . . ⊛U(1) of Hadamard products may be represented by ⊛n=1NU(n). The Hadamard product may be extended to tensors.
U⊙W indicates a Khatri-Rao product of matrices U and W having the same number of columns. The Khatri-Rao product of the matrix U and the matrix W may be represented by Equation (1). The sequence U(N)⊙ . . . ⊙U(1) of Khatri-Rao products may be represented by ⊙n=1NU(n).
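For illustration only, the two operators may be sketched in Python/numpy as follows (the helper names are assumptions for this sketch and do not appear in the original disclosure):

```python
import numpy as np

def hadamard(U, W):
    # Element-wise (Hadamard) product of two matrices of the same size.
    return U * W

def khatri_rao(U, W):
    # Column-wise Kronecker (Khatri-Rao) product of U (I x R) and W (J x R);
    # the result has shape (I*J) x R, with column r equal to kron(U[:, r], W[:, r]).
    I, R = U.shape
    J, R2 = W.shape
    assert R == R2, "Khatri-Rao requires the same number of columns"
    return (U[:, None, :] * W[None, :, :]).reshape(I * J, R)

U = np.random.rand(4, 3)
W = np.random.rand(5, 3)
print(hadamard(U, U).shape)    # (4, 3)
print(khatri_rao(U, W).shape)  # (20, 3)
```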
Hereinafter, example embodiments according to the technical idea of the present inventive concepts will be described referring to the accompanying drawings.
The tensor is a representation of data in a multi-dimensional array. For example, a vector is called a rank-1 tensor, a matrix is called a rank-2 tensor, and an array of three or more dimensions is called a rank-n tensor. Various real data may be expressed in the form of tensors.
On the other hand, in many applications, tensor data may be collected in the form of tensor streams that continuously grow over time. However, the real data collected in the form of the tensor stream may include missing values and outliers.
For example, a rank-1 tensor χ may be expressed as an outer product of N vectors (for example, χ=u(1)○u(2)○ . . . ○u(N), where u(n)∈ℝIn).
The present inventive concepts disclose a tensor data processing method of detecting missing values and outliers included in the tensor stream data in real time. The present inventive concepts also disclose a tensor data processing method of recovering missing values and outliers detected in a previously received tensor stream, and a tensor data processing method capable of predicting a future tensor stream to be received later.
Tensor factorization is a type of tensor analysis technique that may calculate the latent structure that makes up the tensor. Tensor factorization may reduce the dimensions of high-dimensional, large-volume tensor data and express the tensor data with fewer parameters than the existing data.
As an example, tensor data on the number of taxi operations may be factorized in this manner, as described below with reference to Equation (2).
This specification is based on a factorization method that includes a temporal factor matrix in the tensor factorization. For example, the CANDECOMP/PARAFAC (CP) factorization model among the temporal factor factorization models may be used; hereinafter, it is referred to as the CP factorization method. An N-dimensional (rank-N) tensor may be represented by a sum of R one-dimensional (rank-1) tensors. For example, a tensor χ based on the real data may be expressed as Equation (2).
In example embodiments, U(n)=[ũ1(n), . . . , ũR(n)]∈ℝIn×R is the n-th factor matrix.
If one axis of the tensor means time, the corresponding factor matrix may be defined as the temporal factor matrix. That is, if U(3) of the number of taxi operations is explained by Equation (2), the temporal factor matrix over the time that has passed from the departure time of the taxi may be represented by U(3)=[ũ1(3), . . . , ũR(3)].
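As a sketch of the CP form of Equation (2), the following Python code reconstructs a tensor as a sum of R outer products; the three-way shapes (e.g., a taxi tensor with a temporal third mode) are illustrative assumptions:

```python
import numpy as np

def cp_reconstruct(factors):
    # Rebuild a tensor from CP factor matrices [U(1) (I1 x R), ..., U(N) (IN x R)]
    # as a sum of R outer products, per Equation (2).
    R = factors[0].shape[1]
    shape = tuple(U.shape[0] for U in factors)
    X = np.zeros(shape)
    for r in range(R):
        outer = factors[0][:, r]
        for U in factors[1:]:
            outer = np.multiply.outer(outer, U[:, r])
        X += outer
    return X

# Hypothetical 3-way example: pickup zone x drop-off zone x elapsed time, rank 2.
U1, U2, U3 = np.random.rand(6, 2), np.random.rand(6, 2), np.random.rand(24, 2)
print(cp_reconstruct([U1, U2, U3]).shape)  # (6, 6, 24)
```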
The tensor data processing method of the present inventive concepts may grasp the pattern of the observed data input to the tensor stream, detect an outlier in the data that are input up to the present time on the basis of the grasped pattern, and calculate estimated values corresponding to the outlier and the unobserved data (missing values).
In example embodiments, the data patterns may include not only the temporal characteristics of the data, but also a physical model of the characteristics of the data itself, rules related to the data, prior knowledge, and the like. The tensor data processing method of the present inventive concepts may calculate the pattern features of the tensor stream on the basis of at least two or more data patterns, detect an outlier from the observed data on the basis of the calculated pattern features, and determine estimated values corresponding to the missing values and the positions of the detected outliers. Furthermore, the tensor data processing method of the present inventive concepts may predict not only the tensor stream of the data observed so far but also the tensor stream of the future data to be input later on the basis of at least two or more data patterns, and may detect outliers in advance and estimate the missing values.
In some example embodiments, if there is sensing data related to the sensor, a pattern based on the hardware characteristics (sensing margin, sensor input/output characteristics, etc.) of the sensor itself may be considered together with the temporal characteristics. In other example embodiments, if there is data related to the CPU, the pattern related to the hardware characteristics and the operating characteristics of the CPU may be considered together with the temporal characteristics.
Hereinafter, although the tensor data processing method will be described focusing on the temporal characteristic pattern, the tensor data may be processed by also combining other data pattern features according to various example embodiments described above.
For example, if a temporal component is included in the data, the data to be input to the tensor stream show temporal smoothness and seasonal smoothness with the flow of time.
The temporal smoothness means that tensor data over continuous time have similar values. For example, the internal temperature, humidity, and illuminance of an office at 9:10 am are similar to those of the office at 9:20 am. The seasonal smoothness means that tensor data at the same phase of consecutive cycles have similar values. For example, the internal temperature, humidity, and illuminance of the office at 9:10 am today may be similar to those of the office at 9:10 am tomorrow.
Taking the example of international visitor nights in Australia, the input tensor exhibits level, trend, and seasonality patterns.
For example, the level of the graph (e.g., the average value for a year) shows a gradual increase from 30 million to 60 million, and the trend, seen from the slope of the curve that fluctuates within one year, shows a gradual decrease until 2014 compared to before 2007. Also, the seasonality of the input tensor fluctuates with an amplitude within a predetermined or alternatively, desired interval.
When such patterns are grasped, the visitor numbers from January 2005 to January 2013 may be used to predict the visitor numbers in January 2014 on the basis of the level, trend, and seasonality mentioned above. In addition, the number of visitors on Jan. 20, 2014 may be predicted on the basis of the number of visitors on Jan. 15, 2014. If the number of visitors on Jan. 20, 2014 is out of a predetermined or alternatively, desired range from the estimated value, it may be determined to be an outlier and, in some example embodiments, excluded from the tensor stream. The estimated value obtained from previous data may also be substituted for the data determined to be outliers and used to predict the number of international visitors on a subsequent date, for example, Jan. 25, 2014.
The input tensor y includes a real tensor χ including data within the normal range and an outlier tensor (oinit). Assuming an incomplete N-dimensional real tensor χ and rank R, the indicator matrix Ω is a binary tensor of the same size as χ, and the value of each element ω in the matrix indicates whether the element of χ at the corresponding position is observed. The indicator matrix Ω is defined as in Equation (3).
CP factorization finds factor matrices that reduce or minimize the loss function of Equation (4) using only the observed tensor χ. Equation (4) is the squared Frobenius norm (see Table 1) of the Hadamard product of the indicator matrix Ω with the residual obtained by subtracting the tensor reconstructed from the factor matrices {U(n)}n=1N from the real tensor χ.
That is, factor matrices {U(n)}n=1N that fit the real data well despite missing values or outliers are found. An estimated tensor χ̂ may be restored by the use of the factor matrices calculated using the tensor factorization method, and the values missing from the real tensor χ may be estimated as the values χ̂ of the restored tensor.
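A minimal sketch of the observed-entry loss in the spirit of Equation (4), for a three-way tensor with a 0/1 indicator Ω (names and shapes are assumed for illustration):

```python
import numpy as np

def masked_cp_loss3(Y, Omega, U1, U2, U3):
    # Loss in the spirit of Equation (4) for a 3-way tensor: squared Frobenius
    # norm of the residual, taken only over observed entries (Omega is 0/1).
    X_hat = np.einsum('ir,jr,kr->ijk', U1, U2, U3)
    return np.sum((Omega * (Y - X_hat)) ** 2)

# Hypothetical data with roughly 30% missing entries.
Y = np.random.rand(6, 6, 24)
Omega = (np.random.rand(*Y.shape) > 0.3).astype(float)
U1, U2, U3 = np.random.rand(6, 2), np.random.rand(6, 2), np.random.rand(24, 2)
print(masked_cp_loss3(Y, Omega, U1, U2, U3))
```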
The tensor data processing method according to some example embodiments includes a static tensor factorization model and a dynamic tensor factorization model. The static tensor factorization model may be based on the pattern characteristics of the data observed in a given time interval (e.g., Jan. 15, 2014, Jan. 20, 2014, and Jan. 25, 2014 in the example of
To detect outliers and missing values in the tensor stream, the periodic pattern and trend may be extracted from the previous tensor inputs (e.g., from time point 0 to t−1) using the static tensor factorization model, and the data that do not match the extracted periodic pattern and trend may be determined to be outliers. Likewise, the missing values may be found.
In the static model, the input tensor y∈ℝI1× . . . ×IN is assumed to be the sum of the real tensor χ and the outlier tensor o (y=χ+o). The cost function of the static tensor factorization model may be represented by Equation (5).
Referring to Equation (5), the static tensor factorization model looks for factor matrices {U(n)}n=1N and an outlier tensor that reduce or minimize the cost function C. In Equation (5), λ1 and λ2 are a temporal smoothness control parameter and a periodic smoothness control parameter, respectively, and λ3 is a sparsity control parameter that controls the sparsity of the outlier tensor o. m is a period, and each matrix Li∈ℝ(IN−i)×IN is a smoothness constraint matrix.
For example, when λ1 is increased, the temporal smoothness is weighted more heavily, and when λ2 is increased, the periodic smoothness is weighted more heavily. When λ3 is increased, the outlier tensor becomes sparser, with smaller density. In other words, as λ1 and λ2 are increased, the values become more similar in terms of time and period.
The smoothness constraint matrix Li has entries lnn=1 and ln(n+i)=−1 for all 0≤n≤IN−i, and all the remaining entries are 0.
A first term ∥Ω(y−o−χ)∥F2 of Equation (5) is the loss function of the error that occurs when the tensor y is factorized into the real tensor χ and the outlier tensor o, a second term λ1∥L1U(N)∥F2 is a term that encourages temporal smoothness on the temporal factor matrix U(N), a third term λ2∥LmU(N)∥F2 is a term that encourages periodic smoothness on the temporal factor matrix U(N), and a fourth term λ3∥o∥1 is a term that encourages sparsity on the outlier tensor.
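The constraint matrices and the second and third terms of Equation (5) may be sketched as follows, using the definition of Li given above (the sizes, period, and parameter values are illustrative):

```python
import numpy as np

def smoothness_matrix(I_N, i):
    # L_i in R^{(I_N - i) x I_N} with l[n, n] = 1 and l[n, n + i] = -1;
    # L_1 penalizes adjacent-time differences, L_m penalizes period-m differences.
    L = np.zeros((I_N - i, I_N))
    for n in range(I_N - i):
        L[n, n] = 1.0
        L[n, n + i] = -1.0
    return L

I_N, R, m = 90, 3, 30            # illustrative sizes and period
U_N = np.random.rand(I_N, R)     # temporal factor matrix
lam1, lam2 = 0.1, 0.1
temporal = lam1 * np.sum((smoothness_matrix(I_N, 1) @ U_N) ** 2)  # λ1·||L1 U(N)||F²
periodic = lam2 * np.sum((smoothness_matrix(I_N, m) @ U_N) ** 2)  # λ2·||Lm U(N)||F²
print(temporal, periodic)
```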
When the periodic pattern and trend are extracted from previous tensor inputs (e.g., from time point 0 to t−1) using the static tensor factorization model, they may be used to process the subtensor input at the current time point t, as follows.
For example, the tensor data processing method applies a dynamic tensor factorization model according to the extracted periodic pattern and trend to estimate the future temporal factor matrix after the time point t (the present), and generates future subtensors to which the estimated future temporal factor matrix is applied. The tensor data processing method compares the estimated future subtensor at the time point t with the input subtensor actually received at the time point t. As a result of the comparison, a value which is out of the predetermined or alternatively, desired range is determined to be an outlier, ignored, and mapped to the estimated real data value, or a value that has no current record is determined to be a missing value and is mapped to the estimated real data value.
The dynamic tensor factorization model is a model in which a temporal factor is reflected on the static tensor factorization model, and assumes the example of a dynamic tensor stream in which tensors are continuously collected in the form of a stream. That is, an incomplete subtensor yt∈ℝI1× . . . ×IN−1 is received at each time point t.
A first term in Equation (7) is the loss function of the error that occurs when the input tensor yτ is factorized into the real tensor χτ and the outlier tensor oτ, a second term is a term that encourages temporal smoothness on uτ(N), which is a vector newly added to the temporal factor matrix U(N), a third term is a term that encourages periodic smoothness on uτ(N), and a fourth term is a term that encourages sparsity on the outlier tensor oτ. In Equation (7), λ1 and λ2 are the temporal smoothness control parameter and the periodic smoothness control parameter, respectively, and λ3 is a sparsity control parameter that controls the sparsity of the outlier tensor o. pτ and qτ are expressed as in Equation (8), respectively.
In Equations (7) and (8), uτ(N) is a temporal vector, which is the τ-th row vector of the temporal factor matrix, and represents the temporal component of the input tensor yτ.
That is, referring to Equations (7) and (8), the dynamic tensor factorization model also finds the factor matrices {U(n)}n=1N and the outlier tensor that reduce or minimize Equation (7) at each time t. However, the time-series index t is added relative to Equation (5), and if t=IN, Equations (5) and (7) become the same.
The tensor data processing method of the present inventive concepts may estimate the missing value included in the tensor stream and remove the outlier in real time. To this end, i) initialization of the tensor factorization model, ii) application to the temporal factor factorization model, and iii) dynamic update of the tensor stream are performed.
1. Initialization of Tensor Factorization Model
The initialization finds the factor matrices and the outlier tensor of the real tensor χ, using Equation (5).
According to example embodiments, the initialization of the tensor factorization model may be performed as in Algorithm 1. Algorithm 1 finds the factor matrices {U(n)}n=1N and the outlier tensor that reduce or minimize the cost function C on the basis of Equation (5).
(Algorithm 1, initialization: the input subtensors {yt, Ωt}t=1tinit are concatenated into the collective tensor yinit, the factor matrices {U(n)}n=1N are obtained by SOFIAALS(yinit, . . . , {U(n)}n=1N), and the outlier tensor is computed as oinit←SoftThresholding(Ωinit(yinit−χ̂init), λ3).)
First, the subtensors yt up to tinit are concatenated to generate the collective tensor yinit, an indicator matrix Ω and an outlier tensor oinit of the same size as the collective tensor yinit and the real tensor χ are generated, and the factor matrices {U(n)}n=1N are initialized. The temporal factor matrix is found using SOFIAALS, and the outlier tensor is detected according to Equation (9), as in line 8 of Algorithm 1.
SoftThresholding(x,λ3)=sign(x)·max(|x|−λ3, 0) (9)
Explaining Equation (9), if the magnitude of the residual Ωinit(yinit−χ̂init) at a position exceeds λ3, the entry is determined to belong to the outlier tensor oinit and keeps the residual shrunk by λ3; otherwise, the value is mapped to 0 and is not treated as outlier data.
In order to increase the speed of detecting outliers according to example embodiments, the outliers may be detected while gradually reducing the value λ3 using a decay factor d, as in lines 9 to 12 of Algorithm 1. For example, the decay factor d is about 0.8, reducing the scale of the value λ3 at each pass.
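The operator of Equation (9) and the gradual relaxation of λ3 by the decay factor d may be sketched as follows (the loop is an illustrative reading of lines 9 to 12 of Algorithm 1, not a verbatim transcription):

```python
import numpy as np

def soft_thresholding(x, lam):
    # Equation (9): sign(x) * max(|x| - lam, 0), applied element-wise.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Residuals whose magnitude stays below lam are mapped to 0 (treated as
# normal); larger residuals survive, shrunk by lam, as outlier entries.
residual = np.array([0.05, -0.2, 3.0, -4.5])
lam, d = 2.0, 0.8                  # d: decay factor, here about 0.8
for _ in range(3):                 # gradually lower the threshold scale
    outliers = soft_thresholding(residual, lam)
    lam *= d
print(outliers)
```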
On the other hand, as in line 7 of Algorithm 1, optimization may be performed in an ALS (Alternating Least Squares) manner to estimate the factor matrices from the input tensor yinit at the time of tensor factorization. The ALS manner fixes the other matrices (for example, the second factor matrix) except the first factor matrix to update one matrix (for example, the first factor matrix) in the direction of optimizing Equation (5), and then updates each factor matrix in turn. As an example, the initialization of the tensor factorization model may obtain the outliers and temporal factors using SOFIAALS, as in Algorithm 2.
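For illustration, one masked ALS update for a three-way tensor may be sketched as below; the smoothness and sparsity terms of Equation (5) are omitted for brevity, so this is a simplified stand-in for SOFIAALS rather than the algorithm itself:

```python
import numpy as np

def als_update_mode1(Y, Omega, U1, U2, U3, reg=1e-6):
    # One ALS pass for a 3-way tensor: fix U2 and U3 and refit each row of U1
    # by least squares over the observed entries of the mode-1 unfolding.
    I, R = U1.shape
    KR = (U2[:, None, :] * U3[None, :, :]).reshape(-1, R)  # Khatri-Rao, (J*K) x R
    Y1, W1 = Y.reshape(I, -1), Omega.reshape(I, -1)        # mode-1 unfolding
    U1_new = np.empty_like(U1)
    for i in range(I):
        obs = W1[i] > 0
        A, b = KR[obs], Y1[i, obs]
        U1_new[i] = np.linalg.solve(A.T @ A + reg * np.eye(R), A.T @ b)
    return U1_new

Y = np.random.rand(6, 6, 24)
Omega = (np.random.rand(*Y.shape) > 0.3).astype(float)
U1, U2, U3 = np.random.rand(6, 2), np.random.rand(6, 2), np.random.rand(24, 2)
U1 = als_update_mode1(Y, Omega, U1, U2, U3)
```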
(Algorithm 2, SOFIAALS: inputs y, o, Ω, R, m, λ1, λ2, λ3; the outlier-removed tensor y*=y−o is factorized.)
Referring to Algorithm 2, SOFIAALS updates the non-temporal factor matrices U(n) row by row on the input tensor y*=yinit−oinit in which the outlier is removed, using Equation (10).
For example, referring to lines 3 to 9 of Algorithm 2, a single row uin(n) of the non-temporal factor matrix in Equation (10) is found, using Equations (11) and (12). In order to find the value that reduces or minimizes the cost function C({U(n)}n=1N, o), it is possible to calculate uin(n) at which the partial derivative becomes zero, as in Equation (13). Equation (13) may be arranged as Equation (14).
The row uin(n) of the non-temporal factor matrix may then be obtained by solving the linear system of Equation (15).

Bin(n)uin(n)=cin(n)⇔uin(n)=(Bin(n))−1cin(n)  (15)
On the other hand, referring to lines 10 to 12 of Algorithm 2, each row uiN(N) of the temporal factor matrix is updated according to Equation (16).
In example embodiments, each row of the temporal factor matrix is based on Equation (17), and KiNj and HiNj of Equation (16) are based on Equation (18).
In summary, the initialization method calculates a temporal factor matrix U(N) from a tensor stream whose length is a multiple (for example, three times) of the minimum period, using the static tensor factorization model of Equation (5) and the ALS manner.
2. Application to Temporal Factor Factorization Model
As a result of the initialization, a temporal factor matrix U(N) composed of vectors ũ1(N), ũ2(N), . . . , ũR(N), each of which has a length tinit and a period m, is calculated.
The temporal factor factorization model may extract a predetermined or alternatively, desired pattern from the temporal factor matrix. For example, when the temporal factor matrix U(N) is applied to the temporal factor factorization model, its level, trend, and seasonality patterns may be extracted, as in the example of the number of international visitors to Australia explained above.
The Holt-Winters model (hereinafter referred to as the HW model) may be used as the temporal factor factorization model according to some example embodiments. The HW model may be defined by one prediction equation such as Equation (24), based on three smoothness equations such as Equations (21) to (23) below.
lt=α(yt−st−m)+(1−α)(lt−1+bt−1),  (21)
bt=β(lt−lt−1)+(1−β)bt−1,  (22)
st=γ(yt−lt−1−bt−1)+(1−γ)st−m  (23)
Equation (21) shows an equation for a level pattern on the time t of the temporal factor factorization model, Equation (22) shows an equation for a trend pattern on the time t of the temporal factor factorization model, and Equation (23) shows an equation of the seasonality pattern on the time t of the temporal factor factorization model. In example embodiments, each of the coefficients α, β, γ is a real number between 0 and 1.
In Equation (24), ŷt+h|t is a predicted temporal factor matrix after the lapse of h at the time t, and m means a seasonality period.
The seasonal index in Equation (24) allows the estimated value of the seasonal component used in the prediction to be taken from the last season of the time series. The HW model exponentially decreases the weight of data whose generation time point is old, and increases the weight of data whose generation time point is recent. To use this weighted-average manner, it is necessary to estimate the smoothing parameters and the initial values of the level, trend, and seasonal components.
When the time t is t=1, . . . , T in the collected tensor stream, T means the last time point, and the one-step prediction error at each time point is derived as et=yt−ŷt|t−1 in the HW model. In example embodiments, the initial values and the coefficients α, β, γ of the level, trend, and seasonal components are found so that the sum of squared errors (SSE) Σt=1Tet2 is reduced or minimized.
That is, learning of the HW model finds the level, trend, and seasonality patterns for which the SSE of the one-step differences et between the prediction at time point t−1 and the observation at time point t is reduced or minimized, while adjusting the coefficients α, β, γ.
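A scalar sketch of the recursions of Equations (21) to (23) and of the SSE used to fit α, β, γ follows; the initial components l0, b0, s0 are assumed to be given, whereas in practice they are also estimated:

```python
import numpy as np

def holt_winters_step(y_t, l_prev, b_prev, s_prev_m, alpha, beta, gamma):
    # One additive HW update following Equations (21)-(23).
    l_t = alpha * (y_t - s_prev_m) + (1 - alpha) * (l_prev + b_prev)
    b_t = beta * (l_t - l_prev) + (1 - beta) * b_prev
    s_t = gamma * (y_t - l_prev - b_prev) + (1 - gamma) * s_prev_m
    return l_t, b_t, s_t

def sse(series, m, alpha, beta, gamma, l0, b0, s0):
    # Sum of squared one-step errors e_t = y_t - y_hat_{t|t-1}, minimized to
    # fit alpha, beta, gamma (and, in practice, the initial components).
    l, b, s = l0, b0, list(s0)           # s0: m initial seasonal values
    total = 0.0
    for t, y in enumerate(series):
        y_hat = l + b + s[t % m]         # one-step forecast (h = 1)
        total += (y - y_hat) ** 2
        l, b, s[t % m] = holt_winters_step(y, l, b, s[t % m], alpha, beta, gamma)
    return total

# Illustrative seasonal series with a slight upward trend.
y = np.sin(np.arange(60) * 2 * np.pi / 12) + 0.05 * np.arange(60)
print(sse(y, 12, 0.3, 0.1, 0.2, l0=0.0, b0=0.05,
          s0=np.sin(np.arange(12) * 2 * np.pi / 12)))
```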
A specific description of the HW model is found in C. C. Holt, "Forecasting seasonals and trends by exponentially weighted moving averages," International Journal of Forecasting, vol. 20, no. 1, pp. 5-10, 2004, and P. R. Winters, "Forecasting sales by exponentially weighted moving averages," Management Science, vol. 6, no. 3, pp. 324-342, 1960.
In this specification, although the HW model has been described as an example, an ARIMA model and a Seasonal ARIMA model may be used as the temporal factor factorization model according to various example embodiments, and a temporal prediction algorithm based on machine learning may also be applied.
3. Dynamic Update of Tensor Stream
In the dynamic update, each factor matrix, that is, the real tensor χt, is restored from the continuously received subtensors yt (previous inputs). The restored factor matrix reflects the level, trend, and seasonal components as examples of the data pattern characteristics, for example, the temporal characteristics. For example, the factor matrix may be restored using the level pattern, trend pattern, or seasonality pattern calculated in the previous process.
The restored factor matrix is used to estimate the missing value on the basis of the difference from the newly received subtensor (current input). Algorithm 3 relates to a dynamic update in the tensor data processing method.
(Algorithm 3, dynamic update: inputs {yt, Ωt} for t=tinit+1, . . . ; the level, trend, and seasonal components are updated with Equations (21) to (23) at each time t.)
1) Estimation of Outlier Subtensor ot at Time t
Referring to line 3 of Algorithm 3, the temporal factor vector at the time t is predicted from the level, trend, and seasonal components, as in Equation (25).

ût|t−1(N)=lt−1+bt−1+st−m  (25)
That is, as in line 4 of Algorithm 3, the prediction subtensor ŷt|t−1 may be calculated as in Equation (26) on the basis of Equation (25) for the temporal factor matrix.
ŷt|t−1=⟦{Ut−1(n)}n=1N−1, ût|t−1(N)⟧  (26)
As in Equation (27), the outlier tensor ot at the time t is detected using the 2-sigma rule.
In Equation (27), Σ̂t−1 is an error scale tensor, which stores how large an error occurs in each entry. The 2-sigma rule is represented as in Equation (28).
According to the procedure described above, the outlier tensor ot of the currently input subtensor yt may be estimated.
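The prediction of Equations (25) and (26) and a 2-sigma test may be sketched as follows for a stream whose subtensors are matrices; the running update of the error scale tensor Σ̂t (Equation (27)) is not reproduced, so sigma is assumed to be given:

```python
import numpy as np

def predict_subtensor(U1, U2, l_prev, b_prev, s_prev_m):
    # Equations (25)-(26): predict the temporal row vector u_hat from the
    # level, trend, and seasonal components, then combine it with the fixed
    # non-temporal factor matrices of a 3-way stream.
    u_hat = l_prev + b_prev + s_prev_m               # (R,) predicted vector
    return np.einsum('ir,jr,r->ij', U1, U2, u_hat), u_hat

def two_sigma_outliers(Y_t, Y_hat, sigma):
    # 2-sigma rule in the spirit of Equation (28): flag entries whose residual
    # magnitude exceeds twice the per-entry error scale.
    return np.abs(Y_t - Y_hat) > 2.0 * sigma

R = 2
U1, U2 = np.random.rand(6, R), np.random.rand(6, R)
l, b, s = np.random.rand(R), np.random.rand(R), np.random.rand(R)
Y_hat, _ = predict_subtensor(U1, U2, l, b, s)
mask = two_sigma_outliers(Y_hat + np.random.randn(6, 6), Y_hat, sigma=np.ones((6, 6)))
print(mask.sum(), "entries flagged")
```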
2) Update of Non-Temporal Factor Matrix
When the outlier tensor ot detected by Equation (27) is subtracted from the actually input subtensor yt, the real tensor χt of the currently input tensor at the time t is calculated. Ideally, the non-temporal factor matrices U(n) would need to be updated to reduce or minimize Equation (7), which is the cost function over all subtensors received before the time t. However, since the length of the tensor stream may increase to infinity, a new cost function ft that considers only the terms of the time t is defined by Equation (29).
That is, when comparing Equation (7) with Equation (29), Equation (7) calculates the cost function on the basis of the subtensors yt input at the times t=1, 2, . . . , whereas Equation (29) is calculated in consideration of only the terms in which the subtensor yt is related to the time t.
The non-temporal factor matrices {U(n)}n=1N−1 may be calculated by applying the gradient descent method to Equation (29).
A residual subtensor Rt=Ωt(yt−ot−⟦{U(n)}n=1N−1, ut(N)⟧) is defined in Equation (29). The non-temporal factor matrices need to be updated in the direction of reducing or minimizing Equation (29) with a step size μ (that is, on the basis of the derivative value of ft), as in Equation (30).
In example embodiments, R(n) is the mode-n matricization of the subtensor Rt.
That is, the non-temporal factor matrix Ut−1(n) up to now may be updated to the non-temporal factor matrix Ut(n) of the next time t according to Equation (30).
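A sketch of the update in the spirit of Equation (30), for the three-way case; the exact gradient in the source includes the smoothness terms, which are omitted here as a simplifying assumption:

```python
import numpy as np

def grad_step_mode1(U1, R_t, U2, u_t, mu=0.01):
    # One descent step on a non-temporal factor matrix for a 3-way stream.
    # R_t (I1 x I2) is the masked residual subtensor; the data-term gradient
    # with respect to U1 is -R_(1) (U2 * u_t), cf. Equation (30).
    KR = U2 * u_t[None, :]           # Khatri-Rao of U2 with the temporal row
    grad = -R_t @ KR                 # R_(1) is already the 2-D matricization
    return U1 - mu * grad

U1, U2 = np.random.rand(6, 2), np.random.rand(6, 2)
u_t, R_t = np.random.rand(2), np.random.rand(6, 6)
U1 = grad_step_mode1(U1, R_t, U2, u_t)
```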
3) Update of Temporal Factor Matrix
The temporal factor vector ut(N) also needs to be updated in the direction of reducing or minimizing Equation (29) with the step size μ (that is, on the basis of the derivative value of ft), as in Equation (31).
In Equation (31), vec ( ) is a vectorization operator.
4) Update of Data Pattern Characteristics
As in lines 7 to 9 of Algorithm 3, when the non-temporal factor matrices Ut(n) and the temporal factor vector ut(N) at the time t are calculated in operations 2) and 3), the data pattern characteristics, that is, the level, trend, and seasonality patterns, are updated for use in predicting the subtensor at the subsequent time t+1. Equation (32) calculates the newly updated level pattern lt, trend pattern bt, and seasonality pattern st (line 10 of Algorithm 3).
lt=diag(α)(ut(N)−st−m)+(IR−diag(α))(lt−1+bt−1),
bt=diag(β)(lt−lt−1)+(IR−diag(β))bt−1,
st=diag(γ)(ut(N)−lt−1−bt−1)+(IR−diag(γ))st−m  (32)
In Equation (32), diag( ) is an operator that creates a matrix having the elements of the input vector on the main diagonal, and IR is the R×R identity matrix.
5) Restoration of Real Tensor χ̂t
The real subtensor χ̂t at the time t is restored as in Equation (33), on the basis of each of the factor matrices updated in operations 1) to 4) described above, for example, the non-temporal factor matrices Ut(n) and the temporal factor vector ut(N).
χ̂t=⟦{Ut(n)}n=1N−1, ut(N)⟧  (33)
Also, the restored values of the restored subtensor χ̂t may be used to restore the missing values: data at the positions of the missing values or outliers of the input subtensor are replaced with the corresponding values of χ̂t.
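The restoration of Equation (33) and the substitution of estimated values at outlier and missing positions may be sketched as follows (names are illustrative assumptions):

```python
import numpy as np

def restore_and_fill(Y_t, Omega_t, outlier_mask, U1, U2, u_t):
    # Equation (33): rebuild the real subtensor from the updated factors, then
    # substitute its values at missing (Omega == 0) and detected-outlier positions.
    X_hat = np.einsum('ir,jr,r->ij', U1, U2, u_t)
    repaired = np.where((Omega_t == 0) | outlier_mask, X_hat, Y_t)
    return X_hat, repaired

Y_t = np.random.rand(6, 6)
Omega_t = (np.random.rand(6, 6) > 0.2).astype(int)      # 0 marks missing entries
outlier_mask = np.zeros((6, 6), dtype=bool)
U1, U2, u_t = np.random.rand(6, 2), np.random.rand(6, 2), np.random.rand(2)
X_hat, repaired = restore_and_fill(Y_t, Omega_t, outlier_mask, U1, U2, u_t)
```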
4. Prediction of Future Subtensor
When using the tensor data processing method described above, it is possible to estimate and predict the tensor to be input later, that is, the future subtensor.
For example, the temporal factor vector ût+h|t(N) for a future time point t+h may be predicted on the basis of the level, trend, and seasonal components, and the future subtensor may be calculated as in Equation (34).

ŷt+h|t=⟦{Ut(n)}n=1N−1, ût+h|t(N)⟧  (34)
The tensor data processing method of the present inventive concepts may estimate the future subtensor ŷt+h|t before the actual subtensor is received, and may thereby detect outliers in advance and prepare estimated values for missing values.
In the flow of the tensor data processing method, the apparatus receives an input tensor (S10) and factorizes the input tensor into a temporal factor matrix and a non-temporal factor matrix (S20).
The apparatus calculates the trend and periodic pattern of the temporal factor matrix (S30). The apparatus detects an outlier from the raw input tensor data based on the trend and the periodic pattern (S40). The apparatus updates the temporal factor matrix except the outlier (S50). The apparatus combines the updated temporal factor matrix and the non-temporal factor matrix to calculate a real tensor (S60). The apparatus repairs the data at the outlier position based on the real tensor and recovers the input tensor (S70). For example, the apparatus estimates normal values for the outlier position or the missing value positions based on the trend and the periodic pattern.
Additionally, the apparatus may predict a future (next) input tensor and use it to detect new outliers based on the trend and periodic pattern of the real tensor.
Experimental results of the tensor data processing method will now be described.
As an input, 90% of the data was lost in a synthetic tensor of size 30×30×90 with a period of 30, and outliers corresponding to 7 times the largest value of the entire data were injected into 20% of the remaining data.
As the number of repetitions increased, the normalized reconstruction error was significantly lowered when using the tensor data processing method of the present inventive concepts, whereas with the general ALS, the reconstruction error was not lowered.
For example, in experimental results obtained by repeating 1,000 times, the general ALS failed to find a periodic pattern, whereas the tensor data processing method of the present inventive concepts accurately found the temporal characteristics, that is, the periodic patterns, well before the 1,000th repetition, even for data in which 90% of the values were lost and serious outliers were included.
The electronic apparatus (100) may include a processor (110) and a memory (120).
The electronic apparatus (100) may be designed to perform various functions in a semiconductor system, and may include, for example, an application processor. For example, the electronic apparatus (100) may analyze the data input to the tensor stream according to the above-mentioned data processing method, extract valid information, and make a situation determination on the basis of the extracted information or control the configurations of an electronic device on which the electronic apparatus is mounted. In example embodiments, the electronic apparatus (100) may be applicable to a robotic device such as a drone, an advanced driver assistance system (ADAS), or one of various computing devices that perform computing functions, such as a smart TV, a smart phone, a medical device, a mobile device, a video display device, a measurement device, an IoT (Internet of Things) device, an automobile, a server, and other equipment. In addition, the electronic apparatus may be mounted on at least one of various types of electronic devices.
The processor (110) is, for example, an NPU (Neural Processing Unit) that performs the above-mentioned data processing method on the input tensor data and generates an information signal based on the execution result, or that may be retrained to predict the future tensor data. The processor (110) may include programmable logic according to some example embodiments, or may further include a MAC (Multiply-Accumulate) circuit.
Alternatively, the processor (110) may be one of various types of processing units such as a CPU (Central Processing Unit), a GPU (Graphic Processing Unit), and an MCU (Micro Controller Unit), depending on some example embodiments.
The memory (120) may store the input tensor stream and the intermediate results accompanying the operation of the processor (110). For example, the memory (120) may store the raw input tensors input to the electronic apparatus (100), the real tensors, and intermediate values such as the temporal factor matrix or the non-temporal factor matrix. As an operating memory, the memory (120) may include a cache, a ROM (Read Only Memory), a PROM (Programmable Read Only Memory), an EPROM (Erasable PROM), an EEPROM (Electrically Erasable Programmable Read-Only Memory), a PRAM (Phase-change RAM), a flash memory, an SRAM (Static RAM), or a DRAM (Dynamic RAM), according to some example embodiments. Alternatively, as a non-volatile memory device, the memory (120) may include a flash memory, or a resistive type memory such as a ReRAM (resistive RAM), a PRAM (phase change RAM), or an MRAM (magnetic RAM), according to some example embodiments. In addition, the non-volatile memory device may include an integrated circuit including a processor and a RAM, for example, a storage device or a PIM (Processing in Memory).
On the other hand, the electronic apparatus may include at least one functional circuit that receives the information signal of the processor (110) to determine the situation or execute other operations.
According to the tensor data processing method of the present inventive concepts described above, real-time processing is enabled in detecting outliers and recovering missing values. According to example embodiments, the tensor data processing method of the present inventive concepts may be utilized in preprocessing that uses data having the temporal characteristics appearing in semiconductor facilities, semiconductor design, device characteristic measurement, and the like, while reducing or minimizing the loss of information. It is also possible to process data lost in the preprocessing, or noise. Further, the tensor data processing method of the present inventive concepts may be utilized for online learning that processes data updated in real time at high speed.
Alternatively, the tensor data processing method of the present inventive concepts described above may be used to grasp the relevance of various sensor data collected from a plurality of sensors in a semiconductor process facility, and to detect outliers in the sensor data or restore missing values in real time so as to predict future data more accurately. Further, it is possible to detect outliers instantly on the basis of the predicted data.
In some embodiments, the environment inside a semiconductor fabrication facility is very strictly managed. If a sudden change in the temperature or humidity of the internal environment occurs, the probability of occurrence of abnormalities in the semiconductor process increases, adversely affecting the yield. Therefore, it is necessary to determine the linear correlation of sensor data collected from various types of sensors inside the facility and, at the same time, to detect missing values and outliers in real time. Detecting the occurrence of outliers in real time and re-inspecting the semiconductor process accordingly can help improve semiconductor yield.
Alternatively, according to the tensor data processing method of the present inventive concepts described above, temporal data of traffic incoming to each server port can be efficiently managed in network traffic management. For example, the trend and periodicity of the traffic data can be grasped, and the load that the servers can accept can be determined through prediction of future traffic data.
In some embodiments, network traffic management is a very important issue for companies or operators that provide Internet services. For example, OTT service providers use a function that auto-scales the number of servers according to the number of users watching video. The present disclosure provides analysis of users' network traffic changing in real time, prediction of the future, and detection of outliers. This is essential for service providers to operate without interruption and with low latency.
According to the tensor data processing method of the present disclosure, network load balancing can be performed by predicting the time period when the number of viewers increases and securing more available servers in advance.
For example, cloud service providers such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure provide the flexibility to quickly scale services up or down, such as through an auto-scaling function. The auto-scaling function can monitor various resources such as CPU, memory, disk, and network, and automatically adjust the size of the server fleet. Cloud service providers may use the technology of the present invention to analyze the resource usage of instances, model it as a tensor, and predict future resource usage before it increases. This makes it possible to provide more stable cloud services.
Alternatively, according to the tensor data processing method of the present inventive concepts described above, in traffic data management, it is possible to grasp the trend and periodicity of traffic and utilize them for traffic system management in a specific time period.
In some embodiments, traffic data show clear periodic characteristics. Typically, the traffic data can be used in navigation applications or autonomous driving technology.
For example, according to the present disclosure, road traffic from location A to location B may be collected and modeled as a tensor, so that future traffic trends can be predicted. The present disclosure can also capture regions with sudden increases in traffic in real time. This information can be used to search for alternative routes and to estimate travel times more accurately.
Alternatively, according to the tensor data processing method of the present inventive concepts described above, in banking data management, it is possible to grasp the trend and periodicity of banking transactions and utilize them for electronic banking system management to prevent abnormal transactions.
For example, according to the present invention, users' remittance histories or card usage histories may be collected, modeled as tensors, and used to learn the usual patterns of a user's banking transactions. When a transaction that differs from the usual banking transaction pattern occurs, it can be estimated to be an abnormal transaction; such abnormal transactions can be detected in advance and blocked in real time.
While the present inventive concept has been described with reference to exemplary embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made thereto without departing from the spirit and scope of the present inventive concept.
Number | Date | Country | Kind |
---|---|---|---|
10-2021-0051577 | Apr 2021 | KR | national |