The invention relates to a system and a method for fast computation of the Singular Value Decomposition (SVD) of extremely large-scale low-rank matrices.
Matrix decomposition plays a significant role in computer applications such as computational mechanics, complex networks, machine learning, data analysis, image processing, and signal processing.
Computer applications that require the handling of large-scale low-rank matrices are commonly executed on computer systems with limited resources. Growing requirements to process, analyze, and interpret such high-dimensional data have led to methods that approximate a matrix with a low-rank counterpart for practical applications. One of the most widely used methods for low-rank matrix analysis is singular value decomposition (SVD), a fundamental tool in linear algebra, statistics, and machine learning. SVD underlies dimensionality reduction techniques, e.g., principal component analysis (PCA), that map data from a high-dimensional space into a low-dimensional space. The SVD of a real m×n matrix A is A=UΣVT, where the m×m matrix U and the n×n matrix V are orthogonal matrices whose columns are the left and right singular vectors, respectively, and Σ is an m×n diagonal matrix of singular values in descending order.
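By way of a brief illustration of this decomposition (the data matrix here is hypothetical, and the mean-centering normally used in PCA is omitted for brevity), a truncated SVD computed with a standard numerical library maps the data into a low-dimensional space:

```python
import numpy as np

# Hypothetical low-rank data matrix: 1000 samples of dimension 200, rank 50.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 50)) @ rng.standard_normal((50, 200))

# Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Best rank-k approximation and a PCA-style projection onto k dimensions.
k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
A_reduced = A @ Vt[:k, :].T          # each sample mapped to k coordinates
print(np.linalg.norm(A - A_k) / np.linalg.norm(A))
```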
Although SVD provides the optimal low-rank approximation, it requires costly storage and long computing times that are prohibitive for large-scale matrices on ordinary desktop computers. The optimal SVD of a dense, low-rank (rank k) matrix A of size m by n requires O(mn^2+m^2n+n^3) time. If optimality is not required, computationally efficient low-rank approximations by randomized projection methods can be used to approximate the dominant subspace of the data matrix. Lanczos methods, for example, have time complexity O(mnk^2). Still, the whole data matrix must be stored, the rank must be known in advance, and the process remains computationally expensive, especially for the data projection.
Classical or randomized SVD approaches for large data require reading the whole data at least once, storage of substantial temporary matrices during computation, and large matrix multiplications during their projection steps.
Prior art matrix decomposition methods are slow and limited by the system specification, making them not scalable.
U.S. Pat. No. 10,229,092 discloses a low-rank matrix approximation method using low-rank matrix factorization in the lp-norm space, where p<2 (e.g., 1≤p<2). This method uses low-rank matrix factorization in the least absolute deviation (l1-norm) space, providing an l1-PCA technique. This method minimizes the lp-norm of the residual matrix in the subspace factorization of an observed data matrix, for example minimizing the l1-norm of the residual matrix where p=1. The alternating direction method of multipliers (ADMM) is applied to solve the subspace decomposition of the low-rank matrix factorization with respect to the observed data matrix. Iterations of the ADMM may comprise solving a subspace decomposition and calculating the proximity operator of the l1-norm. However, this method is time-consuming and does not scale well for large matrices.
U.S. Pat. No. 9,760,537 discloses a computer-implemented method for finding a CUR decomposition. The method includes constructing, by a computer processor, a matrix C based on a matrix A. A matrix R is constructed based on the matrix A and the matrix C. A matrix U is constructed based on the matrices A, C, and R. The matrices C, U, and R provide a CUR decomposition of the matrix A. The construction of the matrices C, U, and R provides at least one of an input-sparsity-time CUR and a deterministic CUR. However, this method is also time-consuming and does not scale well for large matrices.
In accordance with a first aspect of the present invention, there is provided a computer-implemented method for generating an approximation image of a matrix A, the computer-implemented method comprising:
In accordance with the first aspect, the C and R matrices comprise columns and rows of A given by indices I and J generated randomly from a uniform distribution.
In accordance with the first aspect, the matrix U is given by the optimal solution U=Cu†AuRu†, wherein the rows and columns of Au are retrieved from the storage memory holding the matrix A using a deterministic sampling method.
In accordance with the first aspect, the deterministic sampling method is a discrete empirical interpolation method (DEIM).
In accordance with the first aspect, the step of the Singular Value Decomposition on the C and R matrices comprises generating matrices [Uc,Sc,Vc] and [Ur,Sr,Vr], such that C=UcScVcT and R=UrSrVrT.
In accordance with the first aspect, the step of Singular Value Decomposition on the C, U, and R comprises the step of generating a matrix Z, such that Z=ScVcTUUrSr.
In accordance with the first aspect, the approximation matrix UkΣkVkT is generated by the Singular Value Decomposition on the intermediate matrix Z, such that Z=UzSzVzT.
In accordance with the first aspect, Uk=UcUz, Σk=Sz, and Vk=VrVz.
In accordance with the first aspect, k is greater than a rank of the matrix A.
In accordance with the first aspect, k is an arbitrary integer.
In accordance with the first aspect, an error tolerance ε is stored in memory of the processing unit.
In accordance with the first aspect, distance functions for the Singular Values of C and R, δc=|σck−σck−1| and δr=|σrk−σrk−1| are generated, wherein Sc=diag(σc1, . . . ,σck) and Sr=diag(σr1, . . . ,σrk).
In accordance with the first aspect, if one of the distance functions returns a value greater than the error tolerance ε, k is increased and new C and R matrices with additional rows and columns are constructed in the processing unit until the distance functions return values less than the error tolerance ε.
In accordance with the first aspect, the processing unit is adapted to compute the approximation matrix UkΣkVkT using partial elements of the matrix A.
In accordance with the first aspect, the processing unit comprises cache memory or random access memory having a size less than a size required to hold the matrix A.
In accordance with the first aspect, the processing unit is adapted to load only the partial elements of matrix A into cache memory or random access memory for processing.
In accordance with the first aspect, the matrix A comprises a plurality of digital images.
In accordance with the first aspect, each digital image is represented as a column vector in matrix A.
In accordance with the first aspect, the matrix A comprises time series data wherein each column vector represents snapshot data of a time interval.
Embodiments of the present invention will now be described, by way of example, with reference to the accompanying drawings in which:
The present invention is designed to address two problems in current computing systems which have physical limitations in the working memory and non-volatile storage: (1) representing large-scale data matrices using a small number of rows and columns, thereby addressing the memory issue; and (2) computing a satisfactory approximation of the SVD of the original matrix with these partial data sets.
The method of the present invention can be implemented in a computing system as shown in
In a preferred embodiment of the present invention, there is provided a computer-implemented method for generating an approximation image of a matrix A as shown in
The C and R matrices comprise columns and rows of A given by indices I and J generated randomly from a uniform distribution. The matrix U is given by the optimal solution U=Cu†AuRu†, wherein the rows and columns of Cu, Au, and Ru are retrieved from the storage memory holding the matrix A using a deterministic sampling method. The processing unit is adapted to load only the partial elements of matrix A into cache memory or random access memory for processing according to the method of the present invention.
The deterministic sampling method is a discrete empirical interpolation method (DEIM). The step of the Singular Value Decomposition on the C and R matrices comprises generating matrices [Uc,Sc,Vc] and [Ur,Sr,Vr], such that C=UcScVcT and R=UrSrVrT. The step of Singular Value Decomposition on the C, U, and R matrices comprises the step of generating a matrix Z, such that Z=ScVcTUUrSr. The approximation matrix UkΣkVkT is generated by the Singular Value Decomposition on the intermediate matrix Z, such that Z=UzSzVzT, wherein Uk=UcUz, Σk=Sz, and Vk=VrVz.
The value k is greater than a rank of the matrix A. In one embodiment, k is an arbitrary integer.
In one embodiment, when the rank of the matrix A is unknown, multiple Singular Value Decompositions of C and R are generated with an incremental block size of C and R until the difference between two consecutive iterations is less than an error tolerance.
This method comprises the step of storing an error tolerance ε in memory of the processing unit. Distance functions for the singular values of C and R, δc=|σck−σck−1| and δr=|σrk−σrk−1|, are generated, wherein Sc=diag(σc1, . . . ,σck) and Sr=diag(σr1, . . . ,σrk). When one of the distance functions returns a value greater than the error tolerance ε, k is increased and new C and R matrices with additional columns and rows are constructed in the processing unit until the distance functions return values less than the error tolerance ε.
The algorithm of an embodiment of the present invention has many applications, such as image processing and artificial intelligence. Typically, for image processing, the matrix A comprises a plurality of digital images and each digital image is represented as a column vector in matrix A. In another application, the matrix A comprises time series data wherein each column vector represents snapshot data of a time interval.
To provide a solution for both problems, the present invention provides a novel, highly efficient method, CUR-SVD, to compute the top k singular vectors and values of a large-scale matrix by exploiting its low-rank structure. Most large-scale data matrices in practical applications have low-rank structures or can be well-approximated by low-rank matrices, allowing the large-scale data matrices to be represented by only a small portion of rows and columns. If that representation has sufficiently high accuracy, efficient computation of the SVD can be performed for applications where this was not previously possible.
Interpretable and computationally efficient approaches based on selecting rows and columns from a data matrix represent the data in lower-dimensional spaces with actual data points as basis vectors to maintain sparseness or nonnegativity. One such approach is CUR matrix decomposition, which represents the data matrix as a product of three small matrices. The CUR decomposition of an m×n matrix A is a low-rank approximation A≈CUR, where the m×c matrix C and the r×n matrix R are formed by extracting c columns and r rows of A, respectively, with c,r≥rank(A), and U is constructed to make the approximation error very small. Randomized approaches for CUR decomposition sample columns and rows based on uniform or non-uniform distributions, while deterministic approaches consider the approximation as an optimization problem and try to find the best decomposition that minimizes the approximation error. Low-rank approximation methods form two categories: (1) fixed-rank, where the rank of A is known, and (2) fixed-precision, where the rank is estimated by making the error between A and its approximation Aapp as small as possible (i.e., ∥A−Aapp∥F<ε, where ε is a given accuracy tolerance).
The system or method of the present invention is adapted to handle large-scale matrices that are prohibitively large to store in random access memory (RAM) or processing unit caches for computing their SVD. Two algorithms to be executed by a computing system or processing unit (fixed rank CUR-SVD and fixed precision CUR-SVD) are proposed in the present invention to compute the top k singular vectors and their corresponding singular values of large-scale, low-rank data for the cases where the rank is known or unknown respectively.
To validate the accuracy and efficiency of the systems or methods of the present invention in practical scenarios, numerical experiments were conducted on a desktop computer with datasets of different sizes and low-rank structures to demonstrate the effectiveness of CUR-SVD. The system or method of the present invention is shown to provide comparable approximation accuracy to existing SVD algorithms for large-scale low-rank matrices but was much more efficient in terms of computational time and storage. The top 200 singular vectors and their corresponding singular values of a 1 million by 1 million matrix were calculated in about 3 and 15 minutes for the fixed-rank and fixed-precision algorithms, respectively, with a tiny approximation error and memory utilization of less than 3 GB. This makes CUR-SVD feasible on most current computing devices. Traditional methods would have required 7.27 TB of memory, which is impractical.
In this example embodiment, the systems and methods of the present invention are implemented by a computer or processor. The computer may be implemented by any computing architecture, including portable computers, tablet computers, stand-alone Personal Computers (PCs), smart devices, Internet of Things (IoT) devices, edge computing devices, client/server architecture, “dumb” terminal/mainframe architecture, cloud-computing based architecture, or any other appropriate architecture. The computing device may be appropriately programmed to implement the invention.
As shown in
The Server 100 may include storage devices such as a disk drive 108 which may encompass solid state drives, hard disk drives, optical drives, magnetic tape drives or remote or cloud-based storage devices. The server 100 may use a single disk drive or multiple disk drives, or a remote storage service 120. The server 100 may also have a suitable operating system 116 which resides on the disk drive or in the ROM of the server 100.
The computer or computing apparatus may also provide the necessary computational capabilities to operate or to interface with a machine learning network, such as neural networks, to provide various functions and outputs. The neural network may be implemented locally, or it may also be accessible or partially accessible via a server or cloud-based service. The machine learning network may also be untrained, partially trained or fully trained, and/or may be retrained, adapted or updated over time.
The CUR-SVD algorithm to be executed in a processing unit of a computing system of the present invention consists of two stages: constructing a CUR decomposition and performing SVD on the small decomposed C and R matrices. It is highly efficient in computational time and storage for computing the SVD of large-scale matrices compared with existing methods. Large-scale matrices may be too big to store in computer memory. Therefore, the CUR-SVD algorithm of the present invention can be utilized in other valuable applications, such as efficient computation of the inverses of extremely large-scale low-rank matrices.
The method depicted in
The Fixed (Known) Rank CUR-SVD Method
For a large-scale m×n matrix A with low-rank structure rank(A)=ρ<<min{m,n}, randomly select c columns of A to form the m×c matrix C, where c>ρ. Then, with high probability, C will capture the column span of A. Similarly, randomly select r rows of A to form the r×n matrix R, where r>ρ. Then, with high probability, R captures the row span of A. The large-scale matrix A represented by CUR decomposition is A≈CUR, where U is the link between the column and row spans of A, which makes the approximation error very small. Hence, the left and right singular vectors and singular values of A can be obtained from the small matrices C, U, and R without having to load the large-scale matrix A into memory, as presented in the first stage of
Following is the pseudo-code of Algorithm 1 for generating an approximation decomposition of A where A≈UkΣkVkT. Its inputs are the matrix A and k, the number of rows and columns of A to select, with k >> ρ. Within Algorithm 1, the submatrices Cu, Ru, and Au selected by DEIM have sizes |I″|×c, r×|J″|, and |I″|×|J″|, respectively.
Algorithm 1 presents the pseudo-code of the fixed rank CUR-SVD method, which computes the top k singular vectors and values of the large-scale matrix A as A≈UkΣkVkT, using the small matrices of the CUR decomposition of A. The first stage constructs the CUR decomposition of A using only k (for simplicity assume c=r=k) rows and columns from A, and the second stage computes the k-SVD of A using the matrices of the CUR decomposition.
Inputs are the matrix A and k>>ρ, the number of rows and columns to be selected to form C and R, which is larger than the rank of A. The indices of the rows and columns, I and J, are generated randomly from a uniform distribution, and the R and C matrices are formed from the rows and columns of A given by I and J.
Effect of the U matrix: Matrix U of the CUR decomposition needs to be constructed to make the approximation error ∥A−CUR∥F2 or ∥A−CUR∥2 as small as possible. The simplest method, the pseudoinverse of the intersection matrix between C and R, gives a high approximation error for high-rank or noisy data. The optimal solution is U=C†AR†, where U is formed by weighting all elements of A, giving a small approximation error but at high computational expense, especially for large-scale matrices. To balance approximation accuracy and computational complexity, it is proposed to form U by selecting more rows from C and more columns from R, instead of the intersection between the c columns and r rows, and then compute U using the optimal form. The additional rows and columns from C and R are obtained by a deterministic sampling method, called the discrete empirical interpolation method (DEIM), that captures the most informative rows and columns using the information of the singular vectors. DEIM is a point selection algorithm originally used for model order reduction in nonlinear dynamical systems that has been used as a fully deterministic algorithm to improve the error bound when forming C and R in CUR decomposition.
Algorithm 3 shows the procedure to compute the selected row (and similarly column) indices I by the DEIM algorithm. The input is the full-rank m×k matrix U of top-k left singular vectors, with m ≥ k. The algorithm processes one singular vector at a time, and each vector provides one index, as shown in Steps 2 to 9. Therefore, the maximum number of generated indices is the number of available singular vectors, k.
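A minimal sketch of this index selection, assuming the standard DEIM recursion over the columns of the singular vector matrix (variable names are illustrative), is as follows:

```python
import numpy as np

def deim_indices(U):
    """Sketch of DEIM index selection: given an m-by-k matrix of singular
    vectors, return k row indices capturing the most informative rows."""
    m, k = U.shape
    indices = np.zeros(k, dtype=int)
    indices[0] = np.argmax(np.abs(U[:, 0]))
    for j in range(1, k):
        # Interpolate the j-th vector on the rows already selected ...
        c = np.linalg.solve(U[indices[:j], :j], U[indices[:j], j])
        # ... and pick the row where the interpolation residual is largest.
        r = U[:, j] - U[:, :j] @ c
        indices[j] = np.argmax(np.abs(r))
    return indices
```

Each selected index corresponds to the row at which the interpolation residual of the current singular vector is largest, which is what makes the selection fully deterministic.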
After constructing C and R, their SVDs are computed to give the m×k matrix Uc, the k×k matrices Sc and Vc, the k×k matrices Ur and Sr, and the n×k matrix Vr, as shown in Lines 5 and 6 of Algorithm 1 and the second stage of
The CUR-SVD variants with the U matrix based on the intersection, the optimal solution, and the DEIM method are denoted as CUR-SVD(I), CUR-SVD(O), and CUR-SVD(D), respectively.
For k less than the rank of A, the approximation error is high and not stable for CUR-SVD(I), but similar to SVD for CUR-SVD(O) and CUR-SVD(D).
Only C, U, and R, a total of (mc+nr+rc) elements instead of the whole A matrix, are considered, and computation of the SVD of A, with complexity O(m·min{m,n}^2), is replaced by computing the SVDs of C, Z, and R with complexities O(mk^2), O(k^2), and O(nk^2), respectively.
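An illustrative sketch of the two stages of the fixed-rank method described above, assuming c=r=k, an in-memory array standing in for the stored matrix A, and the deim_indices helper sketched earlier, is given below; only the sampled columns, rows, and the small Au block of A are actually accessed:

```python
import numpy as np

def fixed_rank_cur_svd(A, k, rng=None):
    """Sketch of a fixed-rank CUR-SVD: approximate the top-k SVD of A from
    k sampled columns and rows, assuming c = r = k as in the description."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    # Stage 1: uniform sampling of k column indices J and k row indices I.
    J = rng.choice(n, size=k, replace=False)
    I = rng.choice(m, size=k, replace=False)
    C = A[:, J]                      # m-by-k column sample
    R = A[I, :]                      # k-by-n row sample
    # Thin SVDs of the small factors.
    Uc, sc, Vct = np.linalg.svd(C, full_matrices=False)
    Ur, sr, Vrt = np.linalg.svd(R, full_matrices=False)
    # DEIM picks informative rows of C and columns of R for building U.
    I2 = deim_indices(Uc)            # row indices from left vectors of C
    J2 = deim_indices(Vrt.T)         # column indices from right vectors of R
    Cu = C[I2, :]
    Ru = R[:, J2]
    Au = A[np.ix_(I2, J2)]           # only these entries of A are read
    U = np.linalg.pinv(Cu) @ Au @ np.linalg.pinv(Ru)
    # Stage 2: SVD of the small intermediate matrix Z = Sc Vc^T U Ur Sr.
    Z = np.diag(sc) @ Vct @ U @ Ur @ np.diag(sr)
    Uz, sz, Vzt = np.linalg.svd(Z)
    Uk = Uc @ Uz
    Vk = Vrt.T @ Vzt.T
    return Uk, sz, Vk                # A ≈ Uk @ np.diag(sz) @ Vk.T
```

In practice A[:, J], A[I, :], and A[np.ix_(I2, J2)] would be read directly from non-volatile storage, so the full matrix never has to reside in working memory.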
All experiments were implemented with MATLAB R2022a on a desktop computer with an Intel® Core™ i7-10700 processor @ 2.90 GHz, and 16 GB RAM.
The relative errors of the different methods are shown in
CUR-SVD gives more accurate singular values for three different patterns than randomized and single-pass SVD, especially for the S-shape pattern (
The timing, approximation error, and memory usage of CUR-SVD to compute the top k-SVD for k=100 and 200 from large-scale matrices are given in Table 1. For the top 200 singular vectors and values of a 1 million by 1 million matrix, about 82 seconds and only 3.052 GB of memory are required to give an approximation error of about 1.2e−12. This is superior to current state of the art methods in terms of data storage and computing time requirements.
By projecting one of the test images onto UUT of SVD and CUR-SVD while changing the number of singular vectors of U from 20 to 300 in steps of 20, the reconstructed faces shown in
Projecting 128 test images of two persons with PCA by the SVD and CUR-SVD methods, shows their clear separation and classification (
The fixed-precision CUR-SVD method estimates an accurate rank k and simultaneously computes the SVD without having to store the whole data set and approximated matrices, making it highly efficient in computational time and storage.
For a large-scale m×n matrix A with low-rank structure rank(A)=ρ<<min{m,n} and C and R from a CUR decomposition, if rank(C)=rank(A)=rank(R), then rank(U)=rank(A) and A≈CUR with high probability. Suppose the number of columns selected to form C is c>>ρ, then
By applying the thin SVD to C as C=UcScVcT, the diagonal matrix Sc has the singular values of C, s1, . . . ,sρ, . . . ,sc, in descending order. For k<ρ<c,
If rank k is set to be ρ<k<c, then
Similarly for the R matrix by constructing R with r>ρ rows and truncating R with rank k>ρ,
For fixed precision algorithms, the goal is to make ∥A−Ak∥2<ε, where ε is the error tolerance, and find k that satisfies ∥A−Ak∥2<<ε or find k that makes σi(A) almost fixed for i=k+1, . . . ,min {m,n}. This requires computing the SVD of A, increasing the rank k and checking its singular values (i.e., σi(A)) until almost fixed. Computing σi(A) is impractical for large-size matrices. Fixed rank CUR-SVD can overcome this computational and storage complexity by selecting k columns and rows, and computing the k-SVD using Algorithm 1. Then the number of rows and columns selected can be increased and the k-SVD recomputed until σi(A) is almost fixed.
Re-computing the k-SVD using fixed rank CUR-SVD is the main computational cost in the algorithm; as the number of selected rows and columns is increased in each iteration, the SVD of C and R must be re-computed. Embodiments of the invention therefore propose to use the computed singular vectors and corresponding singular values from the previous iteration to compute the k-SVD of the current iteration. Incremental SVD (presented in Algorithm 4) is used to re-compute the SVD of C and R in each iteration, decreasing the computational complexity.
For an m×c matrix C with c>>ρ and rank(C)≈rank(A)≈ρ, and an r×n matrix R with r>>ρ and rank(R)≈rank(A)=ρ, it follows that rank(U)≈rank(A)=ρ and A=CUR. Hence,
The stopping condition is the minimum value of k at which the singular values of both C and R become almost fixed. For C, let δc=|σck−σck−1|, where σxi denotes the i-th singular value of X, and for R, δr=|σrk−σrk−1|. Therefore, δc and δr both need to be smaller than the tolerance parameter ε.
Algorithm 2 below presents the fixed precision CUR-SVD method. Its inputs are the matrix A, the tolerance ε, and the block size b, where C is the matrix held after reading b columns and R is the matrix held after reading b rows.
The inputs are the large-scale m×n matrix A, the tolerance parameter ε, and the block size b, which is the increment in the number of rows and columns read in each iteration. For simplicity, in this embodiment, it is assumed that c=r.
Initialization step: The column and row indices, J and I, are generated using a uniform sampling strategy, and b columns and rows of A corresponding to the first b indices are read to construct C and R.
Procedures: The SVDs of C and R are computed, and δc and δr are initialized. If necessary, new blocks of b columns and b rows of A corresponding to the second subsets of indices J and I are read (from hard disk) into memory to construct the new matrices C′ and R′, the SVDs of C and R are updated using IncSVD (the incremental SVD of Algorithm 4), and δc and δr are updated.
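A simplified sketch of this fixed-precision loop is shown below; for brevity it recomputes the SVDs of C and R from scratch in each iteration instead of applying the incremental update of Algorithm 4, assumes one particular reading of the stopping rule (the last two singular values of C and of R becoming nearly equal), and reuses the fixed_rank_cur_svd sketch given earlier:

```python
import numpy as np

def fixed_precision_cur_svd(A, eps, b, rng=None):
    """Sketch of the fixed-precision loop: grow C and R by blocks of b
    columns/rows until the trailing singular values of both stabilize."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    J = rng.permutation(n)                 # column read order
    I = rng.permutation(m)                 # row read order
    k = b
    while True:
        C = A[:, J[:k]]                    # columns read so far
        R = A[I[:k], :]                    # rows read so far
        sc = np.linalg.svd(C, compute_uv=False)
        sr = np.linalg.svd(R, compute_uv=False)
        # One reading of the stopping rule: the last two singular values of
        # C and of R become almost equal once k exceeds the (unknown) rank.
        delta_c = abs(sc[-1] - sc[-2])
        delta_r = abs(sr[-1] - sr[-2])
        if delta_c < eps and delta_r < eps:
            break
        k += b                             # read another block of b
    # With the rank estimated as k, finish with the fixed-rank pipeline.
    return fixed_rank_cur_svd(A, k, rng=rng)
```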
In one embodiment of the present invention, the main algorithm of incremental SVD by Brand is implemented with two modifications: the main algorithm is modified to deal with a block of columns instead of one column in each iteration, and a modified Gram-Schmidt step is used for reorthogonalization after each iteration. The notation A(:, 1:ℓ) and Aℓ refer to the first ℓ columns of A, and the function diag takes as input a vector and returns a square matrix with the input vector entries as its diagonal elements. Let A be an m by n dense matrix and ai refer to the i-th column of A, i.e., A=[a1|a2| . . . |an]. Suppose the SVD of Aℓ is computed as Aℓ=UΣVT, where U is an m×ℓ matrix, Σ is an ℓ×ℓ diagonal matrix with the ℓ (ordered) singular values of Aℓ, and V is an ℓ×ℓ matrix.
The incremental SVD aims to update the above U, Σ, and V matrices to find the SVD of [Aℓ|B], where B is the m×b matrix of newly added columns. Algorithm 4 shows the procedure of the incremental SVD to compute the SVD of [Aℓ|B] using the previously computed SVD of Aℓ and the newly added columns B.
The inputs to Algorithm 4 are the current factors U, Σ, and V of Aℓ and the new block B of b columns, and its outputs are the updated factors of [Aℓ|B].
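A sketch of a block incremental SVD update in the spirit of Brand's method is given below; it uses a QR factorization in place of the modified Gram-Schmidt reorthogonalization described above, and its variable names and truncation choices are illustrative rather than a literal transcription of Algorithm 4:

```python
import numpy as np

def incremental_svd_update(U, S, V, B):
    """Sketch of a block incremental SVD update: given A_l ≈ U @ diag(S) @ V.T
    and a new block of columns B, return factors of [A_l | B]."""
    m, r = U.shape
    b = B.shape[1]
    M = U.T @ B                        # projection of B onto the current basis
    P = B - U @ M                      # component of B outside the basis
    Qb, Rb = np.linalg.qr(P)           # orthonormal basis for the new part
    # Small (r+b)-by-(r+b) core matrix whose SVD updates the factorization.
    K = np.block([[np.diag(S), M],
                  [np.zeros((b, r)), Rb]])
    Uk, Sk, Vkt = np.linalg.svd(K, full_matrices=False)
    U_new = np.hstack([U, Qb]) @ Uk
    V_new = np.block([[V, np.zeros((V.shape[0], b))],
                      [np.zeros((b, r)), np.eye(b)]]) @ Vkt.T
    # Re-orthogonalize the left factor to limit drift over many updates,
    # standing in for the modified Gram-Schmidt step described above.
    Q, Rq = np.linalg.qr(U_new)
    Uc, Sc, Vct = np.linalg.svd(Rq @ np.diag(Sk))
    return Q @ Uc, Sc, V_new @ Vct.T
```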
If the stopping condition ((δc<ε) && (δr<ε)) is satisfied, then the estimated rank k is k=e×b, where e is the number of times the stopping condition was invalid. The final m×k matrix C and k×n matrix R and their SVDs [Uc, Sc, Vc] and [Ur, Sr, Vr] are the final results from IncSVD in Lines 15 and 16. Thus, the rank k and the SVDs of the C and R matrices are computed simultaneously.
As with the fixed rank CUR-SVD algorithm, after constructing C and R, the U matrix is computed. Finally, the intermediate matrix Z and its SVD [Uz, Sz, Vz] are computed to give the m×k matrix Uk, the k×k matrix Σk, and the n×k matrix Vk, with A≈UkΣkVkT.
The effect of all parameters (error tolerance ε, block size b, and size of the matrix n) on the estimated rank {circumflex over (k)}, run time, estimated error pi(E), and memory usage by the fixed precision CUR-SVD algorithm for different tolerance parameters (by row) is given in
For a large error tolerance, ε=1e−3, the estimated rank {circumflex over (k)} is far from the actual rank k, especially for small block sizes b (first row,
Tables 2 and 3 show the performance of the fixed-precision CUR-SVD algorithm for small- and large-scale matrices, respectively. The actual rank of the generated data was 150, the block size b was 20, and the error tolerance parameter ε was set to 1e−8. The estimated rank was 180 under different values of β. Therefore, CUR-SVD accurately estimated the number of columns and rows (i.e., the estimated rank) that guarantee high approximation accuracy. For the 1,000,000 by 1,000,000 matrix, CUR-SVD required about 2.71 GB of memory to compute the first 180 singular vectors and singular values in about 15 minutes. The estimated error pi(E) was about 1e−13 for different matrix sizes and patterns of singular values.
Embodiments of the present invention present a novel method to compute the top k-SVD of an extremely large-scale, low-rank matrix that is prohibitive with existing SVD algorithms. CUR-SVD is based on randomly observing a part of the data using CUR decomposition. Two algorithms, fixed k-rank and fixed precision CUR-SVD, were developed for different low-rank approximation problems and achieved comparable approximation accuracy to the optimal reduced SVD algorithm. Instead of analyzing the whole matrix, these methods act via CUR decomposition and have higher efficiency than traditional SVD algorithms. When only a small portion of matrix entries are observed, the method can perform SVD on extremely large-scale matrices that cannot be loaded into computer memory, and does so in much less time than a standard method would need.
The performance of CUR-SVD was evaluated by comparing it with widely used methods for computing SVD. The MATLAB built-in command for truncated SVD, based on a Krylov subspace iterative method, was used and designated as SVD. For randomized SVD methods, two widely used methods, SVD randQB and SVD single pass, which are the primary randomized method and a single pass randomized SVD, respectively, were used for the performance comparison. The relative error and run-time are reported for all experiments as the average over ten trials. The relative errors in the Frobenius norm were used as an accuracy metric of the approximation, defined as,
For low-rank matrices, the matrix is generated directly based on the SVD. Let V and W be two randomly generated real unitary matrices, and Σ be a non-negative real diagonal matrix with manually set diagonal values. The low-rank m×n matrix A is then represented as A=VΣWT. Results were compared with the ground truth of the generated data for experiments on large-scale matrices, as the existing SVD algorithms cannot handle such huge matrices.
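A sketch of this test-matrix construction, assuming orthonormal factors obtained by QR of Gaussian matrices and an illustrative default decay of the singular values (e.g., i^−β with β=0.5 as in the experiments below), is:

```python
import numpy as np

def make_low_rank_matrix(m, n, k, singular_values=None, rng=None):
    """Sketch of generating a synthetic rank-k test matrix A = V @ Sigma @ W.T
    with orthonormal V and W; the default decay is an illustrative choice."""
    rng = np.random.default_rng(rng)
    if singular_values is None:
        singular_values = np.arange(1, k + 1, dtype=float) ** -0.5  # i^-beta
    # Orthonormal factors from QR factorizations of Gaussian matrices.
    V, _ = np.linalg.qr(rng.standard_normal((m, k)))
    W, _ = np.linalg.qr(rng.standard_normal((n, k)))
    # Materializing A is only practical for the smaller test cases; for the
    # million-scale experiments the factors would be kept implicit.
    return V @ np.diag(singular_values) @ W.T
```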
To evaluate the performance of CUR-SVD in handling the U matrix (as shown in
To evaluate the computational and space efficiency of the fixed-rank CUR-SVD method, randomized SVD based on QB decomposition and single-pass SVD are used in one embodiment of the present invention. A set of different-size matrices was generated with a rank of 100, and the singular values were chosen as Σk=diag(1^−β, 2^−β, . . . ,k^−β) with singular value decay parameter β=0.5, as shown in
To check the accuracy of the resultant singular values and vectors computed by CUR-SVD, three types of matrices were generated to represent different distribution shapes (slow, fast, and S-shape) of singular value decay. The singular vectors U and V were random matrices with orthonormal columns and the diagonal elements of Σ were σj=e^(−j/2), σj=e^(−j/7), and σj=0.0001+(1+e^(j−30))^−1 for slow, fast, and S-shape decays, respectively. A matrix A was generated for each pattern of singular values and the top 200 singular vectors and singular values of A were computed using different methods.
In one embodiment of the present invention, the orthogonality of the resultant singular vectors can be evaluated by the following proposed method: three matrices are generated in the same manner as mentioned above. The k-SVD is computed using SVD and the proposed methods for each matrix, where k changes from 10 to 200 with a step of 10. For each k value, the orthogonality of the computed left and right singular vectors is computed as,
respectively, where ∥x∥∞=max|xi|. Reference is made to
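One plausible form of this orthogonality measure, assuming it is the largest absolute entry of UTU−I in keeping with the ∥·∥∞ definition above (the exact expression used in the experiments is an assumption), is:

```python
import numpy as np

def orthogonality_error(U):
    """One plausible orthogonality measure for a computed singular-vector
    matrix U (m-by-k): the largest absolute entry of U.T @ U - I."""
    k = U.shape[1]
    return np.max(np.abs(U.T @ U - np.eye(k)))
```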
To evaluate CUR-SVD with large-scale matrices, the classical power method was utilized to estimate the spectral norm of the error between the actual data matrix and the matrix approximated by the proposed method.
For large-scale data (Table 1), a set of matrices of sizes from 100,000 by 100,000 to 1 million by 1 million was generated with rank 150 and a slow-decay distribution of singular values. For each matrix, the rank k (the number of rows and columns selected to construct CUR) was set to 100 or 200. CUR-SVD was applied to large-scale matrices of size 1 million by 1 million with four different singular value distributions: linear-, polynomial-, cosine-, and trapezoid-like, with actual ranks of 100, (
In order to evaluate CUR-SVD methods compared to SVD for PCA, the facial recognition benchmark dataset was used. The Yale Extended B dataset contains images of 38 individuals, and each has 64 images under different lighting conditions, as shown in
A matrix Ak can be specified in terms of the three matrices Uk, Σk, and Vk of its k-SVD. The matrices Uk and Vk were generated randomly from a normal distribution, then QR factorization was used to obtain the orthonormal matrices of sizes m×k and k×n, respectively. The singular values of Ak
For each set of configurations, the computational time, memory usage, estimated rank k, and estimated error pi(E) are presented. Given an m×n matrix A, there is provided a random column vector ω of length n with independent and identically distributed elements, each distributed as a Gaussian random variable of zero mean and unit variance. Let {umlaut over (ω)} be the normalized column vector {umlaut over (ω)}=ω/∥ω∥2. Then the estimated spectral norm of A at the i-th step of the power iteration method is given by,
For a large value of i, pi(A) estimates the value of ∥A∥2 with high probability. In this example embodiment, i=30 is used to ensure an accurate estimation.
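A sketch of this estimator, assuming the standard power iteration on ATA expressed through matrix-vector products only (so the matrix never has to be formed explicitly), is:

```python
import numpy as np

def estimate_spectral_norm(matvec, rmatvec, n, iters=30, rng=None):
    """Sketch of a power-iteration estimate of the spectral norm.
    matvec(x) should return A @ x and rmatvec(y) should return A.T @ y."""
    rng = np.random.default_rng(rng)
    w = rng.standard_normal(n)
    w /= np.linalg.norm(w)
    for _ in range(iters):                 # i = 30 as in the description
        z = matvec(w)
        w = rmatvec(z)
        norm_w = np.linalg.norm(w)
        if norm_w == 0.0:
            return 0.0
        w /= norm_w
    return np.linalg.norm(matvec(w))       # estimate of the spectral norm
```

For a dense array A this would be used with matvec=lambda x: A @ x and rmatvec=lambda y: A.T @ y; for the factored error discussed below, matvec(x) can instead be implemented as Uk @ (sk * (Vk.T @ x)) − Uh @ (sh * (Vh.T @ x)), with Uh, sh, Vh denoting the computed factors, so that the error matrix itself is never formed.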
Suppose Uk, Σk, and Vk are the actual decomposed matrices of the k-SVD, with A=UkΣkVkT. Let Ûk, {circumflex over (Σ)}k, and {circumflex over (V)}k be the output matrices from the proposed method. Then the reconstructed matrix is Â=Ûk{circumflex over (Σ)}k{circumflex over (V)}kT. It is infeasible to compute the error between the original data matrix A and the approximated matrix Â as ∥A−Â∥2 for large-scale matrices because of the computational complexity of the multiplications needed to form A and Â, and their storage cost. However, the complexity of forming A and Â can be avoided by estimating the spectral norm of the error between them directly as follows: let E=A−Â=UkΣkVkT−Ûk{circumflex over (Σ)}k{circumflex over (V)}kT, then
With high probability, pi(E) estimates the value of ∥UkΣkVkT−Ûk{circumflex over (Σ)}k{circumflex over (V)}kT∥2. To evaluate the performance of the fixed precision CUR-SVD algorithm compared to SVD by MATLAB, matrices with sizes from 1,000 to 25,000 in steps of 1,000 were generated with the parameter settings given above. The timing, estimated error pi(E), and memory usage for each matrix size for the singular value decay distribution with β=0.5 were reported (
Although not required, the embodiments described with reference to the Figures can be implemented as an application programming interface (API) or as a series of libraries for use by a developer or can be included within another software application, such as a terminal or personal computer operating system or a portable computing device operating system. Generally, as program modules include routines, programs, objects, components and data files assisting in the performance of particular functions, the skilled person will understand that the functionality of the software application may be distributed across a number of routines, objects or components to achieve the same functionality desired herein.
It will also be appreciated that where the methods and systems of the present invention are either wholly implemented by computing system or partly implemented by computing systems then any appropriate computing system architecture may be utilized. This will include tablet computers, wearable devices, smart phones, Internet of Things (IoT) devices, edge computing devices, standalone computers, network computers, cloud-based computing devices and dedicated hardware devices. Where the terms “computing system” and “computing device” are used, these terms are intended to cover any appropriate arrangement of computer hardware capable of implementing the function described.
It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.
Any reference to prior art contained herein is not to be taken as an admission that the information is common general knowledge, unless otherwise indicated.