1. Technical Field
The present invention relates to improved methods and systems for digital filtering, or for signal filtering with a digital component, employing novel tensor-vector multiplication methods. The tensor-vector multiplication technique is also employed for determining the correlation of signals in electronic systems, for forming control signals in automated control systems, and the like.
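For illustration, tensor-vector multiplication in its simplest form, the contraction of a three-way tensor with a vector along one mode, can be sketched as follows (a generic Python sketch; the function and data layout are illustrative and not the patented method):

```python
def tensor_vector_product(T, v):
    """Contract a 3-way tensor T (I x J x K) with a vector v of length K
    along the last mode, producing an I x J matrix of dot products."""
    I, J, K = len(T), len(T[0]), len(v)
    return [[sum(T[i][j][k] * v[k] for k in range(K))
             for j in range(J)] for i in range(I)]

# Example: a 1 x 2 x 2 tensor contracted with [1, 1] sums each mode-3 fiber.
print(tensor_vector_product([[[1.0, 2.0], [3.0, 4.0]]], [1.0, 1.0]))  # [[3.0, 7.0]]
```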
2. Background Art
Digital Filtering
A digital filter is an apparatus that receives a digital signal and provides as output a corresponding signal from which certain signal frequency components have been removed or blocked. Various digital filters have different resolution accuracies and remove different frequency components to accomplish different purposes. Some digital filters simply block out entire frequency ranges; examples are high-pass filters and low-pass filters. Others target particular problems such as noise spectra, or try to clean up signals by relating the frequencies to previously received signals; examples are Wiener and Kalman filters.
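The simplest such filter can be sketched as a finite impulse response (FIR) convolution; the 4-tap moving-average coefficients below are purely illustrative and act as a crude low-pass filter:

```python
def fir_filter(signal, coeffs):
    """Apply an FIR filter: each output sample is a weighted sum of the
    current and previous input samples."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out

# The highest-frequency input (alternating +1/-1) is almost entirely removed.
print(fir_filter([1.0, -1.0, 1.0, -1.0, 1.0, -1.0], [0.25, 0.25, 0.25, 0.25]))
```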
Methods and systems for tensor-vector multiplications are known in the art. One such method and system is disclosed in U.S. Pat. No. 8,316,072, which describes a method (and structure) for executing a matrix operation that includes, for a matrix A, separating the matrix A into blocks, each block having a size p-by-q. The blocks of size p-by-q are then stored in a cache or memory in at least one of the two following ways. The elements in at least one of the blocks are stored in a format in which elements of the block occupy a location different from an original location in the block, and/or the blocks of size p-by-q are stored in a format in which at least one block occupies a position different from its original position in the matrix A.
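The block-storage idea can be sketched as follows; this is a hypothetical row-major-within-block layout, not necessarily the exact format of that patent:

```python
def to_blocks(A, p, q):
    """Copy a (multiple-of-p) x (multiple-of-q) matrix into p-by-q blocks,
    each stored as one contiguous run, so elements no longer occupy their
    original row-major locations."""
    rows, cols = len(A), len(A[0])
    blocks = []
    for bi in range(0, rows, p):
        for bj in range(0, cols, q):
            blocks.append([A[bi + i][bj + j]
                           for i in range(p) for j in range(q)])
    return blocks

# A 4x4 matrix becomes four contiguous 2x2 blocks.
A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(to_blocks(A, 2, 2))  # first block: [1, 2, 5, 6]
```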
U.S. Pat. No. 8,250,130 discloses a block matrix multiplication mechanism for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. The mechanism increases block size and divides each block into sub-blocks. By reversing the visitation order, the mechanism eliminates a sub-block load at the corner turns. The mechanism performs sub-block matrix multiplication for each sub-block in a given block, and then repeats the operation for the next block until all blocks are computed. The mechanism may determine block size and sub-block size to optimize load balancing and memory bandwidth, thereby increasing maximum throughput and performance. In addition, the mechanism also reduces the number of multi-buffered local store buffers.
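A plain blocked matrix multiplication, the kind of kernel such mechanisms refine, can be sketched as follows (a textbook blocking scheme, not the patent's reversed visitation order):

```python
def blocked_matmul(A, B, bs):
    """Multiply n x n matrices block by block so each bs x bs tile of A and B
    is reused while it is resident in fast memory."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for jj in range(0, n, bs):
            for kk in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + bs, n)):
                            C[i][j] += a * B[k][j]
    return C

print(blocked_matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], 1))
```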
U.S. Pat. No. 8,237,638 discloses a method of driving an electro-optic display, the display having a plurality of pixels each addressable by a row electrode and a column electrode, the method including: receiving image data for display, the image data defining an image matrix; factorizing the image matrix into a product of at least first and second factor matrices, the first factor matrix defining row drive signals for the display, the second factor matrix defining column drive signals for the display; and driving the display row and column electrodes using the row and column drive signals respectively defined by the first and second factor matrices.
U.S. Pat. No. 8,223,872 discloses an equalizer applied to a signal to be transmitted via at least one multiple input, multiple output (MIMO) channel or received via at least one MIMO channel using a matrix equalizer computational device. Channel state information (CSI) is received, and the CSI is provided to the matrix equalizer computational device when the matrix equalizer computational device is not needed for matrix equalization. One or more transmit beam steering code words are selected from a transmit beam steering codebook based on output generated by the matrix equalizer computational device in response to the CSI provided to the matrix equalizer computational device.
U.S. Pat. No. 8,211,634 discloses compositions, kits, and methods for detecting, characterizing, preventing, and treating human cancer. A variety of chromosomal regions (MCRs), and markers corresponding thereto, are provided, wherein alterations in the copy number of one or more of the MCRs and/or alterations in the amount, structure, and/or activity of one or more of the markers are correlated with the presence of cancer.
U.S. Pat. No. 8,209,138 discloses methods and apparatus for analysis and design of radiation and scattering objects. In one embodiment, unknown sources are spatially grouped to produce a system interaction matrix with block factors of low rank within a given error tolerance and the unknown sources are determined from compressed forms of the factors.
U.S. Pat. No. 8,204,842 discloses systems and methods for multi-modal or multimedia image retrieval. Automatic image annotation is achieved based on a probabilistic semantic model in which visual features and textual words are connected via a hidden layer comprising the semantic concepts to be discovered, to explicitly exploit the synergy between the two modalities. The association of visual features and textual words is determined in a Bayesian framework to provide confidence of the association. A hidden concept layer which connects the visual feature(s) and the words is discovered by fitting a generative model to the training image and annotation words. An Expectation-Maximization (EM) based iterative learning procedure determines the conditional probabilities of the visual features and the textual words given a hidden concept class. Based on the discovered hidden concept layer and the corresponding conditional probabilities, the image annotation and the text-to-image retrieval are performed using the Bayesian framework.
U.S. Pat. No. 8,200,470 discloses how improved performance of simulation analysis of a circuit with some non-linear elements and a relatively large network of linear elements may be achieved by systems and methods that partition the circuit so that simulation may be performed on a non-linear part of the circuit in pseudo-isolation of a linear part of the circuit. The non-linear part may include one or more transistors of the circuit and the linear part may comprise an RC network of the circuit. By separating the linear part from the simulation on the non-linear part, the size of a matrix for simulation on the non-linear part may be reduced. Also, a number of factorizations of a matrix for simulation on the linear part may be reduced. Thus, such systems and methods may be used, for example, to determine current in circuits including relatively large RC networks, which may otherwise be computationally prohibitive using standard simulation techniques.
U.S. Pat. No. 8,195,734 discloses methods of combining multiple clusters arising in various important data mining scenarios based on soft correspondence to directly address the correspondence problem in combining multiple clusters. An algorithm iteratively computes the consensus clustering and correspondence matrices using multiplicative updating rules. This algorithm provides a final consensus clustering as well as correspondence matrices that give an intuitive interpretation of the relations between the consensus clustering and each clustering from the clustering ensembles. Extensive experimental evaluations demonstrate the effectiveness and potential of this framework, as well as of the algorithm, for discovering a consensus clustering from multiple clusters.
U.S. Pat. No. 8,195,730 discloses an apparatus and method for converting first and second blocks of discrete values into a transformed representation, wherein the first block is transformed according to a first transformation rule and then rounded. The rounded transformed values are summed with the second block of original discrete values, and the summation result is then processed according to a second transformation rule. The output values of the transformation via the second transformation rule are again rounded and then subtracted from the original discrete values of the first block of discrete values to obtain a block of integer output values of the transformed representation. By this multi-dimensional lifting scheme, a lossless integer transformation is obtained, which can be reversed by applying the same transformation rules, but with different signs in summation and subtraction, respectively, so that an inverse integer transformation can also be obtained. Compared to a separation of a transformation into rotations, on the one hand, a significantly reduced computing complexity is achieved and, on the other hand, an accumulation of approximation errors is prevented.
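The reversibility property can be sketched with scalar stand-ins for the two transformation rules; t1 and t2 below are hypothetical placeholders, and the point is that rounding inside the lifting steps keeps the integer mapping exactly invertible when the steps are repeated with opposite signs:

```python
def lift_forward(x1, x2, t1, t2):
    """Two lifting steps: round(T1 x1) is added to x2, then round(T2 y2)
    is subtracted from x1. All outputs stay integers."""
    y2 = [b + round(t1(a)) for a, b in zip(x1, x2)]
    y1 = [a - round(t2(b)) for a, b in zip(x1, y2)]
    return y1, y2

def lift_inverse(y1, y2, t1, t2):
    """The same steps with opposite signs recover the inputs exactly."""
    x1 = [a + round(t2(b)) for a, b in zip(y1, y2)]
    x2 = [b - round(t1(a)) for a, b in zip(x1, y2)]
    return x1, x2

t1 = lambda v: 0.7 * v          # hypothetical transformation rules
t2 = lambda v: -0.3 * v
y1, y2 = lift_forward([3, 5], [2, 8], t1, t2)
print(lift_inverse(y1, y2, t1, t2))  # ([3, 5], [2, 8]) -- lossless round trip
```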
U.S. Pat. No. 8,194,080 discloses a computer-implemented method for generating a surface representation of an item includes identifying, for a point on an item in an animation process, at least first and second transformation points corresponding to respective first and second transformations of the point. Each of the first and second transformations represents an influence on a location of the point of respective first and second joints associated with the item. The method includes determining an axis for a cylindrical coordinate system using the first and second transformations. The method includes performing an interpolation of the first and second transformation points in the cylindrical coordinate system to obtain an interpolated point. The method includes recording the interpolated point in a surface representation of the item in the animation process.
U.S. Pat. No. 8,190,549 discloses an online sparse matrix Gaussian process (OSMGP) that uses online updates to provide an accurate and efficient regression for applications such as pose estimation and object tracking. A regression calculation module calculates a regression on a sequence of input images to generate output predictions based on a learned regression model. The regression model is efficiently updated by representing a covariance matrix of the regression model using a sparse matrix factor (e.g., a Cholesky factor). The sparse matrix factor is maintained and updated in real-time based on the output predictions. Hyperparameter optimization, variable reordering, and matrix downdating techniques can also be applied to further improve the accuracy and/or efficiency of the regression process.
U.S. Pat. No. 8,190,094 discloses a method for reducing inter-cell interference and a method for transmitting a signal by a collaborative MIMO scheme, in a communication system having a multi-cell environment. An example of a method for transmitting, by a mobile station, precoding information in a collaborative MIMO communication system includes determining a precoding matrix set including precoding matrices of one or more base stations including a serving base station, based on signal strength of the serving base station, and transmitting information about the precoding matrix set to the serving base station. A mobile station at the edge of a cell performs a collaborative MIMO mode or inter-cell interference mitigation mode using the information about the precoding matrix set collaboratively with neighboring base stations.
U.S. Pat. No. 8,185,535 discloses methods and systems for determining unknowns in rating matrices. In one embodiment, a method comprises forming a rating matrix, where each matrix element corresponds to a known favorable user rating associated with an item or an unknown user rating associated with an item. The method includes determining a weight matrix configured to assign a weight value to each of the unknown matrix elements, and sampling the rating matrix to generate an ensemble of training matrices. Weighted maximum-margin matrix factorization is applied to each training matrix to obtain a corresponding sub-rating matrix, with the weights based on the weight matrix. The sub-rating matrices are combined to obtain an approximate rating matrix that can be used to recommend items to users based on the rank ordering of the corresponding matrix elements.
U.S. Pat. No. 8,175,853 discloses systems and methods for combined matrix-vector and matrix-transpose vector multiply for block sparse matrices. Exemplary embodiments include a method of updating a simulation of physical objects in an interactive computer environment, including generating a set of representations of objects in the interactive computer environment, partitioning the set of representations into a plurality of subsets such that objects in any given set interact only with other objects in that set, generating a vector b describing an expected position of each object at the end of a time interval h, applying a biconjugate gradient algorithm to solve A·Δv = b for the vector Δv of position and velocity changes to be applied to each object, wherein the q = A·p and qt = Aᵀ·pt calculations are combined so that A only has to be read once, integrating the updated motion vectors to determine a next state of the simulated objects, and converting the simulated objects to a visual representation.
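The combined calculation can be sketched for a dense matrix as a single pass in which each element of A is read once and contributes to both products (a simplified dense version; the patent concerns block sparse matrices):

```python
def combined_matvec(A, p, pt):
    """Compute q = A*p and qt = transpose(A)*pt together, reading each
    element of A exactly once."""
    rows, cols = len(A), len(A[0])
    q = [0.0] * rows
    qt = [0.0] * cols
    for i in range(rows):
        for j in range(cols):
            a = A[i][j]        # a single read serves both accumulations
            q[i] += a * p[j]
            qt[j] += a * pt[i]
    return q, qt

print(combined_matvec([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], [1.0, 1.0]))
```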
U.S. Pat. No. 8,160,182 discloses a symbol detector with a sphere decoding method. A baseband signal is received to determine a maximum likelihood solution using the sphere decoding algorithm. A QR decomposer performs a QR decomposition process on a channel response matrix to generate a Q matrix and an R matrix. A matrix transformer generates an inner product matrix of the Q matrix and the received signal. A scheduler reorganizes a search tree, and takes a search mission apart into a plurality of independent branch missions. A plurality of Euclidean distance calculators are controlled by the scheduler to operate in parallel, wherein each has a plurality of calculation units cascaded in a pipeline structure to search for the maximum likelihood solution based on the R matrix and the inner product matrix.
U.S. Pat. No. 8,068,560 discloses a QR decomposition apparatus and method that can reduce the number of computations by sharing hardware in a MIMO system employing OFDM technology, thereby simplifying the hardware structure. The QR decomposition apparatus includes a norm multiplier for calculating a norm; a Q column multiplier for calculating a column value of a unitary Q matrix to thereby produce a Q matrix vector; a first storage for storing the Q matrix vector calculated in the Q column multiplier; an R row multiplier for calculating a value of an upper triangular R matrix by multiplying the Q matrix vector by a reception signal vector; and a Q update multiplier for receiving the reception signal vector and an output of the R row multiplier, calculating a Q update value through an accumulation operation, and providing the Q update value to the Q column multiplier to calculate a next Q matrix vector.
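The quantities named above (a column norm, a normalized Q column, and an R row of inner products) correspond to the steps of a classical Gram-Schmidt QR factorization, which can be sketched in software as follows (a generic textbook version, not the patent's hardware-sharing architecture):

```python
def qr_gram_schmidt(A):
    """Factor A (m x n, independent columns) into Q (orthonormal columns)
    and R (upper triangular) by classical Gram-Schmidt."""
    m, n = len(A), len(A[0])
    Q = [[0.0] * n for _ in range(m)]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(m)]
        for k in range(j):
            R[k][j] = sum(Q[i][k] * A[i][j] for i in range(m))  # R row entries
            for i in range(m):
                v[i] -= R[k][j] * Q[i][k]
        R[j][j] = sum(x * x for x in v) ** 0.5                  # the column norm
        for i in range(m):
            Q[i][j] = v[i] / R[j][j]                            # normalized Q column
    return Q, R

Q, R = qr_gram_schmidt([[1.0, 1.0], [0.0, 1.0]])
print(R)  # upper triangular: [[1.0, 1.0], [0.0, 1.0]]
```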
U.S. Pat. No. 8,051,124 discloses a matrix multiplication module and matrix multiplication method that use a variable number of multiplier-accumulator units based on the number of data elements of the matrices that are available or needed for processing at a particular point or stage in the computation process. As more data elements become available or are needed, more multiplier-accumulator units are used to perform the necessary multiplication and addition operations. Very large matrices are partitioned into smaller blocks to fit in the FPGA resources. Results from the multiplication of sub-matrices are combined to form the final result of the large matrices.
U.S. Pat. No. 8,185,481 discloses a general model which provides collective factorization on related matrices, for multi-type relational data clustering. The model is applicable to relational data with various structures. Under this model, a spectral relational clustering algorithm is provided to cluster multiple types of interrelated data objects simultaneously. The algorithm iteratively embeds each type of data objects into low dimensional spaces and benefits from the interactions among the hidden structures of different types of data objects.
U.S. Pat. No. 8,176,046 discloses systems and methods for identifying trends in web feeds collected from various content servers. One embodiment includes selecting a candidate phrase indicative of potential trends in the web feeds, assigning the candidate phrase to one or more trend analysis agents, analyzing the candidate phrase by each of the trend analysis agents using its configured type of trending parameter, and/or determining, by each of the trend analysis agents, whether the candidate phrase meets an associated threshold to qualify as a potential trended phrase.
U.S. Pat. No. 8,175,872 discloses enhancing noisy speech recognition accuracy by receiving geotagged audio signals that correspond to environmental audio recorded by multiple mobile devices in multiple geographic locations, receiving an audio signal that corresponds to an utterance recorded by a particular mobile device, determining a particular geographic location associated with the particular mobile device, selecting a subset of the geotagged audio signals and weighting each geotagged audio signal of the subset based on whether the respective audio signal was manually or automatically uploaded, and generating a noise model for the particular geographic location using the subset of weighted geotagged audio signals, where noise compensation is performed on the audio signal that corresponds to the utterance using the noise model that has been generated for the particular geographic location.
U.S. Pat. No. 8,165,373 discloses a computer-implemented data processing system for blind extraction of more pure components than mixtures recorded in 1D or 2D NMR spectroscopy and mass spectrometry. Sparse component analysis is combined with single component points (SCPs) for blind decomposition of mixture data X into pure components S and a concentration matrix A, where the number of pure components S is greater than the number of mixtures X. NMR mixtures are transformed into the wavelet domain, where pure components are sparser than in the time domain and where SCPs are detected. Mass spectrometry (MS) mixtures are extended by analytical continuation in order to detect SCPs. SCPs are used to estimate the number of pure components and the concentration matrix. Pure components are estimated in the frequency domain (NMR data) or the m/z domain (MS data) by means of constrained convex programming methods. Estimated pure components are ranked using a negentropy-based criterion.
U.S. Pat. No. 8,140,272 discloses systems and methods for unmixing spectroscopic data using nonnegative matrix factorization during spectrographic data processing. In an embodiment, a method of processing spectrographic data may include receiving optical absorbance data associated with a sample and iteratively computing values for component spectra using nonnegative matrix factorization. The values for component spectra may be iteratively computed until optical absorbance data is approximately equal to a Hadamard product of a path length matrix and a matrix product of a concentration matrix and a component spectra matrix. The method may also include iteratively computing values for path length using nonnegative matrix factorization, in which path length values may be iteratively computed until optical absorbance data is approximately equal to a Hadamard product of the path length matrix and the matrix product of the concentration matrix and the component spectra matrix.
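The iterative computation can be sketched with the standard Lee-Seung multiplicative updates for non-negative matrix factorization (a generic sketch; the patent's Hadamard-product path-length model is not reproduced here):

```python
import random

def matmul(X, Y):
    """Plain dense matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def nmf(V, r, iters=500, eps=1e-9):
    """Factor non-negative V (m x n) as W (m x r) times H (r x n) using
    multiplicative updates, which keep every entry non-negative."""
    random.seed(0)
    m, n = len(V), len(V[0])
    W = [[random.random() + 0.1 for _ in range(r)] for _ in range(m)]
    H = [[random.random() + 0.1 for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, matmul(W, H))
        H = [[H[a][b] * num[a][b] / (den[a][b] + eps) for b in range(n)]
             for a in range(r)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[W[a][b] * num[a][b] / (den[a][b] + eps) for b in range(r)]
             for a in range(m)]
    return W, H

W, H = nmf([[1.0, 2.0], [2.0, 4.0]], 1)  # a rank-1 non-negative matrix
```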
U.S. Pat. No. 8,139,900 discloses an embodiment for retrieval of a collection of captured images that form at least a portion of a library of images. For each image in the collection, a captured image may be analyzed to recognize information from image data contained in the captured image, and an index may be generated, where the index data is based on the recognized information. Using the index, functionality such as search and retrieval is enabled. Various recognition techniques, including those that use the face, clothing, apparel, and combinations of characteristics may be utilized. Recognition may be performed on, among other things, persons and text carried on objects.
U.S. Pat. No. 8,135,187 discloses techniques for removing image autofluorescence from fluorescently stained biological images. The techniques utilize non-negative matrix factorization that may constrain mixing coefficients to be non-negative. The probability of convergence to local minima is reduced by using smoothness constraints. The non-negative matrix factorization algorithm provides the advantage of removing both dark current and autofluorescence.
U.S. Pat. No. 8,131,732 discloses a system with a collaborative filtering engine to predict an active user's ratings/interests/preferences on a set of new products/items. The predictions are based on an analysis of the database containing the historical data of many users' ratings/interests/preferences on a large set of products/items.
U.S. Pat. No. 8,126,951 discloses a method for transforming a digital signal from the time domain into the frequency domain and vice versa using a transformation function comprising a transformation matrix, the digital signal comprising data symbols which are grouped into a plurality of blocks, each block comprising a predefined number of the data symbols. The method includes the process of transforming two blocks of the digital signal by one transforming element, wherein the transforming element corresponds to a block-diagonal matrix comprising two sub-matrices, wherein each sub-matrix comprises the transformation matrix, and the transforming element comprises a plurality of lifting stages, wherein each lifting stage comprises the processing of blocks of the digital signal by an auxiliary transformation and by a rounding unit.
U.S. Pat. No. 8,126,950 discloses a method for performing a domain transformation of a digital signal from the time domain into the frequency domain and vice versa, the method including performing the transformation by a transforming element, the transformation element comprising a plurality of lifting stages, wherein the transformation corresponds to a transformation matrix and wherein at least one lifting stage of the plurality of lifting stages comprises at least one auxiliary transformation matrix and a rounding unit, the auxiliary transformation matrix comprising the transformation matrix itself or the corresponding transformation matrix of lower dimension. The method further comprising performing a rounding operation of the signal by the rounding unit after the transformation by the auxiliary transformation matrix.
U.S. Pat. No. 8,107,145 discloses a reproducing device for performing reproduction regarding a hologram recording medium where a hologram page is recorded in accordance with signal light, by interference between the signal light, where bit data is arrayed with the information of light intensity difference in pixel increments, and reference light. The device includes a reference light generating unit to generate reference light irradiated when obtaining a reproduced image; a coherent light generating unit to generate coherent light of which the intensity is greater than the absolute value of the minimum amplitude of the reproduced image, with the same phase as the reference phase within the reproduced image; an image sensor to receive an input image in pixel increments; and an optical system to guide the reference light to the hologram recording medium, and also guide the obtained reproduced image according to the irradiation of the reference light, and the coherent light, to the image sensor.
U.S. Pat. No. 8,099,381 discloses systems and methods for factorizing high-dimensional data by simultaneously capturing factors for all data dimensions and their correlations in a factor model, wherein the factor model provides a parsimonious description of the data; and generating a corresponding loss function to evaluate the factor model.
U.S. Pat. No. 8,090,665 discloses systems and methods to find dynamic social networks by applying a dynamic stochastic block model to generate one or more dynamic social networks, wherein the model simultaneously captures communities and their evolutions, and inferring best-fit parameters for the dynamic stochastic model with online learning and offline learning.
U.S. Pat. No. 8,077,785 discloses a method for determining a phase of each of a plurality of transmitting antennas in a multiple input and multiple output (MIMO) communication system includes: calculating, for first and second ones of the plurality of transmitting antennas, a value based on first and second groups of channel gains, the first group including channel gains between the first transmitting antenna and each of a plurality of receiving antennas, the second group including channel gains between the second transmitting antenna and each of the plurality of receiving antennas; and determining the phase of each of the plurality of transmitting antennas based on at least the value.
U.S. Pat. No. 8,060,512 discloses a system and method for analyzing multi-dimensional cluster data sets to identify clusters of related documents in an electronic document storage system. Digital documents, for which multi-dimensional probabilistic relationships are to be determined, are received and then parsed to identify multi-dimensional count data with at least three dimensions. Multi-dimensional tensors representing the count data and estimated cluster membership probabilities are created. The tensors are then iteratively processed using a first and a complementary second tensor factorization model to refine the cluster definition matrices until a convergence criteria has been satisfied. Likely cluster memberships for the count data are determined based upon the refinements made to the cluster definition matrices by the alternating tensor factorization models. The present method advantageously extends to the field of tensor analysis a combination of Non-negative Matrix Factorization and Probabilistic Latent Semantic Analysis to decompose non-negative data.
U.S. Pat. No. 8,046,214 discloses a multi-channel audio decoder providing a reduced complexity processing to reconstruct multi-channel audio from an encoded bitstream in which the multi-channel audio is represented as a coded subset of the channels along with a complex channel correlation matrix parameterization. The decoder translates the complex channel correlation matrix parameterization to a real transform that satisfies the magnitude of the complex channel correlation matrix. The multi-channel audio is derived from the coded subset of channels via channel extension processing using a real value effect signal and real number scaling.
U.S. Pat. No. 8,045,810 discloses a method and system for reducing the number of mathematical operations required in the JPEG decoding process without substantially impacting the quality of the image displayed. Embodiments provide an efficient JPEG decoding process for the purposes of displaying an image on a display smaller than the source image, for example, the screen of a handheld device. According to one aspect of the invention, this is accomplished by reducing the amount of processing required for dequantization and inverse DCT (IDCT) by effectively reducing the size of the image in the quantized, DCT domain prior to dequantization and IDCT. This can be done, for example, by discarding unnecessary DCT index rows and columns prior to dequantization and IDCT. In one embodiment, columns from the right, and rows from the bottom are discarded such that only the top left portion of the block of quantized, and DCT coefficients are processed.
U.S. Pat. No. 8,037,080 discloses example collaborative filtering techniques providing improved recommendation prediction accuracy by capitalizing on the advantages of both neighborhood and latent factor approaches. One example collaborative filtering technique is based on an optimization framework that allows smooth integration of a neighborhood model with latent factor models, and which provides for the inclusion of implicit user feedback. A disclosed example Singular Value Decomposition (SVD)-based latent factor model facilitates the explanation or disclosure of the reasoning behind recommendations. Another example collaborative filtering model integrates neighborhood modeling and SVD-based latent factor modeling into a single modeling framework. These collaborative filtering techniques can be advantageously deployed in, for example, a multimedia content distribution system of a networked service provider.
U.S. Pat. No. 8,024,193 discloses methods and apparatus for automatic identification of near-redundant units in a large TTS voice table, identifying which units are distinctive enough to keep and which units are sufficiently redundant to discard. According to an aspect of the invention, pruning is treated as a clustering problem in a suitable feature space. All instances of a given unit (e.g., words or characters expressed as Unicode strings) are mapped onto the feature space and clustered in that space using a suitable similarity measure. Since all units in a given cluster are, by construction, closely related from the point of view of the measure used, they are suitably redundant and can be replaced by a single instance. The disclosed method can detect near-redundancy in TTS units in a completely unsupervised manner, based on an original feature extraction and clustering strategy. Each unit can be processed in parallel, and the algorithm is totally scalable, with a pruning factor determinable by a user through the near-redundancy criterion. In an exemplary implementation, a matrix-style modal analysis via Singular Value Decomposition (SVD) is performed on the matrix of the observed instances for the given word unit, resulting in each row of the matrix being associated with a feature vector, which can then be clustered using an appropriate closeness measure. Pruning is achieved by mapping each instance to the centroid of its cluster.
U.S. Pat. No. 8,019,539 discloses a navigation system for a vehicle having a receiver operable to receive a plurality of signals from a plurality of transmitters includes a processor and a memory device. The memory device has stored thereon machine-readable instructions that, when executed by the processor, enable the processor to determine a set of error estimates corresponding to pseudo-range measurements derived from the plurality of signals, determine an error covariance matrix for a main navigation solution using ionospheric-delay data, and, using a parity space technique, determine at least one protection level value based on the error covariance matrix.
U.S. Pat. No. 8,015,003 discloses a method and system for denoising a mixed signal. A constrained non-negative matrix factorization (NMF) is applied to the mixed signal. The NMF is constrained by a denoising model, in which the denoising model includes training basis matrices of a training acoustic signal and a training noise signal, and statistics of weights of the training basis matrices. The applying produces weights of a basis matrix of the acoustic signal of the mixed signal. A product of the weights of the basis matrix of the acoustic signal and the training basis matrices of the training acoustic signal and the training noise signal is taken to reconstruct the acoustic signal. The mixed signal can be speech and noise.
U.S. Pat. No. 8,005,121 discloses an apparatus and a method for re-synthesizing signals. The apparatus includes a receiver for receiving a plurality of digitally multiplexed signals, each digitally multiplexed signal associated with a different physical transmission channel, and for simultaneously recovering from at least two of the digital multiplexes a plurality of bit streams. The apparatus also includes a transmitter for inserting the plurality of bit streams into different digital multiplexes and for modulating the different digital multiplexes for transmission on different transmission channels. The method involves receiving a first signal having a plurality of different program streams in different frequency channels, selecting a set of program streams from the plurality of different frequency channels, combining the set of program streams to form a second signal, and transmitting the second signal.
U.S. Pat. No. 8,001,132 discloses systems and techniques for estimation of item ratings for a user. A set of item ratings by multiple users is maintained, and similarity measures for all items are precomputed, as well as values used to generate interpolation weights for ratings neighboring a rating of interest to be estimated. A predetermined number of neighbors are selected for an item whose rating is to be estimated, the neighbors being those with the highest similarity measures. Global effects are removed, and interpolation weights for the neighbors are computed simultaneously. The interpolation weights are used to estimate a rating for the item based on the neighboring ratings. Suitably, ratings are estimated for all items in a predetermined dataset that have not yet been rated by the user, and recommendations are made to the user by selecting a predetermined number of items in the dataset having the highest estimated ratings.
U.S. Pat. No. 7,996,193 discloses a method for reducing the order of system models exploiting sparsity. According to one embodiment, a computer-implemented method receives a system model having a first system order. The system model contains a plurality of system nodes and a plurality of system matrices. The system nodes are reordered and a reduced order system is constructed by a matrix decomposition (e.g., Cholesky or LU decomposition) on an expansion frequency without calculating a projection matrix. The reduced order system model has a lower system order than the original system model.
U.S. Pat. No. 7,991,717 discloses a system, method, and process for configuring iterative, self-correcting algorithms, such as neural networks, so that the weights or characteristics to which the algorithm converges do not require the use of test or validation sets, and the maximum error in failing to achieve optimal cessation of training can be calculated. In addition, a method for internally validating the correctness, i.e. determining the degree of accuracy of the predictions derived from the system, method, and process of the present invention is disclosed.
U.S. Pat. No. 7,991,550 discloses a method for simultaneously tracking a plurality of objects and registering a plurality of object-locating sensors mounted on a vehicle relative to the vehicle. The method is based upon collected sensor data, historical sensor registration data, historical object trajectories, and a weighted algorithm based upon geometric proximity to the vehicle and sensor data variance.
U.S. Pat. No. 7,970,727 discloses a method for modeling data affinities and data structures. In one implementation, a contextual distance may be calculated between a selected data point in a data sample and a data point in a contextual set of the selected data point. The contextual set may include the selected data point and one or more data points in the neighborhood of the selected data point. The contextual distance may be the difference between the selected data point's contribution to the integrity of the geometric structure of the contextual set and the data point's contribution to the integrity of the geometric structure of the contextual set. The process may be repeated for each data point in the contextual set of the selected data point. The process may be repeated for each selected data point in the data sample. A digraph may be created using a plurality of contextual distances generated by the process.
U.S. Pat. No. 7,953,682 discloses methods, apparatus and computer program code for processing digital data using non-negative matrix factorization. A method of digitally processing data in a data array defining a target matrix (X) using non-negative matrix factorization to determine a pair of matrices (F, G), a first matrix of said pair determining a set of features for representing said data, a second matrix of said pair determining weights of said features, such that a product of said first and second matrices approximates said target matrix, the method comprising: inputting said target matrix data (X); selecting a row of one of said first and second matrices and a column of the other of said first and second matrices; determining a target contribution (R) of said selected row and column to said target matrix; determining, subject to a non-negativity constraint, updated values for said selected row and column from said target contribution; and repeating said selecting and determining for the other rows and columns of said first and second matrices until all said rows and columns have been updated.
U.S. Pat. No. 7,953,676 discloses a method for predicting future responses from large sets of dyadic data including measuring a dyadic response variable associated with a dyad from two different sets of data; measuring a vector of covariates that captures the characteristics of the dyad; determining one or more latent, unmeasured characteristics that are not determined by the vector of covariates and which induce local structures in a dyadic space defined by the two different sets of data; and modeling a predictive response of the measurements as a function of both the vector of covariates and the one or more latent characteristics, wherein modeling includes employing a combination of regression and matrix co-clustering techniques, and wherein the one or more latent characteristics provide a smoothing effect to the function that produces a more accurate and interpretable predictive model of the dyadic space that predicts future dyadic interaction based on the two different sets of data.
U.S. Pat. No. 7,949,931 discloses a method for error detection in a memory system. The method includes calculating one or more signatures associated with data that contains an error. It is determined if the error is a potential correctable error. If the error is a potential correctable error, then the calculated signatures are compared to one or more signatures in a trapping set. The trapping set includes signatures associated with uncorrectable errors. An uncorrectable error flag is set in response to determining that at least one of the calculated signatures is equal to a signature in the trapping set.
U.S. Pat. No. 7,912,140 discloses a method and a system for reducing computational complexity in a maximum-likelihood MIMO decoder, while maintaining its high performance. A factorization operation is applied on the channel matrix H. The decomposition creates two matrices: an upper triangular matrix with only real numbers on the diagonal, and a unitary matrix. The decomposition simplifies the representation of the distance calculation needed for the constellation points search. An exhaustive search for all the points in the constellation for two spatial streams t(1), t(2) is performed, searching all possible transmit points of t(2), wherein each point generates a SISO slicing problem in terms of transmit points of t(1). Then the x,y components of t(1) are decomposed, thus turning a two-dimensional problem into two one-dimensional problems. Finally the remaining points of t(1) are searched, using Gray coding in the constellation points arrangement and the symmetry deriving from it to further reduce the number of constellation points that have to be searched.
U.S. Pat. No. 7,899,087 discloses an apparatus and method for performing frequency translation. The apparatus includes a receiver for receiving and digitizing a plurality of first signals, each signal containing channels and for simultaneously recovering a set of selected channels from the plurality of first signals. The apparatus also includes a transmitter for combining the set of selected channels to produce a second signal. The method of the present invention includes receiving a first signal containing a plurality of different channels, selecting a set of selected channels from the plurality of different channels, combining the set of selected channels to form a second signal and transmitting the second signal.
U.S. Pat. No. 7,885,792 discloses a method combining functionality from a matrix language programming environment, a state chart programming environment and a block diagram programming environment into an integrated programming environment. The method can also include generating computer instructions from the integrated programming environment in a single user action. The integrated programming environment can support fixed-point arithmetic.
U.S. Pat. No. 7,875,787 discloses a system and method for visualization of music and other sounds using note extraction. In one embodiment, the twelve notes of an octave are labeled around a circle. Raw audio information is fed into the system, whereby the system applies note extraction techniques to isolate the musical notes in a particular passage. The intervals between the notes are then visualized by displaying a line between the labels corresponding to the note labels on the circle. In some embodiments, the lines representing the intervals are color coded with a different color for each of the six intervals. In other embodiments, the music and other sounds are visualized upon a helix that allows an indication of absolute frequency to be displayed for each note or sound.
U.S. Pat. No. 7,873,127 discloses techniques where sample vectors of a signal received simultaneously by an array of antennas are processed to estimate a weight for each sample vector that maximizes the energy of the individual sample vector that resulted from propagation of the signal from a known source and/or minimizes the energy of the sample vector that resulted from interference with propagation of the signal from the known source. Each sample vector is combined with the weight that is estimated for the respective sample vector to provide a plurality of weighted sample vectors. The plurality of weighted sample vectors are summed to provide a resultant weighted sample vector for the received signal. The weight for each sample vector is estimated by processing the sample vector which includes a step of calculating a pseudoinverse by a simplified method.
U.S. Pat. No. 7,849,126 discloses a system and method for fast computing the Cholesky factorization of a positive definite matrix. In order to reduce the computation time of matrix factorizations, the present invention uses three atomic components, namely MA atoms, M atoms, and an S atom. The three kinds of components are arranged in a configuration that returns the Cholesky factorization of the input matrix.
U.S. Pat. No. 7,844,117 discloses an image digest based search approach allowing images within an image repository related to a query image to be located despite cropping, rotating, localized changes in image content, compression formats and/or an unlimited variety of other distortions. In particular, the approach allows potential distortion types to be characterized and to be fitted to an exponential family of equations matched to a Bregman distance. Image digests matched to the identified distortion types may then be generated for stored images using the matched Bregman distances, thereby allowing searches to be conducted of the image repository that explicitly account for the statistical nature of distortions on the image. Processing associated with characterizing image noise, generating matched Bregman distances, and generating image digests for images within an image repository based on a wide range of distortion types and processing parameters may be performed offline and stored for later use, thereby improving search response times.
U.S. Pat. No. 7,454,453 discloses a fast correlator transform (FCT) algorithm and methods and systems for implementing same, correlate an encoded data word with encoding coefficients, wherein each coefficient has k possible states. The results are grouped into groups. Members of each group are added to one another, thereby generating a first layer of correlation results. The first layer of results is grouped and the members of each group are summed with one another to generate a second layer of results. This process is repeated until a final layer of results is generated. The final layer of results includes a separate correlation output for each possible state of the complete set of coefficients.
Our inventor's certificate USSR SU1319013 discloses a generator of basis functions generating basis function systems in the form of sets of components of sparsely populated matrices, the product of which is a matrix of a corresponding linear orthogonal transform. The generated sets of components serve as parameters of fast linear orthogonal transformation systems.
Finally, our inventor's certificate USSR SU1413615 discloses another generator of basis functions generating a wider class of basis function systems in the form of sets of components of sparsely populated matrices, the product of which is a matrix of a corresponding linear orthogonal transform.
It is believed that tensor-vector multiplications can be further accelerated, the methods of multiplication can be constructed to become faster, and the systems for multiplication can be designed with a smaller number of components.
Digital data often arises from the sampling of an analogue signal, for example by determining the amplitude of an analogue signal at specified times. The particular values derived from the sampling can constitute the components of a vector.
The linear operation upon the data can then be represented by the operation of a tensor upon the vector to produce a tensor of lower rank. Ordinarily tensors of order higher than two are not necessary, but they are useful where the resulting signal comprises multiple channels in the form of a matrix or a tensor.
The operation of a digital filter comprises, or can be approximated by, the operation of a linear operator on a representation of the digital signal. In that case, the digital filter can be implemented by the operation of a tensor upon a vector. The present invention applies both to linear, time-invariant digital filters and to adaptive filters whose coefficients are calculated and changed according to the optimization goal of the system.
For a causal discrete-time multichannel (M-channel) direct-form FIR filter of order N, each value of the output sequence of each channel is a weighted sum of the N+1 most recent input values.

Here x_n is the input sample in the nth time slot and y_{m,n} is the output sample of channel m in the nth time slot; x denotes input to, and y denotes output from, the filter.

In other words, the output of the m-th channel is described as:

y_{m,n} = Σ_{i=0…N} b_{m,i}·x_{n−i},

which in matrix product notation is:

[y_n] = [B]·[x_n].

Here:
N is the filter order;
M is the number of channels;
x_n is the input signal during the nth time slot, and [x_n] = (x_n, x_{n−1}, …, x_{n−N})^T is the vector of the N+1 most recent input values;
[y_n] = (y_{0,n}, y_{1,n}, …, y_{M−1,n})^T is the vector of output values of the filters (or channels) numbered from 0 to M−1, where y_{m,n} is the output value of filter number m during the nth time slot;
[B] is the matrix of filter coefficients, which is factored into a product of a commutator and a kernel; b_{m,i} is the value of the impulse response at the i-th instant, 0 ≤ i ≤ N, of the N-th order FIR filter number m, 0 ≤ m < M. Since each filter channel is a direct-form FIR filter, b_{m,i} is also a coefficient of the filter.
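As an illustration, the multichannel direct-form FIR relation above can be sketched in Python; the function name fir_bank and the plain-list data layout are assumptions for illustration only, not part of the invention.

```python
# Illustrative sketch: each output sample y[m][n] is the weighted sum
# of the N+1 most recent inputs, i.e. a matrix-vector product of the
# coefficient matrix B with the vector of recent input samples.

def fir_bank(b, x):
    """Apply M direct-form FIR filters of order N to the sequence x.

    b: list of M impulse responses, each of length N + 1.
    x: list of input samples.
    Returns M output sequences, one per channel.
    """
    M, N = len(b), len(b[0]) - 1
    y = [[0.0] * len(x) for _ in range(M)]
    for n in range(len(x)):
        # Vector of the N + 1 most recent inputs (zeros before start).
        x_vec = [x[n - i] if n - i >= 0 else 0 for i in range(N + 1)]
        for m in range(M):
            y[m][n] = sum(b[m][i] * x_vec[i] for i in range(N + 1))
    return y
```

For example, a single two-tap moving-average channel b = [[0.5, 0.5]] smooths a step input.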
The Hardware
The construction of the digital filter proceeds by building a network of dedicated modular filter components designed to implement various repetitive steps involved in progressively obtaining the result of operating upon the data vector. One benefit of the present invention is a significant reduction in the number of such modules. Those modules are preferably constructed in an integrated chip design primarily dedicated to the filtering function.
In particular several examples will be provided where the number of such modules is greatly reduced due to an improved logical structure. In particular, the burdensome task of calculating the action of the tensor upon a sequence of vectors is simplified by reorganizing the tensor into a commutator and a kernel. The commutator is a tensor of one degree higher order, but whose elements are simplified so that they are simply pointers to elements of the kernel. The kernel is a simple vector which contains only unique elements corresponding to the nonzero values present in the original tensor.
The multiplication proceeds by forming a matrix product of the kernel by the vector. All the non-trivial multiplication takes place during the formation of that matrix product. Subsequently the matrix product is contracted by the commutator to form the output vector.
In this manner, the present invention provides a significant improvement of the operation of any digital device constructed to execute the filtering function.
Accordingly within the present invention I provide a method and a system for tensor-vector multiplication, which is a further improvement of the existing methods and systems of this type.
In keeping with these objects and with others which will become apparent hereinafter, one feature of the present invention resides, briefly stated, in a method of tensor-vector multiplication, comprising the steps of factoring an original tensor into a kernel and a commutator; multiplying the kernel obtained by the factoring of the original tensor, by the vector and thereby obtaining a matrix; and summating elements and sums of elements of the matrix as defined by the commutator obtained by the factoring of the original tensor, and thereby obtaining a resulting tensor which corresponds to a product of the original tensor and the vector.
In accordance with another feature of the present invention, the method further comprises rounding elements of the original tensor to a desired precision and obtaining the original tensor with the rounded elements, wherein the factoring includes factoring the original tensor with the rounded elements into the kernel and the commutator.
Still another feature of the present invention resides in that the factoring of the original tensor includes factoring into the kernel which contains kernel elements that are different from one another, and the multiplying includes multiplying the kernel which contains the different kernel elements.
Still another feature of the present invention resides in that the method also comprises using as the commutator a commutator image in which indices of elements of the kernel are located at positions of corresponding elements of the original tensor.
In accordance with the further feature of the present invention, the summating includes summating on a priority basis of those pairs of elements whose indices in the commutator image are encountered most often and thereby producing the sums when the pair is encountered for the first time, and using the obtained sum for all remaining similar pairs of elements.
In accordance with still a further feature of the present invention, the method also includes using a plurality of consecutive vectors shifted in a manner selected from the group consisting of cyclically and linearly; and, for the cyclic shift, carrying out the multiplying by a first of the consecutive vectors and cyclic shift of the matrix for all subsequent shift positions, while, for the linear shift, carrying out the multiplying by a last appeared element of each of the consecutive vectors and linear shift of the matrix.
The inventive method further comprises using as the original tensor a tensor which is either a matrix or a vector.
In the inventive method, elements of the tensor and the vector can be elements selected from the group consisting of single bit values, integer numbers, fixed point numbers, floating point numbers, non-numeric literals, real numbers, imaginary numbers, complex numbers represented by pairs having one real and one imaginary component, complex numbers represented by pairs having one magnitude and one angle component, quaternion numbers, and combinations thereof.
Also in the inventive method, operations with the tensor and the vector with elements being non-numeric literals can be string operations selected from the group consisting of concatenation operations, string replacement operations, and combinations thereof.
Finally, in the inventive method, operations with the tensor and the vector with elements being single bit values can be logical operations and their logical inversions selected from the group consisting of logic conjunction operations, logic disjunction operations, modulo two addition operations, and combinations thereof.
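For the single-bit case named above, one possible reading is sketched below, with an assumed helper name bit_matvec: component "multiplication" becomes logic conjunction (AND) and "summation" becomes modulo-two addition (XOR).

```python
# Sketch: tensor-vector multiplication over single-bit elements, with
# AND playing the role of multiplication and XOR (modulo-two addition)
# the role of summation. bit_matvec is a hypothetical helper name.

def bit_matvec(T, v):
    """Multiply a 0/1 matrix T by a 0/1 vector v using AND and XOR."""
    out = []
    for row in T:
        acc = 0
        for t, x in zip(row, v):
            acc ^= t & x  # XOR-accumulate the AND "products"
        out.append(acc)
    return out
```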
The present invention also deals with a system for fast tensor-vector multiplication. The inventive system comprises means for factoring an original tensor into a kernel and a commutator; means for multiplying the kernel obtained by the factoring of the original tensor, by the vector and thereby obtaining a matrix; and means for summating elements and sums of elements of the matrix as defined by the commutator obtained by the factoring of the original tensor, and thereby obtaining a resulting tensor which corresponds to a product of the original tensor and the vector.
In the system in accordance with the present invention, the means for factoring the original tensor into the kernel and the commutator can comprise a precision converter converting tensor elements to desired precision and a factorizing unit building the kernel and the commutator; the means for multiplying the kernel by the vector can comprise a multiplier set performing all component multiplication operations and a recirculator storing and moving results of the component multiplication operations; and the means for summating the elements and the sums of the elements of the matrix can comprise a reducer which builds a pattern set and adjusts pattern delays and number of channels, a summator set which performs all summating operations, an indexer and a positioner which define indices and positions of the elements or the sums of elements utilized in composing the resulting tensor, the recirculator storing and moving results of the summation operations, and a result extractor forming the resulting tensor.
The novel features of the present invention are set forth in particular in the appended claims. The invention itself, however, will be best understood from the following description of the preferred embodiments, which is accompanied by the following drawings.
Digital filters may be utilized in audio or video systems where a signal originates in analog signals that are sampled to provide an incoming signal. An analog to digital converter produces the digital signal that is then operated upon, i.e. filtered, and typically sent to one or more digital to analog converters to be fed to various transducers. In many cases, the filter may operate upon signals that originate in a digital format, for example signals received from digital communication systems such as computers, cell phones or the like. The digital signal is operated upon in a system that employs a microprocessor and some memory to store data and filter coefficients. The system is integrated into specialized computers controlled by software.
Configurable Filter Bank
A time-varying signal from a sensor such as a microphone, vibration sensor, electromagnetic sensor, etc. is digitized into digital samples produced at a constant rate.
Each new sample is passed to the "input for vectors" of the block 1.
Each new sample of each filter in the filter bank is produced in the system 1 and sequentially conveyed to a multichannel output marked as "output for resulting tensor".
The numerical precision of the filter bank is defined by a value present at the input marked as "input for precision values".
The impulse response of each filter of the filter bank is defined by values simultaneously present at the input marked "input for original tensor". The size of this input is equal to the impulse response size of the longest filter of the filter bank multiplied by the number of filters in the bank.
Additionally, the input signal can be interchangeably sampled from more than one sensor. In this case the number of physical channels multiplexed to a single “input for vectors” is more than one. In this case the output samples present at the “output for resulting tensor” belong to different physical inputs and are interleaved similarly to input samples. The number of such channels is provided as a value present at the input marked as “input for number of channels”.
In these examples, the system 1 includes means 2 for factoring an original tensor into a kernel and a commutator, means 3 for multiplying the kernel obtained by the factoring of the original tensor, by the vector and thereby obtaining a matrix, and means 4 for summating elements and sums of elements of the matrix as defined by the commutator obtained by the factoring of the original tensor, and thereby obtaining a resulting tensor which corresponds to a product of the original tensor and the vector.
In the system in accordance with the present invention, the means 2 for factoring the original tensor into the kernel and the commutator can comprise a precision converter 5 converting tensor elements to desired precision and a factorizing unit 6 building the kernel and the commutator. The precision converter can be a digital circuit performing a bitwise logical AND operation on the input values of the tensor and the desired precision value in the form of a bit mask with the same number of bits as the tensor elements. For full precision all precision value bits must be logical ones; in this case the logical AND operation preserves all bits in the tensor elements. If the least significant bit of the mask is set to logical zero, the precision of the resulting tensor elements decreases by a factor of 2, since their least significant bit becomes zero. If several least significant bits of the mask are set to logical zero, the precision of the resulting tensor elements decreases by a factor of 2 for each zeroed bit.
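The bit-mask behavior just described can be sketched as follows for unsigned integer elements (the element type and the helper name to_precision are illustrative assumptions):

```python
# Sketch of the precision converter: a bitwise AND of each tensor
# element with a precision mask. Zeroing the k least significant mask
# bits halves the precision once per zeroed bit and makes nearby
# values coincide, which shortens the kernel later on.

def to_precision(elements, mask):
    """Zero the masked-out low bits of each (unsigned integer) element."""
    return [e & mask for e in elements]
```

With mask 0b1100, for example, the values 13 and 12 both become 12, so after rounding they would map to a single kernel element.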
The factoring unit 6 may be implemented as a processor-controlled circuit performing the algorithm described below.
The means 3 for multiplying the kernel by the vector can comprise a multiplier set 7 performing all component multiplication operations and a recirculator 8 storing and moving results of the component multiplication operations. The means 4 for summating the elements and the sums of the elements of the matrix can comprise a reducer 9 which builds a pattern set and adjusts pattern delays and number of channels, a summator set 10 which performs all summating operations, an indexer 11 and a positioner 12 which together define indices and positions of the elements or the sums of elements utilized in composing the resulting tensor. The recirculator 8 stores and moves results of the summation operations. A result extractor 13 forms the resulting tensor.
The multiplier set 7 can comprise several amplifiers with their gain controlled by the values of the corresponding elements of the kernel. For a digital implementation the multiplier set can comprise a number of multipliers corresponding to the number of elements of the kernel. Each multiplier takes the same input signal and multiplies it by the kernel element corresponding to this multiplier.
The recirculator 8 can comprise a number of separate tapped delay lines (in a digital implementation each delay line is a chain of N digital registers connected so that on every clock cycle the data from register n−1 is passed to register n, where n is 2 to N). The number of delay lines corresponds to the number of kernel elements, the number of elements of the output tensor, and the number of intermediate terms obtained in the system. All the resulting values produced by the multiplier set and the summator set are directed to the inputs of the corresponding delay lines. The previously calculated values propagate along the delay lines until they reach the end of the delay lines and are discarded.
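A tapped delay line of the recirculator can be modeled in software as a shift register; the class below is an illustrative model only, not the hardware itself.

```python
# Sketch: one tapped delay line of the recirculator. On each clock
# cycle a new value enters at tap 0, earlier values shift by one
# position, and the oldest value falls off the end and disappears.

class DelayLine:
    def __init__(self, n):
        self.taps = [0] * n  # N registers, initially cleared

    def clock(self, value):
        """Shift in a new value; drop the oldest."""
        self.taps = [value] + self.taps[:-1]

    def tap(self, i):
        """Read the value delayed by i clock cycles."""
        return self.taps[i]
```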
The reducer 9 is presented in the accompanying drawings.
A reducer may be implemented as a processor-controlled circuit performing decomposition of the operation defined by the commutator into a number of individual 2-argument summation operations performed by the summator set. It also provides control information to the indexer, the positioner, and the result extractor.
The summator set consists of several digital 2-input addition units with their inputs connected through multiplexers to taps of the delay lines of the recirculator, according to the nonzero value positions of the commutator as defined by the reducer. The outputs of the addition units are connected to the inputs of the corresponding delay lines of the recirculator as defined by the reducer.
The indexer 11 is a set of hardware multiplexers that connect outputs of delay lines of the recirculator to inputs of delay lines of the result extractor. The configuration of the multiplexers is defined by the reducer.
The positioner 12 can comprise a set of hardware multiplexers that connect outputs of the result extractor to corresponding taps of the result extractor delay lines. The configuration of the multiplexers is defined by the reducer.
The result extractor 13 is a set of tapped delay lines that is controlled and used by the indexer 11 and the positioner 12.
The components described above are connected in the following way. Input 21 of the precision converter 5 is the input for the original tensor of the system 1. It contains the transformation tensor [T̃]_{N1,N2,…,NM}.
The reducer 9 comprises a pattern set builder 14, a delay adjuster 15, and a number of channels adjuster 16.
The components of the reducer 9 are connected in the following way. Input 51 of the pattern set builder 14 is the input 28 of the reducer 9. It contains the entirety of the obtained commutator image [Y]_{N1,…,NM}.
In the embodiment, the delay adjuster 15 operates first and its output is supplied to the input of the number of channels adjuster 16. Alternatively, it is also possible to arrange the above components so that the number of channels adjuster 16 operates first and its output is supplied to the input of the delay adjuster 15.
Functional algorithmic block-diagrams of the precision converter 5, the factorizing unit 6, the multiplier set 7, the summator set 10, the indexer 11, the positioner 12, the recirculator 8, the result extractor 13, the pattern set builder 14, the delay adjuster 15, and the number of channels adjuster 16 are presented in the accompanying drawings.
Fast Tensor-Vector Multiplication
In accordance with the present invention the method for fast tensor-vector multiplication includes factoring an original tensor into a kernel and a commutator. The process of factorization of a tensor consists of the operations described below. A tensor

[T]_{N1,N2,…,NM}

of rank M with dimensions N1, N2, …, NM is given.

To obtain the kernel and the commutator, the tensor [T]_{N1,…,NM} is processed as follows.

The length of the kernel is set to 0:

L ← 0;

Initially the kernel is an empty vector of length zero:

[U]_L ← [ ];

The commutator image is the tensor [Y]_{N1,…,NM}, initially filled with zeros:

[Y]_{N1,…,NM} ← [0];

The indices n1, n2, …, nm, …, nM are initially set to 1:

n1 ← 1; n2 ← 1; …; nm ← 1; …; nM ← 1,

where nm ∈ [1, Nm] and m ∈ [1, M].

Then for each set of indices n1, n2, …, nm, …, nM, where nm ∈ [1, Nm], m ∈ [1, M], the following operations are carried out:
Step 1:
If the element t_{n1,n2,…,nM} is equal to zero, go to step 3; otherwise go to step 2.

Step 2:
The length of the kernel is increased by 1:

L ← L + 1;

The element t_{n1,…,nM} is appended to the kernel:

u_L ← t_{n1,…,nM};

The intermediate tensor [P]_{N1,…,NM} is formed, whose elements are equal to 1 at the positions where the elements of [T]_{N1,…,NM} are equal to t_{n1,…,nM}, and equal to 0 at all other positions:

p_{k1,…,kM} = 1 if t_{k1,…,kM} = t_{n1,…,nM}, otherwise p_{k1,…,kM} = 0;

All elements of the tensor [T]_{N1,…,NM} equal to t_{n1,…,nM} are set to zero:

[T]_{N1,…,NM} ← [T]_{N1,…,NM} − t_{n1,…,nM}·[P]_{N1,…,NM};

To the representation of the commutator, the tensor [Y]_{N1,…,NM}, the kernel index L is written at the same positions:

[Y]_{N1,…,NM} ← [Y]_{N1,…,NM} + L·[P]_{N1,…,NM};

Next go to step 3.

Step 3:
The index m is set equal to M:

m ← M;

Next go to step 4.

Step 4:
The index nm is increased by 1:

nm ← nm + 1;

If nm ≤ Nm, go to step 1. Otherwise, go to step 5.

Step 5:
The index nm is set equal to 1:

nm ← 1;

The index m is reduced by 1:

m ← m − 1;

If m ≥ 1, go to step 4. Otherwise the process is terminated.
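The factoring procedure above can be sketched for the rank-2 case (a matrix) as follows; factor is an illustrative name, and the commutator is produced here in its image form [Y], with a zero marking a zero element of the original tensor.

```python
# Sketch of the factoring algorithm (steps 1-5 above) for a matrix:
# elements are visited in index order; each new nonzero value is
# appended to the kernel U, all equal elements of the working copy
# are zeroed, and their positions in the commutator image Y receive
# the kernel index L.

def factor(T):
    """Return (kernel U, commutator image Y) for the matrix T."""
    work = [row[:] for row in T]        # working copy of T
    U = []                              # kernel: distinct nonzero values
    Y = [[0] * len(row) for row in T]   # 0 marks a zero element of T
    for i in range(len(work)):
        for j in range(len(work[i])):
            t = work[i][j]
            if t == 0:
                continue                # step 1: zero element, skip
            U.append(t)                 # step 2: extend the kernel
            L = len(U)
            for a in range(len(work)):  # zero all equal elements and
                for b in range(len(work[a])):
                    if work[a][b] == t:
                        work[a][b] = 0
                        Y[a][b] = L     # record kernel index in Y
    return U, Y
```

For T = [[2, 0, 3], [3, 2, 0]] this yields the kernel [2, 3] and the image [[1, 0, 2], [2, 1, 0]].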
The results of the process described herein for the factorization of the tensor [T]_{N1,…,NM} are the following.

Here, a tensor

[T]_{N1,…,NM}

of dimensions Π_{m=1…M} Nm, containing L ≤ Π_{m=1…M} Nm distinct nonzero elements, was given. The kernel

[U]_L

is obtained, containing all the distinct nonzero elements of the tensor [T]_{N1,…,NM}.

From the same tensor [T]_{N1,…,NM} the commutator image, the intermediate tensor

[Y]_{N1,…,NM},

was generated, with the same dimensions Π_{m=1…M} Nm as the original tensor [T]_{N1,…,NM}: its element y_{n1,…,nM} equals the index l of the kernel element u_l equal to t_{n1,…,nM}, or zero where t_{n1,…,nM} is zero.

From the resulting intermediate tensor [Y]_{N1,…,NM} the commutator

[Z]_{L,N1,…,NM},

as a tensor of rank M+1, was obtained by replacing every nonzero element y_{n1,…,nM} = l with a column of length L containing 1 at position l and 0 at all other positions, and every zero element with a column of L zeros.

The tensor [T]_{N1,…,NM} can then be restored from the kernel and the commutator:

t_{n1,…,nM} = Σ_{l=1…L} z_{l,n1,…,nM}·u_l.
Further in the inventive method, the kernel [U]L obtained by the factoring of the original tensor [T]N
[Z]N
and the kernel
Then the product of the tensor [T]N
In this expression each nested sum contains the same coefficient (ul·vn) which is an element of the matrix [P]L,N which is the product of the kernel [U]L and the transposed vector [V]N
[P]L,N=[U]L·([V]N)T
Then elements and sums of elements of the matrix as defined by the commutator are summated, and thereby a resulting tensor which corresponds to a product of the original tensor and the vector is obtained as follows.
The product of the tensor [T]N
[R]N
Thus the multiplication of a tensor by a vector of length NM may be carried out in two steps. First, the matrix is obtained which contains the product of each element of the original vector and each element of the kernel [U]L.
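The two-step product can be sketched as follows (illustrative Python for the matrix case; the kernel and commutator image Y are assumed to be given, and the names are mine):

```python
def multiply_factored(kernel, Y, v):
    """Two-step product of a factored matrix with a vector v.
    Step 1: the L-by-N table of all kernel*vector products, computed once.
    Step 2: row sums driven by the commutator image Y (additions only)."""
    P = [[u * x for x in v] for u in kernel]      # step 1: L*N multiplications
    R = []
    for y_row in Y:
        s = 0
        for n, l in enumerate(y_row):
            if l:                                  # step 2: pick and add
                s += P[l - 1][n]
        R.append(s)
    return R
```

For example, multiply_factored([2, 3], [[1, 0, 2], [2, 1, 0]], [1, 2, 3]) gives [11, 7], matching the ordinary matrix-vector product.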
Thus the ratio of the number of operations with a method using the decomposition of the tensor into a kernel and a commutator to the number of operations required with a method that does not include such a decomposition is
for addition and
for multiplication.
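As a purely hypothetical numeric illustration of the multiplication savings (the sizes below are my own, not the patent's):

```python
# A matrix with M rows, N columns, but only L distinct nonzero coefficients:
# the factored scheme multiplies each input sample by each kernel element once.
M, N, L = 1024, 32, 5            # hypothetical sizes
naive_mults = M * N              # one product per matrix element
factored_mults = L * N           # one product per kernel element per column
speedup = naive_mults / factored_mults
print(speedup)                   # ratio of multiplications saved in this example
```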
The inventive method can include rounding of elements of the original tensor to a desired precision and obtaining the original tensor with the rounded elements, and the factoring can include factoring the original tensor with the rounded elements into the kernel and the commutator as follows.
For the original tensor [{tilde over (T)}]N
Still another feature of the present invention resides in that the factoring of the original tensor includes factoring into the kernel which contains kernel elements that are different from one another. This can be seen from the process of obtaining intermediate tensor in the recursive process of building the kernel and the commutator, where the said intermediate tensor [P]N
Thereby, the multiplying includes only multiplying the kernel which contains the different kernel elements.
In the method of the present invention as the commutator [Z]N
In this case the product of the tensor [T]N
[R]N
This representation of the commutator can be used for the process of tensor factoring and for the process of building fast tensor-vector multiplication computational structures and systems.
The summating can include summating on a priority basis of those pairs of elements whose indices in the commutator image are encountered most often and thereby producing the sums when the pair is encountered for the first time, and using the obtained sum for all remaining similar pairs of elements.
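One possible realization of this priority rule is a greedy search that repeatedly extracts the most frequent pair, in the spirit of steps 1-24 of the synthesis process below; the following Python sketch is simplified to flat rows of kernel indices, and all names and details are mine:

```python
from collections import Counter

def build_combinations(rows, L):
    """Greedily replace the most frequent (left, right, distance) index pair
    in the rows by a new combination index, so each repeated sum is formed
    only once and then reused (a simplified sketch of the priority rule)."""
    combos = []                       # rows of the combination matrix [Q]
    next_id = L + 1                   # combination ids extend kernel numbering
    while True:
        counts = Counter()
        for row in rows:
            for i, a in enumerate(row):
                if a == 0:
                    continue
                for j in range(i + 1, len(row)):
                    if row[j]:
                        counts[(a, row[j], j - i)] += 1
        if not counts:
            break
        (a, b, d), freq = counts.most_common(1)[0]
        if freq < 2:                  # no pair repeats; nothing left to share
            break
        combos.append((next_id, a, b, d))
        for row in rows:              # replace: zero the left, id on the right
            for i in range(len(row) - d):
                if row[i] == a and row[i + d] == b:
                    row[i], row[i + d] = 0, next_id
        next_id += 1
    return combos
```

For rows [[1, 2, 1, 2], [1, 2, 0, 0]] with L = 2, the pair (1, 2) at distance 1 occurs three times, so it becomes combination 3 and the rows reduce to [[0, 3, 0, 3], [0, 3, 0, 0]].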
It can be carried out with the aid of a preliminarily synthesized computation control structure, presented in the embodiment in a matrix form. This structure, along with the input vector, can be used as input data for a computer algorithm carrying out a tensor-vector multiplication. The same preliminarily synthesized computation control structure can further be used for the synthesis of a block diagram of a system that performs multiplication of a tensor by a vector.
The computation control structure synthesis process is described below. The four objects—the kernel [U]L, the commutator image [Y]N
{K+(M−1)·N|Kε[1,N],Mε[0,∞]}.
The process of constructing a description of the computational system for performing one iteration of multiplication by a factored tensor contains the steps described below.
For a given kernel [U]L, commutator tensor [Y]N
The empty matrix
[Q]0,4[ ];
is initialized, to which the combinations
[P]4=[p1p2p3p4]
are to be added. These combinations are represented by vectors of length 4. In every such vector the first element p1 is the identifier or index of the combination. These numbers are an extension of the numeration of elements of the kernel. Thus the index of the first combination is L+1, and each successive combination has an index one more than the preceding combination:
q1,1=L+1; qn,1=qn−1,1+1, n>1.
The second element p2 of each combination is an element of the subset
{[Y]n
of elements of the commutator tensor [Y]N
The third element p3 of the combination represents an element of the subset
{[Y]n
of elements of the commutator tensor [Y]N
The fourth element p4 ε[1, N1−1] of the combination represents the distance along the dimension N1 between the elements equal to p2 and p3 in the commutator tensor [Y]N
The index of the first element of the combination is set equal to the dimension of the kernel:
p1←L;
Here ends the initialization and begins the iterative section of the process of constructing a description of the computational structure.
Step 1:
The variable containing the number of occurrences of the most frequent combination is set equal to 0:
α←0;
Go to step 2.
Step 2:
The index of the second element is set equal to 1:
p2←1;
Go to step 3.
Step 3:
The index of the third element of the combination is set equal to 1:
p3←1;
Go to step 4.
Step 4:
The index of the fourth element is set equal to 1:
p4←1;
Go to step 5.
Step 5:
The variable containing the number of occurrences of the combination is set equal to 0:
β←0;
The indices n1, n2, . . . , nm, . . . , nM are set equal to 1:
n1←1; n2←1; . . . ; nm←1; . . . ; nM←1;
Go to step 6.
Step 6:
The elements of the commutator tensor [Y]N with the indices n1, n2, . . . , nM−1 fixed are copied into the working vector:
[Θ]NM={θη|ηε[1,NM]}←{yn1,n2, . . . ,nM−1,η|ηε[1,NM]};
Go to step 7.
Step 7:
If θnM=p2 and θnM+p4=p3, go to step 8. Otherwise, skip to step 10.
Step 8:
The variable containing the number of occurrences of the combination is increased by 1:
β←β+1;
The elements θnM and θnM+p4 are set equal to 0:
θnM←0; θnM+p4←0;
If β≦α, skip to step 10. Otherwise, go to step 9.
Step 9:
The variable containing the number of occurrences of the most frequently occurring combination is set equal to the number of occurrences of the combination:
α←β;
The most frequently occurring combination is recorded:
[P]4←[p1+1, p2, p3, p4];
Go to step 10.
Step 10:
The index m is set equal to M:
m←M;
Go to step 11.
Step 11:
The index nm is increased by 1:
nm←nm+1;
If nm≦Nm, then if m=M, go to step 7, and if m<M, go to step 6. If nm>Nm, go to step 12.
Step 12:
The index nm is set equal to 1:
nm←1;
The index m is decreased by 1:
m←m−1;
If m≧1, go to step 11. Otherwise, go to step 13.
Step 13:
The index of the fourth element of the combination is increased by 1:
p4←p4+1;
If p4<NM, go to step 4. Otherwise go to step 14.
Step 14:
The index of the third element of the combination is increased by 1:
p3←p3+1;
If p3≦p1, go to step 3. Otherwise, go to step 15.
Step 15:
The index of the second element of the combination is increased by 1:
p2←p2+1;
If p2≦p1, go to step 2. Otherwise, go to step 16.
Step 16:
If α>0, go to step 17. Otherwise, skip to step 18.
Step 17:
The index of the first element is increased by 1:
p1←p1+1;
To the matrix of combinations the most frequently occurring combination is added:
Go to step 18.
Step 18:
The indices n1, n2, . . . , nm, . . . , nM are set equal to 1:
n1←1; n2←1; . . . ; nm←1; . . . ; nM←1;
Go to step 19.
Step 19:
If yn1,n2, . . . ,nM=p2 and yn1,n2, . . . ,nM+p4=p3, go to step 20. Otherwise, skip to step 21.
Step 20:
The element yn1,n2, . . . ,nM is set equal to 0:
yn1,n2, . . . ,nM←0;
The element yn1,n2, . . . ,nM+p4 is set equal to the combination index p1:
yn1,n2, . . . ,nM+p4←p1;
Go to step 21.
Step 21:
The index m is set equal to M:
m←M;
Go to step 22.
Step 22:
The index nm is increased by 1:
nm←nm+1;
If m<M and nm≦Nm or m=M and nm≦Nm−p4, then go to step 19. Otherwise, go to step 23.
Step 23:
The index nm is set equal to 1:
nm←1;
The index m is decreased by 1:
m←m−1;
If m≧1, go to step 22. Otherwise, go to step 24.
Step 24:
At the end of each row of the matrix of combinations, append a zero element:
Go to step 25.
Step 25:
The variable Ω is set equal to the number p1−L of rows in the resulting matrix of combinations [Q]Ω,5:
Ω←p1−L;
Go to step 26.
Step 26:
The index μ is set equal to 1:
μ←1;
Go to step 27.
Step 27:
The index ξ is set equal to one more than the index μ:
ξ←μ+1;
Go to step 28.
Step 28:
If qμ,1≠qξ,2, skip to step 30. Otherwise, go to step 29.
Step 29:
The element qξ,4 of the matrix of combinations is decreased by the value of the operational delay δ:
qξ,4←qξ,4−δ;
Go to step 30.
Step 30:
If qμ,1≠qξ,3, skip to step 32. Otherwise, go to step 31.
Step 31:
The element qξ,5 of the matrix of combinations is decreased by the value of the operational delay δ:
qξ,5←qξ,5−δ;
Go to step 32.
Step 32:
The index ξ is increased by 1:
ξ←ξ+1;
If ξ≦Ω, go to step 28. Otherwise go to step 33.
Step 33:
The index μ is increased by 1:
μ←μ+1;
If μ<Ω, go to step 27. Otherwise go to step 34.
Step 34:
The cumulative operational delay of the computational scheme is set equal to 0:
Δ←0;
The index μ is set equal to 1:
μ←1;
Go to step 35.
Step 35:
The index ξ is set equal to 4:
ξ←4;
Go to step 36.
Step 36:
If Δ>qμ,ξ, skip to step 38. Otherwise, go to step 37.
Step 37:
The value of the cumulative operational delay of the computational scheme is set equal to the value of qμ,ξ:
Δ←qμ,ξ;
Go to step 38.
Step 38:
The index ξ is increased by 1:
ξ←ξ+1;
If ξ≦5, go to step 36. Otherwise, go to step 39.
Step 39:
The index μ is increased by 1:
μ←μ+1;
If μ<Ω, go to step 35. Otherwise, go to step 40.
Step 40:
To each element of the two rightmost columns of the matrix of combinations, add the calculated value of the cumulative operational delay of the computational scheme:
{qμ,ξ←qμ,ξ+Δ|με[1,Ω],ξε[4,5]};
Go to step 41.
Step 41:
After step 24, any subset {yn
The tensor [D]N
[D]N
The indices of the combinations comprising the resultant tensor [R]N
[R]N
Go to step 42.
Step 42:
Each of the elements of the two rightmost columns of the matrix of combinations is multiplied by the number of channels σ:
{qμ,ξ←σ·qμ,ξ|με[1,Ω],ξε[4,5]};
The construction of the computational structure is concluded. The results of this process are:
The computational structure described above serves as the input for an algorithm of fast tensor-vector multiplication. The algorithm and the process of carrying out such a multiplication are described below.
The initialization step consists of allocating memory within the computational system for the storage of copies of all components with the corresponding time delays. The iterative section is contained within the waiting loop or is activated by an interrupt caused by the arrival of a new element of the input tensor. It results in the movement through the memory of the components that have already been calculated, the performance of operations represented by the rows of the matrix of combinations [Q]Ω,5 and the computation of the result. The following is a more detailed discussion of one of the many possible examples of such a process.
For a given initial vector of length NM, number σ of channels, cumulative operational delay Δ, matrix [Q]Ω,5 of combinations, kernel vector [U]ω
Step 1 (Initialization):
A two-dimensional array is allocated and initialized, represented here by the matrix [Φ]ω
[Φ]ω
The variable ξ, serving as the indicator of the current column of the matrix [Φ]ω
ξ←σ·(NM+Δ);
Go to step 2.
Step 2:
Obtain the value of the next element of the input vector and record it in variable χ.
The indicator ξ of the current column of the matrix [Φ]ω
ξ←1+(ξ)mod(σ·(NM+Δ));
The product of the variable χ by the elements of the kernel [U]ω
{φμ,ξ←χ·uμ|με[1,ω1,1−1]};
The variable μ, serving as an indicator of the current row of the matrix of combinations [Q]Ω,5 is initialized:
μ←1;
Go to step 3.
Step 3:
Find the new value of combination μ and assign it to the element φμ+ω
Φμ+ω
The variable μ is increased by 1:
μ←μ+1;
Go to step 4.
Step 4:
If μ≦Ω, go to step 3. Otherwise, go to step 5.
Step 5:
The elements of the tensor [P]N
[P]N
If all elements of the input vector have been processed, the process is concluded and the tensor [P]N
When a digital or an analog hardware platform must be used for performing the operation of tensor-vector multiplication, a schematic of such system can be synthesized with the usage of the same computation control structure as the one used for guiding the process above. The synthesis of such schematic represented in a form of a component set with their interconnections is described below.
There are a total of three basic elements used for synthesis. For a synchronous digital system these elements are: a time delay element of one system count, a two-input summator with an operational delay of δ system counts, and a scalar multiplication operator. For an asynchronous analog system or an impulse system, these are a delay time between successive elements of the input vector, a two-input summator with a time delay of δ element counts, and a scalar multiplication component in the form of an amplifier or attenuator.
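As a behavioral illustration only (not a hardware description), the three synchronous primitives might be modeled as follows; the class names, the zero initial state, and the single-count granularity are my assumptions:

```python
class Delay:
    """Unit delay of one system count: outputs the previous input."""
    def __init__(self):
        self.state = 0                 # assumed zero initial state
    def step(self, x):
        y, self.state = self.state, x
        return y

class Summator:
    """Two-input adder; a physical device would add delta counts of latency,
    which the synthesis procedure accounts for in the combination matrix."""
    def step(self, a, b):
        return a + b

class Gain:
    """Scalar multiplier by a fixed kernel coefficient."""
    def __init__(self, k):
        self.k = k
    def step(self, x):
        return self.k * x
```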
Thus, for an input vector of length Nm, number of channels σ, matrix [Q]Ω,5 of combinations, kernel vector [U]ω
Step 1:
The initially empty block diagram of the system is generated, and within it the node “N—0” which is the input port for the elements of the input vector.
The variable ξ is initialized, serving as the indicator of the current element of the kernel [U]ω
ξ←1;
Go to step 2.
Step 2:
To the block diagram of the apparatus add the node “N_ξ—0” and the multiplier “M_ξ—0”, the input of which is connected to the node “N—0” and the output to the node “N_ξ—0”.
The value of the indicator of the current element of the kernel [U]ω
ξ←ξ+1;
Go to step 3.
Step 3:
If ξ<ω1,1, go to step 2. Otherwise, go to step 4.
Step 4:
The variable μ is initialized, serving as an indicator of the current row of the matrix of combinations [Q]Ω,5:
μ←1;
Go to step 5.
Step 5:
To the block diagram of the system add the node “N_qμ,1—0” and the summator “A_qμ,1” the output of which is connected to the node “N_qμ,1—0”.
The variable ξ is initialized, serving as an indicator of the number of the input of the summator “A_qμ,1”:
ξ←1;
Go to step 6.
Step 6:
The variable γ is initialized, storing the delay component index offset:
γ←0;
Go to step 7.
Step 7:
If the node N_qμ,ξ+1_qμ,ξ+3−γ has already been initialized, skip to step 12. Otherwise, go to step 8.
Step 8:
To the block diagram of the system add the node N_qμ,ξ+1_qμ,ξ+3−γ and a unit delay Z_qμ,ξ+1_qμ,ξ+3−γ, the output of which is connected to the node N_qμ,ξ+1_qμ,ξ+3−γ.
If γ>0, go to step 10. Otherwise, go to step 9.
Step 9:
Input number ξ of the summator “A_qμ,1” is connected to the node N_qμ,ξ+1_qμ,ξ+3.
Go to step 11.
Step 10:
The input of the element of one count delay Z_qμ,ξ+1_qμ,ξ+3−γ is connected to the node N_qμ,ξ+1_qμ,ξ+3−γ+1.
Go to step 11.
Step 11:
The delay component index offset is increased by 1:
γ←γ+1;
If γ<2, go to step 7. Otherwise, go to step 12.
Step 12:
The indicator μ of the current row of the matrix of combinations [Q]Ω,5 is increased by 1:
μ←μ+1;
If μ≦Ω, go to step 5. Otherwise, go to step 13.
Step 13:
From each element of the delay tensor [D]N
[D]N
The indices n1, n2, . . . , nm, . . . , nM−1 are set equal to 1:
n1←1; n2←1; . . . ; nm←1; . . . ; nM−1←1;
Go to step 14.
Step 14:
To the block diagram of the system add the node N_n1_n2_ . . . _nm_ . . . _nM−1 at the output of the element n1, n2, . . . , nm, . . . , nM−1 of the result of multiplying the tensor by the vector.
Go to step 15.
Step 15:
The variable γ is initialized, storing the delay component index offset:
γ←0;
Go to step 16.
Step 16:
If the node N_rn
Step 17:
To the block diagram of the system introduce the node N_n
If γ>0, Go to step 18. Otherwise skip to step 19.
Step 18:
The output of the delay element Z_rn
Go to step 19.
Step 19:
The output of the delay element Z_rn
Go to step 20.
Step 20:
The delay component index offset is increased by 1:
γ←γ+1;
Go to step 16.
Step 21:
If γ>0, skip to step 23. Otherwise, go to step 22.
Step 22:
The node N_rn
Go to step 23.
Step 23:
The index m is set equal to M:
m
M;
Go to step 24.
Step 24:
The index nm is increased by 1:
If m<M and nm≦Nm then go to step 14. Otherwise, go to step 25.
Step 25:
The index nm is set equal to 1:
nm←1;
The index m is decreased by 1:
m←m−1;
If m≧1, go to step 24. Otherwise, the process is concluded.
The process of synthesis of the computation description structure described above, along with the process and the synthesized schematic for carrying out a continuous multiplication of an incoming vector by a tensor represented in the form of a product of the kernel and the commutator, enables the usage of a minimal number of addition operations, which are carried out on a priority basis.
In the method of the present invention a plurality of consecutive cyclically shifted vectors can be used, and the multiplying can be performed by multiplying a first of the consecutive vectors and cyclically shifting the matrix for all subsequent shift positions. This step of the inventive method is described herein below.
The tensor
[T]N
containing
L≦Πk=1MNk
distinct nonzero elements is to be multiplied by the vector
and all its circularly-shifted variants:
The tensor [T]N
[Z]N
and the kernel
First the product of the tensor [T]N
[R]N
, where pl,n are the elements of the matrix [P]L,N
To obtain the succeeding value, the product of the tensor [T]N
the new matrix [P1]L,N
Clearly, the matrix [P1]L,N
pkl,n=pl,1+(n−1+k)mod(N)
All elements pk,
The recursive multiplication of a tensor by a vector of length Nm may be carried out in two steps. First the tensor [P]N
Thus the ratio of the number of operations with a method using the decomposition of the tensor into a kernel and a commutator to the number of operations required with a method that does not include such a decomposition is
for addition and
for multiplication.
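The reuse of products across circular shifts can be sketched as follows (illustrative Python; the leftward shift direction and the names are my choices):

```python
def shifted_products(kernel, v):
    """Product tables [P] for the vector v and all its circular shifts.
    Only the first table costs multiplications; every further table is a
    pure column rotation of the previous one."""
    P = [[u * x for x in v] for u in kernel]      # L*N multiplications, once
    tables = [P]
    for _ in range(len(v) - 1):
        P = [row[1:] + row[:1] for row in P]      # rotate columns, no products
        tables.append(P)
    return tables
```

For instance, shifted_products([2], [1, 2, 3]) returns [[[2, 4, 6]], [[4, 6, 2]], [[6, 2, 4]]].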
In the method of the present invention a plurality of consecutive linearly shifted vectors can also be used, and the multiplying can be performed by multiplying the last-appeared element of each of the consecutive vectors and linearly shifting the matrix. This step of the inventive method is described herein below.
Here the objective is sequential and continuous, which is to say iterative multiplication of a known and constant tensor
[T]N
containing
L≦Πk=1MNk
distinct nonzero elements, by a series of vectors, each of which is obtained from the preceding vector by a linear shift of each of its elements one position upward. At each successive iteration the lowest position of the vector is filled by a new element, and the uppermost element is lost. At each iteration the tensor [T]N
after obtaining the matrix [P1]L,N
In its turn the tensor [T]N
Obviously, at the previous iteration the tensor [T]N
and therefore there exists a matrix [P0]L,N
The matrix [P1]L,N
and the new value vN
Each element {p1l,n|lε[1,L],nε[1,Nm−1]} of the matrix [P1]L,N
Thus, iteration iε[1,∞[ is written as:
Every such iteration consists of two steps—the first step contains all operations of multiplication and the formation of the matrix [Pi]L,N
Thus the ratio of the number of operations with a method using the decomposition of the tensor into a kernel and a commutator to the number of operations required with a method that does not include such a decomposition is
for addition and
for multiplication.
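The linear-shift case can likewise be sketched (illustrative Python; the names are mine): each iteration reuses every surviving product and multiplies only the newly arrived sample by the kernel.

```python
def linear_shift_update(kernel, P, new_sample):
    """One streaming iteration: drop the oldest column of products, shift the
    rest left, and compute only the new rightmost column (L multiplications
    instead of L*N)."""
    return [row[1:] + [u * new_sample] for row, u in zip(P, kernel)]
```

Starting from P = [[2, 4], [3, 6]] (kernel [2, 3], vector [1, 2]) and a new sample 5, the update yields [[4, 10], [6, 15]].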
The inventive method further comprises using as the original tensor a tensor which is a matrix. The examples of such usage are shown below.
Factorization of the original tensor which is a matrix is carried out as follows.
The original tensor which is a matrix
has dimensions M×N and contains L≦M·N distinct nonzero elements. Here, the kernel is a vector
consisting of all the unique nonzero elements of the matrix [T]M,N.
This same matrix [T]M,N is used to form a new intermediate matrix
of the same dimensions M×N as the matrix [T]M,N each of whose elements is either equal to zero or equal to the index of the element of the vector [U]L, which is equal in value to this element of the matrix [T]M,N. The matrix [Y]M,N can be obtained by replacing each nonzero element tm,n of the matrix [T]M,N by the index l of the equivalent element ul in the vector [U]L.
From the resulting intermediate matrix [Y]M,N the commutator
[Z]M,N,L={Zm,n,l|mε[1,M],nε[1,N],lε[1,L]}
a tensor of rank 3, is obtained by replacing each nonzero element ym,n of the matrix [Y]M,N by the vector of length L with all elements equal to 0 if ym,n=0, or with a single unit element in the position corresponding to the nonzero value of ym,n and L−1 zero elements in all other positions.
The resulting commutator can be expressed as:
The factorization of the matrix [T]M,N is equivalent to the convolution of the commutator [Z]M,N,L with the kernel [U]L:
[T]M,N=[Z]M,N,L·[U]L={Σl=1L zm,n,l·ul|mε[1,M],nε[1,N]}
An example of factorization of the original tensor which is a matrix is shown below.
The matrix
of dimension M×N=4×3 contains L=5 distinct nonzero elements 2, 3, 5, 7, and 9 comprising the kernel
From the intermediate matrix
the following commutator, a tensor of rank 3, is obtained:
The matrix [T]M,N has the form of the convolution of the commutator[Z]M,N,L with the kernel [U]L:
A factorization of the original tensor which is a matrix whose rows constitute all possible permutations of a finite set of elements is carried out as follows.
For finitely many distinct nonzero elements
E={e1, e2, . . . , ek},
the matrix [T]M,N, of dimensions M×N and containing L≦M·N distinct nonzero elements, whose rows constitute a complete set of the permutations of the elements of E of length N, will contain N columns and M=kN rows:
From this matrix the kernel is obtained as the vector
consisting of all the distinct nonzero elements of the matrix [T]M,N.
From the same matrix [T]M,N the intermediate matrix
is obtained, with the same dimensions M×N as the matrix [T]M,N and with each element equal either to zero or to the index of that element of the vector [U]L which is equal in value to this element of the matrix [T]M,N. The matrix [Y]M,N may be obtained by replacing each nonzero element tm,n of the matrix [T]M,N by the index l of the equivalent element ul of the vector [U]L.
From the resulting intermediate matrix [Y]M,N the commutator,
[Z]M,N,L={Zm,n,l|mε[1,M],nε[1,N],lε[1,L]}
a tensor of rank 3, is obtained by replacing each nonzero element ym,n of the matrix [Y]M,N by the vector of length L, with all elements equal to 0 if ym,n=0, or with a single unit element in the position corresponding to the nonzero value of ym,n and L−1 elements equal to 0 in all other positions.
The resulting commutator may be written as:
The factorization of the matrix [T]M,N is of the form of the convolution of the commutator [Z]M,N,L with the kernel [U]L:
[T]M,N=[Z]M,N,L·[U]L={Σl=1L zm,n,l·ul|mε[1,M],nε[1,N]}
An example of factorization of the original tensor which is a matrix whose rows constitute all possible permutations of a finite set of elements is shown below.
The matrix
of dimensions M×N=4×3 contains L=5 distinct nonzero elements 2, 3, 5, 7, and 9 constituting the kernel
From the intermediate matrix
the following commutator, a tensor of rank 3, is obtained:
The matrix [T]M,N is equal to the convolution of the commutator [Z]M,N,L and the kernel [U]L:
The inventive method further comprises using as the original tensor a tensor which is a vector. The example of such usage is shown below.
A vector
has length N and contains L≦N distinct nonzero elements. From this vector the kernel consisting of the vector
is obtained by including the unique nonzero elements of [T]N in the vector [U]L, in arbitrary order.
From the same vector [T]N the intermediate vector
is formed, with the same dimension N as the vector [T]N and with each element equal either to zero or to the index of the element of the vector [U]L which is equal in value to this element of vector [T]N. The vector [Y]N can be obtained by replacing every nonzero element tn of the vector [T]N by the index l of the element ul of the vector [U]L that has the same value.
From the intermediate vector [Y]N the commutator
is obtained by replacing every nonzero element yn of the vector [Y]N with a row vector of length L, with a single unit element in the position with index equal to the value of yn and L−1 zero elements in all other positions. The resulting commutator is represented as:
The vector [T]N is factored as the product of the multiplication of the commutator [Z]N,L by the kernel [U]L:
An example of factorization of the original tensor which is a vector is shown below.
The vector
of length N=7 contains L=3 distinct nonzero elements, 1, 5, and 7, which yield the kernel
From the intermediate vector
the commutator
is obtained.
The factorization of the vector [T]N is the same as the product of the multiplication of the commutator [Z]N,L by the kernel [U]L:
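Since the numeric vector of this example appears only in the original drawings, the following illustrative Python sketch uses a hypothetical length-7 vector with the same distinct nonzero values 1, 5, and 7:

```python
def factor_vector(t):
    """Factor a vector into kernel [U]_L and one-hot commutator [Z]_{N,L}."""
    kernel, y = [], []
    for v in t:
        if v == 0:
            y.append(0)
        else:
            if v not in kernel:
                kernel.append(v)
            y.append(kernel.index(v) + 1)
    Z = [[1 if idx == l + 1 else 0 for l in range(len(kernel))] for idx in y]
    return kernel, Z

t = [1, 5, 0, 7, 5, 0, 1]                  # hypothetical vector, N = 7, L = 3
kernel, Z = factor_vector(t)
restored = [sum(z * u for z, u in zip(row, kernel)) for row in Z]
assert restored == t                        # [Z]·[U] reproduces the vector
```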
In the inventive method, the elements of the tensor and the vector can be single bit values, integer numbers, fixed point numbers, floating point numbers, non-numeric literals, real numbers, imaginary numbers, complex numbers represented by pairs having one real and one imaginary components, complex numbers represented by pairs having one magnitude and one angle components, quaternion numbers, and combinations thereof.
Also in the inventive method, operations with the tensor and the vector with elements being non-numeric literals can be string operations such as string concatenation operations, string replacement operations, and combinations thereof.
Finally, in the inventive method, operations with the tensor and the vector with elements being single bit values can be logical operations such as logic conjunction operations, logic disjunction operations, modulo two addition operations with their logical inversions, and combinations thereof.
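For single-bit elements, one possible choice of operations replaces the product with logic conjunction (AND) and the sum with modulo-two addition (XOR); the sketch below and its names are mine:

```python
def bit_row_times_vector(row_bits, v_bits):
    """Single-bit tensor-vector product for one row: AND as the product
    operation, XOR (modulo-two addition) as the sum operation."""
    acc = 0
    for t, v in zip(row_bits, v_bits):
        acc ^= t & v
    return acc
```

For example, bit_row_times_vector([1, 0, 1], [1, 1, 0]) evaluates to 1.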
The present invention also deals with a system for fast tensor-vector multiplication. The inventive system shown in
On
Input signal samples are supplied to the input S of size 1. Output samples come from the multichannel output c of size 32. Each channel of the output c is a corresponding element of the result of the matrix-vector multiplication or, in other words, the filtered signal samples of channels 1 to 32. Blocks uz1 . . . uz12 perform the matrix multiplication according to the kernel-multiplexer matrix decomposition.
The internal structure of the blocks uz1 . . . uz12 is shown on
All “mm” blocks (matrix multiply) do not use scalar products, since they multiply only by zeros and ones and are essentially multiplexers controlled by the corresponding elements of the multiplexer tensor.
Each of the blocks uz1 . . . uz12 takes one element of the kernel and the part of the multiplexer associated with that kernel element. An alternative implementation of the system is shown on
On
Input signal samples are supplied to the input S of size 1. Output samples come from the multichannel output c of size 128. Each channel of the output c is a corresponding element of the result of the matrix-vector multiplication or, in other words, the filtered signal samples of channels 1 to 128. Blocks uz1 . . . uz16 perform the matrix multiplication according to the kernel-multiplexer matrix decomposition. The internal structure of the blocks uz1 . . . uz16 is the same as in the 20×32 matrix multiplier.
On
Input signal samples are supplied to the input S of size 1. Output samples come from the multichannel outputs c+ and c−, each of size 1024. Each channel of the output is a corresponding element of the result of the matrix-vector multiplication or, in other words, the filtered signal samples of channels 1 to 2048. Blocks uz1 . . . uz20 perform the matrix multiplication according to the kernel-multiplexer matrix decomposition. The internal structure of the blocks uz1 . . . uz20 is the same as in the 20×32 and 28×128 matrix multipliers.
The present invention is not limited to the details shown since further modifications and structural changes are possible without departing from the main spirit of the present invention.
What is desired to be protected by Letters Patent is set forth in particular in the appended claims.
This patent application contains the subject matter of my U.S. patent application Ser. No. 13/726,367 filed on Dec. 24, 2012, which in turn claims priority of U.S. provisional application 61/723,103 filed on Nov. 6, 2012, for a method and system for fast calculation of tensor-vector multiplication, from which this patent application claims its priority under 35 USC 119(a)-(d).
Provisional application: Number 61723103 | Date Nov 2012 | Country US
Parent application: Number 13726326 | Date Dec 2012 | Country US
Child application: Number 14748541 | Country US