The present disclosure generally relates to machine learning models, and more specifically, relates to feature extraction for machine learning models.
Machine learning is a category of artificial intelligence. In machine learning, a model is defined by a machine learning algorithm. A machine learning algorithm is a mathematical and/or logical expression of a relationship between inputs to and outputs of the machine learning model. The model is trained by applying the machine learning algorithm to input data. A trained model can be applied to new instances of input data to generate model output. Machine learning model output can include a prediction, a score, or an inference, in response to a new instance of input data. Application systems can use the output of trained machine learning models to determine downstream execution decisions, such as decisions regarding various user interface functionality.
The disclosure will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the disclosure. The drawings, however, should not be taken to limit the disclosure to the specific embodiments, but are for explanation and understanding only.
Aspects of the present disclosure are directed to feature extraction using scalable wavelet transformers. The disclosed methods are useful for extracting features from digital data such as audio, images, text, video, or multimodal data, for purposes of training and/or operating machine learning models and/or creating compressed versions of the digital data.
In some cases, the input data (also referred to as features) used to train machine learning models include data sets that contain raw data, such as digital audio files, video files, image files or text files. In other cases, data sets containing raw data are not used directly as the input data (or features) but instead are processed using one or more computational techniques to create the input data (or features) that are used to train machine learning models. As data sets grow larger and more numerous, the preparation of machine learning model inputs (i.e., features) increasingly strains computing resources.
For example, if a particular data set is not represented in the inputs used to train a machine learning model, the trained machine learning model is unable to generate statistically reliable outputs for that data set. For instance, if the data set used to train a machine learning model only includes images of cats, then at inference time the machine learning model will not be able to classify images of dogs as dogs but rather might label images of dogs as “not cats.” Thus, the size, number and diversity of data sets used in training are often strongly correlated with a machine learning model's predictive reliability. However, as the size, number, and diversity of training data sets increases, both the training and inference processes of the prior machine learning model approaches, such as convolutional neural network-based approaches, become slower and less efficient.
The traditional way to improve speed and efficiency for machine learning models is to employ conventional transformers that can perform parallel processing of model inputs but do not use wavelet functions in conjunction with the parallel processing. However, conventional transformers, which are not scalable wavelet transformers, are paired with kernel operations. The kernel operations lose information during the transformation process performed by the conventional transformer. For example, a kernel operation is limited by the kernel size and can therefore capture either local features or global features, but not both: because a kernel is of a set size, conventional transformers only learn features at that specific kernel size (e.g., a large kernel extracts only global features while a small kernel extracts only local features). Local features include, for example, data patterns in small segments of data (e.g., a pattern of fur on a dog's head in a picture of the dog), whereas global features are data patterns in larger segments of data (e.g., the outline of the entire dog's body).
As a result, under conventional transformer approaches that do not use scalable wavelet transformers, either local features or global features are excluded from the machine learning model training. Additionally, even if a particular data set is included in the training, the kernel operations may only select certain values of the data set and discard the rest. As a result, potentially valuable training data is lost. Because the discarded data values cannot be recovered, this loss of information during model training is irreversible, and the machine learning model would need to be retrained with training data that reflects the lost information.
Aspects of the present disclosure address the above and other deficiencies by configuring scalable wavelet transformers to perform feature extraction (e.g., the process of extracting features from data sets). By using multiple layers of wavelet filters as well as transformers, the machine learning system is able to learn more comprehensive information about the input data because both local and global features can be used instead of just local features or just global features, and both spectral and temporal features can be used instead of just spectral or just temporal features. As a result, the disclosed feature extraction techniques produce features that contain both spectral and temporal features as well as both local and global features. For example, spectral features are data features in the frequency domain (e.g., patterns in the frequency of pixel colors in an image; for instance, the color green occurs ten times more often than the color red in a particular segment of an image), whereas temporal features are data features in the time domain (e.g., the pattern of the actual pixels making up the image; for instance, green appears in one segment but not in another, neighboring segment of an image).
The disclosed approaches improve upon conventional approaches because they enable reversible wavelet transformation operations that can extract both spectral and temporal features (e.g., without omitting either the spectral features or the temporal features) as well as both local and global features (e.g., without omitting either the local features or the global features). For example, conventional transformers using kernel operations lose information in the kernel operations whereas wavelet filters preserve the information and can therefore be reversed. The improvements to feature extraction, which are provided by the disclosed approaches but not by the prior approaches, improve the efficiency of the training process on large data sets and enable the trained machine learning models to produce statistically reliable predictive outputs without sacrificing speed or efficiency. As a result, the disclosed approaches can be utilized in resource constrained computing environments as well as non-resource constrained environments.
In the embodiment of
User system 110 includes at least one computing device, such as a personal computing device, a server, a mobile computing device, or a smart appliance. User system 110 includes at least one software application, including a user interface 112, installed on or accessible by a network to a computing device. For example, user interface 112 can be or include a front-end portion of application software system 130.
User interface 112 is any type of user interface as described above. User interface 112 can be used to input search queries and view or otherwise perceive output that includes data produced by application software system 130. For example, user interface 112 can include a graphical user interface and/or a conversational voice/speech interface that includes a mechanism for entering a search query and viewing query results and/or other digital content. Examples of user interface 112 include web browsers, command line interfaces, and mobile apps. User interface 112 as used herein can include application programming interfaces (APIs).
Network 120 can be implemented on any medium or mechanism that provides for the exchange of data, signals, and/or instructions between the various components of computing system 100. Examples of network 120 include, without limitation, a Local Area Network (LAN), a Wide Area Network (WAN), an Ethernet network or the Internet, or at least one terrestrial, satellite or wireless link, or a combination of any number of different networks and/or communication links.
Application software system 130 is any type of application software system that includes or utilizes functionality and/or outputs provided by wavelet transformer component 150, sequence generation component 160, and/or machine learning system 170. Examples of application software system 130 include but are not limited to online services including connections network software, such as social media platforms, and systems that are or are not based on connections network software, such as general-purpose search engines, content distribution systems including media feeds, bulletin boards, and messaging systems, special purpose software such as but not limited to job search software, recruiter search software, sales assistance software, advertising software, learning and education software, enterprise systems, customer relationship management (CRM) systems, or any combination of any of the foregoing.
A client portion of application software system 130 can operate in user system 110, for example as a plugin or widget in a graphical user interface of a software application or as a web browser executing user interface 112. In an embodiment, a web browser can transmit an HTTP request over a network (e.g., the Internet) in response to user input that is received through a user interface provided by the web application and displayed through the web browser. A server running application software system 130 and/or a server portion of application software system 130 can receive the input, perform at least one operation using the input, and return output using an HTTP response that the web browser receives and processes.
Data store 140 can include any combination of different types of memory devices. Data store 140 stores digital data used by user system 110, application software system 130, wavelet transformer component 150, sequence generation component 160, and machine learning system 170. Data store 140 can reside on at least one persistent and/or volatile storage device that can reside within the same local network as at least one other device of computing system 100 and/or in a network that is remote relative to at least one other device of computing system 100. Thus, although depicted as being included in computing system 100, portions of data store 140 can be part of computing system 100 or accessed by computing system 100 over a network, such as network 120.
The computing system 100 includes a wavelet transformer component 150 that can apply wavelet transforms to digital data for the purpose of extracting features of the digital data. In some embodiments, the application software system 130 includes at least a portion of the wavelet transformer component 150. As shown in
The wavelet transformer component 150 can apply wavelet transforms to digital data and extract features for use in machine learning models. The disclosed technologies can be described with reference to an example use case of transforming data for image classification for use in a ranking machine learning model; for example, ranking search results including classified images in a social graph application such as a professional social network application. The disclosed technologies are not limited to social graph applications but can be used to compress and classify data more generally. The disclosed technologies can be used by many different types of network-based applications in which compression and/or classification are useful, for example, data compression, image captioning, natural language processing, image classification, and content moderation.
The computing system 100 includes a sequence generation component 160 that can generate sequences from digital data for use in feature extraction of the digital data. In some embodiments, the application software system 130 includes at least a portion of the sequence generation component 160. As shown in
The sequence generation component 160 can divide digital data into patches and generate sequences using the patches of divided data. Patches are subdivisions of digital data (e.g., a group of pixels in an image file) used as inputs into a sequence generation component (e.g., sequence generation component 160) for subsequent transformation operations. Sequences are patches with positional embeddings used as inputs into a transformer component (e.g., wavelet transformer component 150). Further details with regard to the definition and use of patches and sequences are described with reference to
Further details with regards to the operations of the wavelet transformer component 150 and sequence generation component 160 are described below.
Each of user system 110, application software system 130, data store 140, wavelet transformer component 150, sequence generation component 160, and machine learning system 170 is implemented using at least one computing device that is communicatively coupled to electronic communications network 120. Any of user system 110, application software system 130, data store 140, wavelet transformer component 150, sequence generation component 160, and machine learning system 170 can be bidirectionally communicatively coupled by network 120. User system 110 as well as one or more different user systems (not shown) can be bidirectionally communicatively coupled to application software system 130.
A typical user of user system 110 can be an administrator or end user of application software system 130, wavelet transformer component 150, sequence generation component 160, and machine learning system 170. User system 110 is configured to communicate bidirectionally with any of application software system 130, data store 140, wavelet transformer component 150, sequence generation component 160, and machine learning system 170 over network 120.
While not specifically shown, it should be understood that any of user system 110, application software system 130, data store 140, wavelet transformer component 150, sequence generation component 160, and machine learning system 170 includes an interface embodied as computer programming code stored in computer memory that when executed causes a computing device to enable bidirectional communication with any other of user system 110, application software system 130, data store 140, wavelet transformer component 150, sequence generation component 160, and machine learning system 170 using a communicative coupling mechanism. Examples of communicative coupling mechanisms include network interfaces, inter-process communication (IPC) interfaces and application program interfaces (APIs).
The features and functionality of user system 110, application software system 130, data store 140, wavelet transformer component 150, sequence generation component 160, and machine learning system 170 are implemented using computer software, hardware, or software and hardware, and can include combinations of automated functionality, data structures, and digital data, which are represented schematically in the figures. User system 110, application software system 130, data store 140, wavelet transformer component 150, sequence generation component 160, and machine learning system 170 are shown as separate elements in
As shown in
In some embodiments, sequence generation component 160 determines the size and/or number of patches 212, 214, 216, and 218. For example, sequence generation component 160 receives metadata 205 from application software system 130 and uses metadata 205 to determine the size and/or number of patches 212, 214, 216, and 218. In some embodiments, metadata 205 includes a data type and sequence generation component 160 uses the data type to determine the size and/or number of patches 212, 214, 216, and 218. For example, the patch size for an image may be predetermined while sequence generation component 160 determines the patch sizes for text (e.g., determines a patch as a subdivision of the text, such as a patch for each word or sentence, with different size patches corresponding to different word or sentence lengths). In some embodiments, digital data 210 includes multiple data types and sequence generation component 160 separates the different data types of digital data 210 using metadata 205.
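As a non-limiting illustration, the following sketch (written in Python with NumPy) shows one way digital data such as an image could be divided into non-overlapping patches. The helper name divide_into_patches and the fixed patch_size parameter are hypothetical and are not required by any embodiment; as described above, the patch size could instead be derived from metadata 205.

```python
import numpy as np

def divide_into_patches(image: np.ndarray, patch_size: int) -> list:
    """Split an image into non-overlapping patch_size x patch_size patches.

    The fixed patch_size used here is a hypothetical illustration; the size
    could instead be derived from metadata such as the data type.
    """
    height, width = image.shape[:2]
    patches = []
    for row in range(0, height - patch_size + 1, patch_size):
        for col in range(0, width - patch_size + 1, patch_size):
            patches.append(image[row:row + patch_size, col:col + patch_size])
    return patches

# Example: a 4 x 4 "image" divided into four 2 x 2 patches.
image = np.arange(16).reshape(4, 4)
patches = divide_into_patches(image, patch_size=2)
assert len(patches) == 4
```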
In some embodiments, digital data 210 includes video data and sequence generation component 160 samples the video data and divides each sample of the video data into patches. In some embodiments, sequence generation component 160 randomly samples the video data. In other embodiments, sequence generation component 160 uses a representative sample (e.g., a thumbnail) to represent the video. In still other embodiments, sequence generation component 160 samples the video data at a certain rate (e.g., ten samples per second of video).
In some embodiments, digital data 210 includes text data and sequence generation component 160 prepares the text data for sequence generation by performing tokenization on the patches containing text data (e.g., patches 212, 214, 216, and 218) to transform the text data into a numerical representation that can be processed by wavelet transformer component 150 (e.g., a one-hot encoded numerical 2D array).
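As a non-limiting illustration of the tokenization step, the following sketch (written in Python with NumPy) shows how text data in a patch could be transformed into a one-hot encoded numerical 2D array. The whitespace tokenizer and the small fixed vocabulary are illustrative simplifications rather than features of any embodiment.

```python
import numpy as np

def one_hot_encode(text: str, vocabulary: list) -> np.ndarray:
    """Tokenize text on whitespace and one-hot encode each token.

    Returns a 2D array of shape (num_tokens, vocabulary_size); tokens outside
    the vocabulary are left as all-zero rows.
    """
    index = {word: position for position, word in enumerate(vocabulary)}
    tokens = text.lower().split()
    encoded = np.zeros((len(tokens), len(vocabulary)))
    for row, token in enumerate(tokens):
        if token in index:
            encoded[row, index[token]] = 1.0
    return encoded

encoded_patch = one_hot_encode("the quick brown fox", ["the", "quick", "brown", "fox"])
assert encoded_patch.shape == (4, 4)
```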
Sequence generation component 160 performs a positional embedding operation on patches 212, 214, 216, and 218 to determine positional embeddings for each of patches 212, 214, 216, and 218 and combine the positional embeddings with their associated patches to generate embedded patches 213, 215, 217, and 219. For example, the positional embedding for patch 212 includes information about the position of patch 212 in the larger set of digital data 210 (e.g., identifying its place in the top left corner). The positional embedding therefore contains information about where each patch fits in the larger set of digital data 210. This positional information is useful for understanding context for the patches since wavelet transformer component 150 does not process embedded patches 213, 215, 217, and 219 sequentially. For example, digital data 210 is an image and patches 212, 214, 216, and 218 are groups of pixels in the image. The placement of the groups of pixels in the image is tracked by positional embeddings such that embedded patches 213, 215, 217, and 219 contain information on the groups of pixels as well as their location within the image.
Sequence generation component 160 combines embedded patches 213, 215, 217, and 219 to produce sequence 220. Sequence 220 represents the original digital data 210 split into smaller patches with associated positional embeddings. By producing sequence 220, sequence generation component 160 is able to preserve local features expressed in each of the patches 212, 214, 216, and 218 while also preserving global features expressed in digital data 210 and preserved through the positional embeddings. For example, sequence 220 includes global features since sequence 220 includes all data in digital data 210 (with positions indicated through positional embeddings) but also includes local features since sequence 220 is made up of subdivisions of digital data 210 (e.g., patches 212, 214, 216, and 218). Sequence generation component 160 sends sequence 220 to wavelet transformer component 150. In some embodiments, sequence generation component 160 sends embedded patches 213, 215, 217, and 219 to wavelet transformer component 150 rather than sequence 220.
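As a non-limiting illustration, the following sketch (written in Python with NumPy) shows how positional embeddings could be combined with patches to form a sequence. The flattening of patches and the randomly initialized (learnable) positional embeddings are illustrative assumptions; the embodiment only requires that each embedded patch carry information about its position within the original digital data.

```python
import numpy as np

def generate_sequence(patches: list, seed: int = 0) -> np.ndarray:
    """Flatten each patch, add a positional embedding, and stack into a sequence."""
    rng = np.random.default_rng(seed)
    flat = np.stack([patch.reshape(-1).astype(float) for patch in patches])
    positional_embeddings = rng.normal(scale=0.02, size=flat.shape)  # one per position
    embedded_patches = flat + positional_embeddings
    return embedded_patches  # sequence of shape (num_patches, patch_dim)

example_patches = [np.ones((2, 2)), np.zeros((2, 2)), np.eye(2), 2 * np.eye(2)]
sequence = generate_sequence(example_patches)
assert sequence.shape == (4, 4)
```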
As shown in
Wavelet transformer component 150 takes the output of first transformer block 305 (e.g., transformed patches) and inputs it into first wavelet block 310. First wavelet block 310 receives transformed patches of sequence 220 and performs wavelet compression on sequence 220 to reduce the size of sequence 220. For example, as shown in
In some embodiments, wavelet filter 420 includes a high pass and/or a low pass wavelet filter. In embodiments where wavelet filter 420 includes a low pass wavelet filter, wavelet filter 420 attenuates sudden changes in the input, creating an averaged output (as an illustrative example, imagine blurring the edges of an image). In embodiments where wavelet filter 420 includes a high pass wavelet filter, wavelet filter 420 accentuates the sudden changes in the input (e.g., an edge detector). In embodiments where both a low pass and a high pass filter are used, the subsequent applications of transformer and wavelet blocks (as illustrated in
In some embodiments, as illustrated in
In some embodiments, the filter values of α and β are updated through backpropagation. For example, the values of α and β are updated in response to a machine learning model being trained (e.g., model building 515). In such an example, the filter values of α and β are updated based on the loss of a machine learning model (such as loss 535 of
In some embodiments, first filter dimension 425 is less than first patch dimension 410 and second filter dimension 430 is the same as first patch dimension 410. For example, transformed patch 405 is a vector of length N with first patch dimension 410 of N and second patch dimension 415 of 1. In such embodiments, wavelet filter 420 is therefore a matrix of size N/2 by N.
In some embodiments, first wavelet block 310 transposes transformed patch 405 to align the dimensions of transformed patch 405 with the dimensions of wavelet filter 420 for matrix multiplication. For example, transformed patch 405 is represented as a matrix A of length N and wavelet filter 420 is represented as matrix L with dimensions N/2×N. For a product to be defined for the matrix multiplication operation, first wavelet block 310 transposes matrix A into an N×1 column vector before multiplication. For example, first wavelet block 310 constructs matrix B such that B = Aᵀ and performs matrix multiplication to compute filtered patch 435, represented by matrix C, such that C = L·B. The resulting dimensions of filtered patch 435, represented by matrix C, are therefore N/2 by 1, or half the size of transformed patch 405.
Although only one wavelet filter is illustrated, multiple wavelet filters may be used. For example, first wavelet block 310 includes a first wavelet filter used as a low pass filter and a second wavelet filter used as a high pass filter. Wavelet filter 420, represented by matrix L, may therefore be a low pass filter, and a second wavelet filter, not illustrated, is represented by matrix H as a high pass filter. In such embodiments, first wavelet block 310 performs matrix multiplication to compute a second filtered patch represented by matrix D, such that D = H·B. The resulting dimensions of the second filtered patch, represented by matrix D, are therefore N/2 by 1, the same as the dimensions of filtered patch 435 and half the size of transformed patch 405. In this way, first wavelet block 310 compresses transformed patch 405 to be half its original size.
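As a non-limiting illustration, the following sketch (written in Python with NumPy) constructs N/2-by-N filter matrices L and H with filter values placed on staggered diagonals and applies them to a transformed patch of length N. The Haar-style values α = β = 1/√2 for the low pass filter and α = 1/√2, β = −1/√2 for the high pass filter are illustrative initializations only; as noted above, the filter values may instead be learned through backpropagation.

```python
import numpy as np

def wavelet_filter(n: int, alpha: float, beta: float) -> np.ndarray:
    """Build an (n/2) x n filter matrix with alpha and beta on staggered diagonals."""
    filt = np.zeros((n // 2, n))
    for row in range(n // 2):
        filt[row, 2 * row] = alpha
        filt[row, 2 * row + 1] = beta
    return filt

n = 8
transformed_patch = np.arange(n, dtype=float).reshape(1, n)  # matrix A, a row of length N

low_pass = wavelet_filter(n, alpha=1 / np.sqrt(2), beta=1 / np.sqrt(2))    # matrix L (N/2 x N)
high_pass = wavelet_filter(n, alpha=1 / np.sqrt(2), beta=-1 / np.sqrt(2))  # matrix H (N/2 x N)

b = transformed_patch.T          # B = A transposed, shape (N, 1)
filtered_low = low_pass @ b      # C = L . B, shape (N/2, 1)
filtered_high = high_pass @ b    # D = H . B, shape (N/2, 1)
assert filtered_low.shape == (n // 2, 1) and filtered_high.shape == (n // 2, 1)
```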
In some embodiments, first wavelet block 310 combines the filtered patches including filtered patch 435 to create a combined patch. For example, first wavelet block 310 uses softmax or a similar function to convert the values of matrix D from numbers to probability values, where the probability value for each number is proportional to the relative scale of each value in matrix D. For example, first wavelet block 310 uses softmax to generate filtered probability patch matrix E, such that E = softmax(D). In some embodiments, first wavelet block 310 performs an element-wise product operation of matrix E with matrix C. For example, first wavelet block 310 performs a Hadamard product of matrix E with matrix C to generate a combined patch (e.g., a portion of combined sequence 240 of
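As a non-limiting illustration, the following sketch (written in Python with NumPy, using random stand-ins for the low pass output C and the high pass output D) shows the softmax conversion and the element-wise (Hadamard) product that produce a combined patch.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over all elements of x."""
    shifted = np.exp(x - np.max(x))
    return shifted / np.sum(shifted)

rng = np.random.default_rng(0)
filtered_low = rng.normal(size=(4, 1))   # stand-in for matrix C (low pass output)
filtered_high = rng.normal(size=(4, 1))  # stand-in for matrix D (high pass output)

probability_patch = softmax(filtered_high)          # E = softmax(D)
combined_patch = probability_patch * filtered_low   # Hadamard product of E and C
assert combined_patch.shape == (4, 1)               # the combined patch keeps the N/2 x 1 shape
```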
As explained with reference to
As shown in
In some embodiments, model building 515 is a component for training, validating, and executing a machine learning model. In some embodiments, model building 515 is a component for training, validating, and executing a neural network, such as a Bayesian neural network, which classifies inputs based on training data 510. For example, model building 515 uses training data 510 as inputs and creates a neural network with hidden layers such as probabilistic layer 520. Model building 515 generates prediction 530 using probabilistic layer 520. For example, model building 515 calculates gradients and applies backpropagation to training data 510. Prediction 530 is a predicted classification for training data 510. Model building 515 compares prediction 530 to actual 525, which is the actual classification for digital data 210. For example, training data 510 may include a transformed/filtered version of a picture of a dog with an associated classification.
Model building 515 generates loss 535 based on the difference between actual 525 (the actual classification) and prediction 530 (the predicted classification). For example, model building 515 generates loss 535 based on whether the training data 510 is correctly classified with the one or more input classifiers associated with digital data 210 and therefore trains the machine learning model to identify the classification for digital data 210 based on its transformed/filtered version (e.g., combined sequence 240). In some embodiments, the loss is a validation loss. In some embodiments, model building 515 determines whether the validation loss (i.e., loss 535) satisfies a validation loss threshold. The validation loss threshold is a threshold that determines an acceptable accuracy for model building 515. In some embodiments, if the validation loss exceeds the validation loss threshold, model building 515 sends the trained prediction model 550 to model rewriter 545. In other embodiments, if the validation loss exceeds the validation loss threshold, model building 515 sends the trained prediction model 550 to a model serving system through network 120.
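As a non-limiting illustration, the following sketch (written in Python with NumPy) shows a comparison of predicted and actual classifications followed by a validation loss threshold check. The cross-entropy loss, the model_predict callable, and the stand-in validation set are hypothetical choices and are not specified by the embodiment.

```python
import numpy as np

def cross_entropy(predicted: np.ndarray, actual: np.ndarray) -> float:
    """Loss between predicted class probabilities and a one-hot actual classification."""
    return float(-np.sum(actual * np.log(predicted + 1e-12)))

def satisfies_threshold(model_predict, validation_data, loss_threshold: float) -> bool:
    """Return True if the mean validation loss falls below the validation loss threshold."""
    losses = [cross_entropy(model_predict(example), label) for example, label in validation_data]
    return float(np.mean(losses)) < loss_threshold

# Hypothetical usage: a stand-in "model" that always predicts the first class.
def always_first_class(_example):
    return np.array([0.9, 0.1])

validation_data = [(None, np.array([1.0, 0.0])), (None, np.array([0.0, 1.0]))]
ready_to_serve = satisfies_threshold(always_first_class, validation_data, loss_threshold=1.5)
```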
In some embodiments, model rewriter 545 receives trained prediction model 550 and rewrites the trained prediction model 550 to generate a new machine learning model with the same structure and weights as trained prediction model 550. In some embodiments, model rewriter 545 again rewrites the new machine learning model to generate a machine learning model to be served online. Model rewriter 545 sends the machine learning model to network 120 for distribution and execution.
As shown in
In the embodiments of
Deep tower 620 includes an embeddings layer 625 and a probabilistic layer 630. For example, input features 605 of deep tower 620 are converted from high-dimensional features to a low-dimensional, real-valued vector (i.e., an embedded vector) in embeddings layer 625. In some embodiments, embeddings layer 625 sends the embedded vectors to probabilistic layer 630. Probabilistic layer 630 performs computations on the embedded vectors to generate model weights, which are updated based on loss 645.
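As a non-limiting illustration, the following sketch (written in Python with NumPy) shows how high-dimensional features could be converted to low-dimensional, real-valued embedded vectors. The use of sparse categorical feature identifiers and a learnable lookup table is one common realization of an embeddings layer, assumed here for illustration only.

```python
import numpy as np

def embeddings_layer(feature_ids: list, embedding_table: np.ndarray) -> np.ndarray:
    """Map sparse, high-dimensional categorical features to low-dimensional,
    real-valued embedded vectors by lookup in a learnable embedding table."""
    return embedding_table[feature_ids]  # shape: (num_features, embedding_dim)

rng = np.random.default_rng(0)
embedding_table = rng.normal(scale=0.02, size=(1000, 16))  # 1000 feature ids -> 16-dim vectors
embedded_vectors = embeddings_layer([3, 42, 977], embedding_table)
assert embedded_vectors.shape == (3, 16)
```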
In some embodiments, embeddings layer 625 sends the embedded vectors to sequence generation component 160. Sequence generation component 160 generates a sequence (e.g., sequence 220) from these embedded vectors by dividing the embedded vectors into patches (e.g., patches 212, 214, 216, and 218) and generating embedded patches (e.g., embedded patches 213, 215, 217, and 219) by combining the patches with positional embeddings of the embedded vectors. Further details regarding the operations of sequence generation component 160 are explained with reference to
Wavelet transformer component 150 performs transformation, filtering, and combination operations on the sequence to generate a combined sequence (e.g., combined sequence 240). Further details regarding the operations of wavelet transformer component 150 are explained with reference to
In some embodiments, if loss 645 exceeds a loss threshold, model training component 505 sends the trained model to model rewriter 545. In other embodiments, if loss 645 exceeds the loss threshold, model training component 505 sends the trained model to a model serving system through network 120.
In some embodiments, model rewriter 545 receives the trained model and rewrites it to generate a new machine learning model with the same structure and weights as the trained model. In some embodiments, model rewriter 545 again rewrites the new machine learning model to generate a machine learning model to be served online. Model rewriter 545 sends the machine learning model to network 120 for distribution and execution.
At operation 705, the processing device receives digital data. For example, sequence generation component 160 receives digital data 210 which can include one or more of audio, text, image, and/or video data. In some embodiments, the processing device receives the digital data from a data store, such as data store 140 of
At operation 710, the processing device divides the digital data into patches. For example, sequence generation component 160 divides digital data 210 into patches 212, 214, 216, and 218. In some embodiments, the number of patches is predetermined, and the size of the patches depends on the predetermined number of patches and the size of digital data. In other embodiments, the size of the patches is predetermined, and the number of patches depends on the predetermined size of patches and the size of the digital data. Further details regarding the operations of dividing the digital data into patches are explained with reference to
At operation 715, the processing device generates embedded patches using patches and positional embeddings. For example, sequence generation component 160 performs a positional embedding operation on patches 212, 214, 216, and 218 to determine positional embeddings and combine the positional embeddings with their associated patches to generate embedded patches 213, 215, 217, and 219. Further details regarding the operations of generating embedded patches are explained with reference to
At operation 720, the processing device generates transformed patches. For example, wavelet transformer component 150 inputs embedded patches 213, 215, 217, and 219 of sequence 220 into first transformer block 305. In some embodiments, the processing device adds a learnable classification embedding to the sequence including the embedded patches before inputting the embedded patches into the transformer block. The transformer block includes a deep learning model that uses self-attention to weigh the significance of the embedded patches with reference to the sequence as a whole. Further details regarding the operations of generating transformed patches are explained with reference to
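As a non-limiting illustration, the following sketch (written in Python with NumPy) shows a single-head self-attention computation with randomly initialized projection matrices and a prepended classification embedding. A complete transformer block would typically also include feed-forward layers, residual connections, and normalization, which are omitted here for brevity.

```python
import numpy as np

def softmax_rows(x: np.ndarray) -> np.ndarray:
    """Row-wise numerically stable softmax."""
    shifted = np.exp(x - x.max(axis=-1, keepdims=True))
    return shifted / shifted.sum(axis=-1, keepdims=True)

def self_attention(sequence: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
    """Single-head self-attention: each embedded patch attends to every other patch,
    weighing its significance with reference to the sequence as a whole."""
    queries, keys, values = sequence @ wq, sequence @ wk, sequence @ wv
    scores = softmax_rows(queries @ keys.T / np.sqrt(keys.shape[-1]))
    return scores @ values

rng = np.random.default_rng(0)
dim = 16
embedded_patches = rng.normal(size=(4, dim))                     # a sequence of four embedded patches
class_embedding = rng.normal(size=(1, dim))                      # learnable classification embedding
sequence = np.concatenate([class_embedding, embedded_patches])   # prepended before the transformer block
wq, wk, wv = (rng.normal(size=(dim, dim)) for _ in range(3))
transformed_patches = self_attention(sequence, wq, wk, wv)
assert transformed_patches.shape == (5, dim)
```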
At operation 725, the processing device creates filtered patches by applying a wavelet filter to the transformed patches. For example, first wavelet block 310 transposes transformed patch 405 and performs matrix multiplication using transformed patch 405 and wavelet filter 420 to compute filtered patch 435. In some embodiments, the processing device creates multiple filtered patches for a single transformed patch by performing matrix multiplication using the transformed patch and multiple wavelet filters. Further details regarding the operations of creating filtered patches are explained with reference to
At operation 730, the processing device creates combined patches by combining filtered patches. For example, first wavelet block 310 uses softmax or a similar function to convert values in one or more of the filtered patches from numbers to probability values, where the probability value for each number is proportional to the relative scale of each value. In some embodiments, the processing device performs an element-wise product operation of the generated probability matrix with one or more of the other filtered patches. For example, first wavelet block 310 performs a Hadamard product of the probability matrix and a filtered patch to generate a combined patch (e.g., a portion of combined sequence 240 of
At operation 735, the processing device generates a set of training data using the combined patches. For example, model training component 505 generates training data 510 including combined sequence 240 as well as one or more input classifiers identifying digital data 210 that combined sequence 240 was generated from. Further details regarding the operations of generating a set of training data are explained with reference to
At operation 740, the processing device generates a trained prediction model. For example, model building 515 uses training data 510 as inputs and trains a neural network with hidden layers such as probabilistic layer 520. The processing device generates a prediction for the training data using the probabilistic layer and compares the prediction to the actual classification to generate a loss. The processing device then updates the trained prediction model based on the generated loss. Further details regarding the operations of generating a trained prediction model are explained with reference to
At operation 745, the processing device applies the trained prediction model to a set of execution data. For example, the processing device uses the execution data as inputs to the trained prediction model. In some embodiments, the processing device inputs execution data into a multitower model such as the model shown in model training component 505 of
At operation 750, the processing device determines output based on the trained prediction model. For example, the processing device determines one or more output classifiers for execution data based on the output of the trained prediction model. In some embodiments, the processing device determines an output from a multitower prediction model such as the model shown in model training component 505 of
At operation 805, the processing device receives digital data. For example, sequence generation component 160 receives digital data 210 which can include one or more of audio, text, image, and/or video data. In some embodiments, the processing device receives the digital data from a data store, such as data store 140 of
At operation 810, the processing device generates embedded patches. For example, sequence generation component 160 divides digital data 210 into patches 212, 214, 216, and 218 and performs a positional embedding operation on patches 212, 214, 216, and 218 to determine positional embeddings and combine the positional embeddings with their associated patches to generate embedded patches 213, 215, 217, and 219. Further details regarding the operations of generating embedded patches are explained with reference to
At operation 815, the processing device generates a transformed patch. For example, wavelet transformer component 150 inputs embedded patches 213, 215, 217, and 219 of sequence 220 into first transformer block 305. In some embodiments, the processing device adds a learnable classification embedding to the sequence including the embedded patches before inputting the embedded patches into the transformer block. The transformer block includes a deep learning model that uses self-attention to weigh the significance of the embedded patches with reference to the sequence as a whole. Further details regarding the operations of generating transformed patches are explained with reference to
At operation 820, the processing device creates filtered patches by applying wavelet filters. For example, first wavelet block 310 transposes transformed patch 405 and performs matrix multiplication using transformed patch 405 and wavelet filter 420 to compute filtered patch 435. In some embodiments, the processing device creates multiple filtered patches for a single transformed patch by performing matrix multiplication using the transformed patch and multiple wavelet filters. Further details regarding the operations of creating filtered patches are explained with reference to
At operation 825, the processing device creates a combined patch. For example, first wavelet block 310 uses softmax or a similar function to convert values in one or more of the filtered patches from numbers to probability values, where the probability value for each number is proportional to the relative scale of each value. In some embodiments, the processing device performs an element-wise product operation of the generated probability matrix with one or more of the other filtered patches. For example, first wavelet block 310 performs a Hadamard product of the probability matrix and a filtered patch to generate a combined patch (e.g., a portion of combined sequence 240 of
At operation 830, the processing device generates a set of training data. For example, model training component 505 generates training data 510 including combined sequence 240 as well as one or more input classifiers identifying digital data 210 that combined sequence 240 was generated from. Further details regarding the operations of generating a set of training data are explained with reference to
At operation 835, the processing device generates a trained prediction model. For example, model building 515 uses training data 510 as inputs and trains a neural network with hidden layers such as probabilistic layer 520. The processing device generates a prediction for the training data using the probabilistic layer and compares the prediction to the actual classification to generate a loss. The processing device then updates the trained prediction model based on the generated loss. Further details regarding the operations of generating a trained prediction model are explained with reference to
The machine can be a personal computer (PC), a smart phone, a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a server, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.
The example computer system 900 includes a processing device 902, a main memory 904 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM) such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), a memory 906 (e.g., flash memory, static random access memory (SRAM), etc.), an input/output system 910, and a data storage system 940, which communicate with each other via a bus 930.
Processing device 902 represents one or more general-purpose processing devices such as a microprocessor, a central processing unit, or the like. More particularly, the processing device can be a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets, or processors implementing a combination of instruction sets. Processing device 902 can also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. The processing device 902 is configured to execute instructions 912 for performing the operations and steps discussed herein.
The computer system 900 can further include a network interface device 908 to communicate over the network 920. Network interface device 908 can provide a two-way data communication coupling to a network. For example, network interface device 908 can be an integrated-services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, network interface device 908 can be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links can also be implemented. In any such implementation network interface device 908 can send and receive electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information.
The network link can provide data communication through at least one network to other data devices. For example, a network link can provide a connection to the world-wide packet data communication network commonly referred to as the “Internet,” for example through a local network to a host computer or to data equipment operated by an Internet Service Provider (ISP). Local networks and the Internet use electrical, electromagnetic, or optical signals that carry digital data to and from computer system 900.
Computer system 900 can send messages and receive data, including program code, through the network(s) and network interface device 908. In the Internet example, a server can transmit a requested code for an application program through the Internet and network interface device 908. The received code can be executed by processing device 902 as it is received, and/or stored in data storage system 940, or other non-volatile storage for later execution.
The input/output system 910 can include an output device, such as a display, for example a liquid crystal display (LCD) or a touchscreen display, for displaying information to a computer user, or a speaker, a haptic device, or another form of output device. The input/output system 910 can include an input device, for example, alphanumeric keys and other keys configured for communicating information and command selections to processing device 902. An input device can, alternatively or in addition, include a cursor control, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processing device 902 and for controlling cursor movement on a display. An input device can, alternatively or in addition, include a microphone, a sensor, or an array of sensors, for communicating sensed information to processing device 902. Sensed information can include voice commands, audio signals, geographic location information, and/or digital imagery, for example.
The data storage system 940 can include a machine-readable storage medium 942 (also known as a computer-readable medium) on which is stored one or more sets of instructions 912 or software embodying any one or more of the methodologies or functions described herein. The instructions 912 can also reside, completely or at least partially, within the main memory 904 and/or within the processing device 902 during execution thereof by the computer system 900, the main memory 904 and the processing device 902 also constituting machine-readable storage media.
In one embodiment, the instructions 912 include instructions to implement functionality corresponding to a wavelet transformer component (e.g., the wavelet transformer component 150 of
Some portions of the preceding detailed descriptions have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the ways used by those skilled in the data processing arts to effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, images, audio, videos or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. The present disclosure can refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage systems.
The present disclosure also relates to an apparatus for performing the operations herein. This apparatus can be specially constructed for the intended purposes, or it can include a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. For example, a computer system or other data processing system, such as the computing system 100, 200, 500, and/or 600, can carry out the computer-implemented methods 700 and 800 in response to its processor executing a computer program (e.g., a sequence of instructions) contained in a memory or other non-transitory machine-readable storage medium. Such a computer program can be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems can be used with programs in accordance with the teachings herein, or it can prove convenient to construct a more specialized apparatus to perform the method. The structure for a variety of these systems will appear as set forth in the description below. In addition, the present disclosure is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages can be used to implement the teachings of the disclosure as described herein.
The present disclosure can be provided as a computer program product, or software, that can include a machine-readable medium having stored thereon instructions, which can be used to program a computer system (or other electronic devices) to perform a process according to the present disclosure. A machine-readable medium includes any mechanism for storing information in a form readable by a machine (e.g., a computer). In some embodiments, a machine-readable (e.g., computer-readable) medium includes a machine (e.g., a computer) readable storage medium such as a read only memory (“ROM”), random access memory (“RAM”), magnetic disk storage media, optical storage media, flash memory components, etc.
Illustrative examples of the technologies disclosed herein are provided below. An embodiment of the technologies may include any of the examples described below, or a combination of them.
An example 1 includes receiving digital data, generating embedded patches using the digital data, generating a transformed patch by applying a transformer block to an embedded patch of the embedded patches, creating filtered patches for the transformed patch by applying wavelet filters to the transformed patch, creating a combined patch by combining the filtered patches, generating a set of training data using the combined patch, and generating a trained prediction model by applying a prediction model to the set of training data.
An example 2 includes the subject matter of example 1 where generating the embedded patches includes dividing the digital data into patches and generating the embedded patches by combining the patches with positional embeddings for the patches. An example 3 includes the subject matter of example 2, where the digital data includes metadata and dividing the digital data into the patches uses the metadata to determine a patch size of a patch. An example 4 includes the subject matter of example 3, where the metadata includes a data type including at least one of text, audio, image, or video and where dividing the digital data into the patches uses the data type to determine the patch size. An example 5 includes the subject matter of any of examples 1-4, further including generating the wavelet filters, where a wavelet filter of the wavelet filters includes a first filter dimension and a second filter dimension, the transformed patch includes a first patch dimension and a second patch dimension, and a size of the second filter dimension is less than a size of the first patch dimension. An example 6 includes the subject matter of example 5, where the size of the second filter dimension is half the size of the first patch dimension and where generating the wavelet filter includes determining a first filter value and a second filter value and generating a diagonal constant matrix with the first filter value and the second filter value on diagonals of the diagonal constant matrix. An example 7 includes the subject matter of example 6, further including determining one or both of the first filter value and the second filter value based on an output of the trained prediction model. An example 8 includes the subject matter of any of examples 6 and 7, where the wavelet filters include a first wavelet filter and a second wavelet filter, the method further including generating the first wavelet filter as a high pass wavelet filter, generating the second wavelet filter as a low pass wavelet filter, where the filtered patches include a first filtered patch, created by applying the first wavelet filter to the transformed patch and a second filtered patch, created by applying the second wavelet filter to the transformed patch, and creating a combined patch by combining the first filtered patch and the second filtered patch. An example 9 includes the subject matter of example 8, where creating the combined patch includes converting the first filtered patch into a filtered probability patch, where probability values of the filtered probability patch are proportional to a scale of values of the first filtered patch and creating the combined patch by applying an element-wise product operation to the filtered probability patch and the second filtered patch. An example 10 includes the subject matter of any of examples 1-9, where the digital data is identified by one or more input classifiers and generating the set of training data further uses the one or more input classifiers, the method further including applying the trained prediction model to a set of execution data and determining, by the trained prediction model, an output based on the set of execution data, where the output includes one or more output classifiers identifying the set of execution data.
An example 11 includes a system including at least one memory device and a processing device operatively coupled with the at least one memory device, the processing device to receive digital data, generate embedded patches using the digital data, generate a transformed patch by applying a transformer block to an embedded patch of the embedded patches, create filtered patches for the transformed patch by applying wavelet filters to the transformed patch, create a combined patch by combining the filtered patches, generate a set of training data using the combined patch, and generate a trained prediction model by applying a prediction model to the set of training data.
An example 12 includes the subject matter of example 11 where generating the embedded patches includes dividing the digital data into patches and generating the embedded patches by combining the patches with positional embeddings for the patches. An example 13 includes the subject matter of example 12, where the digital data includes metadata and dividing the digital data into the patches uses the metadata to determine a patch size of a patch. An example 14 includes the subject matter of example 13, where the metadata includes a data type including at least one of text, audio, image, or video and where dividing the digital data into the patches uses the data type to determine the patch size. An example 15 includes the subject matter of any of examples 11-14, where the processing device is further to generate the wavelet filters, where a wavelet filter of the wavelet filters includes a first filter dimension and a second filter dimension, the transformed patch includes a first patch dimension and a second patch dimension, and a size of the second filter dimension is less than a size of the first patch dimension. An example 16 includes the subject matter of example 15, where the size of the second filter dimension is half the size of the first patch dimension and where generating the wavelet filter includes determining a first filter value and a second filter value and generating a diagonal constant matrix with the first filter value and the second filter value on diagonals of the diagonal constant matrix. An example 17 includes the subject matter of example 16, where the processing device is further to determine one or both of the first filter value and the second filter value based on an output of the trained prediction model. An example 18 includes the subject matter of any of examples 16 and 17, where the wavelet filters include a first wavelet filter and a second wavelet filter and where the processing device is further to generate the first wavelet filter as a high pass wavelet filter, generate the second wavelet filter as a low pass wavelet filter, where the filtered patches include a first filtered patch, created by applying the first wavelet filter to the transformed patch and a second filtered patch, created by applying the second wavelet filter to the transformed patch, and create a combined patch by combining the first filtered patch and the second filtered patch. An example 19 includes the subject matter of example 18, where creating the combined patch includes converting the first filtered patch into a filtered probability patch, where probability values of the filtered probability patch are proportional to a scale of values of the first filtered patch and creating the combined patch by applying an element-wise product operation to the filtered probability patch and the second filtered patch.
An example 20 includes a system including at least one memory device and a processing device operatively coupled with the at least one memory device, the processing device to receive digital data, where the digital data is identified by one or more input classifiers, generate embedded patches using the digital data, generate a transformed patch by applying a transformer block to an embedded patch of the embedded patches, create filtered patches for the transformed patch by applying wavelet filters to the transformed patch, create a combined patch by combining the filtered patches, generate a set of training data using the combined patch and the one or more input classifiers, generate a trained prediction model by applying a prediction model to the set of training data, apply the trained prediction model to a set of execution data, and determine, by the trained prediction model, an output based on the set of execution data, wherein the output includes one or more output classifiers identifying the set of execution data.
In the foregoing specification, embodiments of the disclosure have been described with reference to specific example embodiments thereof. It will be evident that various modifications can be made thereto without departing from the broader spirit and scope of embodiments of the disclosure as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.