This disclosure relates generally to machine learning and, more particularly, to systems, apparatus, articles of manufacture, and methods to generate digitized handwriting with user style adaptations.
Every person's handwriting style is unique and reflects traits of their personality and their engagement with the subject matter. Handwriting has distinct advantages over typing in some respects, such as increased flexibility (especially for mathematical and/or chemical equations), better retention of the subject matter, and increased opportunity for creativity. As such, handwriting remains important today, even in digital settings.
In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts.
As used herein, connection references (e.g., attached, coupled, connected, and joined) may include intermediate members between the elements referenced by the connection reference and/or relative movement between those elements unless otherwise indicated. As such, connection references do not necessarily imply that two elements are directly connected and/or in fixed relation to each other. As used herein, stating that any part is in “contact” with another part is defined to mean that there is no intermediate part between the two parts.
Unless specifically stated otherwise, descriptors such as “first,” “second,” “third,” etc., are used herein without imputing or otherwise indicating any meaning of priority, physical order, arrangement in a list, and/or ordering in any way, but are merely used as labels and/or arbitrary names to distinguish elements for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for identifying those elements distinctly that might, for example, otherwise share a same name.
As used herein, “substantially real time” refers to occurrence in a near instantaneous manner, recognizing there may be real-world delays for computing time, transmission, etc. Thus, unless otherwise specified, “substantially real time” refers to real time +/− 1 second.
As used herein, the phrase “in communication,” including variations thereof, encompasses direct communication and/or indirect communication through one or more intermediary components, and does not require direct physical (e.g., wired) communication and/or constant communication, but rather additionally includes selective communication at periodic intervals, scheduled intervals, aperiodic intervals, and/or one-time events.
As used herein, “processor circuitry” is defined to include (i) one or more special purpose electrical circuits structured to perform specific operation(s) and including one or more semiconductor-based logic devices (e.g., electrical hardware implemented by one or more transistors), and/or (ii) one or more general purpose semiconductor-based electrical circuits programmable with instructions to perform specific operations and including one or more semiconductor-based logic devices (e.g., electrical hardware implemented by one or more transistors). Examples of processor circuitry include programmable microprocessors, Field Programmable Gate Arrays (FPGAs) that may instantiate instructions, Central Processor Units (CPUs), Graphics Processor Units (GPUs), Digital Signal Processors (DSPs), XPUs, or microcontrollers and integrated circuits such as Application Specific Integrated Circuits (ASICs). For example, an XPU may be implemented by a heterogeneous computing system including multiple types of processor circuitry (e.g., one or more FPGAs, one or more CPUs, one or more GPUs, one or more DSPs, etc., and/or a combination thereof) and application programming interface(s) (API(s)) that may assign computing task(s) to whichever one(s) of the multiple types of processor circuitry is/are best suited to execute the computing task(s).
Although digital interactions have become widely adopted in many fields, handwriting has remained prevalent for users as a result of, for example, increased flexibility provided by handwriting compared to typing, improved haptic perception that results from handwriting, improved recollection of content that is handwritten, and improved creativity enabled by handwriting. However, utilizing handwriting in digital settings (e.g., settings that utilize computing devices) presents difficulties that can make handwriting less convenient than typing. For instance, handwritten content cannot be searched for keywords with the press of a button(s) (e.g., CTRL+F). Further, after being scanned by a computing device, the handwriting cannot be edited and occupies significant storage space. As such, users in digital settings often do not experience the benefits of handwriting as a result of the inconveniences associated with utilizing handwritten content in the digital settings.
To enable handwriting usage to increase, adapt to, and improve with digital settings, artificial intelligence and/or machine learning (AI/ML) may be utilized for handwriting generation. Machine learning models, such as neural networks (e.g., artificial neural networks (ANNs), convolutional neural networks (CNNs), deep neural networks (DNNs), etc.), are useful tools that have demonstrated value in solving complex problems, such as handwriting recognition. Neural networks operate, for example, by using artificial neurons arranged into layers that process data from an input layer to an output layer and apply weighting values to the data during processing of the data. Such weighting values are determined during a training process. For instance, typical DNN architectures have learnable parameters that are stacked using complex network topologies, which gives them improved ability to fit training data with respect to other types of AI/ML techniques.
In some known handwriting generation techniques, generative adversarial networks (GANs) are utilized for handwriting image generation. However, with generated images, the handwritten content occupies substantial storage space and remains unsearchable and inflexible with respect to editing. Moreover, such techniques do not enable adaptations to a custom style displayed in handwritten content from a user.
In some other known handwriting generation techniques, handwriting ink coordinates are generated. However, the generated handwriting often fails to accurately reflect the particular style of the user, which can impair the haptic perception associated with the handwriting when viewed by the user. Additionally, such techniques have demonstrated an inability to adapt to cursive handwriting. Furthermore, with such techniques, the generated handwriting often has an uptrend or a downtrend from horizontal (e.g., or a left-trend/right-trend from vertical), which can cause interference between lines of handwriting and can cause the handwriting to be difficult for a user or other reader to follow. Accordingly, the detriments of known handwriting generation techniques limit the utilization of handwriting in digital settings.
Examples disclosed herein provide systems, apparatus, articles of manufacture, and methods to generate digitized handwriting with user style adaptations. Examples disclosed herein generate digitized handwriting using a recurrent neural network model that is adaptable to a style displayed by a user when provided a relatively small handwriting sample (e.g., less than twenty (20) handwritten words and/or symbols) from the user. Examples disclosed herein train the recurrent neural network model to generate generalized handwriting styles based on one or more online dataset(s). Furthermore, examples disclosed herein re-train the recurrent neural network model to adapt to the style displayed by any user using few-shot learning.
In disclosed examples, the recurrent neural network model includes an attention layer, a sequence generator network including one or more long short-term memories (LSTMs), a mixture density network (MDN) layer, and a reparameterization layer. The example attention layer indicates what text is to be generated. For example, the attention layer can guide the model to generate identified characters (e.g., letters, numbers, symbols, etc.) in a certain style. Furthermore, the example sequence generator can include three (3) layers of stacked LSTMs to generate handwriting coordinate sequences with attention. In turn, the MDN can learn a distribution of the generated handwriting coordinates based on the character being generated and/or the style in which the character is to be presented. The example reparameterization layer performs a comparison between an original handwriting sequence and the generated handwriting sequence and, in turn, modifies weighting values and/or connections in the MDN layer and/or the LSTM(s) to reduce (i.e., minimize) an error based on the comparison.
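The mapping performed by the example MDN layer can be illustrated with a minimal numpy sketch, in which a feature vector from the LSTM stack is projected to mixture weights, means, and standard deviations over 2-D ink coordinates. The layer sizes, parameter names, and three-component mixture are illustrative assumptions rather than details taken from the disclosed model.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mdn_params(h, W, b, n_mix=3):
    """Map a hidden feature vector h to Gaussian mixture parameters for
    2-D ink coordinates: mixture weights pi, means mu, and standard
    deviations sigma. W and b are the learnable MDN output parameters."""
    out = W @ h + b                      # raw outputs, length 5 * n_mix
    out = out.reshape(n_mix, 5)          # 5 values per mixture component
    pi = softmax(out[:, 0])              # mixture weights sum to 1
    mu = out[:, 1:3]                     # (x, y) mean per component
    sigma = np.exp(out[:, 3:5])          # positive std devs per component
    return pi, mu, sigma

rng = np.random.default_rng(0)
h = rng.normal(size=16)                  # stand-in feature from the LSTMs
W = rng.normal(scale=0.1, size=(15, 16))
b = np.zeros(15)
pi, mu, sigma = mdn_params(h, W, b)
print(pi.sum())                          # ~1.0
```

Sampling an ink coordinate from this distribution amounts to picking a component according to `pi` and drawing from the corresponding Gaussian.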
In disclosed examples, the reparameterization layer analyzes two types of differences between the original handwriting sequence and the generated handwriting sequence: mixture density network loss (MDN loss) and mean squared error (MSE). In particular, with the MDN loss, the reparameterization layer can analyze the generated coordinate distribution, which relates to the handwriting style. In other words, the MDN loss enables the reparameterization layer to evaluate the performance of the MDN layer by comparing a distribution of digital ink coordinates for a generated character to a geometry of ink defining the character in the original handwriting sequence. Furthermore, with the MSE, the reparameterization layer can analyze an average deviation of the generated handwriting sequence relative to the original handwriting sequence. In particular, using the MSE enables the reparameterization layer to analyze coordinates of the generated digital ink relative to coordinates of the ink in the original handwriting sequence. As a result, the MSE enables the reparameterization layer to detect deviations in an alignment of the generated handwriting sequence relative to the original handwriting sequence. As such, the reparameterization layer can detect an uptrend, a downtrend, a left-trend, and/or a right-trend in the generated handwriting sequence and, in turn, cause modifications in the MDN layer and/or the LSTM(s) to enable the digital ink in the generated handwriting sequence to be pixel-exact or near pixel-exact. Advantageously, the reparameterization layer causes the recurrent neural network to generate more realistic handwriting, whereas existing techniques smoothen generated words, which causes the generated words to appear different from a style with which the words were written.
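The two error terms analyzed by the example reparameterization layer can be sketched as follows, assuming a diagonal Gaussian mixture over 2-D ink coordinates: a negative log-likelihood term for the MDN loss (style/distribution) and a mean squared error term for positional drift. The single-component mixture and the sample coordinates are illustrative only.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Density of a diagonal 2-D Gaussian at point x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * (z ** 2).sum()) / (2 * np.pi * sigma.prod())

def mdn_loss(points, pi, mu, sigma):
    """Negative log-likelihood of observed ink coordinates under the
    predicted mixture (captures the style/distribution error)."""
    nll = 0.0
    for x in points:
        density = sum(p * gaussian_pdf(x, m, s)
                      for p, m, s in zip(pi, mu, sigma))
        nll -= np.log(density + 1e-12)
    return nll / len(points)

def mse_loss(generated, original):
    """Mean squared error between generated and original coordinates
    (captures positional drift such as an up- or downtrend)."""
    return ((generated - original) ** 2).mean()

# Illustrative values only: one mixture component centered at the origin.
pi = np.array([1.0])
mu = np.array([[0.0, 0.0]])
sigma = np.array([[1.0, 1.0]])
original = np.array([[0.1, 0.0], [0.0, -0.1]])
generated = np.array([[0.2, 0.1], [0.1, 0.0]])
print(mdn_loss(original, pi, mu, sigma), mse_loss(generated, original))
```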
In response to training the recurrent neural network model to generate generalized handwriting styles based on one or more online dataset(s), examples disclosed herein re-train the recurrent neural network to adopt a writing style of a given user using few gradient descent steps. In particular, to adapt to the writing style of the user, examples disclosed herein utilize few-shot learning. Few-shot learning is an optimization-based model-agnostic meta-learning (MAML) technique that learns via gradient descent. Specifically, examples disclosed herein fix (e.g., maintain, lock, etc.) the LSTM layers after the initial training period but re-train the MDN layer using MAML techniques and running a few gradient descent steps with a small data sample (e.g., twenty or fewer words, numbers and/or symbols in a handwritten sequence) from the user. In examples disclosed herein, hyperparameters for the user-specific training are set (e.g., empirically selected) to account for learning rate, batch size, number of training steps, etc.
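The user-specific adaptation described above can be sketched with a frozen feature map standing in for the pre-trained LSTM layers and a small linear readout standing in for the MDN layer; only the readout is updated by a few gradient descent steps on a small user sample. All sizes, the learning rate, and the step count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Frozen stand-in for the pre-trained LSTM stack: a fixed feature map.
W_frozen = rng.normal(scale=0.5, size=(8, 4))
features = lambda x: np.tanh(W_frozen @ x)

# Adaptable stand-in for the MDN output layer: a linear readout.
W_mdn = rng.normal(scale=0.1, size=(2, 8))

# A "small sample" of user data: inputs and target ink coordinates.
xs = rng.normal(size=(20, 4))
ys = rng.normal(size=(20, 2))

def mse(W):
    preds = np.array([W @ features(x) for x in xs])
    return ((preds - ys) ** 2).mean()

lr, steps = 0.05, 10          # hyperparameters selected empirically
before = mse(W_mdn)
for _ in range(steps):        # a few gradient descent steps, MDN only
    grad = np.zeros_like(W_mdn)
    for x, y in zip(xs, ys):
        f = features(x)
        err = W_mdn @ f - y
        grad += 2 * np.outer(err, f) / (len(xs) * ys.shape[1])
    W_mdn -= lr * grad        # W_frozen is never updated
after = mse(W_mdn)
print(before, after)
```

Because the loss is quadratic in the readout and the learning rate is small, the error on the user sample decreases over the few steps, mirroring the rapid adaptation that few-shot learning aims for.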
During a user-specific training period, examples disclosed herein can access handwriting of a user via writing on a touch screen, a scanned or downloaded image, etc. and provide the handwriting to the recurrent neural network. At this point, the LSTM layers of the recurrent neural network are fixed such that the MDN layer can learn to reflect the style of the handwriting presented by the user using few-shot learning. For example, using few-shot learning, the selected hyperparameters, and the reparameterization layer, the MDN layer can adapt connections and/or weights to reduce (i.e., minimize) an error between handwriting vectors outputted by the MDN layer and an initial set of words, numbers, and/or symbols written by the user. In some examples, the MDN layer trains in substantially real time (e.g., as the user writes user-specific training data on the touch screen).
In turn, examples disclosed herein can execute the recurrent neural network with the adapted MDN layer for usage with any subsequent writing from the user to cause the writing to be digitized in substantially real time. In some examples, the recurrent neural network model outputs the digitized handwriting as vectors in a compressed scalable vector graphics format (e.g., an SVGZ format) to reduce the storage space occupied by the digitized handwriting and enable the digitized writing to scale to any format. To enable different models to be generated for different users, examples disclosed herein may maintain a generalized version of the recurrent neural network model in response to performing the initial training such that the generalized version of the recurrent neural network model can be utilized as a starting point for any new and/or additional user models and/or adaptations.
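The compressed scalable vector graphics output can be sketched with the Python standard library: stroke coordinates are serialized as SVG paths and written through gzip to produce an SVGZ file. The SVG structure, view box, and file name are illustrative assumptions.

```python
import gzip

def ink_to_svgz(strokes, path):
    """Serialize digital-ink strokes (lists of (x, y) points) as a
    gzip-compressed SVG (SVGZ) file, which scales to any size and
    occupies little storage space."""
    paths = []
    for stroke in strokes:
        d = "M " + " L ".join(f"{x:.1f} {y:.1f}" for x, y in stroke)
        paths.append(f'<path d="{d}" fill="none" stroke="black"/>')
    svg = ('<svg xmlns="http://www.w3.org/2000/svg" '
           'viewBox="0 0 200 100">' + "".join(paths) + "</svg>")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(svg)
    return svg

strokes = [[(10, 50), (30, 40), (50, 55)], [(60, 50), (80, 45)]]
svg_text = ink_to_svgz(strokes, "sample.svgz")
```

Because the file is ordinary SVG under gzip, any SVG-capable viewer that understands SVGZ can render it at arbitrary scale.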
Advantageously, examples disclosed herein generate digitized handwriting that is adaptable to a handwriting style presented by the user with a small sample size (e.g., twenty or fewer handwritten words and/or numbers). Furthermore, by comparing the generated and original handwriting sequences with both the MDN loss and the MSE loss, the example reparameterization layer can help train the LSTM layers and/or the MDN layer to account for handwriting style (e.g., digital ink coordinate distribution) as well as position (e.g., digital ink pixel location). As such, examples disclosed herein are able to capture cursive styles commonly missed with known techniques. Additionally, examples disclosed herein are able to maintain an alignment of a handwritten sequence such that an organization of the handwriting can be preserved as adjacent rows or columns in the generated handwriting do not overlap. Thus, examples disclosed herein generate author-specific digitized handwriting that is searchable, editable, and scalable to improve flexibility, efficiency, retention, and/or collaboration in digital settings.
In the illustrated example of
In some examples, the electronic system 102 is an SoC representative of one or more integrated circuits (ICs) (e.g., compact ICs) that incorporate components of a computer or other electronic system in a compact format. For example, the electronic system 102 may be implemented with a combination of one or more programmable processors, hardware logic, and/or hardware peripherals and/or interfaces. Additionally or alternatively, the example electronic system 102 of
In some examples, the first hardware accelerator 108 of the illustrated example of
In some examples, the second hardware accelerator 110 of the illustrated example of
In some examples, the general purpose processor circuitry 112 of the illustrated example of
In the illustrated example of
The electronic system 102 of the illustrated example includes the power source 118 to deliver power to portion(s) of the electronic system 102. In some examples, the power source 118 is a battery. For example, the power source 118 can be a limited-energy device, such as a lithium-ion battery or any other chargeable battery or power source. In some examples, the power source 118 is chargeable using a power adapter or converter (e.g., an alternating current (AC) to direct current (DC) power converter), a wall outlet (e.g., a 120V AC wall outlet, a 224V AC wall outlet, etc.), etc.
The electronic system 102 of the illustrated example of
In the illustrated example of
In the illustrated example of
In this example, the DHG model 124 is a machine learning model trained, deployed, instantiated, and executed by the handwriting generation circuitry 104A-E. For example, the handwriting generation circuitry 104A-E trains the DHG model 124 based on the training data 122. In particular, the handwriting generation circuitry 104A-E can cause the DHG model 124 to undergo initial training using a first portion of the training data 122 (e.g., the IAM On-Line Handwriting Database). In some examples, the first portion of the training data 122 is obtained from the network 128 via the interface circuitry 114. Further, the handwriting generation circuitry 104A-E can cause the DHG model to undergo user-specific training using a second portion of the training data 122 (e.g., twenty or fewer handwritten words and/or symbols from the user). In some examples, the second portion of the training data 122 is obtained via the interface circuitry 114 and/or the user interface circuitry 126.
After the initial and author-specific training, the handwriting generation circuitry 104A-E can execute the DHG model 124 to generate digitized handwriting in accordance with the handwriting style of a given user. Specifically, the handwriting generation circuitry 104A-E can execute the DHG model 124 to generate the digitized handwriting in a scalable vector graphics (e.g., SVG, SVGZ) format based on an input writing sequence (e.g., a handwritten sequence, text, etc.) obtained at, or indicated by, the user interface circuitry 126.
In the illustrated example of
In the illustrated example of
In some examples, the network 128 of the illustrated example of
In the illustrated example of
In some examples, one or more of the external electronic systems 130 execute the DHG model 124 to process a workload (e.g., an AI/ML workload, a computing workload, etc.). For example, the mobile device 134 can be implemented as a cellular or mobile phone having one or more processors (e.g., a CPU, a GPU, a VPU, an AI/NN specific processor, etc.) on one or more SoCs to process an AI/ML workload using the DHG model 124. For example, the desktop computer 132, the mobile device 134, the laptop computer 136, the tablet computer 138, the server 140, and/or the interactive whiteboard 142 can be implemented as electronic device(s) having one or more processors (e.g., a CPU, a GPU, a VPU, an AI/NN specific processor, etc.) on one or more SoCs to process an AI/ML workload using the DHG model 124. In some examples, the server 140 includes and/or otherwise is representative of one or more servers that can implement a central facility, a data facility, a cloud service (e.g., a public or private cloud provider, a cloud-based repository, etc.), a research institution (e.g., a laboratory, a research and development organization, a university, etc.), etc., to process AI/ML workload(s) using the DHG model 124.
In the illustrated example of
In the illustrated example of
Artificial intelligence (AI), including machine learning (ML), deep learning (DL), and/or other artificial machine-driven logic, enables machines (e.g., computers, logic circuits, etc.) to use a model to process input data to generate an output based on patterns and/or associations previously learned by the model via a training process. For instance, the handwriting generation circuitry 104A-E may train the DHG model 124 with data (e.g., the training data 122) to recognize patterns and/or associations and follow such patterns and/or associations when processing input data such that other input(s) result in output(s) consistent with the recognized patterns and/or associations.
Many different types of machine learning models and/or machine learning architectures exist. In examples disclosed herein, the handwriting generation circuitry 104A generates the DHG model 124 as a recurrent neural network and/or a deep neural network model. Using a deep neural network model enables feature generation automation and improved self-learning capabilities compared to classical machine learning models that require a degree of human intervention to determine the accuracy of an output. Furthermore, using a recurrent neural network model enables bi-directional data flow that propagates data from later processing stages to earlier processing stages. Using a neural network model enables the hardware accelerators 108, 110 to execute an AI/ML workload. In general, machine learning models/architectures/layers that are suitable to use in the example approaches disclosed herein include recurrent sequence models, long short-term memories (LSTMs), mixture density networks, reparameterization networks, and/or optimization-based model-agnostic meta-learning (MAML) (e.g., few-shot learning) models. However, other types of machine learning models could additionally or alternatively be used.
In general, implementing a ML/AI system involves two phases, a learning/training phase and an inference phase. In the learning/training phase, the handwriting generation circuitry 104A-E utilizes a general style training algorithm to train the DHG model 124 to operate in accordance with patterns and/or associations based on, for example, a first portion of the training data 122 obtained via the network 128. Further, in the learning/training phase, the handwriting generation circuitry 104A-E utilizes a user-adaptor algorithm to train the DHG model 124 to operate in accordance with patterns and/or associations based on, for example, a second portion of the training data 122 obtained via the user interface circuitry 126. In general, the DHG model 124 includes internal parameters (e.g., configuration data, weights, etc.) that guide how input data is transformed into output data, such as through a series of nodes and connections within the DHG model 124 to transform input data into output data. Additionally, hyperparameters are used as part of the training process to control how the learning is performed (e.g., a learning rate, a number of layers to be used in the machine learning model, etc.). Hyperparameters are defined to be training parameters that are determined prior to initiating the training process.
Different types of training may be performed based on the type of ML/AI model and/or the expected output. For example, the handwriting generation circuitry 104A-E may invoke supervised training to use inputs and corresponding expected (e.g., labeled) outputs to select parameters (e.g., by iterating over combinations of selected parameters) for the DHG model 124 that reduce model error. As used herein, “labelling” refers to an expected output of the machine learning model (e.g., a classification, an expected output value, etc.). Alternatively, the handwriting generation circuitry 104A-E may invoke unsupervised training (e.g., used in deep learning, a subset of machine learning, etc.) that involves inferring patterns from inputs to select parameters for the DHG model 124 (e.g., without the benefit of expected (e.g., labeled) outputs).
In examples disclosed herein, the handwriting generation circuitry 104A-E trains the DHG model 124 using stochastic gradient descent (e.g., few-shot learning), parameterization, and/or reparameterization. However, any other training algorithm may additionally or alternatively be used.
In examples disclosed herein, the handwriting generation circuitry 104A-E trains the DHG model 124 during an initial training period until an acceptable amount of error is achieved. For example, the handwriting generation circuitry 104A-E can train the DHG model 124 utilizing the first portion of the training data 122 until the level of error is no longer reducing. Further, in examples disclosed herein, the handwriting generation circuitry 104A-E further trains the DHG model 124 for a specific user until the level of error is no longer reducing using an initial writing sample from the user (e.g., twenty or fewer handwritten words, numbers, and/or symbols in sequence). As such, the handwriting generation circuitry 104A-E trains the DHG model 124 to adopt character and spacing patterns in the handwriting style presented by the individual user.
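Training until the level of error is no longer reducing can be sketched as an early-stopping loop; the patience, tolerance, and simulated error sequence below are illustrative assumptions.

```python
def train_until_converged(step_fn, patience=3, max_steps=1000, tol=1e-6):
    """Run training steps until the error stops reducing: stop after
    `patience` consecutive steps without an improvement of at least
    `tol` over the best error seen so far."""
    best = float("inf")
    stale = 0
    history = []
    for _ in range(max_steps):
        err = step_fn()
        history.append(err)
        if best - err > tol:
            best, stale = err, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return history

# Illustrative stand-in for one training step: error decays, then plateaus.
errors = iter([1.0, 0.5, 0.25, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2])
history = train_until_converged(lambda: next(errors))
print(len(history))  # stops shortly after the error plateaus
```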
In examples disclosed herein, the handwriting generation circuitry 104A-E trains the DHG model 124 locally on the electronic system 102 and/or remotely at an external electronic system (e.g., one(s) of the external electronic systems 130) communicatively coupled to the electronic system 102. In some examples, the handwriting generation circuitry 104A-E causes an initial training of the DHG model 124 to be performed remotely, such as at the server 140, and causes user-specific training to be performed locally on the electronic system 102. In some examples, the handwriting generation circuitry 104A-E causes the initial training of the DHG model 124 to be performed at a first one of the external electronic systems 130, such as the server 140, and causes the user-specific training of the DHG model 124 to be performed at a second one of the external electronic systems, such as the interactive whiteboard 142. In some examples, the handwriting generation circuitry 104A-E causes the initial training and the user-specific training to be performed at the electronic system 102. In examples disclosed herein, the handwriting generation circuitry 104A-E trains the DHG model 124 using hyperparameters that control how the learning is performed (e.g., a learning rate, a batch size, a quantity of training steps, a number of layers to be used in the machine learning model, etc.). In some examples, the handwriting generation circuitry 104A-E may use hyperparameters that control model performance and training speed such as the learning rate and regularization parameter(s). The handwriting generation circuitry 104A-E may select such hyperparameters by, for example, trial and error to reach an optimal model performance. In examples disclosed herein, a first hyperparameter controls weights provided to certain components in a loss function. 
Specifically, the first hyperparameter controls the weight distributed to an MDN error component of the loss function as well as an MSE component of the loss function to account for character-level and/or word-level stylistic and positional differences when comparing the generated handwriting to corresponding original handwriting. In examples disclosed herein, other hyperparameters control a learning rate, a batch size, and/or a quantity of training steps. In examples disclosed herein, the handwriting generation circuitry 104A-E performs re-training. Specifically, the handwriting generation circuitry 104A-E re-trains the DHG model 124, or trains a separate version of the DHG model 124, in response to encountering a new or additional handwriting style (e.g., a new or additional author). During re-training, the handwriting generation circuitry 104A-E maintains a fixed version of a first portion of the DHG model 124 (e.g., lower-level layers, the LSTM layers, etc.) while re-training a second portion of the DHG model 124 (e.g., the MDN layer) using MAML techniques and running a few gradient descent steps with a small number of data samples (e.g., twenty or fewer handwritten words).
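The role of the first hyperparameter can be sketched as a convex weighting of the two loss components; the name `alpha` and the values below are illustrative assumptions, not values from the disclosure.

```python
def combined_loss(mdn_component, mse_component, alpha=0.5):
    """Weighted total loss: alpha distributes weight between the MDN
    (style/distribution) component and the MSE (position) component."""
    return alpha * mdn_component + (1.0 - alpha) * mse_component

# Sweeping alpha shifts emphasis between style fidelity and alignment.
style_heavy = combined_loss(2.0, 0.1, alpha=0.9)
position_heavy = combined_loss(2.0, 0.1, alpha=0.1)
print(style_heavy, position_heavy)
```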
Training is performed using training data. In some examples, during the initial training period, the handwriting generation circuitry 104A-E utilizes training data that originates from online, publicly available handwriting samples, such as those in the IAM On-Line Handwriting Database. In some examples, during the initial training period, the handwriting generation circuitry 104A-E utilizes handwriting samples collected by the external electronic systems 130 as the training data. In examples disclosed herein, during the author-specific training period, the handwriting generation circuitry 104A-E utilizes twenty or fewer handwritten words, numbers, and/or symbols provided by the user as training data.
Once training is complete, the handwriting generation circuitry 104A-E may deploy the DHG model 124 for use as an executable construct that processes an input and provides an output based on the network of nodes and connections defined in the DHG model 124. The handwriting generation circuitry 104A-E may store the DHG model 124 in the datastore 120. In some examples, the handwriting generation circuitry 104A-E may invoke the interface circuitry 114 to transmit the DHG model 124 to one(s) of the external electronic systems 130. In some such examples, the handwriting generation circuitry 104A-E enables one(s) of the external electronic systems 130 to generate digitized handwriting for a user without being directly provided training data from the user. Accordingly, the electronic system 102 enables the trained DHG model 124 for a particular user to be deployed at any device associated with the user without having to repeat the training for the user.
Once trained, the deployed DHG model 124 may be operated in an inference phase to process data. In the inference phase, data to be analyzed (e.g., live data) is input to the DHG model 124, and the DHG model 124 executes to create an output. This inference phase can be thought of as the AI “thinking” to generate the output based on what it learned from the training (e.g., by executing the DHG model 124 to apply the learned patterns and/or associations to the live data). In some examples, input data undergoes pre-processing before being used as an input to the DHG model 124. For example, the handwriting generation circuitry 104A-E may cause an image of a handwritten document to undergo optical character recognition to enable characters in the handwritten document to serve as the input for the DHG model 124. Moreover, in some examples, the handwriting generation circuitry 104A-E causes the output data to undergo post-processing after it is generated by the DHG model 124 to transform the output into a useful result (e.g., an SVG or SVGZ file, a display of data, an instruction to be executed by a machine, etc.).
In some examples, output of the deployed DHG model 124 may be captured and provided as feedback. By analyzing the feedback, the handwriting generation circuitry 104A-E can determine an accuracy of the deployed DHG model 124. In some examples, if the feedback indicates that the accuracy of the deployed DHG model 124 is less than a threshold or other criterion, the handwriting generation circuitry 104A-E can trigger using the feedback and an updated training data set, hyperparameters, etc., to generate an updated, deployed DHG model 124.
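The feedback-driven retraining trigger can be sketched as a simple accuracy-threshold check over deployment feedback; the threshold and accuracy values are illustrative assumptions.

```python
def needs_retraining(feedback_accuracies, threshold=0.9):
    """Trigger retraining when the mean accuracy observed from
    deployment feedback falls below a threshold criterion."""
    mean_accuracy = sum(feedback_accuracies) / len(feedback_accuracies)
    return mean_accuracy < threshold

# Mean 0.866... is below 0.9, so retraining would be triggered.
trigger = needs_retraining([0.95, 0.85, 0.80])
# Mean 0.933... meets the criterion, so the deployed model is kept.
keep = needs_retraining([0.95, 0.93, 0.92])
print(trigger, keep)
```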
In example operation, the handwriting generation circuitry 104A-E trains the DHG model 124 based on the training data 122. For example, the third handwriting generation circuitry 104C of the first hardware accelerator 108 can retrieve the DHG model 124 from the datastore 120, the external electronic systems 130 via the network 128, etc. In some examples, the third handwriting generation circuitry 104C can retrieve the training data 122, or a portion thereof, from the datastore 120, the user interface circuitry 126, the network 128, the external electronic systems 130, etc.
During an initial training period, the handwriting generation circuitry 104A-E can retrieve a first portion of the training data 122 (e.g., the IAM On-Line Handwriting Database) from the datastore 120 and/or the network 128. In turn, the example handwriting generation circuitry 104A-E can train the DHG model 124 to generate handwriting sequences corresponding to original handwritten sequences in the first portion of the training data 122. Specifically, the example handwriting generation circuitry 104A-E can train the DHG model 124 to synthesize ink coordinates for the generated handwriting sequences based on the original handwritten sequences using a recurrent neural network architecture with a reparameterization layer to adapt the DHG model 124 based on an MDN loss and a mean squared error of the generated handwriting sequences with respect to the original handwritten sequences. The first portion of the training data 122 can include original handwriting sequences with various styles and/or languages such that the example handwriting generation circuitry 104A-E can train the DHG model 124 to learn a generalized handwriting style. In some examples, the handwriting generation circuitry 104A-E causes the datastore 120 to maintain a generalized version of the DHG model 124 such that the generalized version of the DHG model 124 can be copied and at least a portion of the generalized version of the DHG model 124 can be adapted to styles presented by encountered users.
In response to training the DHG model 124 to learn the generalized handwriting style, the handwriting generation circuitry 104A-E can retrieve a second portion of the training data 122 (e.g., twenty or fewer handwritten words, numbers, and/or symbols from a user) via the user interface circuitry 126. For example, a user can write the second portion of the training data 122 on a touch screen of the user interface circuitry 126, scan the second portion of the training data 122 via a scanner of the user interface circuitry 126, and/or capture an image of the second portion of the training data 122 via a camera of the user interface circuitry 126. Alternatively, the user interface circuitry 126 can capture a handwriting sample from the user in any other manner. In some examples, when the user writes the second portion of the training data 122 on the touch screen of the user interface circuitry 126, the example handwriting generation circuitry 104A-E can utilize a first twenty words, numbers, and/or symbols written by the user as the second portion of the training data 122. Thus, the user can continue to write on the touch screen of the user interface circuitry 126 uninterrupted as the handwriting generation circuitry 104A-E retrieves the second portion of the training data 122 and enables the DHG model 124 to learn a specific style displayed by the writing from the user in substantially real time.
In turn, the example handwriting generation circuitry 104A-E can cause the DHG model 124 to generate handwriting sequences corresponding to the sequences obtained via the user interface circuitry 126. Specifically, the handwriting generation circuitry 104A-E can cause the DHG model 124 to learn a user-specific style displayed by the sequences obtained via the user interface circuitry 126 based on the MDN errors and the mean squared errors of the generated handwriting sequences with respect to the original handwriting sequences and/or the user-written handwriting sequences. During this user-specific training period, a first portion of the DHG model 124 remains fixed with weights and/or connections set based on the initial training period while a second portion of the DHG model 124 adapts to the style displayed by the user using few-shot learning techniques and empirically selected hyperparameters for brief learning tasks, such as learning rate, batch size, and/or a quantity of training steps. As a result, the example handwriting generation circuitry 104A-E can cause the DHG model 124 to adopt the handwriting style of the user in substantially real time as the user continues to write (e.g., on the touch screen of the user interface circuitry 126). In some examples, in response to training the DHG model 124 to learn the user-specific handwriting style, the handwriting generation circuitry 104A-E can cause the user interface circuitry 126 to prompt the user to provide a label for the handwriting style, such as a name of the user. In turn, the handwriting generation circuitry 104A-E can store the DHG model 124 specifically generated for the handwriting style of the user in the datastore 120. Similarly, the handwriting generation circuitry 104A-E can copy and adapt the generalized version of the DHG model 124 to other encountered handwriting styles and assign unique labels to the different handwriting styles. 
As such, ones of the DHG model 124 associated with the handwriting styles of the respective users can be utilized for example inference operations to generate digitized handwriting corresponding to the style of the user when the user is encountered.
During an example inference operation, the user interface circuitry 126 can provide a writing sample, such as a document image, plain text, and/or handwriting sequences recorded on a touch screen, to the handwriting generation circuitry 104A-E. Further, the handwriting generation circuitry 104A-E can identify a user style to be utilized based on configuration settings selected by the user at the user interface circuitry 126. In turn, the handwriting generation circuitry 104A-E can cause execution of the respective DHG model 124 associated with the user style and provide the writing sample as an input to cause the DHG model 124 to output the writing in a digitized format that is searchable and editable.
The handwriting generation circuitry 104A-E of
(1−λ)·MDNloss+λ·MSE(So,Sp), 0<λ<1, Equation (1)
In Equation (1), So is representative of an original handwriting sequence vector, Sp is representative of a predicted handwriting sequence vector, MDNloss (e.g., mixture density network loss) is representative of a style loss, MSE is representative of a mean squared error that indicates a pixel-wise loss in coordinates of digital ink in the predicted handwriting sequence compared to coordinates of ink in the original handwriting sequence, and λ is a hyperparameter that weights the MDNloss and the MSE in the error computation. In this example, the configuration determination circuitry 220 and/or the generalized style generation circuitry 230 sets the λ value to 0.2, which may be stored in the configuration data 292. In some examples, the value of 0.2 for λ results in the best (i.e., lowest) validation error scores. In this example, the coordinates are x-y coordinates on a page with a third temporal value and a binary value indicative of a pen state at the specific x-y coordinate. That is, the binary ink value can be 1 in locations where a pen did not contact a screen or substrate (e.g., a pen-up event) and 0 otherwise. Moreover, the x-y coordinates are integer values constrained by a resolution of (e.g., captured by) the user interface circuitry 126.
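For illustration, the error computation of Equation (1) can be sketched in a few lines of code. The function names are assumptions for this sketch; only the λ value of 0.2 comes from the description above.

```python
def combined_loss(mdn_loss, mse_value, lam=0.2):
    """Weighted sum of the style loss (MDNloss) and the coordinate loss
    (MSE), per Equation (1); lam must lie strictly between 0 and 1."""
    assert 0.0 < lam < 1.0
    return (1.0 - lam) * mdn_loss + lam * mse_value

def mse(original, predicted):
    """Pixel-wise mean squared error between two equal-length x-y
    coordinate sequences (original ink vs. predicted digital ink)."""
    assert len(original) == len(predicted)
    return sum((ox - px) ** 2 + (oy - py) ** 2
               for (ox, oy), (px, py) in zip(original, predicted)) / len(original)
```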
In the illustrated example, the MDNloss provides a first weighted error based on a difference between the distribution of the generated digital ink points for characters and/or words in the predicted handwriting sequence and an actual distribution of ink for characters and/or words in the original handwriting sequence. As such, the MDNloss helps train the DHG base model 286 to replicate styles of characters and/or words encountered in the general training data 282. Moreover, the MSE provides a second weighted error based on a difference between coordinates of the generated digital ink points and coordinates occupied by ink in the original handwriting sequence. As such, the MSE helps train the DHG base model 286 to replicate a relative position of ink points on a page and, in turn, helps prevent step-by-step deviations between coordinates of generated ink points and original ink (e.g., an uptrend, a downtrend, a left-trend, a right-trend) that would otherwise be left undetected with MDNloss alone. In some examples, during the initial training, generalized style generation circuitry 230 causes the DHG base model 286 to perform the parameterization and the reparameterization until the loss indicated by Equation (1) is no longer reducing or otherwise trends toward stability with training iterations. In some examples, during the initial training, generalized style generation circuitry 230 causes the DHG base model 286 to perform the parameterization and the reparameterization until the loss indicated by Equation (1) satisfies (e.g., is less than) a loss threshold. In some examples, during the initial training, generalized style generation circuitry 230 causes the DHG base model 286 to perform the parameterization and the reparameterization until all of the general training data 282 has been utilized as an input in the DHG base model 286.
The DHG base model 286 is a recurrent sequence model. Thus, the DHG base model 286 generates a sequence of coordinates in an autoregressive manner (i.e., predicts one coordinate at a time and feeds the predicted coordinate back to the DHG base model 286 as an input to predict the next sequence element). Additionally, the DHG base model 286 models the first-order difference between successive coordinates, which causes the starting coordinate for autoregressive generation to be temporally offset by one data point.
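A minimal sketch of the autoregressive rollout described above, with a hypothetical single-step predictor standing in for the trained DHG base model 286:

```python
def generate_sequence(predict_delta, start, steps):
    """Autoregressive rollout: the model predicts one coordinate
    *difference* at a time, each prediction is fed back as the next input,
    and absolute coordinates are recovered by cumulative summation.

    predict_delta is a stand-in for the trained model's single-step output.
    """
    coords = [start]
    delta = (0.0, 0.0)  # first-difference modeling offsets the start by one step
    for _ in range(steps):
        delta = predict_delta(delta)
        x, y = coords[-1]
        coords.append((x + delta[0], y + delta[1]))
    return coords
```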
To enable the DHG base model 286 to be adaptable to styles displayed by various users, the generalized style generation circuitry 230 causes the DHG base model 286 to initialize hidden states during the initial training period. In some examples, the generalized style generation circuitry 230 is instantiated by processor circuitry executing generalized style generation instructions and/or configured to perform operations such as, for example, those instructions and/or operations represented by the flowchart of
The handwriting generation circuitry 104A-E of
In some examples, the author adaptation circuitry 240 causes the DHG user-specific model 288 to learn and/or adapt to new or additional user styles using optimization-based MAML, which is a few-shot learning technique that learns through a few gradient descent steps. Because the user-specific training data 284 includes a relatively small sample size (e.g., twenty or fewer handwritten words, numbers, and/or symbols), the author adaptation circuitry 240 can cause the DHG user-specific model 288 to efficiently update parameters (e.g., weighting values and/or connections) of the DHG base model 286 by running the DHG user-specific model 288 with the user-specific training data 284 as an input and adjusting parameters between run iterations using the gradient descent steps. In particular, the author adaptation circuitry 240 causes a first portion of the DHG user-specific model 288 to match (e.g., be maintained with, remain identical to, etc.) a configuration set in the DHG base model 286. Further, the author adaptation circuitry 240 causes a second portion of the DHG user-specific model 288 to be different from the DHG base model 286 and correspond with style characteristics displayed by the sample data from the user. Specifically, adaptations to the second portion of the DHG user-specific model 288 can correspond to a vector (e.g., a style encoding vector) that is computed by taking a given real coordinate sequence from the user and running the sequence through the DHG user-specific model 288 with a few gradient descent steps to compute weights and/or connections to be utilized in the second portion of the DHG user-specific model 288 (e.g., the final hidden states of the DHG user-specific model 288). As such, the style encoding vector for the specific user can be computed once to update the DHG user-specific model 288 and then utilized to digitize any subsequent writing from the user.
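The freeze-and-adapt scheme above can be sketched as follows. This is a loose illustration of an optimization-based MAML inner loop, not the actual model; the loss_grad callback and all parameter values are assumptions.

```python
def adapt_head(base_weights, head_weights, samples, loss_grad, lr=0.1, steps=5):
    """Few-shot adaptation sketch: the base (first-portion) weights stay
    frozen while the head (second-portion) weights take a few gradient
    descent steps on the small user sample set.

    loss_grad(head, sample) is a hypothetical callback returning the
    gradient of the loss with respect to each head weight.
    """
    head = list(head_weights)  # copy; base_weights are never modified
    for _ in range(steps):
        for sample in samples:
            grads = loss_grad(head, sample)
            head = [w - lr * g for w, g in zip(head, grads)]
    return base_weights, head
```

For example, with a simple quadratic loss the head weights move toward the user-specific target while the base weights remain untouched.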
In some examples, the author adaptation circuitry 240 assigns a label to the DHG user-specific model 288 and/or the specific style encoding vector generated for the user based on an indication from the user (e.g., a response to a prompt at the user interface circuitry 126 requesting a name for the handwriting style). In response to training the DHG user-specific model 288, the author adaptation circuitry 240 can create an executable file (e.g., the DHG executable 290) corresponding to the DHG user-specific model 288 for subsequent digitized handwriting generation for the given user. Accordingly, the author adaptation circuitry 240 can create ones of the DHG user-specific model 288 and/or the DHG executable 290 for respective handwriting styles encountered (e.g., users encountered). In some examples, the author adaptation circuitry 240 is instantiated by processor circuitry executing author adaptation instructions and/or configured to perform operations such as, for example, those instructions and/or operations represented by the flowchart of
The handwriting generation circuitry 104A-E of
The handwriting generation circuitry 104A-E of
In the illustrated example, the electronic system 102 can receive the writing from the user as an image of a written document, a text document with text to be converted to digitized handwriting, and/or a direct writing input (e.g., with a cursor on a computer or laptop, with contact on a touchscreen of a tablet, mobile device, or interactive whiteboard, etc.). In some examples, the style transformation circuitry 260 causes the optical character recognition circuitry 250 to identify characters in the writing in response to the writing being received as an image or a direct writing input. In turn, the style transformation circuitry 260 can cause execution of the DHG user-specific model 288 and/or the respective DHG executable 290 to convert the writing into digitized content. In some examples, the style transformation circuitry 260 outputs the digitized content as a scalable vector in an SVG format or a compressed SVGZ format that reduces (i.e., minimizes) storage space occupied by the digitized content. In some examples, the style transformation circuitry 260 is instantiated by processor circuitry executing style transformation instructions and/or configured to perform operations such as, for example, those instructions and/or operations represented by the flowchart of
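For illustration, serializing digital ink as compressed SVG (SVGZ) can be sketched as follows. This is a minimal example with a single polyline standing in for the digitized handwriting; real output would carry per-stroke pen-up breaks and styling.

```python
import gzip

def to_svgz(points, width=200, height=100):
    """Serialize one digital-ink stroke as a gzip-compressed SVG (SVGZ)
    byte string to reduce storage space."""
    path = " ".join(f"{x},{y}" for x, y in points)
    svg = (f'<svg xmlns="http://www.w3.org/2000/svg" '
           f'width="{width}" height="{height}">'
           f'<polyline points="{path}" fill="none" stroke="black"/></svg>')
    return gzip.compress(svg.encode("utf-8"))

def from_svgz(data):
    """Recover the SVG text from an SVGZ byte string."""
    return gzip.decompress(data).decode("utf-8")
```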
The handwriting generation circuitry 104A-E of
In intra-class evaluations, the model evaluation circuitry 270 utilizes the original handwriting from the user as the key image and utilizes the digitized content that the DHG user-specific model 288 and/or the DHG executable 290 produced for the original handwriting as the query image. In some example intra-class evaluations, the model evaluation circuitry 270 utilizes the original handwriting from the user as the key image and utilizes digitized content that the DHG user-specific model 288 and/or the DHG executable 290 produced for a different handwriting sequence from the user as the query image to enable style to be compared across different text. In inter-class evaluations, the model evaluation circuitry 270 utilizes digitized content generated for a first user by a first one(s) of the DHG user-specific model 288 and/or the DHG executable 290 as the key image and utilizes digitized content generated for a second user by a second one(s) of the DHG user-specific model 288 and/or the DHG executable 290 as the query image to evaluate distinctions between generated handwriting styles.
In some examples, the model evaluation circuitry 270 includes and/or utilizes a feature extractor (e.g., a Visual Geometry Group (VGG) model, VGG16, etc.) and a convolutional neural network (CNN) to extract and identify features in the key and query images that have an impact on the classification. In turn, the model evaluation circuitry 270 can compute a cosine similarity of the identified features in the key and query images. In response to the cosine similarity satisfying (e.g., being greater than) a cosine similarity threshold, the model evaluation circuitry 270 classifies the key and query images as being from the same user. Conversely, in response to the cosine similarity not satisfying (e.g., being less than) the cosine similarity threshold, the model evaluation circuitry 270 classifies the key and query images as being from different users. Therefore, the model evaluation circuitry 270 can evaluate an accuracy as well as an adaptability of ones of the DHG user-specific model 288.
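The similarity check above can be sketched with a plain cosine computation over extracted feature vectors. The threshold value of 0.85 is an illustrative assumption, not a value from this description.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (e.g., features
    extracted from the key and query images)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def same_user(key_features, query_features, threshold=0.85):
    """Classify the key and query images as being from the same user when
    the similarity satisfies (here: exceeds) the threshold."""
    return cosine_similarity(key_features, query_features) > threshold
```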
In some examples, after the model evaluation circuitry 270 performs the inter-class evaluation with at least a certain quantity of samples, the model evaluation circuitry 270 can cause the author adaptation circuitry 240 to modify or replace (e.g., retrain) the DHG user-specific model 288 in response to a percentage of the key and query images in the inter-class evaluation not satisfying the cosine similarity threshold being greater than a threshold percentage. In such examples, the model evaluation circuitry 270 can consider a quantity of samples recently encountered (e.g., a most recent 20 samples encountered from the user, a most recent 50 samples encountered from the user, etc.) to enable the author adaptation circuitry 240 to adapt the DHG user-specific model 288 when the handwriting style of the user changes (e.g., as the user matures). In such examples, the interface circuitry 210 can replace a previous version of the user-specific training data 284 with handwriting sequences more recently encountered from the user for a second user-specific training period. In some examples, the model evaluation circuitry 270 is instantiated by processor circuitry executing model evaluation instructions and/or configured to perform operations such as, for example, those instructions and/or operations represented by the flowchart of
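The retraining trigger over recently encountered samples can be sketched as a sliding-window monitor. The window size and failure-rate threshold below are illustrative assumptions.

```python
from collections import deque

class RetrainMonitor:
    """Track recent same-user evaluation results and flag retraining when
    the share of below-threshold results over the last `window` samples
    exceeds `max_fail_rate`."""

    def __init__(self, window=20, max_fail_rate=0.3):
        self.results = deque(maxlen=window)  # keeps only recent samples
        self.max_fail_rate = max_fail_rate

    def record(self, passed_threshold):
        self.results.append(bool(passed_threshold))

    def needs_retraining(self):
        if not self.results:
            return False
        fails = sum(1 for ok in self.results if not ok)
        return fails / len(self.results) > self.max_fail_rate
```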
The handwriting generation circuitry 104A-E of
The training dataset 306 of the illustrated example includes an example online handwriting sequence data corpus 316 and an example one-hot representation of character sets database 318. In some examples, data in the one-hot representation of character sets database 318 is indicative of characters in handwriting sequences in the online handwriting sequence data corpus 316.
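For illustration, a one-hot representation over a fixed character set can be sketched as follows; the character set used in the test is an assumption.

```python
def one_hot(char, charset):
    """One-hot vector for a character over a fixed character set, in the
    spirit of the one-hot representation of character sets database 318."""
    vec = [0] * len(charset)
    vec[charset.index(char)] = 1
    return vec

def encode_text(text, charset):
    """Character-level one-hot encoding of a handwriting transcription."""
    return [one_hot(c, charset) for c in text]
```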
The first stage 310 of the illustrated example includes an example feature extractor 320, an example attention network 322, an example sequence-to-sequence generator network 324, and an example handwriting sequence generator 326. In the illustrated example, the feature extractor 320 processes the handwriting sequences in the online handwriting sequence data corpus 316 to learn features of characters in the handwriting sequences. For example, the feature extractor 320 can be a deep learning model that automatically learns features of encountered characters.
In the illustrated example, the attention network 322 identifies the characters corresponding to the features identified by the feature extractor 320 using the one-hot representation of character sets database 318. In turn, the sequence-to-sequence generator network 324 can correlate the characters with the identified features and indicate such correlations between the characters and the identified features to the handwriting sequence generator 326. In some examples, the sequence-to-sequence generator network 324 parameterizes predictions of distributions of coordinates for characters in the DHG model using a MDN. For example, the sequence-to-sequence generator network 324 can parameterize weights and/or connections between nodes in the DHG model based on the identified features for respective characters. Furthermore, the handwriting sequence generator 326 can reparameterize the weights and/or connections between nodes in the DHG model based on an error computed by a reparameterization layer using Equation (1). In response to the error no longer decreasing with training iterations (e.g., in response to the error being minimized), the handwriting sequence generator 326 can output a base model (e.g., the DHG base model 286) capable of generating digitized handwriting in a generic style learned through the various handwriting sequences in the online handwriting sequence data corpus 316.
The second stage 312 of the illustrated example includes the example user-specific handwriting sequences 308, an example few-shot learning model 328 (identified by FEW-SHOT LEARNING OF HANDWRITING SEQUENCE GENERATOR), and an example author adaptor network 330. The example user-specific handwriting sequences 308 include handwritten words, symbols, and/or numbers from the user, such as writing obtained from the user at the user interface circuitry 126 of
During the user-specific training operation, the example few-shot learning model 328 adjusts weights and/or connections of a first portion of the base model (e.g., the MDN) using the reparameterization layer while maintaining the weights and/or connections in a second portion of the base model (e.g., LSTMs), which is defined at a lower level in the model than the first portion. Specifically, the example few-shot learning model 328 utilizes optimization-based MAML techniques to modify the weights and/or connections in the first portion of the base model using a few gradient descent steps. Furthermore, the example author adaptor network 330 can form a user-specific model based on the modifications to the weights and/or connections from the few gradient descent steps that result in the lowest error using Equation (1).
The third stage 314 of the illustrated example includes the example style transformer network 302 to input writing into the user-specific model to generate the example digitized handwriting output 304 (e.g., an SVGZ file). In the illustrated example, the third stage 314 includes writing inputs, such as an image 332 (identified by HANDWRITING CAPTURE DOC), text 334, and/or handwriting sequences 336 directly encountered by the user interface circuitry 126. In some examples, an example optical character recognition network 338 (e.g., the optical character recognition circuitry 250 of
The attention layer 402 of the illustrated example determines text (i.e., characters) to be generated as well as a digitized handwriting style in which the text is to be generated. In some examples, the input sequence vectors 416 provide data corresponding to an original handwriting sequence from a user to the sequence generation layer 404. In some examples, the one-hot vectors 418 are indicative of the characters (e.g., ASCII characters) associated with the original handwriting sequence.
During example initial training operations, the original handwriting sequence and the associated characters to be converted to digitized handwriting are downloaded or accessed via an online database (e.g., IAM On-Line Handwriting Database) and communicated to the attention layer 402 as inputs. During example user-specific training operations, the original handwriting sequence and the associated characters to be converted to digitized handwriting are obtained from the user (e.g., via the user interface circuitry 126 of
The sequence generation layer 404 of the illustrated example generates handwriting coordinate sequences with attention. In some examples, the first LSTM 420, the second LSTM 424, and the third LSTM 426 generate digital ink points (e.g., the handwriting coordinates) based on the original handwriting sequence and the associated characters to be converted to digitized handwriting. In some examples, the first LSTM 420, the second LSTM 424, and the third LSTM 426 generate a mixture of 2-dimensional (2-D) Gaussians to generate the digital ink points. In some examples, the window layer 422 is a sliding window that directs the attention of the first LSTM 420, the second LSTM 424, and the third LSTM 426 to different portions of the original handwriting sequence and the associated characters. In turn, the window layer 422 enables the first LSTM 420, the second LSTM 424, and the third LSTM 426 to direct respective attention to different parts of the input (e.g., the original handwriting sequence) when generating corresponding parts of an output (e.g., the digital ink points). By utilizing attention during example training operations, the first LSTM 420, the second LSTM 424, and the third LSTM 426 are able to learn the output corresponding to the input at a faster rate and with improved accuracy. Moreover, by utilizing attention during example inference operations, the first LSTM 420, the second LSTM 424, and the third LSTM 426 are able to predict digital ink points corresponding to the original handwriting sequence at a faster rate to increase a rate at which the DHG neural network generates digitized handwriting.
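A sliding soft window of this kind is often realized as a mixture of Gaussians over character positions. The following sketch uses one common formulation for handwriting synthesis, assumed here for illustration and not a verbatim copy of the window layer 422.

```python
import math

def soft_window(kappa, alphas, betas, char_vectors):
    """Gaussian soft-window sketch over a one-hot character sequence.

    Each mixture k contributes weight alpha_k * exp(-beta_k * (kappa_k - u)^2)
    to character position u; the window is the weighted sum of character
    vectors, focusing attention on part of the text.
    """
    dim = len(char_vectors[0])
    window = [0.0] * dim
    for u, vec in enumerate(char_vectors):
        phi = sum(a * math.exp(-b * (k - u) ** 2)
                  for a, b, k in zip(alphas, betas, kappa))
        for d in range(dim):
            window[d] += phi * vec[d]
    return window
```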
The MDN layer 406 of the illustrated example determines or learns a distribution of the digital ink points (e.g., the handwriting coordinates) generated by the sequence generation layer 404 for the original handwriting sequence and the associated text. During example training operations, the MDN layer 406 can parameterize the sequence generation layer 404 based on an error of the sequence prediction vectors 414 with respect to the input sequence vectors 416. In particular, the MDN layer 406 outputs parameters of a mixture model (πt, μjt, σjxt, σjyt, ρjt) from which the next digital ink point is generated by the sequence generation layer 404. For example, the MDN layer 406 can determine a probability of the next digital ink point using Equation (2).
In Equation (2), πt denotes the mixture weights, which form a vector of M weights that sums to 1, μjt denotes the location or mean of the jth bivariate Gaussian mixture component, σjxt and σjyt represent the scales (e.g., standard deviations) of the jth bivariate mixture component, ρjt represents the correlation of the jth bivariate mixture component, and et is a Boolean variable denoting pen up or down. Because the parameters of the mixture model are given by the MDN layer 406, an output (e.g., the sequence prediction vectors 414) cannot be directly generated with a forward pass through the machine learning model architecture 400. Instead, the MDN layer 406 backpropagates a distribution of the parameters of the mixture model (e.g., as 2-D Gaussians) at every time step to the sequence generation layer 404. In turn, the sequence generation layer 404 can generate a coordinate point for the digital ink based on the distribution of the parameters of the mixture model and sampling of the mixture components. In turn, the sequence generation layer 404 can feed the generated coordinate point to the MDN layer 406, which utilizes the generated coordinate point to guide the generation of 2-D Gaussians that can be fed back to the sequence generation layer 404 as the process repeats for a subsequent coordinate point.
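Drawing one digital-ink offset from such mixture parameters can be sketched as follows: pick a component by its weight, then sample a correlated bivariate Gaussian via the standard conditional construction. This is an illustrative sketch of mixture sampling in general, not the exact sampling procedure of the MDN layer 406.

```python
import math, random

def sample_point(pis, mus, sigmas, rhos, rng=None):
    """Draw one (x, y) offset from bivariate Gaussian mixture parameters
    (weights pis, means mus, standard deviations sigmas, correlations rhos)."""
    rng = rng or random.Random(0)
    # choose mixture component j with probability pi_j
    j = rng.choices(range(len(pis)), weights=pis)[0]
    (mx, my), (sx, sy), rho = mus[j], sigmas[j], rhos[j]
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    # correlated bivariate normal from two independent standard normals
    x = mx + sx * z1
    y = my + sy * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return x, y
```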
While backpropagation is defined for deterministic functions and not stochastic functions, the reparameterization layer 408 of the illustrated example enables the backpropagation to occur in the example machine learning model architecture 400 of the illustrated example. The reparameterization layer 408 of the illustrated example outputs the sequence prediction vectors 414 based on the coordinates of the digital ink points generated by the sequence generation layer 404 and the distribution of the coordinates generated by the MDN layer 406. During example training operations, the example reparameterization layer 408 utilizes Equation (1) to compute a loss between the sequence prediction vectors 414 and the input sequence vectors 416. In turn, the reparameterization layer 408 can utilize the computed loss to determine a reparameterization of the MDN layer 406.
Specifically, the reparameterization layer 408 converts the representation of a random variable, X, following a complicated distribution, into a deterministic part and a stochastic part. The stochastic part contains a random variable from known simple distributions. For example, if X˜N(μ, σ2) is a univariate random variable following a Gaussian distribution with location parameter μ and scale parameter σ, then, using the reparameterization process or trick, the univariate random variable can be computed using Equation (3).
X=μ+σ·Z, Equation (3)
In Equation (3), Z˜N(0, 1). Furthermore, the deterministic part is μ and the stochastic part is σ·Z, where Z can be represented in terms of a known simple distribution (i.e., a standard Gaussian distribution). The reparameterization layer 408 extends this to multi-variate Gaussians by changing Equation (3) to vector form, as shown in Equation (4).
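Equation (3) translates directly to code, as in this minimal sketch: the sample decomposes into a deterministic part (μ) and a stochastic part drawn from a standard Gaussian, which is what allows gradients to flow through μ and σ.

```python
import random

def reparameterize(mu, sigma, rng=None):
    """Equation (3) as code: X = mu + sigma * Z with Z ~ N(0, 1)."""
    rng = rng or random.Random(0)
    z = rng.gauss(0.0, 1.0)
    return mu + sigma * z
```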
X=μ+Σsqrt z, z˜N(0,I), Equation (4)
In Equation (4), z represents samples from standard bivariate Gaussian mixtures with mean (0, 0) and a 2×2 identity covariance matrix, and Σ is the multivariate extension of σ2. The reparameterization layer 408 can determine Σ using Equation (5) and, in turn, Σsqrt using Equation (6), Equation (7), and Equation (8).
Because the reparameterization trick does not readily extend to mixture density models as a result of difficulties associated with reparameterization of the discrete distribution over mixture weights, the reparameterization layer 408 performs a 2-step reparameterization. Specifically, the reparameterization layer 408 performs a first reparameterization for the mixture components and a second reparameterization for the categorically distributed mixture weights. In particular, the reparameterization layer 408 performs the first reparameterization of the mixture components using the Gumbel distributed variables 410. Additionally, the reparameterization layer 408 performs the second reparameterization of the categorically distributed mixture weights using a Gumbel-max trick with the Gumbel Softmax random variables 412, which can be defined using Equation (9), for example.
xk=exp((log πk+εk)/τ)/Σj exp((log πj+εj)/τ), k=1, . . . , K, Equation (9)
In Equation (9), τ>0 is a temperature parameter. When τ→0, the vector xk becomes close to one-hot, following a categorical distribution. On the other hand, when τ→+∞, the vector xk becomes uniformly distributed, and all samples look the same. During example training operations, the reparameterization layer 408 sets the temperature parameter, τ, to 0.01.
The Gumbel max-trick enables the reparameterization layer 408 to express a discrete categorical variable as a deterministic function of the class probabilities and independent random variables (i.e., the Gumbel distributed variables 410). For example, K possible values having respective probabilities of πk can be represented as discrete categorically distributed variables, X˜P(πk). While a sample from the distribution of the discrete categorically distributed variables can be represented as X=argmaxk(log πk+εk), where εk is one of the Gumbel distributed variables 410, the argmax function is non-differentiable and cannot be used in a backpropagation setting. Instead, the reparameterization layer 408 replaces the argmax with the softmax function in Equation (9) to make the function differentiable and, thus, enables backpropagation.
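The standard Gumbel-softmax sampling scheme described above can be sketched as follows; the concrete form is an assumption for illustration, with only the small temperature (e.g., 0.01) taken from the description.

```python
import math, random

def gumbel_softmax(log_pis, tau=0.01, rng=None):
    """Differentiable stand-in for argmax over mixture weights: perturb the
    log-probabilities with Gumbel(0, 1) noise and apply a temperature
    softmax. Small tau yields near-one-hot samples."""
    rng = rng or random.Random(0)
    eps = [-math.log(-math.log(rng.random())) for _ in log_pis]  # Gumbel noise
    logits = [(lp + e) / tau for lp, e in zip(log_pis, eps)]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```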
As a result, the reparameterization layer 408 can first compute one-hot samples mi for mixture weights using Gumbel softmax reparameterization and subsequently compute the samples for each component of the bivariate Gaussian mixtures using reparameterization. Specifically, by setting the temperature parameter, τ, sufficiently small (e.g., at 0.01) in Equation (9), most of the mi are close to 0 and only one mi is close to 1. In turn, the reparameterization layer 408 enables the coordinates for the digital ink points to be directly computed using Equation (10).
(xt+1,yt+1)T=Σi mi(μi+Si zt), Si=sqrt(Σi), Equation (10)
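Equation (10) can be sketched directly in code: the near-one-hot weights mi select a mixture component, and each component contributes μi+Si z. Plain lists stand in for 2-D vectors and 2×2 matrices to keep the sketch self-contained.

```python
def mixture_sample(m, mus, S, z):
    """Equation (10) as code: weighted sum over components of mu_i + S_i @ z,
    where m is the (near-one-hot) Gumbel-softmax weight vector and S_i is a
    square root of the component covariance matrix."""
    x = y = 0.0
    for mi, (mx, my), Si in zip(m, mus, S):
        sx = mx + Si[0][0] * z[0] + Si[0][1] * z[1]
        sy = my + Si[1][0] * z[0] + Si[1][1] * z[1]
        x += mi * sx
        y += mi * sy
    return x, y
```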
In some examples, in advance of training the DHG model, the model evaluation circuitry 270 of
As such, the example sequence generation layer 404 and the example MDN layer 406 can directly compute the sequence prediction vectors 414. During example training operations, the reparameterization layer 408 compares the computed sequence prediction vectors 414 with the input sequence vectors 416. Because the MDN layer 406 determines the distribution of the coordinates of the digital ink points and, thus, models a difference series between successive coordinates generated by the sequence generation layer 404, the reparameterization layer 408 compares a cumulative sum of the computed sequence prediction vectors 414 with a cumulative sum of the input sequence vectors 416 to determine the MSE and the MDN loss of Equation (1). Accordingly, the example reparameterization layer 408 can compute the overall error between the computed sequence prediction vectors 414 and the input sequence vectors 416. Further, the example reparameterization layer 408 utilizes gradient descent to modify weights and/or connections in the sequence generation layer 404 and/or the MDN layer 406 to reduce (i.e., minimize) the overall error between the computed sequence prediction vectors 414 and the input sequence vectors 416. In response to determining weights and/or connections in the sequence generation layer 404 and the MDN layer 406 that result in a small (i.e., minimal) error with a generalized training dataset (e.g., the general training data 282 of
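The cumulative-sum comparison above can be sketched as follows: recovering absolute coordinates from the difference series before computing the MSE lets per-step drift accumulate into a visible positional error.

```python
def cumsum(deltas):
    """Recover absolute coordinates from a first-difference series."""
    out, x, y = [], 0.0, 0.0
    for dx, dy in deltas:
        x, y = x + dx, y + dy
        out.append((x, y))
    return out

def coordinate_mse(pred_deltas, true_deltas):
    """MSE between cumulative sums of predicted and original difference
    series, exposing trends (e.g., an uptrend) that per-step comparison of
    differences alone would miss."""
    p, t = cumsum(pred_deltas), cumsum(true_deltas)
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(p, t)) / len(p)
```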
Further, to adapt the machine learning model architecture 400 to a specific handwriting style of a user, the author adaptation circuitry 240 of
In some examples, the handwriting generation circuitry 104A-E includes first means for training a machine learning model to generate at least a first digitized handwriting sequence based on at least one handwriting sample in at least one handwriting dataset. For example, the first means for training may be implemented by the generalized style generation circuitry 230. In some examples, the generalized style generation circuitry 230 may be instantiated by processor circuitry such as the example processor circuitry 1012 of
In some examples, the handwriting generation circuitry 104A-E includes second means for training the machine learning model to generate at least a second digitized handwriting sequence based on a user handwriting sample. In some examples, the second means for training is to maintain a first portion of the machine learning model configured by the first means for training and modify a second portion of the machine learning model using model-agnostic meta-learning. For example, the second means for training may be implemented by the author adaptation circuitry 240. In some examples, the author adaptation circuitry 240 may be instantiated by processor circuitry such as the example processor circuitry 1012 of
In some examples, the handwriting generation circuitry 104A-E includes means for implementing a machine learning model, such as the DHG model 124 of
While an example manner of implementing the handwriting generation circuitry 104A-E of
Flowcharts representative of example machine readable instructions, which may be executed to configure processor circuitry to implement the handwriting generation circuitry 104A-E of
The machine readable instructions described herein may be stored in one or more of a compressed format, an encrypted format, a fragmented format, a compiled format, an executable format, a packaged format, etc. Machine readable instructions as described herein may be stored as data or a data structure (e.g., as portions of instructions, code, representations of code, etc.) that may be utilized to create, manufacture, and/or produce machine executable instructions. For example, the machine readable instructions may be fragmented and stored on one or more storage devices and/or computing devices (e.g., servers) located at the same or different locations of a network or collection of networks (e.g., in the cloud, in edge devices, etc.). The machine readable instructions may require one or more of installation, modification, adaptation, updating, combining, supplementing, configuring, decryption, decompression, unpacking, distribution, reassignment, compilation, etc., in order to make them directly readable, interpretable, and/or executable by a computing device and/or other machine. For example, the machine readable instructions may be stored in multiple parts, which are individually compressed, encrypted, and/or stored on separate computing devices, wherein the parts when decrypted, decompressed, and/or combined form a set of machine executable instructions that implement one or more operations that may together form a program such as that described herein.
In another example, the machine readable instructions may be stored in a state in which they may be read by processor circuitry, but require addition of a library (e.g., a dynamic link library (DLL)), a software development kit (SDK), an application programming interface (API), etc., in order to execute the machine readable instructions on a particular computing device or other device. In another example, the machine readable instructions may need to be configured (e.g., settings stored, data input, network addresses recorded, etc.) before the machine readable instructions and/or the corresponding program(s) can be executed in whole or in part. Thus, machine readable media, as used herein, may include machine readable instructions and/or program(s) regardless of the particular format or state of the machine readable instructions and/or program(s) when stored or otherwise at rest or in transit.
The machine readable instructions described herein can be represented by any past, present, or future instruction language, scripting language, programming language, etc. For example, the machine readable instructions may be represented using any of the following languages: C, C++, Java, C#, Perl, Python, JavaScript, HyperText Markup Language (HTML), Structured Query Language (SQL), Swift, etc.
As mentioned above, the example operations of
“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc., may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the terms “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, or (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B.
Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, or (3) at least one A and at least one B.
As used herein, singular references (e.g., “a”, “an”, “first”, “second”, etc.) do not exclude a plurality. The term “a” or “an” object, as used herein, refers to one or more of that object. The terms “a” (or “an”), “one or more”, and “at least one” are used interchangeably herein. Furthermore, although individually listed, a plurality of means, elements or method actions may be implemented by, e.g., the same entity or object. Additionally, although individual features may be included in different examples or claims, these may possibly be combined, and the inclusion in different examples or claims does not imply that a combination of features is not feasible and/or advantageous.
At block 504, the handwriting generation circuitry 104A-E identifies a handwriting sequence to digitize. For example, the handwriting generation circuitry 104A-E can execute the DHG model 124 with a portion of the training data 122 as an input. Specifically, the portion of the training data 122 can include vectors corresponding to a sequence of handwritten words, numbers, and/or symbols. In some examples, the configuration determination circuitry 220 (
At block 506, the handwriting generation circuitry 104A-E identifies characters in the handwriting sequence. For example, the handwriting generation circuitry 104A-E can determine characters in the handwriting sequence based on a one-hot vector associated with the handwriting sequence in the portion of the training data 122. In some examples, the generalized style generation circuitry 230 identifies the one-hot vector indicative of ASCII characters in the handwriting sequence in the general training data 282. In turn, the example generalized style generation circuitry 230 can cause the one-hot vector to serve as the one-hot vector 418 (
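The one-hot character representation used at block 506 can be sketched as follows. This is a minimal illustration only: the alphabet and its ordering here are assumptions for the sketch, not the character set actually used by the DHG model 124.

```python
def one_hot_encode(text, alphabet):
    """Encode each character of `text` as a one-hot vector over `alphabet`."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    vectors = []
    for ch in text:
        vec = [0] * len(alphabet)
        vec[index[ch]] = 1  # exactly one position set per character
        vectors.append(vec)
    return vectors

# "hi" over a lowercase-plus-space alphabet yields two 27-dimensional vectors.
alphabet = "abcdefghijklmnopqrstuvwxyz "
encoded = one_hot_encode("hi", alphabet)
```

Each vector identifies one character unambiguously, which is what allows the model to condition generated ink on the text content of the handwriting sequence.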
At block 508, the handwriting generation circuitry 104A-E learns features of the characters in the handwriting sequence. For example, the generalized style generation circuitry 230 can cause execution of the DHG base model 286 with the handwriting sequence and characters as inputs to cause the DHG base model 286 to identify and/or learn features of the characters in the handwriting sequence. In some examples, the sequence generation layer 404 (
At block 510, the handwriting generation circuitry 104A-E generates coordinates for digital ink points in the handwriting sequence. For example, the generalized style generation circuitry 230 can cause execution of the DHG base model 286 to generate the coordinates for the digital ink points corresponding to ink of the characters in the handwriting sequence. In some examples, the first LSTM 420, the second LSTM 424, and/or the third LSTM 426 generate the coordinates as a displacement from a previously generated coordinate. The window layer 422 (
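Because each coordinate at block 510 is generated as a displacement from the previously generated coordinate, absolute ink-point positions are recovered by accumulating the displacements. A minimal sketch with illustrative displacement values (the values themselves are placeholders, not model outputs):

```python
def displacements_to_coordinates(displacements, origin=(0.0, 0.0)):
    """Accumulate (dx, dy) displacements into absolute (x, y) ink coordinates."""
    x, y = origin
    coords = []
    for dx, dy in displacements:
        x += dx
        y += dy
        coords.append((x, y))
    return coords

# Three generated displacements reconstruct a short stroke.
stroke = displacements_to_coordinates([(1.0, 0.5), (1.0, -0.5), (0.5, 0.0)])
```

This cumulative sum is the same operation the reparameterization layer applies to both the predicted and input sequences before comparing them, since the model natively produces a difference series rather than absolute coordinates.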
At block 512, the handwriting generation circuitry 104A-E distributes the coordinates to generate a digitized handwriting sequence. For example, the generalized style generation circuitry 230 can cause execution of the DHG base model 286 to distribute the generated coordinates to put together words and/or sentences in the handwriting sequence. In some examples, the MDN layer 406 (
At block 514, the handwriting generation circuitry 104A-E computes an error of the digitized handwriting sequence relative to the original handwriting sequence. For example, the generalized style generation circuitry 230 can cause execution of the DHG base model 286 to cause the error to be computed using Equation (1). In some examples, the reparameterization layer 408 computes the MDN loss and the MSE between the sequence prediction vectors 414 and the input sequence vectors 416. Specifically, the MSE can be indicative of a difference between the generated coordinates and coordinates of ink points in the input handwriting sequence. The MDN loss can be indicative of a difference between the distribution of the generated coordinates for the characters in the digitized handwriting sequence and the distribution of the coordinates for the characters in the input handwriting sequence. Further, the reparameterization layer 408 can apply weights to the MDN loss and the MSE, respectively, using the hyperparameter, λ, as shown in Equation (1). In some examples, the reparameterization layer 408 provides a first weight (e.g., 0.2) to the MSE and a second weight (e.g., 0.8) to the MDN loss. Accordingly, the reparameterization layer 408 can compute the weighted sum of the MSE and the MDN loss to determine the overall error between the input handwriting sequence and the digitized handwriting sequence.
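The weighted combination at block 514 can be sketched as a plain weighted sum. This stands in for Equation (1): the MDN loss is replaced by a precomputed placeholder value, and λ = 0.2 follows the example weights in the passage above (0.2 on the MSE, 0.8 on the MDN loss).

```python
def mse(pred, target):
    """Mean squared error between predicted and target coordinate values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def overall_error(mse_term, mdn_term, lam=0.2):
    """Weighted sum of the two loss terms: lam weights the MSE,
    (1 - lam) weights the MDN loss (0.2 and 0.8 in the example above)."""
    return lam * mse_term + (1.0 - lam) * mdn_term

# Cumulative-sum coordinates [1.0, 2.0] versus targets [1.0, 3.0],
# with an illustrative MDN loss of 0.5.
err = overall_error(mse([1.0, 2.0], [1.0, 3.0]), mdn_term=0.5)
```

The hyperparameter λ thus trades off pointwise coordinate accuracy (MSE) against distributional fidelity (MDN loss) in the single scalar error that drives gradient descent.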
At block 516, the handwriting generation circuitry 104A-E determines whether to advance the DHG model 124 to user-specific training. For example, the generalized style generation circuitry 230 can determine the DHG base model 286 is set and, thus, a general style training is complete in response to the error being minimized and/or satisfying a threshold. In some examples, the generalized style generation circuitry 230 determines the general style training is complete in response to utilizing all of the general training data 282 as inputs during iterations of executing the DHG base model 286. In response to the generalized style generation circuitry 230 determining not to advance to the user-specific training, the operations 500 proceed to block 518. Otherwise, the operations terminate.
At block 518, the handwriting generation circuitry 104A-E parameterizes the DHG model 124. For example, the generalized style generation circuitry 230 can cause execution of the DHG base model 286 to cause a parameterization of a coordinate generation process associated with generating the coordinates for the digitized handwriting (e.g., at block 510). In some examples, the MDN layer 406 modifies weights and/or connections in the sequence generation layer 404 to parameterize the coordinate generation process based on the computed error.
At block 520, the handwriting generation circuitry 104A-E reparameterizes the DHG model 124. For example, the generalized style generation circuitry 230 can cause execution of the DHG base model 286 to cause a reparameterization of a coordinate distribution process. In some examples, the reparameterization layer 408 modifies weights and/or connections in the MDN layer 406 to reparameterize the coordinate distribution process based on the computed error. In some examples, the reparameterization layer 408 performs a first reparameterization of mixture components utilized to generate the distribution of the coordinates and a second reparameterization of categorically distributed mixture weights. In response to performing the reparameterization of the DHG model 124, the operations 500 return to block 504.
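A reparameterized draw from a mixture density — a categorical choice over the mixture weights followed by a shift-and-scale of unit Gaussian noise — can be sketched as follows. This is a 1-D toy: the actual MDN layer 406 models 2-D coordinate distributions, and its mixture counts and dimensions are not given in this passage.

```python
import random

def sample_mixture(weights, means, stds, rng=None):
    """Sample from a 1-D Gaussian mixture: draw a component index from the
    categorical mixture weights, then reparameterize unit noise as
    mean + std * eps with eps ~ N(0, 1)."""
    rng = rng or random.Random(0)
    u, cum, k = rng.random(), 0.0, 0
    for k, w in enumerate(weights):
        cum += w
        if u < cum:
            break
    eps = rng.gauss(0.0, 1.0)
    return means[k] + stds[k] * eps

# With zero standard deviations the draw collapses to the chosen mean.
x = sample_mixture([0.3, 0.7], [0.0, 5.0], [0.0, 0.0], random.Random(0))
```

Writing the draw as mean + std * eps keeps the sample differentiable with respect to the mixture parameters, which is what allows gradient descent to flow through the sampling step during reparameterization.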
At block 604, the handwriting generation circuitry 104A-E identifies hyperparameters associated with few-shot learning. For example, the handwriting generation circuitry 104A-E can identify the hyperparameters in the DHG model 124. In some examples, the configuration determination circuitry 220 (
At block 606, the handwriting generation circuitry 104A-E generates digitized handwriting corresponding to the user-specific training data. For example, the handwriting generation circuitry 104A-E can cause execution of the DHG model 124 using the portion of the training data 122 as an input to generate the digitized handwriting. In some examples, the author adaptation circuitry 240 causes execution of the DHG user-specific model 288 with the user-specific training data 284 as an input to generate the digitized handwriting.
At block 608, the handwriting generation circuitry 104A-E causes a reparameterization of the machine learning model based on the handwriting style of the particular user using few gradient descent steps. For example, the handwriting generation circuitry 104A-E can cause execution of the DHG model 124 using the portion of the training data 122 as an input to cause a first portion of the DHG model 124 to cause reparameterization of a second portion of the DHG model 124 while a third portion of the DHG model 124 remains fixed. In some examples, the author adaptation circuitry 240 causes execution of the DHG user-specific model 288 with the user-specific training data 284 as an input to cause the reparameterization layer 408 to reparameterize the MDN layer 406 while the sequence generation layer 404 remains fixed (e.g., remains the same as in the DHG base model 286). In some examples, the author adaptation circuitry 240 causes a few cycles of reparameterization to occur by causing execution of the DHG user-specific model 288 a few times with the user-specific training data 284 as the input to cause parameters associated with a minimal error to be determined using few gradient descent steps. That is, the reparameterization layer 408 can modify the weights and/or connections in the MDN layer 406 more than once to test different weights and/or connections and identify parameters associated with the lowest error for the style displayed by the user.
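The selective adaptation at block 608 can be sketched with a toy model in which one scalar "layer" stays frozen while another is updated by a few gradient descent steps on user samples. The quadratic toy loss and the parameter names are illustrative assumptions, not the DHG model's actual structure.

```python
def adapt_user_model(fixed_w, mdn_w, user_data, steps=3, lr=0.1):
    """Few gradient descent steps on the adaptable weight only.
    Toy loss: mean squared error of (fixed_w + mdn_w) against user samples."""
    for _ in range(steps):
        # Gradient of mean((fixed_w + mdn_w - y)^2) with respect to mdn_w.
        grad = sum(2.0 * (fixed_w + mdn_w - y) for y in user_data) / len(user_data)
        mdn_w -= lr * grad  # only the adaptable (MDN-like) portion is updated
    return fixed_w, mdn_w  # the fixed portion is returned unchanged

fixed, adapted = adapt_user_model(fixed_w=1.0, mdn_w=0.0, user_data=[2.0, 2.0])
```

Freezing the larger, generally trained portion and taking only a few steps on the smaller portion is what makes adaptation feasible with a small user-specific dataset.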
At block 704, the handwriting generation circuitry 104A-E identifies characters in the writing sequence. For example, in response to the writing being in an image format or a handwritten format, the optical character recognition circuitry 250 (
At block 706, the handwriting generation circuitry 104A-E executes the DHG user-specific model 288 to generate digitized handwriting corresponding to the writing sequence. For example, the style transformation circuitry 260 can convert the DHG user-specific model 288 (
At block 804, the handwriting generation circuitry 104A-E identifies which extracted features are important for a classification of the writings. For example, the model evaluation circuitry 270 can include and/or utilize a convolutional neural network (CNN) to identify the important features that have an impact on a similarity classification between the writings. An important feature is a feature that is used in the example operations 800 to identify a character of the writing. For example, an important feature may be a feature and/or a combination of features that is unique to a particular letter, number, symbol, and/or other text. An important feature may be a feature and/or a combination of features indicative of a writing style, a language, a writing direction, etc. Different features of words, letters, numbers, symbols, and/or other text may be defined as important in different contexts or processes. Any feature of writing useful in classification of the writing may be an important feature.
At block 806, the handwriting generation circuitry 104A-E computes a cosine similarity between the important features of the original handwriting sequence and the digitized handwriting sequence. For example, the model evaluation circuitry 270 can compute the cosine similarity of the important features.
At block 808, the handwriting generation circuitry 104A-E determines whether a similarity threshold is satisfied. For example, the model evaluation circuitry 270 can compare the cosine similarity to the similarity threshold. In response to the cosine similarity satisfying (e.g., being greater than) the similarity threshold, the operations 800 proceed to block 810. Otherwise, in response to the cosine similarity not satisfying (e.g., being less than) the similarity threshold, the operations 800 proceed to block 812.
At block 810, the handwriting generation circuitry 104A-E classifies the digitized handwriting sequence as matching the original handwriting sequence. For example, the model evaluation circuitry 270 can classify the original and digitized handwriting sequences as a match in response to the cosine similarity threshold being satisfied.
At block 812, the handwriting generation circuitry 104A-E classifies the digitized handwriting sequence as different from the original handwriting sequence. For example, the model evaluation circuitry 270 can classify the original and digitized handwriting sequences as different in response to the cosine similarity threshold not being satisfied.
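The similarity check of blocks 806 through 812 can be sketched as a cosine comparison between two feature vectors. The feature extraction itself and the threshold value 0.9 are illustrative placeholders here; the passage does not specify the threshold used by the model evaluation circuitry 270.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def classify_match(original_feats, digitized_feats, threshold=0.9):
    """Return True (match) when cosine similarity exceeds the threshold."""
    return cosine_similarity(original_feats, digitized_feats) > threshold

# Nearly parallel feature vectors classify as a match.
verdict = classify_match([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
```

Cosine similarity compares the direction of the feature vectors rather than their magnitude, so the classification is insensitive to overall scale differences between the original and digitized writings.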
The processor platform 1000 of the illustrated example includes processor circuitry 1012. The processor circuitry 1012 of the illustrated example is hardware. For example, the processor circuitry 1012 can be implemented by one or more integrated circuits, logic circuits, FPGAs, microprocessors, CPUs, GPUs, DSPs, and/or microcontrollers from any desired family or manufacturer. The processor circuitry 1012 may be implemented by one or more semiconductor based (e.g., silicon based) devices. In this example, the processor circuitry 1012 implements the example configuration determination circuitry 220, the example generalized style generation circuitry 230, the example author adaptation circuitry 240, the example optical character recognition circuitry 250, the example style transformation circuitry 260, and the example model evaluation circuitry 270.
The processor circuitry 1012 of the illustrated example includes a local memory 1013 (e.g., a cache, registers, etc.). The processor circuitry 1012 of the illustrated example is in communication with a main memory including a volatile memory 1014 and a non-volatile memory 1016 by a bus 1018. The volatile memory 1014 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®), and/or any other type of RAM device. The non-volatile memory 1016 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 1014, 1016 of the illustrated example is controlled by a memory controller 1017.
The processor platform 1000 of the illustrated example also includes interface circuitry 1020. The interface circuitry 1020 may be implemented by hardware in accordance with any type of interface standard, such as an Ethernet interface, a universal serial bus (USB) interface, a Bluetooth® interface, a near field communication (NFC) interface, a Peripheral Component Interconnect (PCI) interface, and/or a Peripheral Component Interconnect Express (PCIe) interface. In this example, the interface circuitry 1020 implements the interface circuitry 210 of
In the illustrated example, one or more input devices 1022 are connected to the interface circuitry 1020. The input device(s) 1022 permit(s) a user to enter data and/or commands into the processor circuitry 1012. The input device(s) 1022 can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, an isopoint device, and/or a voice recognition system.
One or more output devices 1024 are also connected to the interface circuitry 1020 of the illustrated example. The output device(s) 1024 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube (CRT) display, an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer, and/or speaker. The interface circuitry 1020 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip, and/or graphics processor circuitry such as a GPU.
The interface circuitry 1020 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) by a network 1026. The communication can be by, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, an optical connection, etc.
The processor platform 1000 of the illustrated example also includes one or more mass storage devices 1028 to store software and/or data. Examples of such mass storage devices 1028 include magnetic storage devices, optical storage devices, floppy disk drives, HDDs, CDs, Blu-ray disk drives, redundant array of independent disks (RAID) systems, solid state storage devices such as flash memory devices and/or SSDs, and DVD drives.
The machine readable instructions 1032, which may be implemented by the machine readable instructions of
The processor platform 1000 of the illustrated example of
In some examples, the GPU 1040 may implement the first hardware accelerator 108, the second hardware accelerator 110, and/or the general purpose processor circuitry 112 of
The cores 1102 may communicate by a first example bus 1104. In some examples, the first bus 1104 may be implemented by a communication bus to effectuate communication associated with one(s) of the cores 1102. For example, the first bus 1104 may be implemented by at least one of an Inter-Integrated Circuit (I2C) bus, a Serial Peripheral Interface (SPI) bus, a PCI bus, or a PCIe bus. Additionally or alternatively, the first bus 1104 may be implemented by any other type of computing or electrical bus. The cores 1102 may obtain data, instructions, and/or signals from one or more external devices by example interface circuitry 1106. The cores 1102 may output data, instructions, and/or signals to the one or more external devices by the interface circuitry 1106. Although the cores 1102 of this example include example local memory 1120 (e.g., Level 1 (L1) cache that may be split into an L1 data cache and an L1 instruction cache), the microprocessor 1100 also includes example shared memory 1110 that may be shared by the cores (e.g., Level 2 (L2) cache) for high-speed access to data and/or instructions. Data and/or instructions may be transferred (e.g., shared) by writing to and/or reading from the shared memory 1110. The local memory 1120 of each of the cores 1102 and the shared memory 1110 may be part of a hierarchy of storage devices including multiple levels of cache memory and the main memory (e.g., the main memory 1014, 1016 of
Each core 1102 may be referred to as a CPU, DSP, GPU, etc., or any other type of hardware circuitry. Each core 1102 includes control unit circuitry 1114, arithmetic and logic (AL) circuitry (sometimes referred to as an ALU) 1116, a plurality of registers 1118, the local memory 1120, and a second example bus 1122. Other structures may be present. For example, each core 1102 may include vector unit circuitry, single instruction multiple data (SIMD) unit circuitry, load/store unit (LSU) circuitry, branch/jump unit circuitry, floating-point unit (FPU) circuitry, etc. The control unit circuitry 1114 includes semiconductor-based circuits structured to control (e.g., coordinate) data movement within the corresponding core 1102. The AL circuitry 1116 includes semiconductor-based circuits structured to perform one or more mathematic and/or logic operations on the data within the corresponding core 1102. The AL circuitry 1116 of some examples performs integer based operations. In other examples, the AL circuitry 1116 also performs floating point operations. In yet other examples, the AL circuitry 1116 may include first AL circuitry that performs integer based operations and second AL circuitry that performs floating point operations. In some examples, the AL circuitry 1116 may be referred to as an Arithmetic Logic Unit (ALU). The registers 1118 are semiconductor-based structures to store data and/or instructions such as results of one or more of the operations performed by the AL circuitry 1116 of the corresponding core 1102. For example, the registers 1118 may include vector register(s), SIMD register(s), general purpose register(s), flag register(s), segment register(s), machine specific register(s), instruction pointer register(s), control register(s), debug register(s), memory management register(s), machine check register(s), etc. The registers 1118 may be arranged in a bank as shown in
Each core 1102 and/or, more generally, the microprocessor 1100 may include additional and/or alternate structures to those shown and described above. For example, one or more clock circuits, one or more power supplies, one or more power gates, one or more cache home agents (CHAs), one or more converged/common mesh stops (CMSs), one or more shifters (e.g., barrel shifter(s)) and/or other circuitry may be present. The microprocessor 1100 is a semiconductor device fabricated to include many transistors interconnected to implement the structures described above in one or more integrated circuits (ICs) contained in one or more packages. The processor circuitry may include and/or cooperate with one or more accelerators. In some examples, accelerators are implemented by logic circuitry to perform certain tasks more quickly and/or efficiently than can be done by a general purpose processor. Examples of accelerators include ASICs and FPGAs such as those discussed herein. A GPU or other programmable device can also be an accelerator. Accelerators may be on-board the processor circuitry, in the same chip package as the processor circuitry and/or in one or more separate packages from the processor circuitry.
More specifically, in contrast to the microprocessor 1100 of
In the example of
The configurable interconnections 1210 of the illustrated example are conductive pathways, traces, vias, or the like that may include electrically controllable switches (e.g., transistors) whose state can be changed by programming (e.g., using an HDL instruction language) to activate or deactivate one or more connections between one or more of the logic gate circuitry 1208 to program desired logic circuits.
The storage circuitry 1212 of the illustrated example is structured to store result(s) of the one or more of the operations performed by corresponding logic gates. The storage circuitry 1212 may be implemented by registers or the like. In the illustrated example, the storage circuitry 1212 is distributed amongst the logic gate circuitry 1208 to facilitate access and increase execution speed.
The example FPGA circuitry 1200 of
Although
In some examples, the processor circuitry 1012 of
A block diagram illustrating an example software distribution platform 1305 to distribute software such as the example machine readable instructions 1032 of
From the foregoing, it will be appreciated that example systems, methods, apparatus, and articles of manufacture have been disclosed that generate digitized handwriting with style adaptations specific to the unique handwriting styles of different users. In addition to the many benefits disclosed above, example systems, methods, apparatus, and articles of manufacture disclosed herein are agnostic to language, alphabet, and writing direction (e.g., left to right, right to left, top to bottom). Thus, examples disclosed herein are useful in many applications throughout the world.
Additionally, example systems, methods, apparatus, and articles of manufacture disclosed herein can be utilized in conjunction with a speech detection system to generate digitized handwriting in a style of a user based on detected spoken words. For example, when the speech detection system detects words spoken by a user, the example systems, methods, apparatus, and articles of manufacture disclosed herein can generate digitized handwriting indicative of the words in a user-specific style. In some examples, the user-specific style of the digitized handwriting is attributable to a voice through which the spoken words are presented. For instance, when a professor gives a lecture, an electronic device of the user can detect a voice of the professor and generate digitized handwriting in a style specific to the professor. In some other examples, the user-specific style of the digitized handwriting is agnostic to the voice through which the spoken words are presented. For example, when the professor gives the lecture, the electronic device of the user can generate digitized handwriting corresponding to the spoken words in a style specific to the user or in a style specific to a third party.
The following claims are hereby incorporated into this Detailed Description by this reference. Although certain example systems, methods, apparatus, and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all systems, methods, apparatus, and articles of manufacture fairly falling within the scope of the claims of this patent.