This description relates to identifying individual fonts present in images by using one or more techniques such as artificial intelligence.
Graphic designers along with other professionals are often interested in identifying fonts noticed in various media (e.g., appearing on signs, books, periodicals, etc.) for later use. Some may take a photo of the text represented in the font of interest and later attempt to manually identify the font, which can be an extremely laborious and tedious task. To identify the font, the individual may need to exhaustively explore a seemingly endless list of hundreds or even thousands of alphabetically ordered fonts.
The described systems and techniques are capable of effectively identifying fonts in an automatic manner from an image (e.g., a photograph) by using artificial intelligence. By extensive training of a machine learning system, font identification can be achieved with a high probability of success. Along with using a relatively large font sample set, training the machine learning system can include using different image types and augmented images (e.g., distorted images) of fonts so the system is capable of recognizing fonts presented in less than pristine imagery.
In one aspect, a computing device implemented method includes receiving an image that includes textual content in at least one font. The method also includes identifying the at least one font represented in the received image using a machine learning system. The machine learning system is trained using images representing a plurality of training fonts. A portion of the training images includes text located in the foreground and positioned over captured background imagery.
Implementations may include one or more of the following features. The text located in the foreground may be synthetically augmented. Synthetic augmentation may be provided in a two-step process. The text may be synthetically augmented based upon one or more predefined conditions. The text located in the foreground may be undistorted. The text may be included in captured imagery. The captured background imagery may be predominately absent text. The text located in the foreground may be randomly positioned in the portion of training images. Prior to the text being located in the foreground, a portion of the text may be removed. The captured background imagery may be distorted when captured. Font similarity may be used to identify the at least one font. Similarity of fonts in multiple image segments may be used to identify the at least one font. The machine learning system may be trained by using transfer learning. An output of the machine learning system may represent each font used to train the machine learning system. The output of the machine learning system may provide a level of confidence for each font used to train the machine learning system. A subset of the output of the machine learning system may be scaled and a remainder of the output is removed. Some of the training images may be absent identification. Identifying the at least one font represented in the received image using the machine learning system may include using additional images received by the machine learning system. Outputs of the machine learning system for the received image and the additional images may be combined to identify the at least one font.
In another aspect, a system includes a computing device that includes a memory configured to store instructions. The system also includes a processor to execute the instructions to perform operations that include receiving an image that includes textual content in at least one font. Operations also include identifying the at least one font represented in the received image using a machine learning system. The machine learning system is trained using images representing a plurality of training fonts. A portion of the training images includes text located in the foreground and positioned over captured background imagery.
Implementations may include one or more of the following features. The text located in the foreground may be synthetically augmented. Synthetic augmentation may be provided in a two-step process. The text may be synthetically augmented based upon one or more predefined conditions. The text located in the foreground may be undistorted. The text may be included in captured imagery. The captured background imagery may be predominately absent text. The text located in the foreground may be randomly positioned in the portion of training images. Prior to the text being located in the foreground, a portion of the text may be removed. The captured background imagery may be distorted when captured. Font similarity may be used to identify the at least one font. Similarity of fonts in multiple image segments may be used to identify the at least one font. The machine learning system may be trained by using transfer learning. An output of the machine learning system may represent each font used to train the machine learning system. The output of the machine learning system may provide a level of confidence for each font used to train the machine learning system. A subset of the output of the machine learning system may be scaled and a remainder of the output is removed. Some of the training images may be absent identification. Identifying the at least one font represented in the received image using the machine learning system may include using additional images received by the machine learning system. Outputs of the machine learning system for the received image and the additional images may be combined to identify the at least one font.
In another aspect, one or more computer readable media store instructions that are executable by a processing device, and upon such execution cause the processing device to perform operations including receiving an image that includes textual content in at least one font. Operations also include identifying the at least one font represented in the received image using a machine learning system. The machine learning system is trained using images representing a plurality of training fonts. A portion of the training images includes text located in the foreground and positioned over captured background imagery.
Implementations may include one or more of the following features. The text located in the foreground may be synthetically augmented. Synthetic augmentation may be provided in a two-step process. The text may be synthetically augmented based upon one or more predefined conditions. The text located in the foreground may be undistorted. The text may be included in captured imagery. The captured background imagery may be predominately absent text. The text located in the foreground may be randomly positioned in the portion of training images. Prior to the text being located in the foreground, a portion of the text may be removed. The captured background imagery may be distorted when captured. Font similarity may be used to identify the at least one font. Similarity of fonts in multiple image segments may be used to identify the at least one font. The machine learning system may be trained by using transfer learning. An output of the machine learning system may represent each font used to train the machine learning system. The output of the machine learning system may provide a level of confidence for each font used to train the machine learning system. A subset of the output of the machine learning system may be scaled and a remainder of the output is removed. Some of the training images may be absent identification. Identifying the at least one font represented in the received image using the machine learning system may include using additional images received by the machine learning system. Outputs of the machine learning system for the received image and the additional images may be combined to identify the at least one font.
In another aspect, a computing device implemented method includes receiving an image that includes textual content in at least one font, and identifying the at least one font represented in the received image using a machine learning system, the machine learning system being trained using images representing a plurality of training fonts, wherein a portion of the training images is produced by a generator neural network.
Implementations may include one or more of the following features. The generator neural network may provide augmented imagery to a discriminator neural network for preparing the generator neural network. The augmented imagery produced by the generator neural network may include a distorted version of a font to train the machine learning system. Determinations produced by the discriminator neural network may be used to improve operations of the discriminator neural network. Determinations produced by the discriminator neural network may be used to improve operations of the generator neural network. Font similarity may be used to identify the at least one font. Similarity of fonts in multiple image segments may be used to identify the at least one font. The machine learning system may be trained by using transfer learning. An output of the machine learning system may represent each font used to train the machine learning system. The output of the machine learning system may provide a level of confidence for each font used to train the machine learning system. A subset of the output of the machine learning system may be scaled and a remainder of the output is removed. Some of the training images may be absent identification. Identifying the at least one font represented in the received image using the machine learning system may include using additional images received by the machine learning system. Outputs of the machine learning system for the received image and the additional received images may be combined to identify the at least one font.
In another aspect, a system includes a computing device that includes a memory configured to store instructions. The system also includes a processor to execute the instructions to perform operations that include receiving an image that includes textual content in at least one font. Operations also include identifying the at least one font represented in the received image using a machine learning system, the machine learning system being trained using images representing a plurality of training fonts, wherein a portion of the training images is produced by a generator neural network.
Implementations may include one or more of the following features. The generator neural network may provide augmented imagery to a discriminator neural network for preparing the generator neural network. The augmented imagery produced by the generator neural network may include a distorted version of a font to train the machine learning system. Determinations produced by the discriminator neural network may be used to improve operations of the discriminator neural network. Determinations produced by the discriminator neural network may be used to improve operations of the generator neural network. Font similarity may be used to identify the at least one font. Similarity of fonts in multiple image segments may be used to identify the at least one font. The machine learning system may be trained by using transfer learning. An output of the machine learning system may represent each font used to train the machine learning system. The output of the machine learning system may provide a level of confidence for each font used to train the machine learning system. A subset of the output of the machine learning system may be scaled and a remainder of the output is removed. Some of the training images may be absent identification. Identifying the at least one font represented in the received image using the machine learning system may include using additional images received by the machine learning system. Outputs of the machine learning system for the received image and the additional received images may be combined to identify the at least one font.
In another aspect, one or more computer readable media store instructions that are executable by a processing device, and upon such execution cause the processing device to perform operations including receiving an image that includes textual content in at least one font. Operations also include identifying the at least one font represented in the received image using a machine learning system, the machine learning system being trained using images representing a plurality of training fonts, wherein a portion of the training images is produced by a generator neural network.
Implementations may include one or more of the following features. The generator neural network may provide augmented imagery to a discriminator neural network for preparing the generator neural network. The augmented imagery produced by the generator neural network may include a distorted version of a font to train the machine learning system. Determinations produced by the discriminator neural network may be used to improve operations of the discriminator neural network. Determinations produced by the discriminator neural network may be used to improve operations of the generator neural network. Font similarity may be used to identify the at least one font. Similarity of fonts in multiple image segments may be used to identify the at least one font. The machine learning system may be trained by using transfer learning. An output of the machine learning system may represent each font used to train the machine learning system. The output of the machine learning system may provide a level of confidence for each font used to train the machine learning system. A subset of the output of the machine learning system may be scaled and a remainder of the output is removed. Some of the training images may be absent identification. Identifying the at least one font represented in the received image using the machine learning system may include using additional images received by the machine learning system. Outputs of the machine learning system for the received image and the additional received images may be combined to identify the at least one font.
These and other aspects, features, and various combinations may be expressed as methods, apparatus, systems, means for performing functions, program products, etc.
Other features and advantages will be apparent from the description and the claims.
Referring to
Referring to
To provide this functionality, the font identifier 204 may use various machine learning techniques such as deep learning to improve the identification processes through training the system (e.g., expose multilayer neural networks to training data, feedback, etc.). Through such machine learning techniques, the font identifier 204 uses artificial intelligence to automatically learn and improve from experience without being explicitly programmed. Once trained (e.g., from images of identified fonts, distorted images of identified fonts, images of unidentified fonts, etc.), one or more images, representations of images, etc. can be input into the font identifier 204 to yield an output. Further, by returning information about the output (e.g., feedback), the machine learning technique can use the output as additional training information. Other training data can also be provided for further training. By using increased amounts of training data (e.g., images of identified fonts, unidentified fonts, etc.), feedback data (e.g., data representing user confirmation of identified fonts), etc., the accuracy of the system can be improved (e.g., to predict matching fonts).
Other forms of artificial intelligence techniques may be used by the font identifier 204. For example, to process information (e.g., images, image representations, etc.) to identify fonts, etc., the architecture may employ decision tree learning that uses one or more decision trees (as a predictive model) to progress from observations about an item (represented in the branches) to conclusions about the item's target (represented in the leaves). In some arrangements, random forests or random decision forests are used and can be considered as an ensemble learning method for classification, regression, and other tasks. Such techniques generally operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Support vector machines (SVMs), which are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis, can also be used.
Ensemble learning systems may also be used for font prediction in which multiple system members independently arrive at a result. System members can be of the same type (e.g., each is a decision tree learning machine, etc.) or members can be of different types (e.g., one deep CNN system such as a ResNet50, one SVM system, one decision tree system, etc.). Upon each system member determining a result, a majority vote among the system members (or another type of voting technique) is used to determine an overall prediction result. In some arrangements, one or more knowledge-based systems such as expert systems may be employed. In general, such expert systems are designed to solve relatively complex problems by using reasoning techniques that may employ conditional statements (e.g., if-then rules). In some arrangements such expert systems may use multiple systems such as a two sub-system design, in which one system component stores structured and/or unstructured information (e.g., a knowledge base) and a second system component applies rules, etc. to the stored information (e.g., an inference engine) to determine results of interest (e.g., select images likely to be presented).
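As a rough illustration of the voting step, the following Python sketch applies a simple majority vote across ensemble members; the member objects and their predict() method are hypothetical placeholders for whatever models (e.g., a CNN, an SVM, a decision tree) an implementation actually uses.

```python
# Minimal sketch of majority voting across heterogeneous ensemble members.
# The member objects and their predict() methods are hypothetical placeholders.
from collections import Counter

def ensemble_predict(members, image):
    """Return the font label chosen by a simple majority of ensemble members."""
    votes = [member.predict(image) for member in members]   # one label per member
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)                         # predicted font and vote share
```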
Referring to
In this arrangement, such image data may be collected by an image collector 300 and stored (e.g., in a collected image database 302) on a storage device 304 for later retrieval. In some arrangements, information associated with images (e.g., font information, image attributes, etc.) may be provided and stored in an image information database 306. A trainer 308 retrieves the image data (stored in the database 302) and/or the image information (stored in the database 306) and uses the data to train a font machine learning system 310. Various types of data may be used for training the system; for example, images (e.g., thousands of images, millions of images) can be used by the trainer 308. For example, pristine images of fonts (e.g., portions of font characters, font characters, phrases using a font), distorted images of fonts (e.g., synthetically altered versions of fonts), and real-world images of fonts (e.g., images captured by individuals in real-world conditions that include one or more fonts) may be used to train the font machine learning system 310. For some images of fonts (e.g., images of pristine fonts, synthetically altered versions of fonts, etc.), information that identifies each included font (e.g., labels) may be provided for training. Alternatively, for some images (e.g., captured under real-world conditions), identifying information (of included fonts) may be absent.
Once trained, the font machine learning system 310 may be provided input data such as one or more images to identify the font or fonts present in the images. For example, after being trained using pristine, distorted, and real-world images of fonts, images containing unidentified fonts and captured under real-world conditions may be input for predicting the contained fonts (as illustrated in
Referring to
Referring to
The training data 506 may also include segments of one training image. For example, one image may be segmented into five separate images that focus on different areas of the original image. Such image segmenting can be used when the machine learning system predicts a font from an input image. For prediction operations, a prediction result (e.g., a 133,000 element output vector) can be attained for each segment and an overall result determined (e.g., by averaging the individual results) to improve prediction accuracy. One image may be cropped from the original image to focus upon the upper left quadrant of the original image while three other segments may be cropped to focus on the upper right, lower left, and lower right portions of the original image, respectively. A fifth image segment may be produced by cropping the original image to focus upon the central portion of the original image. Various sizes and shapes may be used to create these segments; for example, the original image may be of a particular size (e.g., 224 by 224 pixels, 120 by 120 pixels, etc.) while the segments are of lesser size (e.g., 105 by 105 pixels). In some arrangements, the segments may include overlapping content; in other arrangements, non-overlapping content may be included in each segment. While the original image and the cropped segments may be square shaped, in some instances the images may be rectangular or have another type of shape.
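The five-segment scheme can be sketched as follows; the crop size, the NumPy image representation, and the model.predict() interface are illustrative assumptions rather than requirements of the description above.

```python
# Sketch of five-crop prediction averaging, assuming a model that maps a square
# crop (e.g., 105x105 pixels) to a vector of per-font scores.
import numpy as np

def five_crops(image, crop=105):
    """Crop the four corner quadrants and the center of a square image (NumPy array)."""
    h, w = image.shape[:2]
    c_y, c_x = (h - crop) // 2, (w - crop) // 2
    anchors = [(0, 0), (0, w - crop), (h - crop, 0), (h - crop, w - crop), (c_y, c_x)]
    return [image[y:y + crop, x:x + crop] for y, x in anchors]

def predict_with_crops(model, image):
    """Average the per-crop score vectors to form an overall font prediction."""
    scores = np.stack([model.predict(c) for c in five_crops(image)])
    return scores.mean(axis=0)
```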
In one arrangement, after initial training with the first set of fonts (e.g., 14,000 fonts), for each new font used in the subsequent training (each remaining font of the 133,000 fonts), operations are executed (by the font identifier 204) to determine the most similar font from the first set initially used to train the system (e.g., the most similar font present in the 14,000 fonts). To determine which font is most similar, one or more techniques may be employed; for example, techniques including machine learning system based techniques may be used as described in U.S. patent application Ser. No. 14/694,494, filed Apr. 23, 2015, entitled “Using Similarity for Grouping Fonts and Individuals for Recommendations”, U.S. patent application Ser. No. 14/690,260, filed Apr. 17, 2015 entitled “Pairing Fonts for Presentation”, U.S. Pat. No. 9,317,777, issued Apr. 19, 2016, entitled “Analyzing Font Similarity for Presentation”, and U.S. Pat. No. 9,805,288, to be issued Oct. 31, 2017, entitled “Analyzing Font Similarity for Presentation”, each of which is incorporated by reference in its entirety herein. Upon determining which font from the initial set is most similar, associated information (e.g., weights of the last layer for this identified font) is used for establishing this new font in the machine learning system (e.g., copying the weights to a newly added connection for this new font). By determining a similar font, and using information to assist the additional training (e.g., random weights are not employed), the font machine learning system 310 continues its training in an expedited manner. Other types of similarity techniques can be employed by the system. For example, comparisons (e.g., distance calculations) may be performed on one or more layers before the output layer. In one arrangement, the layer located before the output layer can be considered as a feature space (e.g., a 1000 dimension feature space), and executing comparisons for different system inputs can provide a similarity measure. Along with distance measurements between the two feature space representations (for two inputs), other types of calculations such as cosine similarity measures can be employed.
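A minimal sketch of the two similarity ideas discussed above follows: a cosine-similarity comparison in the feature space of the layer before the output, and initializing a newly added output unit from the most similar already-trained font instead of from random weights. The array shapes and helper names are assumptions used only for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity of two feature-space embeddings (e.g., from the layer before the output)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def add_font_class(output_weights, output_bias, new_font_embedding, base_embeddings):
    """Append an output unit for a new font, initializing it from the most similar
    font already in the classifier rather than from random weights."""
    sims = [cosine_similarity(new_font_embedding, e) for e in base_embeddings]
    nearest = int(np.argmax(sims))
    new_weights = np.vstack([output_weights, output_weights[nearest:nearest + 1]])
    new_bias = np.append(output_bias, output_bias[nearest])
    return new_weights, new_bias
```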
Similarity calculations may be used for other operations associated with the font machine learning system 310. In some instances, accuracy may degrade when scaling the training from a first set of training fonts (e.g., 14,000) to the full font complement (e.g., the remainder of the 133,000 fonts). This accuracy drop may be caused by having a significant number of similar fonts being used for training. For example, many font variants (e.g., hundreds) of one font (e.g., Helvetica) may be represented in the system, and a newly introduced font may appear associated with a number of the font variants. By employing one or more mathematical metrics, convergence can be gauged. For example, similarity accuracy can be measured by using similarity techniques such as the techniques incorporated by reference above. In one arrangement, accuracy can be calculated using the similarity techniques to determine the similarity of a predicted font (provided as the system output 504) and an identified (labeled) font (used to train the system). If similar, the prediction can be considered as being correct. In another situation, only a limited number of the training fonts that are provided in distorted imagery (e.g., captured in real-world conditions) are identified (e.g., only 500 fonts are identified, i.e., labeled). Due to this limitation, system accuracy may decrease (e.g., for the 133,000 possible prediction outputs from the machine). To improve accuracy, only the limited number of labeled fonts (e.g., the 500 labeled fonts) are considered active and all other possible predictions are not considered active (e.g., and are assigned prediction values of zero). Using this technique, accuracy can improve as the training of the machine learning system scales up. In some implementations, accuracy is measured by whether the result with the highest probability matches the expected result (Top-1 accuracy). In some cases, the Top-1 accuracy can be based on synthetic data. A measure in which the expected result appears among the five highest-probability predictions can also be employed (Top-5 accuracy). In some cases, the Top-5 accuracy can be based on synthetic data.
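The active-font masking and the Top-1/Top-5 checks might be realized along the following lines; the score vector and index list are hypothetical stand-ins for the 133,000-element output and the 500 labeled fonts mentioned above.

```python
import numpy as np

def mask_inactive(scores, active_idx):
    """Zero out predictions for fonts that are not in the labeled (active) subset."""
    masked = np.zeros_like(scores)
    masked[active_idx] = scores[active_idx]
    return masked

def top_k_correct(scores, true_idx, k=5):
    """Top-1/Top-5 style check: is the labeled font among the k highest-scoring predictions?"""
    top_k = np.argsort(scores)[::-1][:k]
    return true_idx in top_k
```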
The similarity techniques may also be used for measuring the quality of segmenting an input image (e.g., the quality of cropping of an input image). For example, upon receiving a page of text, graphics, etc., the page can be cropped into segments (e.g., rectangular shaped segments) by the font identifier 204 such that each segment contains text of the page. In many cases, the text of the segments contains similar fonts, if not the same font. Each text segment is input into the font machine learning system 310 and a number of predicted fonts is output (K predicted fonts). For two crops, distance values can be calculated between the predicted fonts (the K predicted fonts). An estimated value (e.g., a mean value) of the distance values (K*K values) is calculated to identify a threshold value. In some instances, the estimated value is multiplied by a constant (e.g., a value of 1.0, etc.) to form the threshold value. If the top predictions for the two crops have a distance value less than this threshold value, the crops can be considered as containing similar fonts. If the distance value is above the threshold value, the two crops can be considered as containing different fonts.
Similarity calculations can also be executed to determine the quality of a crop. For example, a segment of text attained through the cropping of an image can be input into the machine learning system. Techniques such as Fast Region-based Convolutional Network method (Fast R-CNN), Faster Region-based Convolutional Network method (Faster R-CNN), etc. can be used to classify objects (e.g., detect rectangular regions that contain text). The output of the system provides a number of predicted fonts (e.g., K predicted fonts) for the cropped segment. Similarity calculations may be executed among the K predicted fonts. If the calculations report that the K predicted fonts are similar, the segment can be considered as being attained from a good quality crop. If the K predicted fonts lack similarity, the cropping operations used to attain the segment can be considered poor. If the similarity calculations report that non-similar fonts are present, corrective operations may be executed (e.g., cropping operations may be repeated to attain another segment for re-testing for similarity of predicted fonts). In some arrangements, a numerical value may be assigned to the crop quality; for example, a value of one may indicate that a good segment has been attained from the cropping operations, and a value of zero may indicate poor cropping operations may have produced a segment with dissimilar predicted fonts.
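One way to sketch both checks, assuming a distance() function over predicted fonts (e.g., a feature-space distance from the similarity techniques referenced above); the threshold scaling and quality scoring shown here are illustrative choices.

```python
import numpy as np

def crops_contain_similar_fonts(top_k_a, top_k_b, distance, scale=1.0):
    """Compare two crops: compute all K*K distances between their top-K predicted
    fonts, derive a threshold from the mean distance, then compare the top predictions."""
    dists = np.array([[distance(a, b) for b in top_k_b] for a in top_k_a])
    threshold = scale * dists.mean()
    return distance(top_k_a[0], top_k_b[0]) < threshold

def crop_quality(top_k, distance, threshold):
    """Score a single crop: 1 if its top-K predicted fonts are mutually similar, else 0."""
    dists = [distance(a, b) for i, a in enumerate(top_k) for b in top_k[i + 1:]]
    return 1 if max(dists, default=0.0) < threshold else 0
```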
As described above, the font machine learning system 310 outputs prediction values for each of the potential fonts (e.g., each of the 133,000 fonts represented as elements of an output vector). Typically, a numerical value is assigned to each potential font to represent the prediction. Additionally, these numerical values are scaled so their sum has a value of one. However, given the considerably large number of potential fonts (e.g., again, 133,000 fonts), each individual value can be rather small and difficult to interpret (e.g., to identify differences from other values). Further, even values that represent the top predictions can be small and difficult to distinguish from one another. In some arrangements, a software function (e.g., a Softmax function) causes the sum of the prediction values to equal a value of one. One or more techniques may be provided by the font machine learning system 310 to address these numerical values and improve their interpretation. For example, only a predefined number of top predicted fonts are assigned a numerical value to represent the level of confidence. In one arrangement, the top 500 predicted fonts are assigned a numerical value and the remaining fonts (e.g., 133,000−500=132,500 fonts) are assigned a numerical value of zero. Further, the numerical values are assigned to the top 500 predicted fonts such that the sum of the numerical values has a value of one. In one implementation, the lower font predictions (e.g., 133,000−500=132,500 fonts) are zeroed out before the Softmax function is applied. In effect, the top N (e.g., 500) predicted fonts are boosted in value to assist with further processing (e.g., identifying top predictions, prediction distributions, etc.). In some arrangements, corresponding elements of multiple output vectors are summed (in which each output vector represents a different input image, a different portion of an image, etc.). Through the summing operation, fonts common among the images can be identified, for example.
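One possible realization of the top-N boosting and vector-summing steps is sketched below; masking the non-top-N entries with negative infinity before the Softmax is simply one way to obtain the described effect of zeroed-out lower predictions, and the function names are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def boost_top_n(raw_scores, n=500):
    """Keep only the top-N raw predictions, then normalize so the survivors sum to one;
    entries outside the top N end up with a probability of exactly zero."""
    raw_scores = np.asarray(raw_scores, dtype=float)
    keep = np.argsort(raw_scores)[::-1][:n]
    masked = np.full_like(raw_scores, -np.inf)   # masked entries become zero after softmax
    masked[keep] = raw_scores[keep]
    return softmax(masked)

def combine_outputs(vectors):
    """Sum corresponding elements of several output vectors (e.g., one per image or
    image portion) so fonts common to the inputs stand out."""
    return np.sum(np.stack(vectors), axis=0)
```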
As mentioned above, various techniques may be employed to distort images for increasing the robustness of the font machine learning system 310 (e.g., the system trains on less than pristine images of fonts to improve the system's ability to detect the fonts in “real world” images such as photographs). Since the fonts are known prior to being distorted through the synthetic techniques, each of the underlying fonts is known and can be identified to the font machine learning system 310. Along with these synthetically distorted fonts, the robustness of the font machine learning system 310 can be increased by providing actual real-world images of fonts (e.g., from captured images provided by end users, etc.). In many cases, the underlying fonts present in these real-world images are unknown or at least not identified when provided to the system for training. As such, the system will develop its own identity of these fonts. To improve robustness, various amounts of these unlabeled real-world font images may be provided during training of the system. For example, in some training techniques a particular number of images are provided for each training session. Image batches of 16 images, 32 images, 64 images, etc. can be input for a training session. Of these batches, a percentage of the images have identified fonts (e.g., are labeled) and the images may be in pristine condition or synthetically distorted. Font identification is not provided with another percentage of the images; for example, these images may be distorted by real-world conditions and the font is unknown. For this latter percentage of images, the machine learning system defines its own identity of the font (e.g., via a process known as pseudo labeling). For example, 75% of the images may be provided with font identification (e.g., a pristine image or a synthetically distorted image in which the base font is known) and 25% may be images with unlabeled fonts for which the machine learning system defines a label for the represented font. Other percentages of these two types of labeled and pseudo labeled images may also be employed to increase system robustness along with improving overall decision making by the system.
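A batch-composition sketch consistent with the 75%/25% example above might look like the following; the labeled and unlabeled pools are assumed to be lists of image records, and the fraction is a tunable choice rather than a fixed requirement.

```python
import random

def build_batch(labeled_pool, unlabeled_pool, batch_size=32, labeled_fraction=0.75):
    """Mix labeled (pristine or synthetically distorted) images with unlabeled
    real-world images; the unlabeled ones later receive pseudo labels from the model."""
    n_labeled = int(batch_size * labeled_fraction)
    batch = random.sample(labeled_pool, n_labeled)
    batch += random.sample(unlabeled_pool, batch_size - n_labeled)
    random.shuffle(batch)
    return batch
```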
System variations may also include different hardware implementations and different uses of the system hardware. For example, multiple instances of the font machine learning system 310 may be executed through the use of a single graphical processing unit (GPU). In such an implementation, multiple system clients (each operating with one machine learning system) may be served by a single GPU. In other arrangements, multiple GPUs may be used. Similarly, under some conditions, a single instance of the machine learning system may be capable of serving multiple clients. Based upon changing conditions, multiple instances of a machine learning system may be employed to handle an increased workload from multiple clients. For example, environmental conditions (e.g., system throughput), client based conditions (e.g., number of requests received per client), hardware conditions (e.g., GPU usage, memory use, etc.) can trigger multiple instances of the system to be employed, increase the number of GPUs being used, etc. Similar to taking steps to react to an increased need for processing capability, adjustments can be made when less processing is needed. For example, the number of instances of a machine learning system being used may be decreased along with the number of GPUs needed to service the clients. Other types of processors may be used in place of the GPUs or in concert with them (e.g., combinations of different types of processors). For example, central processing units (CPUs), processors developed for machine learning use (e.g., an application-specific integrated circuit (ASIC) developed for machine learning and known as a tensor processing unit (TPU)), etc. may be employed. Similar to GPUs, one or more models may be provided by these other types of processors, either independently or in concert with other processors.
One or more techniques can be employed to improve the training of the font machine learning system 310. For example, one improvement that results in higher font identifying accuracy is provided by synthetically generating training images that include some amount of distortion. For example, after a training image is provided to the machine learning system, one or more distorted versions of the image may also be provided to the system during the training cycle. Some fonts, which can be considered lighter in color or as having hollow features, can be used for training without being distorted. As such, any font considered as having these features can be used for training without further alteration. For other types of training fonts, along with using the unaltered version of the font, a synthetically distorted version of the font can be used for training the font machine learning system 310. Various types of distortions can be applied to the fonts; for example, compression techniques (e.g., JPEG compression) can be applied. One or more levels of shadowing can be applied to a training font sample. Manipulating an image of a training font such that shapes of the font are significantly distorted can be used to define one or more training images. Blurring can be applied to imagery to create distortions; for example, a Gaussian blur can give an overall smoothed appearance to an image of a font. Motion blurring can also be applied, in which streaking appears in the imagery to present the effect of rapid object movement. As still another feature, Gaussian noise can be applied as a type of distortion to blur fine-scaled image edges and details. Other types of image adjustments may be applied as a type of visual distortion; for example, images may be rotated about one or more axes (e.g., about the x, y, and/or z-axis). Skewing an image in one or more manners so the underlying image appears to be misaligned in one or multiple directions (e.g., slanted) can provide another type of distortion. Adjusting the aspect ratio of an image, in which the ratio of the width to the height of the image is changed, can provide a number of different types of images of a font to assist with training. Distortion may also be applied by filtering all or a portion of an image and using one or more filtered versions of the image for system training. For example, edge detection may be performed on an image to retain or remove high spatial frequency content of the image. Other types of image processing may also be executed; for example, perspective transformation can be employed, which is associated with converting 3D imagery into 2D imagery such that objects represented as being closer to the viewer appear larger than objects represented as being further from the viewer.
In some arrangements, data processing (e.g., image processing) libraries may be employed for distorting the training images. For example, some libraries may provide functions that adjust the shape, geometry, etc. of text (e.g., position the text to appear in a circular formation). Different coloring schemes may also be applied to create additional training images; for example, color substitution techniques, introducing and applying gradients to one or more colors, etc. can be executed through the use of libraries. Through the use of libraries, different types of fonts may be introduced into training imagery. For example, hollow fonts and outline fonts may be introduced to assist with training. Different attributes of font glyphs, characters, etc. may be adjusted to provide distortion. For example, random stroke widths may be applied to portions (e.g., stems) of characters or to entire characters to introduce distortion. From the different types of distortions described above, each may be used to create a training image. To further increase the accuracy of the machine learning system, two or more of the distortion techniques may be used in concert to create additional training imagery.
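For illustration only, a few of the described distortions could be implemented with common imaging libraries such as Pillow and NumPy (assumed to be available); the parameter values are arbitrary examples, not values prescribed by this description.

```python
# Illustrative distortion pipeline; each function returns a new training variant
# of an input font image.
import io
import numpy as np
from PIL import Image, ImageFilter

def gaussian_blur(img, radius=2):
    return img.filter(ImageFilter.GaussianBlur(radius))

def jpeg_compress(img, quality=20):
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)  # lossy re-encode
    buf.seek(0)
    return Image.open(buf)

def gaussian_noise(img, sigma=10.0):
    arr = np.asarray(img).astype(np.float32)
    noisy = np.clip(arr + np.random.normal(0.0, sigma, arr.shape), 0, 255)
    return Image.fromarray(noisy.astype(np.uint8))

def rotate(img, degrees=5):
    return img.rotate(degrees, expand=True, fillcolor="white")

def distorted_variants(img):
    """Single distortions plus one combined variant, as described above."""
    variants = [gaussian_blur(img), jpeg_compress(img), gaussian_noise(img), rotate(img)]
    variants.append(gaussian_noise(jpeg_compress(gaussian_blur(img))))
    return variants
```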
Similar to using distortion to create additional training imagery, other types of content may be employed. For example, different types of background imagery may be used to create imagery that includes different text (e.g., using different fonts) in the foreground. Real-world photographic background images may be used as backgrounds, and distorted text (represented in one or more fonts) can be used for image creation. Text may be positioned at various locations in images, including on image borders. In some training images, portions of text may be clipped so only a portion of the text (e.g., part of a character, word, phrase, etc.) is present. As such, different cropping schemes may be utilized for training the machine learning system. As mentioned above, for some training images, text is distorted in one manner or multiple manners. In a similar fashion, other portions of the images such as background imagery (e.g., photographic imagery) may be distorted once or in multiple instances. Further, for some examples, the distortion may follow a two-step process: first, an image is created that includes distorted text (and is used to train the system); the image (e.g., the background image) is then distorted using one or more image processing techniques (e.g., JPEG compression, applying Gaussian noise, etc.).
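A compositing sketch along the lines described above, assuming Pillow is available; the font path, text string, and sizes are illustrative, and negative offsets intentionally clip part of the text at the border. The two-step distortion can then be approximated by passing the composed image through distortion functions such as those sketched earlier.

```python
# Minimal sketch of compositing foreground text over a captured background image,
# with random positioning that may clip part of the text at the image border.
import random
from PIL import Image, ImageDraw, ImageFont

def compose_training_image(background, text, font_path, size=224, font_size=48):
    canvas = background.convert("RGB").resize((size, size))
    draw = ImageDraw.Draw(canvas)
    font = ImageFont.truetype(font_path, font_size)
    # Random position; negative offsets intentionally clip part of the text.
    x = random.randint(-font_size, size - font_size)
    y = random.randint(-font_size, size - font_size)
    draw.text((x, y), text, font=font, fill=(0, 0, 0))
    return canvas
```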
To implement the font machine learning system 310, one or more machine learning techniques may be employed. For example, supervised learning techniques may be implemented in which training is based on a desired output that is known for an input. Supervised learning can be considered an attempt to map inputs to outputs and then estimate outputs for previously unseen inputs (a newly introduced input). Unsupervised learning techniques may also be employed in which training is provided from known inputs but unknown outputs. Reinforcement learning techniques may also be used in which the system can be considered as learning from the consequences of actions taken (e.g., input values are known). In some arrangements, the implemented technique may employ two or more of these methodologies.
In some arrangements, neural network techniques may be implemented using the data representing the images (e.g., a matrix of numerical values that represent visual elements such as pixels of an image, etc.) to invoke training algorithms for automatically learning the images and related information. Such neural networks typically employ a number of layers. Once the layers and the number of units for each layer are defined, weights and thresholds of the neural network are typically set to minimize the prediction error through training of the network. Such techniques for minimizing error can be considered as fitting a model (represented by the network) to training data. By using the image data (e.g., attribute vectors), a function may be defined that quantifies error (e.g., a squared error function used in regression techniques). By minimizing error, a neural network may be developed that is capable of determining attributes for an input image. One or more techniques may be employed by the machine learning system; for example, backpropagation techniques can be used to calculate the error contribution of each neuron after a batch of images is processed. Stochastic gradient descent, also known as incremental gradient descent, can be used by the machine learning system as a stochastic approximation of gradient descent optimization and as an iterative method to minimize an objective function. Other factors may also be accounted for during neural network development. For example, a model may too closely attempt to fit data (e.g., fitting a curve to the extent that the modeling of an overall function is degraded). Such overfitting of a neural network may occur during model training, and one or more techniques may be implemented to reduce its effects.
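A hedged training-pass sketch using PyTorch (assumed available) shows how backpropagation and stochastic gradient descent fit together; the model and data loader are placeholders, and the weight-decay term stands in for one of many possible overfitting controls.

```python
# Sketch of one training epoch with stochastic gradient descent and backpropagation;
# `model` and `loader` are assumed to exist (e.g., a CNN and a DataLoader yielding
# image tensors with integer font labels).
import torch
import torch.nn as nn

def train_one_epoch(model, loader, lr=0.01, weight_decay=1e-4):
    criterion = nn.CrossEntropyLoss()
    # Weight decay is one common way to limit the overfitting mentioned above.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)   # prediction error for the batch
        loss.backward()                           # backpropagate error contributions
        optimizer.step()                          # stochastic gradient descent update
```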
One type of machine learning referred to as deep learning may be utilized in which a set of algorithms attempt to model high-level abstractions in data by using model architectures, with complex structures or otherwise, composed of multiple non-linear transformations. Such deep learning techniques can be considered as being based on learning representations of data. In general, deep learning techniques can be considered as using a cascade of many layers of nonlinear processing units for feature extraction and transformation. The next layer uses the output from the previous layer as input. In some arrangements, a layer can look back one or multiple layers for its input. The algorithms may be supervised, unsupervised, combinations of supervised and unsupervised, etc. The techniques are based on the learning of multiple levels of features or representations of the data (e.g., image attributes). As such, multiple layers of nonlinear processing units along with supervised or unsupervised learning of representations can be employed at each layer, with the layers forming a hierarchy from low-level to high-level features. By employing such layers, a number of parameterized transformations are used as data propagates from the input layer to the output layer. In one example, the font machine learning system 310 uses one or more convolutional neural networks (CNN), which when trained can output a font classification for an input image that includes a font. Various types of CNN-based systems can be used that have different numbers of layers; for example, the font machine learning system 310 can use a fifty-layer deep neural network architecture (e.g., a ResNet50 architecture) or architectures that employ a different number of layers (e.g., ResNet150, ResNet 152, VGGNet 16, VGGNet 19, InceptionNet V3, etc.) that when trained can output a font classification for an input image that includes a font.
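As an illustration only, a ResNet50-style classifier with one output unit per training font could be assembled with torchvision (a recent version is assumed); the 133,000-class output size simply mirrors the font count discussed in this description.

```python
# Illustrative ResNet50-based font classifier; torchvision >= 0.13 is assumed.
import torch.nn as nn
from torchvision import models

def build_font_classifier(num_fonts=133_000, pretrained=True):
    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT if pretrained else None)
    backbone.fc = nn.Linear(backbone.fc.in_features, num_fonts)  # one score per font
    return backbone
```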
Other types of artificial intelligence techniques may be employed about the font identifier 204 (shown in
One or more metrics may be employed to determine if the generator neural network 512 has reached an improved state (e.g., an optimized state). Upon reaching this state, the generator 512 may be used to train the font machine learning system 310. For example, the generator can be used to train one or more classifiers included in the font machine learning system 310. Using input 502, training data 506, etc., the generator 512 can produce a large variety of imagery (e.g., distorted images that contain one or more fonts) to increase the capability of the font machine learning system.
Various implementations for GAN generators and discriminators may be used; for example, the discriminator neural network 512 can use a convolutional neural network that categorizes input images with a binomial classifier that labels the images as genuine or not. The generator neural network 514 can use an inverse convolutional (or deconvolutional) neural network that takes a vector of random noise and upsamples the vector data to an image to augment the image.
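A compact, PyTorch-style sketch of such a generator/discriminator pair is shown below; the layer sizes and output resolution are illustrative assumptions and are not taken from this description.

```python
# GAN-style sketch: the generator upsamples a noise vector into a small font image
# and the discriminator labels images genuine or not.
import torch.nn as nn

def make_generator(noise_dim=100, channels=1):
    return nn.Sequential(
        nn.ConvTranspose2d(noise_dim, 128, 4, 1, 0), nn.BatchNorm2d(128), nn.ReLU(),
        nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
        nn.ConvTranspose2d(64, channels, 4, 2, 1), nn.Tanh(),     # 16x16 output image
    )

def make_discriminator(channels=1):
    return nn.Sequential(
        nn.Conv2d(channels, 64, 4, 2, 1), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
        nn.Flatten(), nn.LazyLinear(1), nn.Sigmoid(),             # genuine vs. synthetic
    )
```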
Referring to
Operations of the font identifier 204 may include receiving 602 an image that includes textual content in at least one font. For example, an image may be received that is represented by a two-dimensional matrix of numerical values, in which each value represents a visual property (e.g., color) that can be assigned to a pixel of a display. Various file formats (e.g., “.jpeg”, “.pdf”, etc.) may be employed to receive the image data. Operations of the font identifier may also include identifying 604 the at least one font represented in the received image using a machine learning system, the machine learning system being trained using images representing a plurality of training fonts. A portion of the training images includes text located in the foreground and positioned over captured background imagery. For example, a collection of images including training fonts (e.g., synthetically distorted fonts, undistorted fonts, etc.) can be positioned over images that have been captured (e.g., by an image capture device such as a camera). The captured imagery, which may be distorted due to image capture conditions, capture equipment, etc., may be used for training a machine learning system such as the font machine learning system 310. Trained with such data, the machine learning system can efficiently identify fonts in images that are in less than pristine condition.
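The receive-and-identify flow can be pictured with a short sketch; the model object and its predict() interface are assumptions, and any trained classifier producing per-font scores could stand in.

```python
# Minimal sketch of the receive-and-identify flow: load an image file into a
# numeric matrix and ask a trained model for per-font scores.
import numpy as np
from PIL import Image

def identify_font(model, path, top_k=5):
    image = np.asarray(Image.open(path).convert("RGB"))   # matrix of pixel values
    scores = model.predict(image)                          # one score per training font
    return np.argsort(scores)[::-1][:top_k]                # indices of the top predicted fonts
```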
Referring to
Operations of the font identifier 204 may include receiving 702 an image that includes textual content in at least one font. For example, an image may be received that is represented by a two-dimensional matrix of numerical values, in which each value represents a visual property (e.g., color) that can be assigned to a pixel of a display. Various file formats (e.g., “.jpeg”, “.pdf”, etc.) may be employed to receive the image data. Operations of the font identifier may also include identifying 704 the at least one font represented in the received image using a machine learning system, the machine learning system being trained using images representing a plurality of training fonts. A portion of the training images is produced by a generator neural network. A generator neural network of a GAN may be used to augment (e.g., distort) imagery of textual characters represented in a font. This augmented imagery may be provided to a discriminator neural network (of the GAN). Using these images, the discriminator can evaluate the augmented imagery and attempt to determine if the imagery is real (e.g., captured) or synthetic (e.g., prepared by a generator neural network). These determinations (whether correct or incorrect) can be used to improve the generator neural network (e.g., to produce augmented imagery that further tests the discriminator) and to improve the discriminator neural network (e.g., to assist the discriminator in making correct determinations about future augmented imagery provided by the generator). The improved generator (e.g., an optimized generator) can then be used to provide imagery for training a machine learning system, for example, to identify one or more fonts in various types of images that are in less than pristine condition (e.g., captured images that are distorted).
Computing device 800 includes processor 802, memory 804, storage device 806, high-speed interface 808 connecting to memory 804 and high-speed expansion ports 810, and low speed interface 812 connecting to low speed bus 814 and storage device 806. Each of components 802, 804, 806, 808, 810, and 812, are interconnected using various busses, and can be mounted on a common motherboard or in other manners as appropriate. Processor 802 can process instructions for execution within computing device 800, including instructions stored in memory 804 or on storage device 806 to display graphical data for a GUI on an external input/output device, including, e.g., display 816 coupled to high speed interface 808. In other implementations, multiple processors and/or multiple busses can be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices 800 can be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).
Memory 804 stores data within computing device 800. In one implementation, memory 804 is a volatile memory unit or units. In another implementation, memory 804 is a non-volatile memory unit or units. Memory 804 also can be another form of computer-readable medium (e.g., a magnetic or optical disk). Memory 804 may be non-transitory.
Storage device 806 is capable of providing mass storage for computing device 800. In one implementation, storage device 806 can be or contain a computer-readable medium (e.g., a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory or other similar solid state memory device, or an array of devices, such as devices in a storage area network or other configurations.) A computer program product can be tangibly embodied in a data carrier. The computer program product also can contain instructions that, when executed, perform one or more methods (e.g., those described above.) The data carrier is a computer- or machine-readable medium, (e.g., memory 804, storage device 806, memory on processor 802, and the like.)
High-speed controller 808 manages bandwidth-intensive operations for computing device 800, while low speed controller 812 manages lower bandwidth-intensive operations. Such allocation of functions is an example only. In one implementation, high-speed controller 808 is coupled to memory 804, display 816 (e.g., through a graphics processor or accelerator), and to high-speed expansion ports 810, which can accept various expansion cards (not shown). In the implementation, low-speed controller 812 is coupled to storage device 806 and low-speed expansion port 814. The low-speed expansion port, which can include various communication ports (e.g., USB, Bluetooth®, Ethernet, wireless Ethernet), can be coupled to one or more input/output devices, (e.g., a keyboard, a pointing device, a scanner, or a networking device including a switch or router, e.g., through a network adapter.)
Computing device 800 can be implemented in a number of different forms, as shown in the figure. For example, it can be implemented as standard server 820, or multiple times in a group of such servers. It also can be implemented as part of rack server system 824. In addition or as an alternative, it can be implemented in a personal computer (e.g., laptop computer 822.) In some examples, components from computing device 800 can be combined with other components in a mobile device (not shown), e.g., device 850. Each of such devices can contain one or more of computing device 800, 850, and an entire system can be made up of multiple computing devices 800, 850 communicating with each other.
Computing device 850 includes processor 852, memory 864, an input/output device (e.g., display 854, communication interface 866, and transceiver 868) among other components. Device 850 also can be provided with a storage device, (e.g., a microdrive or other device) to provide additional storage. Each of components 850, 852, 864, 854, 866, and 868, are interconnected using various buses, and several of the components can be mounted on a common motherboard or in other manners as appropriate.
Processor 852 can execute instructions within computing device 850, including instructions stored in memory 864. The processor can be implemented as a chipset of chips that include separate and multiple analog and digital processors. The processor can provide, for example, for coordination of the other components of device 850, e.g., control of user interfaces, applications run by device 850, and wireless communication by device 850.
Processor 852 can communicate with a user through control interface 858 and display interface 856 coupled to display 854. Display 854 can be, for example, a TFT LCD (Thin-Film-Transistor Liquid Crystal Display) or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology. Display interface 856 can comprise appropriate circuitry for driving display 854 to present graphical and other data to a user. Control interface 858 can receive commands from a user and convert them for submission to processor 852. In addition, external interface 862 can communicate with processor 852, so as to enable near area communication of device 850 with other devices. External interface 862 can provide, for example, for wired communication in some implementations, or for wireless communication in other implementations, and multiple interfaces also can be used.
Memory 864 stores data within computing device 850. Memory 864 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units. Expansion memory 874 also can be provided and connected to device 850 through expansion interface 872, which can include, for example, a SIMM (Single In Line Memory Module) card interface. Such expansion memory 874 can provide extra storage space for device 850, or also can store applications or other data for device 850. Specifically, expansion memory 874 can include instructions to carry out or supplement the processes described above, and can include secure data also. Thus, for example, expansion memory 874 can be provided as a security module for device 850, and can be programmed with instructions that permit secure use of device 850. In addition, secure applications can be provided through the SIMM cards, along with additional data, (e.g., placing identifying data on the SIMM card in a non-hackable manner.)
The memory can include, for example, flash memory and/or NVRAM memory, as discussed below. In one implementation, a computer program product is tangibly embodied in a data carrier. The computer program product contains instructions that, when executed, perform one or more methods, e.g., those described above. The data carrier is a computer- or machine-readable medium (e.g., memory 864, expansion memory 874, and/or memory on processor 852), which can be received, for example, over transceiver 868 or external interface 862.
Device 850 can communicate wirelessly through communication interface 866, which can include digital signal processing circuitry where necessary. Communication interface 866 can provide for communications under various modes or protocols (e.g., GSM voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA, CDMA2000, or GPRS, among others.) Such communication can occur, for example, through radio-frequency transceiver 868. In addition, short-range communication can occur, e.g., using a Bluetooth®, WiFi, or other such transceiver (not shown). In addition, GPS (Global Positioning System) receiver module 870 can provide additional navigation- and location-related wireless data to device 850, which can be used as appropriate by applications running on device 850. Sensors and modules such as cameras, microphones, compasses, accelerometers (for orientation sensing), etc. may be included in the device.
Device 850 can also communicate audibly using audio codec 860, which can receive spoken data from a user and convert it to usable digital data. Audio codec 860 can likewise generate audible sound for a user (e.g., through a speaker in a handset of device 850). Such sound can include sound from voice telephone calls, recorded sound (e.g., voice messages, music files, and the like), and sound generated by applications operating on device 850.
Computing device 850 can be implemented in a number of different forms, as shown in the figure. For example, it can be implemented as cellular telephone 880. It also can be implemented as part of smartphone 882, a personal digital assistant, or other similar mobile device.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
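A hedged illustration of one such computer program follows; the command-line interface, argument names, and placeholder identifier are assumptions added only to show how such a program can receive data from a storage system or input device and transmit a result to an output device:

```python
# Minimal sketch of one such computer program (interpretable on a programmable
# system): it receives an image from a storage system, applies a placeholder
# font-identification step, and writes the result to an output device (stdout).
import argparse
import sys
from pathlib import Path


def identify_font(image_bytes: bytes) -> str:
    """Placeholder for the machine-learning identification described earlier."""
    return "ExampleFont-Regular"  # assumed label, for illustration only


def main() -> int:
    parser = argparse.ArgumentParser(description="Identify the font shown in an image.")
    parser.add_argument("image", type=Path, help="path to an image on the storage system")
    args = parser.parse_args()

    image_bytes = args.image.read_bytes()   # data received from the storage system
    print(identify_font(image_bytes))       # result transmitted to the output device
    return 0


if __name__ == "__main__":
    sys.exit(main())
```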
These computer programs (also known as programs, software, software applications, or code) include machine instructions for a programmable processor and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to a computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a device for displaying data to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be a form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a backend component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a frontend component (e.g., a client computer having a user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or a combination of such backend, middleware, or frontend components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
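As a non-limiting sketch of such a client-server arrangement (the endpoint path, port, JSON response shape, and placeholder identifier below are assumptions, not part of this description), a backend component might expose the font identifier over a communication network as follows:

```python
# Minimal sketch of a backend component: a client posts an image over the
# network and receives the identified font in a JSON response.  The endpoint
# name, port, and placeholder identifier are assumptions for illustration.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def identify_font(image_bytes: bytes) -> str:
    """Placeholder for the machine-learning font identifier described above."""
    return "ExampleFont-Regular"


class FontIdHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/identify-font":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length)      # raw image sent by the client
        body = json.dumps({"font": identify_font(image_bytes)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("localhost", 8080), FontIdHandler).serve_forever()
```

A frontend client (e.g., a Web browser or a command-line tool such as `curl --data-binary @sample.jpg http://localhost:8080/identify-font`) then interacts with this backend over the network, giving rise to the client-server relationship described above.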
In some implementations, the engines described herein can be separated, combined or incorporated into a single or combined engine. The engines depicted in the figures are not intended to limit the systems described here to the software architectures shown in the figures.
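The following sketch illustrates that flexibility; the engine names and stub behavior are assumptions used only to show separately defined engines being incorporated into a single combined engine:

```python
# Minimal sketch: two separately defined engines (names are assumptions) and a
# combined engine that incorporates them, mirroring the flexibility described
# in the preceding paragraph.
from typing import List


def text_detection_engine(image_bytes: bytes) -> List[bytes]:
    """Stub: locate text regions in the image and return them as crops."""
    return [image_bytes]  # pretend the whole image is a single text region


def font_identification_engine(region: bytes) -> str:
    """Stub: identify the font represented in a single text region."""
    return "ExampleFont-Regular"


def combined_engine(image_bytes: bytes) -> List[str]:
    """The separate engines incorporated into one combined engine."""
    return [font_identification_engine(region)
            for region in text_detection_engine(image_bytes)]
```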
A number of embodiments have been described. Nevertheless, it will be understood that various modifications can be made without departing from the spirit and scope of the processes and techniques described herein. In addition, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. Further, other steps can be provided, or steps can be eliminated, from the described flows, and other components can be added to, or removed from, the described systems. Accordingly, other embodiments are within the scope of the following claims.
This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application No. 62/578,939, filed on Oct. 30, 2017, the entire disclosure of which is incorporated by reference herein.
Publication
Number | Date | Country
---|---|---
20190130232 A1 | May 2019 | US

Related U.S. Provisional Application
Number | Date | Country
---|---|---
62/578,939 | Oct. 2017 | US