Font identification from imagery

Information

  • Patent Grant
  • 11657602
  • Patent Number
    11,657,602
  • Date Filed
    Tuesday, October 30, 2018
  • Date Issued
    Tuesday, May 23, 2023
  • CPC
    • G06V10/82
    • G06F40/109
    • G06N20/00
    • G06V20/62
    • G06V30/19173
    • G06V30/245
    • G06V30/413
  • Field of Search
    • CPC
    • G06K9/6828
    • G06K9/325
    • G06K9/00456
    • G06K9/6271
    • G06F40/109
    • G06N20/00
    • G06N20/20
    • G06N20/10
    • G06N5/003
    • G06N3/0472
    • G06N3/0454
    • G06N3/084
  • International Classifications
    • G06K9/68
    • G06V10/82
    • G06N20/00
    • G06F40/109
    • G06V30/244
    • G06V20/62
    • G06V30/413
    • G06V30/19
Abstract
A system includes a computing device that includes a memory configured to store instructions. The system also includes a processor to execute the instructions to perform operations that include receiving an image that includes textual content in at least one font. Operations also include identifying the at least one font represented in the received image using a machine learning system. The machine learning system is trained using images representing a plurality of training fonts. A portion of the training images includes text located in the foreground and positioned over captured background imagery.
Description
BACKGROUND

This description relates to identifying individual fonts present in images by using one or more techniques such as artificial intelligence.


Graphic designers along with other professionals are often interested in identifying fonts noticed in various media (e.g., appearing on signs, books, periodicals, etc.) for later use. Some may take a photo of the text represented in the font of interest and later attempt to manually identify the font, which can be an extremely laborious and tedious task. To identify the font, the individual may need to exhaustively explore a seemingly endless list of hundreds or even thousands of alphabetically ordered fonts.


SUMMARY

The described systems and techniques are capable of effectively identifying fonts in an automatic manner from an image (e.g., a photograph) by using artificial intelligence. By extensive training of a machine learning system, font identification can be achieved with a high probability of success. Along with using a relatively large font sample set, training the machine learning system can include using different image types and augmented images (e.g., distorted images) of fonts so the system is capable of recognizing fonts presented in less than pristine imagery.


In one aspect, a computing device implemented method includes receiving an image that includes textual content in at least one font. The method also includes identifying the at least one font represented in the received image using a machine learning system. The machine learning system is trained using images representing a plurality of training fonts. A portion of the training images includes text located in the foreground and positioned over captured background imagery.


Implementations may include one or more of the following features. The text located in the foreground may be synthetically augmented. Synthetic augmentation may be provided in a two-step process. The text may be synthetically augmented based upon one or more predefined conditions. The text located in the foreground may be undistorted. The text may be included in captured imagery. The captured background imagery may be predominately absent text. The text located in the foreground may be randomly positioned in the portion of training images. Prior to the text being located in the foreground, a portion of the text may be removed. The captured background imagery may be distorted when captured. Font similarity may be used to identify the at least one font. Similarity of fonts in multiple image segments may be used to identify the at least one font. The machine learning system may be trained by using transfer learning. An output of the machine learning system may represent each font used to train the machine learning system. The output of the machine learning system may provide a level of confidence for each font used to train the machine learning system. A subset of the output of the machine learning system may be scaled and a remainder of the output is removed. Some of the training images may be absent identification. Identifying the at least one font represented in the received image using the machine learning system may include using additional images received by the machine learning system. Outputs of the machine learning system for the received image and the additional images may be combined to identify the at least one font.


In another aspect, a system includes a computing device that includes a memory configured to store instructions. The system also includes a processor to execute the instructions to perform operations that include receiving an image that includes textual content in at least one font. Operations also include identifying the at least one font represented in the received image using a machine learning system. The machine learning system is trained using images representing a plurality of training fonts. A portion of the training images includes text located in the foreground and positioned over captured background imagery.


Implementations may include one or more of the following features. The text located in the foreground may be synthetically augmented. Synthetic augmentation may be provided in a two-step process. The text may be synthetically augmented based upon one or more predefined conditions. The text located in the foreground may be undistorted. The text may be included in captured imagery. The captured background imagery may be predominately absent text. The text located in the foreground may be randomly positioned in the portion of training images. Prior to the text being located in the foreground, a portion of the text may be removed. The captured background imagery may be distorted when captured. Font similarity may be used to identify the at least one font. Similarity of fonts in multiple image segments may be used to identify the at least one font. The machine learning system may be trained by using transfer learning. An output of the machine learning system may represent each font used to train the machine learning system. The output of the machine learning system may provide a level of confidence for each font used to train the machine learning system. A subset of the output of the machine learning system may be scaled and a remainder of the output is removed. Some of the training images may be absent identification. Identifying the at least one font represented in the received image using the machine learning system may include using additional images received by the machine learning system. Outputs of the machine learning system for the received image and the additional images may be combined to identify the at least one font.


In another aspect, one or more computer readable media store instructions that are executable by a processing device, and upon such execution cause the processing device to perform operations including receiving an image that includes textual content in at least one font. Operations also include identifying the at least one font represented in the received image using a machine learning system. The machine learning system is trained using images representing a plurality of training fonts. A portion of the training images includes text located in the foreground and positioned over captured background imagery.


Implementations may include one or more of the following features. The text located in the foreground may be synthetically augmented. Synthetic augmentation may be provided in a two-step process. The text may be synthetically augmented based upon one or more predefined conditions. The text located in the foreground may be undistorted. The text may be included in captured imagery. The captured background imagery may be predominately absent text. The text located in the foreground may be randomly positioned in the portion of training images. Prior to the text being located in the foreground, a portion of the text may be removed. The captured background imagery may be distorted when captured. Font similarity may be used to identify the at least one font. Similarity of fonts in multiple image segments may be used to identify the at least one font. The machine learning system may be trained by using transfer learning. An output of the machine learning system may represent each font used to train the machine learning system. The output of the machine learning system may provide a level of confidence for each font used to train the machine learning system. A subset of the output of the machine learning system may be scaled and a remainder of the output is removed. Some of the training images may be absent identification. Identifying the at least one font represented in the received image using the machine learning system may include using additional images received by the machine learning system. Outputs of the machine learning system for the received image and the additional images may be combined to identify the at least one font.


In another aspect, a computing device implemented method includes receiving an image that includes textual content in at least one font, and identifying the at least one font represented in the received image using a machine learning system, the machine learning system being trained using images representing a plurality of training fonts, wherein a portion of the training images is produced by a generator neural network.


Implementations may include one or more of the following features. The generator neural network may provide augmented imagery to a discriminator neural network for preparing the generator neural network. The augmented imagery produced by the generator neural network may include a distorted version of a font to train the machine learning system. Determinations produced by the discriminator neural network may be used to improve operations of the discriminator neural network. Determinations produced by the discriminator neural network may be used to improve operations of the generator neural network. Font similarity may be used to identify the at least one font. Similarity of fonts in multiple image segments may be used to identify the at least one font. The machine learning system may be trained by using transfer learning. An output of the machine learning system may represent each font used to train the machine learning system. The output of the machine learning system may provide a level of confidence for each font used to train the machine learning system. A subset of the output of the machine learning system may be scaled and a remainder of the output is removed. Some of the training images may be absent identification. Identifying the at least one font represented in the received image using the machine learning system may include using additional images received by the machine learning system. Outputs of the machine learning system for the received image and the additional received images may be combined to identify the at least one font.


In another aspect, a system includes a computing device that includes a memory configured to store instructions. The system also includes a processor to execute the instructions to perform operations that include receiving an image that includes textual content in at least one font. Operations also include identifying the at least one font represented in the received image using a machine learning system, the machine learning system being trained using images representing a plurality of training fonts, wherein a portion of the training images is produced by a generator neural network.


Implementations may include one or more of the following features. The generator neural network may provide augmented imagery to a discriminator neural network for preparing the generator neural network. The augmented imagery produced by the generator neural network may include a distorted version of a font to train the machine learning system. Determinations produced by the discriminator neural network may be used to improve operations of the discriminator neural network. Determinations produced by the discriminator neural network may be used to improve operations of the generator neural network. Font similarity may be used to identify the at least one font. Similarity of fonts in multiple image segments may be used to identify the at least one font. The machine learning system may be trained by using transfer learning. An output of the machine learning system may represent each font used to train the machine learning system. The output of the machine learning system may provide a level of confidence for each font used to train the machine learning system. A subset of the output of the machine learning system may be scaled and a remainder of the output is removed. Some of the training images may be absent identification. Identifying the at least one font represented in the received image using the machine learning system may include using additional images received by the machine learning system. Outputs of the machine learning system for the received image and the additional received images may be combined to identify the at least one font.


In another aspect, one or more computer readable media store instructions that are executable by a processing device, and upon such execution cause the processing device to perform operations including receiving an image that includes textual content in at least one font. Operations also include identifying the at least one font represented in the received image using a machine learning system, the machine learning system being trained using images representing a plurality of training fonts, wherein a portion of the training images is produced by a generator neural network.


Implementations may include one or more of the following features. The generator neural network may provide augmented imagery to a discriminator neural network for preparing the generator neural network. The augmented imagery produced by the generator neural network may include a distorted version of a font to train the machine learning system. Determinations produced by the discriminator neural network may be used to improve operations of the discriminator neural network. Determinations produced by the discriminator neural network may be used to improve operations of the generator neural network. Font similarity may be used to identify the at least one font. Similarity of fonts in multiple image segments may be used to identify the at least one font. The machine learning system may be trained by using transfer learning. An output of the machine learning system may represent each font used to train the machine learning system. The output of the machine learning system may provide a level of confidence for each font used to train the machine learning system. A subset of the output of the machine learning system may be scaled and a remainder of the output is removed. Some of the training images may be absent identification. Identifying the at least one font represented in the received image using the machine learning system may include using additional images received by the machine learning system. Outputs of the machine learning system for the received image and the additional received images may be combined to identify the at least one font.


These and other aspects, features, and various combinations may be expressed as methods, apparatus, systems, means for performing functions, program products, etc.


Other features and advantages will be apparent from the description and the claims.





DESCRIPTION OF DRAWINGS


FIG. 1 illustrates a computer system attempting to identify a font.



FIG. 2 illustrates a computer system presenting a listing for identifying a font.



FIG. 3 is a block diagram of the font identifier shown in FIG. 2.



FIG. 4 is an architectural diagram of a computational environment for identifying fonts.



FIG. 5 is a dataflow diagram that includes a machine learning system.



FIGS. 6 and 7 are flowcharts of operations of a font identifier.



FIG. 8 illustrates an example of a computing device and a mobile computing device that can be used to implement the techniques described here.





DETAILED DESCRIPTION

Referring to FIG. 1, a computing device (e.g., a computer system 100) includes a display 102 that allows a user to view a list of fonts generated by the computing device. When operating with pristine imagery, predicting the font or fonts present in such images can be achieved using one or more conventional techniques (e.g., searching font libraries, pattern matching, etc.). However, attempting to detect and identify one or more fonts from less than pristine images (e.g., referred to as real world images) can result in a low probability of success. For example, an individual (e.g., a graphic designer) may be interested in identifying a font present in a street sign and capture a picture of the sign in less than ideal environmental conditions (e.g., low lighting, poor weather, etc.). As illustrated, the captured image may also include other content that can hinder operations to identify the font. In this example, via an interface 103, an image 104 containing text in a distinct font also includes other content that is separate from the text (e.g., the text is printed on a graphic of a star). Due to this additional content, operations of the computer system can have difficulty in separating the text (in the font of interest) from the background graphic. Based upon the combined contents of the image 104, a list of possible matching fonts 106 generated by the computer system 100 includes entries that are less than accurate matches. For example, the top prediction 108 is a font that contains different graphics as elements. Other predicted fonts 110, 112, 114, and 116 included in the list 106 similarly present fonts that are far from matching the font present in the image 104. Presented with such results, the individual interested in identifying the font captured in image 104 may need to manually search through hundreds if not thousands of fonts in multiple libraries. As such, tens of hours may be lost through the search, or the individual may abandon the task and never identify this font of interest.


Referring to FIG. 2, another computing device (e.g., a computer system 200) also includes a display 202 that allows a user to view imagery, for example, to identify one or more fonts of interest. Computer system 200 executes a font identifier 204 that employs artificial intelligence to identify one or more fonts present in images or other types of media. By using artificial intelligence, the font identifier 204 can detect and identify fonts present in images captured under less than optimum conditions. For example, the font identifier 204 can include a machine learning system that is trained with pristine images of fonts and many distorted representations of fonts. By using such training data sets, the font identifier 204 is capable of detecting fonts represented in many types of images. Using this capability, the font identifier 204 is able to identify a list of potentially matching fonts that have a higher level of confidence (compared to the system shown in FIG. 1). As illustrated in this example, an interface 206 presented on the display 202 includes an input image 208 (which is equivalent to the image 104 shown in FIG. 1). After analyzing the complex content of the image 208, the machine learning system of the font identifier 204 identifies and presents potentially matching candidates in an ordered list 210 that includes a font 212 with the highest level of confidence (for being a match) at the highest position of the list. The list 210 also includes other fonts 214-220 identified as possible matches but not having the same level of confidence as the font 212 in the uppermost position on the list. Compared to the list of candidates presented in FIG. 1, the machine learning system employed by the font identifier 204 provides closer matches to the font present in the input image 208. As such, the individual attempting to identify the font is provided not only a near-matching font (if not an exactly matching font) but also a number of closely matching alternatives, all identified from an image that contains content not related to the textual content of the image.


To provide this functionality, the font identifier 204 may use various machine learning techniques such as deep learning to improve the identification processes through training the system (e.g., exposing multilayer neural networks to training data, feedback, etc.). Through such machine learning techniques, the font identifier 204 uses artificial intelligence to automatically learn and improve from experience without being explicitly programmed. Once trained (e.g., from images of identified fonts, distorted images of identified fonts, images of unidentified fonts, etc.), one or more images, representations of images, etc. can be input into the font identifier 204 to yield an output. Further, by returning information about the output (e.g., feedback), the machine learning technique can use the output as additional training information. Other training data can also be provided for further training. By using increased amounts of training data (e.g., images of identified fonts, unidentified fonts, etc.), feedback data (e.g., data representing user confirmation of identified fonts), etc., the accuracy of the system can be improved (e.g., to predict matching fonts).


Other forms of artificial intelligence techniques may be used by the font identifier 204. For example, to process information (e.g., images, image representations, etc.) to identify fonts, etc., the architecture may employ decision tree learning that uses one or more decision trees (as a predictive model) to progress from observations about an item (represented in the branches) to conclusions about the item's target (represented in the leaves). In some arrangements, random forests or random decision forests are used and can be considered as an ensemble learning method for classification, regression and other tasks. Such techniques generally operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Support vector machines (SVMs) can be used that are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis.


Ensemble learning systems may also be used for font prediction in which multiple system members independently arrive at a result. System members can be of the same type (e.g., each is a decision tree learning machine, etc.) or members can be of different types (e.g., one deep CNN system such as a ResNet50, one SVM system, one decision tree system, etc.). Upon each system member determining a result, a majority vote among the system members (or another type of voting technique) is used to determine an overall prediction result. In some arrangements, one or more knowledge-based systems such as expert systems may be employed. In general, such expert systems are designed to solve relatively complex problems by using reasoning techniques that may employ conditional statements (e.g., if-then rules). In some arrangements such expert systems may use multiple systems such as a two sub-system design, in which one system component stores structured and/or unstructured information (e.g., a knowledge base) and a second system component applies rules, etc. to the stored information (e.g., an inference engine) to determine results of interest (e.g., select images likely to be presented).
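

As a minimal sketch of the majority-vote step described above, the following Python fragment (with a hypothetical `predict` interface for each ensemble member) tallies the font label returned by each member and returns the most common one; it is illustrative only and not the specific voting logic of the described system.

```python
from collections import Counter

def ensemble_predict(members, image):
    """Collect one font prediction per ensemble member and return the majority vote.

    `members` is a hypothetical list of objects exposing a `predict(image)` method
    (e.g., a CNN wrapper, an SVM wrapper, a decision-tree wrapper). When no label
    wins more than one vote, the first member's prediction is used as a fallback.
    """
    votes = [member.predict(image) for member in members]
    winner, count = Counter(votes).most_common(1)[0]
    return winner if count > 1 else votes[0]
```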


Referring to FIG. 3, the font identifier 204 (which is executed by the computer system 200, e.g., a server, etc.) is illustrated as containing a number of modules. In this arrangement, the font identifier 204 includes an image collector 300 that is capable of receiving data that represents a variety of images. For example, images can be provided in one or more formats (e.g., .jpeg, .pdf, etc.) that provide a visual element representation (e.g., a pixel representation) of a corresponding image. In some cases, additional information can be provided with the imagery, for example, one or more attributes that reflect aspects of an image. For instance, data may be included that identifies any font or fonts represented in the image. For instances where one or more fonts are identified, the image can be considered as being labeled. Attributes can also be provided that represent visual aspects of imagery (e.g., resolution, the region where text is located such as the rectangle that contains the text, color(s) of the text, color(s) of the image's background, etc.), content aspects (e.g., information about the text such as the font category being used by the text, for instance which type of sans serif font is being used), etc. Such attributes can be represented in various forms; for example, each attribute may be represented by one or more numerical values (e.g., Boolean values, fixed point values, floating point values, etc.) and all of the attributes may be provided in a single form (e.g., a vector of numerical values) to the font identifier 204.


In this arrangement, such image data may be collected by the image collector 300 and stored (e.g., in a collected image database 302) on a storage device 304 for later retrieval. In some arrangements, information associated with images (e.g., font information, image attributes, etc.) may be provided and stored in an image information database 306. A trainer 308 retrieves the image data (stored in the database 302) and/or the image information (stored in the database 306) and uses the data to train a font machine learning system 310. Various types of data may be used for training the system; for example, images (e.g., thousands of images, millions of images) can be used by the trainer 308. For example, pristine images of fonts (e.g., portions of font characters, font characters, phrases using a font), distorted images of fonts (e.g., synthetically altered versions of fonts), and real-world images of fonts (e.g., images captured by individuals in real-world conditions that include one or more fonts) may be used to train the font machine learning system 310. For some images of fonts (e.g., images of pristine fonts, synthetically altered versions of fonts, etc.), information that identifies each included font (e.g., labels) may be provided for training. Alternatively, for some images (e.g., captured under real-world conditions), identifying information (of included fonts) may be absent.


Once trained, the font machine learning system 310 may be provided input data such as one or more images to identify the font or fonts present in the images. For example, after being trained using pristine, distorted, and real-world images of fonts, images containing unidentified fonts and captured under real-world conditions may be input for predicting the contained fonts (as illustrated in FIG. 2). The font identifier 204 may output data that represents the predicted font or fonts determined through an analysis of the input image. For example, a vector may be output in which each vector element represents one potentially matching font. In one arrangement, this vector may include a considerable number of elements (e.g., 133,000 elements), one for each font used to train the system. Various types of data may be provided by each element to reflect how well the font representing that particular element matches the font present in the input. For example, each element of the vector may include a floating-point number that represents a level of confidence that the corresponding font (represented by the vector element) matches a font included in the input. In some arrangements, the sum of these vector quantities represents a predefined amount (e.g., a value of one) to assist in comparing confidence levels and determining which fonts are closer matches. In this example, the output vector (e.g., a 133,000 element vector) from the font machine learning system 310 is stored in an output data database 312. A font analyzer 314 can retrieve the data from the database 312 and determine which font or fonts are the closest matches to the input (e.g., by reviewing the level of confidence in the stored vector elements). The results determined by the font analyzer 314 (e.g., an ordered list of fonts) can be stored on the storage device 304 (e.g., in a font identification database 316) for later retrieval and use. For example, the input images (captured under real-world conditions) and correspondingly identified fonts can be further used to train the font machine learning system 310 or other artificial intelligence based systems.
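

For illustration, a small sketch of how a font analyzer might rank candidates from such an output vector is shown below; the `rank_fonts` helper and the toy five-font vector are assumptions made for the example, not elements of the described system.

```python
import numpy as np

def rank_fonts(confidences, font_names, top_k=5):
    """Order candidate fonts by the confidence values in the output vector.

    `confidences` is the per-font output vector (one element per training font,
    summing to roughly one); `font_names` maps vector positions to font names.
    """
    order = np.argsort(confidences)[::-1][:top_k]
    return [(font_names[i], float(confidences[i])) for i in order]

# Example with a toy 5-font vector rather than the full 133,000-element vector.
scores = np.array([0.05, 0.60, 0.10, 0.20, 0.05])
names = ["Font A", "Font B", "Font C", "Font D", "Font E"]
print(rank_fonts(scores, names, top_k=3))  # Font B first, then Font D, ...
```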


Referring to FIG. 4, various types of computing device architectures may be employed to collect, process, and output information associated with identifying fonts through a machine learning system. For example, an Internet based system 400 (e.g., a cloud based system) may be used in which operations are distributed to a collection of devices. Such architectures may operate by using one or more systems; for example, the system 400 may use a network of remote servers hosted on the Internet to store, manage, and process data. In some arrangements, the system may employ local servers in concert with the network of remote servers.


Referring to FIG. 5, a block diagram 500 is presented that provides a graphical representation of the functionality of the font machine learning system 310 (shown in FIG. 3). Prior to using the learning system 310 to process an input 502 (e.g., an image that includes a font to be identified) to produce an output 504 (e.g., a vector of 133,000 elements representing the level of confidence that a corresponding font matches the input font), the learning system needs to be trained. Various types of training data 506 may be used to prepare the font machine learning system 310 to identify fonts of interest to an end user (e.g., potential licensees of the identified font or fonts). For example, images of fonts in pristine condition and images of fonts that have been distorted (e.g., by one or more synthetic distortion techniques, real world conditions, etc.) may be employed. In some arrangements, an initial set of font images (e.g., representing 14,000 fonts) is used to start the training of the system and then images representing the remaining fonts (e.g., 133,000−14,000=119,000 fonts) are used via a transfer learning technique to scale up learning of the system (e.g., by adjusting a classifier of the system, fine-tuning weights of the trained system through backpropagation, etc.). In some instances, images may be used multiple times for system training; for example, an image may present a font (e.g., a font character) in a pristine condition and then be distorted (using one or more synthetic distortion techniques) to provide the font in one or more other forms. In some arrangements, feedback data 508 can also be provided to the font machine learning system to further improve training. In some arrangements, font imagery may be augmented (e.g., distorted) based on one or more conditions (e.g., predefined conditions) such as characteristics of the fonts. For example, a font that visually represents characters with thin line strokes (e.g., a light weight font) may be augmented with relatively minor visual adjustments. Alternatively, a font that presents characters with thick bold lines may be augmented (e.g., distorted) by introducing more bold and easily noticeable visual adjustments (e.g., drastically increasing the darkness or thickness of character lines). In another example, a font that presents characters as having visually hollow segments can be augmented (e.g., distorted) differently than a font that presents characters with completely visually solid segments. Other types of conditions (e.g., predefined conditions) may be employed for directing synthetic augmentation. For example, the content presented by one or more fonts, characteristics of one or more fonts (e.g., character size, style of the font such as bold, italic, etc.), etc. may be used. The use of the font (or fonts) within the environment being presented may also provide a condition or conditions for augmentation. The location, position, orientation, etc. of a font within an environment (e.g., positioned in the foreground, background, etc.) can be used to define one or more conditions. The content of the imagery separate from the font can also be used to define one or more conditions; for example, contrast, brightness, and color differences between a font and the surrounding imagery may be used to determine conditions.
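

A brief, hypothetical sketch of condition-based augmentation follows; the weight threshold, the hollow-glyph flag, and the blur ranges are illustrative assumptions rather than values taken from the description above.

```python
import random

def augmentation_strength(font_weight, has_hollow_glyphs):
    """Pick a distortion magnitude from simple, assumed conditions on the font.

    Thin (light-weight) fonts receive gentler adjustments than bold fonts, and
    hollow fonts are flagged so a different augmentation path can be taken.
    All thresholds and ranges here are placeholders for illustration.
    """
    if has_hollow_glyphs:
        return {"mode": "hollow", "blur_sigma": random.uniform(0.0, 0.5)}
    if font_weight < 400:  # light fonts: relatively minor visual adjustments
        return {"mode": "solid", "blur_sigma": random.uniform(0.0, 1.0)}
    return {"mode": "solid", "blur_sigma": random.uniform(1.0, 3.0)}  # bold fonts
```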


The training data 506 may also include segments of one training image. For example, one image may be segmented into five separate images that focus on different areas of the original image. Such image segmenting can be used when the machine learning system predicts a font from an input image. For prediction operations, a prediction result (e.g., a 133,000 element output vector) can be attained for each segment and an overall result determined (e.g., by averaging the individual results) to improve prediction accuracy. One image may be cropped from the original image to focus upon the upper left quadrant of the original image while three other segments may be cropped to focus on the upper right, lower left, and lower right portions of the original image, respectively. A fifth image segment may be produced by cropping the original image to focus upon the central portion of the original image. Various sizes and shapes may be used to create these segments; for example, the original image may be of a particular size (e.g., 224 by 224 pixels, 120 by 120 pixels, etc.) while the segments are of lesser size (e.g., 105 by 105 pixels). In some arrangements, the segments may include overlapping content, or non-overlapping content may be included in each segment. While the original image and the cropped segments may be square shaped, in some instances the images may be rectangular or have another type of shape.
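

The five-segment cropping and result averaging described above might look roughly like the following sketch, assuming a `model` callable that maps an image crop to a per-font confidence vector; the 105-pixel crop size is one of the example sizes mentioned above.

```python
import numpy as np

def five_crops(image, crop=105):
    """Return the four corner crops and the center crop of a square image array.

    `image` is an H x W (x C) array, e.g. 224 by 224 pixels.
    """
    h, w = image.shape[:2]
    tl = image[:crop, :crop]
    tr = image[:crop, w - crop:]
    bl = image[h - crop:, :crop]
    br = image[h - crop:, w - crop:]
    cy, cx = (h - crop) // 2, (w - crop) // 2
    center = image[cy:cy + crop, cx:cx + crop]
    return [tl, tr, bl, br, center]

def predict_with_crops(model, image):
    """Average the per-crop output vectors into one overall prediction."""
    outputs = np.stack([model(c) for c in five_crops(image)])
    return outputs.mean(axis=0)
```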


In one arrangement, after initial training with the first set of fonts (e.g., 14,000 fonts), for each new font used in the subsequent training (each of the remaining 133,000 fonts), operations are executed (by the font identifier 204) to determine the most similar font from the first set initially used to train the system (e.g., the most similar font present in the 14,000 fonts). To determine which font is most similar, one or more techniques may be employed; for example, techniques including machine learning system based techniques may be used as described in U.S. patent application Ser. No. 14/694,494, filed Apr. 23, 2015, entitled “Using Similarity for Grouping Fonts and Individuals for Recommendations”, U.S. patent application Ser. No. 14/690,260, filed Apr. 17, 2015, entitled “Pairing Fonts for Presentation”, U.S. Pat. No. 9,317,777, issued Apr. 19, 2016, entitled “Analyzing Font Similarity for Presentation”, and U.S. Pat. No. 9,805,288, to be issued Oct. 31, 2017, entitled “Analyzing Font Similarity for Presentation”, each of which is incorporated by reference in its entirety herein. Upon determining which font from the initial set is most similar, associated information (e.g., weights of the last layer for this identified font) is used for establishing this new font in the machine learning system (e.g., copying the weights to a newly added connection for this new font). By determining a similar font, and using information to assist the additional training (e.g., random weights are not employed), the font machine learning system 310 continues its training in an expedited manner. Other types of similarity techniques can be employed by the system. For example, comparisons (e.g., distance calculations) may be performed on one or more layers before the output layer. In one arrangement, the layer located before the output layer can be considered as a feature space (e.g., a 1000 dimension feature space) and executing comparisons for different system inputs can provide a similarity measure. Along with distance measurements between the two feature spaces (for two inputs), other types of calculations such as cosine similarity measures can be employed.
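

A simplified sketch of seeding a newly added font from its most similar trained font is shown below; the array shapes and the cosine-similarity helper are assumptions used for illustration, not the exact mechanism of the described system.

```python
import numpy as np

def add_font_class(last_layer_w, last_layer_b, similar_idx):
    """Grow the output layer by one font, seeding it from the most similar known font.

    `last_layer_w` has shape (num_fonts, feature_dim) and `last_layer_b` shape
    (num_fonts,). Instead of random initialization, the row belonging to the most
    similar already-trained font (`similar_idx`) is copied for the new class;
    fine-tuning then proceeds by backpropagation.
    """
    new_w = np.vstack([last_layer_w, last_layer_w[similar_idx:similar_idx + 1]])
    new_b = np.append(last_layer_b, last_layer_b[similar_idx])
    return new_w, new_b

def cosine_similarity(feat_a, feat_b):
    """Similarity of two penultimate-layer feature vectors (e.g., 1000-dimensional)."""
    return float(np.dot(feat_a, feat_b) /
                 (np.linalg.norm(feat_a) * np.linalg.norm(feat_b)))
```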


Similarity calculations may be used for other operations associated with the font machine learning system 310. In some instances, accuracy may degrade when scaling the training from a first set of training fonts (e.g., 14,000) to the full font complement (e.g., the remainder of the 133,000 fonts). This accuracy drop may be caused by a significant number of similar fonts being used for training. For example, many font variants (e.g., hundreds) of one font (e.g., Helvetica) may be represented in the system, and a newly introduced font may appear associated with a number of the font variants. By employing one or more mathematical metrics, convergence can be gauged. For example, similarity accuracy can be measured by using similarity techniques such as the techniques incorporated by reference above. In one arrangement, accuracy can be calculated using the similarity techniques to determine the similarity of a predicted font (provided as the system output 504) and an identified (labeled) font (used to train the system). If similar, the prediction can be considered as being correct. In another situation, only a limited number of training fonts that are provided in distorted imagery (e.g., captured in real-world conditions) are identified (e.g., only 500 fonts are identified, or labeled). Due to this limitation, system accuracy may decrease (e.g., for the 133,000 possible prediction outputs from the machine). To improve accuracy, only the limited number of labeled fonts (e.g., the 500 labeled fonts) are considered active and all other possible predictions are not considered active (e.g., and are assigned prediction values of zero). Using this technique, accuracy can improve as the training of the machine learning system scales up. In some implementations the result with the highest probability is the expected result (Top-1 accuracy). In some cases, the Top-1 accuracy can be based on synthetic data. Implementing a model in which the expected result appears among the five highest probabilities can also be employed (Top-5 accuracy). In some cases, the Top-5 accuracy can be based on synthetic data.
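

The following sketch illustrates, under assumed data structures, both the masking of inactive (unlabeled) fonts and a simple Top-k accuracy measurement as discussed above.

```python
import numpy as np

def mask_inactive(predictions, active_indices):
    """Keep only the labeled ("active") fonts in a prediction vector.

    Every font outside `active_indices` (e.g., outside the 500 labeled fonts)
    is assigned a prediction value of zero.
    """
    masked = np.zeros_like(predictions)
    masked[active_indices] = predictions[active_indices]
    return masked

def top_k_accuracy(prediction_vectors, true_indices, k=5):
    """Fraction of samples whose labeled font appears among the top-k predictions."""
    hits = 0
    for scores, truth in zip(prediction_vectors, true_indices):
        top = np.argsort(scores)[::-1][:k]
        hits += int(truth in top)
    return hits / len(true_indices)
```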


The similarity techniques may also be used for measuring the quality of segmenting an input image (e.g., the quality of cropping of an input image). For example, upon receiving a page of text, graphics, etc., the page can be cropped into segments (e.g., rectangular shaped segments) by the font identifier 204 such that each segment contains text of the page. In many cases, the text of the segments contains similar fonts, if not the same font. Each text segment is input into the font machine learning system 310 and a number of predicted fonts is output (K predicted fonts). For two crops, distance values can be calculated between the predicted fonts (the K predicted fonts). An estimated value (e.g., a mean value) of the distance values (K*K values) is calculated to identify a threshold value. In some instances, the estimated value is multiplied by a constant (e.g., a value of 1.0, etc.) to produce the threshold value. If the top predictions for two crops have a distance value less than this threshold value, the crops can be considered as containing similar fonts. If the distance value is above the threshold value, the two crops can be considered as containing different fonts.
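

A rough sketch of the two-crop comparison follows; the `font_features` mapping used for the distance calculations is an assumed representation (e.g., penultimate-layer features per font), and the scale constant of 1.0 mirrors the example value above.

```python
import numpy as np

def crops_contain_similar_fonts(top_fonts_a, top_fonts_b, font_features, scale=1.0):
    """Decide whether two crops contain similar fonts via the thresholding above.

    `top_fonts_a` / `top_fonts_b` are the K predicted font indices for each crop
    (highest-confidence prediction first); `font_features` maps a font index to a
    feature vector used for the distance calculations.
    """
    distances = np.array([
        np.linalg.norm(font_features[a] - font_features[b])
        for a in top_fonts_a for b in top_fonts_b   # K * K distance values
    ])
    threshold = distances.mean() * scale            # mean distance times a constant
    top_distance = np.linalg.norm(
        font_features[top_fonts_a[0]] - font_features[top_fonts_b[0]])
    return top_distance < threshold
```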


Similarity calculations can also be executed to determine the quality of a crop. For example, a segment of text attained through the cropping of an image can be input into the machine learning system. Techniques such as Fast Region-based Convolutional Network method (Fast R-CNN), Faster Region-based Convolutional Network method (Faster R-CNN), etc. can be used to classify objects (e.g., detect rectangular regions that contain text). The output of the system provides a number of predicted fonts (e.g., K predicted fonts) for the cropped segment. Similarity calculations may be executed among the K predicted fonts. If the calculations report that the K predicted fonts are similar, the segment can be considered as being attained from a good quality crop. If the K predicted fonts lack similarity, the cropping operations used to attain the segment can be considered poor. If the similarity calculations report that non-similar fonts are present, corrective operations may be executed (e.g., cropping operations may be repeated to attain another segment for re-testing for similarity of predicted fonts). In some arrangements, a numerical value may be assigned to the crop quality; for example, a value of one may indicate that a good segment has been attained from the cropping operations, and a value of zero may indicate poor cropping operations may have produced a segment with dissimilar predicted fonts.
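

Building on the previous sketch, crop quality might be scored as follows; the binary quality value matches the one/zero convention mentioned above, while the threshold and feature representation remain assumptions.

```python
import numpy as np

def crop_quality(top_fonts, font_features, threshold):
    """Score a cropped segment: 1 if its K predicted fonts are mutually similar, else 0.

    A low score suggests the cropping operation should be repeated, as noted above.
    `font_features` and `threshold` are assumed inputs (see the previous sketch).
    """
    pairwise = [
        np.linalg.norm(font_features[a] - font_features[b])
        for i, a in enumerate(top_fonts) for b in top_fonts[i + 1:]
    ]
    return 1 if max(pairwise, default=0.0) < threshold else 0
```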


As described above, the font machine learning system 310 outputs prediction values for each of the potential fonts (e.g., each of the 133,000 fonts represented as elements of an output vector). Typically, numerical values are assigned to each potential font to represent the prediction. Additionally, these numerical values are scaled so the sum has a value of one. However, given the considerably large number of potential fonts (e.g., again, 133,000 fonts), each individual value can be rather small and difficult to interpret (e.g., to identify differences from other values). Further, even values that represent the top predictions can be small and difficult to distinguish from one another. In some arrangements, a software function (e.g., a Softmax function) causes the sum of the prediction values to equal a value of one. One or more techniques may be provided by the font machine learning system 310 to address these numerical values and improve their interpretation. For example, only a predefined number of top predicted fonts are assigned a numerical value to represent the level of confidence. In one arrangement, the top 500 predicted fonts are assigned a numerical value and the remaining fonts (e.g., 133,000−500=132,500 fonts) are assigned a numerical value of zero. Further, the numerical values are assigned to the top 500 predicted fonts such that the sum of the numerical values has a value of one. In one implementation, the lower font predictions (e.g., 133,000−500=132,500 fonts) are zeroed out before the Softmax function is applied. In effect, the top N (e.g., 500) predicted fonts are boosted in value to assist with further processing (e.g., identifying top predictions, prediction distributions, etc.). In some arrangements, corresponding elements of multiple output vectors are summed (in which each output vector represents a different input image, a different portion of an image, etc.). Through the summing operation, fonts common among the images can be identified, for example.
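

One possible way to express the boosting step, removing all but the top N predictions before a Softmax-style normalization, is sketched below; it illustrates the general idea rather than the exact implementation of the described system.

```python
import numpy as np

def boosted_softmax(logits, top_n=500):
    """Zero out all but the top-N font predictions, then normalize to sum to one.

    Lower predictions are removed before the Softmax-style normalization so the
    surviving confidence values are easier to compare.
    """
    keep = np.argsort(logits)[::-1][:top_n]
    masked = np.full_like(logits, -np.inf, dtype=float)
    masked[keep] = logits[keep]
    exps = np.exp(masked - masked[keep].max())   # numerically stable softmax
    return exps / exps.sum()                     # removed fonts end up exactly zero
```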


As mentioned above, various techniques may be employed to distort images for increasing the robustness of the font machine learning system 310 (e.g., the system trains on less than pristine images of fonts to improve the system's ability to detect the fonts in “real world” images such as photographs). Since the fonts are known prior to being distorted through the synthetic techniques, each of the underlying fonts is known and can be identified to the font machine learning system 310. Along with these synthetically distorted fonts, the robustness of the font machine learning system 310 can be increased by providing actual real-world images of fonts (e.g., from captured images provided by end users, etc.). In many cases, the underlying fonts present in these real-world images are unknown or at least not identified when provided to the system for training. As such, the system will develop its own identity of these fonts. To improve robustness, various amounts of these unlabeled real-world font images may be provided during training of the system. For example, in some training techniques a particular number of images are provided for each training session. Image batches of 16 images, 32 images, 64 images, etc. can be input for a training session. Of these batches, a percentage of the images have identified fonts (e.g., are labeled), and those images may be in pristine condition or synthetically distorted. Font identification is not provided with another percentage of the images; for example, these images may be distorted by real-world conditions and the font is unknown. For this latter percentage of images, the machine learning system defines its own identity of the font (e.g., via a process known as pseudo labeling). For example, 75% of the images may be provided with font identification (e.g., a pristine image or a synthetically distorted image in which the base font is known) and 25% may be images with unlabeled fonts for which the machine learning system defines a label for the represented font. Other percentages of these two types of labeled and pseudo labeled images may also be employed to increase system robustness along with improving overall decision making by the system.
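

A hypothetical sketch of assembling such a mixed batch is shown below; `model.predict_font` is an assumed helper standing in for however the system produces its pseudo labels.

```python
import random

def build_training_batch(labeled_pool, unlabeled_pool, model,
                         batch_size=32, labeled_fraction=0.75):
    """Assemble a batch of labeled and pseudo-labeled font images.

    About 75% of the batch comes from images whose font is known (pristine or
    synthetically distorted); the remaining 25% are real-world images whose font
    label is defined by the system itself via pseudo labeling.
    """
    n_labeled = int(batch_size * labeled_fraction)
    batch = random.sample(labeled_pool, n_labeled)            # (image, known_label)
    for image in random.sample(unlabeled_pool, batch_size - n_labeled):
        batch.append((image, model.predict_font(image)))      # pseudo label
    random.shuffle(batch)
    return batch
```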


System variations may also include different hardware implementations and different uses of the system hardware. For example, multiple instances of the font machine learning system 310 may be executed through the use of a single graphical processing unit (GPU). In such an implementation, multiple system clients (each operating with one machine learning system) may be served by a single GPU. In other arrangements, multiple GPUs may be used. Similarly, under some conditions, a single instance of the machine learning system may be capable of serving multiple clients. Based upon changing conditions, multiple instances of a machine learning system may be employed to handle an increased workload from multiple clients. For example, environmental conditions (e.g., system throughput), client based conditions (e.g., number of requests received per client), hardware conditions (e.g., GPU usage, memory use, etc.) can trigger multiple instances of the system to be employed, increase the number of GPUs being used, etc. Similar to taking steps to react to an increase in needed processing capability, adjustments can be made when less processing is needed. For example, the number of instances of a machine learning system being used may be decreased along with the number of GPUs needed to service the clients. Other types of processors may be used in place of the GPUs or in concert with them (e.g., combinations of different types of processors). For example, central processing units (CPUs), processors developed for machine learning use (e.g., an application-specific integrated circuit (ASIC) developed for machine learning and known as a tensor processing unit (TPU)), etc. may be employed. Similar to GPUs, one or more models may be provided by these other types of processors, either independently or in concert with other processors.


One or more techniques can be employed to improve the training of the font machine learning system 310. For example, one improvement that results in higher font identification accuracy is provided by synthetically generating training images that include some amount of distortion. For example, after a training image is provided to the machine learning system, one or more distorted versions of the image may also be provided to the system during the training cycle. Some fonts, such as those that are lighter in color or have hollow features, can be used for training without being distorted. As such, any font considered as having these features can be used for training without further alteration. For other types of training fonts, along with using the unaltered version of the font, a synthetically distorted version of the font can be used for training the font machine learning system 310. Various types of distortions can be applied to the fonts; for example, compression techniques (e.g., JPEG compression) can be applied. One or more levels of shadowing can be applied to a training font sample. Manipulating an image of a training font such that shapes of the font are significantly distorted can be used to define one or more training images. Blurring can be applied to imagery to create distortions; for example, a Gaussian blur can give an overall smoothing appearance to an image of a font. Motion blurring can also be applied, in which streaking appears in the imagery to present the effect of rapid object movement. For still another feature, Gaussian noise can be applied as a type of distortion and cause the blurring of fine-scaled image edges and details. Other types of image adjustments may be applied as a type of visual distortion; for example, images may be rotated about one or more axes (e.g., about the x, y, and/or z-axis). Skewing an image in one or more manners so the underlying image appears to be misaligned in one or multiple directions (e.g., slanted) can provide another type of distortion. Adjusting the aspect ratio of an image, in which the ratio of the width to the height of the image is changed, can provide a number of different types of images of a font to assist with training. Distortion may also be applied by filtering all or a portion of an image and using one or more filtered versions of the image for system training. For example, edge detection may be performed on an image, for example, to retain or remove high spatial frequency content of an image. Other types of image processing may also be executed; perspective transformation can be employed, which is associated with converting 3D imagery into 2D imagery such that objects that are represented as being closer to the viewer appear larger than objects represented as being further from the viewer.
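

A few of the listed distortions (rotation, Gaussian blur, JPEG compression, and Gaussian noise) could be combined along the following lines using the Pillow imaging library; the parameter values are illustrative, and a real training pipeline would sample them and mix in the other distortions described above.

```python
import io
import numpy as np
from PIL import Image, ImageFilter

def distort(image, angle=5.0, blur_sigma=1.5, jpeg_quality=30, noise_std=8.0):
    """Apply a few example distortions to a PIL image of a font sample."""
    # Rotation about the z-axis, padding the revealed corners with white.
    out = image.convert("RGB").rotate(angle, expand=True, fillcolor=(255, 255, 255))
    # Gaussian blur for an overall smoothing appearance.
    out = out.filter(ImageFilter.GaussianBlur(radius=blur_sigma))
    # JPEG compression artifacts via an in-memory re-encode.
    buf = io.BytesIO()
    out.save(buf, format="JPEG", quality=jpeg_quality)
    out = Image.open(io.BytesIO(buf.getvalue())).convert("RGB")
    # Additive Gaussian noise.
    arr = np.asarray(out).astype(np.float32)
    arr += np.random.normal(0.0, noise_std, arr.shape)
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
```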


In some arrangements, data processing (e.g., image processing) libraries may be employed for distorting the training images. For example, some libraries may provide functions that adjust the shape, geometry, etc. of text (e.g., position the text to appear in a circular formation). Different coloring schemes may also be applied to create additional training images; for example, color substitution techniques, introducing and applying gradients to one or more colors, etc. can be executed through the use of libraries. Through the use of libraries, different types of fonts may be introduced into training imagery. For example, hollow fonts and outline fonts may be introduced to assist with training. Different attributes of font glyphs, characters, etc. may be adjusted to provide distortion. For example, random stroke widths may be applied to portions (e.g., stems) of characters or to entire characters to introduce distortion. Each of the different types of distortions described above may be used to create a training image. To further increase the accuracy of the machine learning system, two or more of the distortion techniques may be used in concert to create additional training imagery.


Similar to using distortion to create additional training imagery, other types of content may be employed. For example, different types of background imagery may be used to create imagery that includes different text (e.g., using different fonts) in the foreground. Real world photographic background images may be used as backgrounds, and distorted text (represented in one or more fonts) can be used for image creation. Text may be positioned at various locations in images, including on image borders. In some training images, portions of text may be clipped so only a portion of the text (e.g., part of a character, word, phrase, etc.) is present. As such, different cropping schemes may be utilized for training the machine learning system. As mentioned above, for some training images, text is distorted in one manner or multiple manners. In a similar fashion, other portions of the images, such as background imagery (e.g., photographic imagery), may be distorted once or in multiple instances. Further, for some examples, the distortion may follow a two-step process: first an image is created that includes distorted text (and is used to train the system), and then the image (e.g., the background image) is distorted using one or more image processing techniques (e.g., JPEG compression, applying Gaussian noise, etc.).
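

A minimal sketch of the composition step, rendering text in a labeled font over a photographic background with Pillow, is shown below; the position, size, and color arguments are arbitrary example values, and the two-step distortion described above would wrap around a helper like this one.

```python
from PIL import Image, ImageDraw, ImageFont

def compose_training_image(background, text, font_path, position=(10, 40), size=48):
    """Render text in a given font over a real-world background photograph.

    This covers only the composition step; the rendered text may itself be
    distorted first, and the composite image may then be distorted again
    (e.g., JPEG compression, Gaussian noise). `font_path` points to the font
    file whose identity serves as the training label.
    """
    canvas = background.convert("RGB").copy()
    draw = ImageDraw.Draw(canvas)
    font = ImageFont.truetype(font_path, size)
    # Text may land on borders or be partially clipped; Pillow draws whatever fits.
    draw.text(position, text, font=font, fill=(20, 20, 20))
    return canvas
```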


To implement the font machine learning system 310, one or more machine learning techniques may be employed. For example, supervised learning techniques may be implemented in which training is based on a desired output that is known for an input. Supervised learning can be considered an attempt to map inputs to outputs and then estimate outputs for previously unseen inputs (a newly introduced input). Unsupervised learning techniques may also be employed in which training is provided from known inputs but unknown outputs. Reinforcement learning techniques may also be used in which the system can be considered as learning from consequences of actions taken (e.g., input values are known). In some arrangements, the implemented technique may employ two or more of these methodologies.


In some arrangements, neural network techniques may be implemented using the data representing the images (e.g., a matrix of numerical values that represent visual elements such as pixels of an image, etc.) to invoke training algorithms for automatically learning the images and related information. Such neural networks typically employ a number of layers. Once the layers and the number of units for each layer are defined, weights and thresholds of the neural network are typically set to minimize the prediction error through training of the network. Such techniques for minimizing error can be considered as fitting a model (represented by the network) to training data. By using the image data (e.g., attribute vectors), a function may be defined that quantifies error (e.g., a squared error function used in regression techniques). By minimizing error, a neural network may be developed that is capable of determining attributes for an input image. One or more techniques may be employed by the machine learning system; for example, backpropagation techniques can be used to calculate the error contribution of each neuron after a batch of images is processed. Stochastic gradient descent, also known as incremental gradient descent, can be used by the machine learning system as a stochastic approximation of gradient descent optimization and as an iterative method to minimize an objective function. Other factors may also be accounted for during neural network development. For example, a model may too closely attempt to fit data (e.g., fitting a curve to the extent that the modeling of an overall function is degraded). Such overfitting of a neural network may occur during the model training, and one or more techniques may be implemented to reduce its effects.
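

As an illustration of the backpropagation and stochastic gradient descent steps mentioned above, a bare-bones PyTorch training pass might look as follows; the model, data loader, and cross-entropy error function are assumptions for the sketch, not the patented training procedure.

```python
import torch
import torch.nn as nn

def train_epoch(model, loader, lr=0.01):
    """One pass of stochastic gradient descent with backpropagation.

    `model` maps image tensors to per-font logits and `loader` yields
    (images, font_label_indices) batches; cross-entropy serves as the error
    function being minimized.
    """
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()          # backpropagate each neuron's error contribution
        optimizer.step()         # incremental (stochastic) gradient descent update
```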


One type of machine learning referred to as deep learning may be utilized in which a set of algorithms attempts to model high-level abstractions in data by using model architectures, with complex structures or otherwise, composed of multiple non-linear transformations. Such deep learning techniques can be considered as being based on learning representations of data. In general, deep learning techniques can be considered as using a cascade of many layers of nonlinear processing units for feature extraction and transformation. The next layer uses the output from the previous layer as input. In some arrangements, a layer can look back one or multiple layers for its input. The algorithms may be supervised, unsupervised, combinations of supervised and unsupervised, etc. The techniques are based on the learning of multiple levels of features or representations of the data (e.g., image attributes). As such, multiple layers of nonlinear processing units along with supervised or unsupervised learning of representations can be employed at each layer, with the layers forming a hierarchy from low-level to high-level features. By employing such layers, a number of parameterized transformations are used as data propagates from the input layer to the output layer. In one example, the font machine learning system 310 uses one or more convolutional neural networks (CNNs), which when trained can output a font classification for an input image that includes a font. Various types of CNN based systems can be used that have different numbers of layers; for example, the font machine learning system 310 can use a fifty-layer deep neural network architecture (e.g., a ResNet50 architecture) or architectures that employ a different number of layers (e.g., ResNet150, ResNet152, VGGNet 16, VGGNet 19, InceptionNet V3, etc.) that when trained can output a font classification for an input image that includes a font.
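

A sketch of configuring a ResNet50 backbone with one output per training font, using torchvision as an example implementation, is shown below; the described system's exact classification head and training regimen are not specified here.

```python
import torch.nn as nn
from torchvision import models

def build_font_classifier(num_fonts=133_000, pretrained=True):
    """A fifty-layer ResNet with its final layer resized to one output per font.

    torchvision's ResNet50 stands in for the fifty-layer architecture mentioned
    above; transfer learning then proceeds by fine-tuning these weights.
    """
    backbone = models.resnet50(
        weights=models.ResNet50_Weights.DEFAULT if pretrained else None)
    backbone.fc = nn.Linear(backbone.fc.in_features, num_fonts)
    return backbone
```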


Other types of artificial intelligence techniques may be employed by the font identifier 204 (shown in FIG. 2 and FIG. 3). For example, the font machine learning system 310 can use neural networks such as generative adversarial networks (GANs) in its machine learning architecture (e.g., an unsupervised machine learning architecture). In general, a GAN includes a generator neural network 512 that generates data (e.g., an augmented image such as a distorted image that includes one or more fonts input into the generator), which is evaluated by a discriminator neural network 514 for authenticity (e.g., to determine whether the imagery is real or synthetic). In other words, the discriminator neural network 514 attempts to determine whether input imagery is synthetically created (provided by the generator 512) or real imagery (e.g., a captured image). In some arrangements, font imagery from the training data 506 is used by the generator 512 to produce augmented imagery. The discriminator 514 then evaluates the augmented image and produces an output that represents whether the discriminator considers the augmented imagery to be synthetically produced or real (e.g., captured imagery). In one example, the output of the discriminator 514 is a probability value that ranges from 0 to 1, in which 1 represents that the discriminator considers the imagery to be real (e.g., captured imagery) and 0 represents that the discriminator considers the input imagery to be synthetically produced (by the generator 512). This output of the discriminator 514 can then be analyzed (e.g., by the font machine learning system 310 or another system) to determine whether the analysis of the discriminator 514 is correct. By including these determinations in the feedback data 508, the accuracy of the font machine learning system 310 can be improved. For example, this determination information can be provided to the generator neural network 512 to identify instances where the discriminator 514 had difficulties, thereby causing the generator to produce more augmented imagery in this area and improving operations of the discriminator. The feedback information can also be provided to the discriminator 514, thereby allowing the accuracy of the discriminator to improve through learning whether its determinations were correct or incorrect.
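

A single adversarial training step along these lines could be sketched as follows, assuming PyTorch, a discriminator that emits a probability in [0, 1], and placeholder optimizers and noise dimension.

    import torch
    import torch.nn as nn

    bce = nn.BCELoss()

    def gan_training_step(generator, discriminator, real_images, g_opt, d_opt, noise_dim=100):
        batch = real_images.size(0)
        real_target = torch.ones(batch, 1)   # 1: imagery judged real (captured)
        fake_target = torch.zeros(batch, 1)  # 0: imagery judged synthetically produced

        # Discriminator update: learn from whether its real/synthetic
        # determinations on captured and generated imagery were correct.
        noise = torch.randn(batch, noise_dim)
        augmented = generator(noise).detach()
        d_loss = bce(discriminator(real_images), real_target) + bce(discriminator(augmented), fake_target)
        d_opt.zero_grad()
        d_loss.backward()
        d_opt.step()

        # Generator update: feedback steers it toward augmented imagery that
        # the discriminator has difficulty flagging as synthetic.
        g_loss = bce(discriminator(generator(noise)), real_target)
        g_opt.zero_grad()
        g_loss.backward()
        g_opt.step()
        return d_loss.item(), g_loss.item()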


One or more metrics may be employed to determine whether the generator neural network 512 has reached an improved state (e.g., an optimized state). Upon reaching this state, the generator 512 may be used to train the font machine learning system 310. For example, the generator can be used to train one or more classifiers included in the font machine learning system 310. Using input 502, training data 506, etc., the generator 512 can produce a large variety of imagery (e.g., distorted images that contain one or more fonts) to increase the capability of the font machine learning system.
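

Once such a state is reached, the generator could be frozen and sampled purely as a data source, as in this sketch; the batch size and noise dimension are illustrative, and the generator is assumed to map a noise vector directly to an image tensor.

    import torch

    def synthesize_training_images(generator, batch_size=64, noise_dim=100):
        # Freeze the trained generator and draw a batch of augmented
        # (e.g., distorted) font images for classifier training.
        generator.eval()
        with torch.no_grad():
            noise = torch.randn(batch_size, noise_dim)
            augmented_images = generator(noise)
        return augmented_images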


Various implementations for GAN generators and discriminators may be used; for example, the discriminator neural network 514 can use a convolutional neural network that categorizes input images with a binomial classifier, labeling the images as genuine or not. The generator neural network 512 can use an inverse convolutional (or deconvolutional) neural network that takes a vector of random noise and upsamples the vector data into an image (e.g., an augmented image).
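

In concrete terms, a DCGAN-style pair along these lines might be defined as below; the layer widths and the assumed 64x64 single-channel crops are illustrative choices, not the networks actually used.

    import torch.nn as nn

    # Binomial classifier: convolutional network labeling an input image as
    # genuine (captured) or not (generator-produced); output is a probability.
    discriminator = nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
        nn.Flatten(), nn.Linear(128 * 16 * 16, 1), nn.Sigmoid())

    # Deconvolutional ("inverse convolution") generator: upsamples a random
    # noise vector into an image-sized tensor.
    generator = nn.Sequential(
        nn.Linear(100, 128 * 16 * 16), nn.ReLU(),
        nn.Unflatten(1, (128, 16, 16)),
        nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1), nn.Tanh())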


Referring to FIG. 6, a flowchart 600 represents operations of a font identifier (e.g., the font identifier 204 shown in FIG. 2 and FIG. 3) being executed by a computing device (e.g., the computer system 200). Operations of the font identifier 204 are typically executed by a single computing device; however, operations may be executed by multiple computing devices. Along with being executed at a single site, the execution of operations may be distributed among two or more locations. For example, a portion of the operations may be executed at a location remote from the location of the computer system 200, etc.


Operations of the font identifier 204 may include receiving 602 an image that includes textual content in at least one font. For example, an image may be received that is represented by a two-dimensional matrix of numerical values in which each value represents a visual property (e.g., color) that can be assigned to a pixel of a display. Various file formats (e.g., “.jpeg”, “.pdf”, etc.) may be employed to receive the image data. Operations of the font identifier may also include identifying 604 the at least one font represented in the received image using a machine learning system, the machine learning system being trained using images representing a plurality of training fonts. A portion of the training images includes text located in the foreground and being positioned over captured background imagery. For example, a collection of images including training fonts (e.g., synthetically distorted fonts, undistorted fonts, etc.) can be positioned over images that have been captured (e.g., by an image capture device such as a camera). The captured imagery, which may be distorted due to image capture conditions, capture equipment, etc., may be used for training a machine learning system such as the font machine learning system 310. Trained with such data, the machine learning system can efficiently identify fonts in images that are in less than pristine condition.
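

At inference time, the receive-and-identify operations could be sketched as follows, assuming a PyTorch classifier whose output vector carries one confidence score per training font; the 224x224 input size and the font_names list are placeholders.

    import torch
    from PIL import Image
    from torchvision import transforms

    def identify_font(model, image_path, font_names):
        # Receive the image and decode it into a matrix of pixel values.
        image = Image.open(image_path).convert("RGB")
        preprocess = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor()])
        batch = preprocess(image).unsqueeze(0)  # add a batch dimension

        # Identify the font: each output element corresponds to one training
        # font, and the largest confidence selects the identification.
        model.eval()
        with torch.no_grad():
            confidences = torch.softmax(model(batch), dim=1)[0]
        return font_names[int(confidences.argmax())], confidences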


Referring to FIG. 7, a flowchart 700 represents operations of a font identifier (e.g., the font identifier 204 shown in FIG. 2 and FIG. 3) being executed by a computing device (e.g., the computer system 200). Operations of the font identifier 204 are typically executed by a single computing device; however, operations may be executed by multiple computing devices. Along with being executed at a single site, the execution of operations may be distributed among two or more locations. For example, a portion of the operations may be executed at a location remote from the location of the computer system 200, etc.


Operations of the font identifier 204 may include receiving 702 an image that includes textual content in at least one font. For example, an image may be received that is represented by a two-dimensional matrix of numerical values in which each value represents a visual property (e.g., color) that can be assigned to a pixel of a display. Various file formats (e.g., “.jpeg”, “.pdf”, etc.) may be employed to receive the image data. Operations of the font identifier may also include identifying 704 the at least one font represented in the received image using a machine learning system, the machine learning system being trained using images representing a plurality of training fonts. A portion of the training images is produced by a generator neural network. A generator neural network of a GAN may be used to augment (e.g., distort) imagery of textual characters represented in a font. This augmented imagery may be provided to a discriminator neural network (of the GAN). Using these images, the discriminator can evaluate the augmented imagery and attempt to determine whether the imagery is real (e.g., captured) or synthetic (e.g., prepared by a generator neural network). These determinations (whether correct or incorrect) can be used to improve the generator neural network (e.g., to produce augmented imagery that further tests the discriminator) and to improve the discriminator neural network (e.g., to assist the discriminator in making correct determinations about future augmented imagery provided by the generator). The improved generator (e.g., an optimized generator) can then be used to provide imagery for training a machine learning system, for example, to identify one or more fonts in various types of images that are in less than pristine condition (e.g., captured images that are distorted).
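

The determination step described above, checking whether the discriminator's real-versus-synthetic calls on generator output were correct, might be sketched like this; the 0.5 decision threshold and the tensor shapes are assumptions.

    import torch

    def evaluate_discriminator_calls(discriminator, augmented_images, threshold=0.5):
        # Score each generator-produced image; outputs are probabilities in [0, 1],
        # where values near 1 mean the discriminator calls the imagery "real".
        with torch.no_grad():
            scores = discriminator(augmented_images).squeeze(1)
        called_real = scores >= threshold
        # Every image here is synthetic, so a "real" call is an incorrect
        # determination; such cases can be fed back to steer the generator
        # toward the kinds of imagery the discriminator finds difficult.
        return {"scores": scores, "incorrect_calls": called_real}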



FIG. 8 shows an example computing device 800 and an example mobile computing device 850, which can be used to implement the techniques described herein. For example, a portion or all of the operations of the font identifier 204 (shown in FIG. 2) may be executed by the computing device 800 and/or the mobile computing device 850. Computing device 800 is intended to represent various forms of digital computers, including, e.g., laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Computing device 850 is intended to represent various forms of mobile devices, including, e.g., personal digital assistants, tablet computing devices, cellular telephones, smartphones, and other similar computing devices. The components shown here, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the techniques described and/or claimed in this document.


Computing device 800 includes processor 802, memory 804, storage device 806, high-speed interface 808 connecting to memory 804 and high-speed expansion ports 810, and low-speed interface 812 connecting to low-speed bus 814 and storage device 806. Components 802, 804, 806, 808, 810, and 812 are interconnected using various buses, and can be mounted on a common motherboard or in other manners as appropriate. Processor 802 can process instructions for execution within computing device 800, including instructions stored in memory 804 or on storage device 806 to display graphical data for a GUI on an external input/output device, including, e.g., display 816 coupled to high-speed interface 808. In other implementations, multiple processors and/or multiple buses can be used, as appropriate, along with multiple memories and types of memory. Also, multiple computing devices 800 can be connected, with each device providing portions of the necessary operations (e.g., as a server bank, a group of blade servers, or a multi-processor system).


Memory 804 stores data within computing device 800. In one implementation, memory 804 is a volatile memory unit or units. In another implementation, memory 804 is a non-volatile memory unit or units. Memory 804 also can be another form of computer-readable medium (e.g., a magnetic or optical disk). Memory 804 may be non-transitory.


Storage device 806 is capable of providing mass storage for computing device 800. In one implementation, storage device 806 can be or contain a computer-readable medium (e.g., a floppy disk device, a hard disk device, an optical disk device, a tape device, a flash memory or other similar solid state memory device, or an array of devices, such as devices in a storage area network or other configurations). A computer program product can be tangibly embodied in a data carrier. The computer program product also can contain instructions that, when executed, perform one or more methods (e.g., those described above). The data carrier is a computer- or machine-readable medium (e.g., memory 804, storage device 806, memory on processor 802, and the like).


High-speed controller 808 manages bandwidth-intensive operations for computing device 800, while low-speed controller 812 manages lower bandwidth-intensive operations. Such allocation of functions is an example only. In one implementation, high-speed controller 808 is coupled to memory 804, display 816 (e.g., through a graphics processor or accelerator), and to high-speed expansion ports 810, which can accept various expansion cards (not shown). In this implementation, low-speed controller 812 is coupled to storage device 806 and low-speed expansion port 814. The low-speed expansion port, which can include various communication ports (e.g., USB, Bluetooth®, Ethernet, wireless Ethernet), can be coupled to one or more input/output devices (e.g., a keyboard, a pointing device, a scanner, or a networking device including a switch or router, e.g., through a network adapter).


Computing device 800 can be implemented in a number of different forms, as shown in the figure. For example, it can be implemented as standard server 820, or multiple times in a group of such servers. It also can be implemented as part of rack server system 824. In addition or as an alternative, it can be implemented in a personal computer (e.g., laptop computer 822). In some examples, components from computing device 800 can be combined with other components in a mobile device (not shown), e.g., device 850. Each of such devices can contain one or more of computing devices 800, 850, and an entire system can be made up of multiple computing devices 800, 850 communicating with each other.


Computing device 850 includes processor 852, memory 864, and an input/output device such as display 854, communication interface 866, and transceiver 868, among other components. Device 850 also can be provided with a storage device (e.g., a microdrive or other device) to provide additional storage. Components 852, 864, 854, 866, and 868 are interconnected using various buses, and several of the components can be mounted on a common motherboard or in other manners as appropriate.


Processor 852 can execute instructions within computing device 850, including instructions stored in memory 864. The processor can be implemented as a chipset of chips that include separate and multiple analog and digital processors. The processor can provide, for example, for coordination of the other components of device 850, e.g., control of user interfaces, applications run by device 850, and wireless communication by device 850.


Processor 852 can communicate with a user through control interface 858 and display interface 856 coupled to display 854. Display 854 can be, for example, a TFT LCD (Thin-Film-Transistor Liquid Crystal Display) or an OLED (Organic Light Emitting Diode) display, or other appropriate display technology. Display interface 856 can comprise appropriate circuitry for driving display 854 to present graphical and other data to a user. Control interface 858 can receive commands from a user and convert them for submission to processor 852. In addition, external interface 862 can communicate with processor 852, so as to enable near area communication of device 850 with other devices. External interface 862 can provide, for example, for wired communication in some implementations, or for wireless communication in other implementations, and multiple interfaces also can be used.


Memory 864 stores data within computing device 850. Memory 864 can be implemented as one or more of a computer-readable medium or media, a volatile memory unit or units, or a non-volatile memory unit or units. Expansion memory 874 also can be provided and connected to device 850 through expansion interface 872, which can include, for example, a SIMM (Single In Line Memory Module) card interface. Such expansion memory 874 can provide extra storage space for device 850, or also can store applications or other data for device 850. Specifically, expansion memory 874 can include instructions to carry out or supplement the processes described above, and can include secure data also. Thus, for example, expansion memory 874 can be provided as a security module for device 850, and can be programmed with instructions that permit secure use of device 850. In addition, secure applications can be provided through the SIMM cards, along with additional data (e.g., placing identifying data on the SIMM card in a non-hackable manner).


The memory can include, for example, flash memory and/or NVRAM memory, as discussed below. In one implementation, a computer program product is tangibly embodied in a data carrier. The computer program product contains instructions that, when executed, perform one or more methods, e.g., those described above. The data carrier is a computer- or machine-readable medium (e.g., memory 864, expansion memory 874, and/or memory on processor 852), which can be received, for example, over transceiver 868 or external interface 862.


Device 850 can communicate wirelessly through communication interface 866, which can include digital signal processing circuitry where necessary. Communication interface 866 can provide for communications under various modes or protocols (e.g., GSM voice calls, SMS, EMS, or MMS messaging, CDMA, TDMA, PDC, WCDMA, CDMA2000, or GPRS, among others). Such communication can occur, for example, through radio-frequency transceiver 868. In addition, short-range communication can occur, e.g., using a Bluetooth®, WiFi, or other such transceiver (not shown). In addition, GPS (Global Positioning System) receiver module 870 can provide additional navigation- and location-related wireless data to device 850, which can be used as appropriate by applications running on device 850. Sensors and modules such as cameras, microphones, compasses, accelerometers (for orientation sensing), etc. may be included in the device.


Device 850 also can communicate audibly using audio codec 860, which can receive spoken data from a user and convert it to usable digital data. Audio codec 860 can likewise generate audible sound for a user (e.g., through a speaker in a handset of device 850). Such sound can include sound from voice telephone calls, can include recorded sound (e.g., voice messages, music files, and the like), and also can include sound generated by applications operating on device 850.


Computing device 850 can be implemented in a number of different forms, as shown in the figure. For example, it can be implemented as cellular telephone 880. It also can be implemented as part of smartphone 882, a personal digital assistant, or other similar mobile device.


Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.


These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms machine-readable medium and computer-readable medium refer to a computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions.


To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a device for displaying data to the user (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor), and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form, including acoustic, speech, or tactile input.


The systems and techniques described here can be implemented in a computing system that includes a backend component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a frontend component (e.g., a client computer having a user interface or a Web browser through which a user can interact with an implementation of the systems and techniques described here), or a combination of such backend, middleware, or frontend components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.


The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.


In some implementations, the engines described herein can be separated, combined or incorporated into a single or combined engine. The engines depicted in the figures are not intended to limit the systems described here to the software architectures shown in the figures.


A number of embodiments have been described. Nevertheless, it will be understood that various modifications can be made without departing from the spirit and scope of the processes and techniques described herein. In addition, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. In addition, other steps can be provided, or steps can be eliminated, from the described flows, and other components can be added to, or removed from, the described systems. Accordingly, other embodiments are within the scope of the following claims.

Claims
  • 1. A computing device implemented method comprising: receiving an image that includes textual content in at least one font; andidentifying the at least one font represented in the received image using a machine learning system, the machine learning system being trained using images representing a plurality of training fonts, wherein a portion of the training images includes synthetic text located in the foreground and being positioned over captured background imagery, and a portion of the training images is distorted when captured by at least one of image capture conditions and capture equipment, wherein the identified at least one font is represented by one element of a plurality of elements of a data vector provided by the machine learning system.
  • 2. The computing device implemented method of claim 1, wherein the text located in the foreground is synthetically augmented.
  • 3. The computing device implemented method of claim 2, wherein synthetic augmentation is provided in a two-step process.
  • 4. The computing device implemented method of claim 2, wherein the text is synthetically augmented based upon one or more predefined conditions.
  • 5. The computing device implemented method of claim 1, wherein the text located in the foreground is undistorted.
  • 6. The computing device implemented method of claim 1, wherein the captured background imagery is predominately absent text.
  • 7. The computing device implemented method of claim 1, wherein the text located in the foreground is randomly positioned in the portion of training images.
  • 8. The computing device implemented method of claim 1, wherein prior to the text being located in the foreground, a portion of the text is removed.
  • 9. The computing device implemented method of claim 1, wherein the captured background imagery is distorted when captured.
  • 10. The computing device implemented method of claim 1, wherein font similarity is used to identify the at least one font.
  • 11. The computing device implemented method of claim 1, wherein similarity of fonts in multiple image segments is used to identify the at least one font.
  • 12. The computing device implemented method of claim 1, wherein the machine learning system is trained by using transfer learning.
  • 13. The computing device implemented method of claim 1, wherein an output of the machine learning system represents each font used to train the machine learning system.
  • 14. The computing device implemented method of claim 13, wherein the output of the machine learning system provides a level of confidence for each font used to train the machine learning system.
  • 15. The computing device implemented method of claim 1, wherein a subset of the output of the machine learning system is scaled and a remainder of the output is removed.
  • 16. The computing device implemented method of claim 1, wherein some of the training images are absent identification.
  • 17. The computing device implemented method of claim 1, wherein identifying the at least one font represented in the received image using the machine learning system includes using additional images received by the machine learning system.
  • 18. The computing device implemented method of claim 17, wherein outputs of the machine learning system for the received image and the additional images are combined to identify the at least one font.
  • 19. The computing device implemented method of claim 1, wherein the machine learning system comprises a generative adversarial network (GAN).
  • 20. The computing device implemented method of claim 19, wherein the generative adversarial network (GAN) comprises a generator neural network and a discriminator neural network.
  • 21. A system comprising: a computing device comprising:a memory configured to store instructions; anda processor to execute the instructions to perform operations comprising:receiving an image that includes textual content in at least one font; andidentifying the at least one font represented in the received image using a machine learning system, the machine learning system being trained using images representing a plurality of training fonts, wherein a portion of the training images includes synthetic text located in the foreground and being positioned over captured background imagery, and a portion of the training images is distorted when captured by at least one of image capture conditions and capture equipment, wherein the identified at least one font is represented by one element of a plurality of elements of a data vector provided by the machine learning system.
  • 22. The system of claim 21, wherein the text located in the foreground is synthetically augmented.
  • 23. The system of claim 22, wherein synthetic augmentation is provided in a two-step process.
  • 24. The system of claim 22, wherein the text is synthetically augmented based upon one or more predefined conditions.
  • 25. The system of claim 21, wherein the text located in the foreground is undistorted.
  • 26. The system of claim 21, wherein the captured background imagery is predominately absent text.
  • 27. The system of claim 21, wherein the text located in the foreground is randomly positioned in the portion of training images.
  • 28. The system of claim 21, wherein prior to the text being located in the foreground, a portion of the text is removed.
  • 29. The system of claim 21, wherein the captured background imagery is distorted when captured.
  • 30. The system of claim 21, wherein font similarity is used to identify the at least one font.
  • 31. The system of claim 21, wherein similarity of fonts in multiple image segments is used to identify the at least one font.
  • 32. The system of claim 21, wherein the machine learning system is trained by using transfer learning.
  • 33. The system of claim 21, wherein an output of the machine learning system represents each font used to train the machine learning system.
  • 34. The system of claim 33, wherein the output of the machine learning system provides a level of confidence for each font used to train the machine learning system.
  • 35. The system of claim 21, wherein a subset of the output of the machine learning system is scaled and a remainder of the output is removed.
  • 36. The system of claim 21, wherein some of the training images are absent identification.
  • 37. The system of claim 21, wherein identifying the at least one font represented in the received image using the machine learning system includes using additional images received by the machine learning system.
  • 38. The system of claim 37, wherein outputs of the machine learning system for the received image and the additional images are combined to identify the at least one font.
  • 39. The system of claim 21, wherein the machine learning system comprises a generative adversarial network (GAN).
  • 40. The system of claim 39, wherein the generative adversarial network (GAN) comprises a generator neural network and a discriminator neural network.
  • 41. One or more non-transitory computer readable media storing instructions that are executable by a processing device, and upon such execution cause the processing device to perform operations comprising: receiving an image that includes textual content in at least one font; andidentifying the at least one font represented in the received image using a machine learning system, the machine learning system being trained using images representing a plurality of training fonts, wherein a portion of the training images includes synthetic text located in the foreground and being positioned over captured background imagery, and a portion of the training images is distorted when captured by at least one of image capture conditions and capture equipment, wherein the identified at least one font is represented by one element of a plurality of elements of a data vector provided by the machine learning system.
  • 42. The non-transitory computer readable media of claim 41, wherein the text located in the foreground is synthetically augmented.
  • 43. The non-transitory computer readable media of claim 42, wherein synthetic augmentation is provided in a two-step process.
  • 44. The non-transitory computer readable media of claim 42, wherein the text is synthetically augmented based upon one or more predefined conditions.
  • 45. The non-transitory computer readable media of claim 41, wherein the text located in the foreground is undistorted.
  • 46. The non-transitory computer readable media of claim 41, wherein the captured background imagery is predominately absent text.
  • 47. The non-transitory computer readable media of claim 41, wherein the text located in the foreground is randomly positioned in the portion of training images.
  • 48. The non-transitory computer readable media of claim 41, wherein prior to the text being located in the foreground, a portion of the text is removed.
  • 49. The non-transitory computer readable media of claim 41, wherein the captured background imagery is distorted when captured.
  • 50. The non-transitory computer readable media of claim 41, wherein font similarity is used to identify the at least one font.
  • 51. The non-transitory computer readable media of claim 41, wherein similarity of fonts in multiple image segments is used to identify the at least one font.
  • 52. The non-transitory computer readable media of claim 41, wherein the machine learning system is trained by using transfer learning.
  • 53. The non-transitory computer readable media of claim 41, wherein an output of the machine learning system represents each font used to train the machine learning system.
  • 54. The non-transitory computer readable media of claim 53, wherein the output of the machine learning system provides a level of confidence for each font used to train the machine learning system.
  • 55. The non-transitory computer readable media of claim 41, wherein a subset of the output of the machine learning system is scaled and a remainder of the output is removed.
  • 56. The non-transitory computer readable media of claim 41, wherein some of the training images are absent identification.
  • 57. The non-transitory computer readable media of claim 41, wherein identifying the at least one font represented in the received image using the machine learning system includes using additional images received by the machine learning system.
  • 58. The non-transitory computer readable media of claim 41, wherein outputs of the machine learning system for the received image and the additional images are combined to identify the at least one font.
  • 59. The non-transitory computer readable media of claim 41, wherein the machine learning system comprises a generative adversarial network (GAN).
  • 60. The non-transitory computer readable media of claim 59, wherein the generative adversarial network (GAN) comprises a generator neural network and a discriminator neural network.
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority under 35 USC 119(e) to U.S. Provisional application No. 62/578,939, filed on Oct. 30, 2017. The entire disclosure of this application is incorporated by reference herein.

Related Publications (1)
Number Date Country
20190130232 A1 May 2019 US
Provisional Applications (1)
Number Date Country
62578939 Oct 2017 US