Embodiments described herein generally relate to data processing and in particular to methods and apparatus for using optical character recognition to provide augmented reality.
A data processing system may include features which allow the user of the data processing system to capture and display video. After video has been captured, video editing software may be used to alter the contents of the video, for instance by superimposing a title. Furthermore, recent developments have led to the emergence of a field known as augmented reality (AR). As explained by the “Augmented reality” entry in the online encyclopedia provided under the “WIKIPEDIA” trademark, AR “is a live, direct or indirect, view of a physical, real-world environment whose elements are augmented by computer-generated sensory input such as sound, video, graphics or GPS data.” Typically, with AR, video is modified in real time. For instance, when a television (TV) station is broadcasting live video of an American football game, the TV station may use a data processing system to modify the video in real time. For example, the data processing system may superimpose a yellow line across the football field to show how far the offensive team must move the ball to earn a first down.
In addition, some companies are working on technology that allows AR to be used on a more personal level. For instance, some companies are developing technology to enable a smart phone to provide AR, based on video captured by the smart phone. This type of AR may be considered an example of mobile AR. The mobile AR world consists largely of two different types of experiences: geolocation-based AR and vision-based AR. Geolocation-based AR uses global positioning system (GPS) sensors, compass sensors, cameras, and/or other sensors in the user's mobile device to provide a “heads-up” display with AR content that depicts various geolocated points of interest. Vision-based AR may use some of the same kinds of sensors to display AR content in context with real-world objects (e.g., magazines, postcards, product packaging) by tracking the visual features of these objects. AR content may also be referred to as digital content, computer-generated content, virtual content, virtual objects, etc.
However, it is unlikely that vision-based AR will become ubiquitous before many associated challenges are overcome.
Typically, before a data processing system can provide vision-based AR, the data processing system must detect something in the video scene that, in effect, tells the data processing system that the current video scene is suitable for AR. For instance, if the intended AR experience involves adding a particular virtual object to a video scene whenever the scene includes a particular physical object or image, the system must first detect the physical object or image in the video scene. That physical object or image may be referred to as an “AR-recognizable image” or simply as an “AR marker” or an “AR target.”
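By way of illustration only, the following sketch shows one common way such detection might be performed, using local feature matching and a homography check with the OpenCV library; the image file name, the function name, and the match threshold are hypothetical, and a production AR system would typically use more robust tracking.

```python
# Hypothetical sketch: detect a predetermined AR target in a video frame using
# ORB feature matching and a RANSAC homography (assumes OpenCV and NumPy).
import cv2
import numpy as np

TARGET_IMAGE_PATH = "ar_target.png"        # assumed file containing the AR target image

orb = cv2.ORB_create(nfeatures=1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

target_img = cv2.imread(TARGET_IMAGE_PATH, cv2.IMREAD_GRAYSCALE)
target_kp, target_desc = orb.detectAndCompute(target_img, None)

def detect_ar_target(frame_gray, min_matches=25):
    """Return a target-to-frame homography if the AR target appears in the frame, else None."""
    frame_kp, frame_desc = orb.detectAndCompute(frame_gray, None)
    if frame_desc is None:
        return None
    matches = matcher.match(target_desc, frame_desc)
    if len(matches) < min_matches:
        return None
    src = np.float32([target_kp[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([frame_kp[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    homography, _mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return homography
```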
One of the challenges in the field of vision-based AR is that it is still relatively difficult for developers to create images or objects that are suitable as AR targets. An effective AR target contains a high level of visual complexity and asymmetry. And if the AR system is to support more than one AR target, each AR target must be sufficiently distinct from all of the other AR targets. Many images or objects that might at first seem usable as AR targets actually lack one or more of the above characteristics.
Furthermore, as an AR application supports greater numbers of different AR targets, the image recognizing portion of the AR application may require greater amounts of processing resources (e.g., memory and processor cycles) and/or the AR application may take more time to recognize images. Thus, scalability can be a problem.
As indicated above, an AR system may use an AR target to determine that a corresponding AR object should be added to a video scene. If the AR system can be made to recognize many different AR targets, the AR system can be made to provide many different AR objects. However, as indicated above, it is not easy for developers to create suitable AR targets. In addition, with conventional AR technology, it could be necessary to create many different unique targets to provide a sufficiently useful AR experience.
Some of the challenges associated with creating numerous different AR targets may be illustrated in the context of a hypothetical application that uses AR to provide information to people using a public bus system. The operator of the bus system may want to place unique AR targets on hundreds of bus stop signs, and the operator may want an AR application to use AR to notify riders at each bus stop when the next bus is expected to arrive at that stop. In addition, the operator may want the AR targets to serve as a recognizable mark to the riders, more or less like a trademark. In other words, the operator may want the AR targets to have a recognizable look that is common to all the AR targets for that operator while also being easily distinguished by the human viewer from marks, logos, or designs used by other entities.
According to the present disclosure, instead of requiring a different AR target for each different AR object, the AR system may associate an optical character recognition (OCR) zone with an AR target, and the system may use OCR to extract text from the OCR zone. According to one embodiment, the system uses the AR target and results from the OCR to determine an AR object to be added to the video. Further details about OCR may be found on the website for Quest Visual, Inc. at questvisual.com/us/, with regard to the application known as Word Lens. Further details about AR may be found on the website for the ARToolKit software library at www.hitl.washington.edu/artoolkit/documentation.
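As a toy illustration of this two-level scheme (not taken from any particular embodiment described herein), the AR target can act as one key and the OCR text as a second key into a table of AR content; the identifiers and strings below are made up.

```python
# Toy illustration: one AR target (high-level key) plus OCR text (low-level key)
# together select the AR content. All identifiers and strings are hypothetical.
AR_CONTENT = {
    ("bus-operator-logo", "STOP 1234"): "Next bus at this stop: 5 minutes",
    ("bus-operator-logo", "STOP 5678"): "Next bus at this stop: 12 minutes",
}

def select_ar_content(target_id, ocr_text):
    return AR_CONTENT.get((target_id, ocr_text.strip().upper()))

print(select_ar_content("bus-operator-logo", "stop 1234"))
```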
As used herein, the terms “processing system” and “data processing system” are intended to broadly encompass a single machine, or a system of communicatively coupled machines or devices operating together. For instance, two or more machines may cooperate using one or more variations on a peer-to-peer model, a client/server model, or a cloud computing model to provide some or all of the functionality described herein. In the embodiment of
For ease of reference, the local processing device 21 may be referred to as “the mobile device,” “the personal device,” “the AR client,” or simply “the consumer.” Similarly, the remote processing device 12 may be referred to as “the AR broker,” the remote processing device 16 may be referred to as “the AR target creator,” and the remote processing device 18 may be referred to as “the AR content provider.” As described in greater detail below, the AR broker may help the AR target creator, the AR content provider, and the AR browser to cooperate. The AR browser, the AR broker, the AR content provider, and the AR target creator may be referred to collectively as the AR system. Further details about AR brokers, AR browsers, and other components of one or more AR systems may be found on the website of the Layar company at www.layar.com and/or on the website of metaio GmbH/metaio Inc. (“the metaio company”) at www.metaio.com.
In the embodiment of
The data storage contains an operating system (OS) 40 and an AR browser 42. The AR browser may be an application that enables the mobile device to provide an AR experience for the user. The AR browser may be implemented as an application that is designed to provide AR services for only a single AR content provider, or the AR browser may be capable of providing AR services for multiple AR content providers. The mobile device may copy some or all of the OS and some or all of the AR browser to RAM for execution, particularly when using the AR browser to provide AR. In addition, the data storage includes an AR database 44, some or all of which may also be copied to RAM to facilitate operation of the AR browser. The AR browser may use the display panel to display a video image 25 and/or other output. The display panel may also be touch sensitive, in which case the display panel may also be used for input.
The processing devices for the AR broker, the AR target creator, and the AR content provider may include features like those described above with regard to the mobile device. In addition, as described in greater detail below, the AR broker may contain an AR broker application 50 and a broker database 51, the AR target creator (TC) may contain a TC application 52 and a TC database 53, and the AR content provider (CP) may contain a CP application 54 and a CP database 55. The AR database 44 in the mobile device may also be referred to as a client database 44.
As described in greater detail below, in addition to creating an AR target, an AR target creator may define one or more OCR zones and one or more AR content zones, relative to the AR target. For purposes of this disclosure, an OCR zone is an area or space within a video scene from which text is to be extracted, and an AR content zone is an area or space within a video scene where AR content is to be presented. An AR content zone may also be referred to simply as an AR zone. In one embodiment, the AR target creator defines the AR zone or zones. In another embodiment, the AR content provider defines the AR zone or zones. As described in greater detail below, a coordinate system may be used to define an AR zone relative to an AR target.
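For illustration, the zone definitions described above might be represented along the following lines, with both zones expressed in a coordinate system anchored to the AR target; the field names, units, and numeric values are assumptions rather than a defined format.

```python
# Hedged sketch: OCR zone and AR content zone defined relative to the AR target.
# Coordinates are in target units (the target's top-left corner is (0, 0) and one
# unit equals the width of the target); values are illustrative only.
from dataclasses import dataclass

@dataclass
class Zone:
    x: float
    y: float
    width: float
    height: float

@dataclass
class ARTargetConfig:
    target_id: str
    ocr_zone: Zone     # area from which text is to be extracted using OCR
    ar_zone: Zone      # area in which AR content is to be presented

# Example: OCR zone just below the target, AR content zone to its right.
BUS_STOP_CONFIG = ARTargetConfig(
    target_id="bus-operator-logo",
    ocr_zone=Zone(x=0.0, y=1.1, width=1.0, height=0.3),
    ar_zone=Zone(x=1.2, y=0.0, width=1.5, height=1.0),
)
```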
Also, as shown at block 252, the AR broker application may assign a label or identifier (ID) to the target, to facilitate future reference. The AR broker may then return the vision data and the target ID to the AR target creator.
As shown at block 212, the AR target creator may then define the AR coordinate system for the AR target, and the AR target creator may use that coordinate system to specify the bounds of an OCR zone, relative to the AR target. In other words, the AR target creator may define boundaries for an area expected to contain text that can be recognized using OCR, and the results of the OCR can be used to distinguish between different instances of the target. In one embodiment, the AR target creator specifies the OCR zone with regard to a model video frame that models or simulates a head-on view of the AR target. The OCR zone constitutes an area within a video frame from which text is to be extracted using OCR. Thus, the AR target may serve as a high-level classifier for identifying the relevant AR content, and text from the OCR zone may serve as a low-level classifier for identifying the relevant AR content. The embodiment of
The AR target creator may specify the bounds of the OCR zone relative to the location of the target or particular features of the target. For instance, for the target shown in
As shown at block 254, the AR broker may then send the target ID, the vision data, the OCR zone definition, and the ARCS to the CP application.
The AR content provider may then use the CP application to specify one or more zones within the scene where AR content should be added, as shown at block 214. In other words, the CP application may be used to define an AR zone, such as the AR zone 86 of
The AR broker may save the target ID, the vision data, the OCR zone definition, the AR zone definition, and the ARCS in the broker database, as shown at block 256. The target ID, the zone definitions, the vision data, the ARCS, and any other predefined data for an AR target may be referred to as the AR configuration data for that target. The TC application and the CP application may also save some or all of the AR configuration data in the TC database and the CP database, respectively.
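One conceivable representation of such an AR configuration record, as the broker might persist it, is sketched below; the field names and values are assumptions, and the vision data would in practice be opaque feature data rather than a short string.

```python
# Hedged sketch of one AR configuration record in the broker database (assumed schema).
import json

ar_configuration = {
    "target_id": "bus-operator-logo",
    "vision_data": "<opaque feature data for detecting the target>",
    "ar_coordinate_system": {"origin": "target top-left corner", "unit": "target width"},
    "ocr_zone": {"x": 0.0, "y": 1.1, "width": 1.0, "height": 0.3},
    "ar_zone": {"x": 1.2, "y": 0.0, "width": 1.5, "height": 1.0},
}

print(json.dumps(ar_configuration, indent=2))
```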
In one embodiment, the target creator uses the TC application to create the target image and the OCR zone or zones in the context of a model video frame configured as if the camera pose is oriented head on to the target. Likewise, the CP application may define the AR zone or zones in the context of a model video frame configured as if the camera pose is oriented head on to the target. The vision data may allow the AR browser to detect the target even if the live scene received by the AR browser does not have the camera pose oriented head on to the target.
As shown at block 220, after one or more AR targets have been created, a person or “consumer” may then use the AR browser to subscribe to AR services from the AR broker. In response, the AR broker may automatically send the AR configuration data to the AR browser, as shown at block 260. The AR browser may then save that configuration data in the client database, as shown at block 222. If the consumer is only registering for access to AR from a single content provider, the AR broker may send only configuration data for that content provider to the AR browser application. Alternatively, the registration may not be limited to a single content provider, and the AR broker may send AR configuration data for multiple content providers to the AR browser, to be saved in the client database.
In addition, as shown at block 230, the content provider may create AR content. And as shown at block 232, the content provider may link that content with a particular AR target and particular text associated with that target. In particular, the text may correspond to the results to be obtained when OCR is performed on the OCR zone associated with that target. The content provider may send the target ID, the text, and the corresponding AR content to the AR broker. The AR broker may save that data in the broker database, as shown at block 270. In addition or alternatively, as described in greater detail below, the content provider may provide AR content dynamically, after the AR browser has detected a target and contacted the AR content provider, possibly via the AR broker.
As indicated at blocks 320 and 350, the AR browser may then send the target ID and the OCR results to the AR broker. For example, referring again to
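A minimal sketch of this client-side step is shown below, assuming the homography produced by target detection, the pytesseract and requests packages, and a made-up broker endpoint; the OCR zone corners are given in model-frame pixels.

```python
# Hedged sketch of the AR-browser side: project the OCR zone into the live frame,
# crop it, run OCR, and send the target ID plus OCR text to the AR broker.
# Assumes pytesseract and requests are installed; the broker URL is hypothetical.
import cv2
import numpy as np
import pytesseract
import requests

BROKER_URL = "https://ar-broker.example.com/content"   # hypothetical endpoint

def extract_zone_text(frame, homography, zone_corners_model):
    """zone_corners_model: four (x, y) corners of the OCR zone in model-frame pixels."""
    corners = np.float32(zone_corners_model).reshape(-1, 1, 2)
    corners_in_frame = cv2.perspectiveTransform(corners, homography)
    x, y, w, h = cv2.boundingRect(corners_in_frame.astype(np.int32))
    x, y = max(x, 0), max(y, 0)
    crop = frame[y:y + h, x:x + w]
    return pytesseract.image_to_string(crop).strip()

def send_trigger_to_broker(target_id, zone_text):
    reply = requests.post(BROKER_URL, json={"target_id": target_id, "text": zone_text})
    return reply.json() if reply.ok else None
```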
As shown at block 352, the AR broker application may then use the target ID and the OCR results to retrieve corresponding AR content. If the corresponding AR content has already been provided to the AR broker by the content provider, the AR broker application may simply send that content to the AR browser. Alternatively, the AR broker application may dynamically retrieve the AR content from the content provider in response to receiving the target ID and the OCR results from the AR browser.
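A minimal broker-side sketch of this lookup, with a dynamic fallback to the content provider, might look as follows; the database layout, provider URL, and function names are assumptions.

```python
# Hedged sketch of the AR broker: return pre-registered content when it exists,
# otherwise retrieve content dynamically from the content provider (assumed URL).
import requests

CONTENT_PROVIDER_URL = "https://content-provider.example.com/ar-content"   # hypothetical

broker_db = {
    ("bus-operator-logo", "STOP 1234"): {"text": "Next bus at this stop: 5 minutes"},
}

def get_ar_content(target_id, ocr_text):
    key = (target_id, ocr_text.strip().upper())
    content = broker_db.get(key)
    if content is not None:
        return content                                  # content already provided to the broker
    reply = requests.get(CONTENT_PROVIDER_URL,
                         params={"target_id": target_id, "text": ocr_text})
    return reply.json() if reply.ok else None           # dynamically retrieved content
```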
Although
Referring again to
In
An advantage of one embodiment is that the disclosed technology makes it easier for content providers to deliver different AR content for different situations. For example, if the AR content provider is the operator of a bus system, the content provider may be able to provide different AR content for each different bus stop without using a different AR target for each bus stop. Instead, the content provider can use a single AR target along with text (e.g., a bus stop number) positioned within a predetermined zone relative to the target. Consequently, the AR target may serve as a high-level classifier, the text may serve as a low-level classifier, and both levels of classifiers may be used to determine the AR content to be provided in any particular situation. For instance, the AR target may indicate that, as a high-level category, the relevant AR content for a particular scene is content from a particular content provider. The text in the OCR zone may indicate that, as a low-level category, the AR content for the scene is AR content relevant to a particular location. Thus, the AR target may identify a high-level category of AR content, and the text in the OCR zone may identify a low-level category of AR content. And it may be very easy for the content provider to create new low-level classifiers, to provide customized AR content for new situations or locations (e.g., in case more bus stops are added to the system).
Since the AR browser uses both the AR target (or the target ID) and the OCR results (e.g., some or all of the text from the OCR zone) to obtain AR content, the AR target (or target ID) and the OCR results may be referred to collectively as a multi-level AR content trigger.
Another advantage is that an AR target may also be suitable for use as a trademark for the content provider, and the text in the OCR zone may also be legible to, and useful for, the customers of the content provider.
In one embodiment, the content provider or target creator may define multiple OCR zones for each AR target. This set of OCR zones may enable the use of signs with different shapes and/or different arrangements of content, for instance. For example, the target creator may define a first OCR zone located to the right of an AR target, and a second OCR zone located below the AR target. Accordingly, when an AR browser detects an AR target, the AR browser may then automatically perform OCR on multiple zones, and the AR browser may send some or all of those OCR results to the AR broker, to be used to retrieve AR content. Also, the AR coordinate system enables the content provider to provide whatever content is appropriate, in whatever media, and at whatever position relative to the AR target.
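For illustration, a browser handling several OCR zones per target might simply run OCR on each zone and keep whatever text it recognizes, along the lines of the sketch below; extract_zone_text stands in for the zone-cropping and OCR step shown earlier and is supplied by the caller.

```python
# Hedged sketch: run OCR over every zone defined for a detected target and collect
# any recognized text, so signs with different layouts can share one AR target.
def collect_ocr_results(frame, homography, ocr_zones, extract_zone_text):
    """ocr_zones maps a zone name to its corner coordinates; extract_zone_text is
    a caller-supplied helper that crops the zone from the frame and performs OCR."""
    results = {}
    for name, zone_corners in ocr_zones.items():
        text = extract_zone_text(frame, homography, zone_corners)
        if text:                      # ignore zones where no text was recognized
            results[name] = text
    return results
```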
In light of the principles and example embodiments described and illustrated herein, it will be recognized that the illustrated embodiments can be modified in arrangement and detail without departing from such principles. For instance, some of the paragraphs above refer to vision-based AR. However, the teachings herein may also be used to advantage with other types of AR experiences. For instance, the present teaching may be used with so-called Simultaneous Localization and Mapping (SLAM) AR, and the AR marker may be a three-dimensional physical object, rather than a two-dimensional image. For example, a distinctive doorway or figure (e.g., a bust of Mickey Mouse or Isaac Newton) may be used as a three-dimensional AR target. Further information about SLAM AR may be found in the article about the metaio company at http://techcrunch.com/2012/10/18/metaios-new-sdk-allows-slam-mapping-from-1000-feet/.
Also, some of the paragraphs above refer to an AR browser and an AR broker that are relatively independent from the AR content provider. However, in other embodiments, the AR browser may communicate directly with the AR content provider. For example, the AR content provider may supply the mobile device with a custom AR application, and that application may serve as the AR browser. Then, that AR browser may send target IDs, OCR text, etc., directly to the content provider, and the content provider may send AR content directly to the AR browser. Further details on custom AR applications may be found on the website of the Total Immersion company at www.t-immersion.com.
Also, some of the paragraphs above refer to an AR target that is suitable for use as a trademark or logo, since the AR target makes a meaningful impression on a human viewer and the AR target is easily recognizable to the human viewer and easily distinguished by the human viewer from other images or symbols. However, other embodiments may use other types of AR targets, including without limitation fiduciary markers such as those described at www.artoolworks.com/support/library/Using_ARToolKit_NFT_with_fiducial_markers_(version_3.x). Such fiduciary markers may also be referred to as “fiducials” or “AR tags.”
Also, the foregoing discussion has focused on particular embodiments, but other configurations are contemplated. Also, even though expressions such as “an embodiment,” “one embodiment,” “another embodiment,” or the like are used herein, these phrases are meant to generally reference embodiment possibilities, and are not intended to limit the invention to particular embodiment configurations. As used herein, these phrases may reference the same embodiment or different embodiments, and those embodiments are combinable into other embodiments.
Any suitable operating environment and programming language (or combination of operating environments and programming languages) may be used to implement components described herein. As indicated above, the present teachings may be used to advantage in many different kinds of data processing systems. Example data processing systems include, without limitation, distributed computing systems, supercomputers, high-performance computing systems, computing clusters, mainframe computers, mini-computers, client-server systems, personal computers (PCs), workstations, servers, portable computers, laptop computers, tablet computers, personal digital assistants (PDAs), telephones, handheld devices, entertainment devices such as audio devices, video devices, audio/video devices (e.g., televisions and set top boxes), vehicular processing systems, and other devices for processing or transmitting information. Accordingly, unless explicitly specified otherwise or required by the context, references to any particular type of data processing system (e.g., a mobile device) should be understood as encompassing other types of data processing systems, as well. Also, unless expressly specified otherwise, components that are described as being coupled to each other, in communication with each other, responsive to each other, or the like need not be in continuous communication with each other and need not be directly coupled to each other. Likewise, when one component is described as receiving data from or sending data to another component, that data may be sent or received through one or more intermediate components, unless expressly specified otherwise. In addition, some components of the data processing system may be implemented as adapter cards with interfaces (e.g., a connector) for communicating with a bus. Alternatively, devices or components may be implemented as embedded controllers, using components such as programmable or non-programmable logic devices or arrays, application-specific integrated circuits (ASICs), embedded computers, smart cards, and the like. For purposes of this disclosure, the term “bus” includes pathways that may be shared by more than two devices, as well as point-to-point pathways.
This disclosure may refer to instructions, functions, procedures, data structures, application programs, configuration settings, and other kinds of data. As described above, when the data is accessed by a machine, the machine may respond by performing tasks, defining abstract data types or low-level hardware contexts, and/or performing other operations. For instance, data storage, RAM, and/or flash memory may include various sets of instructions which, when executed, perform various operations. Such sets of instructions may be referred to in general as software. In addition, the term “program” may be used in general to cover a broad range of software constructs, including applications, routines, modules, drivers, subprograms, processes, and other types of software components. Also, applications and/or other data that are described above as residing on a particular device in one example embodiment may, in other embodiments, reside on one or more other devices. And computing operations that are described above as being performed on one particular device in one example embodiment may, in other embodiments, be executed by one or more other devices.
It should also be understood that the hardware and software components depicted herein represent functional elements that are reasonably self-contained so that each can be designed, constructed, or updated substantially independently of the others. In alternative embodiments, many of the components may be implemented as hardware, software, or combinations of hardware and software for providing the functionality described and illustrated herein. For example, alternative embodiments include machine accessible media encoding instructions or control logic for performing the operations of the invention. Such embodiments may also be referred to as program products. Such machine accessible media may include, without limitation, tangible storage media such as magnetic disks, optical disks, RAM, ROM, etc. For purposes of this disclosure, the term “ROM” may be used in general to refer to non-volatile memory devices such as erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash ROM, flash memory, etc. In some embodiments, some or all of the control logic for implementing the described operations may be implemented in hardware logic (e.g., as part of an integrated circuit chip, a programmable gate array (PGA), an ASIC, etc.). In at least one embodiment, the instructions for all components may be stored in one non-transitory machine accessible medium. In at least one other embodiment, two or more non-transitory machine accessible media may be used for storing the instructions for the components. For instance, instructions for one component may be stored in one medium, and instructions for another component may be stored in another medium. Alternatively, a portion of the instructions for one component may be stored in one medium, and the rest of the instructions for that component (as well as instructions for other components) may be stored in one or more other media. Instructions may also be used in a distributed environment, and may be stored locally and/or remotely for access by single or multi-processor machines.
Also, although one or more example processes have been described with regard to particular operations performed in a particular sequence, numerous modifications could be applied to those processes to derive numerous alternative embodiments of the present invention. For example, alternative embodiments may include processes that use fewer than all of the disclosed operations, processes that use additional operations, and processes in which the individual operations disclosed herein are combined, subdivided, rearranged, or otherwise altered.
In view of the wide variety of useful permutations that may be readily derived from the example embodiments described herein, this detailed description is intended to be illustrative only, and should not be taken as limiting the scope of coverage.
The following examples pertain to further embodiments.
Example A1 is an automated method for using OCR to provide AR. The method includes automatically determining, based on video of a scene, whether the scene includes a predetermined AR target. In response to determining that the scene includes the AR target, an OCR zone definition associated with the AR target is automatically retrieved. The OCR zone definition identifies an OCR zone. In response to retrieving the OCR zone definition associated with the AR target, OCR is automatically used to extract text from the OCR zone. Results of the OCR are used to obtain AR content which corresponds to the text extracted from the OCR zone. The AR content which corresponds to the text extracted from the OCR zone is automatically caused to be presented in conjunction with the scene.
Example A2 includes the features of Example A1, and the OCR zone definition identifies at least one feature of the OCR zone relative to at least one feature of the AR target.
Example A3 includes the features of Example A1, and the operation of automatically retrieving an OCR zone definition associated with the AR target comprises using a target identifier for the AR target to retrieve the OCR zone definition from a local storage medium. Example A3 may also include the features of Example A2.
Example A4 includes the features of Example A1, and the operation of using results of the OCR to determine AR content which corresponds to the text extracted from the OCR zone comprises (a) sending a target identifier for the AR target and at least some of the text from the OCR zone to a remote processing system; and (b) after sending the target identifier and at least some of the text from the OCR zone to the remote processing system, receiving the AR content from the remote processing system. Example A4 may also include the features of Example A2 or Example A3, or the features of Example A2 and Example A3.
Example A5 includes the features of Example A1, and the operation of using results of the OCR to determine AR content which corresponds to the text extracted from the OCR zone comprises (a) sending OCR information to the remote processing system, wherein the OCR information corresponds to the text extracted from the OCR zone; and (b) after sending the OCR information to the remote processing system, receiving the AR content from the remote processing system. Example A5 may also include the features of Example A2 or Example A3, or the features of Example A2 and Example A3.
Example A6 includes the features of Example A1, and the AR target serves as a high-level classifier. Also, at least some of the text from the OCR zone serves as a low-level classifier. Example A6 may also include (a) the features of Example A2, A3, A4, or A5; (b) the features of any two or more of Examples A2, A3, and A4; or (c) the features of any two or more of Examples A2, A3, and A5.
Example A7 includes the features of Example A6, and the high-level classifier identifies the AR content provider.
Example A8 includes the features of Example A1, and the AR target is two dimensional. Example A8 may also include (a) the features of Example A2, A3, A4, A5, A6, or A7; (b) the features of any two or more of Examples A2, A3, A4, A6, and A7; or (c) the features of any two or more of Examples A2, A3, A5, A6, and A7.
Example B1 is a method for implementing a multi-level trigger for AR content. That method involves selecting an AR target to serve as a high-level classifier for identifying relevant AR content. In addition an OCR zone for the selected AR target is specified. The OCR zone constitutes an area within a video frame from which text is to be extracted using OCR. Text from the OCR zone is to serve as a low-level classifier for identifying relevant AR content.
Example B2 includes the features of Example B1, and the operation of specifying an OCR zone for the selected AR target comprises specifying at least one feature of the OCR zone, relative to at least one feature of the AR target.
Example C1 is a method for processing a multi-level trigger for AR content. That method involves receiving a target identifier from an AR client. The target identifier identifies a predefined AR target as having been detected in a video scene by the AR client. In addition, text is received from the AR client, wherein the text corresponds to results from OCR performed by the AR client on an OCR zone associated with the predefined AR target in the video scene. AR content is obtained, based on the target identifier and the text from the AR client. The AR content is sent to the AR client.
Example C2 includes the features of Example C1, and the operation of obtaining AR content, based on the target identifier and the text from the AR client, comprises dynamically generating the AR content, based at least in part on the text from the AR client.
Example C3 includes the features of Example C1, and the operation of obtaining AR content, based on the target identifier and the text from the AR client, comprises automatically retrieving the AR content from a remote processing system.
Example C4 includes the features of Example C1, and the text received from the AR client comprises at least some of the results from the OCR performed by the AR client. Example C4 may also include the features of Example C2 or Example C3.
Example D1 is at least one machine accessible medium comprising computer instructions for supporting AR enhanced with OCR. The computer instructions, in response to being executed on a data processing system, enable the data processing system to perform a method according to any of Examples A1-A7, B1-B2, and C1-C4.
Example E1 is a data processing system that supports AR enhanced with OCR. The data processing system includes a processing element, at least one machine accessible medium responsive to the processing element, and computer instructions stored at least partially in the at least one machine accessible medium. In response to being executed, the computer instructions enable the data processing system to perform a method according to any of Examples A1-A7, B1-B2, and C1-C4.
Example F1 is a data processing system that supports AR enhanced with OCR. The data processing system includes means for performing a method according to any of Examples A1-A7, B1-B2, and C1-C4.
Example G1 is at least one machine accessible medium comprising computer instructions for supporting AR enhanced with OCR. The computer instructions, in response to being executed on a data processing system, enable the data processing system to automatically determine, based on video of a scene, whether the scene includes a predetermined AR target. The computer instructions also enable the data processing system to automatically retrieve an OCR zone definition associated with the AR target, in response to determining that the scene includes the AR target. The OCR zone definition identifies an OCR zone. The computer instructions also enable the data processing system to automatically use OCR to extract text from the OCR zone, in response to retrieving the OCR zone definition associated with the AR target. The computer instructions also enable the data processing system to use results of the OCR to obtain AR content which corresponds to the text extracted from the OCR zone. The computer instructions also enable the data processing system to automatically cause the AR content which corresponds to the text extracted from the OCR zone to be presented in conjunction with the scene.
Example G2 includes the features of Example G1, and the OCR zone definition identifies at least one feature of the OCR zone relative to at least one feature of the AR target.
Example G3 includes the features of Example G1, and the operation of automatically retrieving an OCR zone definition associated with the AR target comprises using a target identifier for the AR target to retrieve the OCR zone definition from a local storage medium. Example G3 may also include the features of Example G2.
Example G4 includes the features of Example G1, and the operation of using results of the OCR to determine AR content which corresponds to the text extracted from the OCR zone comprises (a) sending a target identifier for the AR target and at least some of the text from the OCR zone to a remote processing system; and (b) after sending the target identifier and at least some of the text from the OCR zone to the remote processing system, receiving the AR content from the remote processing system. Example G4 may also include the features of Example G2 or Example G3, or the features of Example G2 and Example G3.
Example G5 includes the features of Example G1, and the operation of using results of the OCR to determine AR content which corresponds to the text extracted from the OCR zone comprises (a) sending OCR information to the remote processing system, wherein the OCR information corresponds to the text extracted from the OCR zone; and (b) after sending the OCR information to the remote processing system, receiving the AR content from the remote processing system. Example G5 may also include the features of Example G2 or Example G3, or the features of Example G2 and Example G3.
Example G6 includes the features of Example G1, and the AR target serves as a high-level classifier. Also, at least some of the text from the OCR zone serves as a low-level classifier. Example G6 may also include (a) the features of Example G2, G3, G4, or G5; (b) the features of any two or more of Examples G2, G3, and G4; or (c) the features of any two or more of Examples G2, G3, and G5.
Example G7 includes the features of Example G6, and the high-level classifier identifies the AR content provider.
Example G8 includes the features of Example G1, and the AR target is two dimensional. Example G8 may also include (a) the features of Example G2, G3, G4, G5, G6, or G7; (b) the features of any two or more of Examples G2, G3, G4, G6, and G7; or (c) the features of any two or more of Examples G2, G3, G5, G6, and G7.
Example H1 is at least one machine accessible medium comprising computer instructions for implementing a multi-level trigger for AR content. The computer instructions, in response to being executed on a data processing system, enable the data processing system to select an AR target to serve as a high-level classifier for identifying relevant AR content. The computer instructions also enable the data processing system to specify an OCR zone for the selected AR target, wherein the OCR zone constitutes an area within a video frame from which text is to be extracted using OCR, and wherein text from the OCR zone is to serve as a low-level classifier for identifying relevant AR content.
Example H2 includes the features of Example H1, and the operation of specifying an OCR zone for the selected AR target comprises specifying at least one feature of the OCR zone, relative to at least one feature of the AR target.
Example I1 is at least one machine accessible medium comprising computer instructions for implementing a multi-level trigger for AR content. The computer instructions, in response to being executed on a data processing system, enable the data processing system to receive a target identifier from an AR client. The target identifier identifies a predefined AR target as having been detected in a video scene by the AR client. The computer instructions also enable the data processing system to receive text from the AR client, wherein the text corresponds to results from OCR performed by the AR client on an OCR zone associated with the predefined AR target in the video scene. The computer instructions also enable the data processing system to obtain AR content, based on the target identifier and the text from the AR client, and to send the AR content to the AR client.
Example I2 includes the features of Example I1, and the operation of obtaining AR content, based on the target identifier and the text from the AR client, comprises dynamically generating the AR content, based at least in part on the text from the AR client.
Example I3 includes the features of Example I1, and the operation of obtaining AR content, based on the target identifier and the text from the AR client, comprises automatically retrieving the AR content from a remote processing system.
Example I4 includes the features of Example I1, and the text received from the AR client comprises at least some of the results from the OCR performed by the AR client. Example I4 may also include the features of Example I2 or Example I3.
Example J1 is a data processing system that includes a processing element, at least one machine accessible medium responsive to the processing element, and an AR browser stored at least partially in the at least one machine accessible medium. In addition, an AR database is stored at least partially in the at least one machine accessible medium. The AR database contains an AR target identifier associated with an AR target and an OCR zone definition associated with the AR target. The OCR zone definition identifies an OCR zone. The AR browser is operable to automatically determine, based on video of a scene, whether the scene includes the AR target. The AR browser is also operable to automatically retrieve the OCR zone definition associated with the AR target, in response to determining that the scene includes the AR target. The AR browser is also operable to automatically use OCR to extract text from the OCR zone, in response to retrieving the OCR zone definition associated with the AR target. The AR browser is also operable to use results of the OCR to obtain AR content which corresponds to the text extracted from the OCR zone. The AR browser is also operable to automatically cause the AR content which corresponds to the text extracted from the OCR zone to be presented in conjunction with the scene.
Example J2 includes the features of Example J1, and the OCR zone definition identifies at least one feature of the OCR zone relative to at least one feature of the AR target.
Example J3 includes the features of Example J1, and the AR browser is operable to use a target identifier for the AR target to retrieve the OCR zone definition from a local storage medium. Example J3 may also include the features of Example J2.
Example J4 includes the features of Example J1, and the operation of using results of the OCR to determine AR content which corresponds to the text extracted from the OCR zone comprises (a) sending a target identifier for the AR target and at least some of the text from the OCR zone to a remote processing system; and (b) after sending the target identifier and at least some of the text from the OCR zone to the remote processing system, receiving the AR content from the remote processing system. Example J4 may also include the features of Example J2 or Example J3, or the features of Example J2 and Example J3.
Example J5 includes the features of Example J1, and the operation of using results of the OCR to determine AR content which corresponds to the text extracted from the OCR zone comprises (a) sending OCR information to the remote processing system, wherein the OCR information corresponds to the text extracted from the OCR zone; and (b) after sending the OCR information to the remote processing system, receiving the AR content from the remote processing system. Example J5 may also include the features of Example J2 or Example J3, or the features of Example J2 and Example J3.
Example J6 includes the features of Example J1, and the AR browser is operable to use the AR target as a high-level classifier and to use at least some of the text from the OCR zone as a low-level classifier. Example J6 may also include (a) the features of Example J2, J3, J4, or J5; (b) the features of any two or more of Examples J2, J3, and J4; or (c) the features of any two or more of Examples J2, J3, and J5.
Example J7 includes the features of Example J6, and the high-level classifier identifies the AR content provider.
Example J8 includes the features of Example J1, and the AR target is two dimensional. Example J8 may also include (a) the features of Example J2, J3, J4, J5, J6, or J7; (b) the features of any two or more of Examples J2, J3, J4, J6, and J7; or (c) the features of any two or more of Examples J2, J3, J5, J6, and J7.