I. Field of the Invention
This disclosure relates generally to apparatus and methods for augmented reality and other computer vision applications, and more particularly to integration of camera auto-focus with computer vision-based recognition and tracking.
II. Background
Augmented reality systems use natural features as reference points within a sequence of images to place computer generated icons and images. A natural feature processing engine, including a natural feature detection module and a natural feature tracking module, is used to find and follow these reference points. Mobile devices may be enhanced with such augmented reality engines. Many mobile devices also have cameras with auto-focus capabilities provided by an auto-focus engine. Both the natural feature and auto-focus engines track changes from image to image; however, known systems fail to allow communication between these engines.
In augmented reality, tracking that accurately follows the tracked object's movement and position creates a significantly improved user experience. Consequently, much effort is put into improving tracking performance. Object tracking functionality in a processor operates separately from auto-focus functionality at the front end of a camera. Auto-focus functionality is typically performed in hardware or with hardware acceleration. Auto-focus operations may result in information useful for improving natural feature detection and/or tracking. Similarly, natural feature detection and tracking may result in information useful for improving auto-focus functionality.
Many existing mobile devices 10 contain a camera and a processor. The camera provides images to the processor, which may modify the image using various augmented reality techniques. The processor may send a control signal to trigger camera activation, and in response the camera provides the image or sequence of images to the processor for image processing. No information obtained from natural feature processing is returned to the camera to assist in obtaining an improved image. That is, control information beyond triggering does not flow from the processor to the camera.
In other existing mobile devices 10, image processing associated with natural feature detection and tracking is disassociated from image processing associated with auto-focusing.
In general, operations in the natural feature detection module 120 and the natural feature tracking module 125 function in parallel; however, for a particular natural feature, these operations occur in sequence: the natural feature is first detected within an image and then tracked through subsequent images. The location of the natural feature within the image is used for separate processing by an augmented reality module 130. Each image undergoes processing through the natural feature detection module 120 to detect new natural features and also undergoes processing through the natural feature tracking module 125 to follow the movement of already detected natural features from image to image.
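By way of illustration only, the following Python sketch shows one possible organization of this per-frame flow; the function and variable names are hypothetical and do not appear in this disclosure.

    def process_frame(image, known_features, detect_fn, track_fn):
        """Illustrative per-frame flow: track known natural features, then detect new ones.

        detect_fn(image) is assumed to return a list of feature locations, and
        track_fn(image, feature) the feature's new location or None if it was lost.
        """
        # Follow already-detected natural features into the current image (tracking module).
        tracked = []
        for feature in known_features:
            new_location = track_fn(image, feature)
            if new_location is not None:
                tracked.append(new_location)
        # Detect natural features that are new in this image (detection module).
        detected = detect_fn(image)
        # Both sets of locations would then be handed to augmented reality processing.
        return tracked + detected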
As shown at delineation 400, the auto-focus engine 300 has no communication with the natural feature processing engine 110 and may run as a parallel task. The auto-focus engine 300 may be implemented in hardware or may be implemented in a combination of hardware and software. The auto-focus engine 300 operates in real-time or near real-time to capture new images. Thus, a continued need exists to improve both natural feature processing and auto-focusing.
Disclosed is an apparatus and method for coupling a natural feature processing engine with an auto-focus engine.
According to some aspects, disclosed is a mobile device for use in computer vision, the mobile device comprising: a natural feature processing engine comprising a natural feature detection module and a natural feature tracking module; and an auto-focus engine coupled to the natural feature processing engine to communicate information to set a location of a window comprising at least one of a natural feature window and/or an auto-focus window.
According to some aspects, disclosed is a method in a mobile device for use in computer vision, the method comprising: selecting an auto-focus window within an image; auto-focusing on the selected window; communicating a location of the auto-focus window; limiting an area of a natural feature detection based on the location of the auto-focus window; and finding a natural feature within the limited area.
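As a non-limiting sketch of this method, assuming the communicated auto-focus window is an (x, y, width, height) rectangle in pixel coordinates (an assumption made only for the example), the limiting and finding steps could look like:

    def detect_within_af_window(image, af_window, detect_fn):
        """Limit natural feature detection to the communicated auto-focus window."""
        x, y, w, h = af_window              # assumed (x, y, width, height) layout
        region = image[y:y + h, x:x + w]    # restrict detection to the auto-focus window
        features = detect_fn(region)        # detect_fn returns (row, col) pairs within the crop
        # Translate feature coordinates from the cropped region back to full-image coordinates.
        return [(r + y, c + x) for (r, c) in features]

Restricting detection to the auto-focus window reduces the number of pixels examined and therefore the processing time.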
According to some aspects, disclosed is a method in a mobile device for use in computer vision, the method comprising: setting a first auto-focus window within a first image; setting a second auto-focus window within a second image; communicating a change from the first auto-focus window to the second auto-focus window; setting a next tracking search window based on the change; and tracking a natural feature within the next tracking search window.
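One possible reading of this method, sketched below with hypothetical names, shifts the next tracking search window by the displacement between the two communicated auto-focus windows:

    def shift_search_window(first_af, second_af, search_window):
        """Set the next tracking search window based on the auto-focus window change.

        All windows are assumed to be (x, y, width, height) rectangles; only the
        position is shifted here, leaving the window size unchanged.
        """
        dx = second_af[0] - first_af[0]
        dy = second_af[1] - first_af[1]
        x, y, w, h = search_window
        return (x + dx, y + dy, w, h)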
According to some aspects, disclosed is a method in a mobile device for use in computer vision, the method comprising: tracking a natural feature to a first location within a first image; tracking the natural feature to a second location within a second image; communicating a change from the first location to the second location; setting a next auto-focus window based on the change; and auto-focusing within the auto-focus window.
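Similarly, a minimal sketch of this method (illustrative only; the linear extrapolation of the feature's motion is an assumption, not a requirement) centers the next auto-focus window on the location predicted from the communicated change:

    def predict_af_window(first_loc, second_loc, af_size):
        """Set a next auto-focus window based on a tracked feature's change in location.

        Locations are (x, y) pixel coordinates and af_size is (width, height).
        """
        dx = second_loc[0] - first_loc[0]
        dy = second_loc[1] - first_loc[1]
        # Predict where the feature will be in the next image by extrapolating its motion.
        predicted_x = second_loc[0] + dx
        predicted_y = second_loc[1] + dy
        w, h = af_size
        # Center the next auto-focus window on the predicted location.
        return (predicted_x - w // 2, predicted_y - h // 2, w, h)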
According to some aspects, disclosed is a mobile device for use in computer vision, the mobile device comprising: a camera and an auto-focus engine; and a processor and memory comprising code for performing the methods described above.
According to some aspects, disclosed is a mobile device for use in computer vision, the mobile device comprising means for performing the methods described above.
According to some aspects, disclosed is a nonvolatile computer-readable storage medium including program code stored thereon, comprising program code for performing the methods described above.
It is understood that other aspects will become readily apparent to those skilled in the art from the following detailed description, wherein various aspects are shown and described by way of illustration. The drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.
Embodiments of the invention will be described, by way of example only, with reference to the drawings.
The detailed description set forth below in connection with the appended drawings is intended as a description of various aspects of the present disclosure and is not intended to represent the only aspects in which the present disclosure may be practiced. Each aspect described in this disclosure is provided merely as an example or illustration of the present disclosure, and should not necessarily be construed as preferred or advantageous over other aspects. The detailed description includes specific details for the purpose of providing a thorough understanding of the present disclosure. However, it will be apparent to those skilled in the art that the present disclosure may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the present disclosure. Acronyms and other descriptive terminology may be used merely for convenience and clarity and are not intended to limit the scope of the disclosure.
Position determination techniques described herein may be implemented in conjunction with various wireless communication networks such as a wireless wide area network (WWAN), a wireless local area network (WLAN), a wireless personal area network (WPAN), and so on. The terms “network” and “system” are often used interchangeably. A WWAN may be a Code Division Multiple Access (CDMA) network, a Time Division Multiple Access (TDMA) network, a Frequency Division Multiple Access (FDMA) network, an Orthogonal Frequency Division Multiple Access (OFDMA) network, a Single-Carrier Frequency Division Multiple Access (SC-FDMA) network, Long Term Evolution (LTE), and so on. A CDMA network may implement one or more radio access technologies (RATs) such as cdma2000, Wideband-CDMA (W-CDMA), and so on. Cdma2000 includes IS-95, IS-2000, and IS-856 standards. A TDMA network may implement Global System for Mobile Communications (GSM), Digital Advanced Mobile Phone System (D-AMPS), or some other RAT. GSM and W-CDMA are described in documents from a consortium named “3rd Generation Partnership Project” (3GPP). Cdma2000 is described in documents from a consortium named “3rd Generation Partnership Project 2” (3GPP2). 3GPP and 3GPP2 documents are publicly available. A WLAN may be an IEEE 802.11x network, and a WPAN may be a Bluetooth network, an IEEE 802.15x network, or some other type of network. The techniques may also be implemented in conjunction with any combination of WWAN, WLAN and/or WPAN.
A satellite positioning system (SPS) typically includes a system of transmitters positioned to enable entities to determine their location on or above the Earth based, at least in part, on signals received from the transmitters. Such a transmitter typically transmits a signal marked with a repeating pseudo-random noise (PN) code of a set number of chips and may be located on ground based control stations, user equipment and/or space vehicles. In a particular example, such transmitters may be located on Earth orbiting satellite vehicles (SVs). For example, an SV in a constellation of Global Navigation Satellite System (GNSS) such as Global Positioning System (GPS), Galileo, GLONASS or Compass may transmit a signal marked with a PN code that is distinguishable from PN codes transmitted by other SVs in the constellation (e.g., using different PN codes for each satellite as in GPS or using the same code on different frequencies as in GLONASS). In accordance with certain aspects, the techniques presented herein are not restricted to global systems (e.g., GNSS) for SPS. For example, the techniques provided herein may be applied to or otherwise enabled for use in various regional systems, such as, e.g., Quasi-Zenith Satellite System (QZSS) over Japan, Indian Regional Navigational Satellite System (IRNSS) over India, Beidou over China, etc., and/or various augmentation systems (e.g., a Satellite Based Augmentation System (SBAS)) that may be associated with or otherwise enabled for use with one or more global and/or regional navigation satellite systems. By way of example but not limitation, an SBAS may include an augmentation system(s) that provides integrity information, differential corrections, etc., such as, e.g., Wide Area Augmentation System (WAAS), European Geostationary Navigation Overlay Service (EGNOS), Multi-functional Satellite Augmentation System (MSAS), GPS Aided Geo Augmented Navigation or GPS and Geo Augmented Navigation system (GAGAN), and/or the like. Thus, as used herein an SPS may include any combination of one or more global and/or regional navigation satellite systems and/or augmentation systems, and SPS signals may include SPS, SPS-like, and/or other signals associated with such one or more SPS.
As used herein, a mobile device 100, sometimes referred to as a mobile station (MS) or user equipment (UE), may be a cellular phone, mobile phone or other wireless communication device, personal communication system (PCS) device, personal navigation device (PND), Personal Information Manager (PIM), Personal Digital Assistant (PDA), laptop or other suitable mobile device which is capable of receiving wireless communication and/or navigation signals. The term “mobile station” is also intended to include devices which communicate with a personal navigation device (PND), such as by short-range wireless, infrared, wireline connection, or other connection—regardless of whether satellite signal reception, assistance data reception, and/or position-related processing occurs at the device or at the PND. Also, mobile station 100 is intended to include all devices, including wireless communication devices, computers, laptops, etc. which are capable of communication with a server, such as via the Internet, WiFi, or other network, and regardless of whether satellite signal reception, assistance data reception, and/or position-related processing occurs at the device, at a server, or at another device associated with the network. Any operable combination of the above is also considered a “mobile station.”
Unlike existing mobile devices 10, a mobile device 100 in accordance with the present invention allows communication between the auto-focus engine 300 and the natural feature processing engine 110, as described below. Similar to existing mobile devices 10, the mobile device 100 contains memory, one or more processors, which function as a natural feature processing engine 110 and an auto-focus engine 300, and a user interface, such as a display, speaker, touch screen and/or buttons. The natural feature processing engine 110, also referred to as computer vision-based recognition and tracking, includes a natural feature detection module 120 and a natural feature tracking module 125.
Processing speed directly correlates to the area the natural feature detection window covers; smaller windows, each covering only a small area, can be processed more quickly. Other pixel dimensions are also possible for the natural feature detection window. For example, rather than using an 8×8 square grid, tracking may use other square or non-square fixed-dimension grid sizes (e.g., 4×4, 10×10 or 16×16) or variable-dimension grid sizes (e.g., where the size depends on characteristics of the natural feature). Tracking examines the same location defined by the 8-by-8 grid in a second image. If the correlation yields a high result, no movement has occurred between images and the pixel location of the natural feature is expected to be the same in the second image. If the camera is moving linearly and/or rotating, or if objects in the image are moving relative to the mobile device 100, then the natural features will appear to have moved from the first image to the second image as shown in the following figure. In this case, a high correlation result will occur at the new location in the second image if the natural feature detection window encompasses the natural feature.
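For illustration, assuming an 8x8 pixel patch, a small search radius and a normalized correlation score (all hypothetical choices for this sketch), the correlation step described above might be written as:

    import numpy as np

    def correlate_patch(prev_image, next_image, loc, patch=8, search=12):
        """Search next_image for the 8x8 patch taken from prev_image at loc.

        loc is the (row, col) of the patch's top-left corner; returns the best-matching
        location and its normalized correlation score. A high score at the original
        location indicates no movement between the images.
        """
        r0, c0 = loc
        template = prev_image[r0:r0 + patch, c0:c0 + patch].astype(np.float64)
        template -= template.mean()
        best_score, best_loc = -np.inf, loc
        rows, cols = next_image.shape[:2]
        for dr in range(-search, search + 1):
            for dc in range(-search, search + 1):
                r, c = r0 + dr, c0 + dc
                if r < 0 or c < 0 or r + patch > rows or c + patch > cols:
                    continue
                window = next_image[r:r + patch, c:c + patch].astype(np.float64)
                window -= window.mean()
                denom = np.sqrt((template ** 2).sum() * (window ** 2).sum())
                score = float((template * window).sum() / denom) if denom else 0.0
                if score > best_score:
                    best_score, best_loc = score, (r, c)
        return best_loc, best_score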
A natural feature or a group of natural features often appear to move from a previous location on one image to a next location on the next image as described above.
Cameras in mobile devices 100 often contain an auto-focus engine 300, which fixes focus based on a detected object. The auto-focus engine 300 may operate on a continuous analog image or may operate on a digital image to focus on an area of the image defined by an auto-focus window 310. From image to image, the auto-focus window 310 may appear to move within the sequence of images. In this sense, the auto-focus engine 300 appears to track an object within the sequence of images.
According to some embodiments of the present invention, a mobile device 100 integrates a camera's auto-focus engine 300 with a natural feature processing engine 110 performing computer vision-based recognition and tracking. The auto-focus engine 300 and the natural feature processing engine 110 are allowed to communicate information such as a position or change in position of the auto-focus window 310 and/or natural features. The auto-focus engine 300 may use information from the natural feature processing engine 110 to better position its auto-focus window 310 (i.e., a location of a box within the image). Similarly, the natural feature processing engine 110 may use information from the auto-focus engine 300 to better position correlation windows for finding a new position of a natural feature. Alternatively, the natural feature processing engine 110 may disregard this information from the auto-focus engine 300.
Such found objects may contain one or several natural features that the natural feature tracking module 125 is following. When searching for objects, the auto-focus engine 300 may advantageously use locations within an image as determined by the natural feature processing engine 110 to limit the search area from the entire image to an area in proximity to the detected and tracked natural features.
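One hypothetical way to perform such limiting (the bounding-box construction and fixed margin are assumptions made only for this sketch) is to restrict the auto-focus object search to a box around the currently tracked feature locations:

    def af_search_area(feature_locations, image_shape, margin=16):
        """Bound the auto-focus object search to the neighborhood of tracked features.

        feature_locations is a list of (row, col) pairs and image_shape is
        (height, width); the result is an (x, y, width, height) rectangle.
        """
        if not feature_locations:
            # With nothing tracked, fall back to searching the whole image.
            return (0, 0, image_shape[1], image_shape[0])
        rows = [r for r, _ in feature_locations]
        cols = [c for _, c in feature_locations]
        top = max(min(rows) - margin, 0)
        left = max(min(cols) - margin, 0)
        bottom = min(max(rows) + margin, image_shape[0])
        right = min(max(cols) + margin, image_shape[1])
        return (left, top, right - left, bottom - top)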
As shown at 410, some embodiments allow the auto-focus engine 300 to send information to the natural feature processing engine 110 that indicates the current size and/or location of an auto-focus window within an image, as described below with reference to
As shown at 420, some embodiments allow the auto-focus engine 300 to send information to the natural feature processing engine 110 that indicates a change in size and/or a change in location from previous auto-focus window to next auto-focus window, as described below with reference to
As shown at 430, some embodiments allow the natural feature processing engine 110 to send information to the auto-focus engine 300 that indicates a change from a previous location of natural feature and/or a natural feature detection window (e.g., 270 of
Embodiments include one or more of 410, 420 and 430 as information communicated between the auto-focus engine 300 and the natural feature processing engine 110. For example, some embodiments communicate only one of 410, 420 and 430: (1) a first embodiment communicates 410 but not 420 or 430; (2) a second embodiment communicates 420 but not 410 or 430; and (3) a third embodiment communicates 430 but not 410 or 420. Additional examples communicate two of 410, 420 and 430: (4) a fourth embodiment communicates both 410 and 420 but not 430; (5) a fifth embodiment communicates both 420 and 430 but not 410; and (6) a sixth embodiment communicates both 410 and 430 but not 420. Finally, further examples communicate all three: (7) a seventh embodiment communicates 410, 420 and 430. Therefore, when an embodiment communicates information between the auto-focus engine and the natural feature processing engine, some embodiments communicate just one of 410, 420 or 430, other embodiments communicate two of 410, 420 and 430, while still other embodiments communicate all three of 410, 420 and 430.
This communicated information is used to set a location of a natural feature window and/or an auto-focus window. For example, some embodiments only communicate information shown at 410 to limit the area of the next natural feature windows. Other embodiments only communicate information shown at 420 to change a center location of the next natural feature windows. Still other embodiments only communicate information shown at 430 to change a location of the next auto-focus window(s). As stated above, some embodiments implement two of 410, 420 and 430 as information communicated between the auto-focus engine 300 and the natural feature processing engine 110, while other embodiments implement all three of 410, 420 and 430 as information communicated between the auto-focus engine 300 and the natural feature processing engine 110. In some embodiments, the auto-focus engine 300 acts as a slave and the natural feature processing engine 110 acts as its master.
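Purely as an illustration of how such communicated information could be represented (the field names and grouping below are assumptions for this sketch, not definitions from the disclosure), a single update message may carry any subset of 410, 420 and 430, matching the combinations enumerated above:

    from dataclasses import dataclass
    from typing import Optional, Tuple

    Window = Tuple[int, int, int, int]   # (x, y, width, height)
    Change = Tuple[int, int]             # (dx, dy)

    @dataclass
    class EngineUpdate:
        """Information exchanged between the auto-focus and natural feature engines."""
        af_window: Optional[Window] = None         # 410: current auto-focus window size/location
        af_window_change: Optional[Change] = None  # 420: change from previous to next auto-focus window
        feature_change: Optional[Change] = None    # 430: change in a tracked natural feature's location

    def apply_change(window: Window, change: Change) -> Window:
        """Move either engine's window by a communicated change."""
        x, y, w, h = window
        dx, dy = change
        return (x + dx, y + dy, w, h)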
The natural feature processing engine 110 acts as a means for detecting and tracking natural features in the image with a natural feature processing engine. The natural feature detection module 120 acts as a means for detecting natural features. The natural feature tracking module 125 acts as a means for tracking natural features. A processor or processors may act as a means for performing each of the functions of the natural feature processing engine 110, such as selecting the auto-focus window within the image, limiting an area of a natural feature detection based on the location of the auto-focus window, finding a natural feature within the limited area, setting a next tracking search window based on a change, tracking a natural feature within the next tracking search window, tracking a natural feature to the first location within a first image, and/or tracking the natural feature to the second location within a second image.
The auto-focus engine 300 acts as a means for auto-focusing in an auto-focus window in an image. A processor or processors may act as a means for performing each of the functions of the auto-focus engine 300, such as setting a first auto-focus window within a first image, setting a second auto-focus window within a second image, setting a next auto-focus window based on the change, and auto-focusing within the auto-focus window.
These processor(s), engines and modules, separately or in combination, may act as means for communicating information between the auto-focus engine and the natural feature processing engine. The information may include a location of the auto-focus window, a change from a first auto-focus window to a second auto-focus window, a change from a first location to a second location, a change in location from a previous to a next auto-focus window, and/or a change from a previous to a next location of a natural feature.
The above embodiments are described with respect to mobile devices implementing augmented reality functionality that tracks natural features. In general, these methods and apparatus are equally applicable to other applications that use computer vision related technologies and may benefit from the teachings herein. For example, embodiments above may have the function of tracking natural features replaced or augmented with marker tracking and/or hand tracking. Embodiments may track and focus on a man-made marker (rather than a natural feature) such as a posted QR code (quick response code). Alternatively, embodiments may track and focus on a moving hand (rather than a fixed natural feature or man-made marker), for example, in order to capture gesture commands from a user. These embodiments may provide gesturing interfaces with or without augmented reality functionality.
The methodologies described herein may be implemented by various means depending upon the application. For example, these methodologies may be implemented in hardware, firmware, software, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
For a firmware and/or software implementation, the methodologies may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. Any machine-readable medium tangibly embodying instructions may be used in implementing the methodologies described herein. For example, software codes may be stored in a memory and executed by a processor unit. Memory may be implemented within the processor unit or external to the processor unit. As used herein the term “memory” refers to any type of long term, short term, volatile, nonvolatile, or other memory and is not to be limited to any particular type of memory or number of memories, or type of media upon which memory is stored.
If implemented in firmware and/or software, the functions may be stored as one or more instructions or code on a computer-readable medium. Examples include computer-readable media encoded with a data structure and computer-readable media encoded with a computer program. Computer-readable media includes physical computer storage media. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer; disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
In addition to storage on computer readable medium, instructions and/or data may be provided as signals on transmission media included in a communication apparatus. For example, a communication apparatus may include a transceiver having signals indicative of instructions and data. The instructions and data are configured to cause one or more processors to implement the functions outlined in the claims. That is, the communication apparatus includes transmission media with signals indicative of information to perform disclosed functions. At a first time, the transmission media included in the communication apparatus may include a first portion of the information to perform the disclosed functions, while at a second time the transmission media included in the communication apparatus may include a second portion of the information to perform the disclosed functions.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the spirit or scope of the disclosure.