This disclosure relates generally to input systems suitable for use with electronic devices, including display devices. More specifically, this disclosure relates to input systems capable of recognizing surface and air gestures and fingertips.
Projected capacitive touch (PCT) is currently the most widely used touch technology in mobile displays, offering high image clarity and input accuracy. However, PCT faces challenges in scaling up due to limitations in power consumption, response time and production cost. In addition, this technology generally requires users to touch the screen for the system to respond. Camera-based gesture recognition technology has advanced in recent years with efforts to create more natural user interfaces that go beyond touch screens for smartphones and tablets. However, gesture recognition technology has not become mainstream in mobile devices due to constraints on power, performance and cost, as well as usability challenges including response time, recognition accuracy and robustness to noise. Further, cameras have a limited field of view with dead zones near the screen. As a result, camera-based gesture recognition performance deteriorates as gestures approach the screen.
The systems, methods and devices of the disclosure each have several innovative aspects, no single one of which is solely responsible for the desirable attributes disclosed herein.
One innovative aspect of the subject matter described in this disclosure can be implemented in an apparatus including an interface for a user of an electronic device, the interface having a front surface including a detection area; a plurality of detectors configured to detect interaction of an object with the device at or above the detection area and to output signals indicating the interaction such that an image can be generated from the signals; and a processor configured to: obtain image data from the signals, apply a linear regression model to the image data to obtain a first reconstructed depth map, and apply a trained non-linear regression model to the first reconstructed depth map to obtain a second reconstructed depth map. In some implementations, the first reconstructed depth map has a higher resolution than that of the image.
In some implementations, the apparatus may include one or more light-emitting sources configured to emit light. The plurality of detectors can be light detectors such that the signals indicate interaction of the object with light emitted from the one or more light-emitting sources. In some implementations, the apparatus may include a planar light guide disposed substantially parallel to the front surface of the interface, the planar light guide including: a first light-turning arrangement configured to output reflected light, in a direction having a substantial component orthogonal to the front surface, by reflecting emitted light received from one or more light-emitting sources; and a second light-turning arrangement that redirects light resulting from the interaction toward the plurality of detectors.
The second reconstructed depth map may have a resolution at least three times greater than the resolution of the image. In some implementations, the second reconstructed depth map has the same resolution as the first reconstructed depth map. The processor may be configured to recognize, from the second reconstructed depth map, an instance of a user gesture. In some implementations, the interface is an interactive display and the processor is configured to control one or both of the interactive display and the electronic device, responsive to the user gesture. Various implementations of the apparatus disclosed herein do not include a time-of-flight depth camera.
In some implementations, obtaining image data can include vectorization of the image. In some implementations, obtaining a first reconstructed depth map includes applying a learned weight matrix to vectorized image data to obtain a first reconstructed depth map matrix. In some implementations, applying a non-linear regression model to the first reconstructed depth map includes extracting a multi-pixel patch feature for each pixel of the first reconstructed depth map to determine a depth map value for each pixel.
In some implementations, the object is a hand. In such implementations, the processor may be configured to apply a trained classification model to the second reconstructed depth map to determine locations of fingertips of the hand. The locations may include translation and depth location information. In some implementations, the object can be a stylus.
Another innovative aspect of the subject matter described in this disclosure can be implemented in an apparatus including an interface for a user of an electronic device having a front surface including a detection area; a plurality of detectors configured to receive signals indicating interaction of an object with the device at or above the detection area, wherein an image can be generated from the signals; and a processor configured to: obtain image data from the signals, obtain a first reconstructed depth map from the image data, wherein the first reconstructed depth map has a higher resolution than the image, and apply a trained non-linear regression model to the first reconstructed depth map to obtain a second reconstructed depth map.
Another innovative aspect of the subject matter described in this disclosure can be implemented in a method including obtaining image data from a plurality of detectors arranged along a periphery of a detection area of a device, the image data indicating an interaction of an object with the device at or above the detection area; obtaining a first reconstructed depth map from the image data; and obtaining a second reconstructed depth map from the first reconstructed depth map. The first reconstructed depth map may have a higher resolution than the image data obtained from the plurality of detectors.
In some implementations, obtaining the first reconstructed depth map includes applying a learned weight matrix to vectorized image data. The method can further include learning the weight matrix. Learning the weight matrix can include obtaining training set data of pairs of high resolution depth maps and low resolution images for multiple object gestures and positions. In some implementations, obtaining a second reconstructed depth map includes applying a non-linear regression model to the first reconstructed depth map. Applying a non-linear regression model to the first reconstructed depth map may include extracting a multi-pixel patch feature for each pixel of the first reconstructed depth map to determine a depth map value for each pixel.
In some implementations, the object may be a hand. The method can further include applying a trained classification model to the second reconstructed depth map to determine locations of fingertips of the hand. Such locations may include translation and depth location information.
Details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages will become apparent from the description, the drawings, and the claims. Note that the relative dimensions of the following figures may not be drawn to scale.
Like reference numbers and designations in the various drawings indicate like elements.
The following description is directed to certain implementations for the purposes of describing the innovative aspects of this disclosure. However, a person having ordinary skill in the art will readily recognize that the teachings herein can be applied in a multitude of different ways. The described implementations may be implemented in any device, apparatus, or system utilizing a touch input interface (including in devices that utilize touch input for purposes other than touch input for a display). In addition, it is contemplated that the described implementations may be included in or associated with a variety of electronic devices such as, but not limited to: mobile telephones, multimedia Internet enabled cellular telephones, mobile television receivers, wireless devices, smartphones, Bluetooth® devices, personal data assistants (PDAs), wireless electronic mail receivers, hand-held or portable computers, netbooks, notebooks, smartbooks, tablets, printers, copiers, scanners, facsimile devices, global positioning system (GPS) receivers/navigators, cameras, digital media players (such as MP3 players), camcorders, game consoles, wrist watches, clocks, calculators, television monitors, flat panel displays, electronic reading devices (e.g., e-readers), computer monitors, auto displays (including odometer and speedometer displays, etc.), cockpit controls and/or displays, camera view displays (such as the display of a rear view camera in a vehicle), electronic photographs, electronic billboards or signs, projectors, architectural structures, microwaves, refrigerators, stereo systems, cassette recorders or players, DVD players, CD players, VCRs, radios, portable memory chips, washers, dryers, washer/dryers, parking meters, and aesthetic structures (such as display of images on a piece of jewelry or clothing). Thus, the teachings are not intended to be limited to the implementations depicted solely in the Figures, but instead have wide applicability as will be readily apparent to one having ordinary skill in the art.
Implementations described herein relate to apparatuses, such as touch input devices, that are configured to sense objects at or above an interface of the device. The apparatuses include detectors configured to detect interaction of an object with the device at or above the detection area and output signals indicating the interaction. The apparatuses can include a processor configured to obtain low resolution image data from the signals and, from the low resolution image data, obtain an accurate high resolution reconstructed depth map. In some implementations, objects such as fingertips may be identified. The processor may be further configured to recognize instances of user gestures from the high resolution depth maps and object identification.
Particular implementations of the subject matter described in this disclosure can be implemented to realize one or more of the following potential advantages. In some implementations, depth map information of user interactions can be obtained by an electronic device without incorporating bulky and expensive hardware into the device. Depth maps having high accuracy may be generated, facilitating multiple fingertip detection and gesture recognition. Accurate fingertip or other object detection can be performed with low power consumption. In some implementations, the apparatuses can detect fingertips or gestures at or over any part of a detection area including in areas that are inaccessible to alternative gesture recognition technologies. For example, the apparatuses can detect gestures in areas that are dead zones for camera-based gesture recognition technologies due to the conical view of cameras. Further, implementations of the subject matter described in this disclosure may detect fingertips or gestures at the surface of an electronic device as well as above the electronic device.
The mobile electronic device 1 may be configured for both surface (touch) and air (non-contact) gesture recognition. An area 5 (which represents a volume) in the example of
The apparatus and methods disclosed herein can have, for example, a z-direction recognition distance or depth of up to about 20-40 cm or even greater from the surface (of, for example, an interactive display of a mobile electronic device), depending on the sensor system employed and the feature being recognized or tracked. For example, for fingertip detection and tracking (for fingertip-based gestures), z-direction recognition distances or depths of up to about 10-15 cm or even greater are possible. For detection and tracking of the entire palm or hand, for example for a hand-swipe gesture, z-direction recognition distances or depths of up to 30 cm or even greater are possible. As described above with reference to
It should be noted, however, that the apparatus and methods may be employed with sensor systems having any z-direction capabilities, including, for example, PCT systems. Further, implementations may be employed with surface-only sensor systems.
The apparatus and methods disclosed herein use low resolution image data. The low resolution image data is not limited to any particular sensor data but may include image data generated from photodiodes, phototransistors, charge coupled device (CCD) arrays, complementary metal oxide semiconductor (CMOS) arrays or other suitable devices operable to output a signal representative of a characteristic of detected visible, infrared (IR) and/or ultraviolet (UV) light. Further, the low resolution image data may be generated from non-light sensors including capacitance sensing mechanisms in some implementations. In some implementations, the sensor system includes a planar detection area having sensors along one or more edges of the detection area. Examples of such systems are described below with respect to FIGS. 2A-2D and 3.
It should be noted that the low resolution image data from which depth maps may be reconstructed is not depth map image data. While some depth information may be implicit in the data (e.g., signal intensity may correlate with distance from the surface), the low resolution image data does not include distance information itself. As such, the methods disclosed herein are distinct from various methods in which depth map data (for example, an initial depth map generated from a monocular image) is improved upon using techniques such as bilateral filtering. Further, in some implementations, the resolution of the low resolution image data may be considerably lower than that which a bilateral filtering technique may use. Such a technique may employ an image having a resolution of at least 100×100, for example. While the methods and apparatus disclosed herein can be implemented to obtain a reconstructed depth map from a 100×100 or higher resolution image, in some implementations, the low resolution image data used in the apparatus and methods described herein may have a resolution of less than 50×50 or even less than 30×30.
The resolution of the image obtained may depend on the size and aspect ratio of the device. For example, for a device having an aspect ratio of about 1.8, the resolution of a low resolution image may be less than 100×100, less than 100×55, less than 60×33, or less than 40×22, in some implementations.
Resolution may also be characterized in terms of pitch, i.e., the center-to-center distance between pixels, with a larger pitch corresponding to a lower resolution. For example, for a device such as a mobile phone having dimensions of 111 mm×51 mm, a pitch of 3 mm corresponds to a resolution of 37×17. An appropriate pitch may be selected based on the size of the object to be recognized. For example, for finger recognition, a pitch of 5 mm may be appropriate. A pitch of 3 mm, 1 mm, 0.5 mm or less may be appropriate for detection of a stylus, for example.
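By way of example and not limitation, the pitch-to-resolution relationship may be computed as in the following sketch, using the example dimensions and pitch given above (the function name is arbitrary and illustrative only):

    def resolution_from_pitch(length_mm, width_mm, pitch_mm):
        # Number of sensing pixels along each dimension at the given
        # center-to-center pitch.
        return int(length_mm // pitch_mm), int(width_mm // pitch_mm)

    print(resolution_from_pitch(111, 51, 3))  # prints (37, 17)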
It will be understood that the methods and apparatus disclosed herein may be implemented using low resolution data having higher resolutions and smaller pitches than described above. For example, devices having larger screens may have resolutions of 200×200 or greater. For any resolution or pitch, the methods and apparatus disclosed herein may be implemented to obtain higher resolution reconstructed depth maps.
In the illustrated implementation, two light sensors 33 are provided; however, more light sensors may be provided in other implementations as discussed further below with reference to
In the illustrated implementation, the light sensors 33 are disposed at the periphery of the light guide 35. However, alternative configurations are within the contemplation of the present disclosure. For example, the light sensors 33 may be remote from the light guide 35, in which case light detected by the light sensors 33 may be transmitted from the light guide 35 by additional optical elements such as, for example, one or more optical fibers.
In an implementation, the light-emitting source 31 may be one or more light-emitting diodes (LEDs) configured to emit primarily infrared light. However, any type of light source may be used. For example, the light-emitting source 31 may include one or more organic light emitting devices (“OLEDs”), lasers (for example, diode lasers or other laser sources), hot or cold cathode fluorescent lamps, or incandescent or halogen light sources. In the illustrated implementation, the light-emitting source 31 is disposed at the periphery of the light guide 35. However, alternative configurations are within the contemplation of the present disclosure. For example, the light-emitting source 31 may be remote from the light guide 35 and light produced by the light-emitting source 31 may be transmitted to the light guide 35 by additional optical elements such as, for example, one or more optical fibers, reflectors, etc. In the illustrated implementation, one light-emitting source 31 is provided; however, two or more light-emitting sources may be provided in other implementations.
The transparent material may have an index of refraction greater than 1. For example, the index of refraction may be in the range of about 1.4 to 1.6. The index of refraction of the transparent material determines a critical angle ‘α’ with respect to a normal of front surface 37 such that a light ray intersecting front surface 37 at an angle to the normal less than ‘α’ will pass through front surface 37, but a light ray intersecting front surface 37 at an angle to the normal greater than ‘α’ will undergo total internal reflection (TIR).
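For reference, the critical angle follows from Snell's law at the interface between the transparent material and the surrounding medium: assuming the surrounding medium is air, sin α = 1/n, so that α = arcsin(1/n). For an index of refraction of about 1.4 to 1.6, α is approximately 46° to 39°, respectively.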
In the illustrated implementation, the light guide 35 includes a light-turning arrangement that reflects emitted light 41 received from light-emitting source 31 in a direction having a substantial component orthogonal to the front surface 37. More particularly, at least a substantial fraction of reflected light 42 intersects the front surface 37 at an angle to the normal that is less than critical angle ‘α’. As a result, such reflected light 42 does not undergo TIR, but instead may be transmitted through the front surface 37. It will be appreciated that the reflected light 42 may be transmitted through the front surface 37 at a wide variety of angles.
In an implementation, the light guide may have a light-turning arrangement that includes a number of reflective microstructures 36. The microstructures 36 can all be identical, or have different shapes, sizes, structures, etc., in various implementations. The microstructures 36 may redirect emitted light 41 such that at least a substantial fraction of reflected light 42 intersects the front surface 37 at an angle to the normal less than the critical angle ‘α’.
As illustrated in
In some implementations, the low resolution image data may include information that identifies image characteristics at x-y locations within the image.
The process 60 continues at block 64 with obtaining a first reconstructed depth map from the low resolution image data. The reconstructed depth map contains information relating to the distance of the surfaces of the object from the surface of the device. Block 64 may upscale the low resolution image data and recover salient object structure from it, with the first reconstructed depth map having a higher resolution than the low resolution image corresponding to the low resolution image data. In some implementations, the first reconstructed depth map has a resolution corresponding to the final desired resolution. According to various implementations, the first reconstructed depth map may have a resolution at least about 1.5 to at least about 6 times higher than the low resolution image. For example, the first reconstructed depth map may have a resolution at least about 3 or 4 times higher than the low resolution image. Block 64 can involve obtaining a set of reconstructed depth maps corresponding to sequential low resolution images.
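By way of example and not limitation, a linear-regression implementation of block 64 may be sketched as follows; the NumPy usage, the array shapes and the function name are illustrative assumptions, with the learned weight matrix W described further with respect to the training process below:

    import numpy as np

    def first_reconstruction(low_res_image, W, high_res_shape):
        # Vectorize the low resolution image and apply the learned weight
        # matrix to obtain the (vectorized) first reconstructed depth map.
        c = low_res_image.reshape(-1)
        d1 = W @ c
        return d1.reshape(high_res_shape)   # first reconstructed depth map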
Block 64 may involve applying a learned regression model to the low resolution image data obtained in block 62. As described further below with reference to
Returning to
In some implementations, a relatively simple trained non-linear regression model may be applied. In one example, an input layer of a neural network regression may include a 5×5 patch from a first reconstructed depth map, such that the size of the input layer is 25. A hidden layer of size 5 may be used to output a single depth map value.
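As one illustrative sketch, such a regression model may be trained as follows, assuming a scikit-learn style regressor and hypothetical training arrays in which each row of X is a vectorized 5×5 patch (25 values) and y holds the corresponding ground truth depth values; the disclosure does not mandate any particular library:

    from sklearn.neural_network import MLPRegressor

    def train_patch_regressor(X, y):
        # 25 inputs (5x5 patch) -> hidden layer of size 5 -> one depth value.
        model = MLPRegressor(hidden_layer_sizes=(5,), max_iter=2000)
        model.fit(X, y)
        return model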
The processes described above with reference to
The process 100 continues at block 104 by vectorizing the training set data to obtain a low resolution matrix C and a high resolution matrix D. Matrix C includes m vectors, each vector being a vectorization of one of the training low resolution images, which may include values representing signals as received or simulated from the sensor system for all (or a subset) of the low resolution images in the training set data. Matrix D also includes m vectors, each vector being a vectorization of one of the training high resolution images, which may include 0 to 1 grey scale depth map values for all (or a subset) of the high resolution depth map images in the training set data. The process 100 continues at block 106 by performing a linear regression to learn a scaling weight matrix W, with D=W×C. W represents the linear relationship between the low resolution images and high resolution depth maps that may be applied during operation of an apparatus as described above with respect to
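For example, block 106 may be implemented as an ordinary least-squares fit, as in the following sketch; the column layout of C and D and the use of a pseudo-inverse are illustrative assumptions:

    import numpy as np

    def learn_weight_matrix(C, D):
        # Least-squares solution of D = W @ C, where each column of C is a
        # vectorized low resolution training image and each column of D is
        # the corresponding vectorized high resolution depth map.
        return D @ np.linalg.pinv(C)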
The process 110 continues at block 114 by extracting features from the first reconstructed depth maps. In some implementations, multiple multi-pixel patches are randomly selected from each of the first reconstructed depth maps.
If used, the multi-pixel patches can be vectorized to form multi-dimensional feature vectors. For example, a 7×7 patch forms a 49-dimension feature vector. All of the patch feature vectors from a given first reconstructed depth map matrix R1i can then be concatenated to perform training. This may be performed on all m first reconstructed depth maps (R11-m).
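As an illustrative sketch of the patch-feature extraction of block 114, assuming NumPy and an arbitrary number of randomly sampled patches per depth map:

    import numpy as np

    def extract_patch_features(depth_map, num_patches=100, patch=7, rng=None):
        # Randomly sample patch x patch windows and vectorize each into a
        # (patch * patch)-dimension feature vector (49 dimensions for 7x7).
        rng = np.random.default_rng() if rng is None else rng
        rows, cols = depth_map.shape
        feats = []
        for _ in range(num_patches):
            r = rng.integers(0, rows - patch + 1)
            c = rng.integers(0, cols - patch + 1)
            feats.append(depth_map[r:r + patch, c:c + patch].reshape(-1))
        return np.stack(feats)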
Returning to
Another aspect of the subject matter described herein is an apparatus configured to identify fingertip locations. The location information can include translation (x, y) and depth (z) information.
The process 130 continues at block 134 by optionally performing segmentation on the reconstructed depth map to identify the palm area, reducing the search space. The process continues at block 136 by applying a trained non-linear classification model to classify pixels in the search space as either fingertip or not fingertip. Examples of classification models that may be employed include random forest and neural network classification models. In some implementations, features of the classification model can be multi-pixel patches as described above with respect to
In one example, an input layer of a neural network classification may include a 15×15 patch from a second reconstructed depth map, such that the size of the input layer is 225. A hidden layer of size 5 may be used, with the output layer having two outputs: fingertip or not fingertip.
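By way of example and not limitation, applying such a trained classifier pixel by pixel over a second reconstructed depth map may be sketched as follows; the classifier object clf, the edge padding, and the scikit-learn style predict() interface are illustrative assumptions:

    import numpy as np

    def classify_fingertip_pixels(depth_map, clf, patch=15):
        # Extract a 15x15 patch around every pixel (225-dimension feature)
        # and label each pixel as fingertip (1) or not fingertip (0).
        half = patch // 2
        padded = np.pad(depth_map, half, mode='edge')
        rows, cols = depth_map.shape
        feats = np.stack([
            padded[r:r + patch, c:c + patch].reshape(-1)
            for r in range(rows) for c in range(cols)
        ])
        return clf.predict(feats).reshape(rows, cols)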
The process 130 continues at block 138 by defining boundaries of pixels classified as fingertips. Any appropriate technique may be used to define the boundaries. In some implementations, for example, blob analysis is performed to determine a centroid of each blob of fingertip-classified pixels and to draw bounding boxes. The process 130 continues at block 140 by identifying the fingertips. In some implementations, for example, a sequence of frames may be analyzed as described above, with similarities matched across frames.
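As an illustrative sketch of blocks 138 and 140, connected-component (blob) analysis may be performed on the fingertip/not-fingertip pixel map produced at block 136; the use of SciPy here is an assumption, and any equivalent blob analysis may be used:

    from scipy import ndimage

    def fingertip_blobs(mask):
        # Group fingertip-classified pixels into blobs, then compute a
        # centroid and a bounding box for each blob.
        labeled, n = ndimage.label(mask)
        centroids = ndimage.center_of_mass(mask, labeled, range(1, n + 1))
        boxes = ndimage.find_objects(labeled)
        return centroids, boxes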
The information that can be obtained by the process in
In some implementations, block 152 includes obtaining second reconstructed depth maps by applying a learned non-linear regression model to first reconstructed depth maps that are obtained from the training set data as described with respect to
The process 150 continues at block 154 by extracting features from the reconstructed depth maps. In some implementations, multiple multi-pixel patches are extracted at the fingertip locations for positive examples and at random positions exclusive of the fingertip locations for negative examples. The features are labeled as fingertip or not fingertip based on the corresponding ground truth depth map. The process 150 continues at block 156 by performing machine learning to learn a non-linear classification model.
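By way of example and not limitation, blocks 154 and 156 may be sketched as follows, assuming vectorized 15×15 patches and a scikit-learn style classifier; the library choice and array layout are illustrative assumptions:

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def train_fingertip_classifier(pos_patches, neg_patches):
        # Positive examples are patches centered on ground-truth fingertip
        # locations; negative examples are patches away from fingertips.
        X = np.vstack([pos_patches, neg_patches])
        y = np.concatenate([np.ones(len(pos_patches)), np.zeros(len(neg_patches))])
        clf = MLPClassifier(hidden_layer_sizes=(5,), max_iter=1000)
        clf.fit(X, y)
        return clf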
Arrangement 230 (examples of which are described and illustrated herein above) may be disposed over and substantially parallel to a front surface of the interactive display 202. In an implementation, the arrangement 230 may be substantially transparent. The arrangement 230 may output one or more signals responsive to a user gesture. Signals outputted by the arrangement 230, via a signal path 211, may be analyzed by the processor 204 as described herein to obtain reconstructed depth maps, identify fingertip locations, and recognize instances of user gestures. In some implementations, the processor 204 may then control the interactive display 202 responsive to the user gesture, by way of signals sent to the interactive display 202 via a signal path 213.
The various illustrative logics, logical blocks, modules, circuits and algorithm processes described in connection with the implementations disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. The interchangeability of hardware and software has been described generally, in terms of functionality, and illustrated in the various illustrative components, blocks, modules, circuits and processes described above. Whether such functionality is implemented in hardware or software depends upon the particular application and design constraints imposed on the overall system.
The hardware and data processing apparatus used to implement the various illustrative logics, logical blocks, modules and circuits described in connection with the aspects disclosed herein may be implemented or performed with a general purpose single- or multi-chip processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, or any conventional processor, controller, microcontroller, or state machine. A processor also may be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. In some implementations, particular processes and methods may be performed by circuitry that is specific to a given function.
In one or more aspects, the functions described may be implemented in hardware, digital electronic circuitry, computer software, firmware, including the structures disclosed in this specification and structural equivalents thereof, or in any combination thereof. Implementations of the subject matter described in this specification also can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on computer storage media for execution by, or to control the operation of, data processing apparatus.
If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium, such as a non-transitory medium. The processes of a method or algorithm disclosed herein may be implemented in a processor-executable software module which may reside on a computer-readable medium. Computer-readable media include both computer storage media and communication media including any medium that can be enabled to transfer a computer program from one place to another. Storage media may be any available media that may be accessed by a computer. By way of example, and not limitation, non-transitory media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer. Also, any connection can be properly termed a computer-readable medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media. Additionally, the operations of a method or algorithm may reside as one or any combination or set of codes and instructions on a machine readable medium and computer-readable medium, which may be incorporated into a computer program product.
Various modifications to the implementations described in this disclosure may be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other implementations without departing from the spirit or scope of this disclosure. Thus, the claims are not intended to be limited to the implementations shown herein, but are to be accorded the widest scope consistent with this disclosure, the principles and the novel features disclosed herein. Additionally, a person having ordinary skill in the art will readily appreciate that the terms “upper” and “lower” are sometimes used for ease of describing the figures, and indicate relative positions corresponding to the orientation of the figure on a properly oriented page, and may not reflect the proper orientation of the device as implemented.
Certain features that are described in this specification in the context of separate implementations also can be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation also can be implemented in multiple implementations separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Further, the drawings may schematically depict one or more example processes in the form of a flow diagram. However, other operations that are not depicted can be incorporated in the example processes that are schematically illustrated. For example, one or more additional operations can be performed before, after, simultaneously, or between any of the illustrated operations. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products. Additionally, other implementations are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results.
This application claims benefit of priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application No. 61/985,423, filed Apr. 28, 2014, which is incorporated by reference herein in its entirety and for all purposes.