The present invention relates in general to a hand-held device and to a method of controlling the hand-held device.
Hand-held devices are available in many different shapes and sizes and for many different functions. Examples include mobile electronic games consoles, personal music players and personal digital assistants (PDAs), as well as communication-oriented devices such as cellular telephones. These hand-held devices typically contain computing applications requiring directional input from a user to control the movement of cursors, pointers or elements in games, the scrolling of a display screen, or navigation through a menu structure. A directional command is supplied through a keypad, thumbwheel, touchpad, joystick or similar manipulable input. Typically these manipulable inputs are finger operated and can be difficult to use, particularly when the hand-held device is itself relatively small. The manipulable inputs tend to require relatively fine and accurate control by the user, and operations sometimes become frustratingly difficult.
It is often desired to operate the hand-held device independently in free space. This restricts the use of other known devices for providing a directional input, such as a mouse or trackball, which rely on a desk or other fixed operating surface.
One aim of the present invention is to provide a hand-held device, and a method of controlling the same, which is simple and intuitive for a user to operate. A preferred aim is to avoid or reduce the use of manipulable inputs such as a keypad. Another preferred aim is to reduce the level of user dexterity required to operate the device.
Other aims and advantages of the invention will be discussed below or will be apparent from the following description.
According to the present invention there is provided an apparatus and method as set forth in the appended claims. Preferred features of the invention will be apparent from the dependent claims, and the description which follows.
Briefly, the present invention provides a hand-held device which carries an image receptor such as a camera. Images captured by the image receptor are processed to determine directional movements of the hand-held device. The detected movement is then used to control an operation or output of the hand-held device.
In a first aspect of the present invention there is provided a hand-held device, comprising: a computing application of the hand-held device which responds to directional commands of a user; an image registering unit to register a series of images; an image processing unit to derive motion data from the series of images corresponding to translational and/or rotational movement of the hand-held device in free space; and a direction control unit to convert the motion data into a directional command and to supply the directional command to the computing application.
For a better understanding of the invention, and to show how embodiments of the same may be carried into effect, reference will now be made, by way of example, to the accompanying diagrammatic drawings in which:
FIGS. 5a, 5b, 5c and 5d are perspective views to illustrate example implementations of the preferred embodiment of the present invention;
FIGS. 6a, 6b and 6c illustrate a first example 2D image processing algorithm;
FIGS. 8a and 8b illustrate a preferred example image processing operation using a linear array; and
FIGS. 9a and 9b illustrate example layouts of linear arrays over an image field.
Referring to the accompanying drawings, an example hand-held device 10 is shown.
The hand-held device 10 includes a display screen 11 and one or more user input keys or other manipulable inputs 12. Further, the hand-held device 10 carries an image receptor 15 such as a camera. In one embodiment the camera 15 is integrated within the hand-held device 10. In another embodiment (not shown) the camera 15 is removably attached to the hand-held device 10, such as with a clip-on fitting. In either case, it is preferred that the camera 15 is fixedly arranged in use with respect to a main body portion 10a of the hand-held device 10, such that the camera 15 moves together with the hand-held device 10.
The hand-held device 10 suitably includes a microphone or other audio input 13 and a speaker or other audio output 14. Suitably, the hand-held device 10 is a mobile cellular telephone. In this case a radio frequency (RF) communication unit 18 is provided having an aerial 19 for wireless communication, such as using GSM standards. In other embodiments the hand-held device 10 is arranged for local communication using, for example, Bluetooth or IEEE 802.11 WLAN protocols.
Referring to FIG. 3, an example method of controlling the hand-held device 10 comprises the following steps. A series of images is first captured by the camera 15 to provide image data 301.
Step 310 comprises producing motion data 302 from the image data 301. Here, the image processing unit 162 performs a motion detection algorithm to produce a motion data stream.
At step 320 the motion data 302 is supplied to the direction control unit 164 to control a function or operation of the hand-held device.
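By way of illustration only, the following sketch (in Java, the language of the implementation described later in this specification) shows how the image registering, image processing and direction control units described above might be composed. The interface names and signatures are illustrative assumptions rather than a prescribed implementation.

```java
// Illustrative sketch only: the unit names follow the description above,
// but this structure is one example and is not limiting.
interface ImageRegisteringUnit {
    int[] nextFrame(); // grey-scale pixel values of the next captured image
}

interface ImageProcessingUnit {
    // Derives motion data (e.g. {dx, dy} in pixels per frame) from two frames.
    float[] deriveMotion(int[] previousFrame, int[] currentFrame);
}

interface DirectionControlUnit {
    // Converts motion data into a directional command for the application.
    void applyCommand(float[] motionData);
}

final class MotionControlLoop {
    private final ImageRegisteringUnit camera;
    private final ImageProcessingUnit processor;
    private final DirectionControlUnit control;
    private int[] previousFrame;

    MotionControlLoop(ImageRegisteringUnit camera,
                      ImageProcessingUnit processor,
                      DirectionControlUnit control) {
        this.camera = camera;
        this.processor = processor;
        this.control = control;
    }

    // One pass of the capture -> motion detection -> command method above.
    void step() {
        int[] currentFrame = camera.nextFrame();
        if (previousFrame != null) {
            control.applyCommand(processor.deriveMotion(previousFrame, currentFrame));
        }
        previousFrame = currentFrame;
    }
}
```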
The images are preferably captured by the user holding the device 10 in free space, not closely adjacent to any particular object or surface. Ideally, the device is held in the hand at some distance from surrounding objects. Thus, the captured images represent the general surroundings of the hand-held device 10, whether indoors or outdoors. In preferred embodiments the device 10 is held between about 0.2 m and 2 m from surrounding objects. This range allows a good field of view from the camera 15 and provides image data suitable for motion detection.
The camera 15 is fixedly carried by the device 10, such that movement of the device 10 causes images captured by the camera 15 to change. The changed image reflects the change of position of the device 10. Advantageously, the user moves the entire device 10, which requires relatively large motor movements. Most users find it much easier to make large-scale movements with larger motor muscles in their hand, arm or wrist as opposed to making very small movements with fine motor muscles in their fingers or thumb.
Controlling the hand-held device 10 using images from the camera 15 provides a more intuitive and simpler user interface, compared with traditional keypads or other manipulable inputs. The user simply moves the whole device 10 rather than clicking a particular button.
The image-derived interface of the present invention also provides a richer experience for the user than can be achieved by conventional manipulable inputs. Most conventional user input techniques are restricted to translational movement in two directions (up-down and left-right). However, through suitable image signal processing by the image processing unit 162, with the present invention it is possible to distinguish three dimensions of translation (up-down, left-right and zoom-in/out) as well as three dimensions of rotation (pitch, roll and yaw). Although in practice few applications require control in all six of these dimensions of movement simultaneously, combinations of any two, three or more movements (such as pitch, roll and zoom) are immediately possible. Such combinations are especially useful within gaming applications, amongst others, replacing awkward and often unintuitive keypad combinations while still providing an equivalently complex user input.
FIGS. 5a, 5b, 5c and 5d are perspective views to illustrate example implementations of the preferred embodiment of the present invention.
FIG. 5a shows a plan view of the device 10 from above, in which the user rotates the device horizontally in order to control some computing application whose output is displayed on the screen 11 of the device.
FIG. 5b shows the same device 10 and user from below, including the lens 15a of the camera 15 mounted on the underside of the device.
FIG. 5c shows a side elevation of the device 10 and user, and an up-down tilting motion, which may be used to control an up-down motion of some element of a computing application. The field of view of the camera 15 is also illustrated.
FIG. 5d shows an end elevation of the device 10 and user with two further ranges of movement: a left-right tilting motion, and a zooming motion.
In addition to the six degrees of freedom of movement, suitable processing of images derived from the camera may also provide information about the motion of the device relative to specific objects in the environment, rather than to the general background, to provide input to a computing application. For example, the measured motion of the device relative to a physical obstacle may provide useful input to a game in which an avatar's position relative to virtual obstacles provides an element of game play.
In another embodiment the device is not held in the hand but is attached in some other way to the user, such that their movements, whether deliberate or not, effect directional control of some computing application. In one embodiment the device is wearable or otherwise readily portable. For example, the device is worn at a user's forehead, such that changes in the direction they face are used to control a computer game.
Optic Flow
Characteristics of the camera 15 are determined by the mobile device 10 in which this invention is embodied, while the characteristics of the output or function of a computing application depend on the particular purpose to which this invention is put. Therefore it is the characteristics and implementation of the motion detection algorithm that are discussed here in detail.
In one embodiment, the invention utilises measurements of optic flow within a series of images to determine the motion of the device 10 relative to its surroundings. Optic flow is the perceived motion of objects as the observer (in this case the camera 15) moves relative to them. For example, if the image of an object is expanding but not moving within the field of view, then the observer is moving straight towards that object. This measurement would then be interpreted as a translation of the device perpendicular to the plane of the camera 15, in effect a 'zoom-in' command issued by the user. Similarly, a series of images dominated by a parallel left-to-right shift corresponds to a shear of the device to the user's right, parallel to the dominant background.
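As a hypothetical illustration of this interpretation step, the following sketch classifies horizontal optic flow measured on the left and right halves of the image as either a zoom or a lateral shift. The function name, inputs and threshold are assumptions for the purpose of the example.

```java
// Hypothetical sketch: interpret horizontal optic flow measured on the left
// and right halves of the image (flowLeft, flowRight, in pixels per frame).
// When both halves shift together the device is translating laterally; when
// they diverge the device is zooming in (or out, when they converge).
static String classifyFlow(double flowLeft, double flowRight, double eps) {
    double common = (flowLeft + flowRight) / 2.0; // shared lateral shift
    double diverge = flowRight - flowLeft;        // expansion (+) or contraction (-)
    if (Math.abs(diverge) > Math.abs(common) + eps) {
        return diverge > 0 ? "ZOOM_IN" : "ZOOM_OUT";
    }
    if (common > eps)  return "SHIFT_RIGHT";
    if (common < -eps) return "SHIFT_LEFT";
    return "STATIONARY";
}
```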
Given sufficiently detailed and high-quality images, a large enough field of view, and sufficient computer processing power, it is possible to derive measures of all six degrees of freedom of rotation and translation of the camera 15. In addition, a measure of the time to impact with, and hence relative distance from, an obstacle in the surroundings can also be derived from the ratio between the perceived size of an object in an image and its rate of expansion.
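A minimal sketch of this time-to-impact estimate follows, assuming the apparent size of a tracked object is available in consecutive frames; the names are illustrative only.

```java
// Time to impact: tau = s / (ds/dt), where s is the apparent size (in pixels)
// of a tracked object in the image. Names are illustrative assumptions.
static double timeToImpactSeconds(double previousSize, double currentSize,
                                  double frameIntervalSeconds) {
    double expansionRate = (currentSize - previousSize) / frameIntervalSeconds;
    if (expansionRate <= 0.0) {
        return Double.POSITIVE_INFINITY; // object not expanding: no approach detected
    }
    return currentSize / expansionRate; // seconds to contact at constant velocity
}
```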
There are many techniques available for the computation of the optic flow characteristics of a series of images, and for subsequently determining the motion of the camera. Some of these techniques involve specialised hardware, including specialised image processing hardware and specialised photoreceptor arrays. Although such devices may be employed in one possible embodiment of this invention, the preferred embodiment requires no hardware modification to the digital device 10 on which the invention is intended to operate. Instead the preferred embodiment utilises the computing resources provided by the device to perform an algorithm to compute characteristics of the optic flow.
Optic flow is mathematically expressed as a vector field in the two dimensional visual field, and typical computer vision systems compute the values of this vector field by analysing differences in series of images.
The simplest types of method for computing optic flow, known as correlation algorithms, rely on spatial search to find the displacement of features between temporally adjacent images.
FIG. 6a is a schematic view of example image features, showing the location of a feature of an image in the first frame I1 of a series of images.
FIG. 6b shows how a correlation algorithm searches the space around that position in a subsequent frame I2. The location of the best match in the second frame I2 is found and the translation between the two locations determines an optic flow vector V, as shown in FIG. 6c.
The process is then repeated for other features in the first image I1 to produce a description of the optic flow vector V for some proportion of the image. If many partial match locations are found within the range of search (illustrated by a large circle in FIG. 6b), the best match may be ambiguous.
Correlation algorithms are conceptually simple and robust, but computationally expensive, since they require searching a potentially large 2D space to find feature matches. A typical correlation algorithm capable of determining the complete optic flow field of an image has complexity of the order O(V²S), where S is the image size and V is the maximum motion velocity that may be detected. This complexity is a particular concern when implementing such algorithms on mobile devices that have restricted computational power, and for applications in which real-time response to user input is required.
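By way of example, the following sketch shows one possible correlation search of this kind, matching a single feature patch between two grey-scale frames stored row-major. The names and the out-of-frame penalty are illustrative assumptions; the nested search loops make the O(V²S) cost apparent.

```java
// One feature match of a correlation algorithm: search a (2v+1) x (2v+1)
// neighbourhood of a patch at (fx, fy) in frame i1 for the displacement into
// frame i2 that minimises the sum of absolute differences. Assumes the patch
// lies wholly within frame i1. Illustrative sketch only.
static int[] matchFeature(int[] i1, int[] i2, int width, int height,
                          int fx, int fy, int patchSize, int v) {
    int bestDx = 0, bestDy = 0;
    long bestSad = Long.MAX_VALUE;
    for (int dy = -v; dy <= v; dy++) {
        for (int dx = -v; dx <= v; dx++) {
            long sad = 0;
            for (int y = 0; y < patchSize; y++) {
                for (int x = 0; x < patchSize; x++) {
                    int x2 = fx + x + dx, y2 = fy + y + dy;
                    if (x2 < 0 || y2 < 0 || x2 >= width || y2 >= height) {
                        sad += 255; // penalise candidates that leave the frame
                    } else {
                        sad += Math.abs(i1[(fy + y) * width + (fx + x)]
                                      - i2[y2 * width + x2]);
                    }
                }
            }
            if (sad < bestSad) { bestSad = sad; bestDx = dx; bestDy = dy; }
        }
    }
    return new int[] { bestDx, bestDy }; // optic flow vector V for this feature
}
```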
To reduce this cost, the preferred embodiment computes optic flow along one or more 1-dimensional (linear) arrays of pixels, as illustrated in FIGS. 8a and 8b, rather than searching the full 2D image. An additional refinement to this technique of using 1-dimensional arrays is to discount changes in average lighting intensity across an image by taking first or second derivatives of the image values along the length of the array, rather than absolute values.
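A minimal sketch of this derivative refinement, assuming an array of grey-level cell values, follows.

```java
// Replace absolute cell values with first differences along the array, so a
// uniform change of illumination between frames cancels out of the comparison.
static int[] firstDerivative(int[] cells) {
    int[] d = new int[cells.length - 1];
    for (int i = 0; i < d.length; i++) {
        d[i] = cells[i + 1] - cells[i];
    }
    return d;
}
```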
Such linear arrays may be combined in various configurations to provide estimates of various properties of the optic flow field and the relative motion of the camera with respect to a fixed background.
The accuracy of the relative motion of the camera can be further enhanced by combining independent estimations of the optic flow in each of the red, green and blue colour channels typically used to represent a digital image, rather than by making a single estimation of the optic flow using a grey-scale approximation of an original colour image.
FIG. 9a shows a configuration of arrays that may be used to estimate a zooming motion of the camera, or a left-right or up-down tilting motion, depending on the magnitude and sign of the measured components of motion of each array.
FIG. 9b shows a configuration of linear arrays that may be used to estimate horizontal rotation of the camera.
Thus a further refinement of this invention is to adjust the configuration of linear arrays, including the configurations shown in FIGS. 9a and 9b, depending on the components of motion to be estimated.
Example Applications
In a first preferred example, the hand-held device 10 is controlled in relation to an audio output, such as through the speaker 14. In these embodiments the direction control unit 164 causes a musical output to change, allowing the user to create music through movement of the hand-held device. The created music is stored, such as in the memory 17, for later retrieval, for example as a polyphonic ring tone.
In a second example, sound output is controlled with rising and falling pitch according to movement of the device 10 to create a “swoosh” or “light sabre” sound.
In other embodiments the device 10 is controlled to provide a textual input by a “hand writing” or “graffiti” function or a “dasher” type system.
In a third example the device 10 is controlled to navigate menu structures, to scroll the display 11 in 1, 2 or 3 dimensions, or to control the movement of a graphical pointer.
Many other creative gaming and imaging effects are also applicable in relation to the present invention. For example, shaking the device creates a snow storm effect to gradually white out an image displayed on the display screen 11. Alternatively, simple 2D line drawings are created through movement of the device. Further, many games employing motion are possible, such as a "pachinko" type game controlling the motion of balls falling across a screen, or a "ball maze" type game in which a ball is guided around a maze whilst avoiding holes. Other games include surfing, snowboarding or sky diving type activities where motion of the device 10 controls the displayed motion of an avatar.
Yet further applications of the present invention control operations within the device 10. For example, a mobile telephone recognises a physically inactive state (e.g. lying on a desk) and then recognises activity (i.e. the phone has been picked up). This activity recognition can then be used, for example, to answer an incoming voice call automatically when the user picks up the phone.
Another application is to allow a user to view an image that will not fit on the display of the device. Movement of the device can be used to scroll across the image to the left, right, up or down. Also, movement of the device towards or away from the user can zoom in or out on a part of the viewed image. Thus the impression is given that the user is viewing the image as if through a moving magnifying glass or window.
A further application is the use of the motion detection to navigate web pages. In such an application the up/down motion or motion towards/away from a user (which can be used to zoom) may also activate a chosen hyperlink.
In another gaming example, movement of the device 10 is used to generate a random number or pseudo random number, such that movement of the device 10 is equivalent to shaking of dice.
The present invention has many benefits and advantages, as will be apparent from the description and explanation herein. In particular, a hand-held device is provided which is simple and intuitive for a user to operate. The image-derived control operations avoid or reduce the need for the user to actuate a keypad or other manipulable input. Moving substantially the whole device reduces the level of user dexterity required to operate the device. In some embodiments of the invention, this may allow people with movement difficulties, such as due to illness or injury, to be better able to operate a device.
Although a few preferred embodiments have been shown and described, it will be appreciated by those skilled in the art that various changes and modifications might be made without departing from the scope of the invention, as defined in the appended claims.
Implementation On A Mobile Device
The algorithm tested detects optic flow translation across the visual field of the image in the x and y directions independently. The optic flow detector consists of three sets (one for each colour channel) of two crossed arrays, each filtered by a 1D Gaussian filter normal to the direction of the array. If the value of the cell at position i within an array at frame t is given by $I_{t,\mathrm{colour},\mathrm{orientation}}(i)$, where $0 \le i \le l$ and $l$ is the length of the array, then the displacement $d(t)$ between frames in each array is that which minimises the mean absolute difference between cells at that displacement:

$$d(t) = \operatorname*{arg\,min}_{-v \le d \le v} \; \frac{1}{l - |d|} \sum_{i} \left| I_t(i + d) - I_{t-1}(i) \right|$$

where $v$ is the maximum translation velocity to be detected, measured in pixels per frame. In order to detect false positives (i.e. camera motions that do not correspond to restricted tilting) a threshold $\theta$ was defined for this difference: if no correlation less than the threshold was found, then the translation in that array was recorded as not matched. If all arrays for an orientation, x or y, were matched, then the optic flow in each direction is calculated as the mean of the displacements within the three colour channels:

$$d_{\mathrm{orientation}}(t) = \frac{1}{3} \sum_{c \in \{r, g, b\}} d_{t, c, \mathrm{orientation}}(t)$$
The standard deviation and aperture of the Gaussian filter were both 70% of the total available image size, and the threshold was 2% of the total available image intensity.
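A hedged reconstruction of the displacement search described above follows, in Java for consistency with the tested implementation. The method names, and the use of a sentinel value for the "not matched" case, are illustrative assumptions rather than the tested implementation itself.

```java
// Displacement search for one linear array: find d in [-v, v] minimising the
// mean absolute difference (MAD) between the filtered cells of consecutive
// frames. Returns Integer.MIN_VALUE when no candidate MAD falls below the
// threshold theta, i.e. the array is recorded as "not matched".
// Assumes v < prev.length so every candidate overlaps at least one cell.
static int arrayDisplacement(double[] prev, double[] curr, int v, double theta) {
    int bestDisplacement = Integer.MIN_VALUE;
    double bestMad = theta;
    for (int d = -v; d <= v; d++) {
        double sum = 0.0;
        int count = 0;
        for (int i = Math.max(0, -d); i < prev.length && i + d < curr.length; i++) {
            sum += Math.abs(curr[i + d] - prev[i]);
            count++;
        }
        double mad = sum / count;
        if (mad < bestMad) {
            bestMad = mad;
            bestDisplacement = d;
        }
    }
    return bestDisplacement;
}

// Per-orientation flow: the mean of the matched displacements in the red,
// green and blue channels, as described above.
static double orientationFlow(int dRed, int dGreen, int dBlue) {
    return (dRed + dGreen + dBlue) / 3.0;
}
```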
Note that this differs from the technique used by Golland (P. Golland and A. M. Bruckstein, Motion from Color, Tech. Report, Israel Institute of Technology, 1997). Whereas she uses a colour conservation assumption (i.e. that the ratios between colour channels remain constant), here we are making a colour intensity conservation assumption (i.e. that the absolute intensity of each colour channel remains constant). This latter assumption was found to yield more accurate estimates of optic flow in practice, possibly for two reasons. First, we are interested in the small translations due to camera tilt, and the colour intensity assumption is more likely to hold in these cases than for larger motions. Second, the colour channels are filtered separately and, since division is not distributive over addition, the colour conservation assumption does not imply that the ratios between filtered colour channel values remain constant.
This algorithm was implemented in Java Micro Edition (J2ME) for the Mobile Information Device Profile (MIDP 2.0) with the Mobile Media API (JSR-135), and tested on a Nokia 7610 mobile phone with a Symbian Series 60 operating system, 8 MB of RAM (6 MB heap available) and a 32-bit ARM9 RISC CPU running at 123 MHz. The camera on this device captures video data at 15 frames per second at a resolution of 128×96 pixels with a field of view of 53°. This implementation was chosen since it represents a common, standard mobile platform. No attempt was made to optimise the implementation for the characteristics of the device, and hence the performance of the algorithm on this platform could reasonably be expected to be similar for a wide range of similar devices.
Testing And Evaluation
This implementation was tested in two ways: for accuracy and efficiency, and for usability.
Accuracy And Efficiency
The algorithm was tested for accuracy and efficiency against a set of short video clips of various interior scenes, recorded on the mobile phone camera as the device underwent a tilting motion that a user (in this case, the author) felt represented a clear command action. The optic flow in these clips was dominated by translation of known magnitude and direction, so a mean error, in pixels per frame, could be calculated. In total the algorithm was tested against 150 frames with an average translation of 8.13 pixels per frame (equivalent to the device having an angular velocity of 21° per second). Thus, the algorithm was tuned to recognise the action of a user, rather than the user being forced to adapt to a fixed interface. The efficiency of the algorithm was tested by measuring the time taken to calculate optic flow per frame on the target device.
The large size of the orthogonal filter, and the relatively large translations between frames, suggest that significant improvements in efficiency could be gained by decreasing the resolution of both the filter and the arrays used to calculate correlations. Instead of taking the value of every pixel when filtering the image, only every f-th is taken; and instead of calculating the value of every pixel (and the corresponding correlation) along the x and y arrays, only every a-th is taken. The effect of array and filter resolution on computation time and accuracy is summarised in Table 1.
It is clear that for this particular application the resolution of both the correlation array and filters can be lowered to give an increase in efficiency with little reduction in accuracy. In particular, an array and filter resolution of a=3 and f=4 gives a performance of 11 frames per second (over twice the base rate) while the error rate only increases from 0.9 to 1.1 pixels per frame.
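For illustration, the stride-based subsampling described above might be expressed as follows, with the stride set to a for the correlation arrays or f for the filter; the helper name is an assumption.

```java
// Take only every stride-th cell of an array (stride = a for the correlation
// arrays, stride = f for the filter), trading accuracy for speed.
static double[] subsample(double[] values, int stride) {
    double[] out = new double[(values.length + stride - 1) / stride];
    for (int i = 0; i < out.length; i++) {
        out[i] = values[i * stride];
    }
    return out;
}
```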
User Evaluation
Whether the performance of this algorithm is satisfactory for controlling a user interface was tested by two simple tasks. In the first, the tilt interface was used in place of the cursor arrow keys: the user was presented with a series of polar directions (up, down, left, right) and had to tilt the device in the desired direction. Once a movement of the device was registered, the next direction was presented to the user. Both the accuracy and the average time per "click" were recorded for a total of 100 clicks per user. As a control, the task was also repeated using the arrow keys of the phone d-pad, and also on a desktop computer using the cursor keys on a standard keyboard.
The second task was a proportional control task, in which the user was asked to follow a randomly moving circular target. The target had a radius of 25 pixels. (The screen resolution of the Nokia 7610 is 176×144 pixels.) As a control, the task was also repeated using the arrow keys of the phone d-pad, and on a desktop computer using a standard mouse. The proportion of time that the pointer strayed from the target over a one-minute period was recorded.
Five users were recruited for each task and were given a ten-minute practice session with the interface. The results are given in Table 2 and Table 3.
Although N is low in this case, the results give a qualitative indication of the potential of the interface. In the case of the direction-click task the error rate using the tilt interface is significantly higher than using either of the standard button-based interfaces, though the rate of response was comparable to that of the standard phone interface. Observation suggests that a significant source of error in registering a directional click was the tendency of users to “skew” the phone in a non-polar direction, rather than make a “clean” tilt.
For the target pursuit task, the error rate was similar to that of the standard phone keys but worse than that for the mouse interface. It should be noted that none of the users recruited for this task were teenagers or expert phone users, and they reported problems with the small size of the Nokia 7610 d-pad, particularly when required to give fast repeated clicks while following a moving target. This may partially explain the (relatively) poor performance of the d-pad interface compared to an ostensibly less reliable tilt interface.
Taken together, these results suggest that this optical flow algorithm is efficient enough to support a tilting vision-based interface. However, the high error rate on the repetition task may preclude it from being used as a straight replacement for cursor keys in applications, such as business or productivity applications, where accurate discrete cursor control is essential. The results on the target pursuit task suggest that the interface would be more suited to games or other applications where proportional-controlled movement is required, and where continuous feedback on the effect of users' input is available.
Attention is directed to all papers and documents which are filed concurrently with or previous to this specification in connection with this application and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference.
All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.
Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
The invention is not restricted to the details of the foregoing embodiment(s). The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed.
Number | Date | Country | Kind |
---|---|---|---
0503253.7 | Feb 2005 | GB | national |