The present invention relates to a method for controlling an interface by means of a camera fitted to a communications terminal. This interface may be graphic (controlling the display on a screen), audio (controlling the sound emitted by the loudspeakers of the piece of equipment), or both simultaneously (controlling a video).
This method applies notably, but not exclusively, to calculating an apparent movement in real time by means of a camera fitted to a communications terminal, to interpreting this apparent movement as user commands, and then to modifying the interface accordingly.
The method according to the invention is particularly adapted to communications terminals having limited resources both in computing power and in memory capacity.
This method may replace or advantageously complement certain repetitive key-press sequences on a terminal. The terminal may be a communications terminal, a computer or an audio or video terminal (hi-fi system, video player).
As needs and technology evolve, communications terminals increasingly handle rich multimedia content. Not only do terminals offer a wider variety of media, but the size of these media also keeps increasing: images are increasingly large and stored texts increasingly long.
Because of the small size of most communications terminals, the capacities of their display and input control devices are limited. This has the immediate consequence of considerably burdening the graphic interfaces of these terminals. For example, images or texts have to be displayed only partially in order to remain comfortably legible. Displacing the image or text then requires frequent pressing of several keys. Likewise, controlling the scrolling of an audio or video file is reduced to using the keys of the keyboard or of the remote control, which leaves little freedom for light, sound or video effects, such as mixing, adding percussion effects or other superposed audio or video effects.
In very many cases, the number of key presses rapidly becomes prohibitive and tedious for the user. Non-exhaustive examples include adjusting the brightness, the contrast or the sound volume levels, navigating in a menu or a set of icons, moving a graphic cursor, scrolling a text or image, changing the scale at which an image or a text is displayed, starting and moving through a tape or an audio or video file, scrolling a sound track at different speeds, or even controlling action games.
It is known that inputting user commands by simple voluntary movements of the communications terminal may advantageously replace certain repetitive key-press sequences. Notably, this principle makes it possible to use commands proportional to the displacement of the terminal, providing a form of feedback control favorable to better interaction between the user and the terminal, and therefore to greater comfort in use and more accurate control. Moreover, the use of commands formed by voluntary movements of the communications terminal opens new perspectives. This new user input may advantageously be used in conjunction with other terminals. For example, with this method the graphic cursor of a desktop computer may be controlled, or the volume, the contrast, the intensity or the scrolling of an audio or video file may be controlled on a piece of equipment such as a hi-fi system or video player, by means of the movements of the communications terminal. Also, external events may influence how the communications terminal interprets the apparent movement as commands; non-exhaustive examples are an incoming communication which inhibits the method so that the communication can be taken, or a network game taking into account the actions of the other players.
The movement of the communications terminal may be measured via specific sensors on board the terminal. These sensors are traditionally accelerometers or gyroscopes. With the latter it is often possible to reference the position or the orientation of the terminal in space in an absolute way. However, these sensors pose integration problems in increasingly small terminals and add production cost. Moreover, their accuracy does not always allow fine control of the interface by very low amplitude movements.
Now, communications terminals integrating a camera are more and more numerous. It is then legitimate to want to use this integrated camera for obtaining information on the movement of the terminal.
It is known that information on movement may be computed by means of a camera observing a textured and illuminated planar surface. However, computing this movement information becomes far more difficult when the camera fitted to a communications terminal observes an arbitrary scene without any constraint on illumination.
A first difficulty is that the camera fitted to a communications terminal does not generally observe a planar surface or even a single object, and therefore the observed movement results from the movement of both the camera and the objects present. Computing the three-dimensional movement of the camera with an arbitrary image sequence as the sole piece of information is still to a large extent an open problem, where most of the difficulties remain unsolved. In the present state of knowledge, it is therefore not conceivable to restore a posteriori all the movements of the terminal only from images acquired by the camera.
A second significant difficulty is that, as the illumination of the scene cannot be controlled by the device, even by using a flash, the color intensities of the textures recorded in the images of the camera vary in an unpredictable way in the successive images. This then prohibits the use of well-known techniques for computing the apparent movement based on the constancy of the intensities of the colors of the observed textures.
The object of the present invention is to find a remedy to these drawbacks and to allow the apparent movement to be computed in real time by means of images from the camera, and then to interpret this apparent movement as user commands. This type of system may advantageously be used when the intention is to navigate in a menu, to displace an image or text, or to position a graphic cursor, or even when games are played requiring the control of a movement in several directions simultaneously and intuitively, or else to control the sound volume, the sound or light contrast, the light intensity, the scrolling of an audio or video file, or to add sound effects by superposition on the audio file or to mix effects on sound or multimedia tapes.
Thus, the method according to the invention comprises the following steps:
The computation of the apparent movement is a problem widely dealt with in the literature; exhaustive surveys may notably be found in Brown, L. G., A survey of Image Registration Techniques, 1992, and Zitova and Flusser, Image Registration Methods: a survey, 2003.
Apart from the computation of a dense movement which is irrelevant in our case where a single piece of information on the movement is required, we note two main approaches for computing the apparent movement by means of parametric models: an indirect approach which consists of matching primitives from the images; and a direct approach which utilizes the equation for optical flow conservation, described in Horn and Schunck, Determining Optical Flow, 1981. This last very widespread approach assumes as a postulate that any variation in intensity of the images over time is exclusively due to the displacement of an object, the perceived intensity of which is supposed to be constant in the successive images, or from the observation point of the scene.
The indirect methods compute the movement in three steps: (i) extracting the primitives (corners, regions, etc.), (ii) pairing the primitives over several images, (iii) adjusting the parametric model. The delicate points of these methods are the selection of the primitives to be extracted, their number, and the rejection of false pairings. With these methods, it is possible to recover movements of large amplitude provided that certain primitives can be paired between successive images. Nevertheless, each of these steps may prove costly both in computational complexity and in memory occupancy. Accordingly, these methods do not seem suitable for applications on board terminals with limited memory resources and limited computing power, the cameras of which have low resolution in a preview mode.
The direct methods compute the movement from the intensities of the image. The computation of the dense movement is an under-determined problem which requires an additional constraint. For example, a dense displacement field is estimated by means of a regularity prior as in Horn and Schunck, Determining Optical Flow, 1981, or a constraint of local uniformity as in Lucas and Kanade, An Iterative Image Registration Technique with an Application to Stereo Vision, 1981. Searching for a movement described by a global parametric model, as described in Bergen et al., Hierarchical model-based motion estimation, 1992, introduces a sufficient constraint on the displacement field.
In order to compute the movement between two images, the parameters of the movement model which minimize a given criterion are sought. This criterion is most often of the least-squares type, and is computed globally over all the pixels of the image. It is also possible to generalize this criterion by a robust norm similar to the one described in Odobez and Bouthemy, Robust Multiresolution Estimation of Parametric Motion Models, 1995. However, the minimization of such a criterion becomes iterative and costly in computing time.
It is known that direct computing techniques do not allow movements of large amplitude to be estimated, in spite of the use of multi-scale techniques as in Burt and Adelson, The Laplacian Pyramid as a Compact Image Code, 1983.
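As an illustration of the least-squares criterion described above, and not of the computation claimed by the invention, the following Python sketch estimates a purely translational movement by exhaustive search. The function name `estimate_translation`, the parameter `max_shift`, the restriction to integer shifts and the use of exhaustive search in place of the gradient-based minimization of the cited literature are all assumptions made for this example.

```python
def estimate_translation(img1, img2, max_shift=2):
    """Search integer translations (dx, dy) with |dx|, |dy| <= max_shift and
    return the one minimizing the mean squared intensity difference
    img2[y + dy][x + dx] - img1[y][x] over the overlapping pixels.

    Images are non-empty lists of equal-length rows of numbers.
    Convention (assumed): the scene content moves by (dx, dy) from img1 to img2.
    """
    h, w = len(img1), len(img1[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0, 0
            # Only visit pixels whose shifted position stays inside img2.
            for y in range(max(0, -dy), min(h, h - dy)):
                for x in range(max(0, -dx), min(w, w - dx)):
                    diff = img2[y + dy][x + dx] - img1[y][x]
                    total += diff * diff
                    count += 1
            if count == 0:
                continue
            cost = total / count  # least-squares criterion, normalized by overlap
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best
```

The cost of this search grows with the square of `max_shift`, which illustrates why large-amplitude movements are expensive to estimate at full resolution.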
In order to find a remedy to these drawbacks and to thereby reduce the computing time and compute apparent movements of large amplitude, the method according to the invention proposes preprocessing of the images by reducing them by a predetermined factor f.
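A minimal sketch of such a reduction is given below, assuming a grayscale image stored as a list of rows and an integer factor f; the block-averaging scheme and the name `downscale` are choices made for this example, the text above specifying only that the images are reduced by a predetermined factor f.

```python
def downscale(image, f):
    """Shrink a non-empty grayscale image (list of equal-length rows of ints)
    by an integer factor f, replacing each f x f block with its integer mean.
    Partial blocks at the right and bottom edges are dropped."""
    if f < 1:
        raise ValueError("f must be a positive integer")
    out_h, out_w = len(image) // f, len(image[0]) // f
    reduced = []
    for by in range(out_h):
        row = []
        for bx in range(out_w):
            total = 0
            for y in range(by * f, (by + 1) * f):
                for x in range(bx * f, (bx + 1) * f):
                    total += image[y][x]
            row.append(total // (f * f))
        reduced.append(row)
    return reduced
```

After reduction, a displacement of d pixels in the original image corresponds to roughly d/f pixels, and the number of pixels to process is divided by about f squared.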
As explained above, because of frequent and unpredictable changes in the illumination conditions of the scene and in the automatic white balance control of the camera, the color intensities of the recorded textures vary between successive images. Now, the direct methods based on intensity differences between images are very sensitive to such variations and may then provide approximate or even absurd results.
In order to find a remedy to this drawback, the method according to the invention comprises preprocessing of the images by histogram equalization so as to restore a series of images, the intensity levels of which are then standardized.
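Histogram equalization can be sketched as follows, assuming a non-empty grayscale image with integer intensities in the range 0 to `levels - 1`; the function name and the exact normalization are assumptions made for this example and are not taken from the text.

```python
def equalize_histogram(image, levels=256):
    """Return a copy of a grayscale image (rows of ints in [0, levels - 1])
    with intensities remapped through the normalized cumulative histogram."""
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[v] += 1
    # Cumulative count of pixels with intensity <= each level.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is nothing to spread out.
        return [row[:] for row in image]
    scale = (levels - 1) / (total - cdf_min)
    # Look-up table; entries below the darkest present level are unused.
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[v] for v in row] for row in image]
```

Two images of the same scene that differ only by a uniform brightness offset map to the same equalized image, which is the property that makes subsequent intensity-based comparison less sensitive to illumination changes.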
Further, the images acquired in an economical mode by the camera are generally of low resolution and noisy.
In order to suppress this drawback, the invention proposes their preprocessing by reducing the number of representation levels of the color intensities.
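One simple way to reduce the number of representation levels is to keep only the most significant bits of each intensity. The sketch below assumes 8-bit input intensities and keeps 4 bits by default; both values, and the name `reduce_levels`, are illustrative assumptions.

```python
def reduce_levels(image, bits=4, in_bits=8):
    """Quantize intensities coded on in_bits bits down to 2**bits levels
    by discarding the least significant bits."""
    if not 0 < bits <= in_bits:
        raise ValueError("bits must be between 1 and in_bits")
    shift = in_bits - bits
    return [[v >> shift for v in row] for row in image]
```

Small fluctuations that stay within one quantization bin are mapped to the same level; fluctuations that straddle a bin boundary are not removed.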
It is known that multi-scale techniques pose the delicate problem of propagating the motion information from one scale to the next. However, these methods compute an accurate movement when they are well initialized.
The object of the method according to the invention is notably to find a remedy to this drawback by performing the computation of the apparent movement with two successive images possibly preprocessed as follows:
The method according to the invention proposes that the computation of an apparent translational movement m is performed by means of two images I1 and I2, and comprises the following steps:
Owing to the degradation of the images transmitted by the camera in the economical acquisition mode, the computation may provide an apparent movement which is corrupted by noise, or which may have absurd values.
Advantageously, filtering the apparent movement may then consist of canceling each of its components if the latter, as an absolute value, is lower than a predetermined threshold and, in the other cases, of reducing or increasing it by this same threshold. A non-limiting example of such filtering in the case of translation is given by the following formula:
m′ = (m1′, m2′) = (sign(m1)·max(0, |m1| − s), sign(m2)·max(0, |m2| − s)),
where s denotes the predetermined threshold.
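The formula above can be sketched in Python as follows; the names `sign` and `filter_movement` are chosen for this example only.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def filter_movement(m, s):
    """Apply the threshold filter to each component of the movement m:
    components with |c| <= s are canceled, the others are moved toward
    zero by s."""
    return tuple(sign(c) * max(0, abs(c) - s) for c in m)
```

For instance, with a threshold s = 2, the movement (5, −1) is filtered to (3, 0): the small vertical component, presumably noise, is canceled, while the horizontal component is reduced by the threshold.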
Advantageously, in order to filter the absurd results out of the movement computation, filtering may consist of imposing an upper limit and lower limit for each of its components.
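Such a bound can be sketched as a component-wise clamp; the name `clamp_movement` and the use of a single pair of limits for all components are assumptions made for this example.

```python
def clamp_movement(m, lower, upper):
    """Force each component of the movement m into the interval [lower, upper]."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    return tuple(min(upper, max(lower, c)) for c in m)
```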
Advantageously, the displacement of the graphic elements or the adjustment of the sound or light or contrast level or the scrolling of the audio or video file will be performed in proportion to the computed apparent movement, with a gain possibly proportional to this apparent movement.
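A minimal sketch of such a proportional command is given below. The decomposition of the gain into a constant part `base_gain` and a part `accel` proportional to the magnitude of the movement is an assumption made for this example; the text states only that the gain may be proportional to the apparent movement.

```python
def displacement(m, base_gain=1.0, accel=0.0):
    """Map an apparent movement m = (m1, m2) to an interface displacement.

    The applied gain is base_gain + accel * |m|, so that with accel > 0
    larger movements are amplified more than smaller ones.
    """
    magnitude = (m[0] ** 2 + m[1] ** 2) ** 0.5
    gain = base_gain + accel * magnitude
    return (gain * m[0], gain * m[1])
```

With `accel = 0` the displacement is strictly proportional to the movement; with `base_gain = 0` and `accel > 0` it grows quadratically, which allows both fine positioning and fast traversal with the same gesture.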
The present invention also proposes that the apparent movement be interpreted as commands of the graphic and/or audio and/or video interface according to the application context and/or the simultaneous pressing of one or several keys of the keyboard by the user.
The different modes for controlling the graphic interface according to the invention concern:
The different modes for controlling the audio interface according to the invention concern:
The graphic and/or audio and/or video elements which may be controlled in this way may consist in:
For example, an apparent movement in a certain direction may be interpreted as a command for changing scale by forward zooming, and as a command for changing scale in the opposite direction by backward zooming. Also, an apparent movement in a certain direction may be interpreted as a command for displacing a graphic and/or audio and/or video element in the same direction or in the opposite direction. An apparent movement in a certain direction may be interpreted as a command for rotating a graphic element in a certain direction and in the opposite direction when the filtered apparent movement is in an opposite direction. An apparent movement in a certain direction may be interpreted as a command for increasing the sound or light or contrast level and for reducing the sound or light or contrast level when the filtered apparent movement is in an opposite direction.
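The interpretation examples above can be sketched as a small dispatch function. The mode names, the axis assigned to each mode and the sign conventions are arbitrary assumptions made for this example; the text specifies only that opposite directions produce opposite commands.

```python
def interpret(m, mode):
    """Turn a filtered movement m = (m1, m2) into an (action, amount) command.

    The axis and sign chosen for each mode are illustrative only.
    """
    m1, m2 = m
    if mode == "move":
        # Displace the element by the movement itself.
        return ("move", (m1, m2))
    if mode == "zoom":
        amount, names = m2, ("zoom_in", "zoom_out")
    elif mode == "rotate":
        amount, names = m1, ("rotate_clockwise", "rotate_counterclockwise")
    elif mode == "volume":
        amount, names = m2, ("volume_up", "volume_down")
    else:
        raise ValueError(f"unknown mode: {mode}")
    if amount == 0:
        return ("none", 0)
    # Opposite directions yield opposite commands.
    return (names[0] if amount > 0 else names[1], abs(amount))
```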
The method according to the invention may also be used for controlling graphic and/or audio and/or video elements of another terminal connected by a wired or wireless link (infrared, Bluetooth, Wi-Fi, GSM, GPRS, UMTS, CDMA or W-CDMA or Internet) to the communications terminal which measures the apparent movement. An application of this method may therefore consist of controlling the graphic cursor of a PC or another terminal from a communications terminal equipped with an integrated camera.
Advantageously, the apparent movement may be computed and interpreted as a user command only when a key associated beforehand with a control of the interface is kept pressed down, and may no longer be computed or interpreted as a user command when none of these keys is pressed down.
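This key-gated behavior can be sketched as follows. The names `process_frame`, `key_bindings` and `compute_movement` are hypothetical; `compute_movement` stands for whatever routine computes the apparent movement and is passed in as a callable so that it is not invoked when no bound key is held down.

```python
def process_frame(pressed_keys, key_bindings, compute_movement):
    """Return (control, movement) for the first pressed key that is bound
    to an interface control, or None if no bound key is pressed.

    pressed_keys: ordered iterable of keys currently held down.
    key_bindings: dict mapping a key to the control it is associated with.
    compute_movement: zero-argument callable returning the apparent movement;
        it is called only when a bound key is pressed.
    """
    for key in pressed_keys:
        if key in key_bindings:
            return key_bindings[key], compute_movement()
    return None
```

Skipping the computation entirely when no bound key is pressed avoids spending processing time on a terminal with limited resources.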
The method according to the invention also allows other user inputs to be taken into account in combination with the apparent movements, such as voice commands or commands received from an external keyboard or from another terminal physically connected or connected via infrared, Bluetooth, Wifi, GSM, GPRS, UMTS, CDMA or W-CDMA or Internet.
It is also possible with this invention to adjust the sound and light levels and contrasts, to trigger a sound, a series of sounds, the scrolling of an audio or video file, fast scrolling in one direction or in the other of an audio or video file, to produce superposition effects of sounds or images or mixing effects of sound by means of the voluntary or involuntary movement of the user of the piece of equipment.
Embodiments of the invention will be described hereafter, as non-limiting examples with reference to the appended drawings, wherein:
In the examples shown in
In the example shown in
| Number | Date | Country | Kind |
|---|---|---|---|
| 0508188 | Jul 2005 | FR | national |
| 0603525 | Apr 2006 | FR | national |

| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/FR2006/001846 | 7/26/2006 | WO | 00 | 1/29/2008 |