There are many computing devices available which allow touch-based input, such as many smart phones and tablet computers. Some of these devices also offer gesture-based input, where a gesture involves the motion of a user's hand, finger, body etc. An example of a gesture-based input is a downwards stroke on a touch-screen which may translate to scrolling the window downwards. Some touch-sensitive devices can detect multiple simultaneous touch events which enables detection of multi-touch gestures. An example of a multi-touch gesture-based input is a pinching movement on a touch-screen which may be used to resize (and possibly rotate) images that are being displayed. These computing devices which offer gesture-based input comprise gesture recognizers (implemented in software) which translate the touch sensor information into gestures which can then be mapped to software commands (e.g. scroll, zoom, etc).
In order to train and evaluate gesture recognizers, recordings of actual gestures made by human users are used. However, these recordings can contain imprecise gestures, which makes it difficult to test the full behavior of a gesture recognizer. Furthermore, because the instructions given to the users who generate the recorded gestures are interpreted subjectively, it may be necessary to check the recordings manually to ensure that the recorded gestures actually correspond to the expected gesture.
The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known methods of training or evaluating gesture recognizers.
The following presents a simplified summary of the disclosure in order to provide a basic understanding to the reader. This summary is not an extensive overview of the disclosure and it does not identify key/critical elements of the invention or delineate the scope of the invention. Its sole purpose is to present some concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later.
A synthetic gesture trace generator is described. In an embodiment, a synthetic gesture trace is generated using a gesture synthesizer which may be implemented in software. The synthesizer receives a number of inputs, including parameters associated with a touch sensor to be used in the synthesis and a gesture defined in terms of gesture components. The synthesizer breaks each gesture component into a series of time-stamped contact co-ordinates at the frame rate of the sensor, with each time-stamped contact co-ordinate detailing the position of any touch events at a particular time. Sensor images are then generated from the time-stamped contact co-ordinates using a contact-to-sensor transformation function. Where there are multiple simultaneous contacts, there may be multiple sensor images generated having the same time-stamp and these are combined to form a single sensor image for each time-stamp. This sequence of sensor images is formatted to create the synthetic gesture trace.
Many of the attendant features will be more readily appreciated as the same becomes better understood by reference to the following detailed description considered in connection with the accompanying drawings.
The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein:
Like reference numerals are used to designate like parts in the accompanying drawings.
The detailed description provided below in connection with the appended drawings is intended as a description of the present examples and is not intended to represent the only forms in which the present example may be constructed or utilized. The description sets forth the functions of the example and the sequence of steps for constructing and operating the example. However, the same or equivalent functions and sequences may be accomplished by different examples.
Training a gesture recognizer using recordings of actual gestures is problematic because the recordings can contain imprecise gestures and it is very difficult to obtain a range of recordings which test the full behavior of a gesture recognizer. Training and evaluation of a gesture recognizer can be improved through the use of synthetically generated gestures, and methods and apparatus for synthesizing gestures are described below. Synthesized gestures can be made with a precision and regularity which a human generating a gesture recording cannot replicate. This enables a more thorough evaluation of the code within a gesture recognizer, in particular by exploring the boundary cases for recognizing gestures (e.g. gestures which are only just long enough to qualify as a particular gesture, or are very quick or slow, or start at the boundary of a defined ‘start’ region for a gesture, etc.).
The lower diagram in
One of the inputs 118 comprises parameters associated with the sensor (e.g. sensor 106) the operation of which is being synthesized. Such parameters may comprise some or all of: the physical dimensions of the sensor (e.g. height and width), the number of sensor points (e.g. a grid of 15×13 cells), the output frequency (e.g. in frames/second) and the number of quantization levels of capacitance that the sensor is aware of (e.g. 32 levels). There may also be additional parameters such as the base capacitance level, i.e. the capacitance value of the sensor when there is no contact with the sensor, (e.g. if it is non-zero, as described below), data specifying the sensor layout where it comprises an irregular grid of cells, the percentage of a sensor which may fail, etc. Through specification of the appropriate parameters in this input 118, the synthesizer 112 may be used to generate synthetic traces for any capacitive sensor and these sensors may be used in many different types of devices, from small devices (e.g. smart phones or touch-sensitive mice or input pads) to very large devices (e.g. surface computers or large touch-sensitive televisions).
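By way of illustration only, the sensor parameters in this input might be grouped into a simple structure such as the following Python sketch; the names (SensorParameters, width_mm, etc.) and the example values are assumptions made for this sketch and are not part of any particular implementation described above.

from dataclasses import dataclass

@dataclass
class SensorParameters:
    """Hypothetical container for the sensor parameters in input 118."""
    width_mm: float            # physical width of the sensor
    height_mm: float           # physical height of the sensor
    cols: int                  # number of sensor cells across (e.g. 15)
    rows: int                  # number of sensor cells down (e.g. 13)
    frame_rate_hz: float       # output frequency in frames/second
    quant_levels: int = 32     # quantization levels of capacitance the sensor reports
    base_capacitance: int = 0  # level reported when nothing touches the sensor

# Example: a small 15 x 13 sensor reporting 100 frames/second
params = SensorParameters(width_mm=60.0, height_mm=52.0,
                          cols=15, rows=13, frame_rate_hz=100.0)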
Another of the inputs 119 comprises data on building blocks (which may also be referred to as ‘gesture components’) from which gestures may be constructed. These building blocks may themselves be constructed from geometric constructs, e.g. a labeled 5-part vector describing a vertical scroll movement. In an example, each building block may be defined as a vector comprising the start and end coordinates of the movement (normalized so that they are in the range 0 to 1), the duration of the gesture and a unique ID (which may be referred to as a ‘vector tag’). The building block may also comprise timing information, such as any time offset before the gesture starts (e.g. 3 seconds from the start of “recording”) and/or the time taken to perform the gesture (for example, if a single finger starts to move “fast enough” the movement may be interpreted as a “flick” and correspond to one command, whereas moving the same finger “slowly” may be considered a “scroll”; in such an example, the timing information may detail the amount of time taken for a finger to move between the start and end points). Where timing information is included, it may be in terms of absolute values or normalized values. In some examples, however, any timing information that is required may be specified at a higher level (e.g. in the data defining the required gestures 120). In an example, each building block describes a single-touch gesture (e.g. a gesture using a single finger) and a multi-touch gesture may be built up from multiple building blocks (e.g. 3 distinct IDs would be used to form a three-finger gesture). Single-touch gestures may also be built up from multiple building blocks, as described in more detail below. In some examples, a building block may have multiple segments and so may be defined in terms of one or more vectors.
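A building block of this kind could be represented along the following lines; this is only a sketch, and the names (GestureComponent, duration_s, start_offset_s, etc.) are assumptions made for illustration rather than any format described above.

from dataclasses import dataclass

@dataclass
class GestureComponent:
    """Hypothetical building block: one straight-line segment of a contact path."""
    component_id: int            # unique ID (the 'vector tag')
    start_x: float               # normalized start co-ordinate (0..1)
    start_y: float
    end_x: float                 # normalized end co-ordinate (0..1)
    end_y: float
    duration_s: float            # time taken to perform the movement
    start_offset_s: float = 0.0  # optional delay before the movement starts

# Example: a vertical scroll movement lasting half a second
scroll_down = GestureComponent(component_id=0,
                               start_x=0.5, start_y=0.2,
                               end_x=0.5, end_y=0.8,
                               duration_s=0.5)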
A further one of the inputs 120 comprises data which defines the gestures which are to be synthesized by reference to one or more building blocks (or gesture components) and may be referred to as ‘gesture data’. In an example, this data may be provided in the form of an input script, which may be an XML file, which defines one or more output file names and for each output file name provides a sequence of building block IDs with any necessary timing information (e.g. where it is not provided within a building block). Complex gestures may be built up from a sequence of line segments, each defined by a separate building block. Each output file name corresponds to a generated synthetic trace and through use of such an input script, many traces can be defined for generation. In an example, the input script may also include the sensor parameters 118 described above, particularly where the sensor parameters 118 are the same for all traces generated by a single script but may be different for other input scripts. Alternatively, the input script may include a reference to a particular set of sensor parameters 118. In an example, the building block vectors themselves may be included within the input script (in addition to, or instead of, the building block IDs); however, it may be more efficient and flexible to store the building block data separately to the scripts such that the gesture synthesizer 112 accesses the required building blocks as defined by the input script.
An example of an input script in XML is shown below which comprises a three-level structure: a <file> may contain multiple <vector> entries, each for a specific contact path. These in turn may contain multiple <line> entries (each corresponding to a building block) which define the line segments making up the contact path. Multiple building blocks may therefore be used to define complex paths for a single contact.
<configuration>
  <file path="Vertical.synth">
    <vector id="0">
  <file path="Horizontal.synth">
    <vector id="0">
  <file path="TwoFingerVertical.synth">
    <vector id="0">
    <vector id="1">
  <file path="Compound.synth">
    <vector id="0">
There are many different ways in which an input script (e.g. as described above) could be generated and in an example the script may be generated by hand, i.e. defined from line segments (as described above). In another example, however, the script may be extracted from an existing gesture recording and then some noise or spatial/temporal shifting may be added. This example generates alternatives to existing recorded gestures. For example, using this method, the synthetic gesture generator may be used to generate the traces for slowed-down (or sped-up) versions of existing user gestures. In a further example, the script may be a creative morph of one or more gesture traces to generate new traces which can be used to test or train a gesture recognizer. In yet another example, the script may be generated using a degree of randomness in order to simulate many (or all) possibilities of what the sensor could sense, regardless of whether or not fingers are used. Such a script could be used to detect noise in a sensor or to remove inadvertent actuation. In a further example, an existing gesture dataset recorded on a different device may be used to generate scripts for a new sensor/device (even where the existing sensor and the new sensor are of a completely different form) and then the new synthesized gesture data could be used to test the gesture recognition. Where the existing and new sensors have a different form or are otherwise very different, the generation of scripts may use a topological or form-fitting function.
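As an illustration of the second approach (perturbing an existing recording), a script generator might look something like the following Python sketch. The <line> attribute names (x1, y1, x2, y2, duration) are assumptions made for this example only, since the attribute layout of a <line> entry is not reproduced in the script above.

import random
import xml.etree.ElementTree as ET

def perturbed_script(path, recorded_points, time_scale=2.0, jitter=0.01):
    """Build a hypothetical input script from a recorded contact path.

    recorded_points is a list of (t, x, y) tuples with x and y normalized
    to 0..1; time_scale > 1 slows the motion down and jitter adds a little
    spatial noise to each segment end point.
    """
    config = ET.Element("configuration")
    file_el = ET.SubElement(config, "file", path=path)
    vector = ET.SubElement(file_el, "vector", id="0")
    for (t0, x0, y0), (t1, x1, y1) in zip(recorded_points, recorded_points[1:]):
        ET.SubElement(vector, "line",
                      x1=f"{x0 + random.uniform(-jitter, jitter):.3f}",
                      y1=f"{y0 + random.uniform(-jitter, jitter):.3f}",
                      x2=f"{x1 + random.uniform(-jitter, jitter):.3f}",
                      y2=f"{y1 + random.uniform(-jitter, jitter):.3f}",
                      duration=f"{(t1 - t0) * time_scale:.3f}")
    return ET.tostring(config, encoding="unicode")

# Example: a two-point vertical recording slowed down by a factor of two
print(perturbed_script("SlowVertical.synth", [(0.0, 0.5, 0.2), (0.4, 0.5, 0.8)]))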
Although
Based on the input data (from blocks 202-206) the synthesizer generates a series of time-stamped contact co-ordinates for each gesture component (block 208). The contact co-ordinates are generated at the frame rate of the sensor (as detailed in the data received in block 204), such that, in a simple example, if a gesture component defines a movement which starts at time t=0 and lasts for 1 second, a contact co-ordinate would be generated for that gesture component at each of t = {0, 1/f, 2/f, 3/f, …, f/f} seconds, where f is the frame rate of the sensor in frames/second. In generating the contact co-ordinates (in block 208), the normalized co-ordinates specified in the gesture component may be scaled according to the actual dimensions of the sensor (as defined in the sensor parameters received in block 204). Alternatively, the values of the co-ordinates may be scaled to real-world values at the stage where the sensor image's capacitance levels are calculated (i.e. in block 210). The contact co-ordinates for each time-stamp are generated (in block 208) by dividing the total motion into the appropriate number of steps, given the time taken to complete the motion and the frame rate of the sensor (f steps in the simple example given). The process is repeated (within block 208) for each gesture component to produce a series of time-stamped contact co-ordinates for each gesture component.
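A minimal sketch of this interpolation step (block 208), reusing the hypothetical structures from the earlier sketches, might look as follows; it is illustrative only and does not reproduce the internals of any particular synthesizer.

def contact_coordinates(component, params):
    """Interpolate one gesture component into time-stamped contact co-ordinates.

    One co-ordinate is produced per sensor frame, so a movement lasting
    duration_s seconds at f frames/second yields duration_s * f + 1 samples.
    The co-ordinates stay normalized (0..1); scaling to the physical sensor
    can be deferred to the sensor-image stage.
    """
    f = params.frame_rate_hz
    steps = int(round(component.duration_s * f))
    samples = []
    for i in range(steps + 1):
        frac = i / steps if steps else 1.0
        t = component.start_offset_s + i / f
        x = component.start_x + frac * (component.end_x - component.start_x)
        y = component.start_y + frac * (component.end_y - component.start_y)
        samples.append((t, x, y))
    return samples

# The half-second scroll from the earlier sketch at 100 frames/second gives 51 samples
# samples = contact_coordinates(scroll_down, params)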
A sensor image is constructed for each time-stamp (in block 210) based on the series of time-stamped contact co-ordinates (generated in block 208) and a contact-to-sensor function and two example methods of performing this are shown in the flow diagrams of
In the second example method shown in
It can be seen that the two methods shown in
However, this can be simplified to:
y = y1 · y2 · h
where:
y1 = 1 − (w − 2x2)
y2 = 1 − (w − 2z2)
The resultant sensor image comprises a matrix of capacitance values (which may be defined in terms of the levels of capacitance that the sensor detects) at each cell center. In an example, the matrix for the 7×7 grid shown in the lower part of
When adding a sensor image to an existing sensor image (e.g. in block 308 or 312) and where the images have contact areas which partially overlap (e.g. due to two fingers being next to each other and in contact with each other), the way the two images are added takes into consideration the sensor type and the way that the sensor responds to touch events. In the example described above with reference to
the resultant matrix will be:
In an example where the base capacitance value is non-zero (e.g. 200), the actual sensor images may be calculated as described above, but in generating the final image (e.g. the resultant matrix above), all the images for a particular time-stamp may be subtracted from the base image. Using the example above, the resultant matrix would be:
In some examples, it may be useful to add at least one blank (or empty) sensor image onto the end of the series of sensor images (block 211). In an example, eight blank sensor images may be added onto the end of the series of images generated in block 210. These one or more blank sensor images at the end of the data may enable the gesture recognizer 102 to recognize the end of a gesture and/or may be used to clear any cache of sensor images within the gesture recognizer.
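Purely as an illustration of the image construction and combination described above (blocks 210 and the addition of overlapping contact images), a sketch might proceed as follows. It reuses the hypothetical SensorParameters structure from earlier; the linear distance fall-off used in contact_image is an assumption standing in for the contact-to-sensor transformation function shown in the figures, and the handling of a non-zero base level follows the subtract-from-base approach described above.

def contact_image(x, y, params, radius_cells=1.5):
    """Rasterize one normalized contact (x, y in 0..1) onto the sensor grid.

    A simple linear fall-off with distance from the contact centre stands in
    for the real contact-to-sensor transformation function.
    """
    peak = params.quant_levels - 1
    cx, cy = x * (params.cols - 1), y * (params.rows - 1)
    image = [[0] * params.cols for _ in range(params.rows)]
    for r in range(params.rows):
        for c in range(params.cols):
            d = ((c - cx) ** 2 + (r - cy) ** 2) ** 0.5
            image[r][c] = max(0, int(round(peak * (1 - d / radius_cells))))
    return image

def combine_images(images, params):
    """Sum the per-contact images for one time-stamp, clipping at the top level."""
    combined = [[0] * params.cols for _ in range(params.rows)]
    for image in images:
        for r in range(params.rows):
            for c in range(params.cols):
                combined[r][c] = min(params.quant_levels - 1,
                                     combined[r][c] + image[r][c])
    return combined

def apply_base(image, params):
    """For a sensor with a non-zero base level, report the contact image
    subtracted from the base capacitance rather than the raw values."""
    return [[max(0, params.base_capacitance - v) for v in row] for row in image]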
Having generated the series of sensor images (in block 210, e.g. as shown in the flow diagram in
Although the gesture synthesizer is described above as generating a synthetic gesture trace comprising a series of synthesized sensor images in a format that is the same as a recorded gesture, in some examples the format may not be the same (e.g. it may be slightly different to a recorded gesture). For example, a user input device (such as a touch-sensitive mouse) may produce the series of images from a recorded gesture in the form of multiple USB packets per sensor frame and the gesture synthesizer may produce a corresponding set of images but with one USB packet per frame.
The synthesizer may be able to output more than one type of file and may output a single file type for a synthesized gesture or a number of files of different types for a single synthesized gesture. A first example output file is a binary file which comprises the sensor data (described above), a free-form data block detailing information regarding the sensor being synthesized (e.g. device type) and, in addition, for each time-stamp, details of any active contact points and whether these contact points are at the start/middle/end of a gesture. A second example output file is an XML file which comprises a binary stream of sensor data (run-length encoded, as described above) and a free-form data block (which may detail information regarding the sensor being synthesized). A third example output file (which may also be an XML file) details just the series of time-stamped contact co-ordinates without the sensor data; this file may be used to control a robot for automated device verification, as described in more detail below. In a variation of this third example, the output file may comprise vector instructions instead of the time-stamped contact co-ordinates, for example where the particular robot implementing the commands only requires instructions to start moving in a defined direction at a particular speed and to stop moving at particular time-stamps, rather than specific co-ordinates for each time-stamp.
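The exact run-length encoding used in the second example output file is not detailed above; the following sketch shows one simple value/count scheme, included only to illustrate how a sequence of mostly-empty sensor images compresses well.

def run_length_encode(frames):
    """Run-length encode a sequence of sensor images as (value, count) pairs.

    frames is a list of matrices of capacitance levels. The packing used by
    any particular output file format is not specified here.
    """
    flat = [v for frame in frames for row in frame for v in row]
    runs = []
    for v in flat:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

# A mostly-empty frame compresses to a handful of runs
# runs = run_length_encode([combine_images([contact_image(0.5, 0.5, params)], params)])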
The example method shown in
In the example described above, the series of time-stamped contact co-ordinates are generated (in block 208) by dividing the motion into the same number of steps as there are frames. In some examples, however, the resultant co-ordinates may then be adjusted (within block 208 of
The first of the parameters considered in
The second of the parameters considered in
In an example embodiment of a gesture synthesizer, the synthetic gesture traces may be generated using a number of assumptions. These assumptions may comprise: that all files generated start at the same time offset (e.g. t=0) and the size and shape of the contact generated (e.g. the finger size, which may, for example, be as shown in the upper diagram of
Computing-based device 800 comprises one or more processors 802 which may be microprocessors, controllers or any other suitable type of processors for processing computer executable instructions to control the operation of the device in order to generate synthetic gesture traces. In some examples, for example where a system on a chip architecture is used, the processors 802 may include one or more fixed function blocks (also referred to as accelerators) which implement a part of the method of generating a synthetic gesture trace in hardware (rather than software or firmware). Platform software comprising an operating system 806 or any other suitable platform software may be provided at the computing-based device to enable application software 808, including a gesture synthesizer 810, to be executed on the device.
The computer executable instructions may be provided using any computer-readable media that is accessible by computing-based device 800. Computer-readable media may include, for example, computer storage media such as memory 804 and communications media. Computer storage media, such as memory 804, includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store information for access by a computing device. In contrast, communication media may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transport mechanism. Although the computer storage media (memory 804) is shown within the computing-based device 800, it will be appreciated that the storage may be distributed or located remotely and accessed via a network or other communication link (e.g. using communication interface 818).
The memory 804 may also be used to store one or more of the gesture component data 812, the sensor parameters 814, the data defining required gestures 816 and the generated synthetic gesture traces 817. The input data 812, 814, 816 to the gesture synthesizer 810 may alternatively be received via the communication interface 818 or be input directly by a user (e.g. via input/output controller 822).
The computing-based device 800 may also comprise an input/output controller 822 which is arranged to receive and process input from one or more devices, such as a user input device 823 (e.g. a mouse or a keyboard). As described above, this user input may be used to provide input data for the gesture synthesizer 810. The input/output controller 822 may also be arranged to output display information to a display device 824 which may be separate from or integral to the computing-based device 800. The display information may provide a graphical user interface, e.g. to enable a user to specify gestures for synthesizing and/or to provide any other inputs required. It may, in addition or instead, display the generated images or the resultant synthesized trace to the user. In an embodiment the display device 824 may also act as the user input device 823 if it is a touch sensitive display device. The input/output controller 822 may also output data to devices other than the display device, e.g. a locally connected printing device (not shown in
The gesture synthesizer and methods of generating synthetic gesture traces described above are agnostic to the actual sensor being synthesized (e.g. sensor 106 in
As described above, the synthetic gesture traces generated using the methods and apparatus described above can be used in training or evaluating (i.e. testing) a gesture recognizer, which may be a parametric gesture recognizer and which may be implemented in software or hardware. In an example, the synthetic gesture traces may be used to test very specific features, which may be at the numerical limits of operation of the sensor (e.g. in terms of contact positions, gesture speeds, etc.) or at the limits of gesture recognition (e.g. where two gestures are very similar in one or more respects). Where gestures are defined in terms of regions of the sensor (e.g. a start region and/or an end region), the synthetic gestures may be used to explore the edges of these regions and to test that the gesture recognizer correctly classifies the gestures. In an example, a single input script may be used to define multiple gestures which explore the required test space. The synthetic gesture traces may also be used to examine exactly what interpretation the gesture recognizer produces for a detailed scenario.
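As an illustration of how such boundary cases might be enumerated, the following sketch generates a family of otherwise-identical gesture components whose durations sweep across an assumed classification threshold; it reuses the hypothetical GestureComponent structure from the earlier sketch and the threshold value is invented purely for the example.

from dataclasses import replace

def duration_sweep(base, durations):
    """Generate variants of one gesture component that differ only in duration.

    Useful for probing a recognizer near a timing boundary, e.g. the point
    at which a fast single-finger movement stops being treated as a flick.
    """
    return [replace(base, duration_s=d) for d in durations]

# Sweep around a purely illustrative 0.25 s flick/scroll boundary
# variants = duration_sweep(scroll_down, [0.20, 0.24, 0.25, 0.26, 0.30])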
In a particular embodiment, the sensor (e.g. sensor 106 in
The gesture synthesizer may also be used to output files which can be used to control a test robot in order to perform verification of sensor devices. In such an example application, the test robot comprises a test finger with substantially the same characteristics as a real finger (for the purposes of detection of capacitance on a sensor) and the motion of the test finger on the device may be controlled using a file output by the gesture synthesizer. This file may comprise contact co-ordinate data and/or vector data, as described above. The resultant output from the sensor under test (e.g. images 108-110 in
Although the present examples are described and illustrated herein as being implemented in a system in which much of the mathematics is performed within a normalized frame of reference, the system described is provided as an example and not a limitation. As those skilled in the art will appreciate, the present examples are suitable for application in a variety of different frames of reference and whilst normalization can simplify the computations required, the computations may be performed without this normalization.
In the description above, reference is made to detecting finger position and finger gestures and this is by way of example only. Although many systems use finger gestures, the methods described above are applicable to any gesture recognizer which receives an input from a capacitive touch sensor and the detected gestures may alternatively be made with a different part of the body or with an input device, such as a stylus or pen. It will be appreciated that the contact-to-sensor functions (e.g. as shown in the top part of
The term ‘computer’ is used herein to refer to any device with processing capability such that it can execute instructions. Those skilled in the art will realize that such processing capabilities are incorporated into many different devices and therefore the term ‘computer’ includes PCs, servers, mobile telephones, personal digital assistants and many other devices.
The methods described herein may be performed by software in machine readable form on a tangible storage medium, e.g. in the form of a computer program comprising computer program code means adapted to perform all the steps of any of the methods described herein when the program is run on a computer and where the computer program may be embodied on a computer readable medium. Examples of tangible (or non-transitory) storage media include disks, thumb drives, memory, etc., and do not include propagated signals. The software can be suitable for execution on a parallel processor or a serial processor such that the method steps may be carried out in any suitable order, or simultaneously.
This acknowledges that software can be a valuable, separately tradable commodity. It is intended to encompass software, which runs on or controls “dumb” or standard hardware, to carry out the desired functions. It is also intended to encompass software which “describes” or defines the configuration of hardware, such as HDL (hardware description language) software, as is used for designing silicon chips, or for configuring universal programmable chips, to carry out desired functions.
Those skilled in the art will realize that storage devices utilized to store program instructions can be distributed across a network. For example, a remote computer may store an example of the process described as software. A local or terminal computer may access the remote computer and download a part or all of the software to run the program. Alternatively, the local computer may download pieces of the software as needed, or execute some software instructions at the local terminal and some at the remote computer (or computer network). Those skilled in the art will also realize that, by utilizing conventional techniques known to those skilled in the art, all or a portion of the software instructions may be carried out by a dedicated circuit, such as a DSP, programmable logic array, or the like.
Any range or device value given herein may be extended or altered without losing the effect sought, as will be apparent to the skilled person.
It will be understood that the benefits and advantages described above may relate to one embodiment or may relate to several embodiments. The embodiments are not limited to those that solve any or all of the stated problems or those that have any or all of the stated benefits and advantages. It will further be understood that reference to ‘an’ item refers to one or more of those items.
The steps of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate. Additionally, individual blocks may be deleted from any of the methods without departing from the spirit and scope of the subject matter described herein. Aspects of any of the examples described above may be combined with aspects of any of the other examples described to form further examples without losing the effect sought.
The term ‘comprising’ is used herein to mean including the method blocks or elements identified, but that such blocks or elements do not comprise an exclusive list and a method or apparatus may contain additional blocks or elements.
It will be understood that the above description of a preferred embodiment is given by way of example only and that various modifications may be made by those skilled in the art. The above specification, examples and data provide a complete description of the structure and use of exemplary embodiments of the invention. Although various embodiments of the invention have been described above with a certain degree of particularity, or with reference to one or more individual embodiments, those skilled in the art could make numerous alterations to the disclosed embodiments without departing from the spirit or scope of this invention.