Users may provide input to a computer system through direct manipulation, in which the user interacts with user interface elements without the aid of an on-screen cursor. Direct manipulation stands in contrast to indirect manipulation, in which the user manipulates an on-screen cursor, such as with a mouse or a scroll wheel. Examples of direct manipulation include touch input to a touch-sensitive surface with a finger or stylus, digitizer pen input to a digitizer surface, voice input captured by a microphone, and body gesture or eye-tracking input provided to a motion capture device (such as the MICROSOFT KINECT motion capture device).
Referring specifically to touch input, a user may provide input to a computer system by touching a touch-sensitive surface, such as with his or her finger(s) or a stylus. One example of a touch-sensitive surface is a track pad, such as is found in many laptop computers, in which a user moves a finger along a surface, and those finger movements are reflected as cursor or pointer movements on a display device. Another example is a touch screen, such as is found in many mobile telephones, in which a touch-sensitive surface is integrated into a display device; the user moves a finger along the display device itself, and those finger movements are interpreted as input to the computer.
There are also general techniques for using multiple fingers at the same time as input to a computer system. These techniques are sometimes referred to as “multi-point” or “multi-touch.” A “multi-point” gesture is commonly one that involves multiple fingers or other input devices, whereas a “multi-touch” gesture is commonly one that involves interacting with multiple regions of a touch surface, though the latter term is often used as a synonym for “multi-point.” As used herein, both terms will be used to mean a gesture that comprises the use of multiple fingers or other input devices.
An example of such a multi-point gesture is one in which a user presses two fingers on a touch-sensitive surface and drags them down, and this input is interpreted as scrolling the active window on the desktop down. Current techniques for user input to a touch-sensitive surface, and for other forms of direct manipulation, are limited and have many problems, some of which are well known.
It would therefore be an improvement to provide techniques for improved direct manipulation input. The present invention relates to ways to manipulate video, images, text columns, or other elements embedded within a window or page, such as a web page.
There are known techniques for controlling the size or zoom of a window generally. For instance, a user may tap twice on an area of a touch-sensitive surface to zoom in on part of the display that corresponds to the area tapped. There are also “pinch” and “reverse-pinch” gestures that enable a user to zoom out and zoom in, respectively. In a pinch gesture, a user puts two fingers on the touch-sensitive surface and converges them (drags them closer together), which generally is interpreted as input to zoom out, centered on the area being “pinched.” In a reverse-pinch gesture, a user puts two fingers on the touch-sensitive surface and then diverges them (drags them apart), which generally is interpreted as input to zoom in, centered on the area being “reverse-pinched.”
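By way of illustration only, the following TypeScript sketch shows how a pinch or reverse-pinch may be recognized from two tracked touch points. The names and structure are illustrative assumptions, not any particular platform's gesture API.

```typescript
// Illustrative sketch: deriving a zoom factor from two touch points.
// A factor > 1 indicates a reverse-pinch (fingers diverging, zoom in);
// a factor < 1 indicates a pinch (fingers converging, zoom out).
interface Point { x: number; y: number; }

function distance(a: Point, b: Point): number {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

function pinchZoom(prev: [Point, Point], curr: [Point, Point]) {
  const factor = distance(curr[0], curr[1]) / distance(prev[0], prev[1]);
  // The zoom is generally centered on the midpoint between the fingers.
  const center: Point = {
    x: (curr[0].x + curr[1].x) / 2,
    y: (curr[0].y + curr[1].y) / 2,
  };
  return { factor, center };
}
```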
The problem with the tap, pinch, and reverse-pinch gestures is that they provide a poor means for a user to achieve a common goal—to “snap” an element of a user interface (such as a video, an image, or a column of text) to a border (frequently the edge of a display area). A scenario that benefits greatly from snapping techniques is zooming a video to full-screen—snapping the outer edges of the video to the edges of the display area (the display area comprising a display on which the video is displayed, or a distinct portion of that display, such as a window within that display). A user may use a reverse-pinch gesture to zoom a video to full-screen, but it is difficult to do this exactly because of the imprecision of using one's fingers to manipulate the video an exact number of pixels—the user may zoom the video past full-screen, meaning that some of the video is not displayed, or may stop short of full-screen, meaning that the video does not fill the screen as desired.
Furthermore, even where a current technique, such as a tap on an element, causes the element to snap to a border, the technique harms the user experience because it denies the user the sense that he is in control of the manipulation. When the user taps on an element, it may be that, rather than snapping that element to a border, a second element that encloses it is what is snapped to the border. In such a scenario, the user is left feeling as if he is not controlling the computer.
Techniques for indirect manipulation of elements for snapping work poorly in the direct manipulation environment. Where a user snaps or unsnaps an element with a cursor, there is no direct relationship between the position of the user's hand that moves the mouse (or otherwise provides indirect input) and the cursor and element being manipulated. Since the user is not manipulating the element directly, the user does not notice that, when an element unsnaps, it does not “catch up” to the user's hand position, which has continued to move even while the element was snapped. Rather, it merely unsnaps. This does not work in a direct manipulation scenario, because now the user's finger (for instance) on the touch screen leads the element by a distance. To provide a better user experience in the direct manipulation scenario, the element must catch up to the user's finger (or other form of direct manipulation) after unsnapping.
The present invention overcomes these problems. In an example embodiment, as the user reverse-pinches to zoom in on a video, the invention tracks the amount of the zoom. When the user has zoomed to the point where one of the dimensions (height or width) of the video reaches a threshold (such as some percentage of a dimension of the display device—e.g., the width of the video reaches 80% of the width of the display device), the invention determines to display the video in full-screen, and “snaps” the video to full-screen. The invention may do this by way of an animation, such as expanding the video to fill the screen.
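A minimal TypeScript sketch of this threshold test follows. The 0.8 value mirrors the 80% example above; it and all names are illustrative assumptions.

```typescript
// Illustrative sketch: decide whether a zoomed video has crossed the
// snap threshold in either dimension. 0.8 is an assumed value.
const SNAP_FRACTION = 0.8;

function shouldSnapToFullScreen(
  videoWidth: number, videoHeight: number,
  displayWidth: number, displayHeight: number,
): boolean {
  return videoWidth >= SNAP_FRACTION * displayWidth ||
         videoHeight >= SNAP_FRACTION * displayHeight;
}
```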
In another example embodiment, a user performs direct manipulation input to move an element toward a threshold at which a snap is applied. When the element reaches the snap threshold (such as a position on screen), it is snapped to a snap position. As the user continues to provide direct manipulation to the element, it remains snapped to the snap position until the user's direct manipulation reaches an unsnap threshold (such as another position on screen). The element is then unsnapped from the snap position, and the element is moved faster than the direct manipulation until the element catches up to the direct manipulation. For instance, where a finger on a touch-screen is used to move an element by pressing down on a part of the screen where the element is displayed, the element catches up to the direct manipulation when it resumes being displayed on the portion of the touch-screen touched by the finger.
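This snap, hold, and catch-up behavior may be modeled as a small state machine. The following TypeScript sketch illustrates one axis of movement under the assumption of monotonically increasing motion; the thresholds and the 2x catch-up rate are illustrative assumptions.

```typescript
// Illustrative sketch of the snap/unsnap behavior for one axis.
type SnapState = "free" | "snapped" | "catchingUp";

class SnapTracker {
  state: SnapState = "free";
  elementPos = 0;

  constructor(
    private snapPos: number,
    private snapThreshold: number,
    private unsnapThreshold: number,
    private catchUpRate = 2, // element moves 2x finger speed while catching up (assumed)
  ) {}

  update(fingerPos: number, fingerDelta: number): number {
    switch (this.state) {
      case "free":
        this.elementPos = fingerPos; // element tracks the finger directly
        if (fingerPos >= this.snapThreshold) {
          this.elementPos = this.snapPos; // snap
          this.state = "snapped";
        }
        break;
      case "snapped":
        // Element holds the snap position while the finger keeps moving.
        if (fingerPos >= this.unsnapThreshold) {
          this.state = "catchingUp"; // un-snap; element now trails the finger
        }
        break;
      case "catchingUp":
        this.elementPos += fingerDelta * this.catchUpRate;
        if (this.elementPos >= fingerPos) {
          this.elementPos = fingerPos; // caught up to the finger
          this.state = "free";
        }
        break;
    }
    return this.elementPos;
  }
}
```

On each input frame, update would be called with the finger's current position and its movement since the previous frame; the returned value is where the element should be drawn.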
The primary embodiment of the invention discussed herein involves the manipulation of a dimension of a video. As used herein, mentions of dimension should be read to also include a change in the position of the video. Such a scenario, where a change in the position of the video results in snapping, may be one in which the video is moved sufficiently close to the edge of a display area that it is determined that the video is to be snapped to that edge.
There are other aspects of the invention, which are described in the detailed description of the drawings. Such aspects include snapping an element to a border by manipulating its pitch or yaw, or by manipulating its translation (its center point within a region).
As used herein, “video” may refer to a video itself, or the container in which a video may be played, even though a video may not be played in the container at the time the user makes the full-screen zoom gesture, or other gesture. It may be appreciated that the invention may be applied to still images, text, and other elements, as well as video, though video is discussed herein as the primary embodiment.
It may also be appreciated that a video may not have the same dimensions, or aspect ratio, as the display device upon which it is displayed. For instance, the video may have a 4:3 aspect ratio (where the width of the video is 4/3 times the height of the video) and it may be displayed on a display device with a 16:9 aspect ratio. In this scenario, as the video expands, its height may reach the height of the display before its width reaches the width of the display. Thus, in this scenario, full screen may be considered to be expanding the video such that the height of the video is as large as the height of the display device—the “limiting dimension.” The rest of the display device may then be filled with something other than the video, such as black (sometimes referred to as “black bars”).
In another scenario where the aspect ratio of the video differs from the aspect ratio of the display device, full screen may comprise “cropping” the video, where the video is expanded until every pixel of the display is occupied by the video, even though some of the video is not displayed. Using the above example of a 4:3 video and a 16:9 display device, the video may be expanded until the width of the video equals the width of the display device. This will result in parts of the top and bottom of the video being “cut off,” or not displayed, though the video will occupy all of the display device. This is sometimes referred to as “filling” the screen.
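The “fit” (letterboxed) and “fill” (cropped) behaviors described in the preceding two paragraphs can both be expressed as a choice of scale factor, as the following TypeScript sketch illustrates; the names are illustrative assumptions.

```typescript
// Illustrative sketch contrasting the two full-screen modes:
// "fit" letterboxes on the limiting dimension; "fill" crops overflow.
interface Size { width: number; height: number; }

function fullScreenSize(video: Size, display: Size, mode: "fit" | "fill"): Size {
  const scaleX = display.width / video.width;
  const scaleY = display.height / video.height;
  // "fit": scale by the smaller factor, leaving black bars.
  // "fill": scale by the larger factor, cropping what overflows.
  const scale = mode === "fit" ? Math.min(scaleX, scaleY) : Math.max(scaleX, scaleY);
  return { width: video.width * scale, height: video.height * scale };
}

// A 4:3 video on a 16:9 display, as in the example above:
// fit  -> 1440x1080 (black bars at the sides)
// fill -> 1920x1440 (top and bottom cropped)
console.log(fullScreenSize({ width: 640, height: 480 }, { width: 1920, height: 1080 }, "fit"));
console.log(fullScreenSize({ width: 640, height: 480 }, { width: 1920, height: 1080 }, "fill"));
```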
Other embodiments of an invention for using touch gestures to zoom a video to full-screen exist, and some examples of such are described with respect to the detailed description of the drawings.
The systems, methods, and computer-readable media for using touch gestures to zoom a video to full-screen are further described with reference to the accompanying drawings.
Embodiments may execute on one or more computer systems.
The term processor used throughout the description can include hardware components such as hardware interrupt controllers, network adaptors, graphics processors, hardware-based video/audio codecs, and the firmware used to operate such hardware. The term processor can also include microprocessors, application-specific integrated circuits, and/or one or more logical processors, e.g., one or more cores of a multi-core general processing unit configured by instructions read from firmware and/or software. Logical processor(s) can be configured by instructions embodying logic operable to perform function(s) that are loaded from memory, e.g., RAM, ROM, firmware, and/or mass storage.
A number of program modules comprising computer-readable instructions may be stored on computer-readable media such as the hard disk, magnetic disk 29, optical disk 31, ROM 24 or RAM 25, including an operating system 35, one or more application programs 36, other program modules 37 and program data 38. Upon execution by the processing unit, the computer-readable instructions cause the actions described in more detail below to be carried out or cause the various program modules to be instantiated. A user may enter commands and information into the computer 20 through input devices such as a keyboard 40 and pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner or the like. These and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus, but may be connected by other interfaces, such as a parallel port, game port or universal serial bus (USB). A monitor 47, display or other type of display device can also be connected to the system bus 23 via an interface, such as a video adapter 48. In addition to the display 47, computers typically include other peripheral output devices (not shown), such as speakers and printers.
The computer 20 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 49. The remote computer 49 may be another computer, a server, a router, a network PC, a peer device or other common network node, and typically can include many or all of the elements described above relative to the computer 20, although only a memory storage device 50 has been illustrated. The logical connections depicted can include a local area network (LAN) 51 and a wide area network (WAN) 52.
When used in a LAN networking environment, the computer 20 can be connected to the LAN 51 through a network interface or adapter 53. When used in a WAN networking environment, the computer 20 can typically include a modem 54 or other means for establishing communications over the wide area network 52, such as the Internet. The modem 54, which may be internal or external, can be connected to the system bus 23 via the serial port interface 46. In a networked environment, program modules depicted relative to the computer 20, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used. Moreover, while it is envisioned that numerous embodiments of the present disclosure are particularly well-suited for computerized systems, nothing in this document is intended to limit the disclosure to such embodiments.
System memory 22 of computer 20 may comprise instructions that, upon execution by computer 20, cause the computer 20 to implement the invention, such as the operational procedures described below.
The interactive display device 200 (sometimes referred to as a touch screen, or a touch-sensitive display) comprises a projection display system having an image source 202, optionally one or more mirrors 204 for increasing an optical path length and image size of the projection display, and a horizontal display screen 206 onto which images are projected. While shown in the context of a projection display system, it will be understood that an interactive display device may comprise any other suitable image display system, including but not limited to liquid crystal display (LCD) panel systems and other light valve systems. Furthermore, while shown in the context of a horizontal display system, it will be understood that the disclosed embodiments may be used in displays of any orientation.
The display screen 206 includes a clear, transparent portion 208, such as a sheet of glass, and a diffuser screen layer 210 disposed on top of the clear, transparent portion 208. In some embodiments, an additional transparent layer (not shown) may be disposed over the diffuser screen layer 210 to provide a smooth look and feel to the display screen.
To sense objects located on the display screen 206, the interactive display device 200 includes one or more image capture devices 220 configured to capture an image of the entire backside of the display screen 206, and to provide the image to the electronic controller 212 for the detection of objects appearing in the image. The diffuser screen layer 210 helps to avoid the imaging of objects that are not in contact with or positioned within a few millimeters of the display screen 206, and therefore helps to ensure that only objects that are touching the display screen 206 (or, in some cases, in close proximity to the display screen 206) are detected by the image capture device 220. While the depicted embodiment includes a single image capture device 220, it will be understood that any suitable number of image capture devices may be used to image the backside of the display screen 206. Furthermore, it will be understood that the term “touch” as used herein may comprise both physical touches and “near touches” of objects in close proximity to the display screen 206.
The image capture device 220 may include any suitable image sensing mechanism. Examples of suitable image sensing mechanisms include but are not limited to CCD (charge-coupled device) and CMOS (complementary metal-oxide-semiconductor) image sensors. Furthermore, the image sensing mechanisms may capture images of the display screen 206 at a sufficient frequency or frame rate to detect motion of an object across the display screen 206 at desired rates. In other embodiments, a scanning laser may be used in combination with a suitable photo detector to acquire images of the display screen 206.
The image capture device 220 may be configured to detect reflected or emitted energy of any suitable wavelength, including but not limited to infrared and visible wavelengths. To assist in detecting objects placed on the display screen 206, the image capture device 220 may further include an additional light source 222 such as one or more light emitting diodes (LEDs) configured to produce infrared or visible light. Light from the light source 222 may be reflected by objects placed on the display screen 206 and then detected by the image capture device 220. The use of infrared LEDs as opposed to visible LEDs may help to avoid washing out the appearance of projected images on the display screen 206.
The graph depicted in the figure plots the position of an element over time 608 as a user's finger drags it through a lower snap threshold 612, to a snap position 610, and past an upper snap threshold 614.
It may be appreciated that the element does not instantly snap to the snap position 610 when the lower snap threshold 612 is reached (if it did, the position of the element between the lower snap threshold 612 and the snap position 610 would be graphed as a vertical line). Rather, the movement of the element is accelerated toward the snap position 610, as is reflected by the plot of the position of the element over time 608 having a steeper slope during that portion than during the preceding portion.
As the user continues to move his finger past the lower snap threshold 612 toward the upper snap threshold 614, the position of the element does not change, but remains at the snap position 610. When the position of the user's finger reaches the upper snap threshold 614, the element “un-snaps” and its position moves at a greater rate of change than the position of the user's finger, until it catches up to the position of the user's finger. Elements 616, 618, 620, and 622 depict various times at which these movements occur.
Operation 1702 depicts displaying a user interface on a display device, the user interface comprising a first area. For instance, the user interface may comprise a web page, and the first area may comprise an embedded video that is embedded in that web page. The user interface in which the first area is displayed may occupy the entirety of the display device's visual output, or a subset thereof, such as a window that is displayed in part of the display device's visual output.
In an embodiment, the first area comprises an area in which a video may be displayed, an image, or a column of text. More generally, the first area may comprise any area for which a border or dimension may be defined, such that the border may be snapped to a snap border upon determining that the dimension equals a threshold.
Operation 1704 comprises determining that the first area comprises visual media. This operation may comprise parsing a web page in which the video is displayed (such as by parsing the Hypertext Markup Language (HTML) and other code that makes up the web page, or a Document Object Model (DOM) of a document) to determine that the first area contains visual media, such as a video or an image.
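As one illustrative sketch of such a determination in a browser context, standard DOM queries may be used; the function name is an assumption.

```typescript
// Illustrative sketch: determine whether an element of a page
// contains visual media, using the standard browser DOM API.
function containsVisualMedia(area: Element): boolean {
  if (area instanceof HTMLVideoElement || area instanceof HTMLImageElement) {
    return true;
  }
  // Also treat areas that enclose a <video> or <img> as visual media.
  return area.querySelector("video, img") !== null;
}
```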
Operation 1706 comprises determining the size of the dimension of the first area. As with operation 1704, this operation may comprise parsing a web page in which the video is displayed, such as by evaluating a “height” or “width” attribute that is defined for the first area in the web page.
Operation 1708 comprises determining an aspect ratio of the first area. As with operations 1704 and 1706, this may comprise parsing a web page in which the video is displayed, such as by evaluating both a “height” and a “width” attribute that is defined for the first area in the web page to determine an aspect ratio (the aspect ratio of visual media commonly being the ratio of the width to the height).
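Operations 1706 and 1708 may be illustrated together with a sketch that reads the “width” and “height” attributes from the DOM, falling back to the laid-out size; the names are illustrative assumptions.

```typescript
// Illustrative sketch: read the first area's dimensions and compute
// its aspect ratio (width over height).
function areaDimensions(area: HTMLElement) {
  // Prefer explicit "width"/"height" attributes; fall back to layout size.
  const width = Number(area.getAttribute("width")) || area.clientWidth;
  const height = Number(area.getAttribute("height")) || area.clientHeight;
  return { width, height, aspectRatio: width / height };
}
```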
Operation 1710 depicts determining that user input received at a touch-input device is indicative of modifying a dimension of the first area to a threshold value. This user input may comprise a reverse-pinch gesture.
In an embodiment, operation 1710 includes determining that the user input is indicative of increasing the height, the width, a dimension or the area of the first area to a threshold value. The user may make touch input to move the first area, or zoom in on the first area. This user input may be processed as just that—moving the first area, or zooming the first area, respectively—so long as the input does not cause a dimension of the first area to be modified to a threshold value (such as zoomed until its width is at least 75% of the width of the display device's display area).
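The following TypeScript sketch illustrates this dispatch for the width dimension; the 75% threshold mirrors the example above, and the snap-to-full-width behavior and names are illustrative assumptions.

```typescript
// Illustrative sketch: process input as an ordinary zoom until the
// width crosses the threshold, at which point the area snaps.
function applyZoom(
  areaWidth: number,
  zoomFactor: number,
  displayWidth: number,
  threshold = 0.75, // assumed threshold fraction
): { width: number; snapped: boolean } {
  const newWidth = areaWidth * zoomFactor;
  if (newWidth >= threshold * displayWidth) {
    return { width: displayWidth, snapped: true }; // snap to full width
  }
  return { width: newWidth, snapped: false }; // ordinary zoom
}
```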
In an embodiment where the touch-input device and the display device comprise a touch screen, the user input is received at a location where the first area is displayed on the touch screen. It may be that the user is using a touch screen—where the display device itself is configured to accept user touch input on the display area. Where a touch screen is involved, the user may interact with the first area by touching the area of the touch screen where the first area is displayed.
Operation 1712 depicts displaying the first area snapped to a border on the display device. Upon determining that the user input has caused a dimension of the first area to equal a threshold value, the user interface may show the first area snapped to a border on the display device. This border is not necessarily the edge of the display device (the topmost, leftmost, rightmost, or bottommost part of the display area), but rather a “snap border”—a predefined place to which elements (such as the first area) that have a dimension brought above a threshold are snapped. For instance, a snap border may involve snapping the first area so that it is centered on the display device. Also, displaying the first area snapped to a border on the display device may comprise displaying the first area in a full-screen mode, where the border comprises the topmost, leftmost, rightmost, and bottommost parts of the display area.
In an embodiment, operation 1712 includes animating a transition from the dimension of the first area equaling a threshold value to displaying the first area snapped to a border on the display device. In an embodiment, the user input is indicative of increasing the size of the first area at a rate, and animating the transition comprises animating the transition at a second rate, the second rate being greater than the rate. Once it has been determined to snap the first area to a border, it may be beneficial to perform this snap faster than the user is manipulating the first area, so as to speed up the process.
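A sketch of such a two-rate animation follows, using the browser's requestAnimationFrame; the 2x multiplier and the function name are illustrative assumptions.

```typescript
// Illustrative sketch: animate the remaining growth toward the
// snapped dimension at a rate faster than the user's gesture.
function animateSnap(
  current: number,                 // current dimension, e.g. width in pixels
  target: number,                  // snapped dimension
  userRate: number,                // pixels per frame the gesture was producing
  onFrame: (value: number) => void,
): void {
  const rate = 2 * userRate;       // second rate, greater than the user's rate (assumed 2x)
  const step = () => {
    current = Math.min(current + rate, target);
    onFrame(current);
    if (current < target) requestAnimationFrame(step);
  };
  requestAnimationFrame(step);
}
```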
In an embodiment, operation 1712 comprises, before displaying the first area in a full-screen mode, determining that a second user input received at the touch-input device is indicative of modifying the dimension below the threshold value; displaying the first area wherein the first area is not snapped to the border; and wherein displaying the first area snapped to the border occurs in response to determining that a third user input received at the touch-input device is indicative of modifying the dimension to the threshold value. After a user's input has caused the first area to reach a threshold value, he may still disengage from snapping the area to a border. The user may do this by performing a gesture that indicates manipulation of the first area in the opposite manner. For instance, where before he was diverging his fingers to zoom in, he may disengage by converging his fingers to zoom out; or where before he was moving his fingers to the right to move the element to the right, he may disengage by moving his fingers to the left to move the element to the left.
In an embodiment, operation 1712 comprises modifying the translation, pitch, or yaw of the first area in snapping it to the border. Translation refers to whether the first area is centered on the area in which it is snapped. For instance, where snapping the first area to the border comprises displaying the first area in a full-screen mode, and the first area is located below and to the left of the center point of the display area when this snapping is to initiate, the translation of the first area may be modified so that it is centered in the display area.
The pitch of the first area may also be changed when snapping it to a border. For instance, the first area and the display area may both be rectangles, and the border to snap the first area to may be the border of the display area. If the side of the first area to be snapped to the border is not parallel with the border, then there is a difference in pitch between the side of the first area and the border, and that difference is removed during the snapping process so that the edge is flush with the border. The yaw of the first area may also be modified in a similar manner as the pitch. The yaw of the first area may differ from that of the border in certain scenarios, such as where the user interface is three-dimensional (3D) or has a z-depth in addition to a value within an x-y plane in a Cartesian coordinate system.
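As an illustrative sketch, in a browser user interface the translation and the pitch/yaw of the first area could be corrected in a single CSS transform; the transition duration and function name are assumptions.

```typescript
// Illustrative sketch: re-center the area (translation) and zero out
// rotation about the x- and y-axes (pitch and yaw) so that its edges
// sit flush with the border when snapped.
function snapTransform(area: HTMLElement, display: { width: number; height: number }): void {
  const rect = area.getBoundingClientRect();
  const dx = display.width / 2 - (rect.left + rect.width / 2);
  const dy = display.height / 2 - (rect.top + rect.height / 2);
  area.style.transition = "transform 0.2s"; // assumed animation duration
  // Assigning the whole transform replaces any prior rotation.
  area.style.transform = `translate(${dx}px, ${dy}px) rotateX(0deg) rotateY(0deg)`;
}
```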
Operation 1714 depicts determining that a second user input received at the touch-input device is indicative of modifying a dimension of the first area to a second threshold value; and terminating displaying the first area snapped to a border on the display device. Once the first area is displayed snapped to the border, the user may provide input to disengage this snapping. This input may comprise continuing the input that caused the snapping, or providing differing input.
Operation 1716 depicts displaying a control for a media displayed in the first area; and hiding the control in response to determining that user input received at the touch-input device is indicative of modifying the dimension of the first area to the threshold value. For instance, when the user causes the video in the first area to snap to a full-screen mode, this may be because the user wishes to sit back from the display and watch the video. In such an instance, the user experience may be improved by hiding the media controls when the video is snapped to a full-screen mode.
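A minimal sketch of this behavior, assuming a standard HTML video element:

```typescript
// Illustrative sketch: hide the media controls while the video is
// snapped to full-screen, and restore them when it is unsnapped.
function setSnappedFullScreen(video: HTMLVideoElement, snapped: boolean): void {
  video.controls = !snapped;
}
```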
It may be appreciated that not all operations of the above operational procedure are required to implement the invention, and that the operations may be performed in an order other than that described.
While the present invention has been described in connection with the preferred aspects, as illustrated in the various figures, it is understood that other similar aspects may be used, or modifications and additions may be made to the described aspects for performing the same function of the present invention without deviating therefrom. Therefore, the present invention should not be limited to any single aspect, but rather should be construed in breadth and scope in accordance with the appended claims. For example, the various procedures described herein may be implemented with hardware or software, or a combination of both. Thus, the methods and apparatus of the disclosed embodiments, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium. When the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus configured for practicing the disclosed embodiments. In addition to the specific implementations explicitly set forth herein, other aspects and implementations will be apparent to those skilled in the art from consideration of the specification disclosed herein. It is intended that the specification and illustrated implementations be considered as examples only.