This application includes a compact disk appendix containing the following ASCII text files:
The material on the compact disk submitted with this application is hereby incorporated herein by reference.
The present invention relates to transmitting video information and more particularly to systems for streaming and displaying video images.
In many situations, a scene or object is captured by multiple cameras, each of which captures the scene or object from a different angle or perspective. For example, at an athletic event multiple cameras, each at a different location, capture the action on the playing field. While each of the cameras is viewing the same event, the images available from the different cameras differ because each camera views the event from a different angle and location. Such images cannot, in general, be seamed into a single panoramic image.
The technology for streaming video over the Internet is well developed. Streaming video over the Internet, that is, transmitting a series of images, requires a substantial amount of bandwidth. Transmitting multiple streams of images (e.g., images from multiple separate cameras) or transmitting a stream of panoramic images requires an exceptionally large amount of bandwidth.
A common practice in situations where an event such as a sporting event is captured with multiple cameras is to have an editor or technician in a control room select the best view at each instant. This single view is transmitted and presented to users who observe the event on a single screen. There are also a number of known techniques for presenting multiple views on a single screen. In one known technique, multiple images are combined into a single combined image, which is transmitted and presented to users. In another technique, the streams from the different cameras remain distinct, and multiple streams are transmitted to a user who then selects the desired stream for viewing. Each of the techniques that stream multiple images requires a relatively large amount of bandwidth. The present invention is directed to making multiple streams available to a user without using an undue amount of bandwidth.
The present invention provides a system for capturing multiple images from multiple cameras and selectively presenting desired views to a user. Multiple streams of data are streamed to a user's terminal. One data stream (called the thumbnail stream) tells the user which image streams are available. In this stream, each image is transmitted as a low-resolution thumbnail. One thumbnail is transmitted for each camera, and the thumbnails are presented as small images on the user's screen. The thumbnail stream uses a relatively small amount of bandwidth. Another data stream (called the focus stream) contains a series of high-resolution images from a selected camera. The images transmitted in this stream are displayed in a relatively large area on the viewer's screen. A user can switch the focus stream to contain images from any particular camera by clicking on the associated thumbnail. In an alternate embodiment, in addition to the thumbnails from individual cameras, the user is also provided with a thumbnail of a panoramic image (e.g., a full 360-degree panorama or a portion thereof) which combines the images from multiple cameras into a single image. By clicking at a position on the panoramic thumbnail, the user switches the focus stream to an image from a viewpoint or view window located at the point in the panorama where the user clicked. In other alternate embodiments, a variety of other data streams are also sent to the user. These other data streams can contain (a) audio data, (b) interactivity markup data, which describes regions of the image that provide interactivity opportunities such as hotspots, (c) presentation markup data, which defines how data is presented on the user's screen, and (d) telemetry data, which can carry various statistical data. In still another embodiment, one data stream contains a low-quality base image for each image stream, and these base images serve as the thumbnail images.
A second data stream contains data that is added to a particular base stream to increase the quality of this particular stream and to create the focus stream.
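As a rough illustration of the thumbnail/focus split, the sketch below assumes each camera frame is a 2-D list of grey-level pixel values. The names `make_thumbnail` and `frames_to_send` are hypothetical, and block averaging is only one possible way to produce the low-resolution thumbnails; the text does not specify a downscaling method.

```python
from typing import List, Tuple

Frame = List[List[int]]  # hypothetical: a frame is rows of grey-level pixels


def make_thumbnail(frame: Frame, factor: int) -> Frame:
    """Downscale by averaging non-overlapping factor x factor blocks."""
    rows = len(frame) // factor
    cols = len(frame[0]) // factor
    thumb = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [frame[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) // len(block))
        thumb.append(row)
    return thumb


def frames_to_send(camera_frames: List[Frame], focus_index: int,
                   factor: int = 4) -> Tuple[List[Frame], Frame]:
    """Return (thumbnail-stream payload, focus-stream payload) for one time step.

    The thumbnail stream carries a small image for every camera; the focus
    stream carries the full-resolution image of the selected camera only.
    """
    thumbnails = [make_thumbnail(f, factor) for f in camera_frames]
    focus = camera_frames[focus_index]
    return thumbnails, focus
```

Clicking a thumbnail then amounts to changing `focus_index`; the thumbnail payload is unaffected, so only one full-resolution image is sent per time step regardless of the number of cameras.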
An overall diagram of a first relatively simplified embodiment of the invention is shown in
The two video streams are sent to a user terminal and display 111. The images visible to the user are illustrated in
The user 306 can see a display 304. An example of what appears on display 304 is shown in
The details of a first embodiment of the invention are given in
The web client 402 includes a stream selection control 403. This may, for example, be a conventional mouse. When the user clicks on one of the thumbnails, a signal is sent to the server 401 and the focus stream F is changed to the stream of images that corresponds to the thumbnail that was clicked. In this embodiment server 401 corresponds to stream control 302 shown in
An optional procedure that can be employed to give a user the illusion that the change from one stream to another stream occurs instantaneously is illustrated in
As indicated by block 301, the data streams from the cameras are edited before they are sent to users. It is during this editing step that the thumbnail images are created as indicated in
The first embodiment of the invention is designed to operate with the commercially available streaming video technology marketed by RealNetworks Inc. of Seattle, Wash. RealNetworks Inc. markets a line of streaming video products, including products for producing streaming video content, server products for streaming video over the Internet, and video players with which users receive and watch video streamed over the Internet.
As indicated in
In the specific embodiment shown, “video clips” are stored on a disk storage subsystem 411. Each video clip has the file type “.pan” and contains the video streams from each of the four cameras and the thumbnail stream. When the system receives a URL calling for one of these clips, the “.pan” file type indicates that the file should be processed by plug-in 414.
One of the streams stored in a .pan file is a default stream, and this stream is sent as the focus stream until the user indicates that another stream should be the focus stream. Plug-in 414 processes requests from the user and provides the appropriate T and F streams to streaming server 413, which sends the streams to the user. The components of plug-in 414 are explained later with reference to
As illustrated in
It should be clearly noted that the specific examples given in
As shown in
The embodiment shown in
In other alternative embodiments that show a thumbnail of a panorama, as described above, thumbnails from other cameras are provided in addition to (or in place of) the thumbnails of the individual views from the cameras that were used to record the panorama. These additional cameras may be viewing the same event, but from a different vantage point. Alternatively, they can be from some related event.
A somewhat more complicated alternate embodiment of the invention is shown in
The server selects the streams that are to be streamed to the user as described with the first embodiment of the invention. The selected streams are then sent over a network (for example over the Internet) to the client system.
The additional data streams provided by this embodiment of the invention include an audio stream S4, an interactivity markup stream S3, a presentation markup stream S2 and a telemetry data stream S1. The audio stream S4 provides audio to accompany the video stream. Typically there would be a single audio stream, which would be played when any of the video streams is viewed. For example, there may be a play-by-play description of a sporting event that would be applicable irrespective of which camera is providing the focus stream. However, there could also be a separate audio stream for each video stream.
The interactivity markup stream S3 describes regions of the presentation that provide for additional user interaction. For example, there may be a button on the display, and clicking on this button triggers some action. The interactivity markup stream consists of a series of encoded commands that give type and position information. The commands can be encoded in a markup language such as XML or in some other command language. Such command languages are known, and the ability to interpret commands such as XML-encoded commands is known.
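The text does not define a schema for the interactivity commands, so the sketch below invents one purely for illustration: a hypothetical `<hotspot>` element carrying `id`, `type`, and a rectangle (`x`, `y`, `w`, `h`). It shows how a client might decode such type-and-position commands with Python's standard `xml.etree.ElementTree` and map a mouse click to a hotspot.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical markup; element and attribute names are assumptions.
SAMPLE = """
<interactivity time="12.0">
  <hotspot id="replay" type="button" x="10" y="20" w="80" h="30"/>
  <hotspot id="stats" type="link" x="200" y="20" w="60" h="30"/>
</interactivity>
"""

Hotspot = Tuple[str, str, int, int, int, int]  # id, type, x, y, width, height


def parse_hotspots(xml_text: str) -> List[Hotspot]:
    """Decode every <hotspot> command into (id, type, x, y, w, h)."""
    root = ET.fromstring(xml_text.strip())
    spots = []
    for el in root.iter("hotspot"):
        spots.append((el.get("id"), el.get("type"),
                      int(el.get("x")), int(el.get("y")),
                      int(el.get("w")), int(el.get("h"))))
    return spots


def hit_test(spots: List[Hotspot], px: int, py: int) -> Optional[str]:
    """Return the id of the first hotspot containing the click, if any."""
    for spot_id, _kind, x, y, w, h in spots:
        if x <= px < x + w and y <= py < y + h:
            return spot_id
    return None
```

Because the commands arrive as a stream, a client would presumably replace its hotspot list each time a new command block is received.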
The presentation markup stream S2 provides an arbitrary collection of time-synchronized images and data. For example, the presentation markup stream can provide a background image for the display and provide commands to change this background at particular times. The presentation markup stream may provide data that is static or dynamic. The commands can, for example, be XML-encoded commands.
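One way a client might honour time-synchronized background changes is to keep the decoded commands as a sorted list of (start time, background) pairs and look up the entry in effect at the current presentation time. The schedule format and file names below are assumptions for illustration only; the lookup uses the standard `bisect` module.

```python
import bisect
from typing import List, Tuple

# Hypothetical schedule decoded from presentation-markup commands:
# (start time in seconds, background image identifier), sorted by start time.
Schedule = List[Tuple[float, str]]


def background_at(schedule: Schedule, t: float) -> str:
    """Return the background that should be showing at presentation time t."""
    times = [start for start, _ in schedule]
    i = bisect.bisect_right(times, t) - 1  # last entry whose start <= t
    if i < 0:
        raise ValueError("no background defined before t=%s" % t)
    return schedule[i][1]
```

For example, with entries at 0 s, 90 s and 2700 s, any time from 90 s up to (but not including) 2700 s resolves to the second background.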
The telemetry data stream S1 can provide any type of statistical data. For example, this stream can provide stock quotes or player statistics during a sporting event. Alternatively, the stream could provide GPS codes indicating camera position, or it could carry video time codes.
Yet another alternate embodiment of the invention is shown in
A key consideration in video streaming is the bandwidth required. If unlimited bandwidth were available, all the data streams could simply be sent to the client. The present invention provides a mechanism whereby a large amount of data, for example data from a plurality of cameras, can be presented to a user over a limited bandwidth in such a manner that the user can take advantage of the data in all the data streams. The specific embodiments shown relate to data from multiple cameras that are viewing a particular event. However, the multiple streams need not come from cameras. The invention can be used in any situation where there are multiple streams of data that a user is interested in monitoring via thumbnail images. With the invention, the user can monitor the multiple streams via the thumbnail images and then make any particular stream the focus stream, which is displayed as a high-quality image. Depending upon the amount of bandwidth available, there could be a large number of thumbnails, and more than one focus stream may be sent and shown as a higher-quality image.
The following table shows the bandwidth requirements of various configurations.
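The bandwidth argument reduces to simple arithmetic: with N streams, a thumbnail bitrate b_t, a focus bitrate b_f and k focus streams, the thumbnail/focus scheme needs N·b_t + k·b_f, whereas sending every stream at full quality needs N·b_f. The sketch below encodes those formulas; the bitrates used in the checks (20 kbps thumbnails, 300 kbps focus) are hypothetical values chosen for illustration, not figures from the document.

```python
def thumbnail_focus_kbps(n_streams: int, thumb_kbps: int, focus_kbps: int,
                         n_focus: int = 1) -> int:
    """Bandwidth for a thumbnail of every stream plus n_focus full-quality streams."""
    return n_streams * thumb_kbps + n_focus * focus_kbps


def all_full_kbps(n_streams: int, focus_kbps: int) -> int:
    """Bandwidth if every stream were sent at full quality."""
    return n_streams * focus_kbps


def max_thumbnails(budget_kbps: int, thumb_kbps: int, focus_kbps: int,
                   n_focus: int = 1) -> int:
    """Largest number of thumbnails that fit in the budget alongside n_focus focus streams."""
    spare = budget_kbps - n_focus * focus_kbps
    return max(0, spare // thumb_kbps)
```

With the assumed rates, four cameras cost 380 kbps under the thumbnail/focus scheme versus 1200 kbps if all four were streamed at full quality, and a 500 kbps link leaves room for ten thumbnails next to one focus stream.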
The interaction between the server and the client is illustrated in
As illustrated in
When the server receives the command, it stops streaming the old focus stream and starts streaming the new focus stream, as indicated by arrow 995. A new layout for the user's display is also sent, as indicated by arrow 996. It is noted that a wide variety of circumstances could cause the server to send the client a new layout for the user's display screen. When the client receives the new display layout, the display is reconfigured.
Arrow 997 indicates that the user can request an end to the streaming operation. Upon receipt of such a request or when the presentation (e.g. the clip) ends, the server stops the streaming operation and ends access to the presentation source as indicated by arrows 998. The server also ends the connection to the client as indicated by arrow 999 and the server session ends. It should be understood that the above example is merely illustrative and a wide variety of different sequences can occur.
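The server side of the exchange described above can be modelled as a small state machine. The sketch below is a toy model under stated assumptions: the class name, method names and log strings are hypothetical, and the initial "send layout" message at connection time is assumed rather than taken from the text. It records which actions the server would take for connect, select and end requests.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StreamSession:
    """Toy model of one server-side session (all names are hypothetical)."""
    n_cameras: int
    default_focus: int = 0
    focus: int = -1
    active: bool = False
    log: List[str] = field(default_factory=list)

    def connect(self) -> None:
        # Open the session: send a layout, the thumbnail stream T and the default focus stream.
        self.active = True
        self.focus = self.default_focus
        self.log += ["send layout", "stream T", f"stream F{self.focus}"]

    def select(self, camera: int) -> None:
        # Ignore commands outside a session or for cameras that do not exist.
        if not self.active or not 0 <= camera < self.n_cameras:
            return
        if camera != self.focus:
            # Stop the old focus stream, start the new one, then send a new layout.
            self.log += [f"stop F{self.focus}", f"stream F{camera}", "send layout"]
            self.focus = camera

    def end(self) -> None:
        # User request or end of clip: stop all streams and close the connection.
        if self.active:
            self.log += ["stop T", f"stop F{self.focus}", "close connection"]
            self.active = False
```

Usage: `connect()`, then `select(2)` when the user clicks the third thumbnail, then `end()` when the user quits or the clip finishes.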
Another embodiment of the invention operates by sending base information to create the thumbnail images and additional information to create the focus image. The user sees the same display with this embodiment as with the previously described embodiments; however, this embodiment uses less bandwidth. With this embodiment, the focus data stream is not a stream of complete images. Instead, the focus stream carries only additional information that can be added to the information in one of the thumbnail images to create a high-resolution image. The thumbnail images provide the basic information that creates a low-resolution thumbnail, and the focus stream provides the additional information needed to turn that thumbnail into a high-resolution large image.
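A minimal sketch of the base-plus-additional-information idea, assuming grey-level images as 2-D lists: the thumbnail is a block-averaged base image, the focus stream carries the residual between the full image and an upsampled base, and the client adds the two back together. Nearest-neighbour upsampling and a plain integer residual are assumptions for illustration; in a real codec the residual would be compressed, which is where the bandwidth saving comes from. The sketch only demonstrates that the decomposition reconstructs the full image exactly.

```python
from typing import List

Image = List[List[int]]


def downsample(img: Image, f: int) -> Image:
    """Base layer: average each f x f block (this is the thumbnail)."""
    return [[sum(img[r * f + i][c * f + j] for i in range(f) for j in range(f)) // (f * f)
             for c in range(len(img[0]) // f)]
            for r in range(len(img) // f)]


def upsample(img: Image, f: int) -> Image:
    """Nearest-neighbour enlargement of the base layer."""
    return [[img[r // f][c // f] for c in range(len(img[0]) * f)]
            for r in range(len(img) * f)]


def enhancement_layer(full: Image, base: Image, f: int) -> Image:
    """Server side: the extra data the focus stream must carry."""
    up = upsample(base, f)
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(full, up)]


def reconstruct(base: Image, residual: Image, f: int) -> Image:
    """Client side: thumbnail data plus enhancement data gives the focus image."""
    up = upsample(base, f)
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(up, residual)]
```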
The following table illustrates the bandwidth savings:
Subdividing the image data can further reduce bandwidth by allowing a compression technique optimized for each subdivision to be used. Subdivisions may be made by any desirable feature of the imagery, such as pixel regions, foreground/background, frame rate, color depth, resolution, detail type, etc., or any combination of these. Each data stream can be compressed using a technique that preserves the highest quality for the available bandwidth, given its data characteristics. The result is a collection of optimally compressed data streams, each containing a component of the resultant images. With this embodiment, each thumbnail image stream is constructed on the client by combining several of these data streams, and its corresponding focus image stream is constructed on the client by combining the thumbnail streams (or the thumbnail images themselves) with additional data streams.
For example, consider a multiple-view video that consists of different views of live-action characters superimposed against the same static background image. The client sees a low-resolution thumbnail stream for each view and a high-resolution focus stream of one of them. These view streams could be compressed as described before, with a low-resolution thumbnail stream and additional data streams for turning them into high-resolution focus streams. However, additional bandwidth savings can be realized if two features of the image streams are exploited: (a) the frame rate of the background image differs from that of the foreground; specifically, the background image is static throughout the entire presentation, so only one image of it ever needs to be sent, regardless of how many frames the presentation contains; and (b) the same background image is used for all the view streams, so only one copy of the background image needs to be sent, and it can be reused by all the view streams. In order to realize these bandwidth savings, a foreground/background subdivision may be made to the video data in the following way:
In this embodiment, each image in the thumbnail stream is generated on the client by combining the low-resolution background image with the appropriate low-resolution foreground image. Each image in the focus stream is generated on the client by adding the additional background image data to the low-resolution background image to generate the high-resolution background image, adding the additional foreground image data to the low-resolution foreground image to generate the high-resolution foreground image, and then combining the high-resolution foreground and background images to generate the final focus-stream image.
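The combining step can be sketched as a simple overlay. The sketch assumes (this is not stated in the text) that each foreground frame marks background pixels as `None`; real systems would more likely use an alpha mask. The single shared background is received once and reused for every view, and the same compositing applies at thumbnail resolution and at focus resolution.

```python
from typing import List, Optional

Image = List[List[int]]
Layer = List[List[Optional[int]]]  # assumption: None marks a see-through (background) pixel


def composite(foreground: Layer, background: Image) -> Image:
    """Overlay one foreground frame on the background."""
    return [[bg if fg is None else fg for fg, bg in zip(frow, brow)]
            for frow, brow in zip(foreground, background)]


def render_views(foregrounds_per_view: List[Layer], shared_background: Image) -> List[Image]:
    """One copy of the static background serves every view stream."""
    return [composite(fg, shared_background) for fg in foregrounds_per_view]
```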
As another example, consider a video where each stream contains a view of a subject against a blurry background, such as one might see at a sporting event where a cameraman has purposely selected camera settings that allow the player to be in crisp focus while the crowd behind the player is significantly blurred. The client sees a low-resolution thumbnail stream for each view and a high-resolution focus stream of one of them. These views could be compressed with a quality setting chosen to preserve the detail in the player. However, bandwidth savings could be realized by utilizing the fact that the blurry crowd behind the player is unimportant to the viewer and can therefore be of lower quality. In order to realize this bandwidth savings, a pixel region subdivision can be made to the image data in the following way:
Each image in the thumbnail stream is generated on the client by combining the low-resolution player region with the low-resolution remainder of that image. Each image in the focus stream is generated on the client by adding the additional player region data to the low-resolution player image to generate the high-resolution player image, adding the additional remaining image data to the low-resolution remaining image region to generate the high-resolution remaining image region, and then combining the two regions to generate the final focus-stream image.
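A pixel-region subdivision can be sketched as splitting each image into a player layer and a remainder layer, coding each at its own quality, and merging them on the client. In this sketch the player region is assumed to be a rectangle, and coarse uniform quantization stands in for a lower-quality compression setting; both are illustrative assumptions, not the document's method.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]
Layer = List[List[Optional[int]]]  # None = pixel belongs to the other region
Box = Tuple[int, int, int, int]    # hypothetical (top, left, bottom, right), end-exclusive


def inside(r: int, c: int, box: Box) -> bool:
    top, left, bottom, right = box
    return top <= r < bottom and left <= c < right


def split_regions(img: Image, box: Box) -> Tuple[Layer, Layer]:
    """Separate the player region from the rest of the image."""
    player = [[v if inside(r, c, box) else None for c, v in enumerate(row)]
              for r, row in enumerate(img)]
    rest = [[None if inside(r, c, box) else v for c, v in enumerate(row)]
            for r, row in enumerate(img)]
    return player, rest


def quantize(layer: Layer, step: int) -> Layer:
    # A coarser step leaves fewer distinct values: a stand-in for lower quality.
    return [[None if v is None else (v // step) * step for v in row] for row in layer]


def merge(a: Layer, b: Layer) -> Image:
    """Client side: the two complementary regions form one image."""
    return [[x if x is not None else y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```

Usage: the player layer is kept at step 1 (full detail) while the blurry crowd is coded at a coarse step, for example 32.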
As another example, consider a video where each stream contains fast-moving objects that are superimposed on slowly changing backgrounds. The client sees a low-resolution thumbnail stream for each view and a high-resolution focus stream of one of them. Each stream of video could use a frame rate that allows the fast-moving object to be displayed smoothly. However, bandwidth savings could be realized by utilizing the fact that the slowly changing background differs little from one frame to the next, while the fast-moving object differs significantly from one frame to the next. In order to realize this bandwidth savings, a pixel region subdivision must be made to the image data in the following way:
In this embodiment, each image in the thumbnail stream is generated on the client by combining the fast-moving object region with the most recent frame of the rest of that image. Each image in the focus stream is generated on the client by adding the additional fast-moving object region data to the low-resolution fast-moving object image to generate the high-resolution fast-moving object image, adding the additional remaining image data to the low-resolution remaining image region to generate the high-resolution remaining image region, and then combining the high-resolution fast-moving object region with the most recent frame of the remaining image region to generate the final focus-stream image.
As another example, consider a video where each stream contains well-lit subjects in front of a differently lit background that appears in shades of orange. The client sees a low-resolution thumbnail stream for each view and a high-resolution focus stream of one of them. Each stream of video could transmit the whole images as they are. However, bandwidth savings could be realized by utilizing the fact that the background uses a restricted palette of orange and black hues. In order to realize this bandwidth savings, a pixel region subdivision must be made to the image data in the following way:
In this embodiment, each image in the thumbnail stream is generated on the client by combining the well-lit subject region with the remaining image region, in which the brightness values in the image are used to select the correct brightness of orange for those parts of the image. Each image in the focus stream is generated on the client by adding the additional well-lit subject region data to the low-resolution well-lit subject image to generate the high-resolution well-lit subject image, adding the additional remaining image data to the low-resolution remaining image region to generate the high-resolution remaining image region and using the brightness values in the image to select the correct brightness of orange for those parts of the image, and then combining the high-resolution well-lit subject region with the remaining image region generated in the previous step.
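The frame-rate subdivision implies that the client holds on to the most recent frame of the slowly changing region and overlays every new fast-moving-region frame on it. The sketch below is a hypothetical client-side compositor plus an assumed server rule of sending the slow region every `slow_every` frames; `None` marks pixels outside the fast-moving region. None of these names come from the document.

```python
from typing import List, Optional

Image = List[List[int]]
Layer = List[List[Optional[int]]]  # None = pixel not covered by the fast-moving region


class RegionCompositor:
    """Client-side sketch: keep the latest slow-region frame, overlay every fast-region frame."""

    def __init__(self) -> None:
        self.latest_slow: Optional[Image] = None

    def receive(self, fast: Layer, slow: Optional[Image] = None) -> Image:
        # A slow-region update arrives only occasionally; otherwise reuse the last one.
        if slow is not None:
            self.latest_slow = slow
        if self.latest_slow is None:
            raise ValueError("no slow-region frame received yet")
        return [[s if f is None else f for f, s in zip(frow, srow)]
                for frow, srow in zip(fast, self.latest_slow)]


def slow_update_due(frame_index: int, slow_every: int) -> bool:
    """Assumed server-side rule: send the slow region once every slow_every frames."""
    return frame_index % slow_every == 0
```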
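The restricted-palette idea can be sketched as sending the background region as a single brightness channel and letting the client map each brightness value to a shade between black and orange, so the background costs one channel instead of three. The reference orange (255, 128, 0), the linear mapping and the use of `None` to mark non-subject pixels are all assumptions for illustration.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
ORANGE: RGB = (255, 128, 0)  # assumed reference hue; not specified in the text


def brightness_to_orange(b: int, hue: RGB = ORANGE) -> RGB:
    """Map an 8-bit brightness to a shade between black (0) and the reference orange (255)."""
    r, g, bl = hue
    return (r * b // 255, g * b // 255, bl * b // 255)


def compose(subject: List[List[Optional[RGB]]],
            background_brightness: List[List[int]]) -> List[List[RGB]]:
    """Full-color subject pixels win; elsewhere the brightness selects an orange shade."""
    return [[s if s is not None else brightness_to_orange(b) for s, b in zip(srow, brow)]
            for srow, brow in zip(subject, background_brightness)]
```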
While the invention has been shown and described with respect to a plurality of preferred embodiments, it will be appreciated by those skilled in the art that various changes in form and detail may be made without departing from the spirit and scope of the invention. The scope of applicant's invention is limited only by the appended claims.
This application is a continuation-in-part of application 60/205,942, filed May 18, 2000, and a continuation-in-part of application 60/254,453, filed Dec. 7, 2000.
Number | Date | Country |
---|---|---|
20030197785 A1 | Oct 2003 | US |

Number | Date | Country |
---|---|---|
60254453 | Dec 2000 | US |
60205942 | May 2000 | US |