The present disclosure is directed to techniques for assessing performance of media distribution systems.
Media distribution systems have become familiar to consumers of online content. Typically, they involve streaming of audio content, visual content, or (often) both from a source device to a rendering device over a network such as the Internet. Although streamed content often has timestamps inserted by audio and video coding protocols, there are a wide variety of rendering environments in which the timestamps are not honored all the way through to audio and video rendering. Delays in a rendering pipeline for one element of media content may differ from those for another element, which causes the different content elements to be output at different times and causes consumers to perceive an error in rendering. Moreover, the disparity among different rendering environments can create impediments to diagnostic techniques that attempt to quantify such errors.
Even in portions of a media distribution system where coding timestamps are honored, processing events can impair performance of the distribution system. For example, processing stages imposed by content coding algorithms and distribution elements can lead to delays in rendering at end points of a distribution system. Video data that should be perceived as “live” may not be perceived as such if undue processing delays occur. Other processing phenomena may arise that cause video frames to be lost. Diagnostic techniques would be enhanced if performance of the distribution system could be measured based on video data at the time it is output from a display device (colloquially, “at the glass”) rather than at some intermediate location within the system that may not account for all sources of processing delays or other errors.
The present disclosure describes techniques for measuring propagation delay of a media distribution system based on content output by rendering devices. An output from an output device of the media distribution system may be captured and a token may be detected from the captured content. A timecode may be derived from the detected token. The system's propagation delay may be determined from the derived timecode and may provide a basis to analyze system delays and other processing artifacts. In this manner, propagation artifacts may be estimated between multiple rendering devices that lack controls to synchronize their operation.
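By way of illustration only, the following sketch reduces the technique to its core arithmetic. The function name and the assumption that the token generator and the capture clock share a common time base are illustrative, not drawn from the disclosure.

```python
# Minimal sketch: a token recovered "at the glass" carries the timecode at
# which it entered the distribution system; the difference against the
# capture clock estimates propagation delay. Names are illustrative.

def propagation_delay(token_timecode_s: float, capture_time_s: float) -> float:
    """Delay imposed by the delivery path, assuming the token generator
    and the capture clock share a common time base."""
    return capture_time_s - token_timecode_s

# Example: a token minted at t = 12.000 s is observed on screen at t = 12.245 s.
delay = propagation_delay(12.000, 12.245)
print(f"propagation delay: {delay * 1000:.1f} ms")  # -> 245.0 ms
```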
The system 100 may include several processing stages. At the source 110, the system 100 may include a shared source 111 of timing information, a video token generator 112 and an audio token generator 113, each of which generates tokens from common timing information. The source 110 also may include a video coder 114 that generates video data from the token, and an audio coder 115 that generates audio data from the token. In some aspects, audio and video tokens may be integrated with other audio content and video content, respectively, but this need not occur in all cases. The resultant coded audio data and coded video data may be output from the source 110 to a network 130 for delivery to a rendering device 120.
The rendering device 120 may have rendering pipelines 122, 124 for video data and audio data, respectively. The video rendering pipeline 122 may receive coded video from the source 110, decode it, and display it on a display device such as an LCD screen (stages not shown). Similarly, the audio rendering pipeline 124 may receive coded audio from the source 110, decode it, and output it to speaker devices (also not shown). Thus, the rendering device 120 may output video content that contains the video tokens and it may output audio content that contains the audio tokens.
Although FIG. 1 illustrates a single rendering device 120 for simplicity, the techniques described herein also find application with systems having multiple rendering devices.
The scanning device 140 may include a video capture system 141, an audio capture system 142, and token extractors 143, 144 for video and audio, respectively. The video capture system 141 may capture video content output by the rendering device 120, including the video content representing tokens contained therein. Similarly, the audio capture system 142 may capture audio content output by the rendering device 120, including the audio content representing tokens contained therein. The token extractors 143, 144 may extract timing information from token information contained within the captured video content and the captured audio content, and may output the extracted timing information to an analyzer 145 within the scanning device 140.
Typically, the source device 110 and the rendering device 120 may have runtime applications involving distribution of audio/visual content. For example, such applications' processes may be employed in coding, delivery, decoding and rendering of live media content. Aspects disclosed herein may be employed in diagnostic modes of operation of the system 100 in which latencies imposed by these processes are quantified.
The video processing pipeline 230 may include a video token generator 232, a video compositor 234, and a video encoder 236. The video token generator 232 may generate visual content (a “visual token”) representing the timecode received from the timecode converter 220. The video compositor 234 may integrate the visual token into video content received from an external source. The video encoder 236 may apply video coding operations to the resultant video and may output the coded video from the source device 200.
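As a sketch only, one way a video token generator and compositor pair might operate is to render the timecode as a strip of black/white squares and overlay it onto a frame. The block layout, sizes, and NumPy representation are assumptions for illustration; the disclosure does not prescribe this token format.

```python
import numpy as np

# Illustrative video token generator/compositor (cf. elements 232, 234):
# the timecode is encoded as a row of black/white squares and composited
# into a corner of the frame. Layout parameters are assumed.

BLOCK = 16  # pixel size of each bit square (assumed)
BITS = 32   # timecode expressed in milliseconds as a 32-bit integer

def make_video_token(timecode_ms: int) -> np.ndarray:
    bits = [(timecode_ms >> i) & 1 for i in range(BITS - 1, -1, -1)]
    row = np.repeat(np.array(bits, dtype=np.uint8) * 255, BLOCK)
    return np.tile(row, (BLOCK, 1))              # BLOCK x (BITS*BLOCK) strip

def composite_token(frame: np.ndarray, token: np.ndarray) -> np.ndarray:
    out = frame.copy()
    out[:token.shape[0], :token.shape[1]] = token  # top-left overlay
    return out

frame = np.full((720, 1280), 128, dtype=np.uint8)  # stand-in gray frame
tokened = composite_token(frame, make_video_token(timecode_ms=1_234_567))
```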
Exemplary processing operations performed by the video encoder 236 may include motion-compensated predictive coding, such as defined in the ITU-T H.264 and H.265 coding specifications (or their predecessor specifications), stream assembly, segmentation and packetization, and buffering for transmission (steps not shown).
The audio processing pipeline 240 may include an audio token generator 242, an audio compositor 244, and an audio encoder 246. The audio token generator may generate an audio representation of the timecode received from the timecode converter 220 as an audio token. The audio compositor may integrate the audio token into audio content received from an external source. The audio encoder 246 may apply audio coding operations to the resultant audio and may output the coded audio from the source device 200.
Exemplary processing operations performed by an audio encoder 246 may include those performed for the MP3, Vorbis, AAC and/or Opus coders.
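Again purely as a sketch, an audio token generator in the spirit of element 242 might render each timecode bit as a short sine burst at one of two frequencies, a simple frequency-shift-keyed scheme chosen here for concreteness; the disclosure does not limit the token to any particular audio representation, and all parameters below are assumed.

```python
import numpy as np

# Illustrative audio token generator (cf. element 242): each bit of the
# timecode is emitted as a short sine burst at one of two frequencies.

RATE = 48_000          # sample rate, Hz (assumed)
BIT_DUR = 0.005        # seconds per bit (assumed)
F0, F1 = 4_000, 8_000  # frequencies for bit 0 / bit 1 (assumed)

def make_audio_token(timecode_ms: int, bits: int = 32) -> np.ndarray:
    t = np.arange(int(RATE * BIT_DUR)) / RATE
    chunks = []
    for i in range(bits - 1, -1, -1):
        f = F1 if (timecode_ms >> i) & 1 else F0
        chunks.append(0.25 * np.sin(2 * np.pi * f * t))
    return np.concatenate(chunks)

token = make_audio_token(1_234_567)   # ~160 ms of FSK-style audio
```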
As can be seen from the foregoing, a video token may provide a visual representation of a timecode that is composited into displayed video content.
The principles of the present invention find application with other video tokens. For example, video tokens may be generated as Gaussian sequence watermarks or gray scale watermarks. Moreover, transparent watermarks or steganographic watermarks may be applied to reduce perceptual artifacts that the video tokens otherwise may create.
Audio tokens may be generated both for use cases where consumer-oriented audio is to be presented by audio devices and for other use cases where consumer-oriented audio is not desired. For example, a spread spectrum audio watermark may be integrated with audio content, which may have a character that does not disturb a consumer's perception of the audio but can be detected and analyzed by computer analysis tools that employ pseudo-noise spread spectrum decoding. In an aspect where consumer-oriented audio is not to be conveyed, audio tokens may be generated as linear time code (LTC) signals.
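A minimal sketch of the pseudo-noise spread-spectrum approach described above, assuming the embedder and detector share a PN seed; the strength, sequence length, and names are illustrative.

```python
import numpy as np

# Sketch of a pseudo-noise spread-spectrum audio watermark: a low-level PN
# sequence is mixed into program audio, and a detector recovers it by
# correlation. Parameters are illustrative assumptions.

rng = np.random.default_rng(seed=7)        # shared PN seed (assumed)
PN = rng.choice([-1.0, 1.0], size=48_000)  # 1 s pseudo-noise sequence

def embed(audio: np.ndarray, strength: float = 0.01) -> np.ndarray:
    return audio + strength * PN[: len(audio)]

def detect(audio: np.ndarray) -> float:
    """Normalized correlation against the PN sequence; values well above
    the noise floor indicate the watermark (and hence token) is present."""
    pn = PN[: len(audio)]
    return float(np.dot(audio, pn) / len(audio))

program = 0.3 * np.sin(2 * np.pi * 440 * np.arange(48_000) / 48_000)
marked = embed(program)
print(detect(marked), detect(program))     # watermarked vs. unmarked audio
```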
As illustrated in FIG. 4, a rendering system 400 may include a media player 410 that supplies video content to one or more display devices 420 and audio content to one or more speaker devices.
Although FIG. 4 illustrates a single media player 410, the principles described herein also apply to rendering systems having multiple media players and output devices.
A media player 410 may possess an audio decoder 460 and one or more audio rendering pipelines 470. The audio decoder 460 may decode coded audio data and may output the decoded audio to the audio rendering pipeline(s) 470. Each audio rendering pipeline 470 may possess a communication fabric 472, over which audio output from the audio decoder 460 is supplied to the speaker device, an audio rendering processing system 474, and a speaker 476. Different devices may apply different processes by their respective processing systems 474, which may include volume control, spectral modifications, audio filtering, spatialization, stereo separation, beam forming, and other processing operations designed to tailor the input audio to characteristics of the speaker. The speaker 476 represents hardware components of the speaker device that output audio from the speaker device.
In an aspect, a media player 410 may possess a local video token generator 480 and an audio token generator 490. In this aspect, media streams need not have tokens embedded in their content as illustrated in FIG. 2; instead, the token generators 480, 490 may generate tokens locally at the media player 410, for example, from timestamps contained in the media streams.
For example, testing of an audio rendering pipeline 470 may be performed by interleaving audio tokens, generated from timestamps in an audio source, into the audio that is fed to one of the audio rendering pipelines 470, as sketched below. Such a “blank-and-burst” technique may assist diagnosis of timing issues in that pipeline.
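A minimal sketch of such an interleave, reusing the illustrative FSK-style token generator sketched earlier; the function name and sample positions are assumptions.

```python
import numpy as np

# Sketch of "blank-and-burst" insertion: a short span of program audio
# destined for one rendering pipeline is blanked and replaced by an audio
# token generated from the stream's own timestamp. make_audio_token() is
# the illustrative generator sketched earlier.

def blank_and_burst(audio: np.ndarray, at_sample: int,
                    token: np.ndarray) -> np.ndarray:
    out = audio.copy()
    out[at_sample : at_sample + len(token)] = token  # blank, then burst
    return out

# e.g. inject a token for source timestamp 2_000 ms at the 0.5 s mark:
# audio = blank_and_burst(audio, at_sample=24_000,
#                         token=make_audio_token(2_000))
```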
In an aspect, video tokens may be placed in video content in a layered relationship with respect to other video content elements, which may provide control over whether the video tokens will be displayed by a display device. In an aspect, a compositor, e.g., of the video token generator 480, may control display of a video token responsive to an externally-supplied control signal. For example, a device operator may place a rendering system 400 in a diagnostic mode at which time the compositor 480 may cause the video token to be included in video data output to the display device(s) 420 as above.
In another aspect, audio tokens may be placed in audio content in a layered relationship with respect to other audio content elements, which may provide control over whether the audio tokens will be output by a speaker device. In an aspect, a compositor, e.g., of the audio token generator 490, may selectively control output of the audio token responsive to an externally-supplied control signal. Again, a device operator may place a rendering system 400 in a diagnostic mode, at which time the compositor 490 may cause the audio token to be included in audio data output to the speaker device(s).
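For illustration, gating of token output by an externally-supplied control signal might look like the following; the class and flag names are assumptions, not part of the disclosure.

```python
# Sketch of compositor gating by an externally-supplied control signal:
# tokens are layered into the output only while a diagnostic mode is active.

class TokenCompositor:
    def __init__(self) -> None:
        self.diagnostic_mode = False   # externally-supplied control (assumed)

    def compose(self, content: list, token: str) -> list:
        # Pass program content through unchanged unless diagnostics are on.
        return (content + [token]) if self.diagnostic_mode else content

c = TokenCompositor()
print(c.compose(["program"], "token"))  # ['program']
c.diagnostic_mode = True
print(c.compose(["program"], "token"))  # ['program', 'token']
```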
The camera 510 may capture video output by a display panel of a rendering device (FIG. 1), including any video tokens contained therein.
The microphone 520 may capture audio output from a speaker device (FIG. 1), including any audio tokens contained therein. A video token analyzer 542 and an audio token analyzer 544 may derive timecodes from the captured video content and the captured audio content, respectively.
The timecode comparator 546 may analyze the timecodes output from the video token analyzer 542, the audio token analyzer 544, or both, to quantify propagation delay(s) through the source 110 and the rendering device 120 (FIG. 1).
In a first analysis, the timecode comparator 546 may compare timecodes from the video token analyzer 542 to timecodes from the audio token analyzer 544 to quantify relative delays between the video delivery path, defined by a source device 110 and a rendering device 120 (FIG. 1), and the audio delivery path defined by those same devices.
In a second analysis, the timecode comparator 546 may compare a timecode output from the video token analyzer 542 to a timing reference provided by a timing source 548 to quantify overall processing delay imposed by the video delivery path defined by a source device 110 and a rendering device 120 (FIG. 1).
In a third analysis, the timecode comparator 546 may compare a timecode output from the audio token analyzer 544 to a timing reference provided by a timing source 548 to quantify overall processing delay imposed by the audio delivery path defined by a source device 110 and a rendering device 120 (FIG. 1).
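The three analyses reduce to simple timecode arithmetic. The following sketch expresses them with illustrative names; timecodes are assumed to be expressed in seconds on the shared time base described above.

```python
# Sketch of the three comparator analyses (cf. element 546).

def av_sync_error(video_tc: float, audio_tc: float) -> float:
    """First analysis: relative delay between video and audio paths.
    Positive values mean the audio path imposes more delay (audio lags)."""
    return video_tc - audio_tc

def path_delay(token_tc: float, reference_now: float) -> float:
    """Second/third analyses: overall delay of one delivery path, measured
    against a timing reference such as timing source 548."""
    return reference_now - token_tc

video_tc, audio_tc, now = 10.120, 10.080, 10.310
print(f"A/V skew:    {av_sync_error(video_tc, audio_tc) * 1000:+.0f} ms")
print(f"video delay: {path_delay(video_tc, now) * 1000:.0f} ms")
print(f"audio delay: {path_delay(audio_tc, now) * 1000:.0f} ms")
```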
These techniques may form the basis of diagnostic operations to be performed on rendering applications when aberrant operating behavior is detected. For example, if a viewer observes “lip-sync” issues between displayed video and rendered audio, the foregoing techniques may be applied to quantify timing differences between the video path and the audio path, and corrective measures may be taken (for example, by introducing latency into one or more of the paths until synchronization is established). Similarly, rendering delays may be observed between otherwise paired video devices and/or paired audio devices. Here, again, path-specific delays may be quantified by the foregoing techniques, and corrective measures may be taken. The scanning device architecture illustrated in FIG. 5 may be employed to perform such diagnostic operations.
In another application, media distribution organizations may employ the timecode analysis described herein to quantify processing delays imposed by their distribution systems. As discussed, the video coding and audio coding operations performed by source devices may impose processing delays. Personnel may employ a scanning device to compare the times at which timecodes are output from rendering devices 520, 530.1, 530.2 to the times at which video and audio were admitted to source devices 110 (FIG. 1).
In a parallel operation, the method 600 may generate an audio token from the timecode (box 640) and process the audio token by an audio distribution sub-system (box 645). The method 600 may output the audio token at an output device (box 650) after processing by the audio distribution sub-system. The audio token may be captured (box 655) as it is output. The timecode may be derived from the captured audio data representing the output audio token (box 660). The method 600 may analyze the timecodes (box 665) to quantify delays imposed by the video distribution sub-system, the audio distribution sub-system, or both.
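As an illustrative walk-through only, the following sketch simulates the flow of method 600 end to end, with sleeps standing in for the coding, delivery, and rendering stages; all timings are artificial and the sequential structure is a simplification of the parallel operation described above.

```python
import time

# Simulated walk-through of method 600: tokens minted from a common
# timecode traverse video and audio sub-systems with different (artificial)
# delays; recovered timecodes are then compared at capture (cf. box 665).

def run_once() -> None:
    timecode = time.monotonic()        # mint video and audio tokens

    time.sleep(0.120)                  # stand-in for the video sub-system
    video_capture = time.monotonic()   # capture token at the output device

    time.sleep(0.040)                  # audio path runs 40 ms longer here
    audio_capture = time.monotonic()

    # derive timecodes from captured tokens and quantify per-path delays
    print(f"video path delay: {(video_capture - timecode) * 1000:.0f} ms")
    print(f"audio path delay: {(audio_capture - timecode) * 1000:.0f} ms")

run_once()  # -> roughly 120 ms and 160 ms
```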
Several extensions find application with the techniques disclosed hereinabove.
In a second exemplary application, tokens may be augmented to contain information regarding streams being used in adaptive streaming systems. Adaptive streaming systems make available multiple copies (“streams”) of a media item, which are coded at different bitrates. Oftentimes, rendering devices will select a stream for download and rendering based on local estimates of operating conditions (such as available bandwidth and/or available processing resources) and, if operating conditions change, the devices may switch to a different stream, download it, and render it. Tokens may contain information about the streams in which they are contained, which may be collected by a processing system 540 for analysis. Thus, stream-switching behavior of a rendering device may be monitored.
In a third exemplary application, tokens may be designed to include other information regarding the source information from which the tokens were generated. Such information may include: the frame rate of the video, a source URL from which the video was obtained, the dynamic range of the content and the display device, displayed dimensions, source dimensions, codecs used to generate the media stream, and the like.
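A sketch of such an augmented token payload appears below; JSON is an assumed serialization, and the field names merely mirror the examples given in the text.

```python
import json
from dataclasses import dataclass, asdict

# Illustrative augmented token payload carrying stream and source
# information; the field set and serialization are assumptions.

@dataclass
class TokenPayload:
    timecode_ms: int
    stream_id: str          # which adaptive-streaming rendition (assumed)
    bitrate_kbps: int
    frame_rate: float
    source_url: str
    source_dims: tuple      # e.g. (1920, 1080)
    displayed_dims: tuple
    codec: str

payload = TokenPayload(1_234_567, "hls-720p", 3_500, 29.97,
                       "https://example.com/stream.m3u8",
                       (1280, 720), (1280, 720), "H.264")
token_bytes = json.dumps(asdict(payload)).encode()

# A processing system (cf. 540) collecting successive payloads can flag a
# stream switch whenever stream_id changes between captured tokens.
```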
This application claims the benefit of U.S. Provisional Application No. 62/823,308 filed on Mar. 25, 2019, the disclosure of which is incorporated by reference herein.