The subject matter disclosed herein generally relates to monitoring of media. Specifically, the present disclosure addresses methods, devices, and systems involving pattern-based monitoring of media synchronization.
In the 21st century, media frequently takes the form of media data that may be communicated as a stream of media data, stored permanently or temporarily in a storage medium, or any combination thereof. In many situations, multiple streams of media data, with each stream representing distinct media content, are combined for synchronized rendering (e.g., playback). For example, a movie generally includes a video track and at least one audio track. The movie may also include non-video non-audio content, such as, for example, textual content used in providing closed captioning services or an electronic programming guide. As a further example, a broadcast television program may include interactive content for providing enhanced media services (e.g., reviews, ratings, advertisements, internet-based content, games, shopping, or payment handling).
Combinations of various media data are well-known in the art. Such combinations of media include audio accompanied by metadata that describes the audio, video with multiple camera angles (e.g., from security cameras or for flight simulator screens), video with regular audio and commentary audio, video with audio in multiple languages, and video with subtitles in multiple languages. In short, any number of streams of media data, of any type, may be combined together to effect a particular transmission of information or to provide a particular viewer experience. This combining of media data streams is often referred to as “multiplexing” the streams together.
Synchronization between or among multiplexed streams of media data may be affected by various systems and devices used to communicate the media data. It is generally considered helpful to preserve the synchronization of multiplexed streams of media data. For example, in a movie, the video and audio tracks of the movie are synchronized so that audio from spoken dialogue is heard with corresponding video of the speaker talking. This is commonly known as “lip-sync” between audio and video. Any shifting of the audio with respect to the video degrades lip-sync.
Although mild degradations in synchronization are common and generally acceptable to many viewers, if the synchronization becomes too degraded, the ability of the media to effect a particular transmission of information or to provide a particular viewer experience may be lost. In the movie example, if the audio is heard too far behind, or too far in advance of, the corresponding video, lip-sync is effectively lost, and the viewer experience may be deemed unacceptable by an average viewer.
Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings in which:
Example methods, devices, and systems are directed to pattern-based monitoring of media synchronization. Examples merely typify possible variations. Unless explicitly stated otherwise, components and functions are examples and may be combined or subdivided, and operations may vary in sequence or be combined or subdivided. In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of example embodiments. It will be evident to one skilled in the art, however, that the present subject matter may be practiced without these specific details.
To monitor media synchronization of media data, reference media data (e.g., original source media data) and monitored media data (e.g., transmitted and received media data) are accessed. Media data may be accessed as streams of media data, as media data stored in a memory, or any combination thereof. A first pattern of first media content (e.g., a video event) and a second pattern of second media content (e.g., an audio event) are identified in the reference media data, and their corresponding counterparts are identified in the monitored media data as a third pattern of first media content (e.g., a video event) and a fourth pattern of second media content (e.g., an audio event). After these four patterns are identified, a first time interval is determined between two of the patterns, and a second time interval is determined between two of the patterns. A difference between the two time intervals is then determined and stored in a memory. This difference may be presented via a user interface as a media synchronization error of the monitored media data as compared to the reference media data.
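By way of illustration only, the following Python sketch shows the core arithmetic once the start times of the four patterns have been obtained; the function name, the variable names, and the sign convention (reference interval minus monitored interval) are illustrative assumptions rather than features of any particular embodiment.

    def media_sync_error(ref_video_start, ref_audio_start,
                         mon_video_start, mon_audio_start):
        """Difference, in seconds, between the reference and monitored time intervals."""
        reference_interval = ref_audio_start - ref_video_start   # first vs. second pattern
        monitored_interval = mon_audio_start - mon_video_start   # third vs. fourth pattern
        return reference_interval - monitored_interval           # 0.0 means no added offset

    # Example: in the monitored stream, audio trails video by an extra 80 ms.
    print(round(media_sync_error(10.0, 10.5, 42.0, 42.58), 3))   # -> -0.08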
Identification of a pattern of media content may be based on any type of information used to record, store, communicate, render, or otherwise represent the media content. For example, a pattern of media content may be identified based on information that varies in time. Examples of such time-variant information include, but are not limited to, luminance information (e.g., luminance of video), amplitude information (e.g., amplitude of a sound wave), textual information (e.g., text in subtitles), time code information (e.g., a reference clock signal), automation information (e.g., instructions to control a machine), or any combination thereof.
In some example embodiments, identification of a pattern involves selecting a reference portion of the reference media data (e.g., a reference video or audio clip) and a candidate portion of the monitored media data (e.g., a candidate video or audio clip), determining a correlation value based on the reference and candidate portions, and determining that the correlation value is sufficient to identify the pattern (e.g., a video or audio event). In certain example embodiments, identification of a pattern involves selecting first and second portions of media data (e.g., first and second video frames of a video clip, or first and second audio envelopes of an audio clip), respectively determining first and second values of the first and second portions, determining a temporal change based on the first and second values, and determining that the temporal change is sufficient to identify the pattern (e.g., a video or audio event). In various example embodiments, identification of a video event involves removing a video image border (e.g., padding, matting, or letter-boxing) by selecting a video frame, identifying pixels representative of the image border, and storing the image pixels as the video frame.
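As one non-limiting illustration of the border-removal case, the sketch below (Python with NumPy; the luminance threshold and the helper name are assumptions) treats rows and columns whose mean luminance stays near black as an image border and stores only the remaining image pixels as the video frame.

    import numpy as np

    def remove_border(frame, black_level=16.0):
        """frame: 2-D array of luminance values; returns the frame with border rows/columns cropped."""
        row_means = frame.mean(axis=1)
        col_means = frame.mean(axis=0)
        rows = np.where(row_means > black_level)[0]   # rows that are not border
        cols = np.where(col_means > black_level)[0]   # columns that are not border
        if rows.size == 0 or cols.size == 0:          # entire frame reads as border
            return frame
        return frame[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]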
The same media content is communicated via both the reference path 120 and the monitored path 130, even though media data communicated via the reference path 120 may differ from media data communicated via the monitored path 130. For example, the monitored path 130 may involve use of one or more systems, devices, conversions, transformations, alterations, or modifications that are not used in the reference path 120. As a result, when considered as binary data, the media data communicated via the reference path 120 may differ significantly from the media data communicated via the monitored path 130. However, for example, if the media data communicated via the reference path 120 represents particular media content (e.g., a fiery explosion in a movie), then the media data communicated via the monitored path 130 represents that same particular media content (e.g., the same fiery explosion in the same movie).
The access module 115 accesses reference media data and monitored media data. To this end, the access module 115 accesses a memory that stores media data permanently or temporarily (e.g., memory 112, a buffer memory, a cache memory, or a machine-readable medium). A stream of media data may be accessed by reading data payloads of network packets used to communicate the media data. In some example embodiments, accessing a stream of media data involves reading the data payloads from a memory. The access module 115 may be implemented as a hardware module, a processor-implemented module, or any combination thereof.
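Purely as an illustration of reading data payloads from network packets, the following sketch (Python; the UDP transport, port number, and buffer sizes are placeholders, not features of any embodiment) collects payloads into a temporary in-memory buffer for later access.

    import socket

    def read_stream_payloads(port=5004, packet_size=1500, count=100):
        """Collect the data payloads of `count` packets into an in-memory buffer."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", port))
        payloads = []
        try:
            for _ in range(count):
                payload, _addr = sock.recvfrom(packet_size)
                payloads.append(payload)   # media data stored temporarily in memory
        finally:
            sock.close()
        return b"".join(payloads)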
The identification module 117 identifies a pattern of media content. For example, the identification module 117 may identify a video event in reference media data, a video event in monitored media data, an audio event in reference media data, an audio event in monitored media data, or any combination thereof. As additional examples, the identification module 117 may identify a text event in reference media data, a text event in monitored media data, a time code event in reference media data, a time code event in monitored media data, or any combination thereof. Further operation of the identification module 117 may identify further patterns of media content. Example methods of identifying a pattern of media content are described in greater detail below with respect to
The processing module 119 determines a first time interval between two patterns identified by the identification module 117. The processing module 119 also determines a second time interval between two patterns identified by the identification module 117. The two patterns used to determine the first time interval need not be the same two patterns used to determine the second time interval. The processing module 119 determines a difference between the first and second time intervals and stores the difference in the memory 112. Example methods of determining first and second time intervals are described in greater detail below with respect to
The processor 111 may be any type of processor as described in greater detail below with respect to
The reference video data 411 includes a reference video clip 415, which in turn includes a reference video event 451. The reference audio data 413 includes a reference audio clip 416, which in turn includes a reference audio event 461. Similarly, the monitored video data 421 includes a monitored video clip 425, which in turn includes a monitored video event 452, and the monitored audio data 423 includes a monitored audio clip 426, which in turn includes a monitored audio event 462.
The reference video event 451 and the monitored video event 452 correspond to each other and represent the same video content (e.g., a fiery explosion in a movie). Similarly, the reference audio event 461 and the monitored audio event 462 correspond to each other and represent the same audio content (e.g., a loud boom). The audio content corresponds to the video content in the sense that both have been multiplexed into the reference stream 410 for synchronized rendering. However, nothing requires that the audio content correspond contextually, semantically, artistically, or musically with the video content. For example, the audio content may be dialogue that corresponds to video content other than the video content represented in the reference video event 451 and the monitored video event 452.
As shown in
As shown in
As shown in
As shown in
In
Because the reference video event 451 and the monitored video event 452 correspond to each other, and because the reference audio event 461 and the monitored audio event 462 correspond to each other, any difference between the video time interval 570 and the audio time interval 580 represents an additional delay that has been introduced into the monitored stream 420. As noted above, this may be referred to as a media synchronization error (e.g., a lip-sync error) in the monitored stream 420 with respect to the reference stream 410.
In the reference media data 610, media content 611 includes a portion 615, which in turn includes a first pattern 651. Media content 611 also includes another portion 617. Media content 613 includes a portion 616, which in turn includes a second pattern 661. Similarly, in the monitored media data 620, media content 621 includes a portion 625, which in turn includes a third pattern 652. Media content 621 also includes an additional portion 627. Media content 623 includes a portion 626, which in turn includes a fourth pattern 662.
As shown in
In any of the methodologies discussed herein (e.g., with respect to
In operation 910, the access module 115 accesses reference media data (e.g., reference media data 610, or reference stream 410) stored in the memory 112. In operation 920, the access module 115 accesses monitored media data (e.g., monitored media data 620, or monitored stream 420) stored in the memory 112.
In operation 930, the identification module 117 identifies a first pattern of first media content (e.g., pattern 651, or video event 451) and identifies a second pattern of second media content (e.g., pattern 661, or audio event 461). The identifications of the first and second patterns are based on the reference media data accessed in operation 910. Further details with respect to identification of a pattern are described below with respect to
In operation 940, the identification module 117 identifies a third pattern of first media content (e.g., pattern 652, or video event 452) and identifies a fourth pattern of second media content (e.g., pattern 662, or audio event 462). The identifications of the third and fourth patterns are based on the monitored media data accessed in operation 920.
In operation 950, the processing module 119 determines a reference time interval (e.g., reference time interval 470) between the first and second patterns, which were identified in operation 930. For example, the processing module 119 may determine the reference time interval by calculating a time difference (e.g., via a subtraction operation) between the starting times of the first and second patterns. In operation 960, the processing module 119 determines a monitored time interval (e.g., monitored time interval 480) between the third and fourth patterns, which were identified in operation 940. As an example, the processing module 119 may determine the monitored time interval by calculating a time difference between the starting times of the third and fourth patterns.
In operation 970, the processing module 119 determines and stores a difference between the reference time interval (e.g., reference time interval 470) and the monitored time interval (e.g., monitored time interval 480). For example, the processing module 119 may subtract the monitored time interval from the reference time interval to obtain the difference between the two time intervals. The difference is stored in the memory 112. In operation 980, the user interface module 113 presents the difference as a media synchronization error (e.g., media sync error 490).
In operation 1010, the access module 115 accesses reference media data (e.g., reference media data 610, or reference stream 410) stored in the memory 112. In operation 1020, the access module 115 accesses monitored media data (e.g., monitored media data 620, or monitored stream 420) stored in the memory 112.
In operation 1030, the identification module 117 identifies a first pattern of first media content (e.g., pattern 651, or video event 451) and identifies a second pattern of second media content (e.g., pattern 661, or audio event 461). The identifications of the first and second patterns are based on the reference media data accessed in operation 1010. Further details with respect to identification of a pattern are described below with respect to
In operation 1040, the identification module 117 identifies a third pattern of first media content (e.g., pattern 652, or video event 452) and identifies a fourth pattern of second media content (e.g., pattern 662, or audio event 462). The identifications of the third and fourth patterns are based on the monitored media data accessed in operation 1020.
In operation 1050, the processing module 119 determines a first time interval (e.g., video time interval 570) between the first and third patterns, which are of first media content (e.g., video content). For example, the processing module 119 may determine the first time interval by calculating a time difference (e.g., via a subtraction operation) between the starting times of the first and third patterns. In operation 1060, the processing module determines a second time interval (e.g., audio time interval 580) between the second and fourth patterns, which are of second media content (e.g., audio content). As an example, the processing module may determine the second time interval by calculating a time difference between the starting times of the second and fourth patterns.
In operation 1070, the processing module 119 determines and stores a difference between the first time interval (e.g., video time interval 570) and the second time interval (e.g., audio time interval 580). For example, the processing module 119 may subtract the second time interval from the first time interval to obtain the difference between the two time intervals. The difference is stored in the memory 112. In operation 1080, the user interface module 113 presents the difference as a media synchronization error.
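Although the pairing of patterns here differs from that of operations 950-970 (the intervals are taken per content type across the two sets of media data, rather than within each set), the resulting difference is algebraically the same. The short Python check below uses illustrative start times only; the variable names are assumptions.

    t1, t2 = 10.00, 10.50   # start times of the first and second patterns (reference)
    t3, t4 = 42.00, 42.58   # start times of the third and fourth patterns (monitored)

    video_interval = t3 - t1                     # between first and third patterns (e.g., interval 570)
    audio_interval = t4 - t2                     # between second and fourth patterns (e.g., interval 580)
    error_by_content = video_interval - audio_interval   # -0.08

    error_by_set = (t2 - t1) - (t4 - t3)                  # same result as operations 950-970
    assert abs(error_by_content - error_by_set) < 1e-9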
In operation 1110, the identification module 117 selects a reference portion of reference media data (e.g., portion 615 of reference media data 610, or video clip 415 of reference stream 410) stored in the memory 112. In operation 1120, the identification module 117 selects a candidate portion of monitored media data (e.g., portion 625 of monitored media data 620, or video clip 425 of monitored stream 420) stored in the memory 112.
In operation 1130, the identification module 117 determines a correlation value based on the reference and candidate portions, which were selected in operations 1110 and 1120. The correlation value is a result of a mathematical correlation function applied to reference data included in the reference portion and to candidate data included in the candidate portion.
Operation 1140 involves determining that the correlation value is sufficient to identify a pattern of media content (e.g., a video or audio event) as common to both the reference portion and the candidate portion. In operation 1140, the identification module 117 compares the correlation value to a correlation threshold. If the correlation value transgresses (e.g., exceeds) the correlation threshold, the identification module 117 determines that the correlation value is sufficient to treat the reference portion and the candidate portion as representative of the same pattern, thus facilitating identification of the pattern. For example, the identification module 117 may determine that the correlation value is sufficient to identify video event 452 of video clip 425 as corresponding to video event 451 of video clip 415. As another example, the identification module 117 may determine that the correlation value is sufficient to identify audio event 462 of audio clip 426 as corresponding to audio event 461 of audio clip 416.
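One possible form of the correlation test of operations 1130-1140 is sketched below in Python with NumPy; the normalization, the threshold value, and the assumption that both portions are equal-length one-dimensional signals (e.g., per-frame luminance means or audio envelope samples) are illustrative choices rather than requirements. In one possible arrangement (not required by the description above), the comparison may be repeated over successive candidate portions until a sufficiently correlated portion is found.

    import numpy as np

    def correlates(reference_portion, candidate_portion, correlation_threshold=0.9):
        """True when the candidate portion is sufficiently correlated with the reference portion."""
        ref = np.asarray(reference_portion, dtype=float)
        cand = np.asarray(candidate_portion, dtype=float)
        ref = (ref - ref.mean()) / (ref.std() + 1e-12)
        cand = (cand - cand.mean()) / (cand.std() + 1e-12)
        correlation_value = float(np.dot(ref, cand)) / ref.size   # normalized correlation in [-1, 1]
        return correlation_value > correlation_threshold          # value transgresses the threshold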
In operation 1210, the identification module 117 selects first and second portions of media data (e.g., portions 615 and 617 from reference media data 610, or portions 625 and 627 from monitored media data 620) stored in the memory 112. The first and second portions are selected from the same media content (e.g., content 611). For example, the first and second portions may be two video frames (e.g., video frame 810) from a stream of video data (e.g., video data 411). As another example, the first and second portions may be two audio envelopes from a stream of audio data (e.g., audio data 413).
In operation 1220, the identification module 117 determines a first value of the first portion, which was selected in operation 1210. In operation 1230, the identification module 117 determines a second value of the second portion, which was selected in operation 1210. A first or second value may be a result of a mathematical transformation of data included in the selected portion of media content (e.g., a mean value, a median value, or a hash value). For example, a first or second value may be a mean value of a video frame (e.g., video frame 810, or image pixels 830 stored as a video frame). As another example, a first or second value may be a median value of an audio envelope.
In operation 1240, the identification module 117 determines a temporal change based on the first and second values, determined in operations 1220 and 1230. The temporal change represents a variation in time between the first portion of media content and the second portion of media content. For example, the temporal change may represent an increase in luminance from one video frame to another. As another example, the temporal change may represent a decrease in amplitude of sound waves from one audio envelope to another.
Operation 1250 involves determining that the temporal change is sufficient to identify a pattern of media content (e.g., a video or audio event). In operation 1250, the identification module 117 compares the temporal change to a temporal threshold. If the temporal change transgresses (e.g., exceeds) the temporal threshold, the identification module 117 determines that the temporal change is sufficient to treat the first and second portions as representative of an event within the media content (e.g., content 611), thus facilitating identification of the event. For example, the identification module 117 may determine that the temporal change is sufficient to identify a pattern of video content as a video event (e.g., video event 451). As another example, the identification module 117 may determine that the temporal change is sufficient to identify a pattern of audio content as an audio event (e.g., audio event 461).
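By way of example only, the sketch below applies operations 1210-1250 to video using mean luminance as the value of each portion; the measure, the threshold value, and the function name are assumptions made for illustration.

    import numpy as np

    def find_video_event(frames, temporal_threshold=40.0):
        """frames: iterable of 2-D luminance arrays; returns the index where an event is detected, or None."""
        previous_value = None
        for index, frame in enumerate(frames):
            value = float(np.mean(frame))                 # value of the current portion
            if previous_value is not None:
                temporal_change = abs(value - previous_value)
                if temporal_change > temporal_threshold:  # change transgresses the threshold
                    return index
            previous_value = value
        return None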
Example embodiments may provide the capability to monitor media synchronization without any need to transmit a test pattern (e.g., an audio test tone, video color bars, or a beep-flash test signal) through the various systems and devices used to communicate the media data, since the appearance of test patterns may be regarded by viewers as interruptive of normal media programming. An ability to monitor media synchronization may facilitate detection of media synchronization errors induced by one or more systems, devices, conversions, transformations, alterations, or modifications involved in a monitored data path (e.g., monitored path 130). Example embodiments may also facilitate improvement in viewer experiences of media due to frequent or continuous monitoring of media synchronization, reduced network traffic corresponding to reduced complaints from viewers, and an improved capability to identify specific media data likely to cause a media synchronization error.
The computer system 1300 includes a processor 1302 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), an application specific integrated circuit (ASIC), a radio-frequency integrated circuit (RFIC), or any combination thereof), a main memory 1304, and a static memory 1306, which communicate with each other via a bus 1308. The computer system 1300 may further include a graphics display unit 1310 (e.g., a plasma display panel (PDP), a liquid crystal display (LCD), a projector, a light emitting diode (LED), or a cathode ray tube (CRT)). The computer system 1300 may also include an alphanumeric input device 1312 (e.g., a keyboard), a cursor control device 1314 (e.g., a mouse, a trackball, a joystick, a motion sensor, or other pointing instrument), a storage unit 1316, a signal playback device 1318 (e.g., a speaker), and a network interface device 1320.
The storage unit 1316 includes a machine-readable medium 1322 on which is stored instructions 1324 (e.g., software) embodying any one or more of the methodologies or functions described herein. The instructions 1324 may also reside, completely or at least partially, within the main memory 1304, within the processor 1302 (e.g., within the processor's cache memory), or both, during execution thereof by the computer system 1300, the main memory 1304 and the processor 1302 also constituting machine-readable media. The instructions 1324 may be transmitted or received over a network 1326 via the network interface device 1320.
As used herein, the term “memory” refers to a machine-readable medium able to store data temporarily or permanently and may be taken to include, but not be limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, and cache memory. While the machine-readable medium 1322 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions (e.g., instructions 1324). The term “machine-readable medium” shall also be taken to include any medium that is capable of storing instructions (e.g., software) for execution by the machine, such that the instructions, when executed by the machine, cause the machine to perform any one or more of the methodologies described herein. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, a data repository in the form of a solid-state memory, an optical medium, a magnetic medium, or any combination thereof.
Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute either software modules (e.g., code embodied on a machine-readable medium or in a transmission signal) or hardware modules. A “hardware module” is a tangible unit capable of performing certain operations and may be configured or arranged in a certain physical manner. In various example embodiments, one or more computer systems (e.g., a standalone computer system, a client computer system, or a server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
In some embodiments, a hardware module may be implemented mechanically, electronically, or any combination thereof. For example, a hardware module may include dedicated circuitry or logic that is permanently configured to perform certain operations. For example, a hardware module may be a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC). A hardware module may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations. For example, a hardware module may include software encompassed within a general-purpose processor or other programmable processor. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
Accordingly, the term “hardware module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. As used herein, “hardware-implemented module” refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where the hardware modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware modules at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses that connect the hardware modules). In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions described herein. As used herein, “processor-implemented module” refers to a hardware module implemented using one or more processors.
Similarly, the methods described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment, or a server farm), while in other embodiments the processors may be distributed across a number of locations.
The one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., an application program interface (API)).
Some portions of this specification are presented in terms of algorithms or symbolic representations of operations on data stored as bits or binary digital signals within a machine memory (e.g., a computer memory). These algorithms or symbolic representations are examples of techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. As used herein, an “algorithm” is a self-consistent sequence of operations or similar processing leading to a desired result. In this context, algorithms and operations involve physical manipulation of physical quantities. Typically, but not necessarily, such quantities may take the form of electrical, magnetic, or optical signals capable of being stored, accessed, transferred, combined, compared, or otherwise manipulated by a machine. It is convenient at times, principally for reasons of common usage, to refer to such signals using words such as “data,” “content,” “bits,” “values,” “elements,” “symbols,” “characters,” “terms,” “numbers,” “numerals,” or the like. These words, however, are merely convenient labels and are to be associated with appropriate physical quantities.
Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or any combination thereof), registers, or other machine components that receive, store, transmit, or display information. Furthermore, unless specifically stated otherwise, the terms “a” or “an” are herein used, as is common in patent documents, to include one or more than one instance. Finally, as used herein, the conjunction “or” refers to a non-exclusive “or,” unless specifically stated otherwise.