Media packages for audio-video (AV) content include collections of media files. In order to distribute numerous localized versions of the AV content that are variously compliant with local censorship requirements, consistent with localized imagery for a given geographical region, or both, each media package may differ slightly in the makeup of the collection of media files it contains. The same is true of audio content, where dubbed audio may be used. Analogously, timed text may be added or changed depending on the requirements in each locale.
However, media files can be very large and time consuming to encode. Due to the typically high degree of redundant underlying media content in each localized version of a media package, systems that build media packages representing all localized versions of a feature are faced with the challenge of identifying media files that can be reused across the set of localized versions, in order to avoid the costs and inefficiencies resulting from the unnecessary encoding of media files that are already available. Thus, there is a need in the art for a solution enabling identification of existing media files suitable for reuse, in order to avoid media file encoding redundancies.
The following description contains specific information pertaining to implementations in the present disclosure. One skilled in the art will recognize that the present disclosure may be implemented in a manner different from that specifically discussed herein. The drawings in the present application and their accompanying detailed description are directed to merely exemplary implementations. Unless noted otherwise, like or corresponding elements among the figures may be indicated by like or corresponding reference numerals. Moreover, the drawings and illustrations in the present application are generally not to scale, and are not intended to correspond to actual relative dimensions.
Media packages for audio-video (AV) content, such as Interoperable Master Format (IMF) packages for example, include collections of media files, such as Material Exchange Format (MXF) video track files for instance. As stated above, in order to distribute numerous localized versions of AV content that are variously compliant with local censorship requirements, consistent with localized imagery for a given geographical region, or both, each media package may differ slightly in the makeup of the collection of media files it contains. The same is true of audio content, where dubbed audio may be used. Analogously, timed text may be added or changed depending on the requirements in each locale.
However, media files, such as MXF video track files, can be very large and time consuming to encode. Accordingly, and due to the typically high degree of redundant underlying media content in each localized version of a media package, systems that build media packages representing all localized versions of a feature face the challenge of identifying media files that can be reused across the set of localized versions, in order to avoid the costs and inefficiencies resulting from the unnecessary encoding of media files that are already available.
The present application discloses systems and methods that address and overcome the challenge described above by automating avoidance of media file encoding redundancies. The automated solution disclosed by the present application advances the state of the art by providing a media-file-focused approach to identifying redundant content for inclusion in a media package, thereby advantageously enabling the cost reductions and efficiencies resulting from reuse of existing media file encodings. Various implementations of the present application may be utilized in co-pending U.S. application Ser. No. 18/123,644, filed Mar. 20, 2023, and titled “Automation of Differential Media Uploading,” which is hereby incorporated by reference in its entirety into the present application.
It is noted that, as defined for the purposes of the present application, the terms “automation,” “automated,” and “automating” refer to systems and processes that do not require the participation of a human system administrator. Although in some implementations the media file redundancy determinations made by the systems and methods disclosed herein may be reviewed or even modified by a human system administrator, that human involvement is optional. Thus, the methods described in the present application may be performed under the control of hardware processing components of the disclosed systems.
It is also noted that the types of media content to which the present automated solution may be applied include AV content having both audio and video components, audio unaccompanied by video, and video unaccompanied by audio. In addition, or alternatively, in some implementations, the type of media content to which the present novel and inventive solution can be applied may be or include digital representations of persons, fictional characters, locations, objects, and identifiers such as brands and logos, for example, which populate a virtual reality (VR), augmented reality (AR), or mixed reality (MR) environment. Moreover, that content may depict virtual worlds that can be experienced by any number of users synchronously and persistently, while providing continuity of data such as personal identity, user history, entitlements, possessions, payments, and the like. It is noted that the solution disclosed by the present application may also be applied to content that is a hybrid of traditional audio-video and fully immersive VR/AR/MR experiences, such as interactive video.
As further shown in FIG. 1, system 100 includes computing platform 102 having hardware processor 104 and system memory 106 implemented as a computer-readable non-transitory storage medium storing software code 140. In addition, FIG. 1 shows client system 120 including display 122 and utilized by user 124, as well as communication network 112 and network communication links 114 communicatively coupling client system 120 to system 100.
It is noted that although system 100 may be communicatively coupled to media asset management system 116 via communication network 112 and network communication links 114, in some implementations, media asset management system 116, including unique identifier database 117, may be integrated with computing platform 102 of system 100, or may be in direct communication with system 100, as shown by dashed communication link 118.
Although the present application refers to software code 140 as being stored in system memory 106 for conceptual clarity, more generally, system memory 106 may take the form of any computer-readable non-transitory storage medium. The expression “computer-readable non-transitory storage medium,” as defined in the present application, refers to any medium, excluding a carrier wave or other transitory signal, that provides instructions to hardware processor 104 of computing platform 102. Thus, a computer-readable non-transitory medium may correspond to various types of media, such as volatile media and non-volatile media, for example. Volatile media may include dynamic memory, such as dynamic random access memory (dynamic RAM), while non-volatile memory may include optical, magnetic, or electrostatic storage devices. Common forms of computer-readable non-transitory storage media include, for example, optical discs, RAM, programmable read-only memory (PROM), erasable PROM (EPROM), and FLASH memory.
It is further noted that although FIG. 1 depicts hardware processor 104 as a single unit, that representation is merely exemplary.
Hardware processor 104 may include multiple hardware processing units, such as one or more central processing units, one or more graphics processing units, one or more tensor processing units, one or more field-programmable gate arrays (FPGAs), custom hardware for machine-learning training or inferencing, and an application programming interface (API) server, for example. By way of definition, as used in the present application, the terms “central processing unit” (CPU), “graphics processing unit” (GPU), and “tensor processing unit” (TPU) have their customary meaning in the art. That is to say, a CPU includes an Arithmetic Logic Unit (ALU) for carrying out the arithmetic and logical operations of computing platform 102, as well as a Control Unit (CU) for retrieving programs, such as software code 140, from system memory 106, while a GPU may be implemented to reduce the processing overhead of the CPU by performing computationally intensive graphics or other processing tasks. A TPU is an application-specific integrated circuit (ASIC) configured specifically for artificial intelligence applications such as machine learning modeling.
In some implementations, computing platform 102 may correspond to one or more web servers, accessible over a packet-switched network such as the Internet, for example. Alternatively, computing platform 102 may correspond to one or more computer servers supporting a private wide area network (WAN), local area network (LAN), or included in another type of limited distribution or private network. In addition, or alternatively, in some implementations, system 100 may utilize a local area broadcast method, such as User Datagram Protocol (UDP) or Bluetooth, for instance. Furthermore, in some implementations, system 100 may be implemented virtually, such as in a data center. For example, in some implementations, system 100 may be implemented in software, or as virtual machines. Moreover, in some implementations, communication network 112 may be a high-speed network suitable for high performance computing (HPC), for example a 10 GigE network or an Infiniband network.
Although client system 120 is shown as a desktop computer in FIG. 1, that representation is merely exemplary. In other implementations, client system 120 may take the form of any suitable computing device, such as a laptop computer, tablet computer, or smartphone, for example.
With respect to display 122 of client system 120, display 122 may be physically integrated with client system 120, or may be communicatively coupled to but physically separate from client system 120. For example, where client system 120 is implemented as a smartphone, laptop computer, or tablet computer, display 122 will typically be integrated with client system 120. By contrast, where client system 120 is implemented as a desktop computer, display 122 may take the form of a monitor separate from client system 120 in the form of a computer tower. Furthermore, display 122 of client system 120 may be implemented as a liquid crystal display (LCD), a light-emitting diode (LED) display, an organic light-emitting diode (OLED) display, a quantum dot (QD) display, or any other suitable display screen that performs a physical transformation of signals to light.
Software code 240, plurality of video frames 230, unique identifier 228, encoded media file 250, and unique identifier database 217 correspond respectively in general to software code 140, plurality of video frames 130, unique identifier 128, encoded media file 150, and unique identifier database 117, in FIG. 1.
As shown in FIG. 3, encoded media file 350 includes a plurality of encoded video frames 354a-354j.
Depending upon the particular implementation, there may be advantages to constraining encoded media file 350 to include a predetermined creative interval of media content, to have a predetermined number of video frames, or to include a sequence of video frames having a predetermined time duration. In some implementations, for example, encoded media file 350 may include encoded video frames 354a, 354b, 354c, and 354d of a single shot 356 of video, while in other implementations, encoded media file 350 may include encoded video frames 354a-354j of a single scene 358 of video. It is noted that, as defined for the purposes of the present application, the expression “shot” of video refers to a sequence of video frames that are captured from a unique camera perspective without cuts or other cinematic transitions. It is further noted that, as defined for the purposes of the present application, the expression “scene” of video refers to one or more shots of video that, together, deliver a single, complete, and unified dramatic element of film narration, or block of storytelling within a film.
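To make the constraint concrete, consider the following Python sketch, offered purely as an illustration rather than as part of the disclosed systems: it groups an ordered frame sequence into candidate track-file intervals by shot or by scene. The per-frame “shot_id” and “scene_id” fields are hypothetical metadata assumed for the example; the disclosure requires only that a creative interval such as a shot or scene be identifiable.

    from itertools import groupby

    def group_frames_by_creative_interval(frames, level="shot"):
        # Consecutive frames sharing the same shot (or scene) identifier form
        # one candidate encoded media file, mirroring the constraint that a
        # track file cover a single shot 356 or a single scene 358.
        key = (lambda f: f["shot_id"]) if level == "shot" else (lambda f: f["scene_id"])
        return [list(group) for _, group in groupby(frames, key=key)]

Constraining each track file to such an interval keeps the frame membership of each file stable across localized versions, which is what can make reuse detection by unique identifier effective.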
The functionality of software code 140/240 will be further described by reference to FIG. 4, which shows flowchart 460 presenting an exemplary method for automating avoidance of media file encoding redundancies, according to one implementation.
Referring to FIG. 4 in combination with FIGS. 1 and 2, flowchart 460 begins with receiving plurality of video frames 130/230 including media content (action 461). Plurality of video frames 130/230 may be received by software code 140/240 of system 100, executed by hardware processor 104 of computing platform 102.
It is noted that the media content contained by plurality of video frames 130/230 may include AV content having both audio and video components, audio unaccompanied by video, and video unaccompanied by audio. In addition, or alternatively, in some implementations, the media content contained by plurality of video frames 130/230 may be or include digital representations of persons, fictional characters, locations, objects, and identifiers such as brands and logos, for example, which populate a VR, AR, or MR environment. Moreover, that media content may depict virtual worlds that can be experienced by any number of users synchronously and persistently, while providing continuity of data such as personal identity, user history, entitlements, possessions, payments, and the like. It is noted that the media content contained by plurality of video frames 130/230 may also include content that is a hybrid of traditional audio-video and fully immersive VR/AR/MR experiences, such as interactive video.
Continuing to refer to FIGS. 1, 2, and 4 in combination, flowchart 460 further includes computing unique identifier 128/228 of plurality of video frames 130/230 (action 462). Unique identifier 128/228 may be computed by software code 140/240, executed by hardware processor 104, using the canonical media identifiers of plurality of video frames 130/230 together with the parameters associated with a desired output encoding of encoded media file 150/250.
By way of example, where plurality of video frames 130/230 includes an integer number “N” of video frames, unique identifier 128/228 may be computed as follows:
Unique identifier 128/228=HASH(OUTPUT_PROFILE+M1+M2+ . . . +MN),
where HASH( ) represents the application of a cryptographic hashing algorithm (e.g., SHA-256) to the elements included within the parentheses ( ), OUTPUT_PROFILE is a token representing all parameters associated with a desired output encoding of encoded media file 150/250, each Mi represents the canonical media identifier of a respective one of the N video frames, and the + operator denotes string concatenation.
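As an illustration only, the computation above can be expressed in a few lines of code. The following Python sketch assumes string-valued canonical media identifiers and uses SHA-256 as the example cryptographic hashing algorithm; the function name and signature are hypothetical and not part of the disclosure.

    import hashlib

    def compute_unique_identifier(output_profile: str, media_ids: list[str]) -> str:
        # Concatenate the output-profile token with the canonical media
        # identifier of each of the N video frames, in frame order, then
        # apply the cryptographic hash (SHA-256 shown as the example).
        payload = output_profile + "".join(media_ids)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

Because the identifier is derived from the frame identifiers and the output profile, rather than from the encoded bytes, it is available before any encoding takes place, a property relied upon below in connection with actions 465 and 466.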
Continuing to refer to FIGS. 1, 2, and 4 in combination, flowchart 460 further includes determining whether unique identifier 128/228 is stored in media asset management system 116 (action 463). Action 463 may be performed by software code 140/240 of system 100, executed by hardware processor 104, by querying unique identifier database 117 of media asset management system 116.
Referring to FIGS. 1, 2, 3, and 4 in combination, when unique identifier 128/228 is determined in action 463 to be stored in media asset management system 116, flowchart 460 continues with identifying encoded media file 150/250/350 of plurality of video frames 130/230 as an existing encoding available for reuse (action 464).
Action 464 may be performed by software code 140/240 of system 100, executed by hardware processor 104, by reference to media asset management system 116. For example, in some instances, unique identifier 128/228 previously stored in media asset management system 116 may be cross-referenced to encoded media file 150/250/350 by the entry for unique identifier 128/228 stored in unique identifier database 217, thereby enabling system 100 to identify encoded media file 150/250/350 contemporaneously with determining that unique identifier 128/228 is stored in media asset management system 116. That is to say, although flowchart 460 lists action 464 as following action 463, that representation is merely exemplary, and in some implementations actions 463 and 464 may be performed in parallel.
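By way of a hedged illustration of that single-read behavior, the following Python sketch uses uid_database as a hypothetical stand-in for an interface to unique identifier database 217 offering a mapping-style get(); it is not an interface disclosed by the present application.

    from typing import Optional

    def find_existing_encoding(uid: str, uid_database) -> Optional[str]:
        # One database read both determines whether the unique identifier is
        # stored (action 463) and, via the cross-reference, identifies the
        # existing encoded media file (action 464).
        entry = uid_database.get(uid)
        return entry["encoded_media_file"] if entry is not None else None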
Continuing to refer to FIGS. 1, 2, 3, and 4 in combination, when unique identifier 128/228 is determined in action 463 not to be stored in media asset management system 116, flowchart 460 continues with encoding plurality of video frames 130/230 to produce encoded media file 150/250/350 (action 465).
As noted above, in some implementations, plurality of video frames 130/230 may be encoded to produce encoded media file 150/250/350 as an MXF video track file, for inclusion in a media package such as an IMF package, for example. As further noted above, depending upon the particular implementation, there may be advantages to constraining encoded media file 150/250/350 to include a predetermined creative interval of content, to have a predetermined number of video frames, or to include a sequence of video frames having a predetermined time duration. Referring specifically to FIG. 3, for example, encoded media file 150/250/350 may be constrained to include encoded video frames of a single shot 356 of video, or of a single scene 358 of video, as described above.
Continuing to refer to FIGS. 1, 2, 3, and 4 in combination, flowchart 460 further includes storing unique identifier 128/228 in media asset management system 116 (action 466). Action 466 may be performed by software code 140/240 of system 100, executed by hardware processor 104, and may include cross-referencing unique identifier 128/228 to encoded media file 150/250/350 in unique identifier database 117.
It is noted that although flowchart 460 lists action 466 as following action 465, that representation is merely exemplary. Because unique identifier 128/228 is associated with plurality of video frames 130/230, as well as with encoded media file 150/250/350, unique identifier 128/228 associated with encoded media file 150/250/350 is known prior to the encoding performed in action 465 and can be stored, in action 466, before the performance of action 465. In other words, in various implementations of the method outlined by flowchart 460, action 466 may precede action 465, may follow action 465, or may be performed in parallel with action 465, i.e., actions 465 and 466 may be performed contemporaneously.
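The ordering flexibility described above can be sketched concretely. In the following Python fragment, encoder and uid_database are hypothetical stand-ins assumed for illustration; actions 465 and 466 are launched contemporaneously, which is possible precisely because the unique identifier is known before encoding begins.

    from concurrent.futures import ThreadPoolExecutor

    def encode_and_register(uid, frames, output_profile, encoder, uid_database):
        # The unique identifier derives from the frames and the output profile,
        # not from the encoded bytes, so storing it (action 466) need not wait
        # for encoding (action 465) to finish.
        with ThreadPoolExecutor(max_workers=2) as pool:
            encode_job = pool.submit(encoder.encode, frames, output_profile)
            store_job = pool.submit(uid_database.store, uid)
            store_job.result()
            return encode_job.result()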
Referring to FIGS. 1, 2, 3, and 4 in combination, flowchart 460 further includes providing encoded media file 150/250/350 as an output (action 467). In some use cases, action 467 may follow directly from action 464, such that previously encoded media file 150/250/350 identified in action 464 is reused rather than re-encoded.
In other use cases, action 467 may follow directly from action 465, or may follow action 466 subsequent to the performance of action 465. In those implementations, hardware processor 104 of system 100 may execute software code 140/240 to provide encoded media file 150/250/350 to client system 120 utilized by user 124, via communication network 112 and network communication links 114. It is noted that, in various implementations, action 467 may precede action 466, may follow action 466, or may be performed in parallel with action 466, i.e., actions 466 and 467 may be performed contemporaneously.
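Pulling the preceding sketches together, the overall flow of flowchart 460 might be driven as follows, again purely as an illustration in which frames, encoder, and uid_database, as well as the per-frame “media_id” field, are hypothetical stand-ins:

    def build_or_reuse(frames, output_profile, encoder, uid_database):
        # Actions 461-467 end to end: compute the identifier, reuse an existing
        # encoding when one is found, otherwise encode and register anew.
        media_ids = [f["media_id"] for f in frames]
        uid = compute_unique_identifier(output_profile, media_ids)  # action 462
        existing = find_existing_encoding(uid, uid_database)        # actions 463/464
        if existing is not None:
            return existing                                         # action 467 (reuse)
        return encode_and_register(uid, frames, output_profile,
                                   encoder, uid_database)           # actions 465-467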
With respect to the method outlined by flowchart 460, it is emphasized that actions 461, 462, 463, 464, and 467, or actions 461, 462, 463, 465, 466, and 467, may be performed in an automated process from which human involvement may be omitted.
Thus, the present application discloses systems and methods for automating avoidance of media file encoding redundancies. The automated solution disclosed by the present application advances the state of the art by providing a media-file-focused approach to identifying redundant content for inclusion in a media package, thereby advantageously enabling the cost reductions and efficiencies resulting from reuse of existing media file encodings.
From the above description it is manifest that various techniques can be used for implementing the concepts described in the present application without departing from the scope of those concepts. Moreover, while the concepts have been described with specific reference to certain implementations, a person of ordinary skill in the art would recognize that changes can be made in form and detail without departing from the scope of those concepts. As such, the described implementations are to be considered in all respects as illustrative and not restrictive. It should also be understood that the present application is not limited to the particular implementations described herein, but many rearrangements, modifications, and substitutions are possible without departing from the scope of the present disclosure.