The present invention generally relates to a method for automatically adjusting the display of image frames of a video during playback and in particular, for locally accelerating/decelerating visual effect parameters and/or play speed characteristics of any music video in a manner that is temporally proximate to the local beat and/or to the local rhythm of the song used in the music video.
Disclosed are systems, apparatuses, methods, computer-readable media, and circuits for identifying rhythm temporal locations (trigger locations) in the soundtrack of a multimedia content item and adjusting a playback speed of video frames around the trigger locations. According to at least one example, a method includes: identifying an audio event trigger and a frame playback rate of a multimedia content item; and adjusting a set of image frames of the multimedia content item based on the identified audio event trigger by using a groover graph to locally accelerate or decelerate the frame playback rate for the set of image frames surrounding the audio event trigger, wherein the groover graph includes a set of time offsets that can be applied to a timing of the image frames with respect to a temporal position of the audio event trigger, and wherein shifting is greatest for times nearest to the temporal position.
A similar method can be applied to groove shader parameters instead of the play speed of a video. According to at least one example, a method includes: identifying an audio event trigger and a shader parameter course; wiring a shader to one or more event vectors to automatically trigger certain graphical effects at times associated with the one or more event vectors during playback of an adjusted multimedia content item, wherein the shader is a function used to modify an appearance of a set of image frames; and locally accelerating or decelerating, at the identified audio event trigger, the shader parameter course by using a groover graph.
In another example, an apparatus for identifying rhythm temporal locations (trigger locations) in the soundtrack of a multimedia content item and adjusting a playback speed of video frames around the trigger locations is provided that includes a storage (e.g., a memory configured to store data, such as virtual content data, one or more images, etc.) and one or more processors (e.g., implemented in circuitry) coupled to the memory and configured to execute instructions that, in conjunction with various components (e.g., a network interface, a display, an output device, etc.), cause the one or more processors to perform operations comprising: identifying a first audio event trigger and a frame playback rate of a multimedia content item; adjusting a set of image frames of the multimedia content item based on the identified first audio event trigger by using a groover graph to locally accelerate or decelerate the frame playback rate for the set of image frames surrounding the identified first audio event trigger, wherein the groover graph includes a set of time offsets that can be applied to a timing of the image frames with respect to a temporal position of the identified first audio event trigger, and wherein shifting is greatest for times nearest to the temporal position; identifying a second audio event trigger and a shader parameter course; wiring a shader to one or more event vectors to automatically trigger certain graphical effects at times associated with the one or more event vectors during playback of an adjusted multimedia content item, wherein the shader is a function used to modify an appearance of a second set of image frames; and locally accelerating or decelerating, at the identified second audio event trigger, the shader parameter course by using the groover graph.
The detailed description set forth below is intended as a description of various configurations of the subject technology and is not intended to represent the only configurations in which the subject technology can be practiced. The appended drawings are incorporated herein and constitute a part of the detailed description. The detailed description includes specific details for the purpose of providing a more thorough understanding of the subject technology. However, it will be clear and apparent that the subject technology is not limited to the specific details set forth herein and may be practiced without these details. In some instances, structures and components are shown in block diagram form in order to avoid obscuring the concepts of the subject technology.
Aspects of the disclosed technology provide solutions for enhancing a user's experience of content playback, such as that of a user that is viewing multimedia content such as a music video, on a display device, such as a smartphone or tablet computer. Although some of the examples described herein are discussed in relation to a mobile device, such as a smartphone, it is understood that the various aspects of the disclosed invention can be implemented on any device for which display parameters (e.g., a frame display speed) can be adjusted.
In some aspects, the disclosed technology identifies rhythm temporal locations in the soundtrack of a multimedia content item (e.g., audio event triggers) and adjusts a playback speed of video frames around the trigger locations. Depending on the desired implementation, triggers can correspond with the playback of different instrument types, or instrument combinations. However, in at least some approaches, triggers can represent locations in a song's playback duration where rhythm audio events (e.g., kick hits, snare hits, etc.) and/or singular audio events occur. In some aspects, the playback speed of a music video can be increased by advancing the timing of displayed frames before and after the occurrence of an audio event, i.e., a trigger. As discussed in further detail below, the audio event triggers can correspond with rhythm audio events, such as kick hits and/or snare hits, etc.
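By way of illustration only, the following Python sketch (hypothetical names and values, not the claimed implementation) shifts the display times of frames that fall within a window around each trigger, with the shift greatest for frames nearest the trigger and tapering to zero at the window edges:

```python
# Minimal sketch (hypothetical names and values): shift the display times of
# frames that fall within a window around each audio event trigger.  The
# shift is greatest for frames nearest the trigger and tapers linearly to
# zero at the edges of the window.

def groove_frame_times(frame_times, triggers, window=0.5, dt_max=0.1):
    """Return adjusted display times (seconds) for `frame_times`.

    frame_times: original frame display times, in seconds.
    triggers:    trigger times (e.g., kick/snare hits), in seconds.
    window:      half-width of the span affected around each trigger.
    dt_max:      maximum advancement applied at the trigger itself.
    """
    adjusted = []
    for t in frame_times:
        offset = 0.0
        for trig in triggers:
            d = abs(t - trig)
            if d < window:
                offset += dt_max * (1.0 - d / window)
        # Advancing the display time plays frames earlier: playback is
        # locally faster approaching the trigger and slower just after it.
        adjusted.append(t - offset)
    return adjusted


if __name__ == "__main__":
    frames = [i / 30.0 for i in range(90)]   # 3 s of 30 fps video
    grooved = groove_frame_times(frames, triggers=[1.0, 2.0])
    print(grooved[28:34])                    # frames around the first trigger
```

In this sketch, keeping dt_max well below the window width keeps the adjusted display times strictly increasing, so no frame is reordered.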
In some aspects, the disclosed technology identifies rhythm temporal locations in the soundtrack of a multimedia content item (e.g., audio event triggers) and adjusts a playback speed of shader parameters around the trigger locations. Depending on the desired implementation, triggers can correspond with the playback of different instrument types, or instrument combinations. However, in at least some approaches, triggers can represent locations in a song's playback duration where rhythm audio events (e.g., kick hits, snare hits, etc.) and/or singular audio events occur. In some aspects, the original graph of shader parameters can be accelerated (respectively decelerated) before (respectively after) the occurrence of an audio event, i.e., a trigger, or vice versa. As discussed in further detail below, the audio event triggers can correspond with rhythm audio events, such as kick hits and/or snare hits, etc.
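A similar local time warp can be applied to a shader parameter course rather than to the frame timing. The fragment below is a minimal sketch under the same assumptions as the previous example; the parameter course is any user-supplied function of time:

```python
# Minimal sketch (hypothetical names): apply the same local time warp to a
# shader parameter course instead of to the frame display times.

import math

def warp_time(t, triggers, window=0.5, dt_max=0.1):
    """Warped evaluation time for a shader parameter course at time `t`."""
    offset = 0.0
    for trig in triggers:
        d = abs(t - trig)
        if d < window:
            # Same taper as for the frames: biggest shift at the trigger.
            offset += dt_max * (1.0 - d / window)
    return t + offset


def grooved_parameter(course, t, triggers):
    # `course` maps time (s) -> parameter value; it is read slightly
    # "ahead" near a trigger, so the visual effect locally speeds up.
    return course(warp_time(t, triggers))


if __name__ == "__main__":
    brightness = lambda t: 0.5 + 0.5 * math.sin(2.0 * math.pi * t)  # example course
    print(grooved_parameter(brightness, 0.95, triggers=[1.0]))
```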
In some implementations, trigger locations can be determined using a beat and/or rhythm decomposition process (e.g., a beat decomposition process), for example, that generates event vectors including time-index information. The vectors or numeric arrays can therefore indicate the temporal location of the triggers. Additional details regarding processes for analyzing and identifying audio artifacts in a musical composition (e.g., an audio file) are discussed in relation to U.S. application Ser. No. 16/503,379, entitled “BEAT DECOMPOSITION TO FACILITATE AUTOMATIC VIDEO EDITING,” which is herein incorporated by reference in its entirety.
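For illustration only, one plausible shape for such event vectors is sketched below; the actual output format of the referenced beat decomposition process is not asserted here, and all of the times shown are hypothetical:

```python
# Illustrative only: one plausible shape for the event vectors produced by a
# beat/rhythm decomposition step.  Each vector pairs an event type with the
# times, in seconds, at which that event occurs in the soundtrack.

event_vectors = {
    "kick":  [0.52, 1.01, 1.49, 1.98],
    "snare": [0.76, 1.74],
}

# Trigger locations for the grooving step can then be taken from one vector,
# or from a merged, sorted union of several vectors.
triggers = sorted(t for times in event_vectors.values() for t in times)
print(triggers)
```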
As illustrated in example 200, the magnitude of frame advancement can decrease after the trigger event has passed. For example, a magnitude of playback time advancement of frame E is greater than a magnitude of playback time advancement of frame F, etc. Once the trigger 206 has passed, frame playback can resume at a normal speed. For example, frame H of grooved video 204 and frame H of un-grooved video 202 are played at the same time location.
With the above condition, a groover graph can be used to ensure that an overall duration of the media content playback is the same as that of the original media file. In this manner, lip-synced videos can be kept substantially in sync despite modifications that are made to the display of certain frames due to the grooving process. However, the present invention also covers a grooving process that would not be lip-sync compatible.
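Because the time offsets in the sketches above fall back to zero away from each trigger, the first and last frames keep their original display times and the overall duration is preserved. A quick check under those same assumptions:

```python
# Quick check, under the assumptions of the earlier sketches, that grooving
# leaves the overall playback duration unchanged: offsets fall back to zero
# away from each trigger, so the first and last frames keep their times.

def offset_at(t, triggers, window=0.5, dt_max=0.1):
    return sum(dt_max * (1.0 - abs(t - trig) / window)
               for trig in triggers if abs(t - trig) < window)

frames = [i / 30.0 for i in range(300)]                       # 10 s at 30 fps
grooved = [t - offset_at(t, [2.0, 5.0, 8.0]) for t in frames]

assert grooved[0] == frames[0] and grooved[-1] == frames[-1]  # same duration
assert all(b > a for a, b in zip(grooved, grooved[1:]))       # no reordering
```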
In some approaches, the processing necessary to determine trigger locations and frame playback rate information (e.g., using a groover graph) can be performed for a batch of media files. By way of example, triggers can be extracted for millions of MP3 files (e.g., using a beat/rhythm tracking process). Frame playback can then be adjusted based on the determined triggers by using a groover graph to locally accelerate/decelerate the frame display rate (e.g., the “play speed”) and/or shader parameter courses.
Changes to display properties can be user-configurable. For example, the magnitude and/or type of display change (as implemented by a shader) can be based on user-selectable parameters, and/or may be dependent on other user-configurable options. For example, display response can be a function of parameters implemented by user-configurable skin options that correspond with the playback of a particular media item, media type, and/or media collection (e.g., a playlist, etc.). Further details regarding the use of user-customizable skins are discussed in relation to U.S. application Ser. No. 16/854,062, entitled “AUTOMATED AUDIO-VIDEO CONTENT GENERATION,” which is herein incorporated by reference in its entirety.
The magnitude of the applied offset (Δt) can correspond with the intensity of the audio waveform at a given trigger position. For example, at trigger time position 502, Δt can be greater if the corresponding audio event is of high intensity, and lower if the corresponding audio event is of low intensity. In some approaches, a predetermined discrete set of Δt values may be used, depending on the energy intensity of the audio at the trigger time.
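As a sketch of this idea (the particular thresholds and Δt values are assumptions, not values taken from the disclosure):

```python
# Sketch only: map the normalized energy intensity of the audio at the
# trigger time to a small predetermined set of Δt values.

DT_CHOICES = (0.025, 0.050, 0.100)   # hypothetical Δt values, in seconds

def dt_for_intensity(intensity):
    """Pick a discrete Δt for a normalized intensity in [0, 1]."""
    if intensity < 0.33:
        return DT_CHOICES[0]         # low-intensity event: small shift
    if intensity < 0.66:
        return DT_CHOICES[1]
    return DT_CHOICES[2]             # high-intensity event: largest shift

print(dt_for_intensity(0.9))         # -> 0.1
```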
In some aspects, functions for calculating Δt values can be based on the ΔtMAX value 504 for a given groover graph, as well as the time position (t) within the graph, as given by equation (1):
It is understood that other mathematical functions may be used, without departing from the scope of the disclosed technology.
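By way of illustration, one possible choice, assumed here rather than taken from equation (1), is a raised-cosine taper that equals ΔtMAX at the trigger position and falls to zero at the edges of the groover graph span:

```python
# One possible Δt function, assumed for illustration: a raised-cosine taper
# that equals dt_max (ΔtMAX) at the trigger position and falls to zero at
# the edges of the groover graph span.  `span` is an assumed half-width.

import math

def delta_t(t, trigger_t, dt_max, span=0.5):
    """Time offset at time `t` for a trigger at `trigger_t` (seconds)."""
    d = abs(t - trigger_t)
    if d >= span:
        return 0.0
    return dt_max * 0.5 * (1.0 + math.cos(math.pi * d / span))

print(delta_t(1.0, 1.0, dt_max=0.1))   # maximum shift at the trigger itself
print(delta_t(1.5, 1.0, dt_max=0.1))   # zero at the edge of the span
```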
As discussed in further detail below, the groover graph can be calculated independently for each new media content item.
In some approaches, it can be useful to attenuate the groover strength when time-compressing a groover graph to ensure that the derivative of the frame play speed never takes a negative value. In some implementations, attenuation may only be necessary for songs having a quarter note duration of less than about 0.5 s. In such instances, the groover strength can be attenuated by scaling the original Δt value by a factor that is based on the quarter note duration.
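A minimal sketch of such an attenuation, assuming a simple proportional scaling (the specific factor is an assumption, not quoted from the disclosure):

```python
# Illustrative attenuation for fast tempos; the proportional scaling used
# here is an assumption.

def attenuate_dt(dt, quarter_note_s, threshold_s=0.5):
    if quarter_note_s >= threshold_s:
        return dt                                # slow enough: no attenuation
    return dt * (quarter_note_s / threshold_s)   # assumed proportional scaling

print(attenuate_dt(0.1, quarter_note_s=0.4))     # ≈ 0.08
```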
In some approaches, a relative energy level (LVL) for the rhythm audio event (typically, a kick or snare hit) at a given trigger can be calculated and used to attenuate the groover strength. In some approaches, the calculated LVL (ranging from 0 to 1) can be used to scale the time shift value (Δt) by multiplying Δt by LVL^Y, where Y can typically range between 0.50 and 2. An example of a level attenuation that is performed for the groover strength (e.g., based on a calculated LVL value) is depicted in Appendix A.
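This level-based attenuation can be sketched directly from the description above; only the helper name and example values are assumed:

```python
# Scale Δt by LVL**Y, where LVL is the relative energy level of the hit
# (0..1) and Y is a tuning exponent, typically between 0.5 and 2.

def attenuate_by_level(dt, lvl, y=1.0):
    lvl = max(0.0, min(1.0, lvl))                 # clamp to the 0..1 range
    return dt * (lvl ** y)

print(attenuate_by_level(0.1, lvl=0.5, y=2.0))    # quieter hit -> 0.025
print(attenuate_by_level(0.1, lvl=1.0, y=2.0))    # full-strength hit -> 0.1
```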
In some examples, multiple triggers (e.g., 2 or more triggers) may be close to one another in the groover-audio channel. In such instances, the Δt values can be summed at the same time positions. However, where Δt values are summed, it can be useful to cap the total, for example, so that ΔtMAX never exceeds 250 ms on the 500 ms span graph.
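A sketch of this summing-and-capping behavior, with hypothetical trigger spacing:

```python
# Sum the offsets contributed by nearby triggers and cap the total so the
# combined shift never exceeds a ceiling (250 ms in the example above) on a
# 500 ms span graph (+/- 250 ms around each trigger).

DT_CAP = 0.250   # seconds

def combined_offset(t, triggers, window=0.25, dt_max=0.1):
    total = sum(dt_max * (1.0 - abs(t - trig) / window)
                for trig in triggers if abs(t - trig) < window)
    return min(total, DT_CAP)

# Three triggers 50 ms apart: their individual offsets overlap and are
# summed, and the result is clipped at the cap.
print(combined_offset(1.05, triggers=[1.0, 1.05, 1.10]))   # -> 0.25
```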
In practice, the auto triggered mode 1202 can be used to trigger a specified shader parameter upon the detected occurrence of a predetermined audio event, such as each time a particular song event (e.g., a kick hit, or snare hit, etc.) is detected. The measure cadence mode 1204 allows the shader parameter to be cadenced along a single measure (e.g., a BAR). In some implementations, the cadence measure can last a predetermined length of time, such as 2 seconds; however, different durations of time are contemplated, without departing from the scope of the disclosed technology. The constant value mode permits the user/artist to apply a constant shader parameter, e.g., for all times throughout the duration of the song. The random mode 1208 allows the shader parameter to be randomly set/selected, for example, within a predetermined interval that is specified by the user/artist, e.g., via a double slider that is displayed on interface 1200.
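The four modes can be sketched as follows; the data structure, helper names, and value ranges are assumptions made for illustration only:

```python
# Sketch of the four parameter modes (structure, names, and ranges assumed).

import random
from enum import Enum

class ParamMode(Enum):
    AUTO_TRIGGERED = "auto_triggered"
    MEASURE_CADENCE = "measure_cadence"
    CONSTANT = "constant"
    RANDOM = "random"

def shader_param(mode, t, triggers=(), bar_s=2.0, constant=0.5, interval=(0.0, 1.0)):
    """Return a parameter value in [0, 1] for playback time `t` (seconds)."""
    if mode is ParamMode.AUTO_TRIGGERED:
        # Full strength exactly when a trigger fires, zero otherwise.
        return 1.0 if any(abs(t - trig) < 1e-3 for trig in triggers) else 0.0
    if mode is ParamMode.MEASURE_CADENCE:
        # Cadence the parameter along a single measure (here ~2 s per BAR).
        return (t % bar_s) / bar_s
    if mode is ParamMode.CONSTANT:
        return constant                     # same value for the whole song
    lo, hi = interval                       # RANDOM: artist-selected interval
    return random.uniform(lo, hi)

print(shader_param(ParamMode.MEASURE_CADENCE, t=1.5))   # -> 0.75
```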
In some approaches, shader parameters can be automatically projected on local cadencing events (e.g., those in the current BAR). In order to make the shader parameters better match the musical groove, the bar-cadenced shader graphs can be normalized to produce normalized bar-cadenced groover graphs.
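One possible reading of this normalization, assumed for illustration, is to rescale each bar's parameter values so they span the full 0 to 1 range:

```python
# Rescale each bar's parameter values so they span the full 0..1 range
# (one assumed reading of "normalized bar-cadenced groover graphs").

def normalize_bar(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]              # flat bar: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

print(normalize_bar([0.2, 0.35, 0.3, 0.6]))       # ≈ [0.0, 0.375, 0.25, 1.0]
```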
It is further understood that the processor-based device 1800 may be used in conjunction with one or more other processor-based devices, for example, as part of a computer network or computing cluster. Processor-based device 1800 includes a master central processing unit (CPU) 1862, interfaces 1868, and a bus 1815 (e.g., a PCI bus). CPU 1862 preferably accomplishes all these functions under the control of software including an operating system and any appropriate applications software. CPU 1862 can include one or more processors 1863 such as a processor from the Motorola family of microprocessors or the MIPS family of microprocessors. In an alternative embodiment, processor 1863 is specially designed hardware for controlling the operations of processor-based device 1800. In a specific embodiment, a memory 1861 (such as non-volatile RAM and/or ROM) also forms part of CPU 1862. However, there are many different ways in which memory could be coupled to the system.
Interfaces 1868 can be provided as interface cards (sometimes referred to as “line cards”). Generally, they control the sending and receiving of data packets over the network and sometimes support other peripherals used with the router. Among the interfaces that may be provided are Ethernet interfaces, frame relay interfaces, cable interfaces, DSL interfaces, token ring interfaces, and the like. In addition, various very high-speed interfaces may be provided such as fast token ring interfaces, wireless interfaces, Ethernet interfaces, Gigabit Ethernet interfaces, ATM interfaces, HSSI interfaces, POS interfaces, FDDI interfaces and the like. Generally, these interfaces may include ports appropriate for communication with the appropriate media. In some cases, they may also include an independent processor and, in some instances, volatile RAM. The independent processors may control such communications intensive tasks as packet switching, media control and management. By providing separate processors for the communications intensive tasks, these interfaces allow the master microprocessor 1862 to efficiently perform routing computations, network diagnostics, security functions, etc.
Regardless of the network device's configuration, it may employ one or more memories or memory modules (including memory 1861) configured to store program instructions for the general-purpose network operations and mechanisms for roaming, route optimization and routing functions described herein. The program instructions may control the operation of an operating system and/or one or more applications, for example. The memory or memories may also be configured to store tables such as mobility binding, registration, and association tables, etc.
For clarity of explanation, in some instances the present technology may be presented as including individual functional blocks including functional blocks comprising devices, device components, steps or routines in a method embodied in software, or combinations of hardware and software.
In some embodiments the computer-readable storage devices, mediums, and memories can include a cable or wireless signal containing a bit stream and the like. However, when mentioned, non-transitory computer-readable storage media expressly exclude media such as energy, carrier signals, electromagnetic waves, and signals per se.
Methods according to the above-described examples can be implemented using computer-executable instructions that are stored or otherwise available from computer readable media. Such instructions can comprise, for example, instructions and data which cause or otherwise configure a general-purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Portions of computer resources used can be accessible over a network. The computer executable instructions may be, for example, binaries, intermediate format instructions such as assembly language, firmware, or source code. Examples of computer-readable media that may be used to store instructions, information used, and/or information created during methods according to described examples include magnetic or optical disks, flash memory, USB devices provided with non-volatile memory, networked storage devices, and so on.
Devices implementing methods according to these disclosures can comprise hardware, firmware and/or software, and can take any of a variety of form factors. Typical examples of such form factors include laptops, smart phones, small form factor personal computers, personal digital assistants, rackmount devices, standalone devices, and so on. Functionality described herein also can be embodied in peripherals or add-in cards. Such functionality can also be implemented on a circuit board among different chips or different processes executing in a single device, by way of further example.
The instructions, media for conveying such instructions, computing resources for executing them, and other structures for supporting such computing resources are means for providing the functions described in these disclosures.
Embodiments within the scope of the present disclosure may also include tangible and/or non-transitory computer-readable storage media or devices for carrying or having computer-executable instructions or data structures stored thereon. Such tangible computer-readable storage devices can be any available device that can be accessed by a general purpose or special purpose computer, including the functional design of any special purpose processor as described above. By way of example, and not limitation, such tangible computer-readable devices can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other device which can be used to carry or store desired program code in the form of computer-executable instructions, data structures, or processor chip design. When information or instructions are provided via a network or another communications connection (either hardwired, wireless, or combination thereof) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of the computer-readable storage devices.
Other embodiments of the disclosure may be practiced in network computing environments with many types of computer system configurations, including personal computers, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. Embodiments may also be practiced in distributed computing environments where tasks are performed by local and remote processing devices that are linked (either by hardwired links, wireless links, or by a combination thereof) through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
The various embodiments described above are provided by way of illustration only and should not be construed to limit the scope of the disclosure. For example, the principles herein apply equally to optimization as well as general improvements. Various modifications and changes may be made to the principles described herein without following the example embodiments and applications illustrated and described herein, and without departing from the spirit and scope of the disclosure.
This application claims priority to U.S. Provisional Application No. 63/220,296 filed Jul. 9, 2021, which is incorporated by reference herein in its entirety.
Number | Date | Country
--- | --- | ---
63220296 | Jul 2021 | US