An example embodiment of the present invention relates generally to multi-user content and, more particularly, to the composing of audio signals in a multi-user environment.
In multi-user content sharing, users located within an environment may each capture audio and/or visual content of events occurring within the environment with their individual devices. These users may then upload the captured audio/video content to a multi-user content server, where it may be shared with other users. The capturing devices may be arbitrarily positioned throughout the event space to capture the event. Location data and/or positioning data of the devices may be captured along with the audio/visual content and uploaded to the multi-user content server. The multi-user content server may use the location and/or position data to provide various listening and/or viewing points to a user for selection when downloading/streaming the captured content. The multi-user content server may then combine the uploaded content from the plurality of devices to provide rendered event content to users. In this regard, a user may select a particular listening/viewing point for the captured event and the multi-user content server may render mixed content from the uploaded content to reconstruct the event space.
To provide multi-user rendered content for sharing with other users, content from multiple users must first be uploaded to the multi-user content server and may then be combined to provide rendered content to be shared with end users. However, the content may generally be captured by a plurality of devices and the quality may vary among the plurality of different captures of the event. For example, the plurality of devices may generally each independently capture and upload content corresponding to an event and the captured content from a particular device may contain distortions and may vary in quality as devices are moved during capturing of the audio and/or video. To provide a positive user experience of the rendered multi-user content, the uploaded content should be rendered to provide the best quality for each audio and/or video segment.
A method, apparatus and computer program product are therefore provided according to an example embodiment of the present invention in order to capture and share audio and/or video content in a multi-user environment. In this regard, the method, apparatus and computer program product of an example embodiment may compose audio signals for content in a multi-user environment that provide a high quality audio signal that best represents the content as captured and uploaded by the plurality of users. The method, apparatus and computer program product of an example embodiment may analyze the audio signals for a set of uploaded content and determine segments of the analyzed signals that should be pruned. The signal pruning data may be used to replace or enhance segments of one or more signals to generate a composed signal that is to be shared with other end users.
In one embodiment, a method is provided that at least includes receiving content data and assigning two or more corresponding content data into a set of content data. The method of this embodiment also includes generating a first pruning data set for each one of a plurality of signals within the content set and generating a second pruning data set for each one of a plurality of signals within the content set. The method of this embodiment also includes generating a composed signal from the plurality of signals in the content set using the first pruning data set and the second pruning data set; causing the composed signal to be stored; and causing transmission of the composed signal.
In one embodiment, the plurality of signals of the content set comprises audio signals. In one embodiment, the first pruning data set for the one of the plurality of signals comprises segments of said signal that are distorted. In one embodiment, the second pruning data set for the one of the plurality of signals comprises segments of said signal that are degraded.
In some embodiments, generating the first pruning data set further comprises determining an amount of saturation for a signal and determining that the amount of saturation exceeds a threshold parameter. In one embodiment, generating the first pruning data set further comprises determining a spike for a signal and determining that the energy of the signal spike exceeds a threshold parameter. In some embodiments, generating the composed signal using the first pruning data set further comprises replacing segments of the one of the plurality of signals.
In some embodiments, generating the second pruning data set further comprises analyzing sensor data and signal characteristics for a corresponding segment of a signal. In some embodiments, the sensor data further comprises one or more of compass data, accelerometer data, or gyroscope data corresponding to a capturing period of signal content. In one embodiment, generating the composed signal using the second pruning data set further comprises enhancing segments of the one of the plurality of signals. In some embodiments, enhancing a segment further comprises weighting corresponding segments for two or more of the plurality of signals and using the weighting during mixing of the two or more of the plurality of signals to generate a composed signal.
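By way of illustration only, the two-stage pruning described above may be sketched as follows. The function names, per-frame predicates, and (start, end) segment tuples below are illustrative assumptions, not a definitive implementation:

```python
# Illustrative sketch (assumed names and structures) of the two-stage
# pruning flow: a first pruning data set marks distorted segments and a
# second pruning data set marks degraded segments of a signal.

def first_pruning_set(frames, is_distorted):
    """Segments (start, end) of frames judged distorted, e.g. saturated."""
    return frames_to_segments([is_distorted(f) for f in frames])

def second_pruning_set(frames, is_degraded):
    """Segments (start, end) of frames judged degraded, e.g. off-axis."""
    return frames_to_segments([is_degraded(f) for f in frames])

def frames_to_segments(flags):
    """Map per-frame boolean flags to continuous (start, end) segments."""
    segments, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(flags)))
    return segments
```

For example, per-frame flags [False, True, True, False, True] map to the continuous segments [(1, 3), (4, 5)].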
In another embodiment, an apparatus is provided that includes at least one processor and at least one memory including computer program instructions with the at least one memory and the computer program instructions configured to, with the at least one processor, cause the apparatus at least to receive content data and assign two or more corresponding content data into a set of content data. The at least one memory and the computer program instructions of this embodiment are also configured to, with the at least one processor, cause the apparatus to generate a first pruning data set for each one of a plurality of signals within the content set and generate a second pruning data set for each one of a plurality of signals within the content set. The at least one memory and the computer program instructions are also configured to, with the at least one processor, cause the apparatus of this embodiment to generate a composed signal from the plurality of signals in the content set using the first pruning data set and the second pruning data set; cause the composed signal to be stored; and cause transmission of the composed signal.
In a further embodiment, a computer program product is provided that includes at least one non-transitory computer-readable storage medium bearing computer program instructions embodied therein for use with a computer with the computer program instructions including program instructions configured to receive content data and assign two or more corresponding content data into a set of content data. The computer program instructions of this embodiment also include program instructions configured to generate a first pruning data set for each one of a plurality of signals within the content set and generate a second pruning data set for each one of a plurality of signals within the content set. The computer program instructions of this embodiment also include program instructions configured to generate a composed signal from the plurality of signals in the content set using the first pruning data set and the second pruning data set; cause the composed signal to be stored; and cause transmission of the composed signal.
In another embodiment, an apparatus is provided that includes at least means for receiving content data and means for assigning two or more corresponding content data into a set of content data. The apparatus of this embodiment also includes means for generating a first pruning data set for each one of a plurality of signals within the content set and means for generating a second pruning data set for each one of a plurality of signals within the content set. The apparatus of this embodiment also includes means for generating a composed signal from the plurality of signals in the content set using the first pruning data set and the second pruning data set; means for causing the composed signal to be stored; and means for causing transmission of the composed signal.
Having thus described certain embodiments of the invention in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
Some embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. Indeed, various embodiments of the invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like reference numerals refer to like elements throughout. As used herein, the terms “data,” “content,” “information,” and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with embodiments of the present invention. Thus, use of any such terms should not be taken to limit the spirit and scope of embodiments of the present invention.
Additionally, as used herein, the term ‘circuitry’ refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present. This definition of ‘circuitry’ applies to all uses of this term herein, including in any claims. As a further example, as used herein, the term ‘circuitry’ also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware. As another example, the term ‘circuitry’ as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
As defined herein, a “computer-readable storage medium,” which refers to a non-transitory physical storage medium (e.g., volatile or non-volatile memory device), can be differentiated from a “computer-readable transmission medium,” which refers to an electromagnetic signal.
A method, apparatus and computer program product are provided in accordance with an example embodiment of the present invention to compose audio signals for content in a multi-user environment that provide a high quality audio signal that best represents the content as captured and uploaded by the plurality of users. In this regard, a method, apparatus and computer program product of an example embodiment may analyze the audio signals for a set of uploaded content and prune low quality segments before composing the audio signal that is to be provided to end users.
To provide multi-user rendered content for sharing with other users, content from multiple users must first be uploaded to the multi-user content server and may then be combined to provide rendered content to be shared with end users. However, the content may generally be captured by a plurality of devices and the quality may vary among the plurality of different captures of the event. For example, the plurality of devices may generally each independently capture and upload content corresponding to an event and the captured content from a particular device may contain distortions and may vary in quality as devices are moved during capturing of the audio and/or video. To provide a positive user experience of the rendered multi-user content, the uploaded content should be rendered to provide the best quality for each audio and/or video segment. For example, audible distortions should be minimized in the composition signal and the quality should not vary significantly over time, that is, segments of the composed signal should be comparable to other segments of the signal.
The content captured by one of the plurality of mobile devices 104 may be uploaded immediately or may be stored and uploaded at a future time. The plurality of mobile devices 104 may also record timestamps for the content being captured, and such timestamps may be based on a local device time signal or on external signals, such as timing from Global Positioning System (GPS) signals or Network Time Protocol (NTP) signals. The plurality of mobile devices 104 may also capture position data corresponding to the location where the content is being captured, such as through the use of Global Positioning System (GPS) coordinates, Cellular Identification (Cell-ID), or Assisted GPS (A-GPS). The plurality of mobile devices 104 may also capture direction/orientation data corresponding to the recording direction/orientation, such as by using compass, accelerometer or gyroscope data. The captured content, e.g., audio, video, and/or still image data, from a mobile device 104 is then transmitted through network 108, such as to a multi-user content server 106. In this regard, network 108 may include any wired or wireless communication network including, for example, a wired or wireless local area network (LAN), personal area network (PAN), metropolitan area network (MAN), wide area network (WAN), or the like, as well as any hardware, software and/or firmware required to implement it (e.g., network routers). For example, network 108 may include a cellular radio access network, an 802.11, 802.16, 802.20, and/or WiMax network. Further, the network 108 may include a public network, such as the Internet, a private network, such as an intranet, or combinations thereof.
The multi-user content server 106 receives the uploaded content from the plurality of mobile devices 104. The captured content may be uploaded to the multi-user content server 106 during or upon the completion of capturing or at a later time than the original capture. The multi-user content server 106 may combine the captured content from one or more mobile devices 104, such as one or more mobile devices that are in close proximity, to provide rendered content to be shared with end users. The end users may be users who uploaded content or other users who wish to receive rendered content from an event.
To provide rendered content, the multi-user content server 106 may first align the content uploaded from a plurality of users into a content set to allow generation of the rendered content using the best media segments from the plurality of users.
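One simple, illustrative way to place uploads on a common timeline is to use their recorded capture timestamps; the sketch below assumes each upload carries a start time in seconds and returns each upload's offset from the earliest start. This is only one possible alignment technique and is not limiting:

```python
# Hypothetical sketch: align uploads onto a common timeline using their
# recorded start timestamps (assumed to be available per upload).

def align_to_common_timeline(uploads):
    """uploads: list of (start_time_s, samples) tuples.
    Returns (offsets, t0) where offsets[i] is upload i's offset in
    seconds from the earliest capture start t0."""
    t0 = min(start for start, _ in uploads)
    offsets = [start - t0 for start, _ in uploads]
    return offsets, t0
```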
After content from multiple users is available at the multi-user content server 106, the content may be rendered such that the downloaded/streamed content utilizes content from the different users in various ways. For example, the content may be rendered so as to provide the best media segments from multiple contributing users to provide the best end user experience of the multi-user rendered content. End users may also be offered content that represents the multi-user content from various points of view and that has been created in various manners, such as by equally sharing content from different users, selecting the best view as a function of time, maximizing or minimizing the viewing experience (that is, for each view selecting the view that is most different from, or most similar to, the views of the other users), etc.
An end user may select content on the multi-user content server 106 that corresponds to a particular listening and/or viewing position at an event that the end user wishes to receive through end user device 110. The end user device 110 may be embodied as a variety of different mobile devices including as a mobile telephone, a personal digital assistant (PDA), a laptop computer, a tablet computer, a camera, a video recorder, an audio/video player, or any of numerous other computation devices, content generation devices, content consumption devices or combinations thereof. The end user device 110 may alternatively be embodied as a variety of different stationary or fixed computing devices, such as a desktop computer, a television, a game console, a multimedia device, or the like. Multi-user content server 106 may then render content corresponding to the selected listening/viewing position that the end user selected and cause the rendered content to be transmitted to end user device 110. Alternatively, if the proximity of the captured content is small, the multi-user content server 106 may provide only a single listening/viewing position to the end user.
The system of an embodiment of the present invention may include an apparatus 200 as generally described below in conjunction with
It should also be noted that while
Referring now to
In some embodiments, the processor (and/or co-processors or any other processing circuitry assisting or otherwise associated with the processor) may be in communication with the memory device via a bus for passing information among components of the apparatus. The memory device may include, for example, a non-transitory memory, such as one or more volatile and/or non-volatile memories. In other words, for example, the memory device may be an electronic storage device (e.g., a computer readable storage medium) comprising gates configured to store data (e.g., bits) that may be retrievable by a machine (e.g., a computing device like the processor). The memory device may be configured to store information, data, content, applications, instructions, or the like for enabling the apparatus to carry out various functions in accordance with an example embodiment of the present invention. For example, the memory device could be configured to buffer input data for processing by the processor 202. Additionally or alternatively, the memory device could be configured to store instructions for execution by the processor.
In some embodiments, the apparatus 200 may be embodied as a chip or chip set. In other words, the apparatus may comprise one or more physical packages (e.g., chips) including materials, components and/or wires on a structural assembly (e.g., a baseboard). The structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon. The apparatus may therefore, in some cases, be configured to implement an embodiment of the present invention on a single chip or as a single “system on a chip.” As such, in some cases, a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein.
The processor 202 may be embodied in a number of different ways. For example, the processor may be embodied as one or more of various hardware processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other processing circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like. As such, in some embodiments, the processor may include one or more processing cores configured to perform independently. A multi-core processor may enable multiprocessing within a single physical package. Additionally or alternatively, the processor may include one or more processors configured in tandem via the bus to enable independent execution of instructions, pipelining and/or multithreading.
In an example embodiment, the processor 202 may be configured to execute instructions stored in the memory device 204 or otherwise accessible to the processor. Alternatively or additionally, the processor may be configured to execute hard coded functionality. As such, whether configured by hardware or software methods, or by a combination thereof, the processor may represent an entity (e.g., physically embodied in circuitry) capable of performing operations according to an embodiment of the present invention while configured accordingly. Thus, for example, when the processor is embodied as an ASIC, FPGA or the like, the processor may be specifically configured hardware for conducting the operations described herein. Alternatively, as another example, when the processor is embodied as an executor of software instructions, the instructions may specifically configure the processor to perform the algorithms and/or operations described herein when the instructions are executed. However, in some cases, the processor may be a processor of a specific device configured to employ an embodiment of the present invention by further configuration of the processor by instructions for performing the algorithms and/or operations described herein. The processor may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor.
Meanwhile, the communication interface 206 may be any means such as a device or circuitry embodied in either hardware or a combination of hardware and software that is configured to receive and/or transmit data from/to a network and/or any other device or module in communication with the apparatus 200, such as by supporting communications with the multi-user content server 106. In this regard, the communication interface may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network. Additionally or alternatively, the communication interface may include the circuitry for interacting with the antenna(s) to cause transmission of signals via the antenna(s) or to handle receipt of signals received via the antenna(s). In some environments, the communication interface may alternatively or also support wired communication. As such, for example, the communication interface may include a communication modem and/or other hardware/software for supporting communication via cable, digital subscriber line (DSL), universal serial bus (USB) or other mechanisms.
The apparatus 200 may include a user interface 208 that may, in turn, be in communication with the processor 202 to provide output to the user and, in some embodiments, to receive an indication of a user input. For example, the user interface may include a display and, in some embodiments, may also include a keyboard, a mouse, a joystick, a touch screen, touch areas, soft keys, a microphone, a speaker, or other input/output mechanisms. The processor may comprise user interface circuitry configured to control at least some functions of one or more user interface elements such as a display and, in some embodiments, a speaker, ringer, microphone and/or the like. The processor and/or user interface circuitry comprising the processor may be configured to control one or more functions of one or more user interface elements through computer program instructions (e.g., software and/or firmware) stored on a memory accessible to the processor (e.g., memory 204, and/or the like).
In some example embodiments, such as instances in which the apparatus is embodied as a mobile device 104, the apparatus 200 may include an audio and video capturing element, such as a camera/microphone 210, video module and/or audio module, in communication with the processor 202. The audio/video capturing element may be any means for capturing an image, video and/or audio for storage, display or transmission. For example, in an example embodiment in which the audio/video capturing element is a camera, the camera may include a digital camera capable of forming a digital image file from a captured image. As such, the camera may include all hardware (for example, a lens or other optical component(s), image sensor, image signal processor, and/or the like) and software necessary for creating a digital image file from a captured image and/or video. Alternatively, the camera may include only the hardware needed to view an image, while a memory device 204 of the apparatus stores instructions for execution by the processor in the form of software necessary to create a digital image file from a captured image. In an example embodiment, the camera may further include a processing element such as a co-processor which assists the processor in processing image data and an encoder and/or decoder for compressing and/or decompressing image data. The encoder and/or decoder may encode and/or decode according to, for example, a joint photographic experts group (JPEG) standard, a moving picture experts group (MPEG) standard, or other format.
As shown in
In some example embodiments, such as instances in which the apparatus is embodied as a mobile device 104, the apparatus 200 may also include a sensor 212, such as a GPS receiver, a compass, an accelerometer, and/or a gyroscope that may be in communication with the processor 202 and may be configured to receive timing signals and to detect changes in position, motion and/or orientation of the apparatus.
The method, apparatus, and computer program product may now be described in conjunction with the operations illustrated in
As shown in block 304 of
As shown in block 306 of
The apparatus 200 may also include means, such as the processor 202 or the like, for applying the pruning sets to the one or more signals to determine how to compose the signal that is to be shared in the multi-user environment. See block 308 of
As shown in block 310 of
The method, apparatus, and computer program product may now be described in conjunction with the operations illustrated in
As shown in block 404 of
As shown in block 406 of
The apparatus 200 may also include means, such as the processor 202 or the like, for comparing the amount of saturation to a predetermined threshold. See block 408 of
As shown in block 410 of
As shown in block 412 of
As shown in block 414 of
The apparatus 200 may also include means, such as the processor 202, the memory 204, or the like, for mapping the indicated values for each of the one or more frames of the selected signal to continuous segments. See block 416 of
The apparatus 200 may also include means, such as the processor 202, the memory 204, or the like, for adding the mapped segments indicated as saturated to a first pruning data set. See block 418 of
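By way of a non-limiting sketch, the saturation analysis may, for example, count the proportion of near-full-scale samples in each frame and flag frames whose proportion exceeds a threshold. The frame length, clipping level, and threshold values below are illustrative assumptions only:

```python
def saturated_frames(samples, frame_len=1024, clip_level=0.99, ratio_thr=0.05):
    """Flag each frame whose share of near-full-scale samples exceeds
    ratio_thr. Samples are assumed normalized to [-1.0, 1.0]; the flags
    may then be mapped to continuous segments for the first pruning set."""
    flags = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        clipped = sum(1 for s in frame if abs(s) >= clip_level)
        flags.append(clipped / len(frame) > ratio_thr)
    return flags
```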
Through the operations of
The method, apparatus, and computer program product may now be described in conjunction with the operations illustrated in
As shown in block 504 of
As shown in block 506 of
For example, clicking analysis is useful for cases where a device, such as mobile device 104, accidentally or intentionally hits something, causing short-duration spikes in the audio signal. In one example embodiment, clicking analysis is performed by comparing the energy of the current frame to the energies of the previous and next frames. For example, let the current frame be f_t and the energy of the current frame be e_t^f. The current frame is indicated ‘click’ if:
e_t^f > E·e_{t−1}^f and e_t^f > E·e_{t+1}^f    (1)
where E is a level difference multiplier for the previous and next frame, for example, 3 dB.
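Equation (1) may be sketched directly in code; a 3 dB level difference corresponds to a linear energy multiplier of E = 10^(3/10) ≈ 2. The function name and data layout are illustrative assumptions:

```python
def click_frames(energies, E=10 ** (3 / 10)):
    """Flag frame t as 'click' when e_t^f > E*e_{t-1}^f and
    e_t^f > E*e_{t+1}^f, per equation (1). The default E corresponds
    to a 3 dB level difference. Edge frames have no previous or next
    frame and are never flagged."""
    flags = [False] * len(energies)
    for t in range(1, len(energies) - 1):
        flags[t] = (energies[t] > E * energies[t - 1]
                    and energies[t] > E * energies[t + 1])
    return flags
```

For example, frame energies [1.0, 5.0, 1.0] yield [False, True, False]: the middle frame exceeds roughly twice the energy of both its neighbors.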
As shown in block 508 of
As shown in block 510 of
As shown in block 512 of
As shown in block 514 of
The apparatus 200 may also include means, such as the processor 202, the memory 204, or the like, for mapping the indicated values for each of the one or more frames of the selected signal to continuous segments. See block 516 of
The apparatus 200 may also include means, such as the processor 202, the memory 204, or the like, for adding the mapped segments indicated as click to a first pruning data set. See block 518 of
The method, apparatus, and computer program product may now be described in conjunction with the operations illustrated in
As shown in block 604 of
As shown in block 606 of
The apparatus 200 may also include means, such as the processor 202 or the like, for comparing the OOI angles to the recorded compass angle for a time segment of the signal to determine if the recorded compass angle is part of the OOI angle. See block 608 of
As shown in block 610 of
As shown in block 612 of
As shown in block 614 of
The apparatus 200 may also include means, such as the processor 202, the memory 204, or the like, for mapping the indicated values for each of the one or more time segments of the selected signal to continuous segments. See block 616 of
The apparatus 200 may also include means, such as the processor 202, the memory 204, or the like, for adding the mapped segments indicated as non-OOI to a “non-ideal” segment set. See block 618 of
As shown in block 620 of
The apparatus 200 may also include means, such as the processor 202 or the like, for analyzing accelerometer and/or gyroscope data for a time segment of the signal to determine if the device, such as mobile device 104, was tilted or moving during capturing. See block 622 of
As shown in block 624 of
As shown in block 626 of
As shown in block 628 of
The apparatus 200 may also include means, such as the processor 202, the memory 204, or the like, for mapping the indicated values for each of the one or more time segments of the selected signal to continuous segments. See block 630 of
The apparatus 200 may also include means, such as the processor 202 or the like, for adding the mapped segments indicated as non-ideal to a “non-ideal” segment set for the signal. See block 632 of
The apparatus 200 may also include means, such as the processor 202 or the like, for analyzing the signal characteristics corresponding to the non-ideal time segments. See block 634 of
As shown in block 636 of
As shown in block 638 of
As shown in block 640 of
As shown in block 642 of
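The sensor-based analysis above may be sketched, for illustration, as a per-segment check that combines the compass/object-of-interest (OOI) angle test with an accelerometer/gyroscope tilt test. The variance threshold and data layout below are assumptions, not part of any claimed implementation:

```python
def non_ideal_segments_from_sensors(compass_angles, ooi_range,
                                    tilt_variances, tilt_thr=0.5):
    """Flag each time segment as 'non-ideal' when the recorded compass
    angle falls outside the OOI angular range, or when the
    accelerometer/gyroscope variance suggests the device was tilted or
    moving. ooi_range is (low, high) in degrees; tilt_thr is an
    illustrative threshold. The flags may then be mapped to continuous
    segments for the second pruning set."""
    flags = []
    for angle, var in zip(compass_angles, tilt_variances):
        outside_ooi = not (ooi_range[0] <= angle <= ooi_range[1])
        tilted = var > tilt_thr
        flags.append(outside_ooi or tilted)
    return flags
```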
The method, apparatus, and computer program product may now be described in conjunction with the operations illustrated in
As shown in block 704 of
The apparatus 200 may also include means, such as the processor 202, or the like, for determining whether the selected signal source has one or more pruning data sets. See block 706 of
As shown in block 708 of
As shown in block 710 of
As shown in block 712 of
The apparatus 200 may also include means, such as the processor 202 or the like, for determining whether the candidate signal source has pruning data that overlaps the corresponding time segment. See block 714 of
As shown in block 716 of
As shown in block 718 of
Returning to block 714, if the candidate signal source does not have pruning data overlapping the corresponding time segment from the selected signal source, operation continues to block 720.
As shown in block 720 of
As shown in block 722 of
As shown in block 724 of
As shown in block 726 of
If it is determined at block 726 that there are no additional time segments having overlapping pruning data, operation continues to block 728 where operation ends.
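The replacement flow above may be sketched as follows: for each pruned time segment of the selected signal, the first candidate source without overlapping pruning data supplies the replacement samples. The names and the sample-index layout are illustrative assumptions:

```python
def replace_pruned_segments(signals, pruning, selected):
    """signals: {name: list of samples on a common timeline}.
    pruning: {name: list of (start, end) pruned segments}.
    Returns a copy of signals[selected] in which each pruned segment is
    replaced from the first candidate source that is not itself pruned
    over that segment; if no such candidate exists, the segment is kept."""
    out = list(signals[selected])
    for start, end in pruning.get(selected, []):
        for name, samples in signals.items():
            if name == selected:
                continue
            overlaps = any(s < end and start < e
                           for s, e in pruning.get(name, []))
            if not overlaps:
                out[start:end] = samples[start:end]
                break
    return out
```

For example, with source A pruned over samples 1-3 and source B clean, the composed output takes samples 1-3 from B and the rest from A.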
The method, apparatus, and computer program product may now be described in conjunction with the operations illustrated in
As shown in block 804 of
As shown in block 806 of
If it is determined that any of the signal sources has pruning data overlapping a time segment of its signal, the apparatus 200 may include means, such as the processor 202 or the like, for selecting the first time segment along the common timeline with overlapping pruning data and operation continues to block 808.
As shown in block 808 of
As shown in block 810 of
As shown in block 812 of
As shown in block 814 of
If it is determined at block 814 that there are no additional time segments having overlapping pruning data, operation continues to block 816 where operation ends.
For the time period from sA to sB, only source A is used, as it is the only available source for that time period. For the time period from sB to sC, sources A and B are jointly mixed, with the exception of the time segment that overlaps with the first pruning data set for source A. This time segment of source A is not used as it contains a distorted signal; thus, for this time segment only source B is used. For the time period from sC to eA, sources A, B, and C are all jointly mixed, with the exception of (1) the time segment that overlaps with pruning data set 1 for source A, where only sources B and C are used, and (2) the time segment that overlaps with pruning data set 2 for source C, where all three sources are used but source C is weighted such that its contribution to the composed signal is smaller compared to sources A and B. For the time period from eA to eB, sources B and C are jointly mixed, with the exception of the time segment that overlaps with pruning data set 2 for source B, where both sources are used but source B is weighted such that its contribution to the composed signal is smaller compared to source C. For the time period from eB to eC, only source C is used, as it is the only available source for that time period.
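The mixing rules of this example may be sketched per time segment: sources whose first pruning data set overlaps the segment are dropped (distorted), and sources whose second pruning data set overlaps are down-weighted (degraded). The weight value is an illustrative assumption:

```python
def mix_segment(sources, prune1, prune2, degraded_weight=0.5):
    """Mix one time segment. sources: {name: sample value}. prune1 and
    prune2: sets of source names whose first/second pruning data
    overlap this segment. Sources in prune1 are excluded; sources in
    prune2 contribute with reduced weight. Returns the weighted mean."""
    weights = {}
    for name in sources:
        if name in prune1:
            continue  # distorted segment: do not use this source here
        weights[name] = degraded_weight if name in prune2 else 1.0
    total = sum(weights.values())
    return sum(sources[n] * w for n, w in weights.items()) / total
```

For instance, when source A's first pruning set covers the segment, only source B contributes; when source B's second pruning set covers the segment, B is mixed in at half weight relative to the other sources.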
As described above,
Accordingly, blocks of the flowchart support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flowchart, and combinations of blocks in the flowchart, can be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions.
In some embodiments, certain ones of the operations above may be modified or further amplified. Furthermore, in some embodiments, additional optional operations may be included, such as shown by the blocks with dashed outlines. Modifications, additions, or amplifications to the operations above may be performed in any order and in any combination.
Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Moreover, although the foregoing descriptions and the associated drawings describe example embodiments in the context of certain example combinations of elements and/or functions, it should be appreciated that different combinations of elements and/or functions may be provided by alternative embodiments without departing from the scope of the appended claims. In this regard, for example, different combinations of elements and/or functions than those explicitly described above are also contemplated as may be set forth in some of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.