The present disclosure generally relates to the technical field of the Internet and, more particularly, to a method and a system for synthesizing audio/video.
In many current application scenarios, it is necessary to integrate multiple audio/video signals so that the pictures of multiple audio/video signals may be displayed in the same video picture. For example, in video conferencing and TV broadcasting, audio/video signals usually need to be collected from various angles and scenes; under the control of a preset method or broadcast control, the collected audio/video signals are then synthesized according to the required pictures and sound effects, and the synthesized audio/video signals are finally provided to users.
However, the existing way of synthesizing audio/video usually requires expensive professional hardware, such as broadcast consoles, and also requires professional staff to operate such hardware. Consequently, the cost of existing audio/video synthesis is too high.
The purpose of the present disclosure is to provide a method and a system for synthesizing audio/video, which may reduce the cost of audio/video synthesis.
To achieve the above purpose, in one aspect, the present disclosure provides a method for synthesizing audio/video. The method includes: receiving video synthesis instructions sent by a broadcast client, synthesizing a first video stream based on multiple video input streams, and synthesizing a second video stream based on the multiple video input streams and the first video stream; receiving audio synthesis instructions from the broadcast client and respectively synthesizing a first audio stream and a second audio stream based on multiple audio input streams; respectively encoding the first video stream, the second video stream, the first audio stream and the second audio stream to correspondingly obtain a first video encoding stream set, a second video encoding stream set, a first audio encoding stream set and a second audio encoding stream set; respectively determining a first video encoding stream and/or a first audio encoding stream from the first video encoding stream set and the first audio encoding stream set, integrating the first video encoding stream and/or the first audio encoding stream into a first output stream, and providing the first output stream to a user client; and respectively determining a second video encoding stream and/or a second audio encoding stream from the second video encoding stream set and the second audio encoding stream set, integrating the second video encoding stream and/or the second audio encoding stream into a second output stream, and providing the second output stream to the broadcast client.
To achieve the above purpose, in another aspect, the present disclosure provides a system for synthesizing audio/video. The system includes an instruction control module, a data stream synthesis and processing module, a data stream multi-version encoding module and a data merging output module, where: the instruction control module is configured to receive a video synthesis instruction and an audio synthesis instruction from a broadcast client; the data stream synthesis and processing module is configured to synthesize a first video stream based on multiple video input streams and synthesize a second video stream based on the multiple video input streams and the first video stream, and is configured to respectively synthesize a first audio stream and a second audio stream based on multiple audio input streams; the data stream multi-version encoding module is configured to encode the first video stream and the second video stream respectively to correspondingly obtain a first video encoding stream set and a second video encoding stream set, and is configured to encode the first audio stream and the second audio stream respectively to correspondingly obtain a first audio encoding stream set and a second audio encoding stream set; and the data merging output module is configured to determine a first video encoding stream and/or a first audio encoding stream from the first video encoding stream set and the first audio encoding stream set respectively, and integrate the first video encoding stream and/or the first audio encoding stream into a first output stream which is provided to a user client; and is also configured to determine a second video encoding stream and/or a second audio encoding stream from the second video encoding stream set and the second audio encoding stream set respectively, and integrate the second video encoding stream and/or the second audio encoding stream into a second output stream which is provided to the broadcast client.
It can be seen from the above that, in the technical solution provided by the present disclosure, the broadcast client only needs to issue control instructions during audio/video synthesis, and the synthesis itself may be accomplished in the cloud system. Specifically, when synthesizing video, the cloud system may synthesize, from multiple video input streams, the first video stream provided for the user client to view. At least one video input stream picture may be displayed simultaneously in the video picture of the first video stream. In addition, the cloud system may further synthesize the second video stream provided for the broadcast client to view, and the video picture of the second video stream may include the video picture of each video input stream in addition to the video picture of the first video stream. In this way, the broadcast control staff may conveniently monitor, in real time, both the video picture viewed by users and the video pictures of the currently available video input streams. When synthesizing audio, the cloud system may separately synthesize, based on multiple audio input streams, the first audio stream provided to the user client and the second audio stream provided to the broadcast client. Subsequently, when encoding the video streams and audio streams, the first video encoding stream set, the second video encoding stream set, the first audio encoding stream set and the second audio encoding stream set may be generated using a multi-version encoding method, where each set may include multiple encoding streams of different versions. In this way, the video encoding stream and the audio encoding stream may be determined from each set according to the encoding types required by the user client and the broadcast client, integrated into one output stream, and provided to the user client and the broadcast client respectively. The user client and the broadcast client thus do not need extra bandwidth to load multiple audio and video streams; each only needs to load one output stream, which saves bandwidth on both clients. In addition, in the prior art, the push stream output end usually uses only one encoding method, and a live transcoding server then transcodes the stream with multiple different encoding methods into live streams distributed to different users, which causes higher live delay and also degrades the output stream quality. In the present disclosure, the encoding method of the output stream may be flexibly adjusted according to the encoding methods required by the user client and the broadcast client, so a matching output stream may be provided to each and the transcoding step may be eliminated. This not only saves waiting time for users but also reduces resource consumption in the audio/video synthesis process. With the technical solution provided by the present application, the broadcast client does not need professional hardware devices and only needs network communication and page display functions, which may greatly reduce the cost of audio/video synthesis and also improve the generality of the audio/video synthesis method.
To more clearly illustrate the technical solutions of the present disclosure, the accompanying drawings used in the description of the disclosed embodiments are briefly described hereinafter. Obviously, the drawings described below illustrate merely some embodiments of the present disclosure. Other drawings may be derived from these drawings by a person having ordinary skill in the art without creative effort.
To more clearly describe the objectives, technical solutions and advantages of the present disclosure, the present disclosure is further illustrated in detail with reference to the accompanying drawings in conjunction with embodiments.
The present disclosure provides a method for synthesizing audio/video, which may be applied to an audio/video synthesis system. The audio/video synthesis system may be deployed on a cloud server. The server may be an independent server or a distributed server cluster and may be flexibly configured according to required computing resources. Referring to
Referring to
In S1: receiving video synthesis instructions sent by the broadcast client, synthesizing a first video stream based on multiple video input streams, and synthesizing a second video stream based on the multiple video input streams and the first video stream.
In one embodiment, the cloud server may receive a pull-stream instruction sent by the broadcast client, and the pull-stream instruction may point to multi-channel audio/video data streams. In this way, the cloud server may acquire the multi-channel audio/video data streams and decode them. The multi-channel audio/video data streams may be the data streams required in the audio/video synthesis process. After obtaining the audio data streams and video data streams through decoding, the cloud server may cache the decoded audio data streams and video data streams separately, so that the required audio data stream and/or video data stream may subsequently be called independently.
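As an illustration only, the following Python sketch shows one way such a separate cache might be organized; all names and the data layout are hypothetical, and the disclosure does not prescribe any particular implementation:

```python
from collections import deque

class DecodedStreamCache:
    """Hypothetical cache keeping decoded audio and video frames separately,
    so that audio and video can later be pulled independently per stream."""

    def __init__(self, max_frames=256):
        self.video = {}  # stream_id -> deque of decoded video frames
        self.audio = {}  # stream_id -> deque of decoded audio frames
        self.max_frames = max_frames

    def put_video(self, stream_id, frame):
        self.video.setdefault(stream_id, deque(maxlen=self.max_frames)).append(frame)

    def put_audio(self, stream_id, frame):
        self.audio.setdefault(stream_id, deque(maxlen=self.max_frames)).append(frame)

    def get_video(self, stream_id):
        buf = self.video.get(stream_id)
        return buf.popleft() if buf else None

    def get_audio(self, stream_id):
        buf = self.audio.get(stream_id)
        return buf.popleft() if buf else None
```

A bounded per-stream buffer keeps memory use predictable while still allowing the audio and video of each input stream to be consumed at independent rates.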
In one embodiment, the broadcast client may send a video synthesis instruction to the cloud server. After receiving the video synthesis instruction, the cloud server may read each video data stream from the video data stream cache. The video data streams read from the cache may be used as the multiple video input streams in step S1.
In one embodiment, the cloud server may synthesize two different video pictures. One of the video pictures may be provided for users to view. Referring to
In one embodiment, another video picture synthesized by the cloud server may be provided to broadcast staff for viewing. The broadcast staff need to monitor the video picture viewed by users and also need to view the video pictures of the currently available video input streams, so the cloud server may further synthesize such a video picture. The video picture viewed by the broadcast staff may be shown in
In one embodiment, synthesizing either the first video stream or the second video stream involves integrating multiple video pictures into one video picture. Specifically, based on the resolution of the integrated video picture, a background picture matching that resolution may be pre-created. The background picture may generally be a solid-color picture; for example, it may be a black background picture. Then, for each video picture to be integrated, integration parameters may be determined separately. The integration parameters may include a picture size, a location, an overlay level, etc. The picture size may represent the size of the video picture to be integrated within the integrated picture; the location may represent its specific position in the integrated picture; and the overlay level may control the stacking order of the multiple video pictures in the integrated picture, that is, if the video pictures of two input streams overlap in the integrated picture, the overlay level determines which video picture is displayed on top and which underneath. After the integration parameters of each video picture to be integrated are determined, each video picture may be added onto the background picture according to its integration parameters to form the integrated video picture.
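As a minimal sketch of this integration step, assuming decoded frames are numpy arrays (the field names `frame`, `size`, `pos` and `level` are illustrative, not part of the disclosure), the pictures may be sorted by overlay level and pasted onto a pre-created solid-color background:

```python
import numpy as np

def compose_picture(canvas_h, canvas_w, layers, bg_color=(0, 0, 0)):
    """Compose one integrated picture: a solid-color background plus each
    input picture placed by (size, location, overlay level).
    `layers` is a list of dicts: {"frame": HxWx3 array, "size": (h, w),
    "pos": (y, x), "level": int}."""
    canvas = np.empty((canvas_h, canvas_w, 3), dtype=np.uint8)
    canvas[:] = bg_color  # solid-color background, e.g. black

    # Draw lower overlay levels first so higher levels end up on top.
    for layer in sorted(layers, key=lambda l: l["level"]):
        h, w = layer["size"]
        y, x = layer["pos"]
        frame = layer["frame"]
        # Nearest-neighbor resize to the requested picture size.
        ys = np.arange(h) * frame.shape[0] // h
        xs = np.arange(w) * frame.shape[1] // w
        resized = frame[ys][:, xs]
        # Clip the paste region to the canvas bounds.
        h = min(h, canvas_h - y)
        w = min(w, canvas_w - x)
        if h <= 0 or w <= 0:
            continue  # layer falls entirely outside the canvas
        canvas[y:y + h, x:x + w] = resized[:h, :w]
    return canvas
```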
It should be mentioned that the solid-color background picture is configured because the video pictures to be integrated may sometimes not fill the entire integrated video picture, so a solid-color background picture is needed as the background color to display the integrated video picture completely. In addition, in some application scenarios, the solid-color background picture may be removed by post-processing and customized effect pictures may be added to the removed region. For example, a green background picture may be removed from the integrated video picture using the chroma keying technique, and the removed area may be filled with an effect picture that matches the theme of the video picture.
In one embodiment, before the synthesis of multi-channel input streams, each input stream may be pre-processed. The pre-processing includes, but is not limited to, noise removal, background filtering, transparency setting, and contrast enhancement. In addition, after the main picture is synthesized, the main picture may be further post-processed. The post-processing includes, but is not limited to, adding image watermarks, adding texts, and adding preset picture effects (such as live virtual gift effects).
In S2: receiving the audio synthesis instruction from the broadcast client and respectively synthesizing the first audio stream and the second audio stream based on multiple audio input streams.
In one embodiment, the cloud server may also synthesize multiple audio input streams according to the audio synthesis instruction from the broadcast client. Similarly, the synthesized audio streams may be provided separately to the user client and the broadcast client. The audio stream provided to the user client may serve as the main audio, which is the first audio stream described in step S2, while the audio stream provided to the broadcast client may serve as the user audio, which is the second audio stream described in step S2.
In one embodiment, the main audio and the user audio may be synthesized using multiple audio input streams acquired from the above-mentioned audio data stream cache. Specifically, the audio synthesis instruction may include synthesis parameters of the main audio. Audio frames of the required audio input streams may be acquired from the multiple audio input streams according to the synthesis parameters of the main audio. The selected audio frames may then be pre-processed, including, but not limited to, volume adjustment and pitch conversion. Next, the pre-processed audio frames may be mixed according to the mixing parameters of the main audio. The mixing process may include blending different sound channels and mixing loudness. After the main audio is synthesized, it may be post-processed; the post-processing includes, but is not limited to, adding preset sound effects such as whistles, applause, and cheers. In this way, the first audio stream provided to the user client may be generated.
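The mixing step might look like the following simplified Python sketch, which assumes float32 PCM frames already resampled to a common rate; the per-input gain values and the mono-to-stereo upmix rule are illustrative assumptions, not prescribed by the disclosure:

```python
import numpy as np

def mix_audio(frames, gains, channel_layout=2):
    """Mix pre-processed audio frames into one output frame.
    `frames`: list of float32 arrays shaped (samples, channels);
    `gains`: per-input loudness weights. Real mixers would also
    handle resampling and limiting."""
    samples = min(f.shape[0] for f in frames)
    mixed = np.zeros((samples, channel_layout), dtype=np.float32)
    for frame, gain in zip(frames, gains):
        frame = frame[:samples]
        if frame.shape[1] == 1 and channel_layout == 2:
            frame = np.repeat(frame, 2, axis=1)  # upmix mono to stereo
        mixed += gain * frame[:, :channel_layout]  # loudness mixing
    return np.clip(mixed, -1.0, 1.0)  # keep the sum within range
```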
In one embodiment, when synthesizing the second audio stream, the cloud server may determine whether the audio synthesis instructions include an audio copy instruction. If so, the first audio stream may be copied, and the copied data may be used as the second audio stream. If not, the user audio may be synthesized according to the user audio synthesis parameters included in the audio synthesis instructions, following the above-mentioned process of synthesizing the main audio.
In one embodiment, after the second audio stream is synthesized, staff at the broadcast client may audition it and further modify it. Specifically, the cloud server may receive regulation instructions including audio synthesis parameters from the broadcast client. The audio synthesis parameters in the regulation instructions may be used by the cloud server to adjust the second audio stream. For example, the cloud server may remove some sound effects in the second audio stream, add new sound effects, or modify existing ones. After the adjustments, the cloud server may feed the adjusted second audio stream back to the broadcast client. After the broadcast client receives the adjusted second audio stream, the staff may continue the audition. If the adjusted second audio stream meets expectations, the staff may send an audio synchronization instruction to the cloud server via the broadcast client. After receiving the audio synchronization instruction, the cloud server may adjust the first audio stream according to the audio synthesis parameters used for adjusting the second audio stream, and provide the adjusted first audio stream to the user client. In this way, the audio stream provided to the user client may be auditioned and modified at the broadcast client in advance. After the modification is completed, the first audio stream provided to the user client is processed identically according to the audio synthesis parameters used for the modification, which ensures that the sound effects heard by users meet the staff's expectations.
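A minimal sketch of this audition-then-synchronize control flow is shown below; `apply_synthesis_params` is a hypothetical stand-in for the actual adjustment logic (e.g., adding, removing or modifying sound effects):

```python
def apply_synthesis_params(audio_stream, params):
    """Hypothetical stand-in: apply adjustment parameters to an audio
    stream and return the adjusted stream."""
    return {"stream": audio_stream, "applied": params}

class AudioSyncController:
    """Regulation instructions adjust only the second (broadcast-side)
    audio; once the staff confirm, the same parameters are replayed onto
    the first (user-facing) audio via the audio synchronization
    instruction."""

    def __init__(self):
        self.pending_params = None

    def on_regulation_instruction(self, params, second_audio):
        self.pending_params = params  # remember for the later sync step
        return apply_synthesis_params(second_audio, params)

    def on_audio_sync_instruction(self, first_audio):
        if self.pending_params is None:
            return first_audio  # nothing has been auditioned yet
        return apply_synthesis_params(first_audio, self.pending_params)
```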
In another embodiment, in addition to receiving the above-mentioned second audio stream, the broadcast client may also monitor the first audio stream received by the user client. Specifically, the broadcast client may send an audio switching instruction to the cloud server. After receiving the audio switching instruction, the cloud server may respond by sending the first output stream, which is provided to the user client, to the broadcast client as well. In this way, the broadcast client may monitor the sound effects that users hear. After the broadcast client sends the audio switching instruction to the cloud server again, the cloud server may provide the second audio stream to the broadcast client again. In this way, the broadcast client may switch back and forth between the first audio stream and the second audio stream.
It can be seen that two sets of audio and video data for different purposes may be synthesized according to the above-mentioned technical solution of the present disclosure: one set is provided to the user client and the other to the broadcast client. Staff who control the online synthesis of live content may view the main picture seen by viewers and, through the user picture, also view the real-time pictures of the currently available video input streams, so that the whole situation may be monitored at a glance. At the same time, the staff may hear the audio output to viewers, switch to the user audio, and test and audition the user audio. When the audition is satisfactory, the synthesis parameters of the user audio may be applied to the synthesis parameters of the main audio to adjust the main audio.
In S3: respectively encoding the first video stream, the second video stream, the first audio stream and the second audio stream to correspondingly obtain a first video encoding stream set, a second video encoding stream set, a first audio encoding stream set and a second audio encoding stream set.
In one embodiment, after completing the synthesis of the above-mentioned video pictures or audio, the cloud server may encode the generated first video stream, second video stream, first audio stream and second audio stream. In existing audio/video synthesis, generally only one version of audio/video data is encoded, and after pushing, a network relay server transcodes it into versions with multiple different audio/video attributes. This existing method has some disadvantages: transcoding into multiple different audio/video attributes at the relay server causes picture-quality loss due to two encoding/decoding passes and also introduces high delay. In one embodiment, in order to adapt to different terminals (such as set-top boxes, personal computers, and smart phones) and different Internet access circumstances (such as optical fiber and mobile cellular networks), multi-version encoding may be performed on the synthesized audio streams and video streams.
In one embodiment, when multi-version encoding is performed on the first audio stream and the second audio stream, audio data with multiple different sampling rates and sound channels may first be generated by converting to the sampling rates and sound channels in the audio multi-version encoding parameter set. Then the audio data for each sampling rate and sound channel may be encoded according to different audio encoding settings. The different audio encoding settings include, but are not limited to, different encoding rates and encoding formats.
In one embodiment, when multi-version encoding is performed on the first video stream and the second video stream, video frames with multiple different resolutions may first be generated by scaling to the resolutions in the video multi-version encoding parameter set. Then, the video frames at each resolution may be encoded according to different video encoding settings, such as frame rate, encoding format, and encoding rate.
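For illustration, a multi-version parameter set may be expanded into one encoder configuration per combination, so that each synthesized stream is encoded once per version; the concrete resolutions, rates and counts below are assumptions of this sketch, not values from the disclosure:

```python
from itertools import product

# Illustrative multi-version encoding parameter sets (all values hypothetical).
VIDEO_VERSIONS = {
    "resolution": [(1920, 1080), (1280, 720), (640, 360)],
    "frame_rate": [30, 15],
    "bitrate_kbps": [4000, 1500, 600],
}
AUDIO_VERSIONS = {
    "sample_rate": [48000, 44100],
    "channels": [2, 1],
    "bitrate_kbps": [128, 64],
}

def expand_versions(param_set):
    """Expand a multi-version parameter set into one encoder config per
    combination; each synthesized stream is encoded once per config."""
    keys = list(param_set)
    return [dict(zip(keys, combo)) for combo in product(*param_set.values())]

video_configs = expand_versions(VIDEO_VERSIONS)  # 3 * 2 * 3 = 18 video versions
audio_configs = expand_versions(AUDIO_VERSIONS)  # 2 * 2 * 2 = 8 audio versions
```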
In one embodiment, when performing multi-version encoding on the synthesized audio/video streams, the multi-version encoding parameters may be adjusted in real time according to different user clients. Specifically, the video encoding parameters and audio encoding parameters required for each output stream may be acquired to determine a required encoding parameter set, which summarizes the audio and video encoding parameters of the current output streams. Then, the required encoding parameter set may be compared with the current encoding parameter set, i.e., the encoding parameter set currently used by the cloud server. If the two sets are inconsistent, this indicates that, compared with the current encoding parameter set, the output streams corresponding to the current user clients have changed. At this time, the video encoding parameters and/or audio encoding parameters newly added in the required encoding parameter set may be determined and added into the current encoding parameter set. In addition, target video encoding parameters and/or target audio encoding parameters that are included in the current encoding parameter set but not in the required encoding parameter set may be determined and removed from the current encoding parameter set. In this way, encoding parameters are added to and deleted from the current encoding parameter set correspondingly, so that after the adjustment it contains only the encoding parameters required by the current output streams. The first video stream, the second video stream, the first audio stream and the second audio stream may then be encoded according to the video and audio encoding parameters in the adjusted current encoding parameter set.
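The comparison and adjustment of the two parameter sets amounts to a set difference. A minimal sketch follows, modeling each encoding parameter combination as a hashable tuple (an assumption of this sketch):

```python
def reconcile_encoding_params(required, current):
    """Return the adjusted current set plus the deltas: newly required
    parameters are added, and parameters no longer required are removed.
    Parameters are modeled as hashable tuples, e.g. ("video", "720p", 30)."""
    to_add = required - current      # start encoders for these versions
    to_remove = current - required   # stop encoders for these versions
    return (current | to_add) - to_remove, to_add, to_remove

# Example: a client stops needing 360p video but starts needing 1080p.
current = {("video", "720p"), ("video", "360p"), ("audio", "128k")}
required = {("video", "720p"), ("video", "1080p"), ("audio", "128k")}
adjusted, added, removed = reconcile_encoding_params(required, current)
assert adjusted == required
```

Encoders are started only for the added versions and stopped only for the removed ones, so versions still in use keep running uninterrupted.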
Since multi-version encoding is used, each audio stream and video stream may correspond to multiple different encoding versions, so that the first video encoding stream set, the second video encoding stream set, the first audio encoding stream set and the second audio encoding stream set may be obtained correspondingly. Each set may include multiple encoding streams of different versions.
In S4: respectively determining a first video encoding stream and/or a first audio encoding stream from the first video encoding stream set and the first audio encoding stream set, integrating the first video encoding stream and/or the first audio encoding stream into a first output stream, and providing the first output stream to the user client.
In S5: respectively determining a second video encoding stream and/or a second audio encoding stream from the second video encoding stream set and the second audio encoding stream set, integrating the second video encoding stream and/or the second audio encoding stream into a second output stream, and providing the second output stream to the broadcast client.
In one embodiment, when the output stream is pushed to the user client or the broadcast client, adaptive audio/video encoding streams may be selected from the encoding stream sets according to the encoding versions supported by the user client and the broadcast client. Specifically, the first video encoding stream and/or the first audio encoding stream may be determined from the first video encoding stream set and the first audio encoding stream set respectively according to the output stream to be provided to the user client, and may be integrated into the first output stream provided to the user client. It should be mentioned that, when integrating the first output stream, only the audio stream and not the video stream may be selected, which suits audio-only applications such as Internet radio stations. In the case of multiple audio tracks or multiple video tracks, more than one audio stream or video stream may also be selected, and the user client may freely switch between audio and video tracks. Conversely, only the video stream and not the audio stream may be selected, to output a silent stream.
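A simplified selection-and-merge sketch follows; the `integration_params` keys and the dictionary stand-in for a real container mux are hypothetical:

```python
def build_output_stream(video_set, audio_set, integration_params):
    """Pick encoded streams matching the client's capabilities and merge
    them into one output stream. `video_set`/`audio_set` map a version key
    (e.g. codec plus resolution or sample rate) to an encoded stream;
    either selection may be empty for audio-only or silent outputs."""
    selected = []
    for key in integration_params.get("video_versions", []):
        if key in video_set:
            selected.append(video_set[key])  # zero, one or more video tracks
    for key in integration_params.get("audio_versions", []):
        if key in audio_set:
            selected.append(audio_set[key])  # zero, one or more audio tracks
    if not selected:
        raise ValueError("output stream must carry at least one track")
    return {"tracks": selected}  # stand-in for a real container mux
```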
Correspondingly, the second video encoding stream and/or the second audio encoding stream may be determined from the second video encoding stream set and the second audio encoding stream set respectively according to the output stream provided to the broadcast client, and the second video encoding stream and/or the second audio encoding stream may be integrated into the second output stream which may be provided to the broadcast client.
In one embodiment, for each output stream, the audio stream and the video stream are selected from the corresponding encoding stream sets. After integration, the output stream may be pushed to the push stream address corresponding to the output stream, which suits live scenarios, or saved as a local file, which suits on-demand playback and review scenarios, for example. In the process of pushing to the user client and/or the broadcast client, the cloud server may receive, in real time, instructions from the user client or the broadcast client for adding, deleting, or modifying push stream addresses and push stream merging parameters, and make corresponding changes in real time.
In one embodiment, when the first output stream is provided to the user client and the second output stream is provided to the broadcast client, the required output stream set may be compared with the current output stream set. If the two sets are inconsistent, the newly added output streams in the required output stream set may be determined, and additional output push stream connections may be established according to the push stream addresses of the newly added output streams. These additionally established push stream connections may correspond to the user client and/or the broadcast client, so that the newly added output streams are provided to the user client and/or the broadcast client. In addition, a target output stream included in the current output stream set but not in the required output stream set may be determined, and the push stream connection of the target output stream may be cancelled to stop providing it.
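This connection bookkeeping follows the same set-difference pattern as the encoding parameters. In the sketch below, `connect` and `disconnect` are injected callbacks assumed to wrap the actual push logic (e.g., RTMP); the function and parameter names are illustrative:

```python
def reconcile_push_streams(required, connections, connect, disconnect):
    """Bring the active push connections in line with the required output
    stream set. `required` is a set of push-stream addresses;
    `connections` maps address -> live connection object."""
    for addr in required - connections.keys():
        connections[addr] = connect(addr)      # establish newly added streams
    for addr in set(connections) - required:
        disconnect(connections.pop(addr))      # cancel removed target streams
    return connections
```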
In one embodiment, before the newly added output streams are provided to the user client and/or the broadcast client, integration parameters corresponding to each newly added output stream may be configured. The integration parameters may specify the video encoding stream and/or audio encoding stream to be included in the newly added output stream, so that the audio/video streams may be selected from the encoding stream sets accordingly.
As can be seen from the above, the present disclosure supports multiple output streams, and each output stream may have different attributes such as resolution, encoding rate, and sampling rate. When receiving instructions to add, delete or modify output stream settings, the cloud server may analyze the required multi-version encoding settings and compare them with the currently used multi-version encoding settings. In this way, the cloud server may add, change or cancel the corresponding multi-version encoding settings in real time, and also add, cancel or modify the output push streams and related parameters.
The present disclosure also provides an audio/video synthesis system, which may be deployed on a cloud server. Referring to
The instruction control module is configured to receive a video synthesis instruction and an audio synthesis instruction from the broadcast client.
In one embodiment, the data stream synthesis and processing module may include a video picture synthesis and processing module and a sound effect synthesis and processing module; the data stream multi-version encoding module may include a video multi-version encoding module and an audio multi-version encoding module.
The video picture synthesis and processing module is configured to synthesize a first video stream based on multiple video input streams and synthesize a second video stream based on the multiple video input streams and the first video stream.
The sound effect synthesis and processing module is configured to synthesize a first audio stream and a second audio stream respectively based on multiple audio input streams.
The video multi-version encoding module is configured to encode the first video stream and the second video stream respectively to correspondingly obtain a first video encoding stream set and a second video encoding stream set.
The audio multi-version encoding module is configured to encode the first audio stream and the second audio stream respectively to correspondingly obtain a first audio encoding stream set and a second audio encoding stream set.
The data merging output module is configured to determine the first video encoding stream and/or the first audio encoding stream from the first video encoding stream set and the first audio encoding stream set respectively, and integrate the first video encoding stream and/or the first audio encoding stream into a first output stream which is provided to the user client; the data merging output module is also configured to determine the second video encoding stream and/or the second audio encoding stream from the second video encoding stream set and the second audio encoding stream set respectively, and integrate the second video encoding stream and/or the second audio encoding stream into a second output stream which is provided to the broadcast client.
Referring to
a data input module configured to receive a pull stream instruction from the broadcast client and acquire multiple audio and video data streams.
In addition, the system may further include a decoding cache module, which is configured to decode the audio/video data stream into a video data stream and an audio data stream, and cache the decoded video data stream and the audio data stream separately.
Correspondingly, multiple video input streams and multiple audio input streams are read from caches of the video data stream and the audio data stream respectively.
The above-mentioned video picture synthesis and processing module, sound effect synthesis and processing module, video multi-version encoding module and audio multi-version encoding module may be integrated into an audio/video synthesis and encoding module.
Referring to
When synthesizing the user picture, the main picture and the multiple video input streams may be input together and the user picture may be synthesized by a user picture synthesis module. Similarly, the user picture may be further post-processed by a user picture post-processing module. The post-processing includes, but is not limited to, adding image watermarks, adding texts, and adding preset picture effects (such as live virtual gift effects).
Referring to
Then, the main audio and the user audio are synthesized respectively by a main sound effect synthesis module and a user sound effect synthesis module. Specifically, the pre-processed audio frames may be mixed according to the mixing parameters of the main audio and the user audio. The mixing process may include blending different sound channels and mixing loudness. After the main audio and the user audio are synthesized, they may be post-processed respectively by a main sound effect post-processing module and a user sound effect post-processing module. The post-processing includes, but is not limited to, adding external preset sounds such as applause, cheers, whistles, and other preset audio effects.
In one embodiment, the video picture synthesis and processing module may also be configured to integrate the video picture of the first video stream and the video pictures of the multiple video input streams into one video picture, and the video stream corresponding to the integrated video picture is used as the second video stream.
In one embodiment, the video picture synthesis and processing module includes:
an integration parameter determination unit which is configured to pre-create a background picture matching the resolution of the integrated video picture, and determine integration parameters of each video picture to be integrated, where the integration parameters include a picture size, a location and an overlay level; and
a picture addition unit which is configured to add each video picture to be integrated onto the background picture to form the integrated video picture according to the integration parameters.
In one embodiment, the system may further include:
an audio adjustment module which is configured to receive regulation instructions including audio synthesis parameters sent by the broadcast client, adjust the second audio stream according to the audio synthesis parameters, and feed the adjusted second audio stream back to the broadcast client; and
an audio synchronization module which is configured to receive audio synchronization instructions sent by the broadcast client, adjust the first audio stream according to the audio synthesis parameters, and provide the adjusted first audio stream to the user client.
In one embodiment, the system may further include:
a parameter acquisition module which is configured to acquire required video encoding parameters and audio encoding parameters for each output stream to determine a required encoding parameter set;
a parameter addition module which is configured to compare the required encoding parameter set with the current encoding parameter set, and if these two sets are inconsistent with each other, determine newly added video encoding parameters and/or audio encoding parameters in the required encoding parameter set, and add the newly added video encoding parameters and/or audio encoding parameters into the current encoding parameter set;
a parameter deletion module which is configured to determine target video encoding parameters and/or target audio encoding parameters included in the current encoding parameter set but not included in the required encoding parameter set and remove the target video encoding parameters and/or target audio encoding parameters from the current encoding parameter set; and
an encoding module which is configured to encode the first video stream, the second video stream, the first audio stream and the second audio stream respectively according to the video encoding parameters and the audio encoding parameters in the current encoding parameter set after the adjustment.
In one embodiment, the system may further include:
an output stream addition module which is configured to compare the required output stream set with the current output stream set and, if the two sets are inconsistent, determine the newly added output streams in the required output stream set and establish additional output push stream connections according to the push stream addresses of the newly added output streams, where these additionally established connections may correspond to the user client and/or the broadcast client and provide the newly added output streams to the user client and/or the broadcast client; and
an output deletion module which is configured to determine the target output stream included in the current output stream set but not included in the required output stream set and cancel push stream connections corresponding to the target output stream to stop providing the target output stream.
Referring to, the computer terminal 10 may include one or more processors 102, a memory 104 configured to store data, and a transmission device 106 configured for communication.
The memory 104 may also be used to store software programs and modules of application software, and the processor 102 may execute various functional applications and data processing by running the software programs and modules stored in the memory 104. The memory 104 may include high-speed random-access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely relative to the processor 102, and such remote memory may be connected to the computer terminal 10 via a network. Examples of the above-mentioned network include, but are not limited to, the Internet, enterprise intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the above-mentioned network may further include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 may include a network interface controller (NIC), which may connect to other network devices via a base station to communicate with the Internet. In another example, the transmission device 106 may be a radio frequency (RF) module, which communicates with the Internet wirelessly.
It can be seen from the above that, in the technical solution provided by the present disclosure, the broadcast client only needs to issue control instructions during audio/video synthesis, and the synthesis itself may be accomplished in the cloud system. Specifically, when synthesizing video, the cloud system may synthesize, from multiple video input streams, the first video stream provided for the user client to view. At least one video input stream picture may be displayed simultaneously in the video picture of the first video stream. In addition, the cloud system may further synthesize the second video stream provided for the broadcast client to view, and the video picture of the second video stream may include the video picture of each video input stream in addition to the video picture of the first video stream. In this way, the broadcast control staff may conveniently monitor, in real time, both the video picture viewed by users and the video pictures of the currently available video input streams. When synthesizing audio, the cloud system may separately synthesize, based on multiple audio input streams, the first audio stream provided to the user client and the second audio stream provided to the broadcast client. Subsequently, when encoding the video streams and audio streams, the first video encoding stream set, the second video encoding stream set, the first audio encoding stream set and the second audio encoding stream set may be generated using a multi-version encoding method, where each set may include multiple encoding streams of different versions. In this way, the video encoding stream and the audio encoding stream may be determined from each set according to the encoding types required by the user client and the broadcast client, integrated into one output stream, and provided to the user client and the broadcast client respectively. The user client and the broadcast client thus do not need extra bandwidth to load multiple audio and video streams; each only needs to load one output stream, which saves bandwidth on both clients. In addition, in the prior art, the push stream output end usually uses only one encoding method, and a live transcoding server then transcodes the stream with multiple different encoding methods into live streams distributed to different users, which causes higher live delay and also degrades the output stream quality. In the present disclosure, the encoding method of the output stream may be flexibly adjusted according to the encoding methods required by the user client and the broadcast client, so a matching output stream may be provided to each and the transcoding step may be eliminated. This not only saves waiting time for users but also reduces resource consumption in the audio/video synthesis process. With the technical solution provided by the present application, the broadcast client does not need professional hardware devices and only needs network communication and page display functions, which may greatly reduce the cost of audio/video synthesis and also improve the generality of the audio/video synthesis method.
In addition, in the general audio/video synthesis process, the staff console normally displays the synthesized viewer picture and pictures of all input streams by separately pulling each input stream and the synthesized output stream. This approach has two problems:
1) the console needs to pull multiple input streams, which places a high demand on the console's bandwidth; and
2) there is no guarantee that the individual stream pictures and the synthesized live content picture displayed by the console are consistent with each other.
The user picture of the present disclosure combines the synthesized output picture (main picture) and the currently required input stream pictures into one video frame, so the front-end broadcast client only needs to pull one user picture stream to achieve the function of a conventional broadcast console. In this way, on the one hand, the network bandwidth of the broadcast client is saved; on the other hand, all input streams are acquired and synthesized in the cloud server, which ensures the synchronization of all stream pictures.
Through the descriptions of the aforementioned embodiments, those skilled in the art may clearly understand that the embodiments may be implemented by means of software in conjunction with an essential common hardware platform, or may simply be implemented by hardware. Based on such understanding, the essential part of the aforementioned technical solutions, or the part that contributes to the prior art, may be embodied in the form of software products. The software products may be stored in computer-readable storage media, such as ROM/RAM, magnetic disks, and optical disks, and may include a plurality of instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in various embodiments or parts of the embodiments.
The foregoing are merely certain preferred embodiments of the present disclosure, and are not intended to limit the present disclosure. Without departing from the spirit and principles of the present disclosure, any modifications, equivalent substitutions, and improvements, etc. shall fall within the scope of the present disclosure.
Number | Date | Country | Kind
---|---|---|---
201810179713.7 | Mar 2018 | CN | national
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/CN2018/081554 | 4/2/2018 | WO | 00