This application claims benefit of priority to Korean Patent Application No. 10-2023-0021021, filed on Feb. 16, 2023, the entire contents of which are incorporated herein by reference.
The present invention relates to a method for wireless Wi-Fi transmission of digital sound with sound synchronization technology applied, and more particularly, to a method of transmitting a digital audio signal, such as audio of a digital cinema content standard, as well as audio of a microphone or other analog source, via a wireless network by loading a synchronization signal on the audio signals and transmitting them via Wi-Fi so that video and sound may be synchronized by utilizing the loaded synchronization signal, thereby realizing video and sound in various environments more conveniently and easily, without restrictions such as cable connection.
In a conventional drive-in theater, a ticket office is installed at the entrance of a wide empty lot and a large screen is installed at the front of the lot; a car is then parked in front of the screen so that the video projected on the screen may be watched from inside the car, while the audio is transmitted over FM radio so that the car stereo may receive and play it.
However, since the FM transmission structure of the projection system of the conventional drive-in theater does not satisfy the radio station installation and operation rules of the current Radio Act and its Enforcement Decree, and does not correspond to a wireless station that may be opened without reporting, legal wireless sound transmission in a drive-in theater is virtually non-existent.
Accordingly, in a situation where the need for non-face-to-face and non-contact means is emerging, a realistically safe method of attracting and gathering customers is required, so demand for drive-in theaters is increasing. Therefore, it is necessary to propose a method for transmitting a sound signal that allows a drive-in theater to operate legally, replacing the existing FM broadcast method.
However, when transmitting a digital sound signal over a wireless network using Wi-Fi, the audio transmission interval varies depending on the network environment or conditions, and due to this variability, the video and audio may fall out of synchronization. Moreover, when the size per packet of the wirelessly transmitted audio is too small, the audio quality is degraded, and when it is too large, it becomes difficult to cope with transmission errors. The detailed causes include a case where the network rate slows down due to environmental or other reasons, a case where a packet loss occurs due to network conditions, a case where a buffer underrun occurs, and the like. In addition, when sound is wirelessly transmitted via Wi-Fi alongside the existing wireless transmission of sound through FM, a time difference between the two sounds may occur due to the difference in signal processing between digital and analog.
Accordingly, there is a growing need for technology that solves the problem of audio falling out of synchronization, so that video may be viewed smoothly even when the audio is transmitted wirelessly.
Meanwhile, the above-described background art is technical information retained by the inventor to derive the present disclosure or acquired by the inventor while deriving the present disclosure, and thus should not be construed as art that was publicly known prior to the filing date of the present disclosure.
One aspect of the present invention provides a technology for synchronizing video and sound when converting and streaming, without modulation, a sound signal of digital content according to the international standard for digital cinema content (Digital Cinema Initiatives (DCI)) so that a mobile device may receive the sound signal of a digital content layer through Wi-Fi communication. More specifically, provided is a method for wireless Wi-Fi transmission of digital sound with sound synchronization technology applied, capable of maintaining synchronization of the video and audio in an optimal state by causing a streaming server to transmit specific information data to a receiving device at every packet transmission and an audio play app of the receiving device to analyze the information data of the packet data being played and of the received packet data and to respond immediately when synchronization is lost, of simultaneously supporting a digital output and a general analog output of the sound signal so that the method may also be used in a wireless transmission environment through existing FM, and of correcting the time difference between analog and digital so that a time difference between the digital type and the analog type does not occur.
Technical problems of the present disclosure are not limited to the above-mentioned objects. That is, other technical problems that are not mentioned may be obviously understood by those skilled in the art from the following description.
In an aspect of the present disclosure, a system for wireless Wi-Fi transmission of digital sound with sound synchronization technology applied includes: a digital projection device that stores and outputs digital content; a streaming server that is connected to the digital projection device and transmits digital cinema content received from the digital projection device to an access point; and a mobile device that receives a sound signal of the digital cinema content transmitted from the streaming server through the access point, and is connected to a sound output device through wired or wireless communication so that the sound signal of the digital cinema content is output through the sound output device.
The streaming server may simultaneously support digital output and general analog output of a sound signal so that they may be used in a wireless transmission environment through existing FM, and at this time, store the analog audio signal in a buffer until a time when a digitally converted audio signal is output according to a program recorded in a built-in digital signal processor (DSP) so that a time difference between the digital and analog types does not occur.
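The analog-path buffering described above can be pictured as a simple delay line: analog samples are held in a buffer for as long as the digital path takes to convert and output, so both outputs emerge together. The following Python sketch is illustrative only; the latency value and the sample representation are assumptions, not part of the disclosed DSP program.

```python
from collections import deque

def make_analog_delay(latency_samples):
    """Delay line that holds analog samples until the digitally converted
    signal is ready, so the two outputs stay time-aligned.
    `latency_samples` stands in for the digital path's processing latency
    (a hypothetical figure for illustration)."""
    buffer = deque([0.0] * latency_samples, maxlen=latency_samples + 1)

    def process(sample):
        buffer.append(sample)
        return buffer.popleft()  # emit the sample delayed by `latency_samples`

    return process

delay = make_analog_delay(3)
out = [delay(s) for s in [1.0, 2.0, 3.0, 4.0, 5.0]]
# out == [0.0, 0.0, 0.0, 1.0, 2.0]: the first outputs are zero padding,
# after which the analog input follows, delayed to match the digital path
```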
The streaming server may convert an audio signal of the digital cinema content into packets in units of predetermined frames, transmit the converted packets to the access point a predetermined time ahead of a video signal of the digital cinema content, and transmit each packet to the access point together with a packet number assigned to that packet.
The mobile device may receive the packets through the access point, record the received packets in a buffer, and output the sound signal in synchronization with the video signal by utilizing the packet numbers.
The detailed description of the present invention set forth below refers to the accompanying drawings, which show by way of illustration specific embodiments in which the invention may be practiced. These embodiments will be described in detail for those skilled in the art in order to practice the present invention. It should be appreciated that various exemplary embodiments of the present invention are different from each other, but do not have to be exclusive. For example, specific shapes, structures, and characteristics described in the present specification may be implemented in another exemplary embodiment without departing from the spirit and the scope of the present invention in connection with an exemplary embodiment. In addition, it should be understood that a position or an arrangement of individual components in each disclosed exemplary embodiment may be changed without departing from the spirit and the scope of the present invention. Therefore, a detailed description described below should not be construed as being restrictive. In addition, the scope of the present invention is defined only by the accompanying claims and their equivalents if appropriate. Similar reference numerals will be used to describe the same or similar functions throughout the accompanying drawings.
Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the accompanying drawings.
In the case of using the wireless digital transmission system of sound using Wi-Fi disclosed in Korean Patent Publication No. 10-2252541, which is a prior application of the present applicant, to output the sound of a specific video, such as in a real drive-in theater, the system for wireless Wi-Fi transmission of digital sound with sound synchronization technology applied according to the present invention solves the problem that the speed of the audio gradually becomes slower or faster, thereby maintaining the synchronization of video and audio in an optimal state when receiving and outputting audio from a real-time streaming server via a wireless network on a device such as a smartphone.
Referring to
The digital projection device 300 may include a server device in the form of a box having a plurality of input/output terminals, and may store and output digital cinema content. For example, the digital projection device 300 is a general digital projection system, and may project a digitally recorded video signal onto a screen and simultaneously reproduce a recorded sound signal.
The streaming server 100 is connected to the digital projection device 300 and converts and outputs digital cinema content received from the digital projection device 300.
The streaming server 100 may be in the form of a computer, a server, or an engine, and may be called other terms such as a device, an apparatus, a terminal, user equipment (UE), a mobile station (MS), a mobile terminal (MT), a user terminal (UT), a subscriber station (SS), a wireless device, personal digital assistant (PDA), a wireless modem, a handheld device, etc.
The streaming server 100 may adopt various types of operating systems, and may load and execute a program for converting a sound signal according to the wireless digital transmission method of sound using Wi-Fi according to an embodiment of the present invention. For example, as the operating system, Linux, which is efficient in terms of memory usage while processing sound signals in various formats, including the Digital Cinema Initiatives (DCI) format for digital cinema content, may be applied, but the operating system is not limited thereto, and Windows and macOS may also be applied.
The mobile device 200 is a device capable of communication and input/output of information, and may be, for example, a smartphone, a tablet PC, etc.
The mobile device 200 may load and execute an application for receiving and reproducing a sound signal, and may receive a sound signal streamed from the streaming server 100 using the corresponding application. In this case, the mobile device 200 may receive the sound signal streamed from the streaming server 100 through a Wi-Fi communication network, and it is preferable that a version of 802.11ac or higher is applied to the Wi-Fi communication network. This is to prevent the data transmission speed from being lowered even when a plurality of mobile devices 200 are simultaneously connected.
The mobile device 200 is connected to a sound output device through wired or wireless communication so that a sound signal of cinema content is output through the sound output device.
Here, the sound output device may be a speaker provided inside a vehicle, but is not limited thereto, and the type of the sound output device, such as a speaker provided in another device or a speaker installed in a movable form, is not limited thereto.
The system for synchronizing video and sound in wireless digital transmission of sound using Wi-Fi according to the embodiment of the present invention is applied to a drive-in theater system, and video signals of digital cinema content are output to a large screen provided outside the vehicle, and the sound signal of the digital content is output from a sound output device installed inside the vehicle or another sound output device provided in the drive-in theater system.
Here, by minimizing data processing steps through digital signal processing without modulation of a digital sound signal into an analog form, it is possible to minimize sound data loss or distortion, promote a sound quality improvement effect, and utilize a sound field effect.
In addition, by transmitting a sound signal through a Wi-Fi communication network to replace the existing FM communication method that violates the Enforcement Decree of the Radio Waves Act, it is possible to implement a legal drive-in theater operation, and furthermore, to prevent indiscriminate waste of radio wave resources.
Hereinafter, the method for wireless digital transmission of sound using Wi-Fi according to the embodiment of the present invention will be described in detail.
Referring to
The outputting, by the digital projection device 300, the digital cinema content according to the DCI (S10) may include outputting, by the digital projection device 300, a video signal and a sound signal of the digital cinema content.
Here, the video signal of the digital cinema content may be projected onto a large screen provided in the drive-in theater.
Also, the sound signal of the digital cinema content may be input to the streaming server 100 through a cable connected to an output terminal of the digital projection device 300. In this case, the sound signal of the digital cinema content may adopt a DCI sound format. As illustrated in
Referring to
Referring to
Referring to
The streaming server 100 may be connected to the digital projection device 300 through a connection cable and receive the sound signal output from the digital projection device 300. In this case, the connection cable may be connected to the output terminal of the digital projection device 300 as described above, and it is recommended to adopt an AES/EBU method to prevent distortion or conversion of a 16-channel sound signal according to the DCI sound format.
The streaming server 100 may convert the sound signal received from the digital projection device 300 into a form capable of being output by the mobile device 200 through a channel coupling method. That is, the streaming server 100 may convert the 16-channel sound signal according to the DCI sound format into a 2-channel sound signal. Table 1 below is an example of a channel coupling map used by the streaming server 100 for this conversion.
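As an illustration of the channel coupling described above, the following Python sketch folds one 16-channel frame into a stereo pair. The gain values and channel assignments are assumptions for illustration only; the actual mapping of Table 1 is not reproduced here.

```python
# Hypothetical channel-coupling map: DCI channel index -> (left gain, right gain).
# These gains are illustrative assumptions, not the disclosed Table 1.
COUPLING_MAP = {
    0: (1.0, 0.0),   # assumed Left channel
    1: (0.0, 1.0),   # assumed Right channel
    2: (0.7, 0.7),   # assumed Center, split to both sides
    3: (0.5, 0.5),   # assumed LFE, attenuated into both sides
    # ...the remaining channels of the 16-channel DCI layout would follow
}

def downmix(frame_16ch):
    """Fold one frame (one sample per each of 16 channels) into (L, R)."""
    left = sum(gl * frame_16ch[ch] for ch, (gl, _) in COUPLING_MAP.items())
    right = sum(gr * frame_16ch[ch] for ch, (_, gr) in COUPLING_MAP.items())
    return left, right
```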
The streaming server 100 may encode the converted 2-channel sound signal into a form capable of real-time streaming.
In this case, it is preferable that the streaming server 100 encodes the sound signal in the OPUS format. The OPUS codec has a fast encoding speed and shows little difference in quality according to bitrate, and when compared to WAV, MP3, AAC, Real Audio, etc., it provides the best sound quality.
The streaming server 100 may transmit the encoded sound signal to the mobile device 200. In this case, the streaming server 100 may stream the sound signal using the Real-time Transport Protocol (RTP).
The Real-time Transport Protocol shows excellent performance in synchronization, with a maximum delay of 80 ms, and since it is only responsible for the transmission of streaming data, one-way transmission from a server to a client is possible; the resource consumption required for signal processing is therefore low, and the delay may be stably managed.
Here, the streaming server 100 may synchronize the video signal projected on the screen and the sound signal transmitted to the mobile device 200. A detailed synchronization method of the sound signal will be described later.
The receiving, by the mobile device 200, the sound signal from the streaming server 100 and outputting the sound signal through the sound output device provided in the vehicle 30 or the drive-in theater system may include receiving and outputting, by the mobile device 200, a sound signal streamed from the streaming server 100 through a Wi-Fi communication network.
The mobile device 200 may load and execute the application for receiving and reproducing a sound signal, and may receive the sound signal streamed from the streaming server 100 using the corresponding application. In this case, the mobile device 200 may receive the sound signal streamed from the streaming server 100 through the Wi-Fi communication network, and it is preferable that a version of 802.11ac or higher is applied to the Wi-Fi communication network. This is to prevent the data transmission speed from being lowered even when a plurality of mobile devices 200 are simultaneously connected.
The mobile device 200 may implement a codec for reproducing the sound signal in the OPUS form using the application for receiving and reproducing the sound signal, and may be connected to a sound output device provided inside the vehicle 30 or another sound output device provided in the drive-in theater system through wired or wireless communication such as aux, USB, or Bluetooth, thereby reproducing a sound signal through a speaker provided in the vehicle 30.
Meanwhile, as described above, the system for synchronizing video and sound in wireless digital transmission of sound using Wi-Fi according to the present invention may synchronize sound signals and video signals in various ways.
In one embodiment, the streaming server 100 may perform flow control on a time code of the sound signal output from the digital projection device 300 to confirm whether there is a synchronization difference between the video signal projected on the screen and the sound signal transmitted to the mobile device 200. For example, the streaming server 100 may monitor the degree of change in the packet reception interval of the sound signal transmitted to the mobile device 200 using the RTP Control Protocol (RTCP), and when the packet reception interval is confirmed to be greater than or equal to a threshold value, it may be determined that a synchronization difference between the video signal projected on the screen and the sound signal transmitted to the mobile device 200 has occurred.
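The interval-monitoring check described above can be sketched as follows. The 40 ms threshold is an assumed value, and a real implementation would draw its timing data from RTCP reception reports rather than a plain list of arrival times.

```python
def sync_drift_detected(arrival_times_ms, threshold_ms=40.0):
    """Treat any packet reception interval at or above `threshold_ms`
    as evidence of a synchronization difference.
    Packets nominally arrive every 20 ms; the threshold is an assumption."""
    intervals = [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]
    return any(iv >= threshold_ms for iv in intervals)
```

For example, arrivals at 0, 20, 40, 60 ms raise no flag, while a 45 ms gap does.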
When the streaming server 100 confirms that there is a synchronization difference between the video signal projected on the screen and the sound signal transmitted to the mobile device 200, the synchronization may be adjusted by streaming the sound signal based on the time code of the video signal.
In one embodiment, the streaming server 100 transmits specific information data to the receiving device at every packet transmission, and the audio play app of the receiving device 200 may analyze the information data of the packet data being played and of the received packet data and respond immediately when the synchronization is lost, thereby maintaining the synchronization of the video and audio in an optimal state.
First, as illustrated in
On the other hand, as illustrated in
In order to solve this problem, the system according to the present invention records the received packets in the device memory in the form of a loop array as illustrated in
The mobile device starts to play sound when the sum of the received packets recorded in the form of a loop array fills up the audio buffer of the memory (S300).
Since the size of the audio buffer differs depending on the device and the performance of the access point also differs, a sound output timing control function may be implemented in the application so that a user may synchronize the video and the sound of the mobile device by checking the first video and sound of the mobile device (the user synchronizes the video and sound once at first, S400).
From the time when the sound is implemented, the audio playback time and the currently implementing packet location are recorded (S500).
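Steps S200 through S500 above can be sketched as a small receiver: received packets are recorded in a loop (ring) array in device memory, and playback starts once enough packets have accumulated to fill the audio buffer. The slot count and audio-buffer size below are illustrative assumptions.

```python
class LoopArrayReceiver:
    """Minimal sketch of the loop-array packet recording described above.
    All sizes are illustrative assumptions, not disclosed values."""

    def __init__(self, slots=64, audio_buffer_packets=8):
        self.slots = [None] * slots              # loop (ring) array in memory
        self.audio_buffer_packets = audio_buffer_packets
        self.received = 0                        # total packets received
        self.playing = False

    def on_packet(self, seq, payload):
        # Record the packet in the loop array, overwriting the oldest slot.
        self.slots[seq % len(self.slots)] = payload
        self.received += 1
        # S300: start playback once the received packets fill the audio buffer.
        if not self.playing and self.received >= self.audio_buffer_packets:
            self.playing = True
```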
In the process of synchronizing video and sound in wireless digital transmission of sound using Wi-Fi according to the present invention described above, in most cases where the synchronization of video and audio is out of sync, the optimal synchronization technology is implemented according to the situation in each part of the system to solve the problem.
Hereinafter, a detailed algorithm of the synchronization adjustment method for each situation will be described.
1. A Method for Adjusting a Synchronization Error According to a Network Speed Difference
In the step S100 described above, packets are transmitted at intervals of 20 ms based on 960 frames, and the audio playback time is confirmed as in ⑤. Therefore, it is possible to calculate: the audio playback time/20 ms = the number of packets required for audio transmission (e.g., 1 second = 1,000 ms = 50 packets).
While the streaming is in progress, the device application compares the required number of packets in ① above with the total number of received packets in real time.
In this case, 1-1) when the number of required packets is greater than the total number of received packets (the number of required packets > the total number of received packets), it is determined that transmitted packets have not yet been received due to a network speed delay. In this case, the application of the smart device refreshes the network of the device (reconnects after termination). At this time, the total number of packets calculated in step S200 described above is increased by a certain number (for example, +1), and the packet location measured in step S500 described above is also adjusted by the corresponding number of packets.
On the other hand, 1-2) when the number of required packets is less than the total number of received packets (the number of required packets < the total number of received packets), it is determined that packets congested due to the network speed delay have rushed in at once and been received. In this case, the application of the smart device refreshes the network of the device (reconnects after termination). At this time, the total number of packets calculated in step S200 described above is decreased by a certain number (for example, −1), and the packet location measured in step S500 described above is also adjusted by the corresponding number of packets.
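The comparison in cases 1-1) and 1-2) above can be sketched as follows, using the disclosed arithmetic (one 960-frame packet per 20 ms of audio). The return values are a simplified stand-in for the refresh-and-adjust behavior described above.

```python
PACKET_MS = 20  # one packet carries 960 frames = 20 ms of audio

def check_network_sync(playback_ms, total_received):
    """Compare the number of packets required for the elapsed playback time
    with the total number received, as in cases 1-1) and 1-2)."""
    required = playback_ms // PACKET_MS      # e.g. 1,000 ms -> 50 packets
    if required > total_received:
        # 1-1) packets delayed in the network: refresh, adjust count by +1
        return "refresh", 1
    if required < total_received:
        # 1-2) congested packets rushed in at once: refresh, adjust by -1
        return "refresh", -1
    return "ok", 0
```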
2. A Method for Adjusting Synchronization When a Packet Loss Occurs Due to Network Conditions
When the sequence number added to the packet header transmitted from the streaming server in step S200 described above and the sequence number confirmed in the application of the mobile device are not consecutive, the smart device determines this as a packet loss.
In this case, the mobile application replaces the lost packet with the next packet, outputs the sound, and refreshes the network of the device (reconnects after termination).
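The sequence-number check and packet substitution described above can be sketched as follows; the tuple representation of packets is an illustrative assumption.

```python
def conceal_losses(packets):
    """Detect non-consecutive sequence numbers and substitute the next
    received packet for each lost one.
    `packets` is a list of (sequence_number, payload) tuples in arrival order."""
    out, lost = [], []
    expected = packets[0][0]
    for seq, payload in packets:
        while expected < seq:          # gap in sequence numbers: packet lost
            lost.append(expected)
            out.append(payload)        # replace the lost packet with the next one
            expected += 1
        out.append(payload)
        expected = seq + 1
    return out, lost
```

For example, receiving sequence numbers 1, 2, 4 plays the payload of packet 4 twice in place of the lost packet 3.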
3. When Synchronization is Out of Sync Due to a Buffer Underrun Caused by Continuous Network Delay
A buffer underrun is a case where there is not enough packet data left in the audio buffer because the size of the recorded packet buffer is smaller than the size of the audio buffer; when such a buffer underrun occurs, many mobile devices try to keep as many packets in the buffer as possible by adjusting the output speed, so problems such as sound delay occur.
In order to prevent the above problem, the mobile device records the size of the accumulated buffer data transmitted to the audio buffer from the memory where the packets are received, records the size of the audio data actually output and played from the audio buffer, and subtracts the size of the played audio data from the size of the accumulated buffer data to calculate the size of the buffer waiting for audio transmission.
The mobile device may compare the calculated size of the standby buffer with the minimum buffer size required to play the sound, and when the size of the standby buffer is smaller than the minimum buffer size, record the same packet as the currently playing received packet once more, thereby preventing the buffer underrun.
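The standby-buffer calculation and the repeat-packet safeguard can be sketched as follows; the byte units and the minimum-buffer threshold are illustrative assumptions, as the actual values are device dependent.

```python
def standby_buffer_size(accumulated_bytes, played_bytes):
    """Audio waiting to be played: data handed to the audio buffer
    minus data already output (the subtraction described above)."""
    return accumulated_bytes - played_bytes

def next_packet(current_packet, accumulated_bytes, played_bytes,
                min_buffer_bytes):
    """When the standby buffer falls below the minimum required to keep
    playing, record the currently playing packet once more to stave off
    a buffer underrun; otherwise proceed normally."""
    if standby_buffer_size(accumulated_bytes, played_bytes) < min_buffer_bytes:
        return current_packet          # repeat the same packet once more
    return None                        # no underrun risk; play on as usual
```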
In some other embodiments, the method for synchronizing video and sound in wireless digital transmission of sound using Wi-Fi according to another embodiment of the present invention may further include receiving and analyzing, by the streaming server 100, the evaluation of digital cinema content from the mobile device 200.
The streaming server 100 may request and receive an evaluation of digital cinema content through an application for receiving and reproducing sound signals executed in the mobile device 200. For example, the evaluation items may include input of comments on contents of the digital cinema content, input of comments on projection of the digital cinema contents, and the like, and evaluation result comments in the form of text for each item may be input.
The streaming server 100 may calculate the rating of the digital cinema content by analyzing the evaluation result text for the digital cinema content received from the mobile device 200.
For example, the streaming server 100 may build an artificial neural network that extracts contextual information from input data.
Here, the input data may be evaluation result text for digital cinema content.
The streaming server 100 may accumulate and store the evaluation result text for the digital cinema content received from the mobile device 200, and extract the stored evaluation result text for the digital cinema content as training data.
The streaming server 100 may build a neural network that extracts contextual information about the input data by learning the training data using the Word2Vec algorithm.
The Word2Vec algorithm may include a neural network language model (NNLM). The neural network language model is basically a neural network including an input layer, a projection layer, a hidden layer, and an output layer, and is used to vectorize words. Since the neural network language model is a well-known technology, a detailed description thereof will be omitted.
The Word2vec algorithm is used for text mining and determines proximity by looking at the relationship between the words before and after each word. The Word2vec algorithm is an unsupervised learning algorithm. As the name suggests, the Word2vec algorithm is a technique that expresses the meaning of a word in a vector form. The Word2vec algorithm may represent each word as a vector in a space of about 200 dimensions. Using the Word2vec algorithm, a vector corresponding to each word may be obtained.
The Word2vec algorithm may dramatically improve precision in the field of natural language processing compared to other conventional algorithms. The Word2vec may train the meaning of words by using the relationship between words in sentences of an input corpus and adjacent words. The Word2vec algorithm is based on an artificial neural network and starts from the premise that words with the same context have close meanings. The Word2vec algorithm performs training through text documents, and allows the artificial neural network to train, as related words, other words that appear nearby (about 5 to 10 words before and after) a word. Since words with related meanings are highly likely to appear close to each other in the document, two words may gradually have close vectors in the process of repeating learning.
The learning method of the Word2vec algorithm includes a continuous bag of words (CBOW) method and a skip-gram method. The CBOW method predicts a target word using the context created by surrounding words. The skip-gram method predicts words that may come around based on one word. For large datasets, the skip-gram method is known to be more accurate.
Therefore, in the embodiment of the present invention, the Word2vec algorithm using the skip-gram method is used. For example, when the training is successfully completed through the Word2vec algorithm, similar words in a high-dimensional space may be located nearby. According to the above-described Word2vec algorithm, the calculated vector value may be similar to a word having a closer distribution of neighboring words in a learning document, and words having similar calculated vector values may be regarded as similar. Since the Word2vec algorithm is a well-known technology, a detailed description of vector value calculation will be omitted.
The streaming server 100 may divide the rating level into a plurality of levels, and set evaluation criterion text corresponding to each rating level. For example, the streaming server 100 may acquire the evaluation criterion text for each rating level by accessing an external server to which ratings have been assigned by experts according to evaluation results of digital cinema content.
The streaming server 100 may input the evaluation criterion text for each rating level to the neural network, and extract reference vector values for each rating level representing context information about the evaluation criterion text for that rating level.
The streaming server 100 may extract an evaluation result vector value representing context information by inputting the evaluation result text for the digital cinema content received from the mobile device 200 to the neural network.
The streaming server 100 may calculate a similarity between the evaluation result vector value and a plurality of reference vector values, and extract a reference vector value having the highest similarity with the evaluation result vector value among the plurality of reference vector values. In this case, as a similarity calculation method, a Euclidean distance, a cosine similarity, a Tanimoto coefficient, and the like may be adopted.
The streaming server 100 may calculate the rating level corresponding to the reference vector value having the highest similarity with the evaluation result vector value as the rating of the currently playing digital cinema content.
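The similarity comparison described above can be sketched with cosine similarity (one of the measures named). The two-dimensional vectors in the example are toy stand-ins for the vectors a Word2vec model would actually produce.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rate(evaluation_vec, reference_vecs):
    """Pick the rating level whose reference vector has the highest cosine
    similarity with the evaluation result vector.
    `reference_vecs` maps rating level -> reference vector; in the system
    described above these would come from the Word2vec neural network."""
    return max(reference_vecs,
               key=lambda level: cosine(evaluation_vec, reference_vecs[level]))
```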
The streaming server 100 may match and store the calculated ratings of the digital cinema content, and may guide ratings for each digital cinema content through an application running on the mobile device 200.
In some other embodiments, the streaming server 100 extracts keywords from the evaluation result text (e.g., review data) for the digital cinema content received from the mobile device 200, and classifies a plurality of text data for each evaluation item as a result of comparing the extracted keywords with a pre-trained keyword dictionary.
The streaming server 100 calculates continuity scores between keywords extracted from text data classified into the same category, and extracts text data having a continuity score higher than a preset reference score as representative text data.
In this case, the continuity score is calculated according to Equation 1 below.
Here, Cn denotes a continuity score of nth text data, Vcn denotes an embedding vector of a keyword extracted from the nth text data, Vcf denotes an embedding vector for a word immediately preceding the keyword extracted from the nth text data, and Vcb denotes an embedding vector for the word immediately after the keyword extracted from the nth text data.
For example, when the nth text data is review data composed of words A, B, C, D, and E, and word C among the words is extracted as a keyword, Vcn may be determined as an embedding vector for the word C, and Vcf may be determined as an embedding vector for the word B, and Vcb may be determined as an embedding vector for the word D.
Thereafter, the streaming server 100 calculates an individual evaluation score for each extracted representative text data, and calculates an average value for a plurality of individual evaluation scores as a final evaluation score of the digital cinema content.
In this case, the individual evaluation score is calculated according to Equation 2 below.
Here, ri denotes an individual evaluation score of an i-th representative text data, Vj denotes an embedding vector for a j-th word of the i-th representative text data, Vzj denotes an average value of embedding vectors for words other than the j-th word of the i-th representative text data, and aj denotes a weight value for the j-th word of the i-th representative text data.
In this way, the streaming server 100 may improve the reliability of the evaluation result by calculating the evaluation score of the digital cinema content through big data analysis and the above Equation based on the text data collected for each digital cinema content.
As described above, the technology according to the present invention may be implemented as an application or implemented in the form of program instructions that may be executed through various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include a program command, a data file, a data structure, or the like, alone or in combination.
The program commands recorded in the computer-readable recording medium may be specially designed and constructed for the present invention or may be known to those skilled in the field of computer software.
Examples of the computer-readable recording medium may include a magnetic medium such as a hard disk, a floppy disk, or a magnetic tape, an optical recording medium such as a compact disk read only memory (CD-ROM) or a digital versatile disk (DVD), a magneto-optical medium such as a floptical disk, and a hardware device specially configured to store and execute program commands, such as a read only memory (ROM), a random access memory (RAM), a flash memory, or the like.
Examples of the program commands include a high-level language code capable of being executed by a computer using an interpreter, or the like, as well as a machine language code made by a compiler. The above-described hardware device may be constituted to be operated as one or more software modules to perform processing according to the present disclosure, and vice versa.
Although the embodiments of the present invention have been disclosed hereinabove, it may be understood by those skilled in the art that the present invention may be variously modified and altered without departing from the scope and spirit of the present invention described in the following claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2023-0021021 | Feb 2023 | KR | national |