One aspect of the present invention relates to a medium processing device, a medium processing method, and a medium processing program.
In recent years, a video/audio reproduction device that digitizes a video/audio obtained by capturing and recording at a certain point, transmits the digitized video/audio to a remote location in real time via a communication line such as an Internet Protocol (IP) network, and reproduces the video/audio in the remote location has been used. For example, public viewing or the like in which a video/audio of a sports competition held at a competition site or a video/audio of a music concert held at a concert site are transmitted to a remote location in real time is actively performed. Such video/audio transmission is not limited to one-to-one unidirectional transmission. Bidirectional transmission is also performed in which a video/audio is transmitted from a site where a sports competition is held (hereinafter, referred to as an event site) to a plurality of remote locations, a video of an audience who enjoys the event and an audio of cheers and the like are obtained by capturing and recording in each of the plurality of remote locations, the video/audio is transmitted to the event site or another remote location, and the video/audio are output from a large video display device or a speaker at each base.
By such bidirectional transmission of a video/audio, players (or performers) and an audience in an event site, and viewers in a plurality of remote locations can obtain a realistic feeling and a sense of unity as if they were in the same space (event site) and having the same experience even though they were physically located away from each other.
A real-time transport protocol (RTP) is often used for real-time transmission of a video/audio over an IP network, but the data transmission time between two bases varies depending on the communication line or the like connecting the two bases. For example, consider a case in which a video/audio obtained by capturing and recording at a time T at an event site A is transmitted to two remote locations B and C, and a video/audio obtained by capturing and recording in each of the remote locations B and C is returned and transmitted to the event site A. The video/audio obtained by capturing and recording at the time T and transmitted from the event site A is reproduced at a time Tb1 in the remote location B, and a video/audio obtained by capturing and recording at the time Tb1 in the remote location B is returned and transmitted to the event site A and reproduced at a time Tb2 at the event site A. At this time, the video/audio obtained by capturing and recording at the time T and transmitted from the event site A may be reproduced at a time Tc1 (≠Tb1) in the remote location C, and a video/audio obtained by capturing and recording at the time Tc1 in the remote location C may be returned and transmitted to the event site A and reproduced at a time Tc2 (≠Tb2) at the event site A.
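The timing relationship above can be illustrated with a small worked example. This is only a sketch under stated assumptions: the function name `return_times` and all delay values are hypothetical, times are in milliseconds, and capture, encoding, and presentation latencies are ignored so that only the one-way transmission times contribute.

```python
def return_times(t, delay_out_ms, delay_back_ms):
    """Return (time reproduced at the remote location, time the returned
    video/audio is reproduced at the event site A).

    Assumption: only the one-way network delays are modelled."""
    t_remote = t + delay_out_ms          # Tb1 or Tc1
    t_return = t_remote + delay_back_ms  # Tb2 or Tc2
    return t_remote, t_return


# Hypothetical one-way delays for the remote locations B and C.
T = 0
tb1, tb2 = return_times(T, delay_out_ms=80, delay_back_ms=90)
tc1, tc2 = return_times(T, delay_out_ms=200, delay_back_ms=250)

# Reactions to the same moment T arrive at the event site A at different times.
spread_ms = tc2 - tb2
```

With these assumed delays, the reactions from B and C to the same moment T are reproduced at the event site A 280 ms apart.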
In such a case, players (or performers) and an audience in the event site A view, at different times (time Tb2 and time Tc2), videos/audios indicating how viewers in the plurality of remote locations have reacted to an event that they themselves experienced at the time T. For the players (or performers) and the audience in the event site A, enhancing a sense of unity with the audience in the remote locations may be difficult because a lack of intuitive comprehension or an unnatural connection with their own experience arises. Furthermore, also when the video/audio transmitted from the event site A and the video/audio transmitted from the remote location B are reproduced in the remote location C, the audience in the remote location C may feel the above-described lack of intuitive comprehension or unnaturalness.
In order to eliminate such lack of intuitive comprehension or unnaturalness, conventionally, a method of synchronously reproducing a plurality of videos/audios transmitted from a plurality of remote locations in the event site A is used. In a case where reproduction timings of videos/audios are synchronized, time synchronization is performed using a network time protocol (NTP), a precision time protocol (PTP), or the like so that both the transmission side and the reception side manage the same time information, and video/audio data is packetized into RTP packets at the time of transmission. At this time, in general, an absolute time of the moment at which the videos/audios are sampled is provided as an RTP time stamp, and the reception side delays at least one of the videos and audios on the basis of the time information to adjust timings, and synchronizes the videos/audios (Non Patent Literature 1).
Non Patent Literature 1: Synchronization for Acoustic Signals over IP Network (Tokumoto, Ikedo, Kaneko, Kataoka, the transactions of the Institute of Electronics, Information and Communication Engineers D-II Vol. J87-D-II No. 9 pp. 1870-1883)
However, in the conventional video/audio reproduction synchronization method, the reproduction timing is matched to the video or audio having the largest delay time. As a result, the real-time property of the reproduction timings of videos/audios is lost, and the discomfort felt by viewers is difficult to reduce. That is, reproduction of videos/audios needs to be devised so that the above-described discomfort felt by viewers when a plurality of videos/audios transmitted from a plurality of bases is reproduced at different times is reduced.
The present invention has been made in view of the above circumstances, and an object of the present invention is to provide a technology of reducing discomfort felt by viewers when a plurality of videos/audios transmitted from a plurality of bases at different times is reproduced.
In an embodiment of the present invention, a medium processing device is a device of a first base, including a reception unit that receives a packet that stores a second medium acquired in a second base at a time at which a first medium acquired at a first time in the first base is reproduced in the second base, and a processing unit that generates a third medium from the second medium according to a processing mode based on a second time associated with reception of a packet that stores the second medium and the first time, and outputs the third medium to a presentation device.
According to one aspect of the present invention, discomfort felt by viewers when a plurality of videos/audios transmitted from a plurality of bases at different times is reproduced can be reduced.
Hereinafter, some embodiments according to the present invention will be described with reference to the drawings.
Time information uniquely determined with respect to an absolute time at which a video/audio is obtained by capturing and recording in a base O serving as an event site such as a competition site or a concert site is provided to a video/audio transmitted to bases R1 to Rn (n is an integer of 2 or more) in a plurality of remote locations. In each of the bases R1 to Rn, a video/audio obtained by capturing and recording at a time at which a video/audio having the time information is reproduced is associated with the time information. When reproducing a video/audio transmitted from each of the bases R1 to Rn in the base O, processing is performed on the video/audio on the basis of the time information and the reproduction is performed.
The time information is transmitted and received between the base O and each of the bases R1 to Rn by any one of the following means. The time information is associated with a video/audio obtained by capturing and recording in each of the bases R1 to Rn.
(1) The time information is stored in a header extension area of an RTP packet transmitted and received between the base O and each of the bases R1 to Rn. For example, the time information is in an absolute time format (hh:mm:ss.fff format), but may be in a millisecond format.
(2) The time information is described using an application-defined (APP) packet in an RTP control protocol (RTCP) transmitted and received at constant intervals between the base O and each of the bases R1 to Rn. In this example, the time information is in a millisecond format.
(3) The time information is stored in a session description protocol (SDP) that describes initial value parameters exchanged between the base O and each of the bases R1 to Rn at the start of transmission. In this example, the time information is in a millisecond format.
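The two time formats mentioned in (1) to (3) can be converted into each other. The following is a minimal sketch, assuming that the millisecond format counts milliseconds from midnight of the same day; that interpretation and the function names `hms_to_ms` and `ms_to_hms` are assumptions made for illustration, not part of the described embodiments.

```python
def hms_to_ms(text: str) -> int:
    """Convert an absolute time in 'hh:mm:ss.fff' format to milliseconds
    since midnight (assumed meaning of the millisecond format)."""
    hh, mm, rest = text.split(":")
    ss, fff = rest.split(".")
    return ((int(hh) * 60 + int(mm)) * 60 + int(ss)) * 1000 + int(fff)


def ms_to_hms(ms: int) -> str:
    """Convert milliseconds since midnight back to 'hh:mm:ss.fff'."""
    seconds, fff = divmod(ms, 1000)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}.{fff:03d}"
```

For example, under this assumption the time 12:34:56.789 corresponds to 45296789 milliseconds.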
A first embodiment is an embodiment in which videos/audios transmitted back from the bases R1 to Rn are processed and reproduced in the base O.
Time information used for processing a video/audio is stored in a header extension area of an RTP packet transmitted and received between the base O and each of the bases R1 to Rn. For example, the time information is in an absolute time format (hh:mm:ss.fff format).
Although a video and an audio are described as being packetized into separate RTP packets and then transmitted and received, the present invention is not limited thereto. The video and audio may be processed and managed by the same functional unit/database (DB). Both the video and the audio may be stored in one RTP packet and transmitted and received. The video and audio are examples of a medium.
The medium processing system S includes a plurality of electronic devices included in the base O, a plurality of electronic devices included in each of the bases R1 to Rn, and a time distribution server 10. The electronic devices in each of the bases and the time distribution server 10 can communicate with each other via an IP network.
The base O includes a server 1, an event video capturing device 101, a return video presentation device 102, an event audio recording device 103, and a return audio presentation device 104. The base O is an example of a first base.
The server 1 is an electronic device that controls each of the electronic devices included in the base O. The server 1 is an example of a medium processing device.
The event video capturing device 101 is a device including a camera that captures a video of the base O. The event video capturing device 101 is an example of a video capturing device.
The return video presentation device 102 is a device including a display that reproduces and displays a video returned and transmitted from each of the bases R1 to Rn to the base O. For example, the display is a liquid crystal display. The return video presentation device 102 is an example of a video presentation device or a presentation device.
The event audio recording device 103 is a device including a microphone that records an audio of the base O. The event audio recording device 103 is an example of an audio recording device.
The return audio presentation device 104 is a device including a speaker that reproduces and outputs an audio returned and transmitted from each of the bases R1 to Rn to the base O. The return audio presentation device 104 is an example of an audio presentation device or a presentation device.
A configuration example of the server 1 will be described.
The server 1 includes a control unit 11, a program storage unit 12, a data storage unit 13, a communication interface 14, and an input/output interface 15. The elements included in the server 1 are connected to each other via a bus.
The control unit 11 corresponds to a central part of the server 1. The control unit 11 includes a processor such as a central processing unit (CPU). The control unit 11 includes a read only memory (ROM) as a nonvolatile memory area. The control unit 11 includes a random access memory (RAM) as a volatile memory area. The processor deploys a program stored in the ROM or the program storage unit 12 into the RAM. The processor executes the program deployed in the RAM, whereby the control unit 11 implements each functional unit described below. The control unit 11 is included in a computer.
The program storage unit 12 includes a non-volatile memory capable of writing and reading as needed, such as a hard disk drive (HDD) or a solid state drive (SSD) as a storage medium. The program storage unit 12 stores programs necessary for executing various types of control processing. For example, the program storage unit 12 stores a program for causing the server 1 to execute processing by each functional unit to be described below implemented by the control unit 11. The program storage unit 12 is an example of a storage.
The data storage unit 13 includes a non-volatile memory capable of writing and reading as needed, such as an HDD or an SSD as a storage medium. The data storage unit 13 is an example of a storage or a storage unit.
The communication interface 14 includes various interfaces that communicatively connect the server 1 to other electronic devices using a communication protocol defined by the IP network.
The input/output interface 15 is an interface that enables communication between the server 1 and each of the event video capturing device 101, the return video presentation device 102, the event audio recording device 103, and the return audio presentation device 104. The input/output interface 15 may include an interface for wired communication or an interface for wireless communication.
Note that a hardware configuration of the server 1 is not limited to the above-described configuration. The server 1 can appropriately omit and change the above-described components and add a new component.
The base R1 includes a server 2, a video presentation device 201, an offset video capturing device 202, a return video capturing device 203, an audio presentation device 204, and a return audio recording device 205. The base R1 is an example of a second base different from the first base.
The server 2 is an electronic device that controls each of the electronic devices included in the base R1.
The video presentation device 201 is a device including a display that reproduces and displays a video transmitted from the base O to the base R1. The video presentation device 201 is an example of the presentation device.
The offset video capturing device 202 is a device capable of recording a capturing time. The offset video capturing device 202 is a device including a camera installed so as to be able to capture the entire video display area of the video presentation device 201. The offset video capturing device 202 is an example of the video capturing device.
The return video capturing device 203 is a device including a camera that captures a video of the base R1. For example, the return video capturing device 203 captures a video of a state of the base R1 where the video presentation device 201 that reproduces and displays a video transmitted from the base O to the base R1 is installed. The return video capturing device 203 is an example of the video capturing device.
The audio presentation device 204 is a device including a speaker that reproduces and outputs an audio transmitted from the base O to the base R1. The audio presentation device 204 is an example of the presentation device.
The return audio recording device 205 is a device including a microphone that records an audio of the base R1. For example, the return audio recording device 205 records an audio of a state of the base R1 where the audio presentation device 204 that reproduces and outputs an audio transmitted from the base O to the base R1 is installed. The return audio recording device 205 is an example of the audio recording device.
A configuration example of the server 2 will be described.
The server 2 includes a control unit 21, a program storage unit 22, a data storage unit 23, a communication interface 24, and an input/output interface 25. The elements included in the server 2 are connected to each other via a bus.
The control unit 21 can be formed similarly to the control unit 11. A processor deploys a program stored in a ROM or the program storage unit 22 into a RAM. The processor executes the program deployed in the RAM, whereby the control unit 21 implements each functional unit described below. The control unit 21 is included in a computer.
The program storage unit 22 can be formed similarly to the program storage unit 12.
The data storage unit 23 can be formed similarly to the data storage unit 13.
The communication interface 24 can be formed similarly to the communication interface 14. The communication interface 24 includes various interfaces that communicatively connect the server 2 to other electronic devices.
The input/output interface 25 can be formed similarly to the input/output interface 15. The input/output interface 25 enables communication between the server 2 and each of the video presentation device 201, the offset video capturing device 202, the return video capturing device 203, the audio presentation device 204, and the return audio recording device 205.
Note that a hardware configuration of the server 2 is not limited to the above-described configuration. The server 2 can appropriately omit and change the above-described components and add a new component.
Since hardware configurations of a plurality of electronic devices included in each of bases R2 to Rn are similar to those of the base R1 described above, description thereof will be omitted.
The time distribution server 10 is an electronic device that manages a reference system clock. The reference system clock is an absolute time.
The server 1 includes a time management unit 111, an event video transmission unit 112, a return video reception unit 113, a return video processing unit 114, an event audio transmission unit 115, a return audio reception unit 116, and a return audio processing unit 117. Each functional unit is implemented by execution of a program by the control unit 11. It can also be said that each functional unit is included in the control unit 11 or the processor. Each functional unit can be read as the control unit 11 or the processor.
The time management unit 111 performs time synchronization with the time distribution server 10 using a known protocol such as NTP or PTP, and manages the reference system clock. The time management unit 111 manages the same reference system clock as the reference system clock managed by the server 2. The reference system clock managed by the time management unit 111 and the reference system clock managed by the server 2 are time-synchronized.
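As background for the time synchronization described above, the clock offset used by NTP can be estimated from four timestamps of one request/response exchange. The following is a minimal sketch of that standard calculation; the function name `ntp_offset_and_delay` and the sample values are hypothetical, and a real deployment would use an existing NTP or PTP implementation rather than this code.

```python
def ntp_offset_and_delay(t0: float, t1: float, t2: float, t3: float):
    """Estimate the clock offset and round-trip delay from one exchange.

    t0: client transmit time (client clock)
    t1: server receive time  (server clock)
    t2: server transmit time (server clock)
    t3: client receive time  (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2  # server clock minus client clock
    delay = (t3 - t0) - (t2 - t1)         # network round-trip time
    return offset, delay


# Hypothetical timestamps in milliseconds.
offset_ms, delay_ms = ntp_offset_and_delay(1000, 1510, 1512, 1030)
```

The estimate assumes that the outbound and return paths take equal time; asymmetric paths introduce an error of up to half the round-trip delay.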
The event video transmission unit 112 transmits an RTP packet that stores a video Vsignal1 output from the event video capturing device 101 to each of servers of the bases R1 to Rn via the IP network. The video Vsignal1 is a video acquired at a time Tvideo that is an absolute time in the base O. Acquiring the video Vsignal1 includes capturing the video Vsignal1 by the event video capturing device 101. Acquiring the video Vsignal1 includes sampling the video Vsignal1 obtained by capturing by the event video capturing device 101. A time Tvideo is provided to the RTP packet that stores the video Vsignal1. The time Tvideo is a time at which the video Vsignal1 is acquired in the base O. The time Tvideo is time information for processing return videos at the base O. The video Vsignal1 is an example of a first video. The time Tvideo is an example of a first time. An RTP packet is an example of a packet. The event video transmission unit 112 is an example of a transmission unit.
The return video reception unit 113 receives an RTP packet that stores a video Vsignal2 from each of the servers of the bases R1 to Rn via the IP network. The video Vsignal2 is a video acquired in any one of the bases R1 to Rn at a time at which the video Vsignal1 is reproduced in this base. Acquiring the video Vsignal2 includes capturing the video Vsignal2 by the return video capturing device 203. Acquiring the video Vsignal2 includes sampling the video Vsignal2 obtained by capturing by the return video capturing device 203. The time Tvideo is provided to the RTP packet that stores the video Vsignal2. The video Vsignal2 is an example of a second video. The return video reception unit 113 is an example of a reception unit.
The return video processing unit 114 generates a video Vsignal3 from the video Vsignal2 and outputs the video Vsignal3 to the return video presentation device 102. The video Vsignal3 is an example of a third video. The return video processing unit 114 is an example of a processing unit.
The event audio transmission unit 115 transmits an RTP packet that stores an audio Asignal1 output from the event audio recording device 103 to each of the servers of the bases R1 to Rn via the IP network. The audio Asignal1 is an audio acquired at a time Taudio that is an absolute time in the base O. Acquiring the audio Asignal1 includes recording the audio Asignal1 by the event audio recording device 103. Acquiring the audio Asignal1 includes sampling the audio Asignal1 obtained by recording by the event audio recording device 103. The time Taudio is provided to the RTP packet that stores the audio Asignal1. The time Taudio is a time at which the audio Asignal1 is acquired in the base O. The time Taudio is time information for processing return audios at the base O. The audio Asignal1 is an example of a first audio. The time Taudio is an example of the first time. The event audio transmission unit 115 is an example of the transmission unit.
The return audio reception unit 116 receives the RTP packet that stores the audio Asignal2 from each of the servers of the bases R1 to Rn via the IP network. The audio Asignal2 is an audio acquired in any one of the bases R1 to Rn at a time at which the audio Asignal1 is reproduced in this base. Acquiring the audio Asignal2 includes recording the audio Asignal2 by the return audio recording device 205. Acquiring the audio Asignal2 includes sampling the audio Asignal2 obtained by recording by the return audio recording device 205. The time Taudio is provided to the RTP packet that stores the audio Asignal2. The audio Asignal2 is an example of a second audio. The return audio reception unit 116 is an example of the reception unit.
The return audio processing unit 117 generates an audio Asignal3 from the audio Asignal2 and outputs the audio Asignal3 to the return audio presentation device 104. The audio Asignal3 is an example of a third audio. The return audio processing unit 117 is an example of the processing unit.
The server 2 includes a time management unit 211, an event video reception unit 212, a video offset calculation unit 213, a return video transmission unit 214, an event audio reception unit 215, a return audio transmission unit 216, a video time management DB 231, and an audio time management DB 232. Each functional unit is implemented by execution of a program by the control unit 21. It can also be said that each functional unit is included in the control unit 21 or a processor. Each functional unit can be read as the control unit 21 or the processor. The video time management DB 231 and the audio time management DB 232 are implemented by the data storage unit 23.
The time management unit 211 performs time synchronization with the time distribution server 10 using a known protocol such as NTP or PTP, and manages the reference system clock. The time management unit 211 manages the same reference system clock as the reference system clock managed by the server 1. The reference system clock managed by the time management unit 211 and the reference system clock managed by the server 1 are time-synchronized.
The event video reception unit 212 receives an RTP packet that stores a video Vsignal1 from the server 1 via the IP network. The event video reception unit 212 outputs the video Vsignal1 to the video presentation device 201. The video offset calculation unit 213 calculates a presentation time t1 that is an absolute time at which the video Vsignal1 is reproduced by the video presentation device 201.
The return video transmission unit 214 transmits an RTP packet that stores a video Vsignal2 to the server 1 via the IP network. The RTP packet that stores the video Vsignal2 includes a time Tvideo associated with the presentation time t1 that matches a time t that is an absolute time at which the video Vsignal2 is obtained by capturing.
The event audio reception unit 215 receives an RTP packet that stores an audio Asignal1 from the server 1 via the IP network. The event audio reception unit 215 outputs the audio Asignal1 to the audio presentation device 204.
The return audio transmission unit 216 transmits an RTP packet that stores an audio Asignal2 to the server 1 via the IP network. The RTP packet that stores the audio Asignal2 includes a time Taudio.
The video time management DB 231 is a DB that stores times Tvideo acquired from the video offset calculation unit 213 and presentation times t1 in association with each other.
The video time management DB 231 includes a video synchronization reference time column and a presentation time column. The video synchronization reference time column stores the times Tvideo. The presentation time column stores the presentation times t1.
The audio time management DB 232 is a DB that stores times Taudio acquired from the event audio reception unit 215 and audios Asignal1 in association with each other.
The audio time management DB 232 includes an audio synchronization reference time column and an audio data column. The audio synchronization reference time column stores the times Taudio. The audio data column stores the audios Asignal1.
Each of the servers of the bases R2 to Rn includes functional units and DBs similar to those of the server 2 of the base R1, and performs processing similar to that of the server 2 of the base R1. Description of a processing flow and a DB structure of the functional units included in each of the servers of the bases R2 to Rn will be omitted.
Hereinafter, operation of the base O and the base R1 will be described as an example. Operation of the bases R2 to Rn may be similar to operation of the base R1, and description thereof will be omitted. The notation of the base R1 may be read as the bases R2 to Rn.
Video processing of the server 1 in the base O will be described.
The event video transmission unit 112 transmits an RTP packet that stores a video Vsignal1 to the server 2 of the base R1 via the IP network (step S11). A typical example of processing of step S11 will be described below.
The return video reception unit 113 receives an RTP packet that stores a video Vsignal2 from the server 2 of the base R1 via the IP network (step S12). A typical example of processing of step S12 will be described below.
The return video processing unit 114 generates a video Vsignal3 from the video Vsignal2 according to a processing mode based on a current time Tn and a time Tvideo associated with reception of the RTP packet that stores the video Vsignal2 by the return video reception unit 113. The return video processing unit 114 outputs the video Vsignal3 to the return video presentation device 102 (step S13). A typical example of processing of step S13 will be described below.
Video processing of the server 2 in the base R1 will be described.
The event video reception unit 212 receives an RTP packet that stores a video Vsignal1 from the server 1 via the IP network (step S14). A typical example of processing of step S14 will be described below.
The video offset calculation unit 213 calculates a presentation time t1 at which the video Vsignal1 is reproduced by the video presentation device 201 (step S15). A typical example of processing of step S15 will be described below.
The return video transmission unit 214 transmits an RTP packet that stores a video Vsignal2 to the server 1 via the IP network (step S16). A typical example of processing of step S16 will be described below.
Hereinafter, typical examples of processing of steps S11 to S13 of the server 1 described above and processing of steps S14 to S16 of the server 2 described above will be described. In order of the processing in chronological order, the processing in step S11 of the server 1, the processing in step S14 of the server 2, the processing in step S15 of the server 2, the processing in step S16 of the server 2, the processing in step S12 of the server 1, and the processing in step S13 of the server 1 will be described in this order.
The event video transmission unit 112 acquires a video Vsignal1 output from the event video capturing device 101 at constant intervals Ivideo (step S111).
The event video transmission unit 112 generates an RTP packet that stores the video Vsignal1 (step S112). In step S112, for example, the event video transmission unit 112 stores the acquired video Vsignal1 in an RTP packet. The event video transmission unit 112 acquires a time Tvideo that is an absolute time at which the video Vsignal1 is sampled from the reference system clock managed by the time management unit 111. The event video transmission unit 112 stores the acquired time Tvideo in the header extension area of the RTP packet.
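Storing the time Tvideo in the header extension area, and reading it back on the reception side, can be sketched as follows. This is a minimal illustration that follows the general RTP fixed-header and header-extension layout (extension flag set, a 16-bit profile-defined identifier, a 16-bit length in 32-bit words). The identifier value `EXT_PROFILE_ID`, the 64-bit millisecond encoding of Tvideo, the payload type 96, and the function names are assumptions made for illustration.

```python
import struct

EXT_PROFILE_ID = 0x1000  # hypothetical "defined by profile" identifier


def build_rtp_packet(payload: bytes, seq: int, rtp_ts: int, ssrc: int,
                     t_video_ms: int, payload_type: int = 96) -> bytes:
    """Build an RTP packet whose header extension carries Tvideo (ms)."""
    first_byte = (2 << 6) | (1 << 4)  # version 2, no padding, X=1, CC=0
    header = struct.pack("!BBHII", first_byte, payload_type & 0x7F,
                         seq & 0xFFFF, rtp_ts & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    ext_data = struct.pack("!Q", t_video_ms)  # 8 bytes = two 32-bit words
    ext_header = struct.pack("!HH", EXT_PROFILE_ID, len(ext_data) // 4)
    return header + ext_header + ext_data + payload


def parse_rtp_packet(packet: bytes):
    """Return (Tvideo in ms, payload) from a packet built as above."""
    first_byte = packet[0]
    if not (first_byte >> 4) & 1:
        raise ValueError("packet has no header extension")
    csrc_count = first_byte & 0x0F
    offset = 12 + 4 * csrc_count  # skip fixed header and CSRC list
    _profile, words = struct.unpack_from("!HH", packet, offset)
    offset += 4
    (t_video_ms,) = struct.unpack_from("!Q", packet, offset)
    offset += 4 * words
    return t_video_ms, packet[offset:]
```

The same parsing applies on the reception side when the time Tvideo is acquired from a received RTP packet.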
The event video transmission unit 112 sends out the generated RTP packet that stores the video Vsignal1 to the IP network (step S113).
The event video reception unit 212 receives an RTP packet that stores a video Vsignal1 sent out from the event video transmission unit 112 via the IP network (step S141). The event video reception unit 212 acquires the video Vsignal1 stored in the received RTP packet that stores the video Vsignal1 (step S142).
The event video reception unit 212 outputs the acquired video Vsignal1 to the video presentation device 201 (step S143). The video presentation device 201 reproduces and displays the video Vsignal1.
The event video reception unit 212 acquires a time Tvideo stored in the header extension area of the received RTP packet that stores the video Vsignal1 (step S144).
The event video reception unit 212 delivers the acquired video Vsignal1 and time Tvideo to the video offset calculation unit 213 (step S145).
The video offset calculation unit 213 acquires a video Vsignal1 and a time Tvideo from the event video reception unit 212 (step S151).
The video offset calculation unit 213 calculates a presentation time t1 on the basis of the acquired video Vsignal1 and a video input from the offset video capturing device 202 (step S152). In step S152, for example, the video offset calculation unit 213 extracts a video frame including the video Vsignal1 from the video obtained by capturing by the offset video capturing device 202 using a known image processing technology. The video offset calculation unit 213 acquires a capturing time provided to the extracted video frame as the presentation time t1. The capturing time is an absolute time.
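Step S152 leaves the image processing technology open. As one possible stand-in, the following sketch compares the video Vsignal1 with each captured frame by mean absolute pixel difference and takes the capturing time of the best match as the presentation time t1. The frame representation (flat sequences of grayscale values of equal length), the threshold `max_diff`, and all names are hypothetical; a practical system would first locate and rectify the display area within the captured image.

```python
from typing import List, Optional, Sequence, Tuple

Frame = Sequence[int]  # hypothetical: flat grayscale pixel values


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_presentation_time(reference: Frame,
                           captured: List[Tuple[int, Frame]],
                           max_diff: float = 10.0) -> Optional[int]:
    """Return the capturing time (ms) of the captured frame closest to the
    reference frame, or None if no frame is within max_diff."""
    best: Optional[Tuple[float, int]] = None
    for capture_time_ms, frame in captured:
        score = mean_abs_diff(reference, frame)
        if score <= max_diff and (best is None or score < best[0]):
            best = (score, capture_time_ms)
    return None if best is None else best[1]
```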
The video offset calculation unit 213 stores the acquired time Tvideo in the video synchronization reference time column of the video time management DB 231 (step S153).
The video offset calculation unit 213 stores the acquired presentation time t1 in the presentation time column of the video time management DB 231 (step S154).
The return video transmission unit 214 acquires a video Vsignal2 output from the return video capturing device 203 at the constant intervals Ivideo (step S161). The video Vsignal2 is a video acquired in the base R1 at a time at which the video presentation device 201 reproduces a video Vsignal1 in the base R1.
The return video transmission unit 214 calculates a time t that is an absolute time at which the acquired video Vsignal2 is obtained by capturing (step S162). In step S162, for example, in a case where a time code Tc (absolute time) representing a capturing time is provided to the video Vsignal2, the return video transmission unit 214 acquires the time t as t=Tc. In a case where the time code Tc is not provided to the video Vsignal2, the return video transmission unit 214 acquires a current time Tn from the reference system clock managed by the time management unit 211. The return video transmission unit 214 acquires the time t as t=Tn−tvideo_offset using a predetermined value tvideo_offset (positive number).
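The two cases of step S162 can be sketched as a single function. This is a minimal illustration, assuming that all times are expressed in milliseconds; the function name `capture_time_ms` and its parameter names are hypothetical.

```python
from typing import Optional


def capture_time_ms(time_code_ms: Optional[int], now_ms: int,
                    tvideo_offset_ms: int) -> int:
    """Return the time t at which the video Vsignal2 was captured.

    If a time code Tc is provided, t = Tc. Otherwise t is estimated as
    t = Tn - tvideo_offset, where tvideo_offset is a predetermined
    positive value."""
    if time_code_ms is not None:
        return time_code_ms
    if tvideo_offset_ms <= 0:
        raise ValueError("tvideo_offset must be a positive number")
    return now_ms - tvideo_offset_ms
```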
The return video transmission unit 214 refers to the video time management DB 231 and extracts a record having a time t1 that matches the acquired time t (step S163).
The return video transmission unit 214 refers to the video time management DB 231 and acquires a time Tvideo in the video synchronization reference time column of the extracted record (step S164).
The return video transmission unit 214 generates an RTP packet that stores the video Vsignal2 (step S165). In step S165, for example, the return video transmission unit 214 stores the acquired video Vsignal2 in an RTP packet. The return video transmission unit 214 stores the acquired time Tvideo in the header extension area of the RTP packet.
The return video transmission unit 214 sends out the generated RTP packet that stores the video Vsignal2 to the IP network (step S166).
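Storing the time Tvideo in the header extension area in step S165, and reading it back on the receiving side, can be sketched as below. The fixed header and extension layout follow the general RTP format (12-byte fixed header, extension bit X, 16-bit profile-defined field, 16-bit length counted in 32-bit words). The profile value `0x1000`, the dynamic payload type 96, and the encoding of the time as a 64-bit millisecond count are assumptions for illustration. The embodiment does not specify them.

```python
import struct

RTP_VERSION = 2
EXT_PROFILE = 0x1000  # hypothetical profile-defined identifier (assumption)


def build_rtp_packet(payload: bytes, seq: int, rtp_timestamp: int, ssrc: int,
                     sync_time_ms: int, payload_type: int = 96) -> bytes:
    """Build an RTP packet whose header extension carries the synchronization time."""
    first_byte = (RTP_VERSION << 6) | (1 << 4)  # V=2, P=0, X=1, CC=0
    second_byte = payload_type & 0x7F           # M=0
    header = struct.pack("!BBHII", first_byte, second_byte, seq & 0xFFFF,
                         rtp_timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    ext_data = struct.pack("!Q", sync_time_ms)  # 8 bytes, i.e. two 32-bit words
    ext_header = struct.pack("!HH", EXT_PROFILE, len(ext_data) // 4)
    return header + ext_header + ext_data + payload


def parse_rtp_packet(packet: bytes):
    """Return (sync_time_ms, payload) from a packet built by build_rtp_packet."""
    first_byte = packet[0]
    if first_byte >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    if not (first_byte >> 4) & 1:
        raise ValueError("packet has no header extension")
    csrc_count = first_byte & 0x0F
    ext_start = 12 + 4 * csrc_count
    _profile, ext_words = struct.unpack("!HH", packet[ext_start:ext_start + 4])
    data_start = ext_start + 4
    (sync_time_ms,) = struct.unpack("!Q", packet[data_start:data_start + 8])
    payload = packet[data_start + 4 * ext_words:]
    return sync_time_ms, payload
```

The same layout would serve for the time Taudio in the audio packets, since only the payload differs.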
The return video reception unit 113 receives an RTP packet that stores a video Vsignal2 sent out from the return video transmission unit 214 via the IP network (step S121).
The return video reception unit 113 acquires the video Vsignal2 stored in the received RTP packet that stores the video Vsignal2 (step S122).
The return video reception unit 113 acquires a time Tvideo stored in the header extension area of the received RTP packet that stores the video Vsignal2 (step S123).
The return video reception unit 113 delivers the acquired video Vsignal2 and time Tvideo to the return video processing unit 114 (step S124).
The return video processing unit 114 acquires a video Vsignal2 and a time Tvideo from the return video reception unit 113 (step S131).
The return video processing unit 114 acquires a current time Tn from the reference system clock managed by the time management unit 111 (step S132). The current time Tn is a time associated with reception of an RTP packet that stores the video Vsignal2 by the return video reception unit 113. The current time Tn can also be referred to as a reception time of the RTP packet that stores the video Vsignal2. The current time Tn can also be referred to as a reproduction time of a video Vsignal3 generated on the basis of the video Vsignal2. The current time Tn associated with the reception of the RTP packet that stores the video Vsignal2 is an example of a second time.
The return video processing unit 114 generates the video Vsignal3 from the acquired video Vsignal2 according to a processing mode based on the acquired current time Tn and time Tvideo (step S133). In step S133, for example, the return video processing unit 114 determines the processing mode of the video Vsignal2 on the basis of a value of a difference between the current time Tn and the time Tvideo, that is, a value of (Tn−Tvideo) (ms). The return video processing unit 114 changes the processing mode of the video Vsignal2 on the basis of the value of the (Tn−Tvideo). The return video processing unit 114 changes the processing mode so as to lower the quality of the video as the value of the difference increases. The processing mode may include both performing processing on the video Vsignal2 and not performing processing on the video Vsignal2. The processing mode includes a degree of processing on the video Vsignal2. In a case where the return video processing unit 114 performs processing on the video Vsignal2, the video Vsignal3 is different from the video Vsignal2. In a case where the return video processing unit 114 does not perform processing on the video Vsignal2, the video Vsignal3 is the same as the video Vsignal2.
The return video processing unit 114 performs processing of reducing the visibility when reproduction is performed by the return video presentation device 102. When the value of the (Tn−Tvideo) is so small that viewers do not feel discomfort by reproduction of the video Vsignal2 by the return video presentation device 102, the return video processing unit 114 does not perform processing on the video Vsignal2. Furthermore, even in a case where the value of the (Tn−Tvideo) is too large, the return video processing unit 114 performs processing on the video Vsignal2 so that the video does not become completely visually unrecognizable. For example, a case of processing of changing the display size of the video Vsignal2 will be described. Assuming that a horizontal pixel of the video Vsignal2 is w and a vertical pixel is h, a horizontal pixel w′ and a vertical pixel h′ of the video Vsignal3 generated according to the processing mode are as follows.
The change in video quality is not limited to the display size change described above. In addition to the display size change, the processing may be, for example, blurring the image with a Gaussian filter or lowering the luminance of the image. Other processing may be used as long as the processed video Vsignal3 has lower visibility than the video Vsignal2.
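The behaviour described for step S133 (no processing for a small delay, stronger processing as the delay grows, and a lower bound so that the video never becomes unrecognizable) can be sketched with a clamped linear mapping. This is only one possible mapping. The thresholds of 200 ms and 2000 ms, the minimum scale of 0.25, and the linear shape are all hypothetical values chosen for illustration, not the values of the embodiment.

```python
def video_scale(delay_ms: float, comfortable_ms: float = 200.0,
                max_delay_ms: float = 2000.0, min_scale: float = 0.25) -> float:
    """Map the delay (Tn - Tvideo) to a display scale factor in [min_scale, 1.0]."""
    if delay_ms <= comfortable_ms:
        return 1.0            # small delay: no processing
    if delay_ms >= max_delay_ms:
        return min_scale      # large delay: stay visually recognizable
    fraction = (delay_ms - comfortable_ms) / (max_delay_ms - comfortable_ms)
    return 1.0 - fraction * (1.0 - min_scale)


def processed_size(w: int, h: int, delay_ms: float):
    """Return (w', h') of the video Vsignal3 for a Vsignal2 of w x h pixels."""
    scale = video_scale(delay_ms)
    return max(1, round(w * scale)), max(1, round(h * scale))
```

With these assumed parameters, a 1920 x 1080 video is left unchanged for a delay of 100 ms and is never reduced below 480 x 270 however large the delay becomes.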
The return video processing unit 114 outputs the generated video Vsignal3 to the return video presentation device 102 (step S134). The return video presentation device 102 reproduces and displays the video Vsignal3 based on the video Vsignal2 returned and transmitted from the base R1 to the base O.
Audio processing of the server 1 in the base O will be described.
The event audio transmission unit 115 transmits an RTP packet that stores an audio Asignal1 to the server 2 of the base R1 via the IP network (step S17). A typical example of processing of step S17 will be described below.
The return audio reception unit 116 receives an RTP packet that stores an audio Asignal2 from the server 2 of the base R1 via the IP network (step S18). A typical example of processing of step S18 will be described below.
The return audio processing unit 117 generates an audio Asignal3 from the audio Asignal2 according to a processing mode based on a current time Tn and a time Taudio associated with reception of the RTP packet that stores the audio Asignal2 by the return audio reception unit 116. The return audio processing unit 117 outputs the audio Asignal3 to the return audio presentation device 104 (step S19). A typical example of processing of step S19 will be described below.
Audio processing of the server 2 in the base R1 will be described.
The event audio reception unit 215 receives an RTP packet that stores an audio Asignal1 from the server 1 via the IP network (step S20). A typical example of processing of step S20 will be described below.
The return audio transmission unit 216 transmits an RTP packet that stores an audio Asignal2 to the server 1 via the IP network (step S21). A typical example of processing of step S21 will be described below.
Hereinafter, typical examples of the processing of steps S17 to S19 of the server 1 and the processing of steps S20 and S21 of the server 2 described above will be described. The description follows the chronological order of the processing: step S17 of the server 1, step S20 of the server 2, step S21 of the server 2, step S18 of the server 1, and step S19 of the server 1.
The event audio transmission unit 115 acquires an audio Asignal1 output from the event audio recording device 103 at the constant interval Iaudio (step S171).
The event audio transmission unit 115 generates an RTP packet that stores the audio Asignal1 (step S172). In step S172, for example, the event audio transmission unit 115 stores the acquired audio Asignal1 in an RTP packet. The event audio transmission unit 115 acquires a time Taudio that is an absolute time at which the acquired audio Asignal1 is sampled from the reference system clock managed by the time management unit 111. The event audio transmission unit 115 stores the acquired time Taudio in the header extension area of the RTP packet.
The event audio transmission unit 115 sends out the generated RTP packet that stores the audio Asignal1 to the IP network (step S173).
The event audio reception unit 215 receives an RTP packet that stores an audio Asignal1 sent out from the event audio transmission unit 115 via the IP network (step S201).
The event audio reception unit 215 acquires the audio Asignal1 stored in the received RTP packet that stores the audio Asignal1 (step S202).
The event audio reception unit 215 outputs the acquired audio Asignal1 to the audio presentation device 204 (step S203). The audio presentation device 204 reproduces and outputs the audio Asignal1.
The event audio reception unit 215 acquires a time Taudio stored in the header extension area of the received RTP packet that stores the audio Asignal1 (step S204).
The event audio reception unit 215 stores the acquired audio Asignal1 and time Taudio in the audio time management DB 232 (step S205). In step S205, for example, the event audio reception unit 215 stores the acquired time Taudio in the audio synchronization reference time column of the audio time management DB 232. The event audio reception unit 215 stores the acquired audio Asignal1 in the audio data column of the audio time management DB 232.
The return audio transmission unit 216 acquires an audio Asignal2 output from the return audio recording device 205 at the constant interval Iaudio (step S211). The audio Asignal2 is an audio acquired in the base R1 at a time at which the audio presentation device 204 reproduces an audio Asignal1 in the base R1.
The return audio transmission unit 216 refers to the audio time management DB 232 and extracts a record having audio data including the acquired audio Asignal2 (step S212). The audio Asignal2 acquired by the return audio transmission unit 216 includes the audio Asignal1 reproduced by the audio presentation device 204 and an audio generated at the base R1 (cheers of the audience at the base R1 and the like). In step S212, for example, the return audio transmission unit 216 separates the two audios by a known audio analysis technology. The return audio transmission unit 216 identifies the audio Asignal1 reproduced by the audio presentation device 204 by separating the audios. The return audio transmission unit 216 refers to the audio time management DB 232 and searches for audio data that matches the identified audio Asignal1 reproduced by the audio presentation device 204. The return audio transmission unit 216 refers to the audio time management DB 232 and extracts a record having the audio data that matches the identified audio Asignal1 reproduced by the audio presentation device 204.
The return audio transmission unit 216 refers to the audio time management DB 232 and acquires a time Taudio in the audio synchronization reference time column of the extracted record (step S213).
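Steps S212 and S213 rely on a known audio analysis technology that the embodiment does not name. As a deliberately simplified substitute, the sketch below skips the separation step and instead picks the stored audio data that correlates best with the recorded audio Asignal2. Normalized correlation, the score threshold of 0.5, and the list-of-samples representation are assumptions for illustration only.

```python
import math
from typing import List, Optional, Sequence, Tuple


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two sample sequences over their common length."""
    n = min(len(a), len(b))
    dot = sum(x * y for x, y in zip(a[:n], b[:n]))
    norm_a = math.sqrt(sum(x * x for x in a[:n]))
    norm_b = math.sqrt(sum(y * y for y in b[:n]))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_sync_time(recorded: Sequence[float],
                   records: List[Tuple[int, Sequence[float]]],
                   min_score: float = 0.5) -> Optional[int]:
    """Return the Taudio of the record whose audio data best matches the recording.

    Each record is (Taudio in ms, audio data), mirroring the two columns of the
    audio time management DB 232. None is returned when nothing matches well.
    """
    best_time, best_score = None, min_score
    for t_audio_ms, samples in records:
        score = normalized_correlation(recorded, samples)
        if score > best_score:
            best_time, best_score = t_audio_ms, score
    return best_time
```

In this sketch the cheers recorded at the base R1 act as noise on top of the reproduced audio Asignal1, so the match degrades as the local sound becomes louder. A real separation step would be needed in that case.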
The return audio transmission unit 216 generates an RTP packet that stores the audio Asignal2 (step S214). In step S214, for example, the return audio transmission unit 216 stores the acquired audio Asignal2 in an RTP packet. The return audio transmission unit 216 stores the acquired time Taudio in the header extension area of the RTP packet.
The return audio transmission unit 216 sends out the generated RTP packet that stores the audio Asignal2 to the IP network (step S215).
The return audio reception unit 116 receives an RTP packet that stores an audio Asignal2 sent out from the return audio transmission unit 216 via the IP network (step S181).
The return audio reception unit 116 acquires the audio Asignal2 stored in the received RTP packet that stores the audio Asignal2 (step S182).
The return audio reception unit 116 acquires a time Taudio stored in the header extension area of the received RTP packet that stores the audio Asignal2 (step S183).
The return audio reception unit 116 delivers the acquired audio Asignal2 and time Taudio to the return audio processing unit 117 (step S184).
The return audio processing unit 117 acquires the audio Asignal2 and time Taudio from the return audio reception unit 116 (step S191).
The return audio processing unit 117 acquires a current time Tn from the reference system clock managed by the time management unit 111 (step S192). The current time Tn is a time associated with reception of an RTP packet that stores an audio Asignal2 by the return audio reception unit 116. The current time Tn can also be referred to as a reception time of the RTP packet that stores the audio Asignal2. The current time Tn can also be referred to as a reproduction time of an audio Asignal3 generated on the basis of the audio Asignal2. The current time Tn associated with the reception of the RTP packet that stores the audio Asignal2 is an example of the second time.
The return audio processing unit 117 generates the audio Asignal3 from the acquired audio Asignal2 according to a processing mode based on the acquired current time Tn and time Taudio (step S193). In step S193, for example, the return audio processing unit 117 determines the processing mode of the audio Asignal2 on the basis of a value of a difference between the current time Tn and the time Taudio, that is, a value of (Tn−Taudio) (ms). The return audio processing unit 117 changes the processing mode of the audio Asignal2 on the basis of the value of the (Tn−Taudio). The return audio processing unit 117 changes the processing mode so as to lower the quality of the audio as the value of the difference increases. The processing mode may include both performing processing on the audio Asignal2 and not performing processing on the audio Asignal2. The processing mode includes a degree of processing on the audio Asignal2. In a case where the return audio processing unit 117 performs processing on the audio Asignal2, the audio Asignal3 is different from the audio Asignal2. In a case where the return audio processing unit 117 does not perform processing on the audio Asignal2, the audio Asignal3 is the same as the audio Asignal2.
The return audio processing unit 117 performs processing of reducing the audibility when reproduction is performed by the return audio presentation device 104. When the value of the (Tn−Taudio) is so small that viewers do not feel discomfort by reproduction of the audio Asignal2 by the return audio presentation device 104, the return audio processing unit 117 does not perform processing on the audio Asignal2. Furthermore, even in a case where the value of the (Tn−Taudio) is too large, the return audio processing unit 117 performs processing on the audio Asignal2 so that the audio does not become completely auditorily unrecognizable. For example, a case of processing of changing the strength of the audio Asignal2 will be described. Assuming that the strength of the audio Asignal2 is s, the strength s′ of the audio Asignal3 generated according to the processing mode is as follows.
The change in audio quality is not limited to the change in strength described above. In addition to changing the strength of the sound, the processing may gradually reduce high-frequency components by low-pass filtering in which the threshold becomes smaller as the value of the (Tn−Taudio) (ms) becomes larger. Other processing may be used as long as the audibility of the processed audio Asignal3 is lower than that of the audio Asignal2, such that the larger the value of the (Tn−Taudio) (ms), the farther away the sound seems to be.
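The strength change and the delay-dependent low-pass filtering can be sketched together as follows. All numeric parameters (100 ms and 1000 ms thresholds, minimum gain 0.2, cutoff range 8000 Hz to 1000 Hz, 48 kHz sampling) and the choice of a one-pole filter are hypothetical and serve only to illustrate the tendency described above.

```python
import math
from typing import List, Sequence


def _fraction(delay_ms: float, comfortable_ms: float, max_delay_ms: float) -> float:
    # 0.0 for small delays, 1.0 for large delays, linear in between.
    clipped = min(max(delay_ms, comfortable_ms), max_delay_ms)
    return (clipped - comfortable_ms) / (max_delay_ms - comfortable_ms)


def audio_gain(delay_ms: float, comfortable_ms: float = 100.0,
               max_delay_ms: float = 1000.0, min_gain: float = 0.2) -> float:
    """Gain applied to the strength s; it never reaches zero, so the audio stays audible."""
    return 1.0 - _fraction(delay_ms, comfortable_ms, max_delay_ms) * (1.0 - min_gain)


def lowpass_cutoff_hz(delay_ms: float, comfortable_ms: float = 100.0,
                      max_delay_ms: float = 1000.0,
                      max_cutoff_hz: float = 8000.0, min_cutoff_hz: float = 1000.0) -> float:
    """Cutoff frequency that becomes smaller as the delay (Tn - Taudio) becomes larger."""
    frac = _fraction(delay_ms, comfortable_ms, max_delay_ms)
    return max_cutoff_hz - frac * (max_cutoff_hz - min_cutoff_hz)


def process_audio(samples: Sequence[float], delay_ms: float,
                  sample_rate_hz: float = 48000.0) -> List[float]:
    """Generate the audio Asignal3 from the audio Asignal2 for a given delay."""
    gain = audio_gain(delay_ms)
    if gain == 1.0:
        return list(samples)  # small delay: no processing
    cutoff = lowpass_cutoff_hz(delay_ms)
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff)
    alpha = dt / (rc + dt)    # one-pole low-pass coefficient
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(gain * y)
    return out
```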
The return audio processing unit 117 outputs the generated audio Asignal3 to the return audio presentation device 104 (step S194). The return audio presentation device 104 reproduces and outputs the audio Asignal3 based on the audio Asignal2 returned and transmitted from the base R1 to the base O.
As described above, in the first embodiment, the server 1 generates a video Vsignal3 from a video Vsignal2 according to a processing mode based on a current time Tn and a time Tvideo. In a typical example, the server 1 changes the processing mode based on a value of a difference between the current time Tn and the time Tvideo. The server 1 may change the processing mode so as to lower the quality of the video as the value of the difference increases. In this manner, the server 1 can process a video such that the video is inconspicuous when reproduced. In general, in a case where a video projected on a screen or the like is viewed from a certain point X, the video can be clearly visually recognized as long as the distance from the point X to the screen is within a certain range. On the other hand, as the distance increases, the video appears smaller and more blurred and becomes difficult to recognize visually.
The server 1 generates an audio Asignal3 from an audio Asignal2 according to a processing mode based on a current time Tn and a time Taudio. In a typical example, the server 1 changes the processing mode based on a value of a difference between the current time Tn and the time Taudio. The server 1 may change the processing mode so as to lower the quality of the audio as the value of the difference increases. In this manner, the server 1 can process an audio such that the audio is hard to hear when reproduced. In general, in a case where an audio reproduced by a speaker or the like is listened to from the certain point X, the audio can be heard clearly, at practically the same time as the sound is generated, as long as the distance from the point X to the speaker (sound source) is within a certain range. On the other hand, as the distance increases, the sound arrives later than the time at which it was reproduced, is attenuated, and becomes difficult to hear.
By performing processing that reproduces this kind of distant viewing and listening on the basis of the current time Tn and the time Tvideo or the current time Tn and the time Taudio, the server 1 can reduce discomfort due to the magnitude of a data transmission delay time while conveying the state of viewers at a physically separated base.
In this manner, the server 1 can reduce discomfort felt by viewers when a plurality of videos/audios transmitted from a plurality of bases at different times is reproduced.
A second embodiment is an embodiment in which, when a video/audio transmitted from a base O and videos/audios transmitted from a plurality of bases of remote locations other than a base R are reproduced in the base R of a certain remote location, the videos/audios transmitted from the plurality of remote locations other than the base R are processed and reproduced.
Time information used for processing a video/audio is stored in a header extension area of an RTP packet transmitted and received between the base O and each of the bases R1 to Rn. For example, the time information is in an absolute time format (hh:mm:ss.fff format).
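Conversion between the hh:mm:ss.fff text format and a millisecond count can be sketched as below. The function names are hypothetical, and counting milliseconds from midnight is an assumption. The embodiment only states the text format.

```python
def format_abs_time(ms_since_midnight: int) -> str:
    """Format a millisecond count as hh:mm:ss.fff."""
    hours, rem = divmod(ms_since_midnight, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def parse_abs_time(text: str) -> int:
    """Parse hh:mm:ss.fff into a millisecond count."""
    hms, millis = text.split(".")
    hours, minutes, seconds = hms.split(":")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1_000 + int(millis)
```

A difference such as (Tn−Tvideo) in ms can then be obtained by subtracting two parsed values, provided both times fall on the same day.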
Hereinafter, two bases R1 and R2 will be mainly described as remote locations, and processing of reproducing, in the base R2, a video/audio transmitted from the base O and a video/audio transmitted from the base R1 will be described. Description of the following will be omitted: reception processing in the base O of videos/audios returned and transmitted from the base R1 and the base R2, reception processing and processing in the base R1 of a video/audio transmitted from the base R2, and transmission processing to the base O and the base R1 of a video/audio obtained by capturing and recording in the base R2.
Although a video and an audio are described as being transmitted and received in RTP packetization, the present invention is not limited thereto. The video and audio may be processed and managed by the same functional unit/database (DB). Both the video and audio may be stored in one RTP packet and transmitted and received.
In the second embodiment, the same components as those of the first embodiment are denoted by the same reference signs, and description thereof will be omitted. In the second embodiment, differences from the first embodiment will be mainly described.
The medium processing system S includes a plurality of electronic devices included in the base O, a plurality of electronic devices included in each of the bases R1 to Rn, and a time distribution server 10. The electronic devices in each of the bases and the time distribution server 10 can communicate with each other via an IP network.
The base O includes a server 1, an event video capturing device 101, and an event audio recording device 103 as in the first embodiment. The base O is an example of a first base.
As in the first embodiment, the base R1 includes a server 2, a video presentation device 201, an offset video capturing device 202, and an audio presentation device 204. Unlike the first embodiment, the base R1 includes a video capturing device 206 and an audio recording device 207. The base R1 is an example of a second base.
The video capturing device 206 is a device including a camera that captures a video of the base R1. For example, the video capturing device 206 captures a video of a state of the base R1 where the video presentation device 201 that reproduces and displays a video transmitted from the base O to the base R1 is installed. The video capturing device 206 is an example of the video capturing device.
The audio recording device 207 is a device including a microphone that records an audio of the base R1. For example, the audio recording device 207 records an audio of a state of the base R1 where the audio presentation device 204 that reproduces and outputs an audio transmitted from the base O to the base R1 is installed. The audio recording device 207 is an example of the audio recording device.
The base R2 includes a server 3, a video presentation device 301, an offset video capturing device 302, an audio presentation device 303, and an offset audio recording device 304. The base R2 is an example of a third base different from the first base and the second base.
The server 3 is an electronic device that controls each of the electronic devices included in the base R2. The server 3 is an example of a medium processing device.
The video presentation device 301 is a device including a display that reproduces and displays a video transmitted from the base O to the base R2 and a video transmitted from the base R1 and each of bases R3 to Rn to the base R2. The video presentation device 301 is an example of the presentation device.
The offset video capturing device 302 is a device capable of recording a capturing time. The offset video capturing device 302 is a device including a camera installed so as to be able to capture the entire video display area of the video presentation device 301. The offset video capturing device 302 is an example of the video capturing device.
The audio presentation device 303 is a device including a speaker that reproduces and outputs an audio transmitted from the base O to the base R2 and an audio transmitted from the base R1 and each of the bases R3 to Rn to the base R2. The audio presentation device 303 is an example of the presentation device.
The offset audio recording device 304 is a device capable of recording a recording time. The offset audio recording device 304 is a device including a microphone installed so as to be able to record an audio reproduced by the audio presentation device 303. The offset audio recording device 304 is an example of an audio recording device.
A configuration example of the server 3 will be described.
The server 3 includes a control unit 31, a program storage unit 32, a data storage unit 33, a communication interface 34, and an input/output interface 35. The elements included in the server 3 are connected to each other via a bus.
The control unit 31 can be formed similarly to the control unit 11. A processor deploys a program stored in a ROM or in the program storage unit 32 into a RAM. The processor executes the program deployed in the RAM, whereby the control unit 31 implements each functional unit described below. The control unit 31 is included in a computer.
The program storage unit 32 can be formed similarly to the program storage unit 12.
The data storage unit 33 can be formed similarly to the data storage unit 13.
The communication interface 34 can be formed similarly to the communication interface 14. The communication interface 34 includes various interfaces that communicatively connect the server 3 to other electronic devices.
The input/output interface 35 can be formed similarly to the input/output interface 15. The input/output interface 35 enables communication between the server 3 and each of the video presentation device 301, the offset video capturing device 302, the audio presentation device 303, and the offset audio recording device 304.
Note that a hardware configuration of the server 3 is not limited to the above-described configuration. The server 3 can appropriately omit and change the above-described components and add a new component.
As in the first embodiment, the server 1 includes a time management unit 111, an event video transmission unit 112, and an event audio transmission unit 115. Each functional unit is implemented by execution of a program by the control unit 11. It can also be said that each functional unit is included in the control unit 11 or the processor. Each functional unit can be read as the control unit 11 or the processor.
As in the first embodiment, the server 2 includes a time management unit 211, an event video reception unit 212, a video offset calculation unit 213, an event audio reception unit 215, a video time management DB 231, and an audio time management DB 232. Unlike the first embodiment, the server 2 includes a video transmission unit 217 and an audio transmission unit 218. Each functional unit is implemented by execution of a program by the control unit 21. It can also be said that each functional unit is included in the control unit 21 or a processor. Each functional unit can be read as the control unit 21 or the processor. The video time management DB 231 and the audio time management DB 232 are implemented by the data storage unit 23.
The video transmission unit 217 transmits an RTP packet that stores a video Vsignal2 to the server 3 via the IP network. The RTP packet that stores the video Vsignal2 includes a time Tvideo associated with the presentation time t1 that matches a time t that is an absolute time at which the video Vsignal2 is obtained by capturing. The video Vsignal2 is an example of a second video. An RTP packet is an example of a packet. The time Tvideo is an example of a first time.
The audio transmission unit 218 transmits an RTP packet that stores an audio Asignal2 to the server 3 via the IP network. The RTP packet that stores the audio Asignal2 includes a time Taudio. The audio Asignal2 is an example of a second audio. The time Taudio is an example of the first time.
The server 3 includes a time management unit 311, an event video reception unit 312, a video offset calculation unit 313, a video reception unit 314, a video processing unit 315, an event audio reception unit 316, an audio offset calculation unit 317, an audio reception unit 318, an audio processing unit 319, a video time management DB 331, and an audio time management DB 332. Each functional unit is implemented by execution of a program by the control unit 31. It can also be said that each functional unit is included in the control unit 31 or a processor. Each functional unit can be read as the control unit 31 or the processor. The video time management DB 331 and the audio time management DB 332 are implemented by the data storage unit 33.
The time management unit 311 performs time synchronization with the time distribution server 10 using a known protocol such as NTP or PTP, and manages the reference system clock. The time management unit 311 manages the same reference system clock as the reference system clock managed by the server 1 and the server 2. The reference system clock managed by the time management unit 311 and the reference system clock managed by the server 1 and the server 2 are time-synchronized.
The event video reception unit 312 receives an RTP packet that stores a video Vsignal1 from the server 1 via the IP network. The event video reception unit 312 outputs the video Vsignal1 to the video presentation device 301. The event video reception unit 312 is an example of a first reception unit. The video Vsignal1 is an example of a first video.
The video offset calculation unit 313 calculates a presentation time t1 that is an absolute time at which the video Vsignal1 is reproduced by the video presentation device 301 in the base R2. The video offset calculation unit 313 is an example of a calculation unit. The presentation time t1 is an example of a third time.
The video reception unit 314 receives an RTP packet that stores a video Vsignal2 from each of servers of the base R1 and the bases R3 to Rn via the IP network. The video reception unit 314 is an example of a second reception unit.
The video processing unit 315 generates a video Vsignal3 from the video Vsignal2 and outputs the video Vsignal3 to the video presentation device 301. The video processing unit 315 is an example of a processing unit. The video Vsignal3 is an example of a third video.
The event audio reception unit 316 receives an RTP packet that stores an audio Asignal1 from the server 1 via the IP network. The event audio reception unit 316 outputs an audio Asignal1 to the audio presentation device 303. The event audio reception unit 316 is an example of the first reception unit. The audio Asignal1 is an example of a first audio.
The audio offset calculation unit 317 calculates a presentation time t2 that is an absolute time at which the audio Asignal1 is reproduced by the audio presentation device 303 in the base R2. The audio offset calculation unit 317 is an example of a calculation unit. The presentation time t2 is an example of the third time.
The audio reception unit 318 receives an RTP packet that stores an audio Asignal2 from each of the servers of the base R1 and the bases R3 to Rn via the IP network. The audio reception unit 318 is an example of the second reception unit.
The audio processing unit 319 generates an audio Asignal3 from the audio Asignal2 and outputs the audio Asignal3 to the audio presentation device 303. The audio processing unit 319 is an example of a processing unit. The audio Asignal3 is an example of a third audio.
The video time management DB 331 may have a data structure similar to that of the video time management DB 231. The video time management DB 331 is a DB that stores times Tvideo acquired from the video offset calculation unit 313 and presentation times t1 in association with each other. The video time management DB 331 is an example of a storage unit.
The audio time management DB 332 is a DB that stores times Taudio acquired from the audio offset calculation unit 317 and presentation times t2 in association with each other. The audio time management DB 332 is an example of a storage unit.
The audio time management DB 332 includes an audio synchronization reference time column and a presentation time column. The audio synchronization reference time column stores the times Taudio. The presentation time column stores the presentation times t2.
Hereinafter, operations of the base O, the base R1, and the base R2 will be described as an example.
Video processing of the server 1 in the base O will be described.
The event video transmission unit 112 transmits an RTP packet that stores a video Vsignal1 to each of servers of the bases R1 to Rn via the IP network. A time Tvideo is provided to the RTP packet that stores the video Vsignal1. The time Tvideo is time information for processing a video in each of bases (R1, R2, . . . , Rn) other than the base O. The processing of the event video transmission unit 112 may be similar to the processing described in the first embodiment.
Video processing of the server 2 in the base R1 will be described.
The event video reception unit 212 receives an RTP packet that stores a video Vsignal1 from the server 1 via the IP network (step S22).
A typical example of the processing of the event video reception unit 212 in step S22 may be similar to the processing described in the first embodiment.
The video offset calculation unit 213 calculates a presentation time t1 at which the video Vsignal1 is reproduced by the video presentation device 201 (step S23).
A typical example of the processing of the video offset calculation unit 213 in step S23 may be similar to the processing described in the first embodiment using
The video transmission unit 217 transmits an RTP packet that stores a video Vsignal2 to the server 3 via the IP network (step S24).
A typical example of the processing of the video transmission unit 217 in step S24 may be similar to the processing of the return video transmission unit 214 described in the first embodiment using
Description of the processing of the video transmission unit 217 will be omitted by the notation of the “return video capturing device 203” and the “return video transmission unit 214” being replaced with the “video capturing device 206” and the “video transmission unit 217” in the description using
Video processing of the server 3 in the base R2 will be described.
The event video reception unit 312 receives an RTP packet that stores a video Vsignal1 from the server 1 via the IP network (step S25).
A typical example of the processing of the event video reception unit 312 in step S25 may be similar to the processing of the event video reception unit 212 described in the first embodiment using
Description of the processing of the event video reception unit 312 will be omitted by the notation of the “video presentation device 201”, the “event video reception unit 212”, and the “video offset calculation unit 213” being replaced with the “video presentation device 301”, the “event video reception unit 312”, and the “video offset calculation unit 313” in the description using
The video offset calculation unit 313 calculates a presentation time t1 at which the video Vsignal1 is reproduced by the video presentation device 301 (step S26).
A typical example of the processing of the video offset calculation unit 313 in step S26 may be similar to the processing of the video offset calculation unit 213 described in the first embodiment using
Description of the processing of the video offset calculation unit 313 will be omitted by the notation of the “offset video capturing device 202”, the “event video reception unit 212”, the “video offset calculation unit 213”, and the “video time management DB 231” being replaced with the “offset video capturing device 302”, the “event video reception unit 312”, the “video offset calculation unit 313”, and the “video time management DB 331” in the description using
The video reception unit 314 receives an RTP packet that stores a video Vsignal2 from the server 2 of the base R1 via the IP network (step S27).
A typical example of the processing of the video reception unit 314 in step S27 may be similar to the processing of the return video reception unit 113 described in the first embodiment using
Description of the processing of the video reception unit 314 will be omitted by the notation of the “return video reception unit 113”, the “return video processing unit 114”, and the “return video transmission unit 214” being replaced with the “video reception unit 314”, the “video processing unit 315”, and the “video transmission unit 217” in the description using
The video processing unit 315 generates a video Vsignal3 from the video Vsignal2 according to a processing mode based on the presentation time t1 and a current time Tn associated with reception of the RTP packet that stores the video Vsignal2 by the video reception unit 314. The video processing unit 315 outputs the video Vsignal3 to the video presentation device 301 (step S28).
The video processing unit 315 acquires a video Vsignal2 and a time Tvideo from the video reception unit 314 (step S281).
The video processing unit 315 refers to the video time management DB 331 and extracts a record having a video synchronization reference time that matches the acquired time Tvideo (step S282).
The video processing unit 315 refers to the video time management DB 331 and acquires a presentation time t1 in the presentation time column of the extracted record (step S283).
The video processing unit 315 acquires a current time Tn from the reference system clock managed by the time management unit 311 (step S284). The current time Tn is a time associated with reception of an RTP packet that stores the video Vsignal2 by the video reception unit 314. The current time Tn can also be referred to as a reception time of the RTP packet that stores the video Vsignal2. The current time Tn can also be referred to as a reproduction time of a video Vsignal3 generated on the basis of the video Vsignal2. The current time Tn associated with the reception of the RTP packet that stores the video Vsignal2 is an example of a second time.
The video processing unit 315 generates the video Vsignal3 from the acquired video Vsignal2 according to a processing mode based on the acquired current time Tn and presentation time t1 (step S285). In step S285, for example, the video processing unit 315 determines the processing mode of the video Vsignal2 on the basis of a value of a difference between the current time Tn and presentation time t1, that is, a value of (Tn−t1) (ms). The video processing unit 315 changes the processing mode of the video Vsignal2 on the basis of the value of the (Tn−t1). The video processing unit 315 changes the processing mode so as to lower the quality of the video as the value of the difference increases. The processing mode may include both performing processing on the video Vsignal2 and not performing processing on the video Vsignal2. The processing mode includes a degree of processing on the video Vsignal2.
The video processing unit 315 performs processing of reducing the visibility when reproduction is performed by the video presentation device 301. When the value of the (Tn−t1) is so small that viewers do not feel discomfort by reproduction of the video Vsignal3 by the video presentation device 301, the video processing unit 315 does not perform processing on the video Vsignal2. Furthermore, even in a case where the value of the (Tn−t1) is very large, the video processing unit 315 performs processing on the video Vsignal2 so that the video does not become completely visually unrecognizable. For example, a case of processing of changing the display size of the video Vsignal2 will be described. Assuming that the number of horizontal pixels of the video Vsignal2 is w and the number of vertical pixels is h, the number of horizontal pixels w′ and the number of vertical pixels h′ of the video Vsignal3 generated according to the processing mode are as follows.
The processing is not limited to the above as the change of the quality of the video, and may be, in addition to the above display size change, blurring an image by a Gaussian filter, lowering the luminance of an image, or the like. The processing may use other processing as long as the processed video Vsignal3 has lower visibility than the video Vsignal2.
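One possible mapping from the value of the (Tn−t1) to a display size can be sketched as follows. The thresholds `NO_PROCESSING_MS` and `MAX_DELAY_MS`, the lower bound `MIN_SCALE`, the function names, and the linear interpolation between them are all assumptions made for illustration; they are not values or formulas specified in the embodiment.

```python
from typing import Tuple

# Hypothetical parameters; none of these values come from the embodiment.
NO_PROCESSING_MS = 100.0   # at or below this difference, the video is left unprocessed
MAX_DELAY_MS = 1000.0      # at or above this difference, the minimum scale is used
MIN_SCALE = 0.25           # lower bound so the video never becomes unrecognizable


def scale_for_delay(delay_ms: float) -> float:
    """Return a display-size scale that decreases as (Tn - t1) increases."""
    if delay_ms <= NO_PROCESSING_MS:
        return 1.0
    if delay_ms >= MAX_DELAY_MS:
        return MIN_SCALE
    fraction = (delay_ms - NO_PROCESSING_MS) / (MAX_DELAY_MS - NO_PROCESSING_MS)
    return 1.0 - fraction * (1.0 - MIN_SCALE)


def scaled_size(w: int, h: int, tn_ms: float, t1_ms: float) -> Tuple[int, int]:
    """Return (w', h') for Vsignal3 from the size (w, h) of Vsignal2."""
    scale = scale_for_delay(tn_ms - t1_ms)
    return max(1, round(w * scale)), max(1, round(h * scale))
```

With these assumed parameters, a small difference leaves the size unchanged, and a very large difference is clamped at one quarter of the original width and height rather than shrinking toward zero.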
The video processing unit 315 outputs the generated video Vsignal3 to the video presentation device 301 (step S286). The video presentation device 301 reproduces and displays the video Vsignal3 based on the video Vsignal2 returned and transmitted from the base R1 and each of the bases R3 to Rn to the base R2.
Audio processing of the server 1 in the base O will be described.
The event audio transmission unit 115 transmits an RTP packet that stores an audio Asignal1 to each of the servers of the bases R1 to Rn via the IP network. The time Taudio is provided to the RTP packet that stores the audio Asignal1. The time Taudio is time information for processing an audio in each of the bases (R1, R2, . . . , Rn) other than the base O. The processing of the event audio transmission unit 115 may be similar to the processing described in the first embodiment using
Audio processing of the server 2 in the base R1 will be described.
The event audio reception unit 215 receives an RTP packet that stores an audio Asignal1 from the server 1 via the IP network (step S29).
A typical example of the processing of the event audio reception unit 215 in step S29 may be similar to the processing described in the first embodiment using
The audio transmission unit 218 transmits an RTP packet that stores an audio Asignal2 to the server 3 via the IP network (step S30).
A typical example of the processing of the audio transmission unit 218 in step S30 may be similar to the processing of the return audio transmission unit 216 described in the first embodiment using
Description of the processing of the audio transmission unit 218 will be omitted by the notation of the “return audio recording device 205” and the “return audio transmission unit 216” being replaced with the “audio recording device 207” and the “audio transmission unit 218” in the description using
Audio processing of the server 3 in the base R2 will be described.
The audio offset calculation unit 317 calculates a presentation time t2 at which the audio Asignal1 is reproduced by the audio presentation device 303 (step S32). A typical example of processing of step S32 will be described below.
The audio reception unit 318 receives an RTP packet that stores an audio Asignal2 from the server 2 of the base R1 via the IP network (step S33).
A typical example of the processing of the audio reception unit 318 in step S33 may be similar to the processing of the return audio reception unit 116 described in the first embodiment using
Description of the processing of the audio reception unit 318 will be omitted by the notation of the “return audio reception unit 116”, the “return audio processing unit 117”, and the “return audio transmission unit 216” being replaced with the “audio reception unit 318”, the “audio processing unit 319”, and the “audio transmission unit 218” in the description using
The audio processing unit 319 generates an audio Asignal3 from the audio Asignal2 according to a processing mode based on the presentation time t2 and a current time Tn associated with reception of the RTP packet that stores the audio Asignal2 by the audio reception unit 318. The audio processing unit 319 outputs the audio Asignal3 to the audio presentation device 303 (step S34). A typical example of processing of step S34 will be described below.
The event audio reception unit 316 receives an RTP packet that stores an audio Asignal1 sent out from the event audio transmission unit 115 via the IP network (step S311).
The event audio reception unit 316 acquires the audio Asignal1 stored in the received RTP packet that stores the audio Asignal1 (step S312).
The event audio reception unit 316 outputs the acquired audio Asignal1 to the audio presentation device 303 (step S313). The audio presentation device 303 reproduces and outputs the audio Asignal1.
The event audio reception unit 316 acquires a time Taudio stored in the header extension area of the received RTP packet that stores the audio Asignal1 (step S314).
The event audio reception unit 316 delivers the acquired audio Asignal1 and the time Taudio to the audio offset calculation unit 317 (step S315).
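Steps S311 to S315 can be sketched as a parser that reads the RTP header extension and returns the time Taudio together with the payload. The header layout follows the general RTP format (a 12-byte fixed header, an optional CSRC list, and an optional extension whose length is given in 32-bit words). The function name `parse_rtp` and the assumption that Taudio occupies the first 8 bytes of the extension as an unsigned big-endian millisecond value are hypothetical; the embodiment does not specify this encoding here.

```python
import struct
from typing import Optional, Tuple


def parse_rtp(packet: bytes) -> Tuple[Optional[int], bytes]:
    """Return (Taudio in ms or None, payload) from one RTP packet.

    Padding (the P bit) is ignored in this sketch.
    """
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    first_byte = packet[0]
    if first_byte >> 6 != 2:
        raise ValueError("not RTP version 2")
    has_extension = bool(first_byte & 0x10)
    csrc_count = first_byte & 0x0F
    offset = 12 + 4 * csrc_count  # skip the fixed header and the CSRC list
    sync_time_ms = None
    if has_extension:
        # Extension header: 16-bit profile-defined field, 16-bit length in 32-bit words.
        _profile, length_words = struct.unpack_from("!HH", packet, offset)
        offset += 4
        extension = packet[offset:offset + 4 * length_words]
        offset += 4 * length_words
        if len(extension) >= 8:
            # Assumption: the first 8 bytes hold Taudio as unsigned big-endian ms.
            (sync_time_ms,) = struct.unpack_from("!Q", extension, 0)
    return sync_time_ms, packet[offset:]


# Build a sample packet: V=2, X=1, CC=0, payload type 96, then an 8-byte extension.
header = struct.pack("!BBHII", 0x90, 96, 1, 0, 0x1234)
extension = struct.pack("!HHQ", 0xABCD, 2, 1_700_000_000_123)
taudio, payload = parse_rtp(header + extension + b"audio")
```

In this sketch, `taudio` and `payload` correspond to the time Taudio and the audio Asignal1 that are delivered to the audio offset calculation unit 317.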
The audio offset calculation unit 317 acquires an audio Asignal1 and a time Taudio from the event audio reception unit 316 (step S321).
The audio offset calculation unit 317 calculates a presentation time t2 on the basis of the acquired audio Asignal1 and an audio input from the offset audio recording device 304 (step S322). The audio acquired by the offset audio recording device 304 includes the audio Asignal1 reproduced by the audio presentation device 303 and an audio generated at the base R2 (cheers of the audience at the base R2 and the like). In step S322, for example, the audio offset calculation unit 317 separates the two audios by a known audio analysis technology. The audio offset calculation unit 317 acquires the presentation time t2 that is an absolute time at which the audio Asignal1 is reproduced by the audio presentation device 303 by separating the audios.
The audio offset calculation unit 317 stores the acquired time Taudio in the audio synchronization reference time column of the audio time management DB 332 (step S323).
The audio offset calculation unit 317 stores the acquired presentation time t2 in the presentation time column of the audio time management DB 332 (step S324).
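As a rough illustration of steps S321 to S324, the presentation time t2 can be estimated by locating the audio Asignal1 inside the audio recorded by the offset audio recording device 304. The sketch below uses a plain, unnormalized cross-correlation as a stand-in; it is not the audio separation technology referred to in the embodiment. The function names, the sample values, the recording start time, and the sample rate are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def best_lag(reference: Sequence[float], recorded: Sequence[float]) -> int:
    """Return the sample offset in `recorded` at which `reference` correlates best."""
    best, best_score = 0, float("-inf")
    for lag in range(len(recorded) - len(reference) + 1):
        score = sum(r * recorded[lag + i] for i, r in enumerate(reference))
        if score > best_score:
            best, best_score = lag, score
    return best


def presentation_time_ms(
    reference: Sequence[float],
    recorded: Sequence[float],
    recording_start_ms: float,
    sample_rate_hz: int,
) -> float:
    """Estimate t2 as the absolute time at which `reference` appears in `recorded`."""
    lag = best_lag(reference, recorded)
    return recording_start_ms + lag * 1000.0 / sample_rate_hz


# Asignal1 (reference) and a recording that contains it with local sound mixed in.
asignal1 = [1.0, -1.0, 0.5, -0.5]
recorded = [0.1, -0.1, 0.0, 1.1, -0.9, 0.6, -0.4, 0.0]
t2 = presentation_time_ms(asignal1, recorded, recording_start_ms=5000.0, sample_rate_hz=1000)

# Steps S323 and S324: associate Taudio with t2 (a dict stands in for the DB 332).
audio_time_db: Dict[int, float] = {}
taudio = 4_900
audio_time_db[taudio] = t2
```

A practical implementation would likely need normalization and much longer windows to remain robust against cheers and other sounds generated at the base R2.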
The audio processing unit 319 acquires an audio Asignal2 and a time Taudio from the audio reception unit 318 (step S341).
The audio processing unit 319 refers to the audio time management DB 332 and extracts a record having the audio synchronization reference time that matches the acquired time Taudio (step S342).
The audio processing unit 319 refers to the audio time management DB 332 and acquires a presentation time t2 in the presentation time column of the extracted record (step S343).
The audio processing unit 319 acquires a current time Tn from the reference system clock managed by the time management unit 311 (step S344). The current time Tn is a time associated with reception of an RTP packet that stores an audio Asignal2 by the audio reception unit 318. The current time Tn can also be referred to as a reception time of the RTP packet that stores the audio Asignal2. The current time Tn can also be referred to as a reproduction time of an audio Asignal3 generated on the basis of the audio Asignal2. The current time Tn associated with the reception of the RTP packet that stores the audio Asignal2 is an example of the second time.
The audio processing unit 319 generates the audio Asignal3 from the acquired audio Asignal2 according to a processing mode based on the acquired current time Tn and presentation time t2 (step S345). In step S345, for example, the audio processing unit 319 determines the processing mode of the audio Asignal2 on the basis of a value of a difference between the current time Tn and presentation time t2, that is, a value of (Tn−t2) (ms). The audio processing unit 319 changes the processing mode of the audio Asignal2 on the basis of the value of the (Tn−t2). The audio processing unit 319 changes the processing mode so as to lower the quality of the audio as the value of the difference increases. The processing mode may include both performing processing on the audio Asignal2 and not performing processing on the audio Asignal2. The processing mode includes a degree of processing on the audio Asignal2.
The audio processing unit 319 performs processing of reducing the audibility when reproduction is performed by the audio presentation device 303. When the value of the (Tn−t2) is so small that viewers do not feel discomfort by reproduction of the audio Asignal3 by the audio presentation device 303, the audio processing unit 319 does not perform processing on the audio Asignal2. Furthermore, even in a case where the value of the (Tn−t2) is very large, the audio processing unit 319 performs processing on the audio Asignal2 so that the audio does not become completely auditorily unrecognizable. For example, a case of processing of changing the strength of the audio Asignal2 will be described. Assuming that the strength of the audio Asignal2 is s, the strength s′ of the audio Asignal3 generated according to the processing mode is as follows.
The processing is not limited to the above as the change of the quality of the audio, and may be, in addition to the change of the strength of the sound, gradual reduction of a component of a high frequency by low-pass filtering in which a threshold is smaller as the value of the (Tn−t2) (ms) is larger. In the processing, other processing may be used as long as the audibility of the processed audio Asignal3 is lower than that of the audio Asignal2 such that the larger the value of the (Tn−t2) (ms), the farther the sound is felt to be heard.
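One possible mapping from the value of the (Tn−t2) to an audio strength and to a low-pass threshold can be sketched as follows. The thresholds, the minimum gain, the cutoff range, the function names, and the linear interpolation are assumptions made for illustration and are not values specified in the embodiment.

```python
from typing import List, Sequence

# Hypothetical parameters; none of these values come from the embodiment.
NO_PROCESSING_MS = 100.0   # at or below this difference, the audio is left unprocessed
MAX_DELAY_MS = 1000.0      # at or above this difference, the lower bounds are used
MIN_GAIN = 0.2             # lower bound so the audio never becomes inaudible
MAX_CUTOFF_HZ = 16000.0    # low-pass threshold for an unprocessed audio
MIN_CUTOFF_HZ = 2000.0     # low-pass threshold for the largest difference


def _interpolate(delay_ms: float, start: float, end: float) -> float:
    """Move linearly from `start` to `end` as the difference grows, with clamping."""
    if delay_ms <= NO_PROCESSING_MS:
        return start
    if delay_ms >= MAX_DELAY_MS:
        return end
    fraction = (delay_ms - NO_PROCESSING_MS) / (MAX_DELAY_MS - NO_PROCESSING_MS)
    return start + fraction * (end - start)


def gain_for_delay(delay_ms: float) -> float:
    """Return the ratio s'/s, which decreases as (Tn - t2) increases."""
    return _interpolate(delay_ms, 1.0, MIN_GAIN)


def cutoff_for_delay(delay_ms: float) -> float:
    """Return a low-pass threshold that becomes smaller as (Tn - t2) increases."""
    return _interpolate(delay_ms, MAX_CUTOFF_HZ, MIN_CUTOFF_HZ)


def attenuate(samples: Sequence[float], tn_ms: float, t2_ms: float) -> List[float]:
    """Scale the samples of Asignal2 to obtain the samples of Asignal3."""
    gain = gain_for_delay(tn_ms - t2_ms)
    return [sample * gain for sample in samples]
```

The cutoff returned by `cutoff_for_delay` would be passed to whatever low-pass filter the implementation uses; the filter itself is outside the scope of this sketch.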
The audio processing unit 319 outputs the generated audio Asignal3 to the audio presentation device 303 (step S346). The audio presentation device 303 reproduces and outputs the audio Asignal3 based on the audio Asignal2 returned and transmitted from the base R1 and each of the bases R3 to Rn to the base R2.
As described above, in the second embodiment, the server 3 generates a video Vsignal3 from a video Vsignal2 according to a processing mode based on a current time Tn and a presentation time t1. In a typical example, the server 3 changes the processing mode based on a value of a difference between the current time Tn and the presentation time t1. The server 3 may change the processing mode so as to lower the quality of the video as the value of the difference increases. In this manner, the server 3 can process a video such that the video is inconspicuous when reproduced. In general, in a case where a video projected on a screen or the like is viewed from a certain point X, the video can be clearly visually recognized as long as the distance from the point X to the screen is within a certain range. On the other hand, as the distance increases, the video appears smaller and more blurred and becomes difficult to recognize visually.
The server 3 generates an audio Asignal3 from an audio Asignal2 according to a processing mode based on a current time Tn and a presentation time t2. In a typical example, the server 3 changes the processing mode based on a value of a difference between the current time Tn and the presentation time t2. The server 3 may change the processing mode so as to lower the quality of the audio as the value of the difference increases. In this manner, the server 3 can process an audio such that the audio is hard to hear when reproduced. In general, in a case where an audio reproduced by a speaker or the like is listened to from the certain point X, the audio can be clearly heard at the same time as the sound is generated at the sound source as long as the distance from the point X to the speaker (sound source) is within a certain range. On the other hand, as the distance increases, the sound arrives later than its reproduction time and is attenuated, and becomes difficult to hear.
The server 3 can reduce discomfort due to the magnitude of a data transmission delay time while conveying the state of viewers at a physically separated base by performing processing that reproduces such distance-dependent viewing and listening on the basis of the current time Tn and the presentation time t1 or the current time Tn and the presentation time t2.
In this manner, the server 3 can reduce discomfort felt by viewers when a plurality of videos/audios transmitted from a plurality of bases at different times are reproduced.
The medium processing device may be implemented by one device as described in the above examples, or may be implemented by a plurality of devices in which functions are distributed.
The program may be transferred in a state of being stored in an electronic device, or may be transferred in a state of not being stored in an electronic device. In the latter case, the program may be transferred via a network or may be transferred in a state of being recorded in a recording medium. The recording medium is a non-transitory tangible medium. The recording medium is a computer-readable medium. The recording medium is only required to be a medium that can store a program and can be read by a computer, such as a CD-ROM or a memory card, and any form can be used.
Although the embodiments of the present invention have been described in detail above, the above description is merely an example of the present invention in all respects. It goes without saying that various improvements and modifications can be made without departing from the scope of the present invention. That is, in carrying out the present invention, a specific configuration according to the embodiment may be appropriately employed.
In short, the present invention is not limited to the above-described embodiments as they are, and can be embodied by modifying the constituents without departing from the concept of the invention at the implementation stage. Various inventions can be implemented by appropriately combining a plurality of the constituents disclosed in the above-described embodiments. For example, some constituents may be omitted from all the constituents described in the embodiments. The constituents in different embodiments may be appropriately combined.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/025655 | 7/7/2021 | WO |