An embodiment of the present disclosure relates to a video processing method and a video processing apparatus.
The information processing apparatus of Japanese Unexamined Patent Application Publication No. 2020-190978, which displays a virtual viewpoint video, discloses a configuration in which a viewpoint change requirement includes a relation condition that regulates a positional relationship between attention objects and information on two or more objects, among the objects present in a virtual three-dimensional space, associated with the relation condition, and the viewpoint is changed when the relation condition is satisfied and the viewpoint change requirement is thus established.
The information processing apparatus of Japanese Unexamined Patent Application Publication No. 2020-190978 does not change a viewpoint position according to a factor of a change in a sound, such as a change in a venue of a live event.
An embodiment of the present disclosure is directed to provide a video processing method to change a viewpoint position automatically according to the factor of a change in a sound in the live event and to achieve a meaningful change in the viewpoint position as the live event.
A video processing method according to an embodiment of the present disclosure places an object in a three-dimensional space and outputs a video of a live event as viewed from a viewpoint position set in the three-dimensional space. The video processing method sets a first viewpoint position based on a first factor being a factor related to the object, outputs the video of the live event as viewed from the first viewpoint position, receives a second factor being a factor related to a sound of the live event, calculates a relative position to the object according to the second factor, changes the first viewpoint position to a second viewpoint position based on the calculated relative position, and outputs the video of the live event as viewed from the second viewpoint position.
According to an embodiment of the present disclosure, a meaningful change in a viewpoint position as a live event is able to be achieved.
The video processing apparatus 1 is a personal computer, a smartphone, a tablet computer, or the like.
The communicator 11 communicates with a different apparatus such as a server. The communicator 11 has, for example, a wireless communication function such as Bluetooth (registered trademark) or Wi-Fi (registered trademark), and a wired communication function such as USB or LAN.
The display 15 includes an LCD, an OLED, or the like. The display 15 displays a video that the processor 12 has outputted.
The user I/F 16 is an example of an operator. The user I/F 16 includes a mouse, a keyboard, a touch panel, or the like. The user I/F 16 receives an operation of a user. It is to be noted that the touch panel may be stacked on the display 15.
The audio I/F 17 has an analog audio terminal, a digital audio terminal, or the like, and connects to an acoustic device. In the present embodiment, the audio I/F 17 connects to headphones 20 as an example of the acoustic device and outputs an audio signal to the headphones 20.
The processor 12 includes a CPU, a DSP, or an SoC (System on a Chip) and corresponds to the processor of the present invention. The processor 12 reads out a program from the flash memory 14 being a storage medium, temporarily stores the program in the RAM 13, and thus performs various operations. It is to be noted that the program does not need to be stored in the flash memory 14. The processor 12, for example, may download the program from the different apparatus such as a server and may temporarily store the program in the RAM 13, when necessary.
The processor 12 obtains data related to a live event through the communicator 11. The data related to a live event includes space information, model data of an object, position information on an object, motion data of an object, and audio data of a live event.
The space information is information that shows a shape of a three-dimensional space corresponding to a live venue such as a live music club or a concert hall, for example, and is indicated by three-dimensional coordinates whose origin is located at a certain position. The space information may be coordinate information based on 3DCAD data of a live venue such as a real concert hall, or may be logical coordinate information on a certain unreal live venue (information normalized to a range of 0 to 1).
The model data of an object is three-dimensional CG image data and includes a plurality of image parts.
The position information on an object is information that shows a position of the object in the three-dimensional space. The position information on an object according to the present embodiment is indicated by time-series three-dimensional coordinates according to the time elapsed from the start time of a live event. The object may be an object that, like a device such as a speaker, does not change its position from the start to the end of the live event, or may be an object that, like a performer, changes its position along the time sequence.
The motion data of the object is data showing a placement relationship of the plurality of image parts for moving the model data, and the motion data according to the present embodiment is time-series data according to the time elapsed from the start time of the live event. The motion data is set for an object that, like a performer, changes its position along the time sequence.
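As a non-limiting illustration of the data described above, the following Python sketch shows one possible way to hold the time-series position information and motion data of an object; the class and field names are hypothetical and are not part of the disclosed data format.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectTrack:
    """Hypothetical container for the data related to one object."""
    name: str
    model_path: str                       # three-dimensional CG model data
    # (elapsed_sec, (x, y, z)) samples: a speaker has a single static entry,
    # while a performer has many entries along the time sequence.
    positions: list = field(default_factory=list)
    # (elapsed_sec, pose) samples showing the placement of the image parts.
    motion: list = field(default_factory=list)

    def position_at(self, t):
        """Return the most recent position sample at or before elapsed time t."""
        current = self.positions[0][1] if self.positions else (0.0, 0.0, 0.0)
        for elapsed, pos in self.positions:
            if elapsed <= t:
                current = pos
        return current
```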
The audio data of the live event includes an audio signal of a stereo (L, R) channel, for example. The processor 12 reproduces the audio data of the live event and outputs the audio data to the headphones 20 through the audio I/F 17. The processor 12 may perform effect processing such as reverb processing on the audio signal. The reverb processing is processing to simulate reverberation (a reflected sound) of a certain room by adding a predetermined delay time to a received audio signal and generating a level-adjusted pseudo reflected sound. The processor 12 performs effect processing suitable for the live venue. For example, in a case in which the live venue shown in the space information is a small venue, the processor 12 generates a low-level pseudo reflected sound with a short delay time and a short reverberation time. In contrast, in a case in which the live venue shown in the space information is a large venue, the processor 12 generates a high-level pseudo reflected sound with a long delay time and a long reverberation time.
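The pseudo reflected sound described above can be illustrated, in a simplified and non-limiting form, by the following Python sketch, which adds a single delayed, level-adjusted copy of the received audio signal; the preset delay times and levels are hypothetical values chosen only to show that a larger venue uses a longer delay and a higher level.

```python
import numpy as np

# Hypothetical venue presets: a larger venue gets a longer delay time and a
# higher-level pseudo reflected sound, as described above.
VENUE_PRESETS = {
    "small": {"delay_sec": 0.02, "level": 0.2},
    "large": {"delay_sec": 0.08, "level": 0.5},
}

def add_pseudo_reflection(signal, sample_rate, venue):
    """Add one level-adjusted, delayed copy of the signal as a pseudo
    reflected sound (a minimal stand-in for the reverb processing)."""
    signal = np.asarray(signal, dtype=float)
    preset = VENUE_PRESETS[venue]
    delay_samples = int(preset["delay_sec"] * sample_rate)
    out = np.zeros(len(signal) + delay_samples)
    out[:len(signal)] += signal                       # direct sound
    out[delay_samples:] += preset["level"] * signal   # pseudo reflected sound
    return out
```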
In addition, the audio data of the live event may include an audio signal associated with each object that emits a sound. For example, the speaker objects 53 and 54 are objects that emit a sound. The processor 12 may perform localization processing such that the sounds of the speaker objects 53 and 54 are localized at the positions of the speaker objects 53 and 54, and may generate the audio signal of the stereo (L, R) channel to be outputted to the headphones 20. The details of the localization processing will be described below.
The processor 12 places an object in a three-dimensional space R1 as shown in
The three-dimensional space R1 of
The processor 12 places the stage object 55 at the center in the left-right direction (an X direction), at the rearmost position in the front-rear direction (a Y direction), and at the lowest position in the height direction (a Z direction) of the three-dimensional space R1. In addition, the processor 12 places the speaker object 54 at the leftmost position in the left-right direction (the X direction), at the rearmost position in the front-rear direction (the Y direction), and at the lowest position in the height direction (the Z direction) of the three-dimensional space R1. In addition, the processor 12 places the speaker object 53 at the rightmost position in the left-right direction (the X direction), at the rearmost position in the front-rear direction (the Y direction), and at the lowest position in the height direction (the Z direction) of the three-dimensional space R1.
In addition, the processor 12 places the performer object 51 on the left side in the left-right direction (the X direction), at the rearmost position in the front-rear direction (the Y direction), and on the stage object 55 in the height direction (the Z direction) of the three-dimensional space R1. The processor 12 also places the performer object 52 on the right side in the left-right direction (the X direction), at the rearmost position in the front-rear direction (the Y direction), and on the stage object 55 in the height direction (the Z direction) of the three-dimensional space R1.
Then, the processor 12 receives a first factor being a factor related to an object (S11) and sets a first viewpoint position 50A based on the first factor (S12). For example, the first factor is a factor related to the stage object 55. The processor 12 receives the designation of "a front position away from the stage object 55 by a predetermined distance" as the first factor in the first processing when the application program is started.
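The setting of the first viewpoint position 50A from such a designation can be illustrated by the following non-limiting Python sketch, which places a viewpoint at a predetermined distance in front of an object; the function name, the assumed front direction of the stage, and the coordinate values are hypothetical.

```python
def front_viewpoint(object_position, front_direction, distance):
    """Return a viewpoint placed at a predetermined distance in front of an
    object, where front_direction is assumed to be a unit vector pointing
    from the object toward the audience area."""
    ox, oy, oz = object_position
    dx, dy, dz = front_direction
    return (ox + dx * distance, oy + dy * distance, oz + dz * distance)

# Example (hypothetical coordinates): assuming the stage object 55 faces the
# -Y direction, the first viewpoint position 50A is set in front of it.
viewpoint_50a = front_viewpoint((0.5, 1.0, 0.0), (0.0, -1.0, 0.0), 0.3)
```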
Alternatively, a user may designate a specific object through the user I/F 16. For example, the processor 12 displays a schematic view of the three-dimensional space R1 of
The first viewpoint position 50A set as described above corresponds to a position of the user who views and listens to content, in the three-dimensional space R1. The processor 12 outputs the video of the live event as viewed from the first viewpoint position 50A as shown in
It is to be noted that the video may be outputted and displayed on the display 15 of the own apparatus or may be outputted to the different apparatus through the communicator 11.
Next, the processor 12 determines whether or not the second factor, which is a factor related to the sound of the live event, has been received (S14). For example, the user performs an operation of changing the present live venue to a different live venue through the user I/F 16.
Until determining in S14 that the second factor has been received, the processor 12 repeats the processing of outputting the video based on the first viewpoint position 50A (S14: NO->S13). As described above, the position information and motion data of an object are time-series data according to the time elapsed from the start time of the live event. Therefore, as the live event progresses, the performer object 51 and the performer object 52 change position or motion.
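The flow of steps S13 to S16 can be summarized by the following non-limiting control-flow sketch in Python; the processor methods used here are hypothetical stand-ins for the processing described in this section.

```python
def run_live_output(processor):
    """Minimal sketch of the loop around S13 to S16 (method names hypothetical)."""
    viewpoint = processor.first_viewpoint            # 50A, set in S12
    while not processor.live_event_finished():
        if processor.received_second_factor():       # S14: YES
            relative = processor.calculate_relative_position()
            viewpoint = processor.second_viewpoint(relative)   # S15 -> 50B
        processor.output_video(viewpoint)            # S13 / S16
```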
The second factor related to the sound of the live event is, for example, a change in the live venue. As described above, the processor 12 generates a pseudo reflected sound according to the size of the live venue shown in the space information. Therefore, when the live venue is changed, the sound of the live event also changes. Accordingly, the change in the live venue is a factor related to the sound of the live event.
The three-dimensional space R2 of
The processor 12, when receiving the operation to change from the three-dimensional space R1 to the three-dimensional space R2 in the determination of S14 (S14: YES), calculates a relative position to a certain object (the stage object 55 in the above example) and changes the first viewpoint position 50A to the second viewpoint position 50B based on the calculated relative position (S15).
Specifically, the processor 12 first obtains coordinates of the performer object 51, the performer object 52, the speaker object 53, the speaker object 54, and the stage object 55 in the three-dimensional space R2.
The processor 12 transforms the coordinates of the speaker object 53 from the first coordinates in the three-dimensional space R1 to the second coordinates in the three-dimensional space R2. In the example shown by
The processor 12 obtains a centroid G of the eight reference points before transformation and a centroid G′ of the eight reference points after transformation, and then generates triangular meshes by using these centroids as centers.
The processor 12 transforms the internal space of each triangle before transformation into the internal space of the corresponding triangle after transformation by a predetermined coordinate transformation. The transformation uses an affine transformation, for example. The affine transformation is an example of a geometric transformation. The affine transformation expresses the x coordinate (x′) and the y coordinate (y′) after transformation as functions of the x coordinate (x) and the y coordinate (y) before transformation. In other words, the affine transformation performs the coordinate transformation by the following equations: x′=ax+by+c and y′=dx+ey+f. The coefficients a to f are uniquely obtained from the coordinates of the three vertices of a triangle before transformation and the coordinates of the corresponding three vertices after transformation. The processor 12 transforms the first coordinates into the second coordinates by obtaining affine transformation coefficients in the same way for all the triangles. It is to be noted that the coefficients a to f may be obtained by the least-squares method.
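The calculation of the affine coefficients from a pair of corresponding triangles can be illustrated by the following non-limiting Python sketch using NumPy; the function names are hypothetical, and non-degenerate triangles are assumed.

```python
import numpy as np

def affine_coefficients(src, dst):
    """Solve x' = a*x + b*y + c and y' = d*x + e*y + f from the three
    corresponding vertices src[i] -> dst[i] of a triangle pair.
    Returns (a, b, c, d, e, f)."""
    A = np.column_stack([np.asarray(src, dtype=float), np.ones(3)])
    dst = np.asarray(dst, dtype=float)
    a, b, c = np.linalg.solve(A, dst[:, 0])
    d, e, f = np.linalg.solve(A, dst[:, 1])
    return a, b, c, d, e, f

def apply_affine(coeffs, point):
    a, b, c, d, e, f = coeffs
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)

# Example: a triangle in the space R1 mapped to a twice-as-large triangle in
# the space R2; a point inside the triangle (e.g., a viewpoint) follows it.
coeffs = affine_coefficients(src=[(0, 0), (4, 0), (0, 3)],
                             dst=[(0, 0), (8, 0), (0, 6)])
print(apply_affine(coeffs, (1.0, 1.0)))  # -> (2.0, 2.0)
```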
It is to be noted that the meshes may be meshes of a polygonal shape other than a triangle, or of a combination of polygonal shapes. For example, the processor 12, as shown in
x′=x0+(x1−x0)x+(x3−x0)y+(x0−x1+x2−x3)xy
y′=y0+(y1−y0)x+(y3−y0)y+(y0−y1+y2−y3)xy
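These equations can be applied as in the following non-limiting Python sketch, under the assumption that (x, y) are coordinates normalized to a range of 0 to 1 within the quadrangle before transformation and that (x0, y0) to (x3, y3) are the four corners of the quadrangle after transformation taken in order around the quadrangle.

```python
def bilinear_map(quad, x, y):
    """Map normalized coordinates (x, y) in [0, 1] x [0, 1] into the
    quadrangle with corners quad = [(x0, y0), (x1, y1), (x2, y2), (x3, y3)]."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    xp = x0 + (x1 - x0) * x + (x3 - x0) * y + (x0 - x1 + x2 - x3) * x * y
    yp = y0 + (y1 - y0) * x + (y3 - y0) * y + (y0 - y1 + y2 - y3) * x * y
    return xp, yp

# The four corners of the unit square map onto the four corners of the
# quadrangle: (0, 0) -> (x0, y0), (1, 0) -> (x1, y1), (1, 1) -> (x2, y2),
# and (0, 1) -> (x3, y3).
```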
The transformation method may be any other geometric transformation, such as isometric mapping, homothetic transformation, or projective transformation. For example, the projective transformation may be expressed by the following equations: x′=(ax+by+c)/(gx+hy+1) and y′=(dx+ey+f)/(gx+hy+1). The coefficients are obtained in the same way as in the case of the above affine transformation. For example, the eight coefficients (a to h) of the projective transformation for a quadrangle are uniquely obtained from a set of eight simultaneous equations, two for each pair of corresponding vertices. Alternatively, the coefficients may be obtained by the least-squares method.
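Solving for the eight projective coefficients from four pairs of corresponding vertices can be illustrated by the following non-limiting Python sketch; the function names are hypothetical, and the four correspondences are assumed to be non-degenerate.

```python
import numpy as np

def projective_coefficients(src, dst):
    """Solve a to h of x' = (a*x + b*y + c) / (g*x + h*y + 1) and
    y' = (d*x + e*y + f) / (g*x + h*y + 1) from four vertex
    correspondences src[i] -> dst[i] (eight simultaneous equations)."""
    rows, rhs = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp]); rhs.append(xp)
        rows.append([0, 0, 0, x, y, 1, -x * yp, -y * yp]); rhs.append(yp)
    return np.linalg.solve(np.array(rows, float), np.array(rhs, float))

def apply_projective(coeffs, point):
    a, b, c, d, e, f, g, h = coeffs
    x, y = point
    w = g * x + h * y + 1.0
    return ((a * x + b * y + c) / w, (d * x + e * y + f) / w)
```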
Although the above example shows the coordinate transformation in the two-dimensional space (X, Y), coordinates in the three-dimensional space (X, Y, Z) are also transformable by the same method.
Accordingly, each object is transformed into the coordinates according to the shape of the three-dimensional space R2.
In addition, in this example, the processor 12 also transforms the coordinates of the first viewpoint position 50A based on the above transformation method. Accordingly, the processor 12 obtains the second viewpoint position 50B being a relative position to the stage object 55 in the three-dimensional space R2.
Then, the processor 12 outputs a video of the live event as viewed from the second viewpoint position 50B as shown in
However, the coordinates of the performer object 51, the performer object 52, the speaker object 53, the speaker object 54, and the stage object 55 are transformed according to the change in the live venue. Therefore, the respective objects are placed farther away from each other, in accordance with the three-dimensional space R2, which is larger than the three-dimensional space R1.
In this way, the video processing apparatus 1 according to the present embodiment is able to change a viewpoint position automatically according to the factor of a change in a sound in the live event and to achieve a meaningful change in the viewpoint position as the live event.
The processor 12 obtains an audio signal according to a live event and performs acoustic processing on the obtained audio signal. The processor 12, in a case in which a live venue is changed, generates a reflected sound according to the changed live venue. For example, the processor 12 generates a high-level pseudo reflected sound with long delay time and reverberation time according to the three-dimensional space R2 larger than the three-dimensional space R1.
The processor 12 may perform acoustic processing using the first viewpoint position 50A and the second viewpoint position 50B as a listening point. The acoustic processing using the first viewpoint position 50A and the second viewpoint position 50B as the listening point is localization processing described above, for example.
The processor 12, when outputting the video of the live event as viewed from the first viewpoint position 50A, performs localization processing such that the sounds of the speaker objects 53 and 54 are localized at the positions of the speaker objects 53 and 54, for example (S21). It is to be noted that either the processing of S13 or the processing of S21 may be performed first, or both may be performed concurrently. In addition, since the performer object 51 and the performer object 52 are also objects that emit a sound, the processor 12 may perform localization processing such that the sound of each of the performer object 51 and the performer object 52 is localized at the position of the corresponding performer object.
The processor 12 performs localization processing based on HRTF (Head Related Transfer Function), for example. The HRTF expresses a transfer function from a virtual sound source position to the right ear and left ear of a user. As shown in
Then, the processor 12, after transforming the coordinates of each object, performs acoustic processing using the second viewpoint position 50B as the listening point (S22). It is to be noted that either the processing of S16 or the processing of S22 may be performed first, or both may be performed concurrently.
The processor 12 performs the binaural processing to convolve the HRTF so as to localize at a position on the left side in front of the user, on the audio signal corresponding to the speaker object 53. However, the position of the speaker object 53 as viewed from the second viewpoint position 50B in the three-dimensional space R2 is farther to the left side and in a depth direction than the position of the speaker object 53 as viewed from the first viewpoint position 50A in the three-dimensional space R1. Therefore, the processor 12 performs the binaural processing to convolve the HRTF so as to localize the sound of the speaker object 53 at a position far to the left side and in the depth direction. Similarly, the processor 12, on an audio signal corresponding to the speaker object 54, performs the binaural processing to convolve the HRTF so as to localize the sound of the speaker object 54 at a position far to the right side and in the depth direction.
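The binaural processing described above can be illustrated by the following non-limiting Python sketch, which convolves a mono object signal with a left-ear and a right-ear head-related impulse response (HRIR); the toy impulse responses are hypothetical, and an actual implementation would select measured HRIRs according to the direction of the object as seen from the listening point.

```python
import numpy as np

def binaural_localize(mono, hrir_left, hrir_right):
    """Convolve a mono object signal with the HRIRs for the direction of the
    object as seen from the listening point, yielding an L/R headphone pair."""
    return np.convolve(mono, hrir_left), np.convolve(mono, hrir_right)

# Toy example (real HRIRs would be measured data): when the viewpoint changes
# from 50A to 50B, a different HRIR pair is selected for the new direction of
# the speaker object 53, and the convolution is performed again.
mono = np.random.default_rng(0).standard_normal(48000)
hrir_l = np.array([1.0, 0.5, 0.25])   # hypothetical left-ear HRIR
hrir_r = np.array([0.6, 0.3, 0.15])   # hypothetical right-ear HRIR
left, right = binaural_localize(mono, hrir_l, hrir_r)
```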
In this manner, as the position of an object changes visually according to the change in the live venue, the localization position of the sound of the object also changes. Therefore, the video processing apparatus 1 according to the present embodiment is able to achieve a visually and auditorily meaningful change in a viewpoint position as a live event.
In addition, the processor 12 may perform acoustic processing using the first viewpoint position 50A and the second viewpoint position 50B as the listening point also in processing to generate the reflected sound in the reverb processing.
The reflected sound in the reverb processing is generated, for example, by convolving an impulse response previously measured at an actual live venue into an audio signal. The processor 12 performs the binaural processing to convolve the HRTF into the audio signal of the live event, based on the impulse response previously measured at the live venue and the position of each reflected sound corresponding to the impulse response. The position of each reflected sound in an actual live venue is obtainable by installing a plurality of microphones at the listening points of the actual live venue and measuring the impulse response. Alternatively, the reflected sound may be generated based on a simulation. The processor 12 may calculate a delay time, a level, and an arrival direction of the reflected sound, based on the position of a sound source, the position of a wall surface of the live venue based on 3DCAD data or the like, and the position of the listening point.
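The simulation-based calculation of the delay time, level, and arrival direction can be illustrated by the following non-limiting Python sketch using the image-source method for a single wall; the wall is assumed to lie in the plane x = wall_x, and the simple 1/r attenuation model is chosen only for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def first_order_reflection(source, listener, wall_x):
    """Estimate delay, level, and arrival direction of the reflection from a
    wall in the plane x = wall_x by mirroring the source across the wall."""
    source = np.asarray(source, dtype=float)
    listener = np.asarray(listener, dtype=float)
    image = source.copy()
    image[0] = 2.0 * wall_x - source[0]      # image source behind the wall
    path = image - listener
    distance = float(np.linalg.norm(path))
    delay = distance / SPEED_OF_SOUND        # seconds
    level = 1.0 / max(distance, 1e-6)        # simple 1/r attenuation
    direction = path / distance              # unit vector toward the image
    return delay, level, direction
```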
The processor 12, in a case in which the live venue is changed, obtains the impulse response according to the changed live venue, performs the binaural processing, and then performs the acoustic processing to generate the reflected sound according to the changed live venue. The change in the live venue also changes the position of each reflected sound in the impulse response. The processor 12 performs the binaural processing to convolve the HRTF into the audio signal of the live event, based on the position of each changed reflected sound.
As a result, when the live venue is visually changed, the reflected sound also changes to a sound corresponding to the changed live venue. Therefore, the video processing apparatus 1 according to the present embodiment is able to achieve a visually and auditorily meaningful change in a viewpoint position as a live event and to give the user a new customer experience.
It is to be noted that the reflected sound in the reverb processing includes an early reflected sound and a late reverberant sound. Therefore, the processor 12 may apply different processing to the early reflected sound and the late reverberant sound. For example, the early reflected sound may be generated by simulation, and the late reverberant sound may be generated based on the impulse response measured at the live venue.
The above embodiment showed a change in the live venue as an example of the second factor being a factor related to the sound of the live event. In Modification 1, the second factor is a factor of a musical change at the live event. The factor of a musical change is a transition, for example, from a state in which all performers are playing or singing to a state of a solo part for a certain performer. Alternatively, the factor of a musical change also includes a transition from a solo part for a certain performer to a solo part for another performer.
In Modification 1, the data related to a live event includes information that shows timing of a musical change and information that shows content of a musical change. The processor 12, based on the information, calculates a relative position to an object and changes the first viewpoint position 50A to the second viewpoint position 50B based on the calculated relative position. For example, in a case in which the data related to a live event includes information that shows transition timing to a guitar solo, the processor 12 calculates a relative position to the performer object 51 and changes the first viewpoint position 50A to the second viewpoint position 50B based on the relative position.
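The change of the viewpoint position at the timing of a musical change can be illustrated by the following non-limiting Python sketch; the event list, object names, and offset values are hypothetical.

```python
# Hypothetical time-series information on musical changes:
# (elapsed_sec, target_object, description)
musical_events = [
    (120.0, "performer_51", "guitar solo starts"),
    (180.0, "performer_52", "vocal solo starts"),
]

def viewpoint_for_time(t, events, relative_offset, positions, default):
    """Return a viewpoint at a fixed offset relative to the object of the
    most recent musical change at time t, or the default viewpoint."""
    target = None
    for elapsed, obj, _ in events:
        if elapsed <= t:
            target = obj
    if target is None:
        return default
    px, py, pz = positions[target]
    ox, oy, oz = relative_offset
    return (px + ox, py + oy, pz + oz)

# Example: at t = 130 s the guitar solo has started, so the viewpoint is
# placed relative to the performer object 51.
positions = {"performer_51": (0.2, 1.0, 0.1), "performer_52": (0.8, 1.0, 0.1)}
print(viewpoint_for_time(130.0, musical_events, (0.0, -0.3, 0.05),
                         positions, default=(0.5, 0.2, 0.1)))
```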
In this manner, since the video processing apparatus 1 according to Modification 1 sets the second viewpoint position 50B at the relative position to the performer object 51 of the guitar at the timing of the transition to a guitar solo, the video processing apparatus 1 is able to achieve a meaningful change in the viewpoint position as the live event in a case in which a factor of a musical change, such as a transition to a solo part, occurs at the live event.
Although the stage object 55 is designated as the first factor in the above embodiment, for example, the processor 12, in the processing of S11 in
The video processing apparatus 1 of Modification 2 receives an adjustment operation of the first viewpoint position 50A or the second viewpoint position 50B from a user. The processor 12 changes the first viewpoint position 50A or the second viewpoint position 50B according to the received adjustment operation. For example, the processor 12, in a case of receiving an operation of adjusting the height of the second viewpoint position 50B to a high position, changes the second viewpoint position 50B to a high position.
The video processing apparatus 1 of Modification 3 records information that shows a viewpoint position as time-series data.
For example, the processor 12, in a case of receiving a selection of the performer object 52 of a favorite vocal from a user in the processing of S11 in
Alternatively, the processor 12, as with Modification 2, in a case of receiving an adjustment operation of the second viewpoint position 50B from a user, may record the adjusted second viewpoint position 50B as time-series data.
As a result, the user can construct camera work data in which the user's own adjustment history is further added to the viewpoint position that changes automatically according to a factor related to the sound of the live event. The user can also release the constructed camera work data, or can obtain camera work data released by a different user and view, listen to, and enjoy the live event with the camera work data of the different user.
In the above embodiment, the first viewpoint position 50A and the second viewpoint position 50B correspond to the position of the user who views and listens to content, in the three-dimensional space. In contrast, in the video processing apparatus 1 of Modification 4, the first viewpoint position 50A and the second viewpoint position 50B are set at a different position from the position of the user who views and listens to content.
In addition, in this example, the processor 12 obtains information that shows a user position of a different user and places a different user object 75. The user position of the different user is received by the video processing apparatus of each user and is obtained through a server.
In the example of
Herein, as described in Modification 1, even when the processor 12 sets the second viewpoint position 50B at the relative position to the performer object 51 of the guitar at the timing of the transition to a guitar solo, the position of the user object 70 or the different user object 75 does not change.
For example, at a live event, a specific event may be completed by a plurality of participants at a certain timing. One example is a participatory event in which a penlight owned by each participant in the live event is turned on in a designated color at a designated position so that a specific pattern is provided throughout the live venue.
The video processing apparatus 1 according to Modification 4 may also receive such a participatory event as the second factor, which is a factor related to the live event, and may change the first viewpoint position to the second viewpoint position according to the second factor. For example, in a case in which all users who participate in the live event are placed at the designated user positions, the video processing apparatus 1 sets the second viewpoint position 50B at a position from which the entire live venue is able to be overviewed.
The video processing apparatus 1 according to Modification 4 receives a user position in addition to the first viewpoint position 50A and the second viewpoint position 50B and places the user object 70 and the different user object 75. As a result, in a case in which an event in which a specific pattern is provided throughout the live venue, as described above, is completed, the pattern is also able to be overviewed.
The description of the present embodiments is illustrative in all points and should not be construed to limit the present invention. The scope of the present invention is defined not by the foregoing embodiments but by the following claims. Further, the scope of the present invention includes the scopes of the claims and the scopes of equivalents.
For example, the present embodiment shows a musical live event in which a singer and a guitarist perform, as an example of a live event. However, a live event also includes an event such as a play, a musical, a lecture, a reading, or a game tournament. For example, for a game tournament event, a virtual live venue in which a plurality of player objects, a screen object that displays a game screen, a chairperson object, an audience object, a speaker object that emits the sound of a game, and the like are placed is able to be set. In a case in which a user designates a first player object as the first factor, the video processing apparatus outputs a video of the live venue in which the first player object is captured. Then, in a case in which, for example, a second player wins, which is the second factor related to the live event, the video processing apparatus outputs a video in which the second player object is captured. In this manner, the video processing apparatus is also able to achieve a meaningful change in a viewpoint position as a live event in these other types of events.
This application is a continuation of PCT Application No. PCT/JP2023/009400, filed on Mar. 10, 2023, which claims priority to Japanese Application No. 2022-047792, filed on Mar. 24, 2022. The contents of these applications are incorporated herein by reference in their entirety.