This Nonprovisional application claims priority under 35 U.S.C. § 119(a) to Japanese Patent Application No. 2022-044126 filed on Mar. 18, 2022, the entire content of which is hereby incorporated by reference.
An embodiment of the present disclosure relates to an information processing method and an information processing apparatus.
International Publication No. 2021/241421 discloses a sound processing apparatus that obtains an image of an acoustic space. The sound processing apparatus sets a plane and a virtual speaker from the image of the acoustic space. The sound processing apparatus calculates a sound pressure distribution from characteristics of the virtual speaker, and generates an image in which the sound pressure distribution is superimposed on the plane.
Japanese Unexamined Patent Application Publication No. 2008-035251 discloses a speaker apparatus and a remote controller. The speaker apparatus measures a position of the remote controller. The speaker apparatus directs a sound beam to the position of the remote controller.
A user cannot visually recognize a direction of the sound beam to be outputted from an acoustic device such as a speaker.
An embodiment of the present disclosure is directed to provide an information processing method in which a user can visually recognize a direction of a sound beam to be outputted from an acoustic device such as a speaker.
An information processing method according to an embodiment of the present disclosure obtains first position information that indicates a position of at least one of a ceiling surface, a wall surface, or a floor surface in a predetermined space, obtains second position information that indicates a position of an acoustic device that outputs a sound beam in the predetermined space, and obtains direction information that indicates a direction of the sound beam to be outputted from the acoustic device; calculates a locus of the sound beam to be outputted from the acoustic device, based on the first position information, the second position information, and the direction information that have been obtained; and generates a sound beam image that shows the locus of the sound beam, based on a result of calculation.
According to the information processing method according to an embodiment of the present disclosure, a user can visually recognize a direction of a sound beam to be outputted from a speaker.
Hereinafter, MR (Mixed Reality) goggles 1 that execute an information processing method according to a first embodiment will be described with reference to the drawings.
The MR goggles 1 are an example of an information processing apparatus. A user wearing the MR goggles 1 can visually recognize an image being displayed on the MR goggles 1 while visually recognizing a real space through the MR goggles 1.
As shown in the drawings, the MR goggles 1 communicate with a speaker 2 placed in a space Sp.
As shown in the drawings, the MR goggles 1 include a communication interface 10, a flash memory 11, a RAM 12, a processor 13, a display 14, and a sensor 15.
The communication interface 10 may be a network interface or the like. The communication interface 10 communicates with the speaker 2 by wireless such as Wi-Fi (registered trademark) or Bluetooth (registered trademark), for example.
The flash memory 11 stores various programs. The various programs may include a program that operates the MR goggles 1, for example.
The RAM 12 temporarily stores a predetermined program stored in the flash memory 11.
The processor 13 executes various types of processing by reading out the predetermined program stored in the flash memory 11 to the RAM 12. It is to be noted that the processor 13 does not necessarily need to execute the program stored in the flash memory 11. The processor 13, for example, may download a program from a device (a server or the like, for example) outside the MR goggles 1 through the communication interface 10, and may read out a downloaded program to the RAM 12.
The display 14 displays various information based on an operation of the processor 13. In the present embodiment, the display 14 of the MR goggles 1 is an organic EL display including a half mirror and a light emitting element, for example. The user can see a display content (an image or the like) reflected by the half mirror. The half mirror transmits light incident from the front of the user. Therefore, the user can also visually recognize the real space through the half mirror.
The sensor 15 senses an environment around the MR goggles 1 to obtain data. In the present embodiment, the sensor 15 of the MR goggles 1, as shown in the drawings, is a stereo camera.
In addition, as shown in the drawings, the stereo camera captures the space around the MR goggles 1 to obtain image data DD, and measures a distance to an object (the speaker 2, a ceiling surface CS, a wall surface WS, or a floor surface FS) from a parallax between the two captured images.
It is to be noted that the sensor 15 may not necessarily be a stereo camera. The sensor 15 may be LiDAR (Light Detection And Ranging) or the like, for example. The LiDAR measures a distance to an object (the speaker 2, the ceiling surface CS, the wall surface WS, or the floor surface FS) by obtaining a time from irradiation of laser light until detection of the laser light reflected by the object.
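As an illustration of the time-of-flight principle described above, the following sketch computes a distance from the laser round-trip time. The names and values are hypothetical and are not part of the disclosure; it is a minimal sketch, not the LiDAR's actual implementation.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def lidar_distance(round_trip_s: float) -> float:
    # The laser travels to the object and back, so the one-way
    # distance is half the round-trip path length.
    return SPEED_OF_LIGHT * round_trip_s / 2.0

print(lidar_distance(20e-9))  # a 20 ns round trip corresponds to about 3 m
```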
The speaker 2 outputs a sound on the basis of an audio signal. The speaker 2 outputs the sound beam B1 with directivity (see the drawings). As shown in the drawings, the speaker 2 includes a communication interface 20, a user interface 21, a flash memory 22, a RAM 23, an audio interface 24, a processor 25, a plurality of DA converters 26, a plurality of amplifiers 27, and a plurality of speaker units 28.
The communication interface 20 may be a network interface or the like. The communication interface 20 communicates with the MR goggles 1 by wireless such as Wi-Fi (registered trademark) or Bluetooth (registered trademark), for example, or by wire.
The user interface 21 receives various operations from a user. The user interface 21 may be a remote controller, for example. The user sets an angle (an angle seen from the speaker 2) at which the sound beam B1 is outputted, by operating the remote controller (by a button operation or the like).
In the present embodiment, the speaker 2 is placed on the ceiling surface CS constituting the space Sp, for example (see the drawings).
The flash memory 22 stores various programs. The various programs may include a program that operates the speaker 2, for example.
The RAM 23 temporarily stores a predetermined program stored in the flash memory 22.
The audio interface 24 receives an audio signal from an apparatus different from the speaker 2 by wireless such as Wi-Fi (registered trademark) or Bluetooth (registered trademark) or by wire. The apparatus different from the speaker 2 may be a not-shown PC, a smartphone, or the like, for example.
The processor 25 executes various types of processing by reading out the predetermined program stored in the flash memory 22 to the RAM 23. The processor 25 may be a CPU or a DSP (Digital Signal Processor), for example. It is to be noted that the processor 25 may include both the CPU and the DSP. It is to be noted that the processor 25 does not necessarily need to execute the program stored in the flash memory 22. The processor 25, for example, may download a program from a device (a server or the like, for example) outside the speaker 2 through the communication interface 20, and may read out a downloaded program to the RAM 23.
The processor 25 receives information (hereinafter referred to as direction information DI) that indicates a direction of the sound beam B1 to be outputted from the speaker 2, according to the operation received by the user interface 21. The direction information DI specifically indicates an angle θ, an angle φ, or the like.
The processor 25 performs signal processing on a digital audio signal received through the audio interface 24. The signal processing may include processing to generate the sound beam B1, for example. The processor 25 adjusts a delay amount, based on the received direction information DI, so that the phases of the sounds to be outputted from the plurality of speaker units 28 are aligned in a predetermined direction. In such a case, the processor 25 performs delay control, based on the adjusted delay amount, on an audio signal to be supplied to each of the plurality of speaker units 28. As a result, the sounds outputted from the plurality of speaker units 28 mutually strengthen one another in the predetermined direction. In other words, the processor 25 performs the delay control on the audio signal to be supplied to each of the plurality of speaker units 28 so that the sounds mutually strengthen one another in the direction (the angle θ and the angle φ) that has been set by the user.
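The delay control described above is, in essence, delay-and-sum beamforming. The following minimal sketch computes per-unit delays for a uniform linear array steered by a single angle; the array geometry, the spacing, and the function name are assumptions for illustration, since the speaker 2 steers with two angles (θ, φ) and its unit layout is not specified in the disclosure.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def steering_delays(num_units: int, spacing_m: float, steer_deg: float) -> np.ndarray:
    # Extra distance each unit's wavefront must travel toward the
    # steering direction, converted to a delay; subtracting the minimum
    # keeps every delay non-negative and therefore realizable.
    positions = np.arange(num_units) * spacing_m
    delays = positions * np.sin(np.radians(steer_deg)) / SPEED_OF_SOUND
    return delays - delays.min()

# Example: eight units spaced 5 cm apart, beam steered 30 degrees off axis.
print(steering_delays(8, 0.05, 30.0))
```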
The plurality of DA converters 26 receive the digital audio signal on which the processor 25 has performed the signal processing. The plurality of DA converters 26 obtain an analog audio signal by DA-converting the received digital audio signal. The plurality of DA converters 26 send the analog audio signal to the plurality of amplifiers 27.
The plurality of amplifiers 27 amplify the received analog audio signal. Each of the plurality of amplifiers 27 sends an amplified analog audio signal to a corresponding one of the plurality of speaker units 28.
The plurality of speaker units 28 emit a sound, based on the analog audio signal received from the plurality of amplifiers 27.
It is to be noted that the speaker 2 does not necessarily need to receive the direction in which the sound beam B1 is outputted, based on a user operation to the user interface 21. The speaker 2 may receive information according to the direction in which the sound beam B1 is outputted from a not-shown PC, a smartphone, or the like, through the communication interface 20, for example. In such a case, an application program for setting the direction in which the sound beam B1 is outputted is installed on the PC, the smartphone, or the like, for example. The application program receives the direction information DI according to an operation from a user. The application program sends the direction information DI to the speaker 2.
Hereinafter, processing (hereinafter referred to as processing P) relating to visualization of the sound beam B1 in the MR goggles 1 will be described with reference to the drawings.
The processor 13, as shown in the drawings, functions as an obtainer 130, a calculator 131, and a generator 132 by executing the program stored in the flash memory 11.
The processor 13 starts the processing P when the MR goggles 1 start up or when a predetermined application program according to the processing P is executed, for example.
After a start, the obtainer 130 obtains the image data DD captured by the sensor 15 (S11).
Next, the obtainer 130 performs image processing (first image processing of the present disclosure) to recognize the ceiling surface CS, the wall surface WS, or the floor surface FS from the image data DD (first image data obtained by capturing the ceiling surface CS, the wall surface WS, or the floor surface FS) (S12).
Subsequently, the obtainer 130 obtains position information FLI (first position information of the present disclosure) that indicates a position of the ceiling surface CS, the wall surface WS, or the floor surface FS in a predetermined space (S13).
The obtainer 130 similarly obtains the position information FLI on each surface (the wall surface WS and the floor surface FS). The MR goggles 1 are able to automatically obtain the position information FLI by the first image processing.
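One conceivable way to derive the position information FLI from stereo-camera depth samples is a least-squares plane fit, sketched below. The disclosure does not specify how the surface positions are computed, so this method and all names in it are assumptions for illustration.

```python
import numpy as np

def fit_plane(points: np.ndarray):
    # points: N x 3 array of 3-D samples lying on one surface
    # (a wall, the ceiling, or the floor). Returns (centroid, unit normal);
    # the normal is the direction of least variance, i.e. the last
    # right-singular vector of the centered point cloud.
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[-1]

# Example: noisy samples of the floor plane z = 0.
rng = np.random.default_rng(0)
pts = np.c_[rng.uniform(0, 4, 100), rng.uniform(0, 4, 100), rng.normal(0, 0.01, 100)]
print(fit_plane(pts))  # normal close to (0, 0, +/-1)
```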
Subsequently, the obtainer 130 performs image processing (second image processing of the present disclosure) to recognize the speaker 2 (the acoustic device) from the image data DD (second image data obtained by capturing the speaker 2) (S14).
It is to be noted that the MR goggles 1, as with the first image processing, may recognize the speaker 2 by object recognition processing using artificial intelligence, for example. In such a case, the obtainer 130 recognizes the speaker 2 by using a learned model obtained by machine learning of a relationship between an inputted image and an object such as the speaker 2.
Subsequently, the obtainer 130 obtains position information SLI (second position information of the present disclosure) that indicates the position of the speaker 2 that outputs the sound beam B1 in the space Sp (inside the predetermined space) (S15).
Subsequently, the obtainer 130 obtains the direction information DI that indicates the direction of the sound beam B1 to be outputted from the speaker 2 (S16).
Subsequently, the calculator 131 calculates the locus of the sound beam B1 to be outputted from the speaker 2, based on the position information FLI, the position information SLI, and the direction information DI that have been obtained (S17).
The calculator 131 calculates the direction in which the sound beam B1 is outputted in the space Sp, based on the direction information DI. Specifically, the calculator 131 obtains the angle θ and the angle φ from the speaker 2 as the direction information DI. The angle θ and the angle φ are angles in a polar coordinate system with reference to the position of the speaker 2. Therefore, the calculator 131 obtains a slope (l, m, n) in the three-dimensional rectangular coordinate system corresponding to the angle θ and the angle φ. The calculator 131 defines a straight line (x, y, z) = (x1, y1, z1) + t(l, m, n) (where t is an arbitrary value) passing through the position (x1, y1, z1) of the speaker 2. In addition, the calculator 131 obtains coordinates Cd2 of an intersecting position at which the straight line intersects the floor surface FS or the wall surface WS (see the drawings).
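The calculation in the preceding paragraph amounts to converting the polar angles into a direction vector and intersecting a ray with a plane. A minimal sketch follows; the axis convention for (θ, φ) and the numeric example are assumptions, since the disclosure does not fix them.

```python
import numpy as np

def slope_from_angles(theta_deg: float, phi_deg: float) -> np.ndarray:
    # Assumed convention: theta is the azimuth in the x-y plane,
    # phi is the elevation from that plane.
    theta, phi = np.radians(theta_deg), np.radians(phi_deg)
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])

def intersect(origin, slope, plane_point, plane_normal):
    # Solve (origin + t * slope - plane_point) . normal = 0 for t >= 0;
    # returns None if the ray is parallel to or points away from the plane.
    n, o, s = map(np.asarray, (plane_normal, origin, slope))
    denom = np.dot(n, s)
    if abs(denom) < 1e-9:
        return None
    t = np.dot(n, np.asarray(plane_point) - o) / denom
    return o + t * s if t >= 0 else None

# Speaker at (1, 2, 2.4) m on the ceiling, beam steered down toward the floor z = 0.
print(intersect([1, 2, 2.4], slope_from_angles(30, -60), [0, 0, 0], [0, 0, 1]))
```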
Lastly, the generator 132 generates a sound beam image that shows the locus of the sound beam B1, based on a result of the calculation of the locus of the sound beam B1 (S18). For example, the generator 132 performs calculation that matches the calculated three-dimensional coordinates with two-dimensional coordinates on the display 14.
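Matching the three-dimensional locus with two-dimensional display coordinates can be sketched with a simple pinhole projection. The focal length and principal point below are hypothetical, and the actual mapping of the display 14 is not specified in the disclosure.

```python
def project_to_display(point_3d, focal_px=800.0, cx=640.0, cy=360.0):
    # Pinhole model in the viewer's camera frame: x right, y down,
    # z forward. Points at or behind the viewer are not drawable.
    x, y, z = point_3d
    if z <= 0:
        return None
    return (focal_px * x / z + cx, focal_px * y / z + cy)

# Each 3-D point on the calculated locus maps to a pixel of the sound beam image.
print(project_to_display((0.5, -0.2, 2.0)))  # -> (840.0, 280.0)
```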
The above processing from step S11 to step S18 completes execution of a series of the processing P in the MR goggles 1.
The MR goggles 1 according to the present embodiment display the generated sound beam image on the display 14. As a result, the user can visually recognize the locus of the sound beam B1 to be outputted from the speaker 2, and thus the direction of the sound beam B1. Accordingly, the user can more easily adjust the sound beam B1. For example, the user can correctly adjust the angle of the sound beam B1 or the like by seeing the visualized sound beam B1. Therefore, compared with a case of adjusting the sound beam B1 only by listening to a sound, the user can more easily orient the sound beam B1 in a desired direction.
It is to be noted that the speaker 2 does not necessarily need to be placed in the closed space Sp including the ceiling surface CS, the wall surface WS, and the floor surface FS. For example, the speaker 2 may be placed in a space such as an open space that has no ceiling surface CS. In such a case, the speaker 2 is placed on the wall surface WS or the floor surface FS, for example.
It is to be noted that the speaker 2 may be placed outdoors. In such a case, the speaker 2 is placed on the floor surface FS.
Hereinafter, MR goggles 1a according to a first modification will be described with reference to the drawings.
The speaker 2 is placed so that the sound beam B1 may be outputted with reference to a negative Y direction (a direction perpendicular to the wall surface WS, and the front of the speaker 2). Therefore, in the present modification, the X′ direction shown in the drawings is a direction defined with reference to the front of the speaker 2.
The calculator 131 of the MR goggles 1a obtains a slope (l1, m1, n1) in the three-dimensional rectangular coordinate system corresponding to the angle θ and the angle φ in the polar coordinate system. In addition, the calculator 131 of the MR goggles 1a obtains a position (x2, y2, z2) of the speaker 2 by the above second image processing or the like. The calculator 131 of the MR goggles 1a obtains coordinates Cd3 of an intersecting position at which a straight line (x, y, z) = (x2, y2, z2) + t(l1, m1, n1) passing through the position (x2, y2, z2) of the speaker 2 intersects the wall surface WS (see the drawings).
As shown in the drawings, the sound beam B1 is reflected on the wall surface WS at the coordinates Cd3, and travels as a sound beam B2. The calculator 131 of the MR goggles 1a calculates the locus of the sound beam B2 after the reflection so that the angle of reflection equals the angle of incidence.
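The locus after the reflection can be computed by mirroring the incident direction about the surface normal, as in the sketch below. Treating the wall as a perfect mirror for the beam is an assumption for illustration; the disclosure later varies the image according to absorption rather than modeling it here.

```python
import numpy as np

def reflect(direction, normal) -> np.ndarray:
    # Specular reflection: the angle of incidence equals the angle of
    # reflection, so the component along the unit normal is negated.
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    d = np.asarray(direction, dtype=float)
    return d - 2.0 * np.dot(d, n) * n

# Beam heading into a wall whose normal is the +Y axis.
print(reflect([0.6, -0.8, 0.0], [0.0, 1.0, 0.0]))  # -> [0.6, 0.8, 0.0]
```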
Lastly, the generator 132 of the MR goggles 1a generates a sound beam image that shows the loci of the sound beam B1 and the sound beam B2. For example, the generator 132 of the MR goggles 1a, as with the generator 132 of the MR goggles 1, performs calculation that matches the above three-dimensional coordinates with two-dimensional coordinates on the display 14. In such a case, the sound beam image includes an image (a reflection image) that shows the locus of the sound beam B2 after the reflection.
It is to be noted that the number of reflections is not limited to one. A sound beam may be outputted toward the ceiling surface CS and may be reflected on the ceiling surface CS. In addition, a sound beam may be outputted toward the floor surface FS and may be reflected on the floor surface FS.
Moreover, the MR goggles 1a may vary the color or the like of the image that shows the sound beam before and after a reflection, based on characteristic information (a degree of sound absorption of the ceiling surface CS, the wall surface WS, or the floor surface FS, for example) on the ceiling surface CS, the wall surface WS, or the floor surface FS. Specifically, the calculator 131 obtains the characteristic information on the ceiling surface CS, the wall surface WS, or the floor surface FS. For example, the calculator 131 previously reads out the characteristic information stored in the flash memory 11. The generator 132 varies the image (the reflection image) that shows the sound beam B2, based on the degree of sound absorption. For example, the generator 132 causes, according to the degree of sound absorption, the color of the image that shows the sound beam B2 after the reflection to be lighter than the color of the image that shows the sound beam B1 before the reflection (varies the color from dark blue to light blue, for example).
It is to be noted that the characteristic information is not limited to the degree of sound absorption. The characteristic information may include a surface hardness, a surface roughness, a thickness, a density, or the like of a wall or the like, for example. In such a case, the calculator 131 previously reads out (obtains) the characteristic information stored in the flash memory 11, for example. The generator 132 changes the image, based on the read characteristic information. For example, the generator 132 varies the shade of the image that shows the sound beam B1 according to the density of a wall or the like (varies the color from dark blue to light blue, for example). Similarly, the generator 132 varies the shade of the image that shows the sound beam B1, based on the surface hardness, the surface roughness, the thickness, or the like of a wall or the like, for example.
Moreover, the MR goggles 1a may estimate a degree of sound absorption, based on the obtained surface hardness, surface roughness, thickness, density, or the like of a wall or the like, and may vary the image that shows the sound beam B1, based on the estimated degree of sound absorption.
It is to be noted that the MR goggles 1a, even in a case of obtaining no characteristic information, may suitably vary the color or the like of the image that shows the sound beam before and after the reflection.
Moreover, the generator 132 may vary a property other than the color of the image that shows the sound beam. For example, the generator 132 may vary a size of the image that shows the locus of the sound beam (a width of a line segment that shows the sound beam, for example), or may vary a shape or the like, before and after the reflection.
It is to be noted that the MR goggles 1a may vary the sound beam image, based on information other than the characteristic information. For example, the generator 132 may vary the sound beam image, based on at least one of a channel of the sound beam, a volume of the sound beam, or frequency characteristics of the sound beam. For example, the generator 132 may generate the sound beam image so that the color or the like of the image of the sound beam to be outputted from an R channel of the speaker 2 is different from the color of the image of the sound beam to be outputted from an L channel of the speaker 2. In addition, for example, the generator 132 may deepen the color as the volume of the sound beam increases. Moreover, for example, the generator 132 may vary the color of the image that shows the sound beam according to frequency: to red when a level of a low frequency component is high, and to blue when a level of a high frequency component is high, for example.
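A possible realization of these variations is a small styling function that maps the degree of sound absorption and the volume to a drawing color and line width. Every concrete mapping below is hypothetical; the disclosure only gives examples such as varying the color from dark blue to light blue.

```python
def beam_style(base_rgb, absorption=0.0, volume_db=0.0, after_reflection=False):
    # Lighten the color after a reflection in proportion to the degree
    # of sound absorption (0.0 = fully reflective, 1.0 = fully absorptive),
    # and widen the drawn line as the volume increases. Hypothetical mapping.
    r, g, b = base_rgb
    if after_reflection:
        a = max(0.0, min(1.0, absorption))
        r, g, b = (int(c + (255 - c) * a) for c in (r, g, b))
    width_px = max(1, int(2 + volume_db / 10))
    return {"color": (r, g, b), "width_px": width_px}

# R and L channels drawn in different base colors, per the example above.
print(beam_style((0, 0, 139), absorption=0.5, after_reflection=True))  # lighter blue
print(beam_style((139, 0, 0), volume_db=20))                           # thicker red line
```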
The user cannot visually recognize the sound beams B1 and B2, and finds it extremely difficult to determine in which direction the sound beam B2 reflected on a wall travels. In contrast, the MR goggles 1a visualize the sound beam B2 reflected on the ceiling surface CS, the wall surface WS, or the floor surface FS. As a result, the user can visually recognize the locus of the sound beam B2 reflected on the wall or the like. Therefore, the user can more easily perform adjustment or the like of the direction of the sound beam B2 reflected on the wall or the like.
For example, the MR goggles 1a vary the shade of the color of the sound beam image before and after the reflection, according to the degree of sound absorption of the ceiling surface CS, the wall surface WS, or the floor surface FS. As a result, the user can visually recognize the variation or the like of the volume of the sound beam B2 to be reflected on the wall or the like.
For example, the MR goggles 1a vary the sound beam image, based on the channel of the sound beam. As a result, the user can visually recognize from which of the R channel and the L channel the sound beam has been outputted, for example.
For example, the MR goggles 1a vary the sound beam image, based on the frequency characteristics of the sound beam. As a result, the user can visually recognize the frequency of the sound beam.
An information processing apparatus of a second modification is VR (Virtual Reality) goggles (not shown), in place of MR goggles. The VR goggles display, on the display 14, an image based on image data DD (camera image data) captured by the sensor 15 (the stereo camera). As a result, a user of the VR goggles can visually recognize a real space through the image displayed on the display 14.
The VR goggles, as with the processor 13 of the MR goggles 1, calculate the locus of a sound beam B1 and generate a sound beam image.
The VR goggles generate an image (hereinafter referred to as a display image) to be displayed on the display 14 from the image data DD (the camera image data), and perform processing to superimpose the sound beam image of the sound beam B1 on the display image. The VR goggles output the display image on which the sound beam image is superimposed to the display 14. As a result, the user can visually recognize the locus of the sound beam B1 while visually recognizing the real space (a space around the user). In this manner, the VR goggles produce the same effect as the MR goggles 1.
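The superimposition can be sketched as a per-pixel alpha blend of the rendered sound beam image over the camera-derived display image. The array shapes and value ranges are assumptions, since the compositing pipeline of the VR goggles is not detailed in the disclosure.

```python
import numpy as np

def superimpose(display_img: np.ndarray, beam_img: np.ndarray,
                beam_alpha: np.ndarray) -> np.ndarray:
    # display_img, beam_img: H x W x 3 floats in [0, 1];
    # beam_alpha: H x W x 1 floats in [0, 1], nonzero only where the
    # sound beam image was rendered.
    return beam_alpha * beam_img + (1.0 - beam_alpha) * display_img

# Example with a 2 x 2 frame, beam drawn over the top-left pixel only.
frame = np.zeros((2, 2, 3)); beam = np.ones((2, 2, 3))
alpha = np.zeros((2, 2, 1)); alpha[0, 0] = 0.8
print(superimpose(frame, beam, alpha))
```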
It is to be noted that an information processing apparatus such as a smartphone is similarly able to display the display image on which the sound beam image is superimposed.
Hereinafter, MR goggles 1 according to a third modification will be described with reference to the drawings.
In the present modification, a camera (hereinafter referred to as a capturing camera) placed at a position different from the position of the MR goggles 1 also detects a position of a user U. In other words, the capturing camera detects position information FLI on the ceiling surface CS, the wall surface WS, and the floor surface FS, position information SLI on the speaker 2 (the acoustic device), and user position information. The MR goggles 1 obtain the position information FLI, the position information SLI, and the user position information from the capturing camera. The MR goggles 1 obtain direction information DI of a sound beam from the speaker 2. The MR goggles 1 calculate the locus of the sound beam to be outputted from the speaker 2 (the acoustic device), based on the position information FLI, the position information SLI, the direction information DI, and the user position information that have been obtained.
The capturing camera is placed at a position at which the ceiling surface CS, the wall surface WS, the floor surface FS, the speaker 2, and the user U can be captured, and obtains image data DD by capturing the space Sp.
The capturing camera performs the first image processing and the second image processing on the image data DD. In addition, the capturing camera obtains the user position information that shows the position (coordinates Cd4 shown in the drawings) of the user U, by recognizing the user U from the image data DD.
The MR goggles 1 obtain the direction information DI from the speaker 2. The MR goggles 1 calculate the locus of the sound beam B1, based on the position information FLI, the position information SLI, and the direction information DI. The position information FLI, the position information SLI, the direction information DI, and the position (the coordinates Cd4) of the user U are expressed with reference to the position of the capturing camera. Therefore, the MR goggles 1 convert the position information FLI, the position information SLI, and the direction information DI into positions with the coordinates Cd4 defined as a reference (an origin), and similarly convert the locus of the sound beam. The MR goggles 1 perform display based on a sound beam image. The MR goggles 1 display the sound beam image with reference to the position of the user U. Therefore, the user U can visually recognize the direction of the sound beam B1 to be outputted from the speaker 2.
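The conversion into user-referenced coordinates can be sketched as a translation that makes the coordinates Cd4 the new origin. A full solution would also rotate into the orientation of the MR goggles 1, which the disclosure does not detail, so the sketch below handles the translation only; the function name and example values are hypothetical.

```python
import numpy as np

def to_user_frame(points_camera, cd4) -> np.ndarray:
    # points_camera: N x 3 positions measured in the capturing camera's
    # frame (planes, speaker, beam locus); cd4: the user position in
    # that same frame, which becomes the new origin.
    return np.asarray(points_camera) - np.asarray(cd4)

# The speaker position and each point of the beam locus are re-expressed
# with the user U at the origin.
print(to_user_frame([[1.0, 2.0, 2.4], [2.2, 2.7, 0.0]], [0.5, 1.0, 0.0]))
```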
Hereinafter, a fourth modification will be described. In the fourth modification, a first apparatus (a server or the like) different from the MR goggles 1 performs all of the calculations and the generation of a sound beam image. The MR goggles 1 (a second apparatus) of the fourth modification obtain the sound beam image generated by the server (the first apparatus) or the like, and display the obtained sound beam image on the display 14.
In the present modification, a different apparatus such as a server, in place of the MR goggles 1, performs the first image processing, the second image processing, the calculation of the locus of the sound beam B1, and the generation of the sound beam image. Therefore, a processing load on the MR goggles 1 is reduced. As a result, even when performance of the processor 13 of the MR goggles 1 is low, the MR goggles 1 are able to display the sound beam image more easily, without causing a delay or the like.
The description of the foregoing embodiments and modifications is illustrative in all points and should not be construed to limit the present disclosure. The scope of the present disclosure is defined not by the foregoing embodiments and modifications but by the following claims. Further, the scope of the present disclosure is intended to include all changes within the scope of the claims and within the meaning and scope of equivalents.
The configurations of the MR goggles 1, the MR goggles 1a, the VR goggles according to the second modification, the MR goggles 1 according to the third modification, and the MR goggles 1 according to the fourth modification may be optionally combined.