1. Technical Field
The exemplary and non-limiting embodiments relate generally to sound and, more particularly, to playing of sound with display of a map.
2. Brief Description of Prior Developments
Three dimensional (3D) virtual maps have become popular. As an example, GOOGLE STREETVIEW or NOKIA's 3D CITY MAPS are known. These maps provide a rather realistic view of cities around the world. However, one element is missing: sound. There are many existing methods to create sounds for virtual environments, such as in game sound design. Yet, the combination of real cities and their virtual maps is rather new. Navigational prompts, such as a voice command during map navigation, are also known.
The following summary is merely intended to be exemplary. The summary is not intended to limit the scope of the claims.
In accordance with one aspect, an example method comprises associating a sound with a first location in a virtual three dimensional (3D) map; and during navigation of a user from a second location to the first location with use of the virtual 3D map, playing the sound by an apparatus based, at least partially, upon information from the virtual 3D map.
In accordance with another aspect, an example apparatus comprises a non-transitory program storage device readable by a machine, tangibly embodying a program of instructions executable by the machine for performing operations, the operations comprising associating a sound with a first location in a virtual three dimensional (3D) map; and during navigation of a user from a second location to the first location with use of the virtual 3D map, playing the sound based, at least partially, upon information from the virtual 3D map.
In accordance with another aspect, an example embodiment is provided in an apparatus comprising a processor and a memory comprising software configured to associate a sound with a first location in a virtual three dimensional (3D) map; and during navigation of a user from a second location to the first location with use of the virtual 3D map, control playing of the sound based, at least partially, upon information from the virtual 3D map.
The foregoing aspects and other features are explained in the following description, taken in connection with the accompanying drawings, wherein:
Referring to
The apparatus 10 may be a hand-held communications device which includes a telephone application. The apparatus 10 may also comprise an Internet browser application, camera application, video recorder application, music player and recorder application, email application, navigation application, gaming application, and/or any other suitable electronic device application. Referring to both
The display 14 in this example may be a touch screen display which functions as both a display screen and as a user input. However, features described herein may be used in a display which does not have a touch user input feature. The user interface may also include a keypad 28. However, the keypad might not be provided if a touch screen is provided. The electronic circuitry inside the housing 12 may comprise a printed wiring board (PWB) having components such as the controller 20 thereon. The circuitry may include a sound transducer 30 provided as a microphone and one or more sound transducers 32 provided as a speaker and an earpiece.
The receiver(s) 16 and transmitter(s) 18 form a primary communications system to allow the apparatus 10 to communicate with a wireless telephone system, such as a mobile telephone base station for example, or any other suitable communications link such as a wireless router for example. Referring also to
Referring also to
Referring also to
Features as described herein may be used for optimizing surround sound for 3D map environments, such as provided on the apparatus 10 for example. With 3D virtual maps, features may be used where the audio playback takes into account the effect of surrounding objects (including the directionality of sound sources) within the virtual scene so as to provide a more realistic virtual reality. All 3D virtual map navigations are based on a graph of paths that can be travelled. 3D virtual maps, such as GOOGLE STREETVIEW for example, have as their nodes the locations that have a 3D view of the surroundings available. Features as described herein may use graphs to simplify adding sounds to virtual worlds, and may use scanned 3D objects, such as buildings for example, to add reality to virtual worlds.
Adding sounds to virtual 3D maps can require a lot of processing power, or the sounds can sound very unnatural. Features as described herein may be used to reduce the necessary processing power, and also to help make the sound quality very natural (more realistic).
In one example embodiment, when sounds are spatialized to a user in a virtual environment, they are rendered to only the directions in which the user can move, when the user is far away from the sound source. This makes navigation based on sounds easier for the user and requires less processing power than existing methods. In this case the objective is clearly not to create the sound scene for maximal authenticity. Instead, the objective is to play the sounds that are relevant from the navigation task's point of view. Information that is typically present in such an environment, but is not relevant for the navigation task, can be suppressed from the audio scene. The same sound may come from several directions, and the loudest direction may lead to the sound source the fastest.
Referring also to
In this example, when the user 50 is at least one navigation node 46 away from the sound source 58 (the destination at the first location 54), the sound source 58 is played from the navigation direction in which the user has to move in order to get to the sound source 58. In this example, sound sources 58″ and 58′ are merely the same sound as sound source 58, but at lower volumes. In this example:
This way, only as many directions as the user may move along the path 41′ need to be rendered. This reduces complexity and, thus, reduces the necessary processing power and processing time. Additionally, this reduces the number of sound sources that may need to be rendered in the same node as the user. In an alternate example, the sound source 58 could be played from each node as the user approaches the node, without changes in volume.
However, in some embodiments it may be desirable to play a sound source louder when the user is closer to the first location (the destination) than when the user is farther away from the first location (the destination). Stated another way, in some embodiments it may be desirable to play closer sources louder than sources further away. In one example embodiment, this type of volume difference may be achieved by applying a multiplier to the sound source each time one moves from one node to another node. The multiplier may depend on the length of the arc 47 between the nodes. In game audio design, game developers typically want to exaggerate audio cues. For example, sound attenuation may be much faster compared to the natural sound attenuation in an equivalent physical space. Therefore, the sound design may also be adapted to make the navigation task easier.
In one type of example, when a new source is added to a node N, the system may propagate its sound through the graph. For each neighboring node the sound may be added to the existing sounds coming from node N with a multiplier a (0&lt;a≦1) that depends on the graph arc length 47 between the two graph nodes. Then, the sound is added to the neighbors of the neighbors of N, and so forth, until the sound is multiplied close to zero.
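As a rough illustration of this propagation, the following sketch (in Python) spreads a newly added source outward until its accumulated gain is multiplied close to zero. The adjacency-list representation, the form of the multiplier, and the cutoff value are all assumptions made for illustration; the description above does not fix them.

```python
# Minimal sketch of propagating a new sound source through the navigation
# graph. Graph layout, multiplier form and cutoff are assumed, not specified.
from collections import deque

CUTOFF = 0.01  # stop once the accumulated gain is "close to zero"

def multiplier(arc_length_m, half_distance_m=50.0):
    """Gain a with 0 < a <= 1 that shrinks with arc length (assumed form)."""
    return 0.5 ** (arc_length_m / half_distance_m)

def propagate(neighbors, source_node):
    """neighbors: dict node -> list of (neighbor_node, arc_length_m).
    Returns dict node -> gain at which the new source is heard there."""
    gains = {source_node: 1.0}
    queue = deque([source_node])
    while queue:
        node = queue.popleft()
        for nxt, length in neighbors[node]:
            g = gains[node] * multiplier(length)
            if g > gains.get(nxt, 0.0) and g > CUTOFF:
                gains[nxt] = g       # keep the loudest path to each node
                queue.append(nxt)
    return gains

graph = {"A": [("B", 30.0)], "B": [("A", 30.0), ("C", 60.0)], "C": [("B", 60.0)]}
print(propagate(graph, "A"))  # gains fade with distance from the source
```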
In the example shown in
In one example embodiment, the sound sources are typically mono and have been recorded in an anechoic chamber. The sound sources are transmitted to the user as audio files (e.g. WAV, mp3, etc.). The sound sources are rendered to a user with headphones 52 using HRTF functions or, alternatively, loudspeakers can be used. "Auditory display" is a research field with a lot of literature on how to represent information to the user with audio. Such systems can utilize 3D audio, but the content is commonly related to graphical user interface (GUI) functionality.
Traditionally, sound sources in virtual worlds are rendered so that each source is rendered separately. For binaural playback, each source is filtered with its corresponding head-related transfer function (HRTF) and the results are summed together. For multichannel playback, each source is panned to the corresponding direction and the loudspeaker signals are summed together. Panning or HRTF processing of sources individually requires a lot of processing power. In 3D maps, sounds, such as verbal commands, are usually needed for directing the user to move in the right direction.
The sound played by the headphones 52 corresponding to a sound source (such as 58, 58′, 58″ for example) may be played so as to appear to come from a direction of the navigation path during playback. With a navigation application, there are usually a limited number of directions where the user can move (along a street for example). This can be used to limit the required processing power for directional sound generation. The sound sources 58, 58′, 58″ have an assigned location in the virtual world, at one of the nodes between the user and the final destination 54. A directed graph may be used for representing the map of the virtual world. A directed graph may be necessary because some streets may be one way, and the sounds are different in different directions.
In one example embodiment, each arc 47 in the graph may have one sound associated with it, called the arc sound. The arc sound may be the sum of the sounds that lead a user towards the sources of the summed sounds when the user traverses that arc. The arc sound may be played to a user from the direction of the arc (e.g. using HRTFs for example). Only those arc sounds whose arcs lead away from the node where the user is are played to the user. Each arc may have a weight, called the arc weight, that is later used to attenuate the arc sounds relative to the length of the arc when a new sound source is added to the graph. The arc weight may be:
where ∥a∥ refers to the length of the arc, such as in meters for example.
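The arc-weight expression itself is not reproduced above. Purely as an assumed, illustrative possibility, the weight could decay exponentially with the arc length ∥a∥:

```python
import math

def arc_weight(arc_length_m, reference_m=50.0):
    # Assumed form: w(a) = exp(-||a|| / reference_m), giving 0 < w(a) <= 1
    # with faster attenuation for longer arcs. The reference distance is a
    # hypothetical tuning parameter, not taken from the description above.
    return math.exp(-arc_length_m / reference_m)
```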
If the sound source is “close” to a user, or if the user can reach and touch it in the virtual world, or if the user can see the sound source (the user has a real world line-of-sight to the source), the sound source may be played from its actual direction instead of from one of the directions in which the user can move. This is illustrated by location 54′ having sound source 59 shown in
Each node in the graph may have one or more direct sounds. The direct sounds in a node are the sound sources that have a line-of-sight from their location to the node, and that are not too far away from the node. Thus, in one example, when the user is close to the sound source, the sound may be rendered from the actual direction of the sound source. In this way the user can find the location of the sound source relatively quickly. Also, it should be possible for an administrator of the system to manually assign sound sources as direct sounds to a node. The system may calculate, for each direct sound in a node, a direction from which the sound should be played (i.e. the direction from the location of the sound source to the node). Also, each direct sound in a node may be assigned a weight proportional to the distance from the sound source location to the node. When a user is in a node, each of the direct sounds of that node may be played back to the user from the correct direction (e.g. using HRTFs for example). The apparatus 10 may be configured to allow a user to select and/or deselect which of the sounds should be played. Thus, in one type of example, even though the system may have 10 or more direct sound sources at a node, the user may be able to select only sound sources corresponding to a desired destination to be played, such as only sound sources corresponding to restaurants, or only sound sources corresponding to shopping locations. Thus, the sound sources actually played to the user may be reduced to only 1 or 2. The user may also be able to choose and re-choose other selection criteria for playing the sound sources. For example, if on a first try the user is only provided with 2 choices within a first distance from the user, and the user does not like those 2 choices, the user may be able to reset the selection filter criteria to disregard choices within that first distance and extend the search to a second, farther distance from the user. This may give the user more choices and exclude the 2 closer choices which the user did not like.
Direct sounds may be attenuated with a weight that is dependent on the distance between the source location and the node. The attenuation weight may be for example:
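The attenuation expression itself is likewise not reproduced above. The sketch below combines the pieces that are described (a precomputed line-of-sight test, the user's category filter, and a weight that falls off with distance) under an assumed linear falloff; all names and the falloff form are hypothetical:

```python
import math

def direct_sound_weight(distance_m, max_distance_m=100.0):
    # Assumed attenuation: full weight at the node itself, fading to zero
    # at max_distance_m (the exact formula is not fixed by the text).
    return max(0.0, 1.0 - distance_m / max_distance_m)

def audible_direct_sounds(node_pos, sources, selected_categories):
    """sources: list of dicts with 'pos' (x, y), 'category' and
    'has_line_of_sight' (visibility toward this node, precomputed)."""
    playable = []
    for s in sources:
        if not s["has_line_of_sight"] or s["category"] not in selected_categories:
            continue  # user filter: e.g. only restaurants
        dx, dy = s["pos"][0] - node_pos[0], s["pos"][1] - node_pos[1]
        dist = math.hypot(dx, dy)
        w = direct_sound_weight(dist)
        if w > 0.0:
            direction = math.degrees(math.atan2(dy, dx))  # rendered via HRTF
            playable.append((s, direction, w))
    return playable
```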
Instead of calculating the shortest path anew every time a new node is reached, the direction that leads to the sound source and the loudness at which the sound should be played (essentially the distance to the sound source) could be stored for each node in order to reduce the complexity of the system. When more than one sound comes from the same direction, the sounds may be combined to reduce the number of sounds that need to be stored in a node. Combining sources that come to the user through the same arc in the graph is a novel feature which may be used.
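A minimal sketch of that combining step, assuming each node stores (arc, gain, samples) entries and that sounds sharing an arc are mixed sample-by-sample (the data layout is an assumption):

```python
def combine_same_arc(entries):
    """entries: list of (arc_id, gain, samples); sums sounds that share an
    arc so each node stores at most one sound per outgoing arc (a sketch,
    assuming equal-length sample lists)."""
    combined = {}
    for arc_id, gain, samples in entries:
        acc = combined.setdefault(arc_id, [0.0] * len(samples))
        for i, v in enumerate(samples):
            acc[i] += gain * v
    return combined
```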
Referring to
In the example described above, the sound source (such as 59 in the example shown in
In one type of example embodiment, a sound source S may be added to the graph in the following manner:
where ∥P−xl∥ denotes the distance between location P and node xl. From the nodes xl the sound is then propagated to neighboring nodes using a pseudo code. The pseudo code stops when all the nodes within hearing distance from the sound source have been processed. This is done by setting a minimum weight for the sounds. The minimum weight depends on how far the sounds are desired to be heard from. For example the minimum weight could be:
Each node can have a temporary weight (a positive real number) and a temporary flag (flagged/not flagged) assigned to it. In the beginning, all nodes are assigned a weight of zero and all nodes are unflagged. The process for the pseudo code may comprise:
Referring also to
leading to node x. For all k∈{k1, k2, . . . , kK}, multiply S by the weight ν of the node x and by the arc weight w(ak), and add the thus weighted S, i.e. w(ak)νS, to the existing arc sounds in arc ak.
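Putting the described pieces together, the following sketch propagates a new source best-first through the graph using temporary node weights, a set of flagged (processed) nodes, a minimum-weight cutoff, and the w(ak)νS update for arc sounds. The data layout, the priority queue, and the cutoff value are assumptions; the text leaves those details open:

```python
import heapq

MIN_WEIGHT = 0.01  # how far the sound should remain audible (assumed value)

def add_source(graph, arc_weight, start_nodes):
    """Best-first propagation of a new sound source S through the graph.

    graph: dict node -> list of (neighbor, arc), where 'arc' identifies the
           directed arc neighbor -> node (the arc "leading to" the node).
    arc_weight: dict arc -> multiplier w(a) in (0, 1].
    start_nodes: dict node -> initial weight near the source location P.
    Returns arc_sounds: dict arc -> accumulated gain for S on that arc.
    """
    weight = {n: 0.0 for n in graph}           # temporary node weights
    flagged = set()                            # processed nodes
    arc_sounds = {}
    heap = []
    for n, w in start_nodes.items():
        weight[n] = w
        heapq.heappush(heap, (-w, n))          # max-heap via negation
    while heap:
        nu, x = heapq.heappop(heap)
        nu = -nu
        if x in flagged or nu < weight[x]:
            continue                           # stale heap entry
        flagged.add(x)
        for neighbor, arc in graph[x]:         # arcs a_k leading to node x
            contribution = arc_weight[arc] * nu    # w(a_k) * nu * S
            if contribution < MIN_WEIGHT:
                continue                       # beyond hearing distance
            arc_sounds[arc] = arc_sounds.get(arc, 0.0) + contribution
            if neighbor not in flagged and contribution > weight[neighbor]:
                weight[neighbor] = contribution
                heapq.heappush(heap, (-contribution, neighbor))
    return arc_sounds
```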
The playback can be done for example as in the flowchart shown in
Summing the arc sounds together may lead to some inaccuracies. In an alternative embodiment it is possible to leave the arc sounds unsummed. This way each arc may have several sounds associated with it. When the same sound reaches a node from several different directions (arcs), only the loudest one may be played back to the user.
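A sketch of playback under this unsummed variant: at the user's current node, only arcs leading away from the node are considered, and when the same sound arrives over several arcs only its loudest copy is kept (the data layout is assumed):

```python
def sounds_to_play(outgoing_arcs, arc_sound_lists):
    """outgoing_arcs: arcs leading away from the user's current node.
    arc_sound_lists: dict arc -> list of (sound_id, gain); arc sounds are
    kept unsummed here, and only the loudest copy of each sound is played."""
    loudest = {}
    for arc in outgoing_arcs:
        for sound_id, gain in arc_sound_lists.get(arc, []):
            prev = loudest.get(sound_id)
            if prev is None or gain > prev[1]:
                loudest[sound_id] = (arc, gain)  # arc fixes the HRTF direction
    return loudest  # sound_id -> (arc to render from, gain)
```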
In another aspect, 3D scanning may be used to get a rough estimate of the surrounding structures, and images may be used to recognize trees, lakes or snow, in order to refine the acoustic model. When sounds are played back in a virtual world, the type of the surrounding area may be taken into account. Ambient environment sounds are very different in an open countryside setting, where there are almost no echoes, as compared to a narrow street with high-rise buildings on both sides, where there are a lot of reflecting surfaces and hence echoes. Acoustic propagation of sound is a well understood phenomenon, and it can be modeled quite accurately provided that the model of the acoustic environment is accurate enough. However, accurate simulation of real physical spaces, such as a 3D city model for example, may require a lot of information about accurate geometries and the acoustic properties of different surfaces. In mobile applications such a level of fidelity is difficult to justify, since the objective is to render a realistic illusion of a physical acoustic space rather than an authentic rendering of the sound environment, as would be needed, for example, when modeling a concert hall before building such a hall.
When 3D virtual city models are created, city buildings are 3D scanned. These 3D scans may also be used to control the rendering of audio. An impulse response that matches the current 3D scan may be applied to the sounds that are played, thus creating a better match between the visual and auditory percepts of the virtual world. When cities are photographed for making 3D models, the creator of the base electronic navigable maps, such as NAVTEQ for example, also scans the buildings in 3D. The 3D model that is built of a city can be used to help make the audio sound more realistic. In a simple example implementation, different filters may be applied to the audio signals depending on how many walls or other structures or objects surround the node where the user is. For example, there can be five different filters based on how many sides of the node are surrounded by walls (0-4).
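A minimal sketch of the five-filter idea; the filter names are placeholders standing in for actual impulse responses or filter coefficients:

```python
# Pick a reverberation filter by how many of the node's four sides face a
# wall. The filter labels are hypothetical placeholders.
FILTERS = {
    0: "open_field",      # no echoes
    1: "one_wall",
    2: "street_canyon",   # e.g. walls on two sides
    3: "three_walls",
    4: "courtyard",       # enclosed, strong echoes
}

def pick_filter(walls_on_sides):
    """walls_on_sides: booleans for the four sides of the node."""
    return FILTERS[sum(bool(w) for w in walls_on_sides)]

print(pick_filter([True, True, False, False]))  # -> "street_canyon"
```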
In a first example, a database of 3D scans and related impulse responses is created. Different locations are 3D scanned (such as what NAVTEQ does). The scan results in a point cloud. Let the points of the point cloud in location X1 be X1,1, X1,2, . . . X1,N1. In the same locations, impulse responses from different directions to the user location are also recorded. For example, directions with 10 degree spacing on the horizontal plane may be used. It is possible to use directions above or below the horizontal plane as well. Let the 36 directions on the horizontal plane be D1, D2, . . . , D36. A starter pistol is fired around a microphone at, e.g., a 5 meter radius from directions Di and the resulting sound is recorded. The recorded sounds are clipped to e.g. 20 ms. These are the recorded impulse responses from location X1. Let us denote the impulse responses by IX1,D1, . . . , IX1,D36.
When a user is in the virtual world in location Y, the point cloud scanned (by e.g. NAVTEQ) in location Y is compared to the point clouds in the database. Let the point cloud in location Y be {y1, y2, . . . , yM}. Point clouds can be compared by comparing the points in them in the following way:
Location Xm is now the location in the database that best corresponds to location Y. Therefore, the impulse responses {IXm,D1, . . . , IXm,D36} may be applied to the sounds played to the user in location Y.
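The comparison formula itself is not reproduced above; a Chamfer-style symmetric nearest-neighbor distance is one common choice and is used here purely as an assumption:

```python
import math

def cloud_distance(cloud_a, cloud_b):
    """Symmetric sum of nearest-neighbor distances between two point clouds
    (a Chamfer-style measure; the exact comparison formula is assumed)."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src)
    return one_way(cloud_a, cloud_b) + one_way(cloud_b, cloud_a)

def best_match(query_cloud, database):
    """database: dict location_id -> point cloud; returns the location Xm
    whose scanned cloud best matches the current location Y."""
    return min(database, key=lambda loc: cloud_distance(query_cloud, database[loc]))
```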
Playback of the impulse response filtered sounds can be done, for example, as in the flowchart shown in
ƒ(D)=IXm,Dj, where Dj is the stored direction closest to the direction of arrival D.
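A sketch of applying ƒ(D): pick the stored impulse response whose direction is circularly closest to the arrival direction D, and convolve the dry sound with it. The nearest-direction selection rule and the data layout are assumptions:

```python
import numpy as np

def render_with_ir(dry_sound, direction_deg, ir_bank):
    """ir_bank: dict direction_deg (0, 10, ..., 350) -> impulse response
    recorded at the matched location Xm. Implements f(D) = I_Xm,Dj under
    the assumed nearest-direction rule."""
    nearest = min(ir_bank, key=lambda d: min(abs(d - direction_deg) % 360,
                                             360 - abs(d - direction_deg) % 360))
    return np.convolve(dry_sound, ir_bank[nearest])
```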
In another example embodiment, instead of comparing different point clouds and having a database of impulse responses as in the first example embodiment described above, it is possible to estimate the desired impulse responses directly from the point cloud of the current location. Firstly, walls are detected from the point cloud of the current location. As an example (see
An artificial sound source may be placed into the virtual world. The sound source could be e.g. directions to a Point Of Interest (POI), an advertisement, sound effects for an object in the virtual world like a fountain, etc. The sound source 59 played back to the user has reflections 64, 66 added to it to account for the sound environment expected from the visual environment. Let the sound from the sound source be x. The total sound y played to the user could be e.g.:

y(t)=Σi ƒ(Li)·x(t−Li/C)

where Li are the lengths of the sound paths, with reflections taken into account, from the sound source to the user. ƒ(Li) is an attenuation function that attenuates sounds that travel a longer distance, with ƒ(0)=1 and ƒ(infinity)=0, where the scale is linear in the decibel domain. Additionally, each reflection can be given a frequency dependent additional attenuation. C is the speed of sound. Reflections are attenuated, delayed versions of the original sound source.
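A direct rendering of that formula, assuming discrete-time audio at a given sample rate and an assumed decibel-linear attenuation rate:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, the C in the formula above

def attenuation(path_length_m):
    """f(L): 1 at zero distance, approaching 0 for long paths. The decay
    rate (-0.05 dB per meter, linear in the decibel domain) is assumed."""
    return 10.0 ** (-0.05 * path_length_m / 20.0)

def with_reflections(x, path_lengths_m, sample_rate=44100):
    """Sum of attenuated, delayed copies of x, one per sound path L_i."""
    delays = [int(round(L / SPEED_OF_SOUND * sample_rate)) for L in path_lengths_m]
    out = np.zeros(len(x) + max(delays))
    for L, d in zip(path_lengths_m, delays):
        out[d:d + len(x)] += attenuation(L) * np.asarray(x, dtype=float)
    return out
```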
Filters may be created with several acoustic simulation methods, but even a simple ray tracing model can produce a feel of sound envelopment that correlates with the 3D model of reality and makes it easier to associate the sound scene with reality. Similarly, it is possible to map some environmental factors, such as wind or rain, into the sound scene, or traffic information, such as congestion, into the sound scene. In many cases an informative sound environment may be a much more preferable way of passing information about the environment compared to voice prompts stating that traffic is heavy or that it is likely to rain.
A conventional 3D model itself does not describe the nature of the sound sources in the environment, but they can be created, or synthesized, based on nearby POI information and sound libraries that correlate with the local geographical data such as, for example, a park in Tokyo surrounded by high buildings or a street in London next to a football stadium. Features as described herein may be independent of the sound sources and of the sound creation methods that can be applied as sound sources for such a method. Also, camera images of the surroundings can be used to affect the selection of proper impulse responses. Recognition of trees/lakes/snow and other environmental structures may be mapped to the 3D model of the environment to refine the acoustic model of the environment.
In one type of example, a method comprises associating a sound with a location in a virtual three dimensional (3D) map; and during navigation of a user to the location with use of the virtual 3D map, when a navigation task of the virtual 3D map is located between the user and the location, playing the sound as coming from a direction of the navigation task irrespective of a direct direction of the user relative to the location.
In one type of example embodiment, a non-transitory program storage device readable by a machine is provided, tangibly embodying a program of instructions executable by the machine for performing operations, the operations comprising associating a sound with a location in a virtual three dimensional (3D) map; and during navigation of a user to the location with use of the virtual 3D map, when a navigation task of the virtual 3D map is located between the user and the location, playing the sound as coming from a direction of the navigation task irrespective of a direct direction of the user relative to the location.
In one type of example embodiment, an apparatus comprises a processor and a memory comprising software configured to associate a sound with a location in a virtual three dimensional (3D) map; and during navigation of a user to the location with use of the virtual 3D map, when a navigation task of the virtual 3D map is located between the user and the location, play the sound as coming from a direction of the navigation task irrespective of a direct direction of the user relative to the location.
One type of example method comprises associating a sound with a first location in a virtual three dimensional (3D) map; and during navigation of a user from a second location to the first location with use of the virtual 3D map, playing the sound by an apparatus based, at least partially, upon information from the virtual 3D map.
When a navigation task of the virtual 3D map is located between the user and the first location, the method may comprise playing the sound as coming from a direction of the navigation task irrespective of a direct direction of the user relative to the first location. A volume at which the sound is played may be based, at least partially, upon a distance of a location of the navigation task on the virtual 3D map relative to the first location. A volume at which the sound is played may be based, at least partially, upon at least one second navigation task of the virtual 3D map located between the user and the first location on the virtual 3D map. A volume of the sound may be based, at least partially, upon a distance of the user relative to the first location. When a navigation task of the virtual 3D map is not located between the user and the first location, the method may comprise playing the sound as coming from a direct direction of the first location relative to the user. Playing the sound may comprise playing the sound as coming from at least two directions. The sound coming from a first one of the directions may be played a first way, and the sound coming from a second one of the directions may be played a second, different way. The information from the virtual 3D map may comprise at least one of an ancillary sound source and a sound reflection source which influences playing of the sound.
In one type of example embodiment, a non-transitory program storage device readable by a machine is provided, tangibly embodying a program of instructions executable by the machine for performing operations, the operations comprising associating a sound with a first location in a virtual three dimensional (3D) map; and during navigation of a user from a second location to the first location with use of the virtual 3D map, playing the sound based, at least partially, upon information from the virtual 3D map. When a navigation task of the virtual 3D map is located between the user and the first location, the operations may comprise playing the sound as coming from a direction of the navigation task irrespective of a direct direction of the user relative to the first location. The information from the virtual 3D map may comprise at least one of an ancillary sound source and a sound reflection source which influences playing of the sound.
In one type of example embodiment, an apparatus is provided comprising a processor and a memory comprising software configured to associate a sound with a first location in a virtual three dimensional (3D) map; and during navigation of a user from a second location to the first location with use of the virtual 3D map, control playing of the sound based, at least partially, upon information from the virtual 3D map.
When a navigation task of the virtual 3D map is located between the user and the first location, the apparatus may be configured to control playing of the sound as coming from a direction of the navigation task irrespective of a direct direction of the user relative to the first location. The apparatus may be configured to control a volume at which the sound is played based, at least partially, upon a distance of a location of the navigation task on the virtual 3D map relative to the first location. The apparatus may be configured to control a volume at which the sound is played based, at least partially, upon at least one second navigation task of the virtual 3D map located between the user and the first location on the virtual 3D map. The apparatus may be configured to control a volume of the sound based, at least partially, upon a distance of the user relative to the first location. When a navigation task of the virtual 3D map is not located between the user and the first location, the apparatus may be configured to control playing of the sound as coming from a direct direction of the first location relative to the user. The apparatus may be configured to control playing of the sound as coming from at least two directions. The information from the virtual 3D map may comprise at least one of an ancillary sound source and a sound reflection source which influences playing of the sound. The apparatus may comprise means for controlling playing of the sound based, at least partially, upon the information from the virtual 3D map.
In one type of example embodiment, an apparatus comprises a processor and a memory comprising software configured to associate a sound with a location in a virtual three dimensional (3D) map; and control playing of the sound based, at least partially, upon information from the virtual 3D map, where the information from the virtual 3D map comprises at least one of an ancillary sound source and a sound reflection source which influences playing of the sound.
Unlike a video game where a character in the video game moves around the virtual world and the user hears different sounds as the character moves to different locations, features as described herein may be used where “a user” (in the real world) moves from a second location to a first location with use of the virtual 3D map. In one type of alternate example, the apparatus may be configured to control playing of the sound based upon some parameter in the virtual 3D map other than location of the user, such as the nearest node on the map (regardless of the actual position of the user) relative to the first position. For example, referring to
In one example embodiment, an apparatus is provided comprising a processor and a memory comprising software configured to: associate a sound with a location in a virtual three dimensional (3D) map; and during navigation from a second location to a first location within the virtual 3D map, control playing of the sound based, at least partially, upon the first location information within the virtual 3D map relative to the second location. The sound may be played based on the location of the first location relative to the second location.
It should be understood that the foregoing description is only illustrative. Various alternatives and modifications can be devised by those skilled in the art. For example, features recited in the various dependent claims could be combined with each other in any suitable combination(s). In addition, features from different embodiments described above could be selectively combined into a new embodiment. Accordingly, the description is intended to embrace all such alternatives, modifications and variances which fall within the scope of the appended claims.