The disclosure relates to a robot and a method for controlling the same, and more particularly, to a robot that estimates a position at which a user voice is uttered and a method for controlling the robot.
Recently, sound source direction estimation technology, which estimates the position of a sound source such as a user voice or another audio signal, has been under development. Sound source direction estimation may refer to a technique of estimating a direction of a user relative to a microphone by measuring an intensity of the voice signal for each direction.
However, this method lacks usability in an indoor environment in which sound is easily reflected from walls and the like, and is therefore used together with another sensor such as a camera to compensate for this limitation. However, when another sensor such as a LiDAR sensor or a camera is used, it is possible to identify that an object is present at a relevant position, but there is a problem in that additional operations of repeatedly moving or rotating the robot are necessary in order to identify the relevant position through the LiDAR sensor or the camera.
According to an aspect of the disclosure, a robot includes: at least one sensor; a speaker; a microphone; a driver; at least one memory storing one or more instructions; and at least one processor configured to execute the one or more instructions, wherein the one or more instructions, when executed by the at least one processor, cause the robot to: generate a map including information regarding a plurality of objects based on sensing information obtained through the at least one sensor, generate ultrasonic waves toward each of the plurality of objects through the speaker, obtain reflectivity information regarding the plurality of objects based on reflected sounds reflected from each of the objects and received through the microphone, and store the reflectivity information, the reflected sounds reflected from each of the objects being at least a portion of the ultrasonic waves reflected from each of the objects, based on receiving a user voice through the microphone, obtain information on an intensity of the user voice for each of a plurality of directions, obtain information on a plurality of candidate directions from which the user voice is received from among the plurality of directions based on the information on the intensity of the user voice for each of the plurality of directions, obtain priority order information for the plurality of candidate directions based on a position of the robot and the stored reflectivity information, and obtain information on a direction in which the user voice is uttered from among the plurality of candidate directions based on the priority order information.
The one or more instructions, when executed by the at least one processor, may further cause the robot to: generate ultrasonic waves at preset distance intervals with respect to a wall object among the plurality of objects, and obtain reflectivity information regarding the wall object based on reflected sounds reflected from the wall object and received through the microphone, the reflected sounds reflected from the wall object being at least a portion of the ultrasonic waves generated at the preset intervals reflected from the wall object.
The one or more instructions, when executed by the at least one processor, may further cause the robot to: generate ultrasonic waves from two or more directions toward an object among the plurality of objects other than the wall object, and obtain reflectivity information regarding the object by obtaining an average value of reflected sounds reflected from the object and received through the microphone, the reflected sounds reflected from the object being at least a portion of the ultrasonic waves generated from the two or more directions that is reflected from the object.
The one or more instructions, when executed by the at least one processor, may further cause the robot to: obtain the information on the intensity of the user voice for each of the plurality of directions based on the position of the robot, and identify as the plurality of candidate directions a preset number of directions among the plurality of directions in which the respective intensity of the user voice exceeds a predetermined threshold.
The one or more instructions, when executed by the at least one processor, may further cause the robot to: identify objects, among the plurality of objects, positioned in the plurality of candidate directions relative to the position of the robot, and identify a priority order with respect to the plurality of candidate directions based on the respective information on the intensity of the user voice corresponding to each of the plurality of candidate directions and reflectivity information corresponding to the identified objects.
The one or more instructions, when executed by the at least one processor, may further cause the robot to: obtain a corrected intensity of the user voice for each of the plurality of candidate directions by multiplying the respective intensity of the user voice corresponding to each of the plurality of candidate directions by weight values, and identify a priority order based on the corrected intensities for the plurality of candidate directions, wherein the weight values correspond to the reflectivity information corresponding to the identified objects.
The weight values and the reflectivity information corresponding to the identified objects may be inversely proportional to one another.
For each candidate direction of the plurality of candidate directions, the priority order may be directly proportional to the corrected intensity, and the one or more instructions, when executed by the at least one processor, may further cause the robot to: identify a candidate direction with a highest priority order from among the plurality of candidate directions as the direction in which the user voice is uttered.
The one or more instructions, when executed by the at least one processor, may further cause the robot to: perform voice recognition on the user voice by performing beam forming in the direction in which the user voice is uttered.
According to an aspect of the disclosure, a method for controlling a robot includes: generating a map including information regarding a plurality of objects based on sensing information obtained through at least one sensor of the robot; generating ultrasonic waves toward each of the plurality of objects through a speaker of the robot; obtaining reflectivity information regarding the plurality of objects based on the reflected sounds reflected from each of the objects and received through a microphone of the robot, and storing the reflectivity information, the reflected sounds reflected from each of the objects being at least a portion of the ultrasonic waves reflected from each of the objects; based on a user voice being received through the microphone, obtaining information on an intensity of the user voice for each of a plurality of directions; obtaining information on a plurality of candidate directions from which the user voice is received from among the plurality of directions based on the information on the intensity of the user voice for each of the plurality of directions; obtaining priority order information for the plurality of candidate directions based on a position of the robot and the stored reflectivity information; and obtaining information on a direction in which the user voice is uttered from among the plurality of candidate directions based on the priority order information.
The generating ultrasonic waves toward each of the plurality of objects may include generating ultrasonic waves at preset distance intervals with respect to a wall object among the plurality of objects, and wherein the obtaining the reflectivity information may include obtaining reflectivity information regarding the wall object based on reflected sounds reflected from the wall object and received through the microphone, the reflected sounds reflected from the wall object being at least a portion of the ultrasonic waves generated at the preset intervals reflected from the wall object.
The generating ultrasonic waves toward each of the plurality of objects may include generating ultrasonic waves from two or more directions toward an object among the plurality of objects other than the wall object, and the obtaining the reflectivity information may include obtaining reflectivity information with respect to the object by obtaining an average value of the reflected sounds reflected from the object and received through the microphone, the reflected sounds reflected from the object being at least a portion of the ultrasonic waves generated from the two or more directions reflected from the object.
The obtaining information on the plurality of candidate directions may include: identifying as the plurality of candidate directions a preset number of directions from among the plurality of directions in which the respective intensity of the user voice exceeds a predetermined threshold.
The obtaining the priority order information may include: identifying objects among the plurality of objects positioned in the plurality of candidate directions relative to the position of the robot; and identifying a priority order with respect to the plurality of candidate directions based on the respective information on the intensity of the user voice corresponding to each of the plurality of candidate directions and reflectivity information corresponding to the identified objects.
The obtaining priority order information may include: obtaining a corrected intensity of the user voice for each of the plurality of candidate directions by multiplying the respective intensity of the user voice corresponding to each of the plurality of candidate directions by weight values; and identifying a priority order based on the corrected intensities for the plurality of candidate directions, wherein the weight values correspond to the reflectivity information corresponding to the identified objects.
According to an aspect of the disclosure, a non-transitory computer readable medium having instructions stored therein, which when executed by at least one processor cause the at least one processor to execute a method of controlling a robot, wherein the method includes: generating a map including information regarding a plurality of objects based on sensing information obtained through at least one sensor of the robot; generating ultrasonic waves toward each of the plurality of objects through a speaker of the robot; obtaining reflectivity information regarding the plurality of objects based on the reflected sounds reflected from each of the objects and received through a microphone of the robot, and storing the reflectivity information, the reflected sounds reflected from each of the objects being at least a portion of the ultrasonic waves reflected from each of the objects; based on a user voice being received through the microphone, obtaining information on an intensity of the user voice for each of a plurality of directions; obtaining information on a plurality of candidate directions from which the user voice is received from among the plurality of directions based on the information on the intensity of the user voice for each of the plurality of directions; obtaining priority order information for the plurality of candidate directions based on a position of the robot and the stored reflectivity information; and obtaining information on a direction in which the user voice is uttered from among the plurality of candidate directions based on the priority order information.
With regard to the non-transitory computer readable medium, the generating ultrasonic waves toward each of the plurality of objects may include generating ultrasonic waves at preset distance intervals with respect to a wall object among the plurality of objects, and the obtaining the reflectivity information may include obtaining reflectivity information regarding the wall object based on reflected sounds reflected from the wall object and received through the microphone, the reflected sounds reflected from the wall object being at least a portion of the ultrasonic waves generated at the preset intervals reflected from the wall object.
With regard to the non-transitory computer readable medium, the generating ultrasonic waves toward each of the plurality of objects may include generating ultrasonic waves from two or more directions toward an object among the plurality of objects other than the wall object, and the obtaining the reflectivity information may include obtaining reflectivity information with respect to the object by obtaining an average value of the reflected sounds reflected from the object and received through the microphone, the reflected sounds reflected from the object being at least a portion of the ultrasonic waves generated from the two or more directions reflected from the object.
With regard to the non-transitory computer readable medium, the obtaining information on the plurality of candidate directions may include: identifying as the plurality of candidate directions a preset number of directions from among the plurality of directions in which the respective intensity of the user voice exceeds a predetermined threshold.
With regard to the non-transitory computer readable medium, the obtaining the priority order information may include: identifying objects among the plurality of objects positioned in the plurality of candidate directions relative to the position of the robot; and identifying a priority order with respect to the plurality of candidate directions based on the respective information on the intensity of the user voice corresponding to each of the plurality of candidate directions and reflectivity information corresponding to the identified objects.
The above and other aspects, features, and advantages of certain embodiments of the present disclosure will be more apparent from the following description taken in conjunction with the accompanying drawings, in which:
Various modifications may be made to the embodiments of the disclosure, and there may be various types of embodiments. Accordingly, specific embodiments will be illustrated in drawings, and described in detail in the detailed description. However, it should be noted that the above is not intended to limit the scope of the disclosure to a specific embodiment, but should be interpreted to include all modifications, equivalents or alternatives of the embodiments included in the ideas and the technical scopes disclosed herein. With respect to the description of the drawings, like reference numerals may be used to indicate like elements.
In describing the disclosure, where it is determined that a detailed description of related known technologies may unnecessarily obscure the gist of the disclosure, the detailed description thereof will be omitted.
Further, the embodiments below may be modified to various different forms, and it is to be understood that the scope of the technical spirit of the disclosure is not limited to the embodiments below. Rather, the embodiments are provided so that the disclosure will be thorough and complete, and to fully convey the technical spirit of the disclosure to those skilled in the art.
Terms used in the disclosure are used merely to describe a specific embodiment, and are not intended to limit the scope of protection. A singular expression includes a plural expression, unless otherwise specified.
In the disclosure, expressions such as “have,” “may have,” “include,” and “may include” are used to designate a presence of a corresponding characteristic (e.g., elements such as numerical value, function, operation, or component), and not to preclude a presence or a possibility of additional characteristics.
In the disclosure, expressions such as “A or B,” “at least one of A and/or B,” or “one or more of A and/or B” may include all possible combinations of items listed together. For example, “A or B,” “at least one of A and B,” or “at least one of A or B” may refer to all cases including (1) only A, (2) only B, or (3) both of A and B.
Expressions such as “1st”, “2nd”, “first” or “second” used in the disclosure may describe various elements regardless of order and/or importance, and may be used merely to distinguish one element from another element without limiting the relevant elements.
When a certain element (e.g., first element) is indicated as being “(operatively or communicatively) coupled with/to” or “connected to” another element (e.g., second element), it may be understood as the certain element being directly coupled with/to the other element or as being coupled through still another element (e.g., third element).
On the other hand, when a certain element (e.g., first element) is indicated as “directly coupled with/to” or “directly connected to” another element (e.g., second element), it may be understood as still another element (e.g., third element) not being present between the certain element and the other element.
The expression “configured to . . . (or set up to)” used in the disclosure may be used interchangeably with, for example, “suitable for . . . ,” “having the capacity to . . . ,” “designed to . . . ,” “adapted to . . . ,” “made to . . . ,” or “capable of . . . ” based on circumstance. The term “configured to . . . (or set up to)” may not necessarily mean “specifically designed to” in terms of hardware.
Rather, in a certain circumstance, the expression “a device configured to . . . ” may mean something that the device “may perform . . . ” together with another device or components. For example, the phrase “a processor configured to (or set up to) perform A, B, and C” may mean a dedicated processor for performing a relevant operation (e.g., embedded processor), or a generic-purpose processor (e.g., a central processing unit (CPU) or an application processor) capable of performing the relevant operations by executing one or more software programs stored in a memory device.
The term “module” or “part” in the embodiments herein performs at least one function or operation, and may be implemented with hardware or software, or implemented with a combination of hardware and software. In addition, a plurality of “modules” or a plurality of “parts”, except for a “module” or a “part” which needs to be implemented with specific hardware, may be integrated into at least one module and implemented as at least one processor.
Various elements and areas of the drawings have been schematically illustrated. Accordingly, the technical spirit of the disclosure is not limited by relative sizes and distances illustrated in the accompanied drawings.
Embodiments according to the disclosure will be described in detail below with reference to the accompanying drawings to aid in the understanding of those of ordinary skill in the art.
The at least one sensor 110 may obtain various information on a state of the robot 100 or a surrounding environment of the robot 100. Specifically, the at least one sensor 110 may include a LiDAR sensor and an inertial measurement unit (IMU) sensor. The LiDAR sensor may project rays (e.g., laser, near-infrared light, visible light, ultraviolet rays, etc.) onto objects, and obtain sensing information for obtaining information on a distance to an object by detecting the light reflected by the object. The IMU sensor may be a sensor for sensing movement of the robot 100, and may include at least one from among a geomagnetic sensor, an acceleration sensor, and a gyro sensor. However, using the LiDAR sensor to obtain information on a distance to an object is merely one embodiment, and the information on the distance to an object may be obtained using various sensors such as a depth sensor.
Specifically, the at least one processor 160 may obtain information on distances to a plurality of objects and movement information of the robot 100 based on sensing information obtained through the LiDAR sensor and the IMU sensor. However, the above is merely one embodiment, and the information on distances to the plurality of objects and the movement information of the robot 100 may be obtained based on sensing information obtained by the at least one sensor 110.
In addition, the at least one sensor 110 may include a camera for capturing an image. Specifically, the camera may obtain an image of an object by capturing a surrounding of the robot 100. The robot 100 may obtain information (e.g., a type of the object, a size of the object, a form of the object, etc.) on the object by inputting the image of the object into a trained neural network model (e.g., an object recognition model).
The speaker 120 may be a configuration for outputting an audio signal. Specifically, the speaker 120 may output ultrasonic waves toward an object according to a control of the at least one processor 160. The speaker 120 may also generate ultrasonic waves at preset distance intervals with respect to a wall object from among the plurality of objects. In addition, the speaker 120 may generate ultrasonic waves from two or more directions toward remaining objects among the plurality of objects aside from the wall object.
The microphone 130 may be configured to receive various audio signals. Specifically, the microphone 130 may receive reflected sounds, which are ultrasonic waves output through the speaker 120 and reflected by objects. In addition, the microphone 130 may receive a user voice. The microphone 130 may be provided as a plurality of microphones, and the robot may receive the user voice from a plurality of directions through the plurality of microphones.
The driver 140 may cause the robot 100 to travel or otherwise move according to the control of the at least one processor 160. Traveling may include an operation of the robot 100 moving with power. According to one or more embodiments of the disclosure, traveling may include an operation of the robot 100 moving using power in random directions. Alternatively, traveling may include an operation of the robot 100 moving using power along a preset line or route.
Specifically, the driver 140 may include wheels which allow the robot 100 to travel and a wheel driving motor which rotates the wheels. Specifically, the driver 140 may move the robot 100 within a space of a home. In addition, when the direction in which the user voice is uttered is identified, the driver 140 may move the robot 100 in the direction in which the user voice is uttered.
The at least one memory 150 may store instructions or data associated with an operating system (OS) and elements of the robot 100 to control the overall operation of the elements of the robot 100. Specifically, the at least one memory 150 may include a plurality of modules for generating a map including reflectivity information and identifying the direction in which the user voice is uttered by using the generated map. For example, the at least one memory 150 may include, as shown in
Specifically, if a function for generating the map including reflectivity information and identifying the direction in which the user voice is uttered using the generated map is executed, the robot 100 may load data of various modules stored in a non-volatile memory into a volatile memory to perform various operations. Here, loading may mean an operation of calling data stored in the non-volatile memory into the volatile memory and storing the data therein so that the at least one processor 160 may access the data.
In addition, the at least one memory 150 may store information on the trained neural network model to obtain information on objects by inputting an image obtained by the camera.
The at least one memory 150 may be implemented as a non-volatile memory (e.g., a hard disk, a solid state drive (SSD), a flash memory), a volatile memory (which may include a memory within the at least one processor 160), and the like.
The at least one processor 160 may control the robot 100 according to at least one instruction stored in the memory 150.
Specifically, the at least one processor 160 may include one or more processors. In particular, the one or more processors may include one or more from among a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), a many integrated core (MIC), a digital signal processor (DSP), a neural processing unit (NPU), a hardware accelerator, or a machine learning accelerator. The one or more processors may control any one or any combination of the other elements of the robot 100, and perform an operation associated with communication or data processing. The one or more processors may execute one or more programs or instructions stored in the memory. For example, the one or more processors may perform, by executing the one or more instructions stored in the memory, a method according to an embodiment of the disclosure.
When a method according to an embodiment of the disclosure includes a plurality of operations, the plurality of operations may be performed by one processor, or performed by a plurality of processors. That is, when a first operation, a second operation, and a third operation are performed by a method according to an embodiment, the first operation, the second operation, and the third operation may all be performed by a first processor, or the first operation and the second operation may be performed by the first processor (e.g., a generic-purpose processor) and the third operation may be performed by a second processor (e.g., an artificial intelligence dedicated processor). For example, the robot 100 may perform an operation of generating a map, and the like using the generic-purpose processor, and perform an operation of recognizing an object, an operation of recognizing the user voice, or the like using the artificial intelligence dedicated processor.
The one or more processors may be implemented as a single core processor that includes one core, or as one or more multicore processors that include a plurality of cores (e.g., a homogeneous multicore or a heterogeneous multicore). If the one or more processors are implemented as a multicore processor, each of the plurality of cores included in the multicore processor may include a processor-internal memory such as a cache memory and an on-chip memory, and a common cache shared by the plurality of cores may be included in the multicore processor. In addition, each of the plurality of cores (or a portion of the plurality of cores) included in the multicore processor may independently read and perform a program command for implementing a method according to an embodiment of the disclosure, or the whole (or a portion) of the plurality of cores may be interconnected to read and perform a program command for implementing a method according to an embodiment of the disclosure.
When a method according to an embodiment of the disclosure includes a plurality of operations, the plurality of operations may be performed by one core from among the plurality of cores or performed by the plurality of cores included in the multicore processor. For example, when a first operation, a second operation, and a third operation are performed by a method according to an embodiment, the first operation, the second operation, and the third operation may all be performed by a first core included in the multicore processor, or the first operation and the second operation may be performed by the first core included in the multicore processor and the third operation may be performed by a second core included in the multicore processor.
In one or more embodiments of the disclosure, the at least one processor 160 may generate a map including information on the plurality of objects based on sensing information obtained through the at least one sensor 110. The plurality of objects may include goods within the home, walls, windows, and the like. In addition, the map may include a drawing representing, on a plane, a state of a space within the home reduced at a certain ratio. According to embodiments of the disclosure, the map may include a drawing in which a planar structure of the inside of a house is reduced at a certain ratio and indicated with predetermined symbols. In addition, the map may include a drawing indicating the planar structure of the inside of the house with lines. However, the map is not limited thereto, and may include positions of main objects inside the house. The map may be implemented as at least one from among a 2-dimensional grid map, a 2-dimensional linear vector map, or a 3-dimensional map.
The at least one processor 160 may generate ultrasonic waves toward each of the plurality of objects through the speaker 120, receive sounds reflected from each of the objects through the microphone 130, obtain reflectivity information with respect to each of the plurality of objects, and store said reflectivity information in the at least one memory 150. The reflectivity information on each of the plurality of objects may include information on a ratio between the intensity of the ultrasonic waves output through the speaker 120 and the intensity of the reflected sounds received through the microphone 130. The reflectivity information may be obtained per object, but this is merely one embodiment, and the reflectivity information may also be obtained per direction even for the same object.
Specifically, the at least one processor 160 may obtain the reflectivity information using different methods according to the types of objects.
In an embodiment, the at least one processor 160 may generate ultrasonic waves at preset distance intervals with respect to the wall object from among the plurality of objects, and obtain reflectivity information with respect to the wall object by receiving the reflected sounds reflected from the wall object through the microphone 130. In an embodiment, the at least one processor 160 may generate ultrasonic waves from two or more directions toward the remaining objects, excluding the wall object, from among the plurality of objects, receive, through the microphone 130, the sounds reflected from the remaining objects based on the ultrasonic waves generated from the two or more directions, and obtain reflectivity information with respect to the remaining objects by obtaining an average value of the reflected sounds received from the two or more directions.
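By way of illustration only, the following Python sketch shows one possible way of computing reflectivity as the ratio described above and aggregating it per object; the function names and data layout are assumptions introduced for this example and are not part of the claimed implementation.

```python
# Illustrative sketch only: reflectivity as the ratio of reflected-sound
# intensity to emitted-ultrasound intensity, aggregated per object.
from statistics import mean

def reflectivity(emitted_intensity: float, reflected_intensity: float) -> float:
    """Ratio between the received reflected-sound intensity and the emitted intensity."""
    return reflected_intensity / emitted_intensity

def wall_reflectivity(samples_along_wall: list[tuple[float, float]]) -> float:
    """Wall object: (emitted, reflected) intensity pairs measured at preset distance intervals."""
    per_interval = [reflectivity(e, r) for e, r in samples_along_wall]
    return mean(per_interval)          # representative value, e.g., the average

def object_reflectivity(samples_per_direction: list[tuple[float, float]]) -> float:
    """Other objects: (emitted, reflected) pairs measured from two or more directions."""
    per_direction = [reflectivity(e, r) for e, r in samples_per_direction]
    return mean(per_direction)         # average over the directional measurements
```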
With the method as described above, the at least one processor 160 may store reflectivity information with respect to the plurality of objects together with the generated map.
When the user voice is received through the microphone 130, the at least one processor 160 may obtain information on an intensity of the user voice for each of the plurality of directions. The intensity of the user voice may mean sound wave energy which is transferred per unit time and unit area, and may be substituted with different terms such as, for example, and without limitation, an amplitude of the user voice, a volume of the user voice, an energy of the user voice, and the like. Further, the at least one processor 160 may obtain information on an intensity of the user voice for each of the plurality of directions for every preset interval (e.g., 1 degree) based on the robot 100.
The at least one processor 160 may obtain information on a plurality of candidate directions in which the user voice is received from among the plurality of directions based on information on the intensity of the user voice. The plurality of candidate directions may be directions in which there is a likelihood of the user voice being uttered from among the plurality of directions. Specifically, the at least one processor 160 may obtain the information on the intensity of the user voice received from the plurality of directions based on a position of the robot 100. Further, the at least one processor 160 may identify a preset number of directions (e.g., five directions) in which the intensity of the user voice is high from among the plurality of directions as the plurality of candidate directions. As used herein, an intensity may be considered “high” if it exceeds a predetermined threshold. Alternatively, an intensity may be considered “high” if it is among a subset of intensity values that are the highest from among the total set of obtained intensity values (e.g., the five highest intensity values from among a larger set of intensity values).
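For illustration only, a minimal Python sketch of selecting a preset number of candidate directions whose measured intensity is high (here, exceeding a threshold) is shown below; the threshold and candidate count are example values assumed for this sketch, not values defined by the disclosure.

```python
# Illustrative sketch: pick a preset number of candidate directions whose
# intensity exceeds a threshold. `intensities` maps a direction in degrees
# (measured at 1-degree intervals around the robot) to the measured intensity.
def select_candidate_directions(intensities: dict[int, float],
                                threshold: float = 0.1,
                                num_candidates: int = 5) -> list[int]:
    above = {d: v for d, v in intensities.items() if v > threshold}
    ranked = sorted(above, key=above.get, reverse=True)   # strongest first
    return ranked[:num_candidates]
```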
The at least one processor 160 may obtain priority order information with respect to the plurality of candidate directions based on a position of the robot 100 and the stored reflectivity information. The priority order information may include information on a direction in which there is a likelihood of the user voice being uttered from among the plurality of directions.
Specifically, the at least one processor 160 may identify an object positioned in each of the plurality of candidate directions based on the position of the robot 100. Then, the at least one processor 160 may identify a priority order with respect to the plurality of candidate directions based on the intensities of the user voice measured from each of the plurality of candidate directions and the reflectivity information with respect to the identified objects. The at least one processor 160 may correct the intensities of the user voice obtained from the plurality of candidate directions by multiplying the intensities of the user voice obtained from the plurality of candidate directions by weight values corresponding to the respective reflectivities of the objects positioned in the plurality of candidate directions. The weight values may be set low where the reflectivities of the objects positioned in the candidate directions are higher, and set high where the reflectivities are lower. Then, the at least one processor 160 may identify the priority order based on the corrected intensity of the user voice in each of the plurality of candidate directions.
The at least one processor 160 may obtain information on a direction in which the user voice is uttered from among the plurality of candidate directions based on the priority order information. Specifically, the at least one processor 160 may assign a higher priority order to a candidate direction as the corrected intensity of the user voice for that candidate direction is larger, and obtain the information on the direction in which the user voice is uttered by identifying the candidate direction with the highest priority order from among the plurality of candidate directions as the direction in which the user voice is uttered.
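For illustration only, the sketch below combines the operations described above: each candidate direction's intensity is corrected with a weight derived from the reflectivity of the object in that direction (here using the linear relationship of Equation 1 introduced later), and the candidate with the largest corrected intensity is returned. The lookup structures are assumptions for this example.

```python
# Illustrative sketch: correct each candidate intensity with a reflectivity-based
# weight and return the candidate direction with the highest priority order.
def corrected_intensity(intensity: float, reflectivity: float) -> float:
    weight = 1.0 - reflectivity        # lower weight for more reflective objects
    return intensity * weight

def utterance_direction(candidate_intensities: dict[int, float],
                        reflectivity_by_direction: dict[int, float]) -> int:
    corrected = {
        d: corrected_intensity(v, reflectivity_by_direction.get(d, 0.0))
        for d, v in candidate_intensities.items()
    }
    # the candidate with the largest corrected intensity has the highest priority order
    return max(corrected, key=corrected.get)
```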
The at least one processor 160 may perform various functions based on information on the direction in which the user voice is uttered. In an embodiment, the at least one processor 160 may perform voice recognition on the user voice by performing beam forming in the direction in which the user voice is uttered. The voice recognition on the user voice may include at least one from among an operation of obtaining texts corresponding to the user voice, an operation of performing natural language understanding using texts corresponding to the user voice, and an operation of providing a response based on the result of natural language understanding or controlling the robot 100. In an embodiment, the at least one processor 160 may rotate or move the robot 100 for the robot 100 to face the direction in which the user voice is uttered.
As described above, by recognizing the direction in which the user voice is uttered using the reflectivity information, the robot 100 may provide various functions by more accurately recognizing a direction in which the user is positioned.
A method of generating a map including reflectivity information and identifying the direction in which the user voice is uttered using the generated map may be described in detail below with reference to
The map generating module 210 may generate a map which includes various information on objects based on sensing information received from the at least one sensor 110 and reflected sounds obtained through the microphone 130. Specifically, the map generating module 210 may include the topographical information obtaining module 211, the object information obtaining module 212, and the reflectivity information obtaining module 213 as shown in
The topographical information obtaining module 211 may obtain topographical information of a space within the home based on sensing information obtained through the LiDAR sensor and the IMU sensor while the robot 100 is traveling. For example, as shown in
The object information obtaining module 212 may obtain information on objects through an image obtained through the camera. Specifically, the object information obtaining module 212 may obtain information on the plurality of objects included in the space within the home by inputting the image obtained through the camera in the trained neural network model (e.g., the object recognition model). For example, as shown in
The reflectivity information obtaining module 213 may generate ultrasonic waves directed toward each of the plurality of objects through the speaker 120 after having generated the map 320 or while generating the map 320, receive the reflected sounds reflected from each of the objects through the microphone 130, obtain reflectivity information with respect to the plurality of objects, and store the same. The reflectivity information obtaining module 213 may obtain information on reflectivity in various methods according to the type of the object.
In an embodiment, the reflectivity information obtaining module 213 may generate ultrasonic waves toward a wall at specific distance (e.g., 2 m) intervals with respect to outermost objects (e.g., the wall object, a door object, etc.), and obtain information on reflectivity by measuring the volumes of the reflected sounds. For example, as shown in
In an embodiment, the reflectivity information obtaining module 213 may generate ultrasonic waves from two or more directions toward the remaining objects (e.g., home appliances, furniture, etc.) excluding the outermost objects from among the plurality of objects, receive the sounds reflected from the ultrasonic waves generated from the two or more directions toward the remaining objects, and obtain reflectivity information with respect to the remaining objects by obtaining an average value of the reflected sounds received from the two or more directions for each remaining object. For example, as shown in
The reflectivity information obtaining module 213 may store the reflectivity information obtained in the method as described above in the metadata form of the map 320. In an example, the reflectivity information obtaining module 213 may store the reflectivity information with respect to the first object 331 as 0.2, store the reflectivity information with respect to the second object 332 as 0.7, store the reflectivity information with respect to the third object 333 as 0.4, and store the reflectivity information with respect to the wall object as 0.8 as shown in
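For illustration only, the reflectivity values in the example above could be kept as map metadata in a structure such as the following; the key names are assumptions introduced for this sketch.

```python
# Illustrative sketch: per-object reflectivity stored as metadata of the map 320.
map_metadata = {
    "objects": {
        "first_object":  {"reflectivity": 0.2},
        "second_object": {"reflectivity": 0.7},
        "third_object":  {"reflectivity": 0.4},
        "wall":          {"reflectivity": 0.8},
    }
}
```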
The recognition module 220 may identify an utterance position of the user voice using the map obtained by the map generating module 210, and recognize the user voice by using the identified utterance position. The recognition module 220 may include, as shown in
The user voice obtaining module 221 may obtain the user voice through the microphone 130. The user voice may be a voice uttered by the user, and may be differentiated from noise and the like. Specifically, the user voice obtaining module 221 may identify whether the user voice is included in an audio signal by inputting the obtained audio signal into a trained neural network model. For example, the user voice obtaining module 221 may obtain the user voice of “Bot nano, can you look at me?” uttered by a user 10 as shown in
The preprocessing module 222 may perform a preprocessing operation on the obtained user voice. Specifically, the preprocessing module 222 may obtain an audio signal of a partial bandwidth from the obtained user voice by using a band pass filter. Then, the preprocessing module 222 may perform a preprocessing operation of removing noise, echo, or the like.
The intensity measuring module 223 may measure an intensity of the user voice obtained from the plurality of directions. Then, the intensity measuring module 223 may perform sampling of the user voice obtained from the plurality of directions. Specifically, the intensity measuring module 223 may measure the intensity of the user voice obtained from the plurality of directions by using a steered response power phase transform (SRP-PHAT) algorithm. The intensity measuring module 223 may measure the intensity of the user voice from the plurality of directions (e.g., 360 directions) for every preset interval (e.g., 1 degree) based on the robot 100.
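For illustration only, the following is a minimal delay-and-sum steered-response-power sketch for measuring a per-direction intensity; a full SRP-PHAT implementation additionally applies a phase-transform weighting in the frequency domain. The microphone geometry, sample rate, and 1-degree grid are assumptions for this example.

```python
# Illustrative sketch: per-direction steered response power with a simple
# delay-and-sum beamformer (SRP-PHAT additionally applies PHAT weighting).
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def steered_power(frames: np.ndarray, mic_positions: np.ndarray,
                  fs: int, azimuth_deg: float) -> float:
    """frames: (num_mics, num_samples) snippet; mic_positions: (num_mics, 2) in meters."""
    azimuth = np.deg2rad(azimuth_deg)
    direction = np.array([np.cos(azimuth), np.sin(azimuth)])
    # relative delay (in samples) of each microphone for a far-field source
    delays = (mic_positions @ direction) / SPEED_OF_SOUND * fs
    delays -= delays.min()
    aligned = [np.roll(ch, -int(np.round(d))) for ch, d in zip(frames, delays)]
    beam = np.mean(aligned, axis=0)          # delay-and-sum beam toward azimuth_deg
    return float(np.sum(beam ** 2))          # power of the steered beam

def intensity_per_direction(frames: np.ndarray, mic_positions: np.ndarray,
                            fs: int, step_deg: int = 1) -> dict[int, float]:
    return {deg: steered_power(frames, mic_positions, fs, deg)
            for deg in range(0, 360, step_deg)}
```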
The candidate direction obtaining module 224 may identify a preset number of directions in which the intensity of the user voice is considered to be high from among the plurality of directions as the plurality of candidate directions. For example, the candidate direction obtaining module 224 may obtain, as shown in
The priority order information obtaining module 225 may obtain priority order information with respect to the plurality of candidate directions based on the position of the robot 100 and the stored reflectivity information.
Specifically, the priority order information obtaining module 225 may identify objects positioned in the plurality of candidate directions based on the position of the robot. For example, the priority order information obtaining module 225 may identify a first object to a fifth object which are respectively positioned in a first candidate direction to a fifth candidate direction based on the position of the robot 100. An object may not be present in some of the plurality of candidate directions.
Further, the priority order information obtaining module 225 may identify weight values corresponding to the reflectivities with respect to objects that are positioned in the plurality of candidate directions. The weight values may be set low where reflectivities of the objects positioned in the candidate directions are higher, and may be set high where reflectivities of the objects positioned in the candidate directions are lower. For example, the weight value and reflectivity may have a relationship as in Equation 1 below.
weight value = 1 − reflectivity    (Equation 1)
However, Equation 1 described above is merely one embodiment, and the relationship between the weight value and reflectivity may be represented in an equation in which the weight value and reflectivity have a different inversely proportional relationship.
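For illustration only, Equation 1 and one possible alternative inversely proportional relationship may be expressed as follows; the alternative form is merely an assumption showing a different inverse relationship, not a form specified by the disclosure.

```python
# Illustrative sketch: weight functions inversely related to reflectivity.
def weight_linear(reflectivity: float) -> float:
    return 1.0 - reflectivity                    # Equation 1

def weight_reciprocal(reflectivity: float, eps: float = 1e-3) -> float:
    return 1.0 / (reflectivity + eps)            # an alternative inverse relationship
```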
The priority order information obtaining module 225 may correct the intensities of the user voice obtained from the plurality of candidate directions by multiplying the intensities of the user voice obtained from the plurality of candidate directions by the weight values corresponding to the reflectivities of the objects positioned in the plurality of candidate directions. For example, the priority order information obtaining module 225 may correct the intensity of the user voice obtained from the first candidate direction to 0.36 by multiplying the intensity information by 0.3, which is the weight value corresponding to the reflectivity of the first object positioned in the first candidate direction from among the plurality of candidate directions, and correct the intensity of the user voice obtained from the third candidate direction to 0.56 by multiplying the intensity information by 0.7, which is the weight value corresponding to the reflectivity of the third object positioned in the third candidate direction from among the plurality of candidate directions.
The priority order information obtaining module 225 may identify the priority order information with respect to the plurality of candidate directions based on the corrected intensities of the user voice for each of the plurality of candidate directions. That is, the priority order information obtaining module 225 may assign a higher priority order to a candidate direction as its corrected intensity of the user voice is higher, and a lower priority order as its corrected intensity is lower. For example, the priority order information obtaining module 225 may identify, as shown in
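For illustration only, the numerical example above may be reproduced as follows; the raw intensities of 1.2 and 0.8 are assumed values chosen so that the corrected intensities match 0.36 and 0.56.

```python
# Worked example: corrected intensity = intensity x weight, then rank by priority.
first_corrected = 1.2 * 0.3    # first candidate direction, weight 0.3 -> 0.36
third_corrected = 0.8 * 0.7    # third candidate direction, weight 0.7 -> 0.56
priority = sorted({"first": first_corrected, "third": third_corrected}.items(),
                  key=lambda kv: kv[1], reverse=True)
# [('third', 0.56), ('first', 0.36)] up to floating-point rounding:
# the third candidate direction receives the higher priority order.
```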
The direction recognition module 226 may recognize the direction in which the user voice is uttered based on the priority order information. Specifically, the direction recognition module 226 may recognize the candidate direction of the first priority order as the direction in which the user voice is uttered. For example, as shown in
The user voice recognition module 227 may perform voice recognition on the user voice. The user voice recognition module 227 may perform voice recognition on the (additional) user voice by performing beam forming in the direction in which the user voice is uttered. The user voice recognition module 227 may convert the additional user voice into text, perform natural language understanding on the converted text, and provide a response to the user 10 based on the result of the natural language understanding or control the robot 100 according to the user voice. For example, the user voice recognition module 227 may recognize the user voice of “Bot, can you look at me?” and rotate the robot 100 in the direction in which the user voice is uttered.
First, the robot 100 may obtain sensing information through the at least one sensor (S605). Specifically, if an event for map generation occurs (e.g., an event in which a cleaning robot is initially installed, an event in which a user command for map generation is received, etc.), the robot 100 may travel in the space within the home by using the driver 140. Then, the robot 100 may obtain sensing information for map generation by using the at least one sensor 110 while traveling in the space within the home. In an example, the robot 100 may obtain a sensing value for obtaining distance information between the robot 100 and an object using the LiDAR sensor. In an example, the robot 100 may obtain a sensing value for obtaining movement information of the robot 100 by using the IMU sensor. In an example, the robot 100 may obtain an image for obtaining information on an object at a surrounding of the robot 100 by using the camera.
The robot 100 may generate a map including information on a plurality of objects based on the sensing information (S610). Specifically, the robot 100 may obtain topographical information of the space within the home using the IMU sensor and the LiDAR sensor, and obtain information on the plurality of objects included in the space within the home by inputting the image obtained through the camera into the trained neural network model. The information on the plurality of objects may include various information such as, for example, and without limitation, type information, size information, shape information, and position information of each of the plurality of objects. Further, the robot 100 may store the information on the objects in the form of metadata together with a map including information on the space within the home.
The robot 100 may generate ultrasonic waves toward each of the plurality of objects (S615). The robot 100 may generate the ultrasonic waves using different methods based on the types of the plurality of objects.
In an example, the robot 100 may generate the ultrasonic waves toward a wall at specific distance intervals (e.g., 3 m) with respect to the wall object from among the plurality of objects. With respect to outermost objects other than the wall object, such as windows or doors, ultrasonic waves may likewise be generated at specific distance intervals as described above. In an example, the robot 100 may determine two or more samples (e.g., five) within the angle range in which the robot 100 can view each of the remaining objects (e.g., furniture, goods within the home, etc.) excluding the outermost objects from among the plurality of objects, and generate the ultrasonic waves from two or more directions toward the object by equally dividing the angle range according to the number of samples.
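For illustration only, equally dividing a viewing-angle range into a chosen number of sample directions may be sketched as follows; the angle range and the number of samples are example assumptions.

```python
# Illustrative sketch: choose emission directions by equally dividing the angle
# range from which the robot can view an object, according to the sample count.
def emission_directions(view_start_deg: float, view_end_deg: float,
                        num_samples: int = 5) -> list[float]:
    span = view_end_deg - view_start_deg
    step = span / (num_samples - 1)
    return [view_start_deg + i * step for i in range(num_samples)]

# emission_directions(30.0, 90.0) -> [30.0, 45.0, 60.0, 75.0, 90.0]
```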
The robot 100 may generate the ultrasonic waves toward each of the plurality of objects while traveling in the space within the home to generate a map, but this is merely one embodiment, and the robot 100 may generate the ultrasonic waves toward each of the plurality of objects after having generated the map.
The robot 100 may obtain reflectivity information with respect to objects by receiving reflected sounds with respect to each of the plurality of objects, and store the obtained reflectivity information (S620). Specifically, the robot 100 may obtain information on the intensities of the reflected sounds by receiving reflected sounds with respect to each of the plurality of objects. Then, the robot 100 may obtain information on the reflectivity of each of the plurality of objects based on a ratio of the intensities of the output ultrasonic waves and the received intensities of the reflected sounds.
The robot 100 may obtain information on the reflectivity for every specific distance interval by receiving the reflected sounds of the ultrasonic waves generated at the specific distance intervals for the outermost objects. The robot 100 may store the information on reflectivity for every specific distance interval, and store the representative value (e.g., the average value, the modal value, etc.) of the reflectivities obtained at the specific distance intervals as the information on the reflectivity of the object. In addition, the robot 100 may obtain information on the reflectivities according to at least two directions by receiving the reflected sounds obtained from the at least two directions for the remaining objects excluding the outermost objects. The robot 100 may store the information on the reflectivities for each of the at least two directions, and store the representative value (e.g., the average value, the modal value, etc.) of the reflectivities obtained from the at least two directions as the information on the reflectivity of the object.
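For illustration only, storing per-measurement reflectivities together with a representative value (average or modal value) may be sketched as follows; the record layout is an assumption for this example.

```python
# Illustrative sketch: keep per-measurement reflectivities and a representative value.
from statistics import mean, multimode

def representative_reflectivity(per_measurement: list[float], use_mode: bool = False) -> float:
    if use_mode:
        return multimode(per_measurement)[0]   # modal value (first mode if several)
    return mean(per_measurement)               # average value

reflectivity_record = {
    "per_measurement": [0.78, 0.81, 0.80],     # e.g., per distance interval or per direction
    "representative": representative_reflectivity([0.78, 0.81, 0.80]),
}
```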
The robot 100 may store the information on the reflectivity of objects together with the map. As described above, the robot 100 may store the information on the reflectivity of objects on the map in the form of metadata.
The robot 100 may generate a map including the reflectivity information with respect to the objects through operation S605 to operation S620 and store the map in the robot 100. The robot 100 may not only store the map in the robot 100, but may also transmit the map to an external server so that the map is stored in the external server. In addition, operation S605 to operation S620 may be performed by another electronic device other than the robot 100.
After operation S620 (i.e., after the map is generated), the robot 100 may identify whether the user voice is received through the microphone 130 (S625). Specifically, the robot 100 may identify whether the user voice is received by using the trained neural network model to identify whether a received voice is that of a person, but this is merely one embodiment, and the robot 100 may identify whether the user voice is received using another method.
If the user voice is identified as received (S625—Y), the robot 100 may preprocess the user voice (S630). Specifically, the robot 100 may obtain the user voice of a specific frequency range using a preset filter (e.g., the band pass filter, etc.), and perform preprocessing of the user voice such as noise removal.
The robot 100 may obtain information on the intensities of the user voice for each of the plurality of directions (S635). Specifically, the robot 100 may obtain the information on the intensities of the user voice for each of the plurality of directions at preset intervals (e.g., 1 degree) with respect to the robot 100. In an example, the robot 100 may obtain the information on the intensities of the user voice with respect to 360 directions (e.g., a first direction to a 360th direction).
The robot 100 may obtain information on the plurality of candidate directions in which the user voice is received (S640). Specifically, the robot 100 may identify, as the plurality of candidate directions, a preset number of directions (e.g., five) in which the intensity of the user voice is high based on the information on the intensities of the user voice obtained for each of the plurality of directions. Among adjacent directions (e.g., directions within a 10-degree range) from among the plurality of directions, the robot 100 may identify only the direction having the largest intensity of the user voice as a candidate direction, and exclude the remaining adjacent directions from the candidate directions even if their intensities of the user voice are larger than those of other directions.
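For illustration only, the sketch below keeps only the locally strongest direction within an adjacency window (e.g., 10 degrees) before taking the preset number of candidates; the window size and candidate count are example values assumed for this sketch.

```python
# Illustrative sketch: suppress adjacent directions so that only the local
# maximum within the window survives, then take the strongest candidates.
def suppress_adjacent(intensities: dict[int, float], window_deg: int = 10) -> dict[int, float]:
    survivors = {}
    for d, v in intensities.items():
        neighbors = [intensities.get((d + offset) % 360, 0.0)
                     for offset in range(-window_deg, window_deg + 1)]
        if v >= max(neighbors):          # keep only the locally strongest direction
            survivors[d] = v
    return survivors

def candidate_directions(intensities: dict[int, float], num_candidates: int = 5) -> list[int]:
    survivors = suppress_adjacent(intensities)
    return sorted(survivors, key=survivors.get, reverse=True)[:num_candidates]
```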
The robot 100 may obtain the priority order information with respect to the plurality of candidate directions (S645). Specifically, the robot 100 may obtain the priority order information with respect to the plurality of candidate directions based on the corrected intensities of the user voice after having corrected the intensities of the user voice received from the candidate directions based on the position of the robot 100 and the reflectivity information.
Specifically, the robot 100 may identify objects positioned in the plurality of candidate directions based on the position of the robot. For example, the robot 100 may identify the first object positioned at the first candidate direction based on the position of the robot 100, and identify a second object positioned at a second candidate direction.
Then, the robot 100 may identify weight values corresponding to the reflectivities of the objects positioned in the plurality of candidate directions. The weight values may be set low where the reflectivities of the objects positioned in the candidate directions are higher, and set high where the reflectivities are lower. That is, as the reflectivity of an object positioned in a candidate direction is higher, the weight value may be set lower, because the sound received from that direction is more likely to be a reflection of the user voice from the object rather than the user voice uttered directly. A weight value may be a value between 0 and 1, but the disclosure is not limited thereto.
The robot 100 may correct the intensities of the user voice obtained from the plurality of candidate directions by multiplying the intensities of the user voice obtained from the plurality of candidate directions by the weight values corresponding to the reflectivities of the objects positioned in the plurality of candidate directions. That is, the robot 100 may multiply an intensity of the user voice received from a direction in which an object with high reflectivity is positioned by a weight value which is set low, and multiply an intensity of the user voice received from a direction in which an object with low reflectivity is positioned by a weight value which is set high.
Then, the robot 100 may identify the priority order information with respect to the plurality of candidate directions based on the corrected intensities of the user voice for each of the plurality of candidate directions. That is, the robot 100 may assign a higher priority order to a candidate direction as its corrected intensity of the user voice is larger, and a lower priority order as its corrected intensity is smaller.
The robot 100 may obtain information on the direction in which the user voice is uttered based on the priority order information (S650). Specifically, the robot 100 may identify the candidate direction with the highest priority order as the direction in which the user voice is uttered. Alternatively, the robot 100 may identify a candidate area in which the user is positioned by capturing images of the candidate directions with the camera in descending order of priority, and identify the identified candidate area as the direction in which the user voice is uttered.
Then, the robot 100 may recognize the user voice (S655). Specifically, the robot 100 may drive the robot 100 to face the direction in which the user is positioned based on the direction in which the user voice is uttered. In addition, the robot 100 may perform voice recognition on the (additional) user voice by performing beam forming in the direction in which the user voice is uttered.
A function associated with artificial intelligence according to the disclosure may be operated through the processor and the memory of the robot 100.
The processor may be formed of one or a plurality of processors. The one or the plurality of processors may include at least one from among the CPU, the GPU, and the NPU, but are not limited to the above-described examples of processors.
The CPU may be a general-purpose processor which can perform not only general operations but also artificial intelligence operations, and may effectively execute a complex program through a multi-layer cache structure. The CPU may be advantageous in a serial processing method which allows for an organic connection between a previous calculation result and a following calculation result through consecutive calculations. The general-purpose processor is not limited to the above-described example, except where it is specified as the above-described CPU.
The GPU may be a processor for mass operations, such as floating-point operations used in graphics operations, and may perform large-scale operations in parallel by integrating a large number of cores. Specifically, the GPU may be advantageous in a parallel processing method, such as a convolution operation, compared to the CPU. In addition, the GPU may be used as a co-processor for supplementing a function of the CPU. The processor for mass operations is not limited to the above-described examples, except where it is specified as the above-described GPU.
The NPU may be a processor specialized in artificial intelligence operations which uses an artificial neural network, and may implement each layer which forms the artificial neural network as hardware (e.g., silicon). Because the NPU may be designed specifically according to the requirements of a company, it may have a lower degree of freedom compared to the CPU or the GPU, but may effectively process an artificial intelligence operation required by the company. As a processor specialized in artificial intelligence operations, the NPU may be implemented in various forms such as, for example, and without limitation, a tensor processing unit (TPU), an intelligence processing unit (IPU), a vision processing unit (VPU), and the like. The artificial intelligence processor is not limited to the above-described examples except for when specified as the above-described NPU.
In addition, the one or the plurality of processors may be implemented as a system on chip (SoC). The SoC may further include a memory in addition to the one or the plurality of processors, and a network interface, such as a bus, for data communication between the processor and the memory.
If a plurality of processors is included in the system on chip (SoC) included in the robot, the robot may perform operations associated with artificial intelligence (e.g., operations associated with learning of an artificial intelligence model or inference) using a portion of the processors from among the plurality of processors. For example, the robot may perform operations associated with artificial intelligence using at least one from among the GPU, the NPU, the VPU, the TPU, and a hardware accelerator, which are specialized in artificial intelligence operations such as the convolution operation and a matrix multiplication operation, from among the plurality of processors. However, the above is merely one embodiment, and operations associated with artificial intelligence may be processed using a general-purpose processor such as the CPU.
In addition, the robot may perform an operation with respect to a function associated with artificial intelligence by using multi-cores (e.g., a dual core, a quad core, etc.) included in one processor. Specifically, the robot may perform artificial intelligence operations such as, for example, and without limitation, convolution operations, matrix multiplication operations, and the like in parallel using the multi-cores included in the processor.
The one or the plurality of processors may control input data to be processed according to a pre-defined operation rule or an artificial intelligence model stored in the memory. The pre-defined operation rule or the artificial intelligence model is characterized by being created through learning.
Here, being created through learning may refer to the pre-defined operation rule or the artificial intelligence model of a desired characteristic being formed by applying a learning algorithm to multiple pieces of learning data. The learning may be carried out in the machine itself in which the artificial intelligence according to the disclosure is performed, or carried out through a separate server/system.
The artificial intelligence model may be formed with a plurality of neural network layers. Each of the plurality of neural network layers may have at least one weight value, and may perform an operation of the layer based on an operation result of a previous layer and at least one defined operation. Examples of the neural network may include a Convolutional Neural Network (CNN), a Deep Neural Network (DNN), a Recurrent Neural Network (RNN), a Restricted Boltzmann Machine (RBM), a Deep Belief Network (DBN), a Bidirectional Recurrent Deep Neural Network (BRDNN), a Deep Q-Network (DQN), and a Transformer, and the neural network of the disclosure is not limited to the above-described examples, unless otherwise specified.
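As a generic illustration of the layer-by-layer computation described above (not tied to any particular model in the disclosure), the sketch below stacks two dense layers in which each layer applies its own weight values to the operation result of the previous layer; the layer shapes and the ReLU operation are arbitrary choices for illustration.

```python
import numpy as np


def dense_layer(previous_result: np.ndarray, weights: np.ndarray,
                bias: np.ndarray) -> np.ndarray:
    # Each layer combines the previous layer's operation result with its own
    # weight values through a defined operation (here, an affine transform
    # followed by a ReLU non-linearity).
    return np.maximum(0.0, previous_result @ weights + bias)


# Two stacked layers: the output of the first layer is the input of the second.
rng = np.random.default_rng(0)
x = rng.random((1, 8))                                  # input features
h = dense_layer(x, rng.random((8, 16)), np.zeros(16))  # first layer
y = dense_layer(h, rng.random((16, 4)), np.zeros(4))   # second layer
```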
The learning algorithm may be a method for training a predetermined target machine (e.g., a robot) to make decisions or predictions on its own using the plurality of learning data. Examples of the learning algorithm may include supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning, and the learning algorithm of the disclosure is not limited to the above-described examples unless otherwise specified.
A method according to the various embodiments of the disclosure may be provided included in a computer program product. The computer program product may be exchanged between a seller and a purchaser as a commodity. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., a compact disc read only memory (CD-ROM)), or distributed online (e.g., downloaded or uploaded) through an application store (e.g., PLAYSTORE™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a portion of the computer program product (e.g., a downloadable app) may be stored at least temporarily in a storage medium such as a server of a manufacturer, a server of an application store, or a memory of a relay server, or may be temporarily generated.
The method according to various embodiments of the disclosure may be implemented with software including instructions stored in a machine-readable (e.g., computer-readable) storage medium. The machine may call the stored instructions from the storage medium, and as a device operable according to the called instructions, may include the robot according to the embodiments described above.
The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Herein, the ‘non-transitory storage medium’ merely means that it is a tangible device and does not include a signal (e.g., electromagnetic waves), and the term does not differentiate between data being semi-permanently stored and data being temporarily stored in the storage medium. In an example, the ‘non-transitory storage medium’ may include a buffer in which data is temporarily stored.
Based on the instructions being executed by the processor, the processor may perform a function corresponding to the instructions, either directly or by using other elements under the control of the processor. The instructions may include code generated by a compiler or code executable by an interpreter.
While the disclosure has been illustrated and described with reference to various example embodiments thereof, it will be understood that the various example embodiments are intended to be illustrative, not limiting. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the true spirit and full scope of the disclosure, including the appended claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
10-2023-0114914 | Aug 2023 | KR | national |
This application is a by-pass continuation of International Application No. PCT/KR2024/007917, filed on Jun. 10, 2024, which is based on and claims priority to Korean Patent Application No. 10-2023-0114914, filed on Aug. 30, 2023, in the Korean Patent Office, the disclosures of all of which are incorporated by reference herein in their entireties.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/KR2024/007917 | Jun 2024 | WO
Child | 18901786 | | US