The present disclosure relates to an information processing apparatus, an information processing method, and a program.
There is a known technology that displays an image rendered by a rendering device using augmented reality (AR) or virtual reality (VR) on, for example, a head mounted display (HMD) worn by a user. An example of this technology is disclosed in Patent Literature 1 below.
For example, in an AR system that superimposes content on a real environment and presents the content to a user, information regarding the real environment is acquired in order to arrange the content at an appropriate position. Furthermore, in a VR system that displays a virtual space to the user, information regarding the real environment is acquired in order to set an area (play area) in which the user can safely move while moving in the virtual space.
The system acquires information regarding the real environment, such as an obstacle, from, for example, a sensor provided in a head mounted display (HMD). At this time, for example, when the user appears in the image captured by the sensor, there is a possibility that the system erroneously detects the user as an obstacle.
When the user is erroneously detected as an obstacle, an area where content can be arranged may be limited or a play area narrower than an actual play area may be set.
Therefore, the present disclosure provides a system capable of more accurately acquiring information regarding the real environment around the user.
Note that the above problem or object is merely one of a plurality of problems or objects that can be solved or achieved by a plurality of embodiments disclosed in the present specification.
According to the present disclosure, an information processing apparatus is provided. The information processing apparatus includes a control unit. The control unit estimates a person region including a user in distance information generated by a distance measuring device provided in a device used by the user, the person region being estimated based on a user posture estimated using a sensor provided in the device. The control unit updates environment information around the user based on the person region and the distance information.
Hereinafter, an embodiment of the present disclosure will be described in detail with reference to the accompanying drawings. Note that, in the present specification and the drawings, components having substantially the same functional configuration are denoted by the same reference signs to omit redundant description.
Furthermore, in the present specification and the drawings, specific values may be indicated and described, but the values are merely examples, and other values may be applied. In addition, in the present specification, the following references may be used in the description.
In the description below, one or more embodiments (including examples and modifications) may be implemented independently. On the other hand, at least some of the plurality of embodiments described below may be appropriately combined with at least some of the other embodiments. The plurality of embodiments may include novel features different from each other. Therefore, the plurality of embodiments may contribute to solving different problems or achieving different objects, and may exhibit different effects.
The information processing apparatus 100 and the terminal device 200 can communicate with each other via various wired or wireless networks. Note that any communication system, whether wired or wireless (e.g., Wi-Fi (registered trademark) or Bluetooth (registered trademark)), can be applied to the network.
Furthermore, the number of the information processing apparatuses 100 and the number of the terminal devices 200 included in the information processing system 1 are not limited to the number illustrated in
The terminal device 200 is, for example, a wearable device (eyewear device) such as an eyeglass HMD worn on the head by a user U.
Note that the eyewear device applicable as the terminal device 200 may be a so-called see-through type head mounted display (augmented reality (AR) glasses) that transmits an image of the real space, or may be a goggle type (virtual reality (VR) goggles) that does not transmit an image of the real space.
Furthermore, in the present disclosure, the terminal device 200 is not limited to the HMD, and may be, for example, a tablet, a smartphone, or the like held by the user U.
The information processing apparatus 100 integrally controls the operation of the terminal device 200. The information processing apparatus 100 is realized, for example, by a processing circuit such as a central processing unit (CPU) or a graphics processing unit (GPU). Note that a detailed configuration of the information processing apparatus 100 according to the present disclosure will be described later.
Here, in recent years, many devices that perform processing according to the movement of the user U have appeared. For example, there is a game in which a character displayed on a screen is moved in synchronization with the user's movement.
When the user continuously performs such operations, the user may become too immersed in the operation to notice the surrounding environment. As a result, the user may collide with a surrounding object (obstacle). In particular, in VR in which the user plays while wearing the HMD, the user may not be able to confirm the surrounding environment at all, and there is thus a higher risk of colliding with a real object.
Therefore, in order to ensure physical safety of the user U, the information processing apparatus 100 controls the HMD to identify a safe play area (allowable area) that does not come into contact with a real object, so that the user U moves in the safe play area.
For example, in
In a conventional information processing system, it is difficult to automatically set the play area according to the actual surrounding environment without an input by the user U. Therefore, in the conventional information processing system, the play area is often manually set by the user U.
For example, the user U designates the play area by drawing a boundary line using a device (not illustrated) such as a game controller. Alternatively, the information processing system detects a position of the user U and sets a predetermined range within a radius of several meters around the user U as the play area.
When the user U designates the play area, the burden on the user U increases. For example, it takes a longer time before the user can start the game. Furthermore, when the conventional information processing system sets a predetermined range as the play area according to the position of the user U, an obstacle may be included in the predetermined range, and the user U may collide with the obstacle. Furthermore, in this case, even when there is an area with no obstacle outside the predetermined range, the conventional information processing system cannot set that area as the play area, and the movable range of the user U may be narrowed.
Accordingly, it is desirable to set a more appropriate play area while reducing the burden on the user U.
Therefore, assume, for example, an information processing system that generates information on the three-dimensional space of the surrounding environment of the user U (environment information) and sets the play area. Here, the environment information expresses an object present in the three-dimensional space with a plurality of planes or voxels (grids). Examples of the environment information are an occupancy grid map and a 3D mesh.
As illustrated in a middle diagram of
An outline of the occupancy map will be described. The occupancy map is a known technology for 3D expression of the environment. In the occupancy map, the surrounding environment is expressed as a plurality of voxels arranged in a 3D grid in a three-dimensional space. Each of the plurality of voxels indicates occupancy/non-occupancy of the object by holding one of the following three states.
A method for generating the occupancy map is disclosed in, for example, Reference [1]. For example, the information processing system estimates a presence probability of the object for each voxel from time-series distance measurement information (distance information DM01 described above), and determines the state of each voxel.
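As a minimal sketch of this three-state voxel representation (not part of the disclosed method; the grid size and the occupied/free probability thresholds below are placeholder values), the presence probability held by each voxel can be classified into occupied, free, or unknown as follows:

    import numpy as np

    OCCUPIED, FREE, UNKNOWN = 2, 1, 0

    class OccupancyGrid:
        """Toy 3D occupancy grid; each voxel holds a presence probability."""

        def __init__(self, shape=(64, 64, 64), p_init=0.5):
            # 0.5 means "no information observed yet" (unknown).
            self.prob = np.full(shape, p_init, dtype=np.float32)

        def states(self, p_occ=0.7, p_free=0.3):
            """Classify every voxel into one of the three states."""
            s = np.full(self.prob.shape, UNKNOWN, dtype=np.int8)
            s[self.prob >= p_occ] = OCCUPIED   # an object is present
            s[self.prob <= p_free] = FREE      # observed and empty
            return s                           # all other voxels stay UNKNOWN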
Next, the information processing system sets the play area using the environment information OM01 generated. As illustrated in a lower diagram of
In this manner, the information processing system can set the play area PA01 in which the user U can safely move by acquiring the environment information OM01 around the user U.
Note that the information processing system can use the environment information OM01 for purposes other than the setting of the play area PA01. For example, the information processing system can use the environment information OM01 to set a movement route and a presentation position of an AI character (content) to be presented to the user U. Similarly to the user U, the information processing system moves the AI character while avoiding an obstacle. Therefore, the information processing system uses the environment information OM01 for calculating a movement path of the AI character.
Here, there is a possibility that the user U is included in the distance information acquired by the information processing system. For example, as illustrated in an upper diagram of
In this case, the information processing system sets a plane not including the user U as the play area. When the information processing system detects the user U as an obstacle as described above, an accuracy of the environment information decreases, and there is a possibility that the play area cannot be set properly.
Therefore, the information processing system 1 according to the present disclosure estimates a person region including the user U in the distance information based on a posture of the user U. The information processing system 1 updates the environment information around the user U based on the estimated person region and the distance information.
As illustrated in an upper diagram of
The information processing system 1 sets a person region reliability value with respect to the distance information included in the person regions R01 and R02. For example, when the distance information is a depth map, the information processing system 1 assigns the person region reliability value to pixels included in the person regions R01 and R02. The person region reliability value is, for example, a value indicating the likelihood that the distance information corresponds to a person (user U). The larger the person region reliability value, the higher the possibility that the distance information represents the distance to the user U.
For example, the information processing system 1 sets a different person region reliability value to each of the regions R01 and R02. The information processing system 1 sets the person region reliability value such that the person region reliability value of the region R02 closer to the user U, in other words, the HMD 200, is larger than the person region reliability value of the region R01. Details of the setting of the person region reliability value will be described later.
The information processing system 1 generates or updates the environment information according to the set person region reliability value. Specifically, the information processing system 1 updates the environment information such that the distance information (pixels of the depth map) having a larger person region reliability value is not reflected in the environment information (voxels of the occupancy map). For example, when the person region reliability value is “1”, in other words, when a voxel corresponding to a pixel having the highest possibility of being a person is to be updated, the information processing system 1 performs the update without using the pixel value (distance measurement value). Details of the update of the environment information using the person region reliability value will be described later.
In this way, by generating or updating the environment information according to the person region reliability value, the information processing system 1 can further reduce erroneous detection of the user U. Therefore, as illustrated in a lower diagram of
The communication unit 210 transmits and receives information to and from another device. For example, the communication unit 210 transmits a video reproduction request and a sensing result of the sensor unit 220 to the information processing apparatus 100 according to the control by the control unit 250. Furthermore, the communication unit 210 receives a video to be reproduced from the information processing apparatus 100.
The sensor unit 220 may include, for example, a camera (image sensor), a depth sensor, a microphone, an acceleration sensor, a gyroscope, a geomagnetic sensor, and a global positioning system (GPS) receiver. Furthermore, the sensor unit 220 may include a speed sensor, an acceleration sensor, an angular velocity sensor (gyro sensor), and an inertial measurement unit (IMU) that integrates the speed sensor, the acceleration sensor, and the angular velocity sensor.
For example, the sensor unit 220 senses a position of the terminal device 200 in the real space (or position of the user U who uses the terminal device 200), orientation and attitude of the terminal device 200, and acceleration. Furthermore, the sensor unit 220 senses depth information around the terminal device 200. Note that, when the sensor unit 220 includes a distance measuring device that senses the depth information, the distance measuring device may be a stereo camera, or a time of flight (ToF) distance image sensor.
The display unit 230 displays an image according to the control by the control unit 250. For example, the display unit 230 may include a right-eye display unit and a left-eye display unit (not illustrated). In this case, the right-eye display unit projects an image using at least a partial region of a right-eye lens (not illustrated) included in the terminal device 200 as a projection surface. The left-eye display unit projects an image using at least a partial region of a left-eye lens (not illustrated) included in the terminal device 200 as the projection surface.
Alternatively, when the terminal device 200 has a goggle-type lens, the display unit 230 may project a video using at least a partial region of the goggle-type lens as the projection surface. Note that the left-eye lens and the right-eye lens (or goggle-type lens) may be formed of, for example, a transparent material such as resin or glass.
Alternatively, the display unit 230 may be configured as a non-transmissive display device. For example, the display unit 230 may include a liquid crystal display (LCD) or an organic light emitting diode (OLED). Note that, in this case, an image in front of the user U captured by the sensor unit 220 (camera) may be sequentially displayed on the display unit 230. As a result, the user U can visually recognize a scenery in front of the user U through the video displayed on the display unit 230.
The input unit 240 may include a touch panel, a button, a lever, a switch, and the like. The input unit 240 receives various inputs by the user U. For example, when the AI character is arranged in the virtual space, the input unit 240 may receive an input by the user U for changing an arrangement position of the AI character.
The control unit 250 integrally controls the operation of the terminal device 200 using, for example, a CPU, a graphics processing unit (GPU), and a RAM built in the terminal device 200. For example, the control unit 250 causes the display unit 230 to display a video received from the information processing apparatus 100.
As an example, the terminal device 200 receives a video. In this case, the control unit 250 causes the display unit 230 to display a video portion, in the video, corresponding to the information on the position and attitude of the terminal device 200 (or user U, etc.) sensed by the sensor unit 220.
Furthermore, when the display unit 230 includes the right-eye display unit and the left-eye display unit (not illustrated), the control unit 250 generates a right-eye image and a left-eye image based on the video received from the information processing apparatus 100. Then, the control unit 250 displays the right-eye image on the right-eye display unit and displays the left-eye image on the left-eye display unit. As a result, the display unit 230 can cause the user U to view a stereoscopic video.
Furthermore, the control unit 250 may perform various recognition processes based on a sensing result of the sensor unit 220. For example, the control unit 250 may recognize, based on the sensing result, motion (e.g., user U's gesture and movement) by the user U wearing the terminal device 200.
The communication unit 110 transmits and receives information to and from another device. For example, the communication unit 110 transmits a video to be reproduced to the terminal device 200 according to the control by the control unit 130. Furthermore, the communication unit 110 receives a video reproduction request and a sensing result from the terminal device 200.
The storage unit 120 is realized by, for example, a semiconductor memory element such as a random access memory (RAM), a read only memory (ROM), or a flash memory, or a storage device such as a hard disk or an optical disk.
The control unit 130 integrally controls the operation of the information processing apparatus 100 using, for example, a CPU, a graphics processing unit (GPU), and a RAM provided in the information processing apparatus 100. For example, the control unit 130 is implemented by a processor executing various programs stored in the storage device inside the information processing apparatus 100 using a random access memory (RAM) or the like as a work area. Note that the control unit 130 may be realized by an integrated circuit such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA). Any of the CPU, the GPU, the ASIC, and the FPGA can be regarded as a controller.
As illustrated in
The pose estimation unit 131 estimates an attitude (pose) of the terminal device 200 based on a sensing result acquired by the sensor unit 220 of the terminal device 200. For example, the pose estimation unit 131 acquires a measurement result (hereinafter also referred to as position and attitude information) of the IMU, which is an example of the sensor unit 220, and a photographing result of the camera (hereinafter also referred to as a camera image).
The pose estimation unit 131 estimates a self position/attitude (hereinafter also referred to as a camera pose) and a gravity direction of the terminal device 200 (or user U) based on the position and attitude information and the camera image acquired. The pose estimation unit 131 outputs the estimated camera pose and gravity direction to the occupancy map generation unit 132.
The occupancy map generation unit 132 generates or updates the occupancy map based on the camera pose, the gravity direction, and the distance information. As described above, the occupancy map generation unit 132 acquires the camera pose and the gravity direction from the pose estimation unit 131. For example, the occupancy map generation unit 132 acquires a depth map as the distance information from the terminal device 200.
As illustrated in
The estimation processing unit 1321 estimates the person region in the depth map based on the camera pose, the gravity direction, and the distance information. In addition, the estimation processing unit 1321 assigns a person region reliability value c to each pixel of the depth map corresponding to the estimated person region.
First, estimation of the person region by the estimation processing unit 1321 will be described.
It is assumed that the estimation processing unit 1321 acquires the depth map as illustrated in
Here, as described above, as the face direction of the user U, in other words, the distance measurement direction of the distance measuring device, is directed further downward, the user U is more likely to be included in the range-finding area. This point will be described with reference to
As illustrated in a left diagram of
The description returns to
Therefore, the estimation processing unit 1321 according to the embodiment of the present disclosure increases a height r of the person region as the downward angle θ decreases.
As illustrated in
For example, the estimation processing unit 1321 determines the length r of the person region by using Expression (1) below.
Note that rmax is the maximum value of the length r, and is a value that can be changed according to the size of the depth map, the distance measurement direction L, and the like. θmax and θmin are parameters changed according to a value of the person region reliability value c to be described later. The estimation processing unit 1321 can estimate a plurality of person regions having different lengths r according to the person region reliability value c by changing values of θmax and θmin according to the person region reliability value c.
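The concrete form of Expression (1) is given in the drawings and is not reproduced here. Purely as an illustration of the behavior described above (the length r grows as the downward angle θ decreases and saturates at rmax), one plausible clamped linear form would be

$$r(\theta) = r_{\max} \cdot \min\!\left(1,\ \max\!\left(0,\ \frac{\theta_{\max} - \theta}{\theta_{\max} - \theta_{\min}}\right)\right),$$

under which r = 0 when θ ≥ θmax and r = rmax when θ ≤ θmin; because θmax and θmin are varied with the person region reliability value c, a different length r is obtained for each value of c.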
Next, a width w of the person region will be described with reference to
Generally, a width of a person (user U) is narrower than the ranging range of the distance measuring device. Therefore, as illustrated in
The estimation processing unit 1321 changes the width w according to the person region reliability value c to be described later. Thus, the estimation processing unit 1321 can estimate the plurality of person regions F having different widths w according to the person region reliability value c.
The estimation processing unit 1321 estimates the person region F having the length r and the width w in the depth map, and assigns the person region reliability value c to the pixel included in the person region F.
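As a rough sketch of this assignment (assuming, only for illustration, that a person region F is taken as the bottom-center rectangle of the depth map with height r and width w; the function and variable names are hypothetical):

    import numpy as np

    def person_region_reliability(depth_map, r, w, c):
        """Return a per-pixel reliability map for one person region F.

        depth_map : (H, W) array of distance measurements
        r         : region height in pixels (length from Expression (1))
        w         : region width in pixels
        c         : person region reliability value assigned to this region
        """
        H, W = depth_map.shape
        reliability = np.zeros((H, W), dtype=np.float32)
        left = (W - w) // 2
        # Bottom-center rectangle: where the user's own body is most likely
        # to appear when the distance measurement direction points downward.
        reliability[H - r:, left:left + w] = c
        return reliability

    # Several regions F with different (r, w, c) can be merged so that an
    # overlapping pixel keeps the larger reliability value.
    def merge_person_regions(reliability_maps):
        return np.maximum.reduce(reliability_maps)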
As described above, the estimation processing unit 1321 estimates the plurality of person regions F. The estimation processing unit 1321 sets different person region reliability values c for the plurality of person regions F.
As illustrated in
The value of the length r changes according to the downward angle θ and the parameters rmax, θmax, and θmin. Here, the downward angle θ is uniquely determined when the depth map is generated. In other words, the distance measuring device performs distance measurement at a predetermined downward angle θ to generate a depth map. Therefore, the length r set for a predetermined depth map is a value corresponding to rmax, θmax, and θmin. In other words, as illustrated in
In
In
The estimation processing unit 1321 refers to the person region reliability value table illustrated in
As illustrated in
As illustrated in
The estimation processing unit 1321 outputs the generated depth map with person region reliability value to the integrated processing unit 1322.
The integrated processing unit 1322 generates the occupancy map based on the camera pose and the depth map with person region reliability value. As described above, the occupancy map is a known technology generated by a method disclosed in, for example, Reference [1].
The integrated processing unit 1322 updates the occupancy map each time by changing, for each voxel of the occupancy map, the occupancy probability based on the observation of a depth point. At this time, the integrated processing unit 1322 varies the occupancy probability according to the person region reliability value c. For example, by reducing the influence, on the corresponding voxel, of a pixel having a high person region reliability value c, the integrated processing unit 1322 generates an occupancy map in which erroneous detection of the user U is further reduced.
For example, as disclosed in Reference [1] described above, when there is depth observation Z1:t from time 1 to time t, the occupancy probability P(n | Z1:t) of a voxel n is calculated based on Expression (2) below.
In addition, this Expression (2) can be written as Expressions (3) and (4) below.
Here, the integrated processing unit 1322 according to the embodiment of the present disclosure changes Expression (3) to Expression (5) below to generate an occupancy map reflecting the person region reliability value c.
Here, c is the person region reliability value. In other words, as illustrated in Expression (5), as the person region reliability value c is closer to “1”, the distance information is less likely to be reflected in the occupancy map.
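The concrete forms of Expressions (2) to (5) are given in the drawings and are not reproduced here. Purely for orientation, the standard recursive occupancy update of Reference [1], and one plausible way in which the person region reliability value c could attenuate it so that a value of c close to “1” suppresses the contribution of an observation, can be sketched as follows (an illustrative reading, not necessarily identical to Expressions (2) to (5)):

$$P(n \mid Z_{1:t}) = \left[ 1 + \frac{1 - P(n \mid z_t)}{P(n \mid z_t)} \cdot \frac{1 - P(n \mid Z_{1:t-1})}{P(n \mid Z_{1:t-1})} \cdot \frac{P(n)}{1 - P(n)} \right]^{-1}$$

In log-odds notation, with $L(x) = \log\frac{x}{1-x}$, this update is simply $L(n \mid Z_{1:t}) = L(n \mid Z_{1:t-1}) + L(n \mid z_t)$, and a reliability-weighted variant is

$$L(n \mid Z_{1:t}) = L(n \mid Z_{1:t-1}) + (1 - c)\,L(n \mid z_t),$$

in which c = 1 leaves the voxel unchanged (the pixel is treated as the user and ignored) and c = 0 reduces to the standard update.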
The integrated processing unit 1322 outputs the generated occupancy map to the area estimation unit 133.
The area estimation unit 133 estimates the play area in which the user U can safely move based on the occupancy map generated by the integrated processing unit 1322, the gravity direction, the position of the user U, and the like. For example, the area estimation unit 133 estimates a floor plane from the occupancy map, and sets the floor plane where the user U is located as a play area.
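A minimal sketch of this play-area estimation, assuming that the floor plane has already been identified from the occupancy map and that the play area is the connected region of obstacle-free floor cells containing the cell below the user U (these assumptions go beyond what is stated, and the criteria are not limited to them):

    import numpy as np
    from collections import deque

    def estimate_play_area(free_floor_mask, user_cell):
        """Flood-fill the connected free floor region that contains the user.

        free_floor_mask : (X, Y) bool grid; True where the floor-level voxel
                          column is free of obstacles in the occupancy map
        user_cell       : (x, y) grid cell directly below the user
        """
        play = np.zeros_like(free_floor_mask, dtype=bool)
        if not free_floor_mask[user_cell]:
            return play
        queue = deque([user_cell])
        play[user_cell] = True
        while queue:
            x, y = queue.popleft()
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if (0 <= nx < play.shape[0] and 0 <= ny < play.shape[1]
                        and free_floor_mask[nx, ny] and not play[nx, ny]):
                    play[nx, ny] = True
                    queue.append((nx, ny))
        return play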
As illustrated in
The information processing apparatus 100 executes an occupancy map generation process using the estimated camera pose and gravity direction and the distance information acquired from the terminal device 200 (Step S102) to generate an occupancy map.
The information processing apparatus 100 performs a depth map person region estimation process using the camera pose, the gravity direction, and the distance information (depth map) (Step S201) to generate a depth map with person region reliability value.
For example, the information processing apparatus 100 estimates at least one person region F using the camera pose and the gravity direction, and sets the person region reliability value c corresponding to the person region F to a pixel in the person region F. The information processing apparatus 100 sets the person region reliability value c such that the person region reliability value c increases as the person region is closer to the distance measuring device, in other words, to the user U.
The information processing apparatus 100 performs a depth-time-space integration process using the camera pose and the depth map with person region reliability value (Step S202), to generate an occupancy map. For example, the information processing apparatus 100 updates the occupancy probability of each voxel so that the occupancy probability of the voxel corresponding to the pixel having the large person region reliability value c is hardly updated.
As a result, the information processing apparatus 100 can further reduce erroneous detection of the user U, and can generate the occupancy map with higher accuracy. Therefore, the information processing apparatus 100 can set the play area with higher accuracy.
As a method of reducing erroneous detection of the user U, for example, there is a method using machine learning. For example, Reference [2] discloses a method of detecting the person region from a color image of a first-person viewpoint using deep learning. However, a recognizer used in deep learning requires large calculation resources. In addition, Reference [2] does not refer to the occupancy map.
On the other hand, the information processing apparatus 100 according to the embodiment of the present disclosure can estimate the person region without using such a recognizer, and can further reduce the influence of the user U on the occupancy map at high speed without using large calculation resources.
Furthermore, the information processing apparatus 100 according to the embodiment of the present disclosure can generate the occupancy map with reduced influence of the user U by using sensing results of the distance measuring device, the IMU, or the like provided in the terminal device 200. As described above, the information processing apparatus 100 can generate the occupancy map with high accuracy without using a device for detecting the person region F such as a controller.
Furthermore, the information processing apparatus 100 according to the embodiment of the present disclosure can generate the occupancy map while the user U moves. In this case, the information processing apparatus 100 can estimate a region near a moving user U as the person region F, and generate an occupancy map in which the influence of the person region F is reduced.
Furthermore, the information processing apparatus 100 can generate the occupancy map in which the influence of the user U is reduced based on the gravity direction, the camera pose, and the depth map. Therefore, when the gravity direction, the camera pose, and the depth map can be acquired, the information processing apparatus 100 can generate the occupancy map in which the influence of the user U is reduced even when a color image cannot be acquired.
The above embodiment mainly describes the case where the user U is standing, but the information processing apparatus 100 may detect whether the user U is standing or sitting.
Therefore, in the present modification, the information processing apparatus 100 detects whether the user U is standing or sitting as a user posture, in addition to the camera pose, and corrects the person region F when a sitting position is detected.
As illustrated in a left diagram of
More specifically, when the sitting position of the user U is detected, the information processing apparatus 100 corrects the person region F by calculating the height rs using Expression (6) below instead of Expression (1), to estimate the corrected person region Fs.
The control unit 130 (see
For example, the information processing apparatus 100 detects the floor plane by calculating the maximum plane by RANSAC with respect to the occupancy map. Note that the calculation of the maximum plane by RANSAC can be executed using, for example, the technology described in Reference [3].
Here, a distance between the floor plane and the terminal device 200 corresponds to an eye height of the user U. Therefore, the information processing apparatus 100 detects the eye height of the user U based on the floor plane and the camera pose. The information processing apparatus 100 detects the standing position when the detected eye height is equal to or greater than a predetermined threshold, and detects the sitting position when the detected eye height is less than the predetermined threshold. Note that the predetermined threshold may be a value determined in advance, and may be set, for example, according to the height of the user U. The height of the user U may be input by the user U himself/herself or may be estimated from an external camera (not illustrated) or the like.
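The standing/sitting decision itself can be sketched compactly under the stated assumptions (the floor plane has been detected, and the camera pose gives the height of the terminal device 200 above it); the threshold values below are placeholders, not values from the disclosure:

    def detect_posture(eye_height, user_height=None):
        """Return 'standing' or 'sitting' from the detected eye height.

        eye_height  : distance [m] between the terminal device 200 (HMD) and
                      the detected floor plane
        user_height : optional body height [m]; if given, the threshold is
                      scaled from it instead of using a fixed value
        """
        threshold = 0.75 * user_height if user_height is not None else 1.2
        return "standing" if eye_height >= threshold else "sitting"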
The information processing apparatus 100 executes the depth map person region estimation process based on the posture of the user U in addition to the camera pose, the gravity direction, and the distance information (Step S302) to generate a depth map with person region reliability value.
When the standing position (standing state) is detected as the posture of the user U, the information processing apparatus 100 generates the depth map with person region reliability value in the same manner as in the embodiment.
On the other hand, when the sitting position (sitting state) is detected as the posture of the user U, the information processing apparatus 100 generates the depth map with person region reliability value using Expression (6) instead of Expression (1). The method of estimating the person region Fs is the same as the method of estimating the person region F of the embodiment except for the calculation of the height rs, and thus the description thereof will be omitted.
As described above, the information processing apparatus 100 can estimate a corrected person region Fs by detecting the sitting position as the posture of the user U. As a result, the accuracy of generating the occupancy map can be further improved.
In the embodiment described above, the information processing apparatus 100 detects the front distance measurement direction as the posture of the user U, but the present disclosure is not limited thereto. For example, the information processing apparatus 100 may detect the posture itself of the user U.
As illustrated in a left diagram of
When estimating the skeleton of the user U as the posture as illustrated in a left diagram of
The information processing apparatus 100 sets a range of a radius rt centered on the skeleton reflected in the depth map as the person region Ft. The information processing apparatus 100 sets the person region reliability value c corresponding to a value of the radius rt (e.g., rt=rt1 to rt3) to each pixel in the person region Ft.
It is assumed that a smaller value of the radius rt results in a larger person region reliability value c. Furthermore, when person regions having different radii overlap, the information processing apparatus 100 sets a larger value as the person region reliability value c of the overlapping region.
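A rough sketch of this per-pixel assignment, assuming the estimated skeleton is available as joint positions already projected into the depth map (the radii, reliability values, and names below are hypothetical):

    import numpy as np

    def skeleton_reliability(depth_shape, joint_pixels,
                             radii_and_values=((10, 1.0), (20, 0.7), (30, 0.4))):
        """Assign person region reliability values around projected joints.

        depth_shape      : (H, W) shape of the depth map
        joint_pixels     : list of (row, col) pixel positions of skeleton joints
        radii_and_values : (radius_px, c) pairs; a smaller radius carries a
                           larger c, and overlaps keep the larger value
        """
        H, W = depth_shape
        rows, cols = np.mgrid[0:H, 0:W]
        reliability = np.zeros(depth_shape, dtype=np.float32)
        for jr, jc in joint_pixels:
            dist = np.hypot(rows - jr, cols - jc)
            for radius, c in radii_and_values:
                inside = dist <= radius
                reliability[inside] = np.maximum(reliability[inside], c)
        return reliability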
In this way, the information processing apparatus 100 estimates the person region Ft with the skeleton of the user U as the posture of the user U, and generates the depth map with person region reliability value. Note that the occupancy map generation process using the depth map with person region reliability value is the same as that in the embodiment, and thus description thereof will be omitted.
A case where an arm of the user U is detected as the person region F will be described as a third modification of the embodiment of the present disclosure.
As illustrated in a left diagram of
The information processing apparatus 100 executes a clustering process on the depth map. The information processing apparatus 100 performs clustering by calculating a distance between measurement points (pixels) of the depth map using the k-means method described in, for example, Reference [6].
The information processing apparatus 100 acquires the point cloud including the controller 300 from among the clustered point clouds based on the position information of the controller 300. Within the point cloud including the controller 300, the information processing apparatus 100 sets points closer to the terminal device 200 than the controller 300 as the person region F. The information processing apparatus 100 generates the depth map with person region reliability value by setting the person region reliability value c to the pixel included in the person region F in the depth map.
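One way to sketch this step, assuming the depth map has already been converted to a 3D point cloud in the same coordinate system as the controller 300 and the terminal device 200, and using scikit-learn's k-means only as a stand-in for the clustering of Reference [6]:

    import numpy as np
    from sklearn.cluster import KMeans

    def arm_person_region(points, controller_pos, hmd_pos, n_clusters=8):
        """Mark points likely to belong to the arm holding the controller 300.

        points         : (N, 3) point cloud obtained from the depth map
        controller_pos : (3,) position of the gripped controller 300
        hmd_pos        : (3,) position of the terminal device 200
        """
        labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(points)
        # Cluster containing (nearest to) the controller position.
        ctrl_cluster = labels[np.argmin(np.linalg.norm(points - controller_pos, axis=1))]
        in_cluster = labels == ctrl_cluster
        # Within that cluster, keep points closer to the HMD than the controller:
        # they lie along the arm between the head and the hand.
        closer_than_ctrl = (np.linalg.norm(points - hmd_pos, axis=1)
                            < np.linalg.norm(controller_pos - hmd_pos))
        return in_cluster & closer_than_ctrl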
In the example in
As illustrated in
For example, the information processing apparatus 100 estimates the person region F using the camera pose, the gravity direction, and the distance information similarly to the embodiment. Furthermore, the information processing apparatus 100 estimates the arm of the user U as the person region F using the distance information and the position information of the controller 300.
The information processing apparatus 100 sets the person region reliability value c to the pixel of the estimated person region F in the depth map, and generates the depth map with person region reliability value. Note that the depth-time-space integration process using the depth map with person region reliability value is the same as that of the embodiment, and thus description thereof will be omitted.
As described above, the information processing apparatus 100 estimates the arm of the user U as the person region F using the position information of the controller 300 gripped by the user U. As a result, the information processing apparatus 100 can generate the occupancy map with higher accuracy.
Note that, here, the information processing apparatus 100 estimates the point cloud region CL2 closer to the terminal device 200 than the controller 300 as the person region F, but the present disclosure is not limited thereto. For example, the information processing apparatus 100 may divide the point cloud region CL2 into a plurality of person regions. More specifically, for example, the information processing apparatus 100 may set a plurality of person regions F such that the person region reliability value c increases as the point cloud region CL2 is closer to the terminal device 200.
In the embodiment described above, the information processing apparatus 100 updates the voxel of the occupancy map based on a degree of influence according to the person region reliability value c, but the present disclosure is not limited thereto. Here, an example in which the information processing apparatus 100 generates the occupancy map using the person region reliability value c will be described.
In the present modification, the information processing apparatus 100 generates two maps: a person region occupancy map (person region environment information) and an environment occupancy map (surrounding environment information), and generates an occupancy map with reduced influence of the user U from the person region occupancy map and the environment occupancy map. The person region occupancy map is an occupancy map in which the person region reliability value c is input as the occupancy probability. The environment occupancy map is an occupancy map generated without using the person region reliability value c, and corresponds to a conventional occupancy map.
In this manner, the person region occupancy map treats whether or not each voxel belongs to a person region as the occupancy probability. Therefore, the person region occupancy map is an occupancy map that represents the person region itself.
Thus, the information processing apparatus 100 generates an occupancy map that does not include the person region by subtracting the person region occupancy map from the generated environment occupancy map. More specifically, the information processing apparatus 100 generates the occupancy map by regarding the voxel of the environment occupancy map corresponding to the occupied voxel of the person region occupancy map as an unknown voxel.
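As a compact sketch of this integration (assuming both occupancy maps share the same voxel grid and that voxel states are encoded as occupied/free/unknown; the masking rule follows the description above):

    import numpy as np

    OCCUPIED, FREE, UNKNOWN = 2, 1, 0

    def subtract_person_map(environment_states, person_states):
        """Remove the person region from the environment occupancy map.

        Voxels of the environment occupancy map whose counterparts are occupied
        in the person region occupancy map are regarded as unknown, so that the
        user does not remain in the map as a phantom obstacle.
        """
        result = environment_states.copy()
        result[person_states == OCCUPIED] = UNKNOWN
        return result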
The information processing apparatus 100 executes a first occupancy map generation process using the camera pose and the depth map with person region reliability value (Step S501) to generate a person region occupancy map. For example, the information processing apparatus 100 generates the person region occupancy map using the person region reliability value c as the object occupancy probability.
In addition, the information processing apparatus 100 executes a second occupancy map generation process using the camera pose and the distance information (depth map) (Step S502) to generate an environment occupancy map. Here, the person region reliability value c is not assigned to the distance information (depth map) used by the information processing apparatus 100 for generating the environment occupancy map. The information processing apparatus 100 generates the environment occupancy map using, for example, a conventional method.
Next, the information processing apparatus 100 executes a map integration process using the generated person region occupancy map and environment occupancy map (Step S503) to generate an occupancy map. For example, the information processing apparatus 100 generates an occupancy map that does not include the person region by subtracting (or masking) the person region occupancy map from the environment occupancy map.
As described above, in the present modification, the information processing apparatus 100 generates the person region occupancy map, and subtracts the person region occupancy map from the environment occupancy map, so that it is possible to generate the occupancy map in which the influence of the user U is further reduced.
When the occupancy map is generated, the information processing apparatus 100 can use plane information to generate the occupancy map in which the influence of the person region is further reduced. Here, generation of an occupancy map using plane information, that is, a plane region, will be described as a fifth modification.
The surrounding environment of the user U includes many planes parallel to the floor, such as a desk. The person region is not included in the plane. Therefore, the information processing apparatus 100 generates an occupancy map by excluding the plane region from the person region.
Therefore, the information processing apparatus 100 according to the present modification corrects the person region F by excluding the plane region P from the person region F to generate an occupancy map.
Specifically, for example, the information processing apparatus 100 updates the occupancy map without using the person region reliability value c, and generates a plane detection map.
The information processing apparatus 100 detects a plane from the plane detection map. For example, the information processing apparatus 100 acquires a set of center points of occupied voxels in the plane detection map as a point cloud. Next, the information processing apparatus 100 repeatedly detects a plane using, for example, RANSAC described in Reference [3].
The information processing apparatus 100 extracts, as the plane region, a plane that has a normal line in the gravity direction and includes a number of points equal to or greater than a predetermined threshold from among the detected planes. For example, in
The information processing apparatus 100 updates the occupancy map using the depth map with person region reliability value. At this time, the information processing apparatus 100 regards the person region reliability value c of the voxel included in the detected plane region as “0” and updates the occupancy map.
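A sketch of this plane-based correction under the stated assumptions (planes fitted from the occupied-voxel centers, a plane kept only when its normal is close to the gravity direction and it supports enough points; the angular threshold below is a placeholder):

    import numpy as np

    def is_floor_parallel(plane_normal, gravity_dir, max_angle_deg=10.0):
        """True when the fitted plane is (nearly) parallel to the floor."""
        n = plane_normal / np.linalg.norm(plane_normal)
        g = gravity_dir / np.linalg.norm(gravity_dir)
        # A floor-parallel plane has its normal aligned with gravity.
        return abs(np.dot(n, g)) >= np.cos(np.deg2rad(max_angle_deg))

    def exclude_plane_from_person_region(reliability, in_plane_region):
        """Set the person region reliability value c to 0 inside the plane region.

        reliability     : per-voxel (or per-pixel) person region reliability c
        in_plane_region : boolean mask of the extracted plane region
        """
        corrected = reliability.copy()
        corrected[in_plane_region] = 0.0
        return corrected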
Next,
The information processing apparatus 100 executes a plane estimation process using the camera pose, the gravity direction, and the distance information (Step S601) to extract the plane region.
Here,
First, the information processing apparatus 100 generates the plane detection map (Step S701). For example, the information processing apparatus 100 generates the plane detection map by updating the occupancy map without using the person region reliability value c.
Next, the information processing apparatus 100 acquires the point cloud from the plane detection map (Step S702). For example, the information processing apparatus 100 acquires a set of center points of occupied voxels in the plane detection map as a point cloud.
The information processing apparatus 100 detects a plane using the acquired point cloud (Step S703). The information processing apparatus 100 repeatedly detects the plane using, for example, RANSAC described in Reference [3].
The information processing apparatus 100 extracts the plane region from the planes detected in Step S703 according to the normal line direction (Step S704). The information processing apparatus 100 extracts, from among the detected planes, a plane that has a normal line in the gravity direction and includes a number of points equal to or greater than a predetermined threshold as the plane region parallel to the floor.
The description returns to
As described above, the information processing apparatus 100 according to the present modification detects the plane region parallel to the floor and generates the occupancy map by excluding the plane region from the person region. As a result, even in a case where the person region of the depth map includes environment such as a floor or a table near the user U, the information processing apparatus 100 can more accurately estimate the person region. Therefore, the information processing apparatus 100 can generate an occupancy map with higher accuracy.
The above-described embodiment and modification are examples, and various modifications and applications are possible.
For example, some functions of the information processing apparatus 100 of the present embodiment may be implemented by the terminal device 200. For example, the terminal device 200 may generate the depth map with person region reliability value, or may generate the occupancy map.
In the above-described embodiment, the information processing apparatus 100 sets the play area of the user U, but the present disclosure is not limited thereto. For example, the information processing apparatus 100 may set, as the play area, a range in which a moving object such as a vehicle or a drone can safely move. Alternatively, the information processing apparatus 100 may set, as the play area, a range in which a partially fixed object such as a robot arm can be safely driven. Accordingly, the target object for which the information processing apparatus 100 sets the play area is not limited to the user U.
For example, a communication program for executing the above-described operation is stored and distributed in a computer-readable recording medium such as an optical disk, a semiconductor memory, a magnetic tape, or a flexible disk. Then, for example, the program is installed on a computer, and the above-described processes are executed to configure the control device. At this time, the control device may be a device (e.g., personal computer) outside the information processing apparatus 100 and the terminal device 200. Furthermore, the control device may be a device (e.g., control units 130 and 250) inside the information processing apparatus 100 and the terminal device 200.
In addition, the above communication program may be stored in a disk device included in a server device on a network such as the Internet so that the communication program can be downloaded to the computer. In addition, the above-described functions may be realized by cooperation of an operating system (OS) and application software. In this case, a portion other than the OS may be stored in a medium and distributed, or a portion other than the OS may be stored in a server device and downloaded to the computer.
Among the processes described in the above embodiments, all or part of the processes described as being performed automatically can be performed manually, or all or part of the processes described as being performed manually can be performed automatically by a known method. In addition, the processing procedure, specific name, and information including various data and parameters illustrated in the above document and the drawings can be arbitrarily changed unless otherwise specified. For example, various types of information illustrated in each drawing are not limited to the illustrated information.
In addition, each component of each device illustrated in the drawings is functionally conceptual, and is not necessarily physically configured as illustrated in the drawings. In other words, a specific form of distribution and integration of each device is not limited to the illustrated form, and all or a part thereof can be functionally or physically distributed and integrated in an arbitrary unit according to various loads, usage conditions, and the like. Note that this configuration by distribution and integration may be performed dynamically.
In addition, the above-described embodiments can be appropriately combined as long as the processing contents do not contradict each other. Furthermore, the order of each step illustrated in the sequence diagram of the above-described embodiment can be appropriately changed.
Furthermore, for example, the present embodiment can be implemented as any configuration constituting an apparatus or a system, for example, a processor as a system large scale integration (LSI) or the like, a module using a plurality of processors or the like, a unit using a plurality of modules or the like, a set obtained by further adding other functions to a unit, or the like (i.e., configuration of a part of device).
Note that, in the present embodiment, the system means a set of a plurality of components (devices, modules (parts), etc.), and it does not matter whether or not all the components are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network and one device in which a plurality of modules is housed in one housing are both systems.
Furthermore, for example, the present embodiments can adopt a configuration of cloud computing in which one function is shared and processed by a plurality of devices in cooperation via a network.
The information processing apparatus such as the information processing apparatus 100 according to each embodiment described above is realized by, for example, a computer 1000 having a configuration as illustrated in
The CPU 1100 operates based on a program stored in the ROM 1300 or the HDD 1400, and controls each unit. For example, the CPU 1100 develops a program stored in the ROM 1300 or the HDD 1400 in the RAM 1200, and executes processes corresponding to various programs.
The ROM 1300 stores a boot program such as a basic input output system (BIOS) executed by the CPU 1100 when the computer 1000 is activated, a program depending on hardware of the computer 1000, and the like.
The HDD 1400 is a computer-readable recording medium that non-transiently records a program executed by the CPU 1100, data used by the program, and the like. Specifically, the HDD 1400 is a recording medium that records an information processing program according to the present disclosure, which is an example of the program data 1450.
The communication interface 1500 is an interface for the computer 1000 to connect to an external network 1550 (e.g., the Internet). For example, the CPU 1100 receives data from another apparatus or transmits data generated by the CPU 1100 to another apparatus via the communication interface 1500.
The input/output interface 1600 is an interface for connecting an input/output device 1650 and the computer 1000. For example, the CPU 1100 receives data from an input device such as a keyboard or a mouse via the input/output interface 1600. In addition, the CPU 1100 transmits data to an output device such as a display, a speaker, or a printer via the input/output interface 1600. Furthermore, the input/output interface 1600 may function as a media interface that reads a program or the like recorded on a predetermined computer-readable recording medium (medium). The medium is, for example, an optical recording medium such as a digital versatile disc (DVD) or a phase change rewritable disk (PD), a magneto-optical recording medium such as a magneto-optical disk (MO), a tape medium, a magnetic recording medium, a semiconductor memory, or the like.
For example, when the computer 1000 functions as the information processing apparatus 100 according to the embodiment of the present disclosure, the CPU 1100 of the computer 1000 implements the functions of the control unit 130 and the like by executing a program loaded on the RAM 1200. Note that the CPU 1100 reads the program data 1450 from the HDD 1400 and executes the program data 1450. However, as another example, an information processing program may be acquired from another device via the external network 1550.
Furthermore, the information processing apparatus 100 according to the present embodiment may be applied to a system including a plurality of devices on the premise of connection to a network (or communication between devices), such as cloud computing. In other words, for example, the information processing apparatus 100 according to the present embodiment described above can be implemented as the information processing system 1 according to the present embodiment by the plurality of devices.
An example of the hardware configuration of the information processing apparatus 100 has been described above. Each of the above-described components may be configured using a general-purpose member, or may be configured by hardware specialized for the function of each component. This configuration can be appropriately changed according to a technical level at the time of implementation.
Although the embodiments of the present disclosure have been described above, the technical scope of the present disclosure is not limited to the above-described embodiments as it is, and various modifications can be made without departing from the gist of the present disclosure. In addition, the components of different embodiments and modifications may be appropriately combined.
Note that the effects of each embodiment described in the present specification are merely examples and not limited thereto, and other effects may be provided.
The present technology can also have the following configurations.
(1)
An information processing apparatus comprising a control unit, the control unit being configured to:
(2)
The information processing apparatus according to (1), wherein
(3)
The information processing apparatus according to (2), wherein
(4)
The information processing apparatus according to (2) or (3), wherein the control unit sets the person region reliability value such that the person region reliability value increases as a region is closer to the user.
(5)
The information processing apparatus according to any one of (1) to (4), wherein the control unit estimates the person region based on a front distance measurement direction of the distance measuring device and a gravity direction.
(6)
The information processing apparatus according to (5), wherein the control unit corrects the person region when a sitting position is detected as the user posture.
(7)
The information processing apparatus according to any one of (1) to (4), wherein
(8)
The information processing apparatus according to any one of (1) to (7), wherein the control unit estimates an arm of the user as the person region according to a position of a second device gripped by the user.
(9)
The information processing apparatus according to (1), wherein
(10)
The information processing apparatus according to (9), wherein
(11)
The information processing apparatus according to any one of (1) to (10), wherein
(12)
The information processing apparatus according to any one of (1) to (11), wherein the device used by the user is worn on a head of the user and provides predetermined content to the user.
(13)
An information processing method comprising:
(14)
A program causing a computer to function as a control unit executing:
Priority application: JP 2021-141542, filed Aug 2021 (national).
International filing: PCT/JP2022/013402, filed March 23, 2022 (WO).