The present invention relates to an information processing device, a system, an information processing method, and an information processing program.
There has been known an event driven vision sensor including pixels each of which asynchronously generates a signal when the pixel detects a change in intensity of incident light. The event driven vision sensor is advantageous in that it can operate with lower power consumption and at a higher speed than a frame-based vision sensor, specifically, an image sensor such as a CCD (Charge Coupled Device) or a CMOS (Complementary Metal Oxide Semiconductor) sensor which scans all pixels at predetermined intervals. A technology relating to such an event driven vision sensor is described in, for example, PTL 1 and PTL 2.
While the above-mentioned advantage of the event driven vision sensor has been known, methods of using the event driven vision sensor in combination with another device have not sufficiently been proposed.
In view of the foregoing problem, a purpose of the present invention is to provide an information processing device, a system, an information processing method, and an information processing program capable of carrying out tracking by using both a sensor which synchronously generates an image signal and an event driven vision sensor, to thereby carry out the tracking precisely while suppressing latency.
According to one aspect of the present invention, provided is an information processing device including a detection unit that detects a detection target on the basis of a first image signal generated by a first image sensor, a setting unit that sets a region of interest including at least a part of the detection target, a tracking unit that tracks the detection target in the region of interest on the basis of a second image signal generated by a second image sensor including an event driven vision sensor that asynchronously generates an image signal when an intensity change in light incident to each pixel is detected, and a comparison unit that compares position information on the detection target represented by a result of the detection by the detection unit on the basis of the first image signal with position information on the detection target represented by a result of the tracking by the tracking unit on the basis of the second image signal associated with the first image signal.
According to another aspect of the present invention, provided is an information processing device including a detection unit that detects a detection target on the basis of a first image signal generated by a first image sensor, a setting unit that sets a region of interest including at least a part of the detection target, and a tracking unit that tracks the detection target in the region of interest on the basis of a second image signal generated by a second image sensor including an event driven vision sensor that asynchronously generates an image signal when an intensity change in light incident to each pixel is detected and a result of the detection by the detection unit on the basis of the first image signal associated with the second image signal.
According to still another aspect of the present invention, provided is a system including an information processing device that includes a first image sensor that generates a first image signal, a second image sensor that includes an event driven vision sensor that asynchronously generates a second image signal when an intensity change in light incident to each pixel is detected, a detection unit that detects a detection target on the basis of the first image signal, a setting unit that sets a region of interest including the detection target, a tracking unit that tracks the detection target in the region of interest on the basis of the second image signal, and a comparison unit that compares position information on the detection target represented by a result of the detection by the detection unit on the basis of the first image signal with position information on the detection target represented by a result of the tracking by the tracking unit on the basis of the second image signal associated with the first image signal.
According to still another aspect of the present invention, provided is a system including an information processing device that includes a first image sensor that generates a first image signal, a second image sensor that includes an event driven vision sensor that asynchronously generates a second image signal when an intensity change in light incident to each pixel is detected, a detection unit that detects a detection target on the basis of the first image signal, a setting unit that sets a region of interest including the detection target, and a tracking unit that tracks the detection target in the region of interest on the basis of the second image signal and a result of the detection by the detection unit on the basis of the first image signal associated with the second image signal.
According to still another aspect of the present invention, provided is an information processing method including a first reception step of receiving a first image signal acquired by a first image sensor, a second reception step of receiving a second image signal generated by a second image sensor that includes an event driven vision sensor that asynchronously generates an image signal when an intensity change in light incident to each pixel is detected, a detection step of detecting a detection target on the basis of the first image signal, a setting step of setting a region of interest including at least a part of the detection target, a tracking step of tracking the detection target in the region of interest on the basis of the second image signal, and a comparison step of comparing position information on the detection target represented by a result of the detection by the detection step on the basis of the first image signal with position information on the detection target represented by a result of the tracking by the tracking step on the basis of the second image signal associated with the first image signal.
According to still another aspect of the present invention, provided is an information processing method including a first reception step of receiving a first image signal acquired by a first image sensor, a second reception step of receiving a second image signal generated by a second image sensor that includes an event driven vision sensor that asynchronously generates an image signal when an intensity change in light incident to each pixel is detected, a detection step of detecting a detection target on the basis of the first image signal, a setting step of setting a region of interest including at least a part of the detection target, and a tracking step of tracking the detection target in the region of interest on the basis of the second image signal and a result of the detection by the detection step on the basis of the first image signal associated with the second image signal.
According to still another aspect of the present invention, provided is an information processing program for causing a computer to implement a function of receiving a first image signal acquired by a first image sensor, a function of receiving a second image signal generated by a second image sensor that includes an event driven vision sensor that asynchronously generates an image signal when an intensity change in light incident to each pixel is detected, a function of detecting a detection target on the basis of the first image signal, a function of setting a region of interest including at least a part of the detection target, and a function of tracking the detection target in the region of interest on the basis of the second image signal and a result of the detection on the basis of the first image signal associated with the second image signal.
According to the above-mentioned configurations, the tracking can be carried out precisely while suppressing latency, by using both the sensor which synchronously generates the image signal and the event driven vision sensor to carry out the tracking.
Several embodiments of the present invention are now described in detail with reference to the accompanying drawings. Note that components having substantially identical functional configurations in the present description and the drawings are given identical reference signs to omit a redundant description.
The EDS 12 is an example of a second vision sensor which generates an event signal when the sensor detects an intensity change in light and includes a sensor 121 which is a second image sensor forming a sensor array and a processing circuit 122 connected to the sensor 121. The sensor 121 is an event driven vision sensor which includes a light reception element and generates an event signal 123 when an intensity change in light incident to each pixel, more specifically, a luminance change exceeding a predetermined value defined in advance is detected. The sensor 121 does not generate the event signal 123 when an intensity change in incident light is not detected, and hence, the event signal 123 is generated asynchronously in the EDS 12. The event signal 123 output via the processing circuit 122 includes identification information (for example, a position of the pixel) on the sensor 121, a polarity of the luminance change (an increase or a decrease), and a timestamp 124. Moreover, the EDS 12 can generate the event signal 123 at a frequency much higher than a generation frequency (a frame rate of the RGB camera 11) of the RGB image signal 113 when the luminance change is detected. Note that a signal on the basis of which an image can be built is herein referred to as an image signal. Thus, the RGB image signal 113 and the event signal 123 represent examples of the image signal.
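As an illustration, the event signal described above can be modeled as a small record carrying the pixel identity, the polarity of the luminance change, and the timestamp, with generation gated on the magnitude of the change. The following is a minimal Python sketch; the names, the log-intensity representation, and the threshold value are assumptions for illustration, not part of the specification.

```python
from dataclasses import dataclass

@dataclass
class EventSignal:
    """One asynchronous event signal: pixel identity, polarity, and timestamp."""
    x: int            # pixel column on the sensor
    y: int            # pixel row on the sensor
    polarity: int     # +1 for a luminance increase, -1 for a decrease
    timestamp: float  # time of the luminance change, in seconds

def maybe_emit_event(x, y, prev_log_intensity, new_log_intensity, t, threshold=0.2):
    """Emit an event only when the luminance change exceeds the value defined
    in advance; otherwise return None, meaning no signal is generated."""
    delta = new_log_intensity - prev_log_intensity
    if abs(delta) < threshold:
        return None
    return EventSignal(x, y, polarity=1 if delta > 0 else -1, timestamp=t)
```

Because a quiescent pixel returns None, signals arrive only where and when something changes, which is what makes the generation asynchronous and the event rate much higher than a fixed frame rate during motion.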
In the present embodiment, the timestamp 114 added to the RGB image signal 113 and the timestamp 124 added to the event signal 123 are synchronized with each other. Specifically, for example, the timestamp 114 can be synchronized with the timestamp 124 by providing time information used to generate the timestamp 124 in the EDS 12 to the RGB camera 11. As another example, when pieces of time information used to generate the timestamps 114 and 124 are independent of each other between the RGB camera 11 and the EDS 12, the timestamp 114 and the timestamp 124 can be synchronized with each other later by calculating an offset amount between the timestamps with reference to a time at which a specific event (for example, a change in subject over an entire image) occurs.
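The offset-based synchronization described in the second example can be sketched as follows; the function names and the use of a single shared reference event are illustrative assumptions.

```python
def clock_offset(rgb_event_time, eds_event_time):
    """Offset between the two independent clocks, calculated from the times at
    which both devices observed the same specific event (for example, a change
    in the subject over the entire image)."""
    return eds_event_time - rgb_event_time

def to_eds_time(rgb_timestamp, offset):
    """Express an RGB timestamp 114 on the EDS time axis of timestamp 124."""
    return rgb_timestamp + offset
```

For example, if the same scene change is observed at 10.0 s on the RGB clock and 12.5 s on the EDS clock, every later RGB timestamp can be shifted by that 2.5 s offset when associating the two signals.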
Moreover, the sensor 121 of the EDS 12 is associated with one or a plurality of pixels of the RGB image signal 113 through a calibration procedure between the RGB camera 11 and the EDS 12 carried out in advance in the present embodiment, and hence the event signal 123 is generated in correspondence to the intensity change in light in the one or plurality of pixels of the RGB image signal 113. More specifically, the sensor 121 can be associated with the one or plurality of pixels of the RGB image signal 113 by, for example, capturing a common calibration pattern by the RGB camera 11 and the EDS 12, to thereby calculate correspondence parameters between the camera and the sensor from respective internal parameters and external parameters of the RGB camera 11 and the EDS 12.
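One common way to express the resulting pixel correspondence is a planar homography estimated from the common calibration pattern; this particular form is an assumption for illustration, since the specification only states that correspondence parameters are calculated from the internal and external parameters of the two devices.

```python
import numpy as np

def eds_to_rgb_pixel(event_xy, H):
    """Map an EDS pixel position to RGB image coordinates through a 3x3
    homography H obtained in advance by capturing a common calibration
    pattern with both the RGB camera 11 and the EDS 12."""
    x, y = event_xy
    p = H @ np.array([x, y, 1.0])   # homogeneous coordinates
    return (p[0] / p[2], p[1] / p[2])
```

With the identity matrix the mapping is trivial; in practice H would absorb the differing resolutions and viewpoints of the two sensors.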
The information processing device 20 is implemented by, for example, a computer including a communication interface, a processor, and a memory and includes a function of each of a detection unit 21, a setting unit 22, a tracking unit 23, and a comparison unit 24, which are implemented by the processor operating according to a program stored in the memory or received via the communication interface. A description is now further given of the function of each unit.
The detection unit 21 detects a detection target on the basis of the RGB image signal generated by the image sensor 111, which is the first image sensor. In the present embodiment, a case in which the detection target is a person is described as an example. The detection unit 21 calculates coordinate information on at least one joint of the person who is the detection target.
The setting unit 22 sets a region of interest including at least a part of the detection target. The region of interest is a region that includes at least a part of the detection target and attracts attention as the target of the tracking described later. The setting unit 22 sets, for each joint of the person detected by the detection unit 21, a square of a predetermined size having its center at the joint as a region of interest R, for example, as depicted in
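The setting of a square region of interest centered on a joint can be sketched as follows (the names are hypothetical, and the region is represented here by its corner coordinates):

```python
def set_region_of_interest(joint_xy, size):
    """Square region of interest R of a predetermined size, centered on the
    coordinate information of one detected joint.
    Returns (x0, y0, x1, y1) corner coordinates."""
    cx, cy = joint_xy
    half = size / 2
    return (cx - half, cy - half, cx + half, cy + half)
```

When the detection unit 21 reports a plurality of joints, this would simply be called once per joint, yielding one region of interest R per joint.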
The tracking unit 23 tracks the detection target in the region of interest R set by the setting unit 22 on the basis of the event signal 123 generated by the sensor 121, which is the second image sensor. In the EDS 12, a luminance change occurs when, for example, the position or the posture of the person who is the user changes, and the event signal 123 is generated by the sensor 121 at the pixel address at which this luminance change has occurred. Thus, the position itself of the event signal 123 in a region corresponding to the region of interest R set by the setting unit 22 corresponds to coordinate information on the detection target, and hence the tracking unit 23 tracks the detection target on the basis of the position of occurrence, the polarity, and the like of the event signal 123. Moreover, the event signal 123 is generated asynchronously, and hence the tracking unit 23 carries out the tracking as needed at the timing at which the event signal 123 is generated. Note that, when a plurality of regions of interest R are set by the setting unit 22, the tracking unit 23 carries out the tracking for each region of interest.
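As one possible illustration of such tracking, the position of the detection target can be estimated from the events occurring inside the region of interest, for example as their centroid. The centroid rule is an assumption for illustration, since the specification only states that tracking is based on the position of occurrence, the polarity, and the like of the event signal.

```python
def track_in_roi(events, roi):
    """Estimate the target position from the events inside the region of interest.
    events: iterable of (x, y, polarity) event signals
    roi: (x0, y0, x1, y1) corner coordinates of the region of interest
    Returns the centroid of the in-ROI events, or None when no event occurred
    in the region (in which case no tracking update is made)."""
    x0, y0, x1, y1 = roi
    inside = [(x, y) for x, y, _ in events if x0 <= x <= x1 and y0 <= y <= y1]
    if not inside:
        return None
    n = len(inside)
    return (sum(x for x, _ in inside) / n, sum(y for _, y in inside) / n)
```

Returning None for a quiet region reflects the asynchronous character of the sensor: the tracking unit only updates at timings at which event signals are actually generated.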
The comparison unit 24 compares the position information on the detection target represented by the result of the detection by the detection unit 21 on the basis of the RGB image signal 113 and the position information on the detection target represented by the result of the tracking by the tracking unit 23 on the basis of the event signal 123 associated with the RGB image signal 113 with each other. As described before, the detection unit 21 calculates the coordinate information on the joint of the person who is the detection target on the basis of the RGB image signal 113, and the tracking unit 23 acquires the coordinate information on the joint of this person as a result of the tracking on the basis of the event signal 123.
The comparison unit 24, for the comparison described above, obtains, for example, a difference between the coordinate information calculated in the detection by the detection unit 21 on the basis of the RGB image signal 113 and the coordinate information obtained as a result of the tracking by the tracking unit 23 on the basis of the event signal 123 associated with the RGB image signal 113. The comparison unit 24 selects the event signal 123 having the added timestamp 124 the same as or close to the timestamp 114 added to the RGB image signal 113, and obtains a difference between the coordinate information calculated on the basis of the RGB image signal 113 and the coordinate information obtained by the tracking on the basis of the event signal 123.
When the difference is less than a predetermined threshold value Th, it can be determined that the tracking by the tracking unit 23 is being carried out correctly. Meanwhile, when the difference is equal to or more than the predetermined threshold value Th, it can be determined that the tracking by the tracking unit 23 is not being carried out correctly; for example, the motion of the detection target is likely not appropriately reflected in the event signal 123, or the precision of the tracking has likely decreased because the event signal 123 was generated by an abrupt luminance change or the like while the detection target did not actually move. In this case, the setting unit 22 sets the region of interest again on the basis of the detection result of the detection unit 21.
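The comparison and the threshold decision can be sketched as follows (the names are hypothetical, and the Euclidean distance is an assumed measure of the difference between the two pieces of coordinate information):

```python
def tracking_is_correct(detected_xy, tracked_xy, threshold):
    """Compare the coordinate information from the detection on the basis of
    the RGB image signal with the coordinate information from the tracking on
    the basis of the associated event signal.
    Returns True when the difference is less than the threshold value Th
    (tracking judged correct), and False when the difference is equal to or
    more than Th (the region of interest should be set again)."""
    dx = detected_xy[0] - tracked_xy[0]
    dy = detected_xy[1] - tracked_xy[1]
    return (dx * dx + dy * dy) ** 0.5 < threshold
```

In the flow described above, a False result is what triggers the setting unit 22 to set the region of interest again from the detection result.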
The comparison by the comparison unit 24 may be carried out at any timing, but there is considered a case in which the comparison by the comparison unit 24 is carried out according to the frame rate of the RGB image signal 113 in the example of
The comparison unit 24 carries out the comparison on the basis of the RGB image signal 113 and the event signal 123 generated at a time t4. When the difference is equal to or more than the predetermined threshold value Th, the setting unit 22 sets a region of interest Rt4 in place of the region of interest Rt1, and the tracking of the detection target in the region of interest Rt4 by the tracking unit 23 is started.
Note that the region of interest changes suddenly in a case in which the position of the region of interest Rt1 and the position of the region of interest Rt4 are greatly different from each other when the setting unit 22 sets the region of interest Rt4 in place of the region of interest Rt1. In this case, there may be provided such a configuration that the setting unit 22 gradually or stepwise changes the region of interest from the region of interest Rt1 to the region of interest Rt4. Further, the method for changing the region of interest by the setting unit 22 may be changed according to the difference obtained by the comparison unit 24, that is, the difference between the coordinate information calculated on the basis of the RGB image signal 113 and the coordinate information obtained by the tracking on the basis of the event signal 123.
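The gradual change of the region of interest can be sketched as a stepwise interpolation between the current region and the newly set region. The fixed fraction used here is an assumption; as noted above, it could instead be varied according to the difference obtained by the comparison unit 24.

```python
def step_roi(current, target, fraction=0.25):
    """Move the region of interest a fraction of the way from the current
    region toward the newly set region on each update, so that the region
    of interest changes gradually rather than jumping.
    current, target: (x0, y0, x1, y1) corner coordinates."""
    return tuple(c + (t - c) * fraction for c, t in zip(current, target))
```

Applying this repeatedly converges the region of interest onto the newly set region over a few updates, avoiding a sudden jump even when the two regions are far apart.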
As described above, when the difference is less than the predetermined threshold value Th, the tracking of the detection target in the region of interest set by the setting unit 22 is effective, and hence the region of interest is maintained. When the difference is equal to or more than the predetermined threshold value Th, the tracking of the detection target in the region of interest set by the setting unit 22 is highly likely to be ineffective, and hence the setting unit 22 sets the region of interest again.
Then, when the event signal 123 is generated (YES in step S107), the tracking unit 23 tracks the detection target in the region of interest R on the basis of the event signal 123 (step S108). The tracking unit 23 carries out the tracking each time the event signal 123 is generated until a predetermined time elapses. When the predetermined time has elapsed (YES in step S109), the detection unit 21 detects the detection target from the RGB image signal 113 (step S110).
The comparison unit 24 carries out the comparison (step S111). While the difference is less than the predetermined threshold value Th (NO in step S112), the processing from step S107 to step S112 is repeated. When the comparison unit 24 determines that the difference is equal to or more than the threshold value Th (YES in step S112), the setting unit 22 sets the region of interest Rx as the region of interest R on the basis of the detection result in step S110 (step S113). Each unit of the information processing device 20 repeats the processing from step S107 to step S113 above (the processing from step S101 to step S104 is also repeated, but not necessarily in the same cycle as that from step S107 to step S113), to thereby carry out the tracking while the region of interest R is maintained and reset at appropriate timings. Thus, the tracking can be carried out precisely while latency is suppressed.
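One detection cycle of the flow above (steps S107 to S113) can be sketched as follows. The per-event tracking rule (taking the latest in-region event position) and the callback used to set the region of interest again are illustrative assumptions.

```python
def run_cycle(roi, events, detected_xy, threshold, set_roi):
    """One detection cycle, corresponding roughly to steps S107-S113.

    roi: current region of interest (x0, y0, x1, y1)
    events: event signals (x, y, polarity) generated during the cycle
    detected_xy: detection result from the RGB image signal (step S110)
    set_roi: callback building a new region of interest from the detection
             result (step S113)
    Returns the region of interest to use for the next cycle."""
    tracked_xy = None
    for x, y, _ in events:                        # steps S107-S108: track per event
        x0, y0, x1, y1 = roi
        if x0 <= x <= x1 and y0 <= y <= y1:
            tracked_xy = (x, y)                   # latest in-region event position
    if tracked_xy is None:
        return roi                                # no event: the region is maintained
    dx = detected_xy[0] - tracked_xy[0]           # step S111: comparison
    dy = detected_xy[1] - tracked_xy[1]
    if (dx * dx + dy * dy) ** 0.5 < threshold:    # step S112: difference below Th
        return roi                                # the region is maintained
    return set_roi(detected_xy)                   # step S113: the region is set again
```

The returned region then feeds the next cycle, which is how maintenance and resetting of the region of interest R alternate at the appropriate timing.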
In
Note that there may be provided such a configuration that both the resetting of the region of interest R described with reference to
One embodiment of the present invention described above includes the detection unit 21 that detects a detection target on the basis of the RGB image signal 113, which is the first image signal generated by the image sensor 111 serving as the first image sensor, the setting unit 22 that sets a region of interest including at least a part of the detection target, and the tracking unit 23 that tracks the detection target in the region of interest on the basis of the event signal 123, which is the second image signal generated by the sensor 121 serving as the second image sensor, and the result of the detection by the detection unit 21 on the basis of the RGB image signal 113 associated with the event signal 123. Thus, it is possible to set the region of interest on the basis of the RGB image signal 113, which has a relatively large amount of information, and to track the detection target in the region of interest on the basis of the event signal 123, which has a relatively high temporal resolution, together with the result of the detection on the basis of the RGB image signal 113 associated with the event signal 123.
Moreover, one embodiment of the present invention includes the comparison unit 24, which compares the position information on the detection target represented by the result of the detection by the detection unit 21 on the basis of the RGB image signal 113 with the position information on the detection target represented by the result of the tracking by the tracking unit 23 on the basis of the event signal 123 associated with the RGB image signal 113. Thus, the effectiveness of the tracking can continuously be confirmed. Moreover, the tracking on the basis of the event signal 123 makes effective use of characteristics of the event driven vision sensor, such as a wide dynamic range, a high temporal resolution, and independence from the background. Thus, it is possible to increase the temporal resolution and the spatial resolution, and accordingly, the tracking can be carried out precisely while suppressing the latency.
Moreover, according to one embodiment of the present invention, the setting unit 22 resets the region of interest R on the basis of the comparison result of the comparison unit 24 when the difference is equal to or more than the predetermined threshold value Th. Thus, the tracking can be carried out while the region of interest R is maintained and reset at appropriate timings, and the precise tracking can continuously be carried out.
Moreover, one embodiment of the present invention further includes a correction unit which corrects the result of the tracking by the tracking unit 23 on the basis of the result of the comparison by the comparison unit 24. Thus, it is possible to provide a similar effect to that in the above-mentioned case in which the region of interest R is reset.
Moreover, in one embodiment of the present invention, the detection target is a person, the detection unit 21 calculates the coordinate information on at least one joint of the person, and the setting unit 22 sets a region of interest for each joint of the person. Thus, it is possible to precisely track a person as the detection target while suppressing the latency.
Note that the result of the tracking described in the one embodiment of the present invention may be used in any way. For example, the result may be used for a mirroring system which reproduces a motion of a user by a robot or the like, a rendering system which uses the motion of the user for rendering a CG (Computer Graphics) model, a gaming system which receives a user operation in a manner similar to that of a controller, and the like. For example, when the present invention is used for the mirroring system, more detailed and highly precise tracking can be achieved through the increases in the temporal resolution and the spatial resolution, and hence a smoother and finer motion can be reproduced in the robot.
Moreover, the present invention can similarly be applied to tracking having, as the detection target, for example, a predetermined vehicle, machine, living organism, or the like other than a person, or tracking having, as the detection target, a predetermined marker or the like.
Moreover, while the detection unit 21 in the information processing device 20 described in the above example detects the detection target from the RGB image signal 113 through use of a machine learning method, there may be provided such a configuration that another method is used to detect the detection target in place of, or in addition to, the machine learning. For example, a publicly-known method such as block matching or a gradient method may be used to detect the detection target from the RGB image signal 113.
Moreover, the system 1 described in the above-mentioned example may be implemented in a single device or implemented in a plurality of devices in a distributed manner. For example, the system 1 may be a system formed of a camera unit including the RGB camera 11 and the EDS 12, and the information processing device 20.
While the several embodiments of the present invention have been described above in detail with reference to the accompanying drawings, the present invention is not limited to these examples. It is obvious that various modification examples and correction examples within the scope of the technical ideas described in the scope of claims may be conceived of by those having ordinary knowledge in the technical field to which the present invention belongs. Needless to say, it is understood that these examples also belong to the technical scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
2020-191104 | Nov 2020 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2021/041256 | 11/10/2021 | WO |