The present invention relates to an information processing apparatus and a target information acquisition method for acquiring status information regarding a target on the basis of captured images.
Games may be played by a user watching a display screen of a head-mounted display (referred to as an HMD hereunder) worn on the head and connected to a game machine (e.g., see PTL 1). If the position and posture of the user's head are acquired and images of a virtual world are displayed with the field of view varied in accordance with face orientation, for example, the user can be made to feel as if he or she is inside the virtual world. The position and posture of the user are generally acquired from the results of analyzing visible and infrared light images captured of the user and from measurements taken by motion sensors incorporated in the HMD.
[PTL 1]
Japanese Patent No. 5580855
Technology for performing any kind of information processing on the basis of captured images rests on the assumption that a target such as a user is within the angle of view of a camera. However, because the user wearing the HMD cannot view the outside world, the user may become disoriented or may be so immersed in a game as to move to an unexpected location in real space without noticing it. This puts the user out of the angle of view of the camera, disrupting the ongoing information processing or lowering its accuracy. Furthermore, the user may remain unaware of the cause of such aberrations. Regardless of whether an HMD is used, in order to implement information processing in more diverse ways with a minimum of stress on the user, it is desirable to acquire status information stably over a more extensive movable range than before.
The present invention has been made in view of the above problems. An object of the invention is therefore to provide techniques that, in acquiring status information regarding the target by image capture, extend the movable range of the target in an easy and stable manner.
One embodiment of the present invention is an information processing system. The information processing system includes: multiple imaging apparatuses configured to capture images of a target from different points of view at a predetermined rate; and an information processing apparatus configured to analyze each of the images covering the target captured by the multiple imaging apparatuses so as to individually acquire sets of position and posture information regarding the target, the information processing apparatus further using one of the sets of the position and posture information to generate and output final position and posture information at a predetermined rate.
Another embodiment of the present invention is a target information acquisition method. The target information acquisition method includes the steps of: causing multiple imaging apparatuses to capture images of a target from different points of view at a predetermined rate; and causing an information processing apparatus to analyze each of the images covering the target captured by the multiple imaging apparatuses so as to individually acquire sets of position and posture information regarding the target, the information processing apparatus being further caused to use one of the sets of the position and posture information to generate and output final position and posture information at a predetermined rate.
Incidentally, if other combinations of the above-outlined constituent elements or the above expressions of the present invention are converted between different forms such as a method, an apparatus, a system, a computer program, and a recording medium that records the computer program, they still constitute effective embodiments of this invention.
In acquiring status information regarding a target by image capture, the techniques according to the present invention permit extension of the movable range of the target in an easy and stable manner.
The imaging apparatuses 12a and 12b have cameras for capturing images of the target such as the user at a predetermined frame rate, and mechanisms for generating output data representing captured images obtained by performing common processes such as demosaicing on an output signal from the cameras, before outputting the generated data to the paired information processing apparatuses 10a and 10b with which communication is established. The cameras include visible light sensors such as CCD (Charge Coupled Device) sensors or CMOS (Complementary Metal Oxide Semiconductor) sensors used in common digital cameras and digital video cameras. The imaging apparatus 12 may include either a single camera or what is called a stereo camera having two cameras disposed right and left at a known distance apart as illustrated.
Alternatively, the imaging apparatuses 12a and 12b may each be constituted by combining a monocular camera with an apparatus that emits reference light such as infrared light to the target and measures reflected light therefrom. In the case where the stereo camera or the reflected light measuring mechanism is installed, it is possible to obtain the position of the target in a three-dimensional space with high accuracy. It is well known that the stereo camera operates by the technique of determining the distance from the camera to the target by the principle of triangulation using stereoscopic images captured from right and left points of view. Also well known is the technique of determining the distance from the camera to the target through measurement of reflected light on a TOF (Time of Flight) basis or by use of a pattern projection method.
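By way of a non-limiting illustration, the triangulation mentioned above reduces, for a rectified stereo pair, to a single expression relating disparity to distance. The following minimal sketch assumes a focal length given in pixels and a baseline given in meters; the names are illustrative and not part of the embodiment:

    def depth_from_disparity(focal_px, baseline_m, disparity_px):
        # Z = f * b / d: distance is inversely proportional to the per-pixel
        # disparity between the right and left images of the same point.
        if disparity_px <= 0:
            raise ValueError("no valid match; distance cannot be determined")
        return focal_px * baseline_m / disparity_px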
However, even where the imaging apparatuses 12a and 12b each have only a monocular camera, attaching markers of predetermined sizes and shapes to the target, or making the size and shape of the target known beforehand, makes it possible to identify the position of the target in the real world from the position and size of its image in the captured image.
The information processing apparatuses 10a and 10b establish communication with the corresponding imaging apparatuses 12a and 12b, respectively, to acquire information regarding the position and posture of the target using data of its images captured and transmitted by the imaging apparatuses. Generally, the position and posture of the target obtained with the above-described techniques using captured images are given as information in a camera coordinate system that has its origin at the optical center of each imaging apparatus and has its axes oriented in the longitudinal, crosswise, and vertical directions of the imaging plane of the imaging apparatus. With this embodiment, the position and posture information regarding the target is first obtained by the information processing apparatuses 10a and 10b in each camera coordinate system.
The information in the camera coordinate systems is then transformed into information in a world coordinate system integrating these coordinate systems, which yields the final position and posture information regarding the target. This makes it possible to perform information processing using the position and posture information regardless of the field of view of any imaging apparatus covering the target. That is, the movable range of the target is extended by an amount reflecting the number of configured imaging apparatuses without affecting subsequent processes. Because the information processing apparatuses 10a and 10b acquire and use the position and posture information independently in the camera coordinate systems of the corresponding imaging apparatuses 12a and 12b, the existing pairs 8a and 8b of the imaging apparatuses and information processing apparatuses may be utilized unmodified, which makes system implementation easy to accomplish.
In the description that follows, the information processing apparatus 10a that collects the position and posture information in the camera coordinate systems, transforms the collected information into the final position and posture information, and performs predetermined information processing using the generated information may be referred to as “information processing apparatus 10a having the main functions,” and any other information processing apparatus as “information processing apparatus having the sub functions.”
The content of processes performed by the information processing apparatus 10a having the main functions using the position and posture information is not limited to anything specific and may be determined in accordance with the functions or the content of applications desired by the user. The information processing apparatus 10a may acquire the position and posture information regarding the HMD 18 in the manner described above, for example, thereby implementing virtual reality by rendering the virtual world in a field of view corresponding to the user's line of sight. Further, the information processing apparatus 10a may identify the motions of the user's head and hands in order to advance games in which characters or items reflecting the identified motions appear, or to convert the user's motions into command input for information processing. The information processing apparatus 10a having the main functions outputs the generated output data to a display apparatus such as the HMD 18.
The HMD 18 is a display apparatus that presents the user wearing it with images on a display panel, such as an organic EL panel, positioned in front of the user's eyes. For example, parallax images corresponding to right and left points of view are generated and displayed on right and left display regions bisecting the display screen, letting the images be viewed stereoscopically. However, this is not limitative of the present embodiment; alternatively, a single image may be displayed over the entire display screen. The HMD 18 may further incorporate speakers or earphones that output sounds at the positions corresponding to the user's ears.
Incidentally, the destination to which the information processing apparatus 10a having the main functions outputs data is not limited to the HMD 18. The destination of the data output may alternatively be a flat-screen display, not illustrated.
The communication between the information processing apparatuses 10a and 10b on one hand and the corresponding imaging apparatuses 12a and 12b on the other hand, between the information processing apparatus 10a having the main functions on one hand and the information processing apparatus 10b having the sub functions on the other hand, and between the information processing apparatus 10a having the main functions on one hand and the HMD 18 on the other hand, may be implemented either by cable such as Ethernet (registered trademark) or wirelessly such as by Bluetooth (registered trademark). The external shapes of these apparatuses are not limited to those illustrated. For example, the imaging apparatus 12a and the information processing apparatus 10a may be integrated into an information terminal, and so may be the imaging apparatus 12b and the information processing apparatus 10b.
Further, the apparatuses may each be provided with an image display function, and images generated in accordance with the position and posture of the target may be displayed by each apparatus. With this embodiment, as described above, the pairs 8a and 8b of the information processing apparatuses and imaging apparatuses acquire the position and posture information regarding the target in the camera coordinate systems. Whereas the target is not limited to anything specific because the process involved may be implemented using existing techniques, the description that follows assumes that the HMD 18 is the target.
The output mechanism section 102 includes a housing 108 shaped in such a manner as to cover the right and left eyes when the user wears the HMD 18. Inside the output mechanism section 102 is a display panel directly facing the eyes when worn. Disposed on the outer surface of the housing 108 are markers 110a, 110b, 110c, 110d, and 110e that are lit in a predetermined color. The number of markers, their arrangements, and their shapes are not limited to anything specific. In the illustrated example, approximately rectangular markers are provided in the four corners and at the center of the output mechanism section 102.
Further, both rear sides of the wearing band 106 are provided with elliptically shaped markers 110f and 110g. On the basis of their number and their positions, the markers thus arranged permit identification of situations in which the user faces sideways or backwards relative to the imaging apparatuses 12a and 12b. Incidentally, the markers 110d and 110e are disposed under the output mechanism section 102 and the markers 110f and 110g are outside the wearing band 106, so that their contours are indicated by dotted lines because the markers are invisible from points of view in
The CPU 22 controls the information processing apparatus 10a as a whole by executing an operating system stored in the storage section 34. Further, the CPU 22 executes various programs read from the removable recording media or downloaded via the communication section 32 and loaded into the main memory 26. The GPU 24 has the functions of both a geometry engine and a rendering processor. Under rendering instructions from the CPU 22, the GPU 24 performs rendering processes and stores the resulting display image in a frame buffer, not depicted.
The display image stored in the frame buffer is converted to a video signal before being output to the output section 36. The main memory 26 is configured with a RAM (Random Access Memory) that stores programs and data necessary for processing. The information processing apparatus 10b having the sub functions has basically the same internal circuit configuration. It is to be noted, however, that in the information processing apparatus 10b, the input section 38 receives input of data from the information processing apparatus 10a and the output section 36 outputs the position and posture information in the camera coordinate system.
The CPU 50 processes information acquired from the components of the HMD 18 via the bus 58, and supplies output data acquired from the information processing apparatus 10a having the main functions to the display section 54 and to the sound output section 56. The main memory 52 stores the programs and data required by the CPU 50 for processing. Depending on the application to be executed or on the design of the apparatus, there may be a case where the information processing apparatus 10a performs almost all processing so that the HMD 18 need only output the data transmitted from the information processing apparatus 10a. In this case, the CPU 50 and the main memory 52 may be replaced with more simplified devices.
The display section 54, configured with a display panel such as a liquid crystal panel or an organic EL panel, displays images before the eyes of the user wearing the HMD 18. As described above, a pair of parallax images may be displayed on the display regions corresponding to the right and left eyes so as to implement stereoscopic images. The display section 54 may further include a pair of lenses positioned between the display panel and the eyes of the user wearing the HMD 18, the paired lenses serving to extend the viewing angle of the user.
The sound output section 56 is configured with speakers or earphones positioned corresponding to the ears of the user wearing the HMD 18, the speakers or earphones outputting sounds for the user to hear. The number of channels on which sounds are output is not limited to any specific number. There may be monaural, stereo, or surround channels. The communication section 62 is an interface that transmits and receives data to and from the information processing apparatus 10a, the interface being implemented using known wireless communication technology such as Bluetooth (registered trademark). The IMU sensors 64 include a gyro sensor and an acceleration sensor and acquire angular velocity and acceleration of the HMD 18. The output values of the sensors are transmitted to the information processing apparatus 10a via the communication section 62. The light-emitting section 66 is an element or an aggregate of elements emitting light in a predetermined color. As such, the light-emitting section 66 constitutes the markers disposed at multiple positions on the outer surface of the HMD 18 depicted in
The information processing apparatus 10a having the main functions includes a captured image acquisition section 130 that acquires data representing captured images from the imaging apparatus 12a, an image analysis section 132 that acquires position and posture information based on the captured images, a sensor value acquisition section 134 that acquires the output values of the IMU sensors 64 from the HMD 18, a sensor value transmission section 136 that transmits the output values of the IMU sensors 64 to the information processing apparatus 10b having the sub functions, and a local information generation section 138 that generates position and posture information in the camera coordinate system by integrating the output values of the IMU sensors 64 and the position and posture information based on the captured images. The information processing apparatus 10a further includes a local information reception section 140 that receives the position and posture information transmitted from the information processing apparatus 10b having the sub functions, a global information generation section 142 that generates position and posture information in the world coordinate system, an output data generation section 150 that generates output data by performing information processing using the position and posture information, and an output section 152 that transmits the output data to the HMD 18.
The captured image acquisition section 130 is implemented using the input section 38, CPU 22, and main memory 26 in
The image analysis section 132 is implemented using the CPU 22, GPU 24, and main memory 26 in
The target is not limited to the HMD 18 as discussed above. The position and posture information regarding the user's hand as the target may be acquired on the basis of images of light-emitting markers disposed on an input apparatus, not depicted. Further, it is possible to use, in combination, techniques of image analysis for tracking a part of the user's body using contour lines and for recognizing a face or a target having a specific pattern through pattern matching. Depending on the configuration of the imaging apparatus 12a, the distance to the target may be identified by measuring the reflection of infrared rays as described above. That is, the techniques of image analysis are not limited to anything specific as long as they serve to acquire the position and posture of a subject through image analysis.
The sensor value acquisition section 134 is implemented using the input section 38, communication section 32, and main memory 26 in
The local information generation section 138 is implemented using the CPU 22 and main memory 26 in
The local information generation section 138 estimates a subsequent position and posture of the HMD 18 using the position and posture information regarding the HMD 18 identified at the time of the preceding frame and the changes in the position and posture of the HMD 18 based on the output values of the IMU sensors 64. By integrating the estimated position and posture information and the information regarding the position and posture obtained through analysis of captured images, the local information generation section 138 identifies with high accuracy the position and posture information at the time of the next frame. The techniques of state estimation using the Kalman filter, well known in the field of computer vision, may be applied to this process.
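By way of illustration only, the sketch below applies a textbook one-dimensional Kalman filter to a single position coordinate: the acceleration from the IMU sensors drives the prediction step, and the camera-derived position drives the correction step. The state layout and the noise parameters q and r are assumptions made for this sketch, not values taken from the embodiment:

    import numpy as np

    def kalman_step(x, P, accel, z_cam, dt, q=1e-3, r=1e-2):
        # State x = [position, velocity]; P is its 2x2 covariance.
        F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity motion model
        B = np.array([0.5 * dt * dt, dt])       # IMU acceleration as control input
        x = F @ x + B * accel                   # predict from the sensor output values
        P = F @ P @ F.T + q * np.eye(2)         # inflate uncertainty by process noise
        H = np.array([[1.0, 0.0]])              # the camera observes position only
        y = z_cam - H @ x                       # innovation: camera minus prediction
        S = H @ P @ H.T + r                     # innovation covariance
        K = (P @ H.T) / S                       # Kalman gain
        x = x + (K * y).ravel()                 # correct the estimate
        P = (np.eye(2) - K @ H) @ P
        return x, P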
The local information reception section 140 is implemented using the communication section 32 and input section 38 in
More specifically, the global information generation section 142 includes a transformation parameter acquisition section 144, an imaging apparatus switching section 146, and a coordinate transformation section 148. The transformation parameter acquisition section 144 acquires transformation parameters for transforming the position and posture information in each camera coordinate system into the world coordinate system by identifying the position and posture information regarding the imaging apparatuses 12a and 12b in the world coordinate system. The acquisition, at this time, of the transformation parameters takes advantage of the fact that if the HMD 18 is found in a region where the fields of view of the imaging apparatuses 12a and 12b overlap with each other (the region will be referred to as “field-of-view overlap region” hereunder), the local information obtained in the camera coordinate systems of both imaging apparatuses proves to be the same when transformed into global information.
When the transformation parameters are derived using the local information actually obtained during operation, the coordinate transformation advantageously takes into consideration the error characteristics that occur when each of the information processing apparatuses 10a and 10b generates its local information. Another advantage is that there is no need to position the imaging apparatuses 12a and 12b with high precision where they are arranged. Also, the transformation parameter acquisition section 144 gradually corrects the transformation parameters in such a manner that the position and posture information thus obtained regarding the imaging apparatuses 12a and 12b in the world coordinate system will be smoothed in the time direction or that their posture values will approach normal values.
The imaging apparatus switching section 146 switches the imaging apparatuses whose fields of view cover the HMD 18 to select the imaging apparatus for use in acquiring global information. In the case where the HMD 18 is found only in the image captured by one imaging apparatus, the global information is obviously generated using the local information generated by the information processing apparatus corresponding to that imaging apparatus. In the case where the HMD 18 is found in the fields of view of multiple imaging apparatuses, one of them is selected in accordance with predetermined rules. For example, the imaging apparatus closest to the HMD 18 is selected, and the global information is generated using the local information generated by the information processing apparatus corresponding to the selected imaging apparatus.
The coordinate transformation section 148 generates the global information by performing a coordinate transformation on the local information generated by the information processing apparatus corresponding to the selected imaging apparatus. Using, at this point, the transformation parameters generated by the transformation parameter acquisition section 144 for the selected imaging apparatus allows the coordinate transformation section 148 to accurately obtain position and posture information independent of which imaging apparatus constitutes the source of information.
The output data generation section 150 is implemented using the CPU 22, GPU 24, and main memory 26 in
The output section 152 is implemented using the output section 36 and communication section 32 in
The information processing apparatus 10b having the sub functions includes a captured image acquisition section 160 that acquires the data of captured images from the imaging apparatus 12b, an image analysis section 162 that acquires position and posture information based on the captured images, a sensor value reception section 164 that receives the output values of the IMU sensors 64 from the information processing apparatus 10a, a local information generation section 166 that generates local information by integrating the position and posture information based on the captured images and the output values of the IMU sensors 64, and a local information transmission section 168 that transmits the local information to the information processing apparatus 10a.
The captured image acquisition section 160, image analysis section 162, and local information generation section 166 have the same functions as those of the captured image acquisition section 130, image analysis section 132, and local information generation section 138 respectively in the information processing apparatus 10a having the main functions. The sensor value reception section 164 is implemented using the communication section 32 and input section 38 in
The play areas 184a and 184b are delimited, in the front-back direction, for example, by an extent ranging from a distance A of approximately 0.6 m from the imaging apparatus 12a to a distance B of approximately 3 m therefrom, by a width C of approximately 0.7 m in the crosswise direction closest to the imaging apparatus 12a, and by a width D of approximately 1.9 m in the crosswise direction farthest from the imaging apparatus 12a. The camera coordinate systems of the imaging apparatuses 12a and 12b are each defined by the optical center as the origin, by an X axis oriented rightward along the crosswise direction of the imaging plane, by a Y axis oriented upward along the longitudinal direction of the imaging plane, and by a Z axis oriented perpendicular to the imaging plane. According to existing techniques, the position and posture of the HMD 18 in the play area (e.g., play area 184a) of one imaging apparatus (e.g., imaging apparatus 12a) are obtained using the camera coordinate system of that imaging apparatus.
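To make the geometry concrete, the following minimal sketch tests whether a point in a camera coordinate system falls inside such a trapezoidal play area, using the example dimensions A, B, C, and D above and assuming symmetry about the optical axis (z denotes the distance along the Z axis, x the crosswise offset):

    def in_play_area(x, z, near=0.6, far=3.0, near_width=0.7, far_width=1.9):
        # The play area widens linearly from near_width at z=near
        # to far_width at z=far.
        if not (near <= z <= far):
            return False
        half_width = 0.5 * (near_width
                            + (far_width - near_width) * (z - near) / (far - near))
        return abs(x) <= half_width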
In this embodiment, the play areas are extended by providing multiple such systems. When the imaging apparatuses 12a and 12b are arranged in such a manner that their play areas are contiguous to each other as illustrated, the overall play area is doubled in size. It is to be noted, however, that these multiple play areas need only be continuous and that the imaging apparatuses 12a and 12b need not be arranged so that their play areas are precisely adjacent to each other. As described above, the information processing apparatus 10a corresponding to the imaging apparatus 12a generates the local information constituted by the position and posture of the HMD 18 in the camera coordinate system of the imaging apparatus 12a.
The information processing apparatus 10b corresponding to the imaging apparatus 12b generates the local information constituted by the position and posture of the HMD 18 in the camera coordinate system of the imaging apparatus 12b. Consider an example in which the HMD moves from the position of an HMD 18a to the position of an HMD 18b to the position of an HMD 18c, as illustrated. When the HMD is in the play area 184a of the imaging apparatus 12a as in the case of the HMD 18a, the local information obtained in the camera coordinate system of the imaging apparatus 12a is transformed into global information. When the HMD is in the play area 184b of the imaging apparatus 12b as in the case of the HMD 18c, the local information obtained in the camera coordinate system of the imaging apparatus 12b is transformed into global information.
When the HMD is in a field-of-view overlap region 186 between the imaging apparatuses 12a and 12b while moving from the play area 184a to the play area 184b as in the case of the HMD 18b, the imaging apparatus as the source of local information for use in generating global information is switched from the imaging apparatus 12a to the imaging apparatus 12b at a timing in accordance with predetermined rules. For example, the imaging apparatus switching section 146 monitors the distance between the center of gravity of the HMD 18b on one hand and each of the optical centers of the imaging apparatuses 12a and 12b on the other hand. At the time when the magnitude relation between the monitored distances is reversed, the closer imaging apparatus of the two (e.g., imaging apparatus 12b) is selected so that the local information obtained in the camera coordinate system of the selected apparatus may be used to generate global information.
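A minimal sketch of this selection rule, assuming the center of gravity of the HMD and the optical centers of the imaging apparatuses are all expressed in a common coordinate system (names hypothetical):

    import numpy as np

    def select_source(hmd_pos, cam_positions, current):
        # Distances from the HMD's center of gravity to each optical center.
        dists = [float(np.linalg.norm(hmd_pos - p)) for p in cam_positions]
        nearest = min(range(len(dists)), key=dists.__getitem__)
        # Switch only when the magnitude relation between the distances is reversed.
        return nearest if dists[nearest] < dists[current] else current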
Also, when the HMD is in the field-of-view overlap region 186 as in the case of the HMD 18b, the local information obtained in both camera coordinate systems should represent the same position and posture information when transformed into global information. This assumption is used by the transformation parameter acquisition section 144 as the basis for obtaining the parameters for transforming the local information into global information.
Qualitatively, the origin and the rotation angles of the axes of each camera coordinate system in the world coordinate system may be used, when obtained, to transform the position and posture of the HMD 18 in the camera coordinate systems into information in the world coordinate system. This involves obtaining, first of all, the position and posture of the imaging apparatus 12b as viewed from the imaging apparatus 12a. Here, the position in three-dimensional coordinates is represented by “pos” and the quaternion indicative of the posture is denoted by “quat.” A posture difference dq of the HMD 18b between the camera coordinate system of the imaging apparatus 12a (referred to as “0-th camera coordinate system” hereunder) and the camera coordinate system of the imaging apparatus 12b (referred to as “first camera coordinate system” hereunder) is calculated as follows:
dq=hmd.quat@cam0*conj(hmd.quat@cam1)
In the above equation, hmd.quat@cam0 denotes the posture of the HMD 18b in the 0-th camera coordinate system, and hmd.quat@cam1 stands for the posture of the HMD 18b in the first camera coordinate system. “conj” represents the function that returns the conjugate of a quaternion. The first camera coordinate system is rotated by the amount of the posture difference so as to align the posture of the HMD 18b, before the vector from the origin of the 0-th camera coordinate system to the HMD 18b and the vector from the HMD 18b to the imaging apparatus 12b are added up. This provides the position cam1.pos@cam0 of the imaging apparatus 12b in the 0-th camera coordinate system as illustrated.
cam1.pos@cam0=rotate(dq,-hmd.pos@cam1)+hmd.pos@cam0
where “rotate” is the function for rotating coordinates around the origin.
If the position and posture information cam0@world regarding the imaging apparatus 12a in the world coordinate system is already known, then the position and posture information cam1@world regarding the imaging apparatus 12b in the world coordinate system is obtained by transforming the position cam1.pos@cam0 and the posture dq of the imaging apparatus 12b in the 0-th camera coordinate system further into data in the world coordinate system. The calculations involved may be common 4×4 affine transformation matrix operations. In the case where the 0-th camera coordinate system is used uncorrected as the world coordinate system, the position of the imaging apparatus 12a is given as cam0.pos@world=(0, 0, 0) and the posture as cam0.quat@world=(0, 0, 0, 1).
When the position cam1.pos@world of the imaging apparatus 12b and its posture cam1.quat@world in the world coordinate system are obtained in the manner described above, it is possible to transform the position hmd.pos@cam1 and the posture hmd.quat@cam1 of a given HMD in the first camera coordinate system of the imaging apparatus 12b into a position hmd.pos@world and a posture hmd.quat@world in the world coordinate system.
hmd.quat@world=cam1.quat@world*hmd.quat@cam1
hmd.pos@world=rotate(cam1.quat@world,hmd.pos@cam1)+cam1.pos@world
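The derivation above can be transcribed into code almost verbatim. The sketch below assumes quaternions stored as (x, y, z, w), consistent with the identity posture (0, 0, 0, 1) used later in the text, and positions stored as three-element numpy arrays; function names are illustrative:

    import numpy as np

    def qmul(a, b):
        # Hamilton product of quaternions stored as (x, y, z, w).
        ax, ay, az, aw = a
        bx, by, bz, bw = b
        return np.array([aw*bx + ax*bw + ay*bz - az*by,
                         aw*by - ax*bz + ay*bw + az*bx,
                         aw*bz + ax*by - ay*bx + az*bw,
                         aw*bw - ax*bx - ay*by - az*bz])

    def conj(q):
        # Conjugate of a quaternion: negate the vector part.
        return np.array([-q[0], -q[1], -q[2], q[3]])

    def rotate(q, v):
        # Rotate vector v around the origin: vector part of q*(v,0)*conj(q).
        return qmul(qmul(q, np.append(v, 0.0)), conj(q))[:3]

    def cam1_pose_in_cam0(hmd_pos_cam0, hmd_quat_cam0, hmd_pos_cam1, hmd_quat_cam1):
        # dq = hmd.quat@cam0 * conj(hmd.quat@cam1)
        dq = qmul(hmd_quat_cam0, conj(hmd_quat_cam1))
        # cam1.pos@cam0 = rotate(dq, -hmd.pos@cam1) + hmd.pos@cam0
        pos = rotate(dq, -np.asarray(hmd_pos_cam1)) + hmd_pos_cam0
        return pos, dq

    def local_to_world(cam_pos_world, cam_quat_world, hmd_pos_cam, hmd_quat_cam):
        # hmd.quat@world = cam.quat@world * hmd.quat@cam, and likewise for position.
        quat = qmul(cam_quat_world, hmd_quat_cam)
        pos = rotate(cam_quat_world, hmd_pos_cam) + cam_pos_world
        return pos, quat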
Alternatively, the position and posture information regarding the imaging apparatus 12b in the world coordinate system may be obtained collectively by affine transformation. That is, 4×4 matrices hmd0mat, hmd1mat, and cam0mat representing, respectively, the position and posture information hmd@cam0 regarding the HMD 18b in the 0-th camera coordinate system, the position and posture information hmd@cam1 regarding the HMD 18b in the first camera coordinate system, and the position and posture information cam0@world regarding the imaging apparatus 12a in the world coordinate system are used to obtain a matrix cam1mat representing the position and posture information cam1@world regarding the imaging apparatus 12b in the world coordinate system as follows:
cam0to1mat=hmd0mat*inverse(hmd1mat)
cam1mat=cam0mat*cam0to1mat
where “inverse” is the function for obtaining the inverse matrix.
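Under the same assumptions as the previous sketch, this collective affine formulation can be written with numpy; pose_to_mat (an illustrative helper) builds a 4×4 matrix from a position and an (x, y, z, w) unit quaternion, and the two equations above become two matrix products:

    import numpy as np

    def pose_to_mat(pos, quat):
        # 4x4 affine matrix from a position and a unit quaternion (x, y, z, w).
        x, y, z, w = quat
        m = np.eye(4)
        m[:3, :3] = [[1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
                     [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
                     [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)]]
        m[:3, 3] = pos
        return m

    def cam1_world_mat(hmd0mat, hmd1mat, cam0mat):
        # cam0to1mat = hmd0mat * inverse(hmd1mat); cam1mat = cam0mat * cam0to1mat
        cam0to1mat = hmd0mat @ np.linalg.inv(hmd1mat)
        return cam0mat @ cam0to1mat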
Explained below is the operation of the information processing apparatuses implemented by use of the configurations described above.
With the communication established, the imaging apparatuses 12a and 12b transmit the data of captured images and the HMD 18 transmits the output values of the IMU sensors 64. This causes the local information generation sections 138 and 166 in the information processing apparatuses 10a and 10b to generate the position and posture information regarding the HMD 18 in their respective camera coordinate systems (S14 and S16). When the HMD 18 is not in the field of view of a given imaging apparatus at this point, the corresponding information processing apparatus generates invalid data. The information processing apparatus 10b having the sub functions transmits the generated local information to the information processing apparatus 10a having the main functions.
In the case where multiple sets of local information include valid data as the position and posture information regarding the HMD 18, that means the HMD 18 is in the field-of-view overlap region. During this period, the imaging apparatus switching section 146 in the information processing apparatus 10a having the main functions monitors whether or not the HMD 18 meets predetermined switching conditions (S18). For example, in the case where the HMD 18 in the field of view of the imaging apparatus 12a moves out of it and into the field of view of the imaging apparatus 12b as depicted in
Another condition for switching the source of information may be the time when the center of gravity of the HMD 18 moves into the play area of an adjacent imaging apparatus. Qualitatively, the imaging apparatus 12 capable of acquiring the position and posture information regarding the HMD 18 with higher accuracy than any other imaging apparatus is selected as the source of information. When such switching conditions are met (Y in S18), the transformation parameter acquisition section 144 in the information processing apparatus 10a first acquires the transformation parameters for the camera coordinate system of the imaging apparatus whose field of view has started to cover the HMD 18 anew (S20).
Specifically, as discussed above, when the local information acquired by each of the information processing apparatuses is transformed into global information, the transformation parameters for the camera coordinate system of the destination imaging apparatus are acquired in such a manner that the positions and postures provided by the information processing apparatuses coincide with one another. The transformation parameter acquisition section 144 stores the acquired transformation parameters in an internal memory in association with information for identifying the imaging apparatus. The imaging apparatus switching section 146 proceeds to select the imaging apparatus serving as the source of the local information that is to be transformed into global information in the manner described above (S24).
In the case where the HMD 18 does not meet the switching conditions (N in S18) or where the HMD 18 has met the switching conditions, causing the source of information to be switched (S24), the coordinate transformation section 148 generates the global information by performing a coordinate transformation on the local information from the currently determined source of information (S26). Used at this time are the transformation parameters held by the transformation parameter acquisition section 144 in the internal memory in association with the imaging apparatus serving as the source of information. The output data generation section 150 generates data such as display images using the global information. The output section 152 outputs the generated data to the HMD 18 (S28). Because the global information is independent of the imaging apparatus serving as the source of information, the output data generation section 150 can generate the output data through similar processing.
If it is not necessary to terminate the process by the user's operation, for example (N in S30 or N in S34), the transformation parameter acquisition section 144 corrects as needed the transformation parameters acquired in S20 (S32). Thereafter, the information processing apparatus 10a having the main functions repeats the processing in S14 to S28 and in S32, and the information processing apparatus 10b having the sub functions repeats the processing in S16, each at a predetermined rate.
When the information processing apparatuses generate the local information, they can determine the position and posture information with a minimum of errors by additionally using the position and posture information regarding the HMD 18 estimated from the output values of the IMU sensors 64 in the HMD 18, as discussed above. This measure is taken to deal with the fact that errors are included both in the position and posture information obtained from the captured images and in the position and posture information acquired from the IMU sensors 64. The local information that integrates these sets of information also includes minute errors. The transformation parameters acquired in S20 can potentially include minute errors also because these parameters are based on the local information.
For that reason, the transformation parameters are to be acquired and corrected as needed. The local information obtained immediately after such correction is then transformed into global information with the fewest possible errors. On the other hand, at the time when the imaging apparatus as the source of information is switched in S24, even a slight deviation of the axes of the world coordinate system before and after the switchover can cause a discontinuous change in the field of view of the display image generated by use of the world coordinate system. Such a change can give the user an uncomfortable feeling. Thus, in S20, immediately before the switching of the imaging apparatus as the source of information, the sets of local information in the respective camera coordinate systems at that point in time are compared with each other as discussed above. This comparison enables the transformation parameters to be acquired in such a manner that the world coordinate system after the switchover fully coincides with the preceding world coordinate system.
As a result of giving priority to such continuity, the position and posture of the imaging apparatus represented by the transformation parameters acquired in S20 may conceivably include relatively large errors. The transformation parameters, if used uncorrected, can lead to errors accumulating in the position and posture information regarding the HMD 18 and can even cause the origin of the world coordinate system to be displaced or tilted. Thus, in S32 as a period other than the switching timing for the imaging apparatuses, the transformation parameter acquisition section 144 gradually corrects the transformation parameters acquired in S20 upon switching of the imaging apparatuses.
That is, the transformation parameters are corrected in a manner reflecting the actual positions and postures of the imaging apparatuses. The techniques of correction may be varied depending on the characteristics of the imaging apparatuses. For example, in the case where the imaging apparatuses 12a and 12b are fixed, the transformation parameters are corrected in such a manner that the positions and postures they represent become averages of the positions and postures obtained so far. In the case where the imaging apparatuses 12a and 12b are fixed with the longitudinal direction of their imaging planes coinciding with the vertical direction of the real space, the postures represented by the transformation parameters are corrected in such a manner that the Y axes of the imaging apparatuses 12a and 12b point in the direction opposite to gravity. The direction of gravity is obtained on the basis of the output values of the IMU sensors 64 in the HMD 18.
In the case where the imaging apparatuses 12a and 12b are not fixed, the positions and postures obtained so far are smoothed in the time direction. This determines the target values for the positions and postures represented by the transformation parameters. When the camera coordinate system of the imaging apparatus 12a corresponding to the information processing apparatus 10a having the main functions is taken as the world coordinate system, the transformation parameters are corrected in a manner making the origins and axes of the two systems coincide with each other. Such corrections are carried out gradually in multiple steps so that the user, presented with the generated display images, will not notice. For example, the upper limits of the correction amounts per unit time may be obtained beforehand by experiments, and the number of steps over which the correction is performed may be determined in accordance with the actually required correction amounts. Upon completion of the corrections, the processing in S32 may be omitted.
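As one hedged sketch of such stepwise correction, applied here to the position component only (an analogous incremental rotation could be used for the posture), the correction advances a bounded amount per update; step_cap is a hypothetical per-update limit of the kind obtained beforehand by experiments:

    import numpy as np

    def step_toward(current, target, step_cap):
        # Move 'current' toward 'target' by at most step_cap per call so that
        # the user viewing the display images does not notice the correction.
        delta = np.asarray(target) - current
        dist = float(np.linalg.norm(delta))
        if dist <= step_cap:
            return np.asarray(target).copy()  # correction complete; S32 may be omitted
        return current + delta * (step_cap / dist)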
Repeating the processing in S14 to S32 permits continuous output of images through similar processing regardless of the imaging apparatus whose field of view currently covers the user wearing the HMD 18. If it becomes necessary to terminate the process typically by the user's operation, the whole processing is terminated (Y in S30 or Y in S34). A similar processing procedure basically applies to the case where three or more imaging apparatuses are configured. In such a case, however, the switching of the sources of information may conceivably be performed between imaging apparatuses excluding the imaging apparatus 12a corresponding to the information processing apparatus 10a having the main functions.
At that time, the above-described techniques are used directly to acquire the position and posture information regarding the post-switching imaging apparatus in the camera coordinate system of the pre-switching imaging apparatus. Meanwhile, the position and posture information regarding the pre-switching imaging apparatus in the world coordinate system has already been obtained through the cascade of switchovers accompanying the displacement of the HMD 18 so far. As a result, the position and posture information regarding the post-switching imaging apparatus in the world coordinate system, and eventually the transformation parameters, can be obtained indirectly as a continuation of that cascade.
The above-described processing procedure includes two processes: a process in which the information processing apparatus 10a having the main functions transmits the output values of the IMU sensors 64 to the information processing apparatus 10b having the sub functions, and a process in which the information processing apparatus 10b having the sub functions transmits the local information to the information processing apparatus 10a having the main functions. In a system such as this embodiment in which the result of tracking the target is reflected in the output data in real time, it is particularly important to align the time axes of diverse data from the point of view of processing accuracy.
However, since the information processing apparatuses 10a and 10b operate on their respective process times, the timestamps added by the source information processing apparatus cannot be applied unmodified to the time axis of the receiving apparatus. Thus, the difference in process time between the information processing apparatuses 10a and 10b is measured so that timestamps may be reciprocally transformed therebetween.
This technique basically obtains the parameters for transforming timestamps from the difference between the transmission and reception times of test signals in round-trip propagation. In
Making at least two measurements using the above relationship provides transform expressions for aligning the timestamps of one information processing apparatus with the time axis of another information processing apparatus. For example, the following linear expression is used to transform the timestamp t of the information processing apparatus 10b into the timestamp T of the information processing apparatus 10a:
T=t*scale+offset
where “scale” and “offset” are obtained by solving simultaneous equations based on two measurements.
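A minimal sketch of this transformation, assuming two measurement pairs (t1, T1) and (t2, T2) obtained from the round-trip procedure (names illustrative):

    def timestamp_transform(t1, T1, t2, T2):
        # Two (t, T) measurement pairs determine the line T = t*scale + offset.
        scale = (T2 - T1) / (t2 - t1)
        offset = T1 - scale * t1
        return lambda t: t * scale + offset

In use, to_main = timestamp_transform(t1, T1, t2, T2) would map a timestamp t of the information processing apparatus 10b to to_main(t) on the time axis of the information processing apparatus 10a.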
For example, the sensor value reception section 164 in the information processing apparatus 10b having the sub functions transforms the timestamp T, which was transmitted from the information processing apparatus 10a and added to the output values of the IMU sensors 64, into the timestamp t of the own apparatus. In this manner, the time axis of the sensor output values is aligned with the time axis of the captured image analysis processing in the own apparatus. When the local information obtained as described above is to be transmitted to the information processing apparatus 10a, the local information transmission section 168 transforms the timestamp t of the applicable position information into the timestamp T of the information processing apparatus 10a, before adding the transformed timestamp to the outgoing local information.
In the manner described above, it is possible to improve the accuracy of position information and output data without increasing the processing load on the information processing apparatus 10a having the main functions. In the case where three or more imaging apparatuses are configured, similar transformation processing may be implemented by measuring the difference in process time between the information processing apparatus 10a having the main functions on one hand and any other information processing apparatus on the other hand. Incidentally, it is preferred that the errors in the transform processing be minimal because such errors can affect the stability of the position and posture information. For example, an error in the above-mentioned “scale” parameter causes the error in the transformed timestamps to grow progressively with time.
Thus, it is preferable to update the parameters used for transformation by measuring the difference in process time periodically using the period during which the HMD 18 is not in the field of view, for example. The difference in process time is measured between the sensor value transmission section 136 or the local information reception section 140 in the information processing apparatus 10a having the main functions on one hand, and the sensor value reception section 164 or the local information transmission section 168 in the information processing apparatus 10b having the sub functions on the other hand. The obtained parameters are retained on the side of the information processing apparatus 10b having the sub functions.
Even in these configurations, a communication mechanism, not depicted, may be used to aggregate the local information into a single information processing apparatus 10a. This makes it possible, through processing similar to what was discussed above, to let the HMD 18 display images reflecting the user moving in an extensive range. Where the imaging apparatuses 12 are arranged to face each other a few meters apart, the user leaving one group of imaging apparatuses necessarily approaches another group of imaging apparatuses. This permits stable acquisition of the position and posture information. Incidentally, the arrangements and the number of configured imaging apparatuses in the drawing are only examples and are not limitative of this invention. Alternatively, each of the plates may be furnished with the imaging apparatuses arranged in a matrix pattern. The imaging apparatuses may further be arranged in a manner encircling the movable range of the user vertically, longitudinally, and crosswise. As another alternative, the imaging apparatuses may be arranged in a curved line as in a circle, or on a curved plane as on a sphere.
In this embodiment, the information processing apparatuses 10a to 10j each generate the local information independently of one another. The generated local information is aggregated into one information processing apparatus 10a. The amount of data transmitted in this case is considerably small compared with a case where multiple imaging apparatuses are configured without being paired and the data of images captured thereby are processed by a single information processing apparatus. It follows that even where numerous apparatuses are arranged over an extensive area as illustrated, there are few problems with processing speeds or communication bands. Where data transmission and reception is implemented with wireless communication by taking advantage of the small data amount involved, it is possible to circumvent constraints on the number of input terminals as well as problems of cable routing.
In the above-described embodiment, multiple pairs of the imaging apparatuses and information processing apparatuses are provided, each pair carrying out image analysis to acquire the position and posture information regarding the target. The local information thus obtained is aggregated into a single information processing apparatus to generate the final position and posture information. Since each information processing apparatus can utilize existing techniques when acquiring the local information, the movable range of the target is extended easily with high scalability. Because the position and posture information is ultimately generated in a manner independent of imaging apparatuses, the information processing carried out using the generated position and posture information is not limited thereby.
By taking advantage of the period during which the target is in the region where the fields of view of adjacent imaging apparatuses overlap with each other, the relative position and posture information regarding these imaging apparatuses is further obtained. The acquired information is used as the basis for obtaining the parameters for transformation from each camera coordinate system to the world coordinate system. The local information is corrected when obtained by the individual information processing apparatuses taking into consideration their current error characteristics. Because the transformation parameters are acquired using the actual local information, the position and posture information is obtained constantly with higher accuracy than if the transformation parameters acquired beforehand through calibration, for example, are utilized.
Upon switching of the imaging apparatuses as the source of information for generating position and posture information, the continuity of the information is guaranteed by determining the transformation parameters in such a manner that the position and posture information in the pre-switching world coordinate system coincides with that in the post-switching world coordinate system. Meanwhile, the position and posture of the imaging apparatus represented by the transformation parameters obtained as described above are corrected to normal values during the post-switching period so as to maintain the accuracy of the position and posture information regarding the target. This eliminates problems with information continuity and accuracy stemming from the introduction of multiple imaging apparatuses.
Furthermore, the difference in process time between the information processing apparatus in which the local information is aggregated on one hand, and any other information processing apparatus on the other hand, is measured periodically in order to transform timestamps reciprocally therebetween. This provides a common time axis for processes involving communication between the information processing apparatuses, such as a process of integrating the transmitted output values of the IMU sensors and the result of analysis of captured images, or a process of generating the global information and the output data using the transmitted local information. Consequently, the movable ranges of the user and of the target are easily extended without adversely affecting or limiting processing accuracy or output results. Because the degree of freedom is high with respect to the arrangement and the number of imaging apparatuses, an environment optimized for the content of the intended information processing is easily implemented at low cost.
The present invention has been described above in conjunction with a specific embodiment. It is to be understood by those skilled in the art that suitable combinations of the constituent elements and of various processes of the embodiment described above as examples will lead to further variations of the present invention and that such variations also fall within the scope of this invention.
For example, whereas there is one information processing apparatus having the main functions in the above embodiment, two or more information processing apparatuses having the main functions may be configured instead. As another example, there may be provided two or more targets such as HMDs each assigned one information processing apparatus having the main functions. In this case, through processing similar to that of the above embodiment, the position and posture information may be tracked continuously in extensive ranges. As a further example, even where there are multiple targets, only one information processing apparatus having the main functions may be provided to selectively process and output the position and posture information regarding the multiple targets.
10a Information processing apparatus, 10b Information processing apparatus, 12a Imaging apparatus, 12b Imaging apparatus, 18 HMD, 22 CPU, 24 GPU, 26 Main memory, 130 Captured image acquisition section, 132 Image analysis section, 134 Sensor value acquisition section, 136 Sensor value transmission section, 138 Local information generation section, 140 Local information reception section, 142 Global information generation section, 144 Transformation parameter acquisition section, 146 Imaging apparatus switching section, 148 Coordinate transformation section, 150 Output data generation section, 152 Output section, 160 Captured image acquisition section, 162 Image analysis section, 164 Sensor value reception section, 166 Local information generation section, 168 Local information transmission section
As described above, the present invention may be applied to diverse information processing apparatuses such as game machines, imaging apparatuses, and image display apparatuses, as well as to information processing systems that include any of such apparatuses.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2017/035066 | 9/27/2017 | WO | 00