The present invention relates to a simulation system, a simulation program and a simulation method of a recognition function module for an image varying with position shifting information of a vehicle by the use of a virtual image of a near infrared sensor and a laser beam sensor of a LiDAR.
At the present time, for the purpose of realizing automatic driving of vehicles such as ADAS (advanced driver assistance system) or the like to detect and avoid the possibility of an accident in advance, various tests have actively been conducted by recognizing images of a camera installed on a vehicle to detect objects such as other vehicles, walkers and a traffic signal in accordance with an image recognition technique to perform control to automatically decrease the speed of the vehicle and avoid the objects and the like. In the case of the above experiment system, it is particularly important to synchronously control the entire system with a real-time property and a high recognition rate.
An example of an automatic driving support system is a travel control system disclosed, for example, in Patent Document 1. The travel control system disclosed in this Patent Document 1 is aimed at realizing an automatic driving system with which a vehicle can travel on a predetermined traveling route by detecting road markings such as a lane marker, a stop position and the like around own vehicle on a road and detecting solid objects such as a plurality of mobile objects/obstacles located around the own vehicle to determine a traveling area on the road while avoiding collision with solid objects such as a traffic signal and a signboard.
Incidentally, for the purpose of performing control by recognizing the surrounding situation with onboard sensors, it is required to identify vehicles, bicycles and walkers, which belong to the categories of mobile objects and obstacles, and to detect information about the positions and speeds thereof. Furthermore, for driving the own vehicle, it is required to determine the meanings of paints such as a lane marker and a stop sign on a road, and the meanings of traffic signs. As a means for detecting outside information around the own vehicle, it is considered effective to use an image recognition technique with an image sensor of a vehicle-mounted camera.
[Patent Document 1] Japanese Unexamined Patent Application Publication No. 2016-99635
In order to realize automatic driving of a vehicle, the vehicle itself has to recognize the surrounding environment. For this purpose, it is needed to accurately measure the distance between the vehicle itself and a surrounding object. The technique for performing distance measurement has been developed with the following devices which have been already installed in many marketed vehicles for realizing driving assist techniques such as lane keeping, cruise control and automatic braking.
While there are a plurality of methods as described above, each method has both advantages and disadvantages. In the case of the stereoscopic camera, while a distance can easily and accurately be measured by a three-dimensional view, two cameras have to be separated by at least 30 cm resulting in the limit of miniaturization.
The infrared depth sensor and the ultrasonic wave sensor are advantageous in low costs, but substantial attenuation is caused by distance. Because of this, in the case where the distance to the object is greater than several tens of meters, accurate measurement becomes difficult, or measurement itself becomes impossible. Contrary to this, the millimeter wave radar and the LiDAR result in less attenuation even over a long distance, so that it is possible to perform accurate measurement even over a long distance. While there are problems that the apparatus becomes expensive and that it is difficult to reduce the size, installation thereof on vehicles is considered to accelerate by the future research and development.
As has been discussed above, in order to accurately measure the distance to the object from a short distance to a long distance, it is a practical means at the present time to selectively use different sensors. Besides the automatic driving of vehicles, promising applications of the sensors include the technique of detecting the motion of a head for preventing a driver from napping in a vehicle, the technique of detecting gestures, and the technique of avoiding an obstacle for an automatically moving robot.
Incidentally, it is regarded as indispensable for future automatic driving to collect a large number of photographed images taken by various sensors as described above to improve the recognition rate of images by a deep learning recognition technique.
However, since it is practically impossible to collect test data by endlessly driving a vehicle in the actual world, an important issue is how to carry out the above verification with a level of reality sufficient to substitute for actual driving. For example, in the case where an outside environment is recognized by an image recognition technique with camera images, the recognition rate is substantially changed by external factors such as the weather around own vehicle (rain, fog or the like) and the time zone (night, twilight, backlight or the like), which influences the detection result. As a result, misdetection and undetection increase with respect to mobile objects, obstacles and paints on a road around own vehicle. Such misdetection and undetection of an image recognition means can be resolved with a deep learning (machine learning) technique having a high recognition rate by increasing the number of samples for learning.
However, there is a limit to extracting learning samples while actually driving on a road, and it is not realistic as a development technique to carry out a driving test and sample collection only after encountering severe weather conditions such as rain, backlight, fog or the like, since such conditions occur only rarely and are difficult to reproduce.
On the other hand, for the purpose of realizing fully automatic driving in future, the above image recognition of camera images would not suffice. This is because camera images are two-dimensional images so that, while it is possible to extract objects such as vehicles, walkers and a traffic signal and the like by image recognition, it is impossible to detect the distance to each picture element of the object. Accordingly, a sensor using laser beams called LiDAR and a sensor using near infrared rays are highly anticipated as means for dealing with the above issues. It is therefore possible to substantially improve the safety of a vehicle during driving by combining a plurality of different types of sensors as described above.
In order to solve the problem as described above, the present invention is related to the improvement of the recognition rate of target objects such as other vehicles peripheral to own vehicle, obstacles on the road, and walkers, and it is an object of the present invention to improve reality of the driving test of a vehicle and sample collection by artificially generating images which are very similar to actually photographed images taken under conditions, such as severe weather conditions, which are difficult to reproduce. In addition, it is an object of the present invention to build a plurality of different types of sensors in a virtual environment and generate images of each sensor by the use of a CG technique. Furthermore, it is an object to provide a simulation system, a simulation program and a simulation method for performing synchronization control with CG images which are generated.
In order to accomplish the object as described above, the present invention is related to a system, a program and a method of generating, as computer graphics, a virtual image which is input to a sensor unit, comprising:
a scenario creation unit which creates a scenario relating to locations and behaviors of objects existing in the virtual image;
a 3D modeling unit which performs modeling of each of the objects on the basis of the scenario;
a 3D shading unit which performs shading of each model generated by the modeling unit and generates a shading image of each model;
a component extraction unit which extracts and outputs a predetermined component contained in the shading image as a component image; and
a depth image generation unit which generates a depth image in which a depth is defined on the basis of three-dimensional profile information about each object in the component image.
In the case of the above invention, it is preferred that the component is an R component of an RGB image.
Also, in the case of the above invention, it is preferred to further provide a gray scale conversion unit which performs gray scale conversion of the component.
The present invention is related to a system, a program and a method of generating, as computer graphics, a virtual image which is input to a sensor unit, comprising:
a scenario creation unit which creates a scenario relating to locations and behaviors of objects existing in the virtual image;
a 3D modeling unit which performs modeling of each of the objects on the basis of the scenario;
a 3D shading unit which performs shading of each model generated by the modeling unit and generates a shading image of each model; and
a depth image generation unit which generates a depth image in which a depth is defined on the basis of three-dimensional profile information about each of the objects, wherein
the shading unit is provided with:
a function to perform shading only of a predetermined portion of the model on which is reflected a light beam emitted from the sensor unit; and
a function to output only a three-dimensional profile of the predetermined portion, and wherein
the depth image generation unit generates a depth image of each of the objects on the basis of the three-dimensional profile of the predetermined portion.
In the case of the above invention, it is preferred that the sensor unit is a near infrared sensor. Also, in the case of the above invention, it is preferred that the sensor unit is a LiDAR sensor which detects reflected light of emitted laser light.
In the case of the above invention, it is preferred that the scenario creation unit is provided with a mechanism to determine three-dimensional profile information of objects, behavior information of objects, material information of objects, parameter information of light sources, positional information of cameras and positional information of sensors.
In the case of the above invention, it is preferred to further provide a deep learning recognition learning unit which acquires teacher data, and performs training of a neural network by back propagation on the basis of the component image, the depth image generated by the depth image generation unit, and the teacher data.
In the case of the above invention, it is preferred to provide a deep learning recognition learning unit which acquires, as teacher data, an irradiation image and a depth image on the basis of actual photography, and performs training of a neural network by back propagation on the basis of the image obtained by the shading unit as a result of shading, the depth image generated by the depth image generation unit, and the teacher data.
In the case of the above invention, it is preferred to further provide
a TOF calculation unit which calculates, as TOF information, a time required from irradiation of a light beam to reception of a reflected light thereof on the basis of the depth image generated by the depth image generation unit;
a distance image generation unit which generates a distance image on the basis of the TOF information calculated by the TOF calculation unit; and
a comparison evaluation unit which compares the distance image generated by the distance image generation unit and the depth image generated by the depth image generation unit.
In the case of the above invention, it is preferred that the modeling unit has a function to acquire the result of comparison by the comparison evaluation unit as feedback information, adjust conditions of the modeling on the basis of the acquired feedback information, and perform modeling again.
In the case of the above invention, it is preferred that the modeling unit repeats the modeling until matching error of the comparison result by the comparison evaluation unit becomes smaller than a predetermined threshold by repeating acquisition of the feedback information on the basis of the modeling and the comparison.
Furthermore, the present invention is related to a simulation system, a program and a method of a recognition function module for an image varying in correspondence with position shifting information of a vehicle, comprising:
a positional information acquisition unit which acquires positional information of the vehicle in relation to a surrounding object on the basis of a detection result by a sensor unit;
an image generation unit which generates a simulation image for reproducing an area specified by the positional information on the basis of the positional information acquired by the positional information acquisition unit;
an image recognition unit which recognizes and detects a particular object by the recognition function module in the simulation image generated by the image generation unit;
a positional information calculation unit which generates a control signal for controlling behavior of the vehicle by the use of the recognition result of the image recognition unit, and changes/modifies the positional information of own vehicle on the basis of the generated control signal; and
a synchronization control unit which controls synchronization among the positional information acquisition unit, the image generation unit, the image recognition unit and the positional information calculation unit.
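By way of illustration only, the closed loop formed by these units may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the names `VehicleState`, `generate_image`, `recognize`, `update_position` and `run_simulation`, the one-dimensional road, and all numeric values are hypothetical, and each function is merely a stand-in for the corresponding unit.

```python
from dataclasses import dataclass


@dataclass
class VehicleState:
    x: float      # position along a straight road, in metres (assumed)
    speed: float  # metres per second


def generate_image(state, obstacle_x):
    # Stand-in for the image generation unit: the "image" only carries the
    # distance from the vehicle to a single obstacle ahead.
    return {"frame_distance": obstacle_x - state.x}


def recognize(image, detect_range=30.0):
    # Stand-in for the recognition function module: the obstacle is
    # reported when it is closer than detect_range metres.
    return image["frame_distance"] < detect_range


def update_position(state, obstacle_detected, dt=0.1, decel=5.0):
    # Stand-in for the positional information calculation unit: brake
    # while an obstacle is recognized, then integrate the position.
    speed = state.speed
    if obstacle_detected:
        speed = max(0.0, speed - decel * dt)
    return VehicleState(x=state.x + speed * dt, speed=speed)


def run_simulation(steps, obstacle_x=50.0):
    # The loop plays the role of the synchronization control unit:
    # every unit runs exactly once per step, in a fixed order.
    state = VehicleState(x=0.0, speed=10.0)
    for _ in range(steps):
        image = generate_image(state, obstacle_x)
        detected = recognize(image)
        state = update_position(state, detected)
    return state
```

In this sketch the positional information produced in one step determines the image generated in the next step, which is the dependency that the synchronization control has to preserve.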
In the case of the above invention, it is preferred that the synchronization control unit further comprises:
a unit of packetizing the positional information in a particular format and transmitting the packetized positional information;
a unit of transmitting the packetized data through a network or a transmission bus in a particular device;
a unit of receiving and depacketizing the packetized data; and
a unit of receiving the depacketized data and generating an image.
In the case of the above invention, it is preferred that the synchronization control unit transmits and receives signals among the respective units in accordance with UDP (User Datagram Protocol).
In the case of the above invention, it is preferred that the positional information of the vehicle includes information about any of XYZ coordinates of road surface absolute position coordinates of the vehicle, XYZ coordinates of road surface absolute position coordinates of tires, Euler angles of own vehicle and a wheel rotation angle.
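As a hedged illustration of the packetizing, transmitting and depacketizing described above, the following sketch packs positional information into a fixed binary layout and sends it as a UDP datagram. The layout `PACKET_FORMAT` (seven little-endian doubles: vehicle XYZ, three Euler angles and one wheel rotation angle) and the function names are assumptions made for this example only; the source does not specify the packet format.

```python
import socket
import struct

# Hypothetical layout: X, Y, Z, roll, pitch, yaw, wheel rotation angle,
# each stored as a little-endian 8-byte double (56 bytes in total).
PACKET_FORMAT = "<7d"


def packetize(position, euler, wheel_angle):
    # Pack the positional information into one fixed-size packet.
    return struct.pack(PACKET_FORMAT, *position, *euler, wheel_angle)


def depacketize(packet):
    # Restore the positional information on the receiving side.
    values = struct.unpack(PACKET_FORMAT, packet)
    return values[0:3], values[3:6], values[6]


def send_packet(packet, host, port):
    # UDP is connectionless: the datagram is sent without a handshake
    # and its delivery is not guaranteed.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

A receiving unit would call `depacketize` on each datagram and pass the restored values to image generation. Because UDP does not retransmit lost datagrams, a real system would presumably also carry a sequence number or time stamp in the packet, which this sketch omits.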
In the case of the above invention, it is preferred that the image generation unit is provided with a unit of synthesizing a three-dimensional profile of the vehicle by computer graphics.
In the case of the above invention, it is preferred that, as the above vehicle, a plurality of vehicles are set up for each of which the recognition function operates, that
the positional information calculation unit changes/modifies the positional information of each of the plurality of vehicles by the use of information about the recognition result of the recognition unit, and that
the synchronization control unit controls synchronization among the positional information acquisition unit, the image generation unit, the image recognition unit and the positional information calculation unit for each of the plurality of vehicles.
In the case of the above invention, it is preferred that the image generation unit is provided with a unit of generating a different image for each sensor unit.
Also, in the case of the above invention, it is preferred that any or all of an image sensor, a LiDAR sensor, a millimeter wave sensor and an infrared sensor are provided as the sensor unit.
In the case of the above invention, it is preferred that the simulation system is provided with a unit of generating images corresponding to a plurality of sensors, a recognition unit supporting the generated images, and a unit of performing the synchronization control by the use of the plurality of the recognition results.
In the case of the invention related to the simulation system, the program and the method, it is further preferred that the above invention of the image generation system, the image generation program and the image generation method are provided as the image generation unit as described above, and that
the depth image generated by the depth image generation unit of the image generation system is input to the image recognition unit as the simulation image.
As has been discussed above, in accordance with the above inventions, it is possible for learning of a recognition function module such as deep learning (machine learning) to increase the number of samples by artificially generating images such as CG images which are very similar to actually photographed images and improve the recognition rate by increasing learning efficiency.
Specifically, in accordance with the present invention, it is possible to artificially and infinitely generate images with a light source, an environment and the like which do not actually exist by making use of a means for generating and synthesizing CG images with high reality on the basis of a simulation model. Whether or not target objects can be recognized and extracted can be tested by inputting the generated images to the recognition function module in the same manner as inputting conventional camera images, and performing the same process with the generated images as with the camera images. It is therefore possible to perform learning with such types of images as are conventionally difficult or impossible to acquire or take, and furthermore to effectively improve the recognition rate by increasing learning efficiency.
Furthermore, synergistic effects can be expected by simultaneously using different types of sensors such as a millimeter wave sensor and a LiDAR sensor capable of extracting a three-dimensional profile of an object in addition to an image sensor capable of acquiring a two-dimensional image and generating images of these sensors to make it possible to conduct extensive tests and perform brush-up of a recognition technique at the same time.
Incidentally, the application of the present invention covers a wide field, such as, for automatic vehicle driving, experimental apparatuses, simulators, software modules and hardware devices related thereto (for example, a vehicle-mounted camera, an image sensor, a laser sensor for measuring a three-dimensional profile of the circumference of a vehicle), and machine learning software such as deep learning. Also, since a synchronization control technique is combined with a CG technique capable of realistically reproducing actually photographed images, the present invention can be widely applied to fields other than the automatic driving of a vehicle. For example, potential fields of application include a simulator of surgical operation, a military simulator and a safety running test system for robots, drones or the like.
In what follows, with reference to the accompanying drawings, a near infrared ray virtual image generation system in accordance with the present invention will be explained in detail. In the case of the present embodiment, for the purpose of replacing photographed images taken by various types of sensors which are regarded indispensable for automatic driving, a system is built which generates images, which considerably resemble photographed images, by a CG technique.
Incidentally, the near infrared ray virtual image generation system in accordance with the present embodiment is implemented, for example, by executing software installed in a computer to build virtual various modules on an arithmetic processing unit such as a CPU installed in the computer. Meanwhile, in the context of this document, the term “module” is intended to encompass any function unit capable of performing necessary operation, as implemented with hardware such as a device or an apparatus, software capable of performing the functionality of the hardware, or any combination thereof.
As shown in
The scenario creation unit 10 is a means for creating a scenario data which determines what CG is to be generated. This scenario creation unit 10 is provided with a means for determining three-dimensional profile information of target objects, behavior information of target objects, material information of target objects, parameter information of light sources, positional information of cameras and positional information of sensors. For example, in the case of CG for use in automatic driving, while there are a number of target objects such as a road, a building, a vehicle, a walker, a bicycle, a road side strip and a traffic signal in a virtual space, scenario data defines what target objects exist in what positions (coordinates, altitudes) of the virtual space and what motion is taken in what direction, and also defines the position (view point) of a virtual camera in the virtual space, the number and types of light sources, the positions and direction of each light source, movement and behavior of the target objects in the virtual space and the like.
It is determined first by this scenario creation unit 10 what kinds of CG images are generated. The 3D modeling unit 11 generates 3D images in accordance with the scenario created by the scenario creation unit 10.
The 3D modeling unit 11 is a module for generating the profile of an object in the virtual space by setting the coordinates of each vertex for forming the exterior shape of the object and the profile of the internal structure thereof and setting the parameters of equations representing the boundaries and surfaces of the profile to build the three-dimensional shape of the object. Specifically, this 3D modeling unit 11 performs modeling of information such as the 3D profile of a road, the 3D profile of a vehicle traveling on the road and the 3D profile of a walker.
The 3D shading unit 12 is a module for generating actual 3D CG by the use of each 3D model data D101 generated by the 3D modeling unit 11 to represent shading of an object of 3D CG by a shading process so that a stereoscopic real image is generated in accordance with the position of a light source and the intensity of light.
The R image gray scale conversion unit 13 is a module for functioning as a component extraction unit which extracts predetermined components contained in a shading image transmitted from the 3D shading unit 12, and as a gray scale conversion unit which converts the extracted component image to a gray scale image. Specifically, the R image gray scale conversion unit 13 extracts, as a component image, the R component from the shading image D103 which is an RGB image transmitted from the 3D shading unit 12, converts the R component of the extracted R component image to a gray scale image, and outputs a gray scale image D104 (Img(x, y), x: horizontal coordinate value, and y: vertical coordinate value) as illustrated in
The depth image generation unit 14 is a module for acquiring 3D profile data of each target object in a screen on the basis of modeling information D102 of each individual 3D profile model input from the 3D shading unit 12, and generating a depth image (also called a Depth-map) D105 on the basis of the distance to each target object.
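For illustration, one common way to derive a depth image from 3D profile data is to project each 3D point onto the image plane and to keep the nearest depth for each pixel (a Z-buffer). The sketch below assumes a pinhole camera model, points already expressed in camera coordinates, and the hypothetical function name `render_depth_image`; the source does not state which projection the depth image generation unit 14 actually uses.

```python
import math


def render_depth_image(points, width, height, focal_length):
    # points: (X, Y, Z) tuples in camera coordinates, Z pointing forward.
    # Pixels onto which no point projects keep the value math.inf.
    depth = [[math.inf] * width for _ in range(height)]
    cx, cy = width / 2.0, height / 2.0
    for X, Y, Z in points:
        if Z <= 0.0:
            continue  # the point lies behind the camera
        # Pinhole projection onto pixel coordinates (u, v).
        u = int(math.floor(focal_length * X / Z + cx))
        v = int(math.floor(focal_length * Y / Z + cy))
        if 0 <= u < width and 0 <= v < height:
            # Z-buffer rule: keep the nearest surface for each pixel.
            depth[v][u] = min(depth[v][u], Z)
    return depth
```

When two objects project onto the same pixel, only the nearer one contributes, which corresponds to the distance that a sensor at the camera position would observe.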
(Operation of the Near Infrared Ray Virtual Image Generation System)
The near infrared ray virtual image generation method of the present invention can be implemented by operating the near infrared ray virtual image generation system having the structure as described above.
First, the scenario creation unit 10 creates a scenario which determines what CG is to be generated. For example, in the case of CG for automatic driving, the scenario creation unit 10 creates a scenario which defines in what positions a number of target objects such as a road, a building, a vehicle, a walker, a bicycle, a road side strip and a traffic signal are located and what motion is taken in what direction, and also defines the position of a camera and the number and types of light sources.
This scenario creation unit 10 determines what CG is to be generated. Next, modeling of information such as the 3D profile of a road, the 3D profile of a vehicle traveling on the road and the 3D profile of a walker is performed in accordance with the scenario created by the scenario creation unit 10. Incidentally, modeling means can easily be implemented by, for example with respect to roads, using “high precision map database” which is made by moving a number of vehicles each of which is equipped with a vehicle-mounted device 1b as illustrated in
Next, the 3D modeling unit 11 acquires or generates a 3D profile model of each target object as required on the basis of scenario information D100 created by the scenario creation unit 10. Then, the 3D shading unit 12 generates actual 3D CG by the use of each 3D model data D101 which is generated by the 3D modeling unit 11.
Also, the R component shading image D103 transmitted from the 3D shading unit 12 is converted to a gray scale image of the R image as illustrated in
Then, after gray scale conversion by the process as described above, image recognition is performed by the use of the gray scale image D104 and the depth image D105 which are transmitted as output images of the present embodiment.
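The extraction of the R component and its conversion to a gray scale image, as described for the R image gray scale conversion unit 13, may be sketched as follows. This is a minimal sketch assuming that the shading image is held as rows of (R, G, B) tuples with 0 to 255 values and that the gray level is taken directly from the R value; the function name `extract_r_gray` is hypothetical.

```python
def extract_r_gray(rgb_image):
    # rgb_image[y][x] is an (R, G, B) tuple of 0-255 integers.
    # The result holds one gray level per pixel, equal to the R value,
    # so that result[y][x] corresponds to Img(x, y) in the text.
    return [[pixel[0] for pixel in row] for row in rgb_image]
```

Note that the nested-list layout is indexed as `[y][x]`, whereas the text writes the image as Img(x, y).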
In what follows, with reference to the accompanying drawings, a second embodiment of the system in accordance with the present invention will be explained in detail. Meanwhile, in the description of the present embodiment, like reference numbers indicate functionally similar elements as the above first embodiment unless otherwise specified, and therefore no redundant description is repeated.
(Overall Configuration of a LiDAR Sensor Virtual Image Generation System)
In the case of the present embodiment, a system making use of a LiDAR sensor will be described. The system in accordance with the present embodiment is implemented as illustrated in
The shading unit 15 of the present embodiment is a module for generating actual 3D CG by the use of each 3D model data D101 generated by the 3D modeling unit 11 to represent shading of an object of 3D CG by a shading process so that a stereoscopic real image is generated with the position of a light source and the intensity of light. Particularly, the shading unit 15 of the present embodiment is provided with a laser irradiated portion extraction unit 15a which extracts a 3D profile only from a portion which is irradiated with laser light, performs shading of the extracted 3D profile and outputs a shading image D106. Also, since the reflected light of laser light has no color component such as RGB, the shading image D106 is output from the shading unit 15 directly as a gray scale image.
Also, the depth image generation unit 16 is a module for acquiring the 3D profile data of each target object in a screen on the basis of modeling information D102 of each individual 3D profile model input from the 3D shading unit 12, and generating a depth image (also called a Depth-map) 105 on the basis of the distance to each target object. Particularly, the depth image generation unit 16 of the present embodiment outputs a depth image D108 extracted only from a portion which is irradiated with laser light by the laser irradiated portion extraction unit 16a.
(Operation of the LiDAR Sensor Virtual Image Generation System)
Next, the operation of the LiDAR sensor virtual image generation system having the structure as described above will be explained.
In the case of near infrared light, the image shown in
The LiDAR makes use of near-infrared micropulse light (for example, wavelength of 905 nm) as laser light. The LiDAR includes a scanner and an optical system which are constructed by, for example, a motor, mirrors and lenses. On the other hand, a light receiving unit and a signal processing unit receive reflected light and calculate a distance by signal processing.
In this case, the LiDAR employs a LiDAR scan device 114 which is called TOF system (Time of Flight). This LiDAR scan device 114 outputs laser light as an irradiation pulse Plu1 from a light emitting element 114b through an irradiation lens 114c on the basis of the control by a laser driver 114a as illustrated in
L=(c×t)/2
The basic operation of this LiDAR system is such that, as illustrated in
As described above, since the laser light of the LiDAR sensor has a strong directivity, even when laser light is radiated into the distance, the laser light tends to be radiated only to part of the screen. Accordingly, the shading unit 15 shown in
On the other hand, receiving 3D profile data D107 of the laser irradiated portion, likewise, the depth image generation unit 16 outputs a depth image D108 extracted only from a portion which is irradiated with laser light by the laser irradiated portion extraction unit 16a.
Accordingly, for example, the shading unit 15 has to generate an image corresponding to a 3D profile of the vehicle shown in
By the process as described above, the depth image D108 and the shading image D106 as a gray scale image are transmitted as output images of the present embodiment. These two output images can be used for image recognition and recognition function learning.
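The extraction of only the laser irradiated portion may be illustrated by the following sketch, which keeps the 3D points whose direction from the sensor falls inside an angular scan range. The angular test, the parameters `h_fov_deg` and `v_fov_deg` and the function name are assumptions introduced for this example; the sketch ignores occlusion between objects and is not presented as the method of the laser irradiated portion extraction unit 15a.

```python
import math


def extract_irradiated_points(points, h_fov_deg, v_fov_deg):
    # points: (X, Y, Z) tuples in sensor coordinates, Z along the
    # optical axis. A point is treated as irradiated when its azimuth
    # and elevation both lie inside the assumed scan range.
    half_h = math.radians(h_fov_deg) / 2.0
    half_v = math.radians(v_fov_deg) / 2.0
    irradiated = []
    for X, Y, Z in points:
        if Z <= 0.0:
            continue  # behind the sensor, never irradiated
        azimuth = math.atan2(X, Z)
        elevation = math.atan2(Y, math.hypot(X, Z))
        if abs(azimuth) <= half_h and abs(elevation) <= half_v:
            irradiated.append((X, Y, Z))
    return irradiated
```

Only the points returned by such a filter would then be shaded and converted to depth values, so that the portion of the screen outside the laser beam contributes nothing to the output images.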
Next, a deep learning recognition system of a virtual image in accordance with a third embodiment of the present invention will be explained. The present embodiment makes it possible to supply various sensors with virtual environment images of an environment in which imaging is actually impossible, by applying the virtual image system with a near infrared sensor as described in the first embodiment and the virtual image system with a LiDAR sensor as described in the second embodiment to an AI recognition technique such as a deep learning recognition system commonly used for automatic driving or the like.
(Configuration of a Deep Learning Recognition System of Virtual Images)
The neural network calculation unit 17 is provided with a neural network consisting of a number of layers, as illustrated in
On the other hand, the back propagation unit 18 receives calculation data D110 which is a calculation result from the neural network calculation unit 17, and calculates error from teacher data which is the comparison target (for example, an irradiation image, a depth image or the like data on the basis of actual photography can be used). The system as illustratively shown in
In this case, arithmetic operations are performed in accordance with the back propagation method in the back propagation unit 18. This back propagation method calculates how much error there is between teacher data and output data of the neural network, and propagates the result backward from the output side toward the input side. In the case of the present embodiment, receiving the error data D109 which is fed back, the neural network calculation unit 17 performs predetermined calculation again, and inputs the result thereof to the back propagation unit 18. This loop is repeated until the error data becomes smaller than a predetermined threshold, and the neural network calculation is finished when the error data has fully converged.
When the above-mentioned process is completed, the coefficient values (608, 610) in the neural network in the neural network calculation unit 17 are determined, and it is possible to perform deep learning recognition for an actual image with this neural network.
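The loop between forward calculation and error feedback may be illustrated with the simplest possible case, a single linear neuron trained by gradient descent until the error falls below a threshold. This is only a sketch: the function name `train`, the learning rate, the threshold and the iteration limit are assumptions, and an actual network of many layers would propagate the error through every layer rather than through one neuron.

```python
def train(samples, threshold=1e-6, learning_rate=0.1, max_iterations=10000):
    # samples: list of (input, teacher) pairs for a single linear neuron
    # whose output is weight * input + bias.
    weight, bias = 0.0, 0.0
    n = len(samples)
    error = float("inf")
    for _ in range(max_iterations):
        # Forward calculation, then mean square error against teacher data.
        outputs = [weight * x + bias for x, _ in samples]
        error = sum((o - t) ** 2 for o, (_, t) in zip(outputs, samples)) / n
        if error < threshold:
            break  # converged: the coefficient values are now determined
        # Backward step: gradient of the error for each coefficient.
        grad_w = sum(2 * (o - t) * x for o, (x, t) in zip(outputs, samples)) / n
        grad_b = sum(2 * (o - t) for o, (_, t) in zip(outputs, samples)) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias, error
```

For example, with teacher data generated from output = 2 × input + 1, the returned coefficients approach 2 and 1 once the error has dropped below the threshold.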
Incidentally, while deep learning recognition in the case of the present embodiment is illustratively described for the output image of the near infrared light image as described in the first embodiment, it is possible to perform, completely in the same way, deep learning recognition for the output image of a LiDAR sensor as described in the second embodiment by the similar technique. In such a case, the input images in the left side of
Next, a fourth embodiment of the present invention will be explained. In the case of the second embodiment as described above, of the output images of the virtual image system utilizing a LiDAR sensor, the depth image D108 is output from the depth image generation unit 16. As an evaluation point of this simulation system, it is very important how much accuracy this depth image has as a distance image actually obtained with assumed laser light. In the present embodiment, an example in which the present invention is applied to an evaluation system for evaluating this depth image will be explained.
(Configuration of a Depth Image Evaluation System)
As shown in
The TOF calculation unit 19 is a module for calculating TOF information which includes TOF values and the like with respect to the depth image D108 generated by the depth image generation unit 16. The TOF value corresponds to a delay time which is a time difference between emission of a projection pulse from a light source and reception of the projection pulse by a sensor as a light reception pulse after reflection on the subject. This delay time is output from the TOF calculation unit 19 as a TOF value D113.
The distance image generation unit 20 is a module for acquiring the TOF of each point of a laser irradiated portion on the basis of the TOF value calculated by the TOF calculation unit 19, calculating the distance L to each point on the basis of the delay time of that point, and generating a distance image D114 which represents the distance L to each point as an image.
The comparison evaluation unit 21 is a module for performing comparison calculation between the distance image D114 generated by the distance image generation unit 20 and the depth image D108 as input from the depth image generation unit 16, and performing evaluation on the basis of the result of comparison including the matching degree therebetween. The method of comparison can be performed by the use of absolute value mean square error or the like which is generally used. The greater the value of the comparison result, the greater the difference therebetween, so that it is possible to evaluate how much the depth image based on 3D CG is close to the distance image generated by actually assuming TOF of laser light.
(Operation of the Depth Image Evaluation System)
Next, the operation of the depth image evaluation system having the structure as described above will be explained.
After receiving the depth image D108 generated by the depth image generation unit 16, the TOF calculation unit 19 calculates the TOF. This TOF is “t” described with respect to
As has been discussed above, the TOF value D113 calculated by the TOF calculation unit 19 shown in
L=(½)×c×t
(c: the velocity of light, t: TOF)
In accordance with the above equation, the distance image D114 of each point of the irradiated image portion is generated by the distance image generation unit 20. Thereafter, comparison calculation is performed between the depth image D108 and the distance image D114. The comparison means can be implemented with absolute value mean square error or the like which is generally used. The greater the value of the comparison result, the greater the difference therebetween, so that it is possible to evaluate how much the depth image based on 3D CG is close to the distance image generated by actually assuming TOF of laser light.
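The calculation above can be sketched as follows. This is an illustration only: images are represented as nested lists of per-point values, the comparison uses a plain mean square error as one possible choice, and the function names are assumptions not taken from the embodiment.

```python
C = 299_792_458.0  # velocity of light in m/s


def tof_to_distance(tof_s):
    """L = (1/2) * c * t: the pulse travels to the subject and back."""
    return 0.5 * C * tof_s


def distance_image(tof_image):
    """Convert a 2D grid of TOF values (seconds) to distances (meters)."""
    return [[tof_to_distance(t) for t in row] for row in tof_image]


def mean_square_error(image_a, image_b):
    """Mean of squared per-point differences between two equally sized images."""
    diffs = [
        (a - b) ** 2
        for row_a, row_b in zip(image_a, image_b)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


# A point 10 m away returns the pulse after 2 * 10 / c seconds.
tof_grid = [[2 * 10.0 / C, 2 * 20.0 / C]]
depth_grid = [[10.0, 20.0]]  # hypothetical depth image values in meters
score = mean_square_error(distance_image(tof_grid), depth_grid)
```

A larger `score` indicates a larger difference between the two images, in line with the evaluation described above.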
A comparison result D115 may be output as a numeric value such as an absolute value mean square error as described above or a signal indicative that both are not matched after the threshold process. In the latter case, for example, the result may be fed back to the 3D modeling unit 11 shown in
Next, a fifth embodiment of the present invention will be explained. While each of the first to the fourth embodiment is related to the means for generating a near infrared ray or LiDAR sensor virtual image, the present embodiment is related to the explanation of control to actually perform automatic driving on a real time base by the use of these virtual images. In the case of the present embodiment, an example is described in the case where the simulator system of the present invention is applied to the machine learning and test of an image recognition function module of an automated vehicle driving system.
In this description, the automated driving system is a system such as ADAS (advanced driver assistance system) or the like to detect and avoid the possibility of an accident in advance, and performs control to decrease the speed of the vehicle and avoid the objects and the like by recognizing a camera image (real image) acquired with a camera actually mounted on a vehicle to detect objects such as other vehicles, walkers and a traffic signal in accordance with an image recognition technique for the purpose of realizing automatic traveling of vehicles.
(Overall Configuration of Vehicle Synchronization Simulator System)
The communication network 3 is an IP network using the communication protocol TCP/IP, and a distributed communication network which is constructed by connecting a variety of communication lines (a public network such as a telephone line, an ISDN line, an ADSL line or an optical line, a dedicated communication line, the third generation (3G) communication system such as WCDMA (registered trademark) and CDMA2000, the fourth generation (4G) communication system such as LTE, the fifth generation (5G) or later communication system, and a wireless communication network such as wifi (registered trademark) or Bluetooth (registered trademark)). This IP network includes a LAN such as a home network, an intranet (a network within a company) based on 10BASE-T, 100BASE-TX or the like. Alternatively, in many cases, simulator software is installed in the PC 1a. In this case, simulation can be performed by such a PC alone.
The simulator server 2 is implemented with a single server device or a group of server devices each of which has functions implemented by a server computer or software capable of performing a variety of information processes. This simulator server 2 includes a server computer which executes server application software, or an application server in which is installed middleware for managing and assisting execution of an application on such a computer.
Furthermore, the simulator server 2 includes a Web server which processes an HTTP request from a client device and returns a response. The Web server performs data processing and the like, and acts as an intermediary to a database core layer in which a relational database management system (RDBMS) is executed as a backend. The relational database server is a server in which a database management system (DBMS) operates, and has functions to transmit requested data to a client device and an application server (AP server) and rewrite or delete data in response to an operation request.
The information processing terminal 1a and the vehicle-mounted device 1b are client devices connected to the communication network 3, and provided with arithmetic processing units such as CPUs to provide a variety of functions by running a dedicated client program 5. This information processing terminal may be implemented with a general purpose computer such as a personal computer or a dedicated device having necessary functions, and includes a smartphone, a mobile computer, a PDA (Personal Digital Assistant), a cellular telephone, a wearable terminal device, or the like.
This information processing terminal 1a or the vehicle-mounted device 1b can access the simulator server 2 through the dedicated client program 5 to transmit and receive data. Part or entirety of this client program 5 is involved in a driving simulation system and a vehicle-mounted automated driving system, and is executed to recognize images captured by a vehicle-mounted camera, or captured scenery images (including CG motion pictures in the case of the present embodiment) and the like by the use of an image recognition technique to detect objects such as other vehicles, walkers and a traffic signal in the images, calculate the positional relationship between own vehicle and the object on the basis of the recognition result, and perform control to decrease the speed of the vehicle and avoid the objects and the like in accordance with the calculation result. Incidentally, the client program 5 of the present embodiment has the simulator server 2 perform an image recognition function, and calculates or acquires the positional information of own vehicle by having the own vehicle virtually travel on a map in accordance with the recognition result of the simulator server 2 or having the own vehicle actually travel on the basis of the automatic driving mechanism of a vehicle positional information calculation unit 51 shown in
(Configuration of Each Device)
Next, the configuration of each device will specifically be explained.
(1) Configuration of the Client Device
The information processing terminal 1a can be implemented with a general purpose computer such as a personal computer or a dedicated device. On the other hand, the vehicle-mounted device 1b may be a general purpose computer such as a personal computer, or a dedicated device (which can be regarded as a car navigation system) such as an automated driving system. As illustrated in
The memory 103 and the storage device 101 accumulate data on a recording medium, and read out accumulated data from the recording medium in response to a request from each device. The memory 103 and the storage device 101 may be implemented, for example, by a hard disk drive (HDD), a solid state drive (SSD), a memory card, and the like. The input interface 104 is a module for receiving operation signals from an operation device such as a keyboard, a pointing device, a touch panel or buttons. The received operation signals are transmitted to the CPU 102 so that it is possible to perform operations of an OS or each application. The output interface 105 is a module for transmitting image signals and sound signals to output an image and sound from an output device such as a display or a speaker.
Particularly, in the case where the client device is a vehicle-mounted device 1b, this input interface 104 is connected to a system for automatic driving such as the above ADAS, and also connected to an image sensor such as a camera 104a or the like mounted on a vehicle, or to various sensor means such as a LiDAR sensor, a millimeter wave sensor, an infrared sensor or the like, for the purpose of realizing the automated driving traveling of a vehicle.
The communication interface 106 is a module for transmitting and receiving data to/from other communication devices on the basis of a communication system including a public network such as a telephone line, an ISDN line, an ADSL line or an optical line, a dedicated communication line, the third generation (3G) communication system such as WCDMA (registered trademark) and CDMA2000, the fourth generation (4G) communication system such as LTE, the fifth generation (5G) or later communication system, and a wireless communication network such as wifi (registered trademark) or Bluetooth (registered trademark).
The CPU 102 is a device which performs a variety of arithmetic operations required for controlling each element to virtually build a variety of modules on the CPU 102 by running a variety of programs. An OS (Operating System) is executed and run on the CPU 102 to perform management and control of the basic functions of the information processing terminals 1a to 1c, 4 and 5. Also, while a variety of applications can be executed on this OS, the basic functions of the information processing terminal are managed and controlled by running the OS program on the CPU 102, and a variety of function modules are virtually built on the CPU 102 by running applications on the CPU 102.
In the case of the present embodiment, a client side execution unit 102a is formed by executing the client program 5 on the CPU 102 to generate or acquire the positional information of own vehicle on a virtual map or a real map, and transmit the positional information to the simulator server 2. The client side execution unit 102a receives the recognition result of scenery images (including CG motion pictures in the case of the present embodiment) obtained by the simulator server 2, calculates the positional relationship between own vehicle and the object on the basis of the received recognition result, and performs control to decrease the speed of the vehicle and avoid the objects and the like on the basis of the calculation result.
(2) Configuration of the Simulator Server
The simulator server 2 in accordance with the present embodiment is a group of server devices which provide a vehicle synchronization simulator service through the communication network 3. The functions of each server device can be implemented by a server computer capable of performing a variety of information processes or software capable of performing the functions. Specifically, as illustrated in
The communication interface 201 is a module for transmitting and receiving data to/from other devices through the communication network 3 on the basis of a communication system including a public network such as a telephone line, an ISDN line, an ADSL line or an optical line, a dedicated communication line, the third generation (3G) communication system such as WCDMA (registered trademark) and CDMA2000, the fourth generation (4G) communication system such as LTE, the fifth generation (5G) or later communication system, and a wireless communication network such as wifi (registered trademark) or Bluetooth (registered trademark).
As shown in
The UDP information transmitter receiver unit 206 is a module for transmitting and receiving data to/from the client side execution unit 102a of the client device 1 in cooperation. In the case of the present embodiment, the positional information is calculated or acquired in the client device 1 side, and packetized in a particular format. While the packetized data is transmitted to the simulator server 2 through a network or a transmission bus in a particular device, the packet data is received and depacketized by the simulator server 2, and the depacketized data is input to an image generation unit 203 to generate images. Meanwhile, in the case of the present embodiment, the UDP information transmitter receiver unit 206 transmits and receives, by the use of UDP (User Datagram Protocol), signals which are transmitted and received among the respective devices with the UDP synchronization control unit 202.
The above various databases include a map database 210, a vehicle database 211 and a drawing database 212. Incidentally, these databases can refer to one another through a relational database management system (RDBMS).
The simulation execution unit 205 is a module for generating a simulation image reproducing an area specified on the basis of positional information generated or acquired by the positional information acquisition means of the client device 1 and transmitted to the simulator server 2, and recognizing and detecting particular objects in the generated simulation image by the use of the recognition function module. Specifically, the simulation execution unit 205 is provided with the image generation unit 203 and the image recognition unit 204.
The image generation unit 203 is a module for acquiring the positional information acquired or calculated by the positional information acquisition means of client device 1 and generating a simulation image for reproducing, by a computer graphics technique, an area (scenery based on latitude and longitude coordinates of a map, and direction and a view angle) specified on the basis of the positional information. The simulation image generated by this image generation unit 203 is transmitted to the image recognition unit 204. Incidentally, this image generation unit 203 can be implemented as the near infrared ray virtual image generation system as explained in the first embodiment or the LiDAR virtual image generation system as explained in the second embodiment, and the image recognition unit 204 may receive various virtual images generated by these systems in accordance with a computer graphics technique.
The image recognition unit 204 is a module for recognizing and detecting particular objects in the simulation image generated by the image generation unit 203 with the recognition function module 204a which is under test or machine learning. The recognition result information D06 of this image recognition unit 204 is transmitted to the vehicle positional information calculation unit 51 of the client device 1. The image recognition unit 204 is provided with a learning unit 204b to perform machine learning of the recognition function module 204a.
This recognition function module 204a is a module for acquiring an image acquired with a camera device or CG generated by the image generation unit 203, hierarchically extracting a plurality of feature points in the acquired image, and recognizing objects from the hierarchical combination patterns of the extracted feature points. The learning unit 204b promotes diversification of extracted patterns and improves learning efficiency by inputting images captured by the above camera device or virtual CG images to extract feature points of images which are difficult to image and reproduce in practice.
This recognition function module 204a of the image recognition unit may be implemented by applying the neural network calculation unit 17 of the virtual image deep learning recognition system as explained in the third embodiment, and the learning unit 204b may be implemented by applying the back propagation unit 18 as described above.
(Method of the Vehicle Synchronization Simulator System)
The vehicle synchronization simulation method can be implemented by operating the vehicle synchronization simulator system having the structure as described above.
First, the vehicle positional information calculation unit 51 acquires vehicle positional information D02 of own vehicle (S101). Specifically, the client program 5 is executed in the client device 1 side to input a data group D01 of various data such as map information and vehicle initial data to the vehicle positional information calculation unit 51. Next, the positional information of own vehicle on a virtual map or an actual map is calculated (generated) or acquired by the use of the data group D01. The result is transmitted to the simulation execution unit 205 of the simulator server 2 (S102) as vehicle positional information D02 through the UDP synchronization control unit 202 or the UDP information transmitter receiver unit 206.
Specifically speaking, the vehicle positional information calculation unit 51 transmits the vehicle positional information D02 of own vehicle to the UDP synchronization control unit 202 in accordance with the timing of a control signal D03 from the UDP synchronization control unit 202. Of the initial data of the vehicle positional information calculation unit 51, the map data, the positional information of own vehicle in the map, the rotation angle and diameter of a wheel of the vehicle body frame and similar information can be loaded from the predetermined storage device 101. The UDP synchronization control unit 202 and the UDP information transmitter receiver unit 206 transmit and receive data to/from the client side execution unit 102a of the client device 1 in cooperation. Specifically, the UDP synchronization control unit 202 and the UDP information transmitter receiver unit 206 transmit the vehicle positional information D02 calculated or acquired in the client device 1 side to the simulator server 2 as packet information D04 packetized in a particular format together with a group of various data including vehicle information.
While this packetized data is transmitted through a network or a transmission bus in a particular device, the packet data is received and depacketized by the simulator server 2 (S103), and the depacketized data D05 is input to the image generation unit 203 of the simulation execution unit 205 to generate CG images. In this case, the UDP information transmitter receiver unit 206 transmits and receives the packetized packet information D04 of the various data group including vehicle information among the respective devices by the UDP synchronization control unit 202 according to UDP (User Datagram Protocol).
Specifically, the UDP synchronization control unit 202 converts the various data group into the packetized packet information D04 by UDP packetizing the vehicle positional information D02 of own vehicle. Thereby, data transmission and reception by the use of the UDP protocol becomes easy. UDP (User Datagram Protocol) is briefly described here. Generally speaking, while TCP is highly reliable and connection oriented and performs windowing control, retransmission control and congestion control, UDP is a connectionless protocol which has no mechanism to secure reliability but has a substantial advantage of low delay because the process is simple. In the present embodiment, since low delay is required when transmitting data among the constituent elements, UDP is employed instead of TCP. Alternatively, RTP (Real-time Transport Protocol), which is the most common protocol for voice communication and video communication at the present time, may be used.
Next, the vehicle positional information D02 of own vehicle specifically contains, for example, the following information.
Receiving the vehicle positional information D02, the UDP information transmitter receiver unit 206 transmits data D05 necessary mainly for generating a vehicle CG image, from among information about the vehicle, e.g., XYZ coordinates as the positional information of the vehicle, XYZ coordinates as the positional information of tires, Euler angles and other various information.
Then, the packet information D04 as UDP packets of the various data group is divided into a packet header and a payload of a data body by a depacketizing process in the UDP information transmitter receiver unit 206. In this case, the UDP packet data can be exchanged by transmission between places remote from each other through a network or transmission inside a single apparatus such as a simulator through a transmission bus. The data D05 corresponding to a payload is input to the image generation unit 203 of the simulation execution unit 205 (S104).
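The packetizing and depacketizing steps can be illustrated with a small sketch. The byte layout below is purely an assumption for illustration (the particular format of the packet information D04 is not specified here): a header carrying a sequence number, a vehicle identifier and the payload length, followed by a payload holding XYZ coordinates and Euler angles. Only the Python standard `struct` module is used, and no actual network transmission is performed.

```python
import struct

# Hypothetical layout, network byte order:
#   header : sequence number (uint32), vehicle id (uint16), payload length (uint16)
#   payload: x, y, z, roll, pitch, yaw as six float64 values
HEADER = struct.Struct("!IHH")
PAYLOAD = struct.Struct("!6d")


def packetize(seq, vehicle_id, position, euler):
    """Build one datagram body from positional information."""
    payload = PAYLOAD.pack(*position, *euler)
    return HEADER.pack(seq, vehicle_id, len(payload)) + payload


def depacketize(packet):
    """Split a datagram body into its header fields and its payload data."""
    seq, vehicle_id, length = HEADER.unpack_from(packet)
    body = packet[HEADER.size:HEADER.size + length]
    values = PAYLOAD.unpack(body)
    header = {"seq": seq, "vehicle_id": vehicle_id, "length": length}
    data = {"position": values[:3], "euler": values[3:]}
    return header, data


packet = packetize(7, 1, (1.5, -2.0, 0.25), (0.0, 0.5, -0.5))
header, data = depacketize(packet)
```

In this sketch `data` plays the role of the data D05 handed to the image generation unit 203; the resulting bytes could equally be sent over a UDP socket or passed over a bus inside a single apparatus.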
In the simulation execution unit 205, the image generation unit 203 acquires positional information acquired or calculated by the positional information acquisition means of the client device 1 as the data D05, and generates a simulation image for reproducing, by a computer graphics technique, an area (scenery based on latitude and longitude coordinates of a map, a direction and a view angle) specified on the basis of the positional information (S105). The image D13 for simulation generated by this image generation unit 203 is transmitted to the image recognition unit 204.
The image generation unit 203 generates a realistic image by a predetermined image generation method, for example, a CG image generation technique which makes use of the latest physically based rendering (PBR) technique. The recognition result information D06 is input to the vehicle positional information calculation unit 51 again and used, e.g., for calculating the positional information of own vehicle for determining the next behavior of the own vehicle.
The image generation unit 203 generates objects such as a road surface, buildings, a traffic signal, other vehicles and walkers by, for example, a CG technique making use of the PBR technique. This can be understood as feasible with the latest CG technique from the fact that objects such as described above are reproduced in a highly realistic manner in a title of a game machine such as PlayStation. In many cases, object images other than own vehicle are stored already as initial data. Particularly, in an automatic driving simulator, a large amount of sample data such as a number of highways and general roads is stored in a database which can readily be used.
Next, the image recognition unit 204 recognizes and extracts particular targets, as objects, by the use of the recognition function module 204a which is under test or machine learning from simulation images generated by the image generation unit 203 (S106). In this case, if there is no object which is recognized (“N” in step S107), the process proceeds to the next time frame (S109), and the above processes S101 to S107 are repeated (“Y” in step S109) until all the time frames are processed (“N” in step S109).
On the other hand, if there is an object which is recognized (“Y” in step S107), the recognition result of this image recognition unit 204 is transmitted to the vehicle positional information calculation unit 51 of the client device 1 as the recognition result information D06. The vehicle positional information calculation unit 51 of the client device 1 acquires the recognition result information D06 of the image recognition unit 204 through the UDP information transmitter receiver unit 206, generates control signals for controlling vehicle behavior by the use of the acquired recognition result, and changes or modifies the positional information of own vehicle on the basis of the generated control signals (S108).
Specifically describing, the simulation image D13 which is generated here is input to the image recognition unit 204 and, as already described above, objects are recognized and detected by, for example, a recognition technique such as deep learning. The recognition results as obtained are given as area information in a screen (for example, two-dimensional XY coordinates of an extracted rectangular area) such as other vehicles, walkers, road markings and a traffic signal.
When running a simulator for automatic driving, there are a number of objects such as other vehicles, walkers, buildings and a road surface in a screen in which an actual vehicle is moving. Automatic driving is realized, for example, by automatically turning the steering wheel, stepping on the accelerator, applying the brake and so on while obtaining realtime information obtained from a camera mounted on the vehicle, a millimeter wave sensor, a radar and other sensors.
Accordingly, in the case of the near infrared light image described in the first embodiment, a recognition technique such as deep learning as described in the third embodiment is used to recognize and discriminate objects necessary for automatic driving such as other vehicles, walkers, road markings and a traffic signal from among objects displayed on a screen.
For example, when another vehicle cuts in front of own vehicle, the image recognition unit 204 detects the approach by an image recognition technique, and outputs the recognition result information D06 to the vehicle positional information calculation unit 51. The vehicle positional information calculation unit 51 changes the positional information of own vehicle by turning the steering wheel to avoid the other vehicle, applying the brake to decelerate own vehicle or performing a similar operation. In another case where a walker suddenly runs out in front of own vehicle, likewise, the vehicle positional information calculation unit 51 changes the positional information of own vehicle by turning the steering wheel to avoid this walker, applying the brake to decelerate own vehicle or performing a similar operation.
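The feedback from recognition to vehicle position (steps S105 to S108) can be sketched in one dimension as follows. Everything in this sketch is a simplifying assumption: the function `recognize` merely stands in for image generation and recognition, the 50 m detection range and the 8 m/s² braking rate are arbitrary illustrative values, and the 25 msec step is likewise only an example.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    distance_m: float  # hypothetical: gap to the object derived from the recognition result


def recognize(own_x, obstacle_x, detection_range_m=50.0):
    """Stand-in for image generation plus recognition (S105 to S107)."""
    gap = obstacle_x - own_x
    if 0.0 < gap <= detection_range_m:
        return [Detection("vehicle", gap)]
    return []


def update_position(own_x, speed, detections, dt=0.025, brake=8.0):
    """Change the positional information of own vehicle (S108).

    When an object was recognized ahead, the brake is applied.
    """
    if detections:
        speed = max(0.0, speed - brake * dt)
    return own_x + speed * dt, speed


def simulate(frames, own_x=0.0, speed=20.0, obstacle_x=60.0):
    """Run the loop for a number of time frames and return the final state."""
    for _ in range(frames):
        detections = recognize(own_x, obstacle_x)
        own_x, speed = update_position(own_x, speed, detections)
    return own_x, speed


final_x, final_speed = simulate(400)  # 400 frames of 25 msec = 10 s
```

Because the position used for each frame depends on the recognition result of the preceding frame, the frames cannot be computed independently of one another.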
Meanwhile, in the above described configuration, it is assumed that data is transmitted in a cycle of 25 msec (25 msec is only one example) according to the UDP protocol from the vehicle positional information calculation unit 51 to the simulation execution unit 205 through the UDP synchronization control unit 202 and the UDP information transmitter receiver unit 206.
Incidentally, the need of “synchronizing model” which is a characteristic feature of the present invention exists because the vehicle positional information of the next time frame is determined on the basis of the output result from the simulation execution unit 205 so that the behavior of a real vehicle cannot be simulated unless the entirety can be synchronously controlled. In the above example, transmission is performed in a cycle of 25 msec. However, ideal delay is zero which is practically impossible, so that UDP is employed to reduce the delay time associated with transmission and reception.
Generally speaking, in the case of an automatic driving simulator, tests have to be conducted with a very large number of motion image frames. It is an object of the present embodiment to substitute CG images close to actual photographs for the enormous amount of footage which cannot be covered by real driving. Accordingly, it is necessary to guarantee operations in response to a long sequence of video sample data.
In the case of the present embodiment, the learning unit 204b diversifies the extracted patterns to improve learning efficiency by inputting, in addition to images taken by a vehicle mounted camera during real driving, virtual CG images generated by the image generation unit 203 to the recognition function module 204a to extract the feature points of images which are difficult to take and reproduce. The recognition function module 204a acquires images taken by the camera device and CG images, hierarchically extracts a plurality of feature points in the acquired images, and recognizes objects by the deep learning recognition technique already described in the third embodiment on the basis of the hierarchical combination patterns of the extracted feature points.
In what follows, with reference to the accompanying drawings, a sixth embodiment of the system in accordance with the present invention will be explained in detail.
As shown in
The vehicle positional information calculation units 51c to 51f transmit vehicle positional information D02c to D02f to the UDP synchronization control unit 202 with the timing of control signals D03c to D03f. Next, the UDP synchronization control unit 202 converts the vehicle positional information D02c to D02f to packet information D04 by UDP packetization. Thereby, data transmission and reception by the use of the UDP protocol becomes easy. The packet information D04 is divided into a packet header and a payload of a data body by a depacketizing process in the UDP information transmitter receiver unit 206. In this case, the UDP packet data can be exchanged by transmission between places remote from each other through a network or transmission inside a single apparatus such as a simulator through a transmission bus. The data D05c to D05f corresponding to a payload is input to the simulation execution units 205c to 205f.
As has already been discussed above in the first embodiment, the simulation execution units 205c to 205f generate a realistic image by a predetermined image generation method, for example, a CG image generation technique which makes use of the latest physically based rendering (PBR) technique. The recognition result information D06c to D06f is fed back to the vehicle positional information calculation units 51c to 51f to change the position of each vehicle.
Incidentally, while there are four vehicle positional information calculation units 51c to 51f in the above example, this number is not limited to four. However, if the number of vehicles to be supported increases, synchronization control as a result becomes complicated, and there is a problem that when there occurs a substantial delay in a certain vehicle, the total delay time increases since the delay times of the vehicles are summed up. Accordingly, the configuration can be designed in accordance with the hardware scale, processing amount and other conditions of the simulator server.
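The sizing concern above can be illustrated with simple arithmetic. The sketch assumes, as stated, that the per-vehicle delays add up within one synchronized frame; the function names and the 25 msec cycle are illustrative assumptions only.

```python
def frame_latency_ms(per_vehicle_delays_ms):
    """Total delay of one synchronized frame when per-vehicle delays add up."""
    return sum(per_vehicle_delays_ms)


def fits_cycle(per_vehicle_delays_ms, cycle_ms=25.0):
    """True when all vehicles can be served within one transmission cycle."""
    return frame_latency_ms(per_vehicle_delays_ms) <= cycle_ms


balanced = [5.0, 5.0, 5.0, 5.0]      # four vehicles, 5 msec each
one_slow = [5.0, 5.0, 5.0, 15.0]     # one vehicle with a substantial delay
ok_balanced = fits_cycle(balanced)
ok_one_slow = fits_cycle(one_slow)
```

With these example figures, a single slow vehicle is enough to push the whole frame past the cycle, which is why the number of supported vehicles is chosen in accordance with the hardware scale and processing amount.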
Incidentally, while PC terminals 1c to 1f are remotely connected to a vehicle synchronization simulator program 4 through the communication network 3 in
Furthermore, 1c to 1f are not limited to PC terminals. For example, when a test is conducted with actually moving vehicles, 1c to 1f can be considered to refer to car navigation systems mounted on the test vehicles. In this case, rather than recognizing the simulation image D13 which is a CG image generated by the image generation unit 203 of
Furthermore, a seventh embodiment of the system in accordance with the present invention will be explained. In the case of the present embodiment, another embodiment implemented with a plurality of sensors will be explained with reference to
As illustrated in
The 3D point group data converted to the 3D graphic image as described above is point group data which is obtained by emitting laser light in all directions through 360 degrees from a LiDAR installed on the running center vehicle shown in
Target objects such as an oncoming vehicle, a walker and a bicycle can be acquired from actual point group data as three-dimensional coordinate data, and therefore 3D graphic images of these target objects can easily be generated. Specifically, a plurality of polygon data items are generated by consistently processing the point group data, and 3D graphics can be drawn by rendering these polygon data items.
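One possible way of generating polygon data items from point group data is sketched below. The present description does not specify the polygon generation method, so the approach shown here is an assumption: the points are taken to be organized as a scan grid (rows by columns), neighbouring points are joined into triangles, and triangles with an overly long edge are discarded so that separate objects are not bridged. The function name `triangulate_grid` and the parameter `max_edge` are hypothetical.

```python
import math


def triangulate_grid(points, rows, cols, max_edge=1.0):
    """Join neighbouring points of an organized rows x cols scan into triangles.

    points is a flat list of (x, y, z) tuples in row-major order.
    Triangles having any edge longer than max_edge are skipped.
    """
    def short(a, b):
        return math.dist(points[a], points[b]) <= max_edge

    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            i = r * cols + c
            # Each grid cell is split into two triangles.
            for tri in ((i, i + 1, i + cols), (i + 1, i + cols + 1, i + cols)):
                a, b, d = tri
                if short(a, b) and short(b, d) and short(d, a):
                    triangles.append(tri)
    return triangles


# A flat 2 x 3 patch with 0.5 spacing (illustrative data only).
patch = [(c * 0.5, r * 0.5, 0.0) for r in range(2) for c in range(3)]
mesh = triangulate_grid(patch, rows=2, cols=3)
```

The resulting index triples can be passed, together with the point coordinates, to any renderer that accepts a triangle mesh.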
Then, the 3D point group data graphic image generated as described above is input to the deep learning recognition unit 62, and recognized by recognition means which has been trained on 3D point group data in the deep learning recognition unit 62. Accordingly, means different from the deep learning recognition means which has been trained with images for image sensors as described above is used, and this is substantially effective. This is because, while it is likely that a vehicle which is very far away cannot be captured by the image sensor, the LiDAR can acquire the size and profile of an oncoming vehicle even several hundred meters ahead. Conversely, the LiDAR makes use of reflected light and is therefore not effective for detecting a target object which is not reflective, whereas the image sensor has no such problem.
As has been discussed above, a plurality of sensors having different characteristics or different device properties are provided, and the learning result synchronization unit 84 analyzes the recognition results thereof and outputs the final recognition result D62. Incidentally, this synchronization unit may be arranged externally, for example, in a network cloud. This is because the number of sensors per vehicle is expected to increase dramatically in the future, increasing the computational load of the deep learning recognition process, so that it is effective to have the processes which can be handled externally through a network performed by a cloud having large-scale computing power, and to feed back the results.
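How recognition results from a plurality of sensors might be combined into a single final result can be sketched as follows. The present description does not specify the analysis performed by the learning result synchronization unit 84, so the confidence-weighted voting shown here is only an assumed example; the function name `fuse_recognitions`, the sensor names, the labels and the confidence values are all hypothetical.

```python
from collections import defaultdict


def fuse_recognitions(results, weights=None):
    """Combine per-sensor {label: confidence} results by weighted voting.

    Returns (best_label, normalized_score), or (None, 0.0) if nothing was recognized.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for sensor, labels in results.items():
        w = weights.get(sensor, 1.0)  # sensors default to equal weight
        for label, confidence in labels.items():
            scores[label] += w * confidence
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total


# Illustrative values only: the image sensor is unsure, the LiDAR is confident.
final_label, final_score = fuse_recognitions({
    "image": {"vehicle": 0.5, "walker": 0.25},
    "lidar": {"vehicle": 0.75},
})
```

The optional per-sensor weights allow, for example, the LiDAR result to be trusted more for distant objects and the image sensor result to be trusted more for poorly reflective objects.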
Incidentally, while virtual CG images are generated in the case of the embodiment shown in
It is assumed that, besides the image sensor installed in a vehicle-mounted camera, the object imaging devices are a LiDAR sensor and a millimeter wave sensor as described above. In the case of the image sensor, a high quality CG image is generated by a PBR technique as described in the first embodiment with reference to parameters such as light information extracted from an acquired photographed image, and the CG image is output from the image generation unit 203. On the other hand, in the case of the LiDAR sensor, three-dimensional point group data is generated from the reflected light of the laser beam emitted from the LiDAR sensor actually mounted on a vehicle. Then, a 3D CG image converted from this three-dimensional point group data is output from the above image generation unit 203.
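The generation of three-dimensional point group data from reflected laser light can be illustrated by the following minimal sketch, which converts each return (measured range, azimuth angle, elevation angle) into Cartesian coordinates. The axis convention (x forward, y to the left, z upward), the use of `None` for a beam with no reflection, and the function names are assumptions introduced only for illustration.

```python
import math


def lidar_return_to_point(range_m, azimuth_deg, elevation_deg):
    """Convert one LiDAR return to (x, y, z) in the sensor frame (assumed axes)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    horizontal = range_m * math.cos(el)  # projection onto the ground plane
    return (horizontal * math.cos(az),
            horizontal * math.sin(az),
            range_m * math.sin(el))


def scan_to_point_cloud(returns):
    """Convert a scan to points, skipping beams for which no reflection was received."""
    return [lidar_return_to_point(r, az, el)
            for r, az, el in returns
            if r is not None]


# Illustrative scan: one return straight ahead, one to the left, one beam lost.
cloud = scan_to_point_cloud([
    (10.0, 0.0, 0.0),
    (20.0, 90.0, 0.0),
    (None, 180.0, 0.0),
])
```

Beams without a return simply contribute no point, which corresponds to the difficulty of detecting non-reflective target objects with a LiDAR.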
In this way, CG images corresponding to a plurality of types of sensors are output from the image generation unit 203, and the recognition process thereof is performed in each deep learning recognition unit of
Number | Date | Country | Kind |
---|---|---|---|
2016-197999 | Oct 2016 | JP | national |
2017-092950 | May 2017 | JP | national |
This application claims the benefit of priority and is a continuation application of the prior International Patent Application No. PCT/JP2017/033729, with an international filing date of Sep. 19, 2017, which designated the United States, and is related to Japanese Patent Application No. 2016-197999, filed Oct. 6, 2016, and Japanese Patent Application No. 2017-092950, filed May 9, 2017. The entire disclosures of all of these applications are expressly incorporated herein by reference in their entirety.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/JP2017/033729 | Sep 2017 | US |
Child | 16367258 | | US |