This application claims priority of Japanese Patent Application No. 2021-121958 (filed Jul. 26, 2021), the entire disclosure of which is hereby incorporated by reference.
The present disclosure relates to a trained model generating method, a user environment estimating method, a trained model generating device, a user environment estimating device, and a trained model generating system.
A known system captures images of components and creates a trained model for use in image recognition of components (see, for example, Patent Literature 1).
In an embodiment of the present disclosure, a trained model generating method includes: acquiring a first model obtained by performing training processing for an estimation target in a first environment by using first image data representing the estimation target in the first environment as learning data; acquiring second image data representing the estimation target in a second environment in which estimation is to be performed; generating a second model based on the first model by using the second image data as learning data; and outputting a trained model based on the second model. The second image data includes an image in which an appearance of the estimation target in the second environment is assumed based on user environment information about the second environment.
In an embodiment of the present disclosure, a user environment estimating method is for estimating a user environment, the user environment being an environment in which data on an estimation target is to be acquired. The user environment estimating method includes outputting a result of estimating the user environment based on image data obtained by capturing a prescribed object in the user environment as user environment information about the user environment.
In an embodiment of the present disclosure, a trained model generating device includes a controller. The controller is configured to acquire a first model obtained by performing training processing for an estimation target in a first environment by using first image data representing the estimation target in the first environment as learning data. The controller is configured to acquire second image data representing the estimation target in a second environment in which estimation is to be performed. The controller is configured to generate a second model based on the first model by using the second image data as learning data. The controller is configured to output a trained model based on the second model. The second image data includes an image in which an appearance of the estimation target in the second environment is assumed based on user environment information about the second environment.
In an embodiment of the present disclosure, a user environment estimating device includes a controller. The controller is configured to estimate a user environment. The user environment is an environment in which data on an estimation target is to be acquired. The controller is configured to output a result of estimating the user environment based on image data obtained by capturing a prescribed object in the user environment as user environment information about the user environment.
In an embodiment of the present disclosure, a trained model generating system includes a trained model generating device configured to perform the trained model generating method and a user environment estimating device configured to perform the user environment estimating method. The trained model generating device is configured to acquire the user environment from the user environment estimating device.
If the environment in which a trained model is to be used for recognition is different from the environment in which the trained model was created, recognition accuracy may be degraded. In other words, the robustness of the trained model used for recognition may be reduced. With the trained model generating method and the user environment estimating method according to one embodiment of the present disclosure, the robustness of a trained model used for recognition can be improved.
As illustrated in
This trained model is set up, selected, or downloaded to a robot controller or the like that controls a robot such as a cooperative robot, and is used when the robot recognizes a work target and so forth. Specifically, when the robot performs a task, an image of an object in the work environment can be captured and, based on the captured image, the trained model can be used, for example, to determine whether or not the captured object is a work target, such as a recognition target or a grasping target, or to estimate the grasping position of a grasping target. The robot can then be controlled in accordance with the result of that determination.
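Purely as an illustrative sketch (not the disclosed implementation), the following Python code shows one way a robot controller could apply such a trained model to a captured image to judge whether the imaged object is a work target; the class labels, preprocessing, and confidence threshold are assumptions.

```python
# Hypothetical sketch: using a trained classification model on a captured image
# to decide whether the imaged object is a work target. Class names, the
# confidence threshold, and the preprocessing pipeline are assumptions.
import torch
import torchvision.transforms as T
from PIL import Image

CLASS_NAMES = ["background", "work_target"]   # hypothetical labels

def is_work_target(model: torch.nn.Module, image: Image.Image,
                   threshold: float = 0.8) -> bool:
    """Return True when the model is confident that the image shows a work target."""
    preprocess = T.Compose([
        T.Resize((224, 224)),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    x = preprocess(image).unsqueeze(0)          # batch of one image
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)[0]
    return probs[CLASS_NAMES.index("work_target")].item() >= threshold
```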
The first trained model generating device 110 generates a first trained model by performing training using, as teacher data, captured images of recognition targets in a first environment or images in which the appearance of recognition targets in a standard environment is assumed. In the present disclosure, the first environment is also referred to as a standard environment. In other words, in the following description, the term “standard environment” can be substituted by the term “first environment”. The first trained model generating device 110 may acquire a captured image of a recognition target in the standard environment from the image-capturing device 40. The standard environment may be an environment that reduces effects on a captured image of the recognition target or reduces effects on an image in which the appearance of the recognition target is assumed. In other words, the standard environment may be less noisy than a user environment described below. Furthermore, when a user uses a device such as a robot equipped with a trained model, factors that may vary depending on the place of use may be smaller in the standard environment than in the user environment described below. In other words, the standard environment can be said to be an environment in which the factors that may vary from one environment to another in which recognition is performed are small.
The second trained model generating device 210 acquires the first trained model from the first trained model generating device 110. The second trained model generating device 210 generates a second trained model by updating the first trained model, performing training using, as teacher data, images in which the appearance of the recognition target in the second environment where recognition is to be performed is assumed. In the present disclosure, the second environment is also referred to as the user environment. In other words, in the following description, the term “user environment” can be substituted by the term “second environment”. The environment in which recognition is to be performed may be, for example, a place where a device such as a robot equipped with the finally obtained trained model is to be used.
The user environment is different from the standard environment. In this embodiment, the appearance of a recognition target in the standard environment is assumed to be a standard appearance. The appearance of the recognition target in the user environment will differ from the standard appearance. Differences from the standard appearance can be said to be due to the occurrence of noise in the appearance. Therefore, differences between the user environment and the standard environment can be said to cause the occurrence of noise in the appearance of the recognition target.
The trained model generating system 1 according to this embodiment can improve the recognition accuracy for the recognition target in each environment by performing training based on the differences in the appearance of the recognition target in the individual environments. In other words, models having high robustness with respect to environmental differences can be generated. Hereafter, an example configuration of the trained model generating system 1 will be described.
As illustrated in
As illustrated in
As illustrated in
The first trained model generating device 110 and the second trained model generating device 210 may be configured as an integrated device. The user environment estimating device 310 may be configured as an integrated device including the first trained model generating device 110 or the second trained model generating device 210.
The standard environment target data generator 121 generates first image data including an image representing a recognition target in the standard environment. The standard environment target data generator 121 may acquire an image of the recognition target captured in the standard environment from the image-capturing device 40 as an image of the recognition target in the standard environment and use this image as the first image data. The standard environment target data generator 121 may generate an image in which the appearance of the recognition target in the standard environment is assumed as the first image data. In other words, the standard environment target data generator 121 may synthesize the first image data based on design data including CAD (Computer-Aided Design) data or drawings taking the state of the standard environment into account. The standard environment target data generator 121 outputs the first image data to the standard environment target recognition unit 122. The standard environment target data generator 121 may store the first image data in the first data holding unit 131.
The standard environment target recognition unit 122 acquires the first image data from the standard environment target data generator 121. The standard environment target recognition unit 122 may acquire the first image data from the first data holding unit 131. The standard environment target recognition unit 122 generates the first trained model by performing recognition training in the standard environment using the first image data as teacher data. The first trained model is also referred to as a first model. The standard environment target recognition unit 122 stores the first model, which was generated through training using the first image data as teacher data, in the first data holding unit 131.
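As one possible sketch of such recognition training, assuming an ImageFolder-style directory of first image data, a ResNet-18 backbone, and hypothetical hyperparameters (none of which are specified in the present disclosure), the first model could be trained as follows.

```python
# Hypothetical training sketch for the first model using first image data
# captured or synthesised in the standard environment as teacher data.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms as T

def train_first_model(data_dir: str, num_classes: int, epochs: int = 10) -> nn.Module:
    tf = T.Compose([T.Resize((224, 224)), T.ToTensor()])
    dataset = datasets.ImageFolder(data_dir, transform=tf)   # first image data
    loader = DataLoader(dataset, batch_size=32, shuffle=True)

    model = models.resnet18(weights=None)                    # untrained backbone
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for images, labels in loader:                        # teacher data
            optimizer.zero_grad()
            loss_fn(model(images), labels).backward()
            optimizer.step()
    return model                                             # the first model
```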
The user environment target data generator 223 generates second image data including an image representing the recognition target in the user environment. The user environment target data generator 223 may generate an image in which the appearance of the recognition target in the user environment is assumed as the second image data. The user environment target data generator 223 acquires information about the user environment generated by the user environment acquiring unit 325 and the user environment estimating unit 326, which are described later. The information about the user environment is also referred to as user environment information. The user environment target data generator 223 generates the second image data based on the user environment information. In other words, the user environment target data generator 223 may synthesize the second image data based on design data including CAD data or drawings taking the state of the user environment into account. The user environment target data generator 223 outputs the second image data to the user environment target recognition unit 224. The user environment target data generator 223 may store the second image data in the second data holding unit 232. The user environment target data generator 223 may acquire an image of the recognition target captured in the user environment and use the image as the second image data.
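The following minimal sketch assumes one simple way of producing an image in which the appearance of the recognition target in the user environment is assumed: re-shading a standard-environment image with estimated brightness and colour-temperature parameters. A renderer driven by CAD data could serve the same purpose; the parameter names here are hypothetical.

```python
# Hypothetical appearance-assumption sketch: adjust a standard-environment image
# using user environment information (brightness gain and a warm/cool shift).
from PIL import Image, ImageEnhance

def assume_user_appearance(standard_img: Image.Image,
                           brightness_gain: float,
                           warmth: float) -> Image.Image:
    """Return an image whose appearance is assumed for the user environment."""
    img = ImageEnhance.Brightness(standard_img).enhance(brightness_gain)
    r, g, b = img.convert("RGB").split()
    # Shift the red/blue balance to mimic a warmer (warmth > 0) or cooler light source.
    r = r.point(lambda v: min(255, int(v * (1.0 + warmth))))
    b = b.point(lambda v: min(255, int(v * (1.0 - warmth))))
    return Image.merge("RGB", (r, g, b))
```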
The user environment target recognition unit 224 acquires the second image data from the user environment target data generator 223. The user environment target recognition unit 224 acquires the first model from the first data holding unit 131. The user environment target recognition unit 224 performs training using the second image data as teacher data and generates a second model based on the first model. In this embodiment, the user environment target recognition unit 224 generates the second model by updating the first model. Let us assume that the first model generated by the standard environment target recognition unit 122 and stored in the first data holding unit 131 is stored in the second data holding unit 232. The user environment target recognition unit 224 updates the first model by performing reading and writing on the first model stored in the second data holding unit 232, generates the second trained model, and stores the second trained model in the second data holding unit 232. The second trained model is also referred to as the second model. The user environment target recognition unit 224 outputs the second model as a trained model. In other words, the user environment target recognition unit 224 may output a trained model based on the second model. Additionally, training may be performed using images captured in the user environment.
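As one sketch of updating the first model, assuming the second image data is stored in an ImageFolder-style directory and that fine-tuning with a reduced learning rate is acceptable (the disclosure does not prescribe a specific update procedure), the second model could be generated as follows.

```python
# Hypothetical fine-tuning sketch: continue training the first model on second
# image data so that it becomes the second model. Hyperparameters are assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms as T

def update_to_second_model(first_model: nn.Module, user_data_dir: str,
                           epochs: int = 3) -> nn.Module:
    tf = T.Compose([T.Resize((224, 224)), T.ToTensor()])
    loader = DataLoader(datasets.ImageFolder(user_data_dir, transform=tf),
                        batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(first_model.parameters(), lr=1e-4)  # small LR
    loss_fn = nn.CrossEntropyLoss()

    first_model.train()
    for _ in range(epochs):
        for images, labels in loader:            # second image data as teacher data
            optimizer.zero_grad()
            loss_fn(first_model(images), labels).backward()
            optimizer.step()
    return first_model                            # now serves as the second model
```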
The first model may be stored in the first data holding unit 131. The user environment target recognition unit 224 may update the first model by performing reading and writing on the first model stored in the first data holding unit 131, generate the second model, and store the second model in the first data holding unit 131. The first data holding unit 131 and the second data holding unit 232 may be configured to be indistinguishable from each other or may be configured so as to be integrated with each other.
Although an example in which the second model is generated by updating the first model is described above, the method of generating the second model is not limited to this. For example, the second model may be generated by connecting to the first model an additional trained model that is different from the first model and has undergone training processing for the user environment. The additional trained model is also referred to as an adapter module, for example.
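A minimal sketch of the adapter-module approach is shown below, assuming the additional module is a small residual block attached behind the frozen first model and trained only on user-environment data; the layer sizes are hypothetical.

```python
# Hypothetical adapter sketch: connect a small trainable module to the frozen
# first model instead of overwriting the first model's weights.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, num_classes: int, bottleneck: int = 64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(num_classes, bottleneck),
            nn.ReLU(),
            nn.Linear(bottleneck, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.block(x)                 # residual correction of the outputs

class AdaptedModel(nn.Module):
    def __init__(self, first_model: nn.Module, num_classes: int):
        super().__init__()
        self.first_model = first_model
        for p in self.first_model.parameters():
            p.requires_grad = False              # keep the first model intact
        self.adapter = Adapter(num_classes)      # only this part is trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.adapter(self.first_model(x))
```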
The user environment acquiring unit 325 acquires information to be used in estimating the user environment. The information to be used in estimating the user environment is also referred to as user environment data. The user environment data may include an image captured in the user environment. Specifically, the user environment data may include, for example, an image of the recognition target captured in the user environment, an image of the surroundings of the recognition target captured in the user environment, or an image captured without the recognition target disposed in the user environment. The user environment data may include known information such as lighting conditions in the user environment. The user environment acquiring unit 325 outputs the user environment data to the user environment estimating unit 326. The user environment acquiring unit 325 may store the user environment data in the third data holding unit 333.
The user environment estimating unit 326 estimates the user environment based on the user environment data. The user environment estimating unit 326 may acquire the user environment data from the user environment acquiring unit 325 or from the third data holding unit 333. The user environment may be specified by, for example, lighting conditions. Lighting conditions may include, for example, the position or number of lights, the type of light source, the luminance, brightness, or illuminance of the lights, the color temperature of the lights, or flicker of the lights. The type of light source may be specified based on whether the light source produces parallel light or scattered light. The type of light source may be specified as a point light source, a planar light source, or a ring light source. The user environment may be specified, for example, by the specifications or settings of the image-capturing device 40 used when performing recognition. The user environment may be specified by the conditions of an object that is present other than the recognition target such as a table on which the recognition target is placed or a wall or ceiling of the room in which the recognition target is placed. The user environment may be specified by the recognition target itself, or by a surface condition or reflectance of an object other than the recognition target. The user environment may be specified by the presence or absence of windows or blinds in the room in which the recognition target is placed when recognition is performed. The user environment may be specified by a time series of changes in the sun's rays shining on the location where the recognition target is placed when recognition is performed.
The user environment estimating unit 326 outputs estimation results of the user environment to the user environment target data generator 223 as the user environment information. The user environment acquiring unit 325 may store the user environment information in the fourth data holding unit 334. The user environment target data generator 223 may generate the second image data based on the user environment information as described above. The user environment target data generator 223 may acquire the user environment information from the user environment estimating unit 326 or from the fourth data holding unit 334. When the information to be used in estimating the user environment includes information that can specify the user environment such as lighting conditions, the user environment estimating unit 326 may output the information that can specify the user environment itself as the user environment information.
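For illustration, the user environment information could be carried as a structured record such as the following; the field names and default values are hypothetical and merely mirror the conditions listed above.

```python
# Hypothetical structure for user environment information.
from dataclasses import dataclass, field

@dataclass
class UserEnvironmentInfo:
    light_positions: list = field(default_factory=list)  # (x, y, z) per light
    light_source_type: str = "scattered"                  # or "parallel"
    illuminance_lux: float = 500.0
    color_temperature_k: float = 5000.0
    camera_iso: int = 400
    camera_shutter_s: float = 1.0 / 60.0
    camera_aperture_f: float = 2.8
    table_reflectance: float = 0.2
```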
The first controller 120, the second controller 220, and the third controller 320 may each include at least one processor to realize the functions of each constituent part thereof, such as the standard environment target data generator 121. The processor may execute programs that realize the functions of the constituent parts. The processor may be implemented as a circuit that realizes the functions of the constituent parts. The processor may be realized as a circuit that collectively performs the functions of multiple constituent parts. The processor may be implemented as a single integrated circuit. An integrated circuit is also referred to as an IC. The processor may be implemented as multiple integrated circuits and discrete circuits connected so as to be able to communicate with each other. The processor may include a CPU (Central Processing Unit). The processor may include a DSP (Digital Signal Processor) or a GPU (Graphics Processing Unit). The processor may be realized based on various other known technologies.
The first storage unit 130, the second storage unit 230, and the third storage unit 330 may each include an electromagnetic storage medium such as a magnetic disk, or may each include a memory such as a semiconductor memory or a magnetic memory. The first storage unit 130, the second storage unit 230, and the third storage unit 330 may each be configured as an HDD (Hard Disk Drive) or an SSD (Solid State Drive). The first storage unit 130, the second storage unit 230, and the third storage unit 330 may each include an electromagnetic storage medium or a memory corresponding to each constituent part so that data is held separately in each constituent part such as the first data holding unit 131. The first storage unit 130, the second storage unit 230, and the third storage unit 330 may each be configured to hold the data of multiple constituent parts on a single electromagnetic storage medium, memory, or the like. The first storage unit 130, the second storage unit 230, and the third storage unit 330 store various information, programs executed by the first controller 120, the second controller 220, and the third controller 320, and so forth. The first storage unit 130, the second storage unit 230, and the third storage unit 330 may respectively function as work memories of the first controller 120, the second controller 220, and the third controller 320. The first controller 120, the second controller 220, and the third controller 320 may respectively include at least part of the first storage unit 130, the second storage unit 230, and the third storage unit 330.
The image-capturing device 40 is configured to be able to capture an image of the recognition target or an object other than the recognition target. The image-capturing device 40 may include an image-capturing element. The image-capturing device 40 may include an optical system including a lens or a mirror. The specifications of the image-capturing device 40 may be specified by resolution or sensitivity. The image-capturing device 40 may be configured to be able to change the resolution or sensitivity when capturing an image of the recognition target or an object other than the recognition target. The specifications of the image-capturing device 40 may be specified by the shutter speed or aperture. The image-capturing device 40 may be configured to be able to change the shutter speed or aperture when capturing an image of the recognition target or an object other than the recognition target.
The first trained model generating device 110, the second trained model generating device 210, or the user environment estimating device 310 and the image-capturing device 40 may be configured to be able to communicate with each other in a wired or wireless manner. The first trained model generating device 110, the second trained model generating device 210, the user environment estimating device 310, and the image-capturing device 40 may each include a communication device. The communication device may be configured to be able to perform communication using communication methods based on various communication standards. The communication device can be configured using a known communication technology. Detailed description of the hardware and so on of the communication device is omitted. The functions of the communication device may be realized by a single interface or by separate interfaces for each connection destination.
The first trained model generating device 110 generates the first model by performing training based on the first image data containing an image of the recognition target in the standard environment. The second trained model generating device 210 generates the second model by updating the first model by performing training based on the second image data containing an image of the recognition target in the user environment, and outputs the second model as a trained model.
An example of generation of the first model and an example of generation of the second model are described below.
The first controller 120 of the first trained model generating device 110 generates the first image data containing an image of the recognition target in the standard environment. The standard environment is an environment in which images that serve as teacher data used in training to generate the first model are generated. The first controller 120 may acquire an image of the recognition target captured in the standard environment and generate the first image data containing the acquired image. The first controller 120 may generate an image assuming the appearance of the recognition target in the standard environment and generate the first image data containing the generated image. The standard environment may be an environment that at least reduces the effect of a shadow caused by the position of the light source on the captured image of the recognition target or the image in which the appearance of the recognition target is assumed.
The standard environment is, for example, an environment in which a cup 50, which is the recognition target, is illuminated by standard lighting 41, as illustrated in
The first controller 120 performs training for recognition in the standard environment using the first image data as the teacher data and generates the first trained model.
The second controller 220 of the second trained model generating device 210 generates the second image data containing an image of the recognition target in the user environment based on the user environment information. The user environment is an environment in which recognition of the recognition target is actually performed using the trained model. The second controller 220 may generate an image assuming the appearance of the recognition target in the user environment and generate the second image data containing the generated image.
Let us assume that the user environment is, for example, an environment in which an image of the cup 50 placed on a table 52 is captured by the image-capturing device 40 as a recognition target, as a second environment 200 as illustrated in
User lighting 42 may include a ring-shaped lighting device, for example. The user lighting 42 may include a variety of lighting devices. The user lighting 42 may include multiple lighting devices. The user environment may be a real environment or a virtual environment.
The second controller 220 generates the second model by updating the first model by performing training using the second image data as teacher data. The second model generated by updating the first model can improve recognition accuracy in the user environment. The second controller 220 outputs the second model as a trained model.
The third controller 320 of the user environment estimating device 310 may generate the user environment information by estimating the user environment. The third controller 320 can estimate the user environment information based on the image of the cup 50, which is the recognition target, and the shadow 50S, as illustrated in
The marker 51 has at least two visible surfaces. The marker 51 is disposed so that the angles of incidence of illumination light from the user lighting 42 at the two surfaces are different from each other. The marker 51 is disposed so that the two surfaces having different angles of incidence of illumination light are captured as a single image by the image-capturing device 40. In
Specifically, the third controller 320 estimates various conditions that specify the user environment based on the captured image of the marker 51. The third controller 320 may, for example, estimate the lighting conditions or specifications of the image-capturing device 40. The third controller 320 may estimate information about objects other than the recognition target, such as the table 52 on which the marker 51 is placed. The third controller 320 generates or acquires conditions that specify the user environment as user environment information.
The third controller 320 may generate or acquire information specifying factors responsible for noise generated in the second image data in the user environment as user environment information. The third controller 320 may generate or acquire information specifying factors that cause the differences between the first image data and the second image data as the user environment information. The third controller 320 may generate or acquire, as the user environment information, information on the position of the light source in the user environment, the intensity of light radiated from the light source, and the light source type specifying whether the light source is a point light source system or a scattered light system. The third controller 320 may generate or acquire, as the user environment information, information on the optical properties of the table (for example, the table 52) on which the recognition target is disposed, or the walls or ceiling of the room in which the recognition target is disposed in the user environment. The third controller 320 may generate or acquire, as the user environment information, information on image-capturing parameters of image-capturing means used in recognition of the recognition target or information on vibration of the image-capturing means in the user environment. The image-capturing means may include the image-capturing device 40.
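As a simple geometric sketch (an assumption, not the disclosed estimation method), the direction of a roughly parallel light source could be derived from the marker height and the length and direction of its shadow, as follows.

```python
# Hypothetical estimation of the light direction from a marker and its shadow,
# assuming approximately parallel light. Variable names are illustrative only.
import math

def estimate_light_direction(marker_height_mm: float,
                             shadow_length_mm: float,
                             shadow_dx_mm: float,
                             shadow_dy_mm: float) -> tuple:
    """Return (elevation_deg, azimuth_deg) of the assumed light source."""
    elevation = math.degrees(math.atan2(marker_height_mm, shadow_length_mm))
    # The light lies opposite to the direction in which the shadow is cast.
    azimuth = math.degrees(math.atan2(-shadow_dy_mm, -shadow_dx_mm)) % 360.0
    return elevation, azimuth

# Example: a 50 mm marker casting an 87 mm shadow implies roughly 30 deg elevation.
print(estimate_light_direction(50.0, 87.0, shadow_dx_mm=87.0, shadow_dy_mm=0.0))
```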
The first controller 120 of the first trained model generating device 110 and the second controller 220 of the second trained model generating device 210 may execute a trained model generating method including the procedures of the flowcharts illustrated in
The first controller 120 and the second controller 220 generate a trained model by executing the procedure of the flowchart illustrated in
The third controller 320 may generate the second image data in the procedure of Step S3 in
As described above, the trained model generating system 1, the first trained model generating device 110, and the second trained model generating device 210 according to this embodiment generate the first model and the second model separately, and generate the second model by updating the first model based on the user environment information. Generating the second model in this way can improve the robustness of the trained model generated as the second model.
The first trained model generating device 110 and the second trained model generating device 210 may generate, in the standard environment, a first model that is commonly used for multiple user environments in order to generate second models as trained models that can be applied to the multiple user environments. After the first model has been generated, the first trained model generating device 110 and the second trained model generating device 210 can generate each second model by updating the first model through training based on information about the corresponding user environment. In other words, the first model is a common model for generating second models that each correspond to a respective one of the multiple user environments. The computational load of training for generating the second models that can be applied to the individual user environments can be reduced by performing training to generate the common first model. In addition, the versatility of the first model can be increased by using the common first model.
The trained model generating system 1 may further include a third trained model generating device. A third model may be generated for a different user environment from that of the second model. The third trained model generating device may have substantially the same configuration as the second trained model generating device 210. The third model may be generated using substantially the same method as the second model. A first model may be generated, in the standard environment, that is commonly used for the individual user environments in order to generate trained models that are to be applied to those user environments, such as the second model and the third model. After the first model has been generated, the second trained model generating device 210 and the third trained model generating device can generate the second model and the third model, each applied to the corresponding user environment, by updating the first model through training based on information about that user environment. In other words, the first model is a common model used to generate the second model and the third model for the corresponding user environments. The computational load of training for generating the second model and the third model that can be applied to the respective user environments can be reduced by performing common training to generate the first model. In addition, the versatility of the first model can be increased by using the common first model. The third model may, for example, be generated on a case-by-case basis. The third model does not need to be generated at the same time as the second model. Even after the second model has been generated by updating the first model, the first model may still be retained as the first model so that it can be used to generate the third model. The trained model generating system 1 may include the same number of trained model generating devices as user environments, and may generate the same number of trained models as user environments.
The trained model generating system 1 may further include a fourth trained model generating device. A fourth model may be generated based on the second model. For example, the controller of the fourth trained model generating device may acquire a captured image of the user environment and generate the fourth model based on the acquired captured image of the user environment without performing further training processing on the second model. For example, the fourth model may be generated by connecting to the second model an additional trained model that is different from the second model and has undergone training processing for the captured image in the user environment.
With the trained model generating system 1 and the first trained model generating device 110 and the second trained model generating device 210 according to this embodiment, the versatility and robustness of the trained models can be ensured.
Other embodiments are described below.
Lighting conditions in the user environment affect the appearance of an object, such as the marker 51, in the user environment. Differences in the appearance of the marker 51 under different lighting conditions are described below.
Let us assume that, as illustrated in
The first surface 511 (see
As an example of a marker 51 having another shape,
As illustrated in
The first surface 511 (see
The third controller 320 may estimate the lighting conditions under which the marker 51 is illuminated based on the image illustrated in
<<Example of Appearance of Marker 51 Illuminated from Two Directions by Spotlights 44>>
As illustrated in
The first surface 511 (see
The third controller 320 may estimate the lighting conditions under which the marker 51 is illuminated based on the image illustrated in
The third controller 320 can estimate lighting conditions and so forth in various user environments based on an image of the marker 51. The third controller 320 can generate or acquire user environment information based on estimation results. The marker 51 may be disposed so that at least two surfaces of the marker 51 are captured by the image-capturing device 40. The image-capturing device 40 may be configured to capture an image of the marker 51 from at least two directions.
The second controller 220 of the second trained model generating device 210 or the third controller 320 of the user environment estimating device 310 generates the second image data based on the user environment information as described above. The second controller 220 or the third controller 320 may generate information in which each parameter of the user environment information is varied within a prescribed range. The prescribed ranges may be set, for example, to the ranges, in the user environment, over which the environment information changes during the time period in which recognition using the second model is performed. Information in which at least one parameter, among the multiple parameters, of the user environment information is varied is also referred to as extended environment information. The second controller 220 or the third controller 320 may generate multiple sets of extended environment information and generate second image data that includes an image in which the appearance of the recognition target is assumed for each set of extended environment information. The robustness of a trained model can be improved by performing training using images in which the appearance of the recognition target is assumed in the extended environment information as teacher data.
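A minimal sketch of generating extended environment information is shown below, assuming the user environment information is held as a dictionary of numeric parameters and that each parameter is jittered uniformly within its prescribed range; the parameter names and ranges are hypothetical.

```python
# Hypothetical generation of extended environment information by varying each
# parameter of the user environment information within a prescribed range.
import random

def extend_environment(base_env: dict, ranges: dict, n_samples: int = 20) -> list:
    """Return n_samples variations of base_env, one per set of extended information."""
    extended = []
    for _ in range(n_samples):
        env = dict(base_env)
        for key, (low, high) in ranges.items():
            env[key] = base_env[key] + random.uniform(low, high)
        extended.append(env)
    return extended

base = {"illuminance_lux": 500.0, "color_temperature_k": 5000.0}
ranges = {"illuminance_lux": (-100.0, 100.0), "color_temperature_k": (-500.0, 500.0)}
for env in extend_environment(base, ranges, n_samples=3):
    print(env)   # each set would drive one batch of assumed-appearance images
```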
The user environment estimating device 310 acquires image data obtained by capturing images of prescribed objects in the user environment. The prescribed objects may include the recognition target itself or an object different from the recognition target such as the marker 51. The user environment estimating device 310 may acquire image data using image-capturing means, or may acquire the image data from the outside. The user environment estimating device 310 estimates the user environment based on the image data.
The user environment estimating device 310 may estimate the user environment based on image data obtained by capturing a prescribed object from multiple directions. The user environment estimating device 310 may also estimate the user environment based on a captured image of at least two of the multiple surfaces of a prescribed object. The user environment estimating device 310 may also estimate the user environment based on images of two different surfaces of the prescribed object captured from at least two directions.
Since the user environment estimating device 310 can generate the user environment information, the user environment information can be collected more easily.
The functions of the user environment estimating device 310 may be realized as a user environment estimating method executed by the user environment estimating device 310. The functions of the user environment estimating device 310 may be realized as a user environment estimating program that is executed by a processor included in the user environment estimating device 310. For example, the user environment estimating program can estimate the user environment by comparing user environment data with reference data representing a predefined basic environment. The user environment estimating program and reference data may be stored in the third data holding unit 333 or the fourth data holding unit 334.
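As a purely illustrative sketch of such a comparison, the program might derive simple statistics from a captured image and relate them to stored reference values for the basic environment; the reference numbers below are hypothetical.

```python
# Hypothetical comparison of user environment data against reference data that
# represents a predefined basic environment.
import numpy as np
from PIL import Image

REFERENCE = {"mean_brightness": 128.0, "red_blue_ratio": 1.0}   # basic environment

def estimate_user_environment(image_path: str) -> dict:
    rgb = np.asarray(Image.open(image_path).convert("RGB"), dtype=np.float64)
    mean_brightness = rgb.mean()
    red_blue_ratio = rgb[..., 0].mean() / max(rgb[..., 2].mean(), 1e-6)
    return {
        "brightness_gain": mean_brightness / REFERENCE["mean_brightness"],
        "warmth_shift": red_blue_ratio - REFERENCE["red_blue_ratio"],
    }
```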
As described above, the trained model generating system 1 according to this embodiment generates trained models taking into account noise that occurs in the appearance of a recognition target in the user environment with respect to the standard appearance of the recognition target. Hereafter, a configuration used to acquire image data will be described as an example of a factor that causes noise to be generated.
In an image data acquisition configuration for acquiring optical images of a target as digital data, light from the illumination light source strikes the target, the reflected light is converted into an electric signal by an optical sensor (an image-capturing element or the like) of a camera (the image-capturing device 40 or the like), and the electric signal is converted into digital data, whereby the image data is acquired. Therefore, the image data is affected by various optical or electrical variations and noise.
Noise in the image data includes noise caused by the camera. Noise caused by the camera includes, for example, color variations and noise, or brightness variations and noise, due to the ISO sensitivity of the optical sensor. Cameras ensure dynamic range by varying the ISO sensitivity (amplification factor) of the optical sensor based on the input state of light when capturing images. An increase in the sensitivity of the optical sensor can result in increased noise. The shutter speed and aperture of the camera are parameters that alter the input state of light and are related to ISO sensitivity. These parameters can be easily referenced because they are embedded in the image data as Exif (Exchangeable Image File Format) data. Noise caused by the camera includes color reproducibility variations and noise due to limitations in the color reproduction range of the optical sensor. Noise caused by the camera includes distortion variations and noise, or vignetting variations and noise, in optical systems such as optical lenses. Noise caused by the camera also includes noise based on the way in which the camera is held, for example, blurring (vibration) noise between the camera and a camera holding member due to ambient vibration effects.
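For illustration, the ISO sensitivity, shutter speed, and aperture embedded as Exif data could be read as in the following sketch; it assumes a JPEG image carrying Exif metadata and uses Pillow's long-standing (private) _getexif helper.

```python
# Hypothetical reading of camera parameters from Exif data embedded in a
# captured JPEG image (file name is illustrative).
from PIL import Image, ExifTags

def read_camera_parameters(image_path: str) -> dict:
    exif = Image.open(image_path)._getexif() or {}
    named = {ExifTags.TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}
    return {
        "iso": named.get("ISOSpeedRatings"),
        "shutter_time_s": named.get("ExposureTime"),
        "aperture_f": named.get("FNumber"),
    }

# Example (hypothetical file name):
# print(read_camera_parameters("user_environment_capture.jpg"))
```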
Noise in the image data includes noise caused by lighting. Noise caused by lighting includes, for example, shadow noise of the target associated with the lighting position (the coordinates of the lighting). Noise caused by lighting includes contrast variations and noise of the target due to the light source type (for example, parallel light or scattered light), or shadow noise of the target. Noise caused by lighting includes contrast variations and noise of the target due to illuminance (brightness), or shadow noise of the target. Noise caused by lighting includes color shift variations and noise due to the color temperature of the lighting. Noise caused by lighting includes variations in light flicker and noise caused by the type of lighting, luminance adjustment, and so on.
Noise in the image data includes noise caused by a work table such as the table 52. Noise caused by a work table includes variations in reflectance due to the surface condition of the work table or noise caused by reflected light from the surface of the work table. Noise caused by the work table includes noise that is poorly separated from the target due to the color of the work table.
Noise in the image data includes noise caused by the target. Noise caused by the target includes variations in reflectance due to the surface condition of the target, or noise caused by reflected light.
The second controller 220 of the second trained model generating device 210 or the third controller 320 of the user environment estimating device 310 may estimate each factor of noise in the image data described above based on the user environment data and generate the user environment information. The second controller 220 or the third controller 320 may estimate some or all of the multiple factors of noise in the image data. In other words, the second controller 220 or the third controller 320 may estimate at least some of the multiple factors of noise in the image data. Noise caused by each of the above-described factors has a significant impact on recognition of the target. The trained model generating system 1 according to this embodiment can generate a trained model by performing training using teacher data that takes these types of noise into consideration. As a result, the robustness of the trained model can be improved.
As described above, the trained model generating system 1 according to this embodiment can generate a trained model tailored to each user environment based on the user environment information. The user environment information can be generated based on captured images of the marker 51 in the user environment. Hereafter, an example of the structure of the marker 51 will be described.
In the case where target images are acquired as digital data, the user's image acquisition environment is affected by various optical or electrical variations and noise. Therefore, acquiring the user environment is necessary in order to improve robustness.
In order to acquire the user environment, the marker 51 having the following three-dimensional structure may be used, for example. The marker 51 may be a polyhedral structure and may have at least three surfaces. The marker 51 may have a structure that, when illuminated by lighting, allows the shading of shadows produced on each surface to be determined. Specifically, the marker 51 may have ridges that define the boundaries of each surface. The marker 51 may have a structure that allows the reflectance of light on each surface to be determined. The marker 51 may have a structure that allows the size of the marker 51 to be known, for example, a mark or dimensional scale indicating a specified size. The marker 51 may include a grid pattern or the like so as to allow identification of distortion and other characteristics of the optical system of the image-capturing means. The marker 51 may include a portion having a known density, for example, an 18% grayscale. The marker 51 may include a portion that serves as a white point.
The marker 51 may be disposed so that at least two surfaces of the marker 51 are captured by the image-capturing means. The marker 51 may be disposed so that the marker 51 is captured from two or more directions having different angles.
The second controller 220 of the second trained model generating device 210 or the third controller 320 of the user environment estimating device 310 may estimate, for example, the lighting position, the luminance, brightness, or illuminance of the lighting, or the type of light source as lighting conditions in the user environment based on the image data of the marker 51. The second controller 220 or the third controller 320 may also estimate the reflectance of the marker 51 or objects such as a work table that are present around the marker 51. The second controller 220 or the third controller 320 may estimate the lighting position based on the size and the shadow of the marker 51. The second controller 220 or the third controller 320 may estimate the luminance of the lighting based on the density of the image of the marker 51 and the ISO sensitivity, shutter speed, or aperture of the camera. The second controller 220 or the third controller 320 may estimate the contrast based on image data of an edge portion of the marker 51 and image data of an edge portion of the shadow, and may estimate lighting conditions such as the type of light source of the lighting (for example, parallel light or scattered light). The second controller 220 or the third controller 320 may estimate the lighting conditions based on the pixel density distributions of the edge portion of the marker 51 and the edge portion of the shadow. The second controller 220 or the third controller 320 may estimate the reflectance of the marker 51 based on a reflection image of the marker 51. The second controller 220 or the third controller 320 may estimate information about an object in the surroundings that is reflected in the marker 51 based on the reflection image of the marker 51. The second controller 220 or the third controller 320 may estimate the color temperature or spectrum of the lighting based on an image of a white point of the marker 51. The second controller 220 or the third controller 320 may estimate the distortion of the optical system of the image-capturing means based on an image of a grid pattern of the marker 51.
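As one concrete sketch of the grid-pattern case, lens distortion could be estimated with OpenCV's standard chessboard calibration; a single view is used here for brevity, the pattern and square size are hypothetical, and real use would combine several views.

```python
# Hypothetical estimation of optical-system distortion from one image of a
# chessboard-like grid pattern on the marker.
import cv2
import numpy as np

def estimate_distortion(image_path: str, pattern=(7, 5), square_mm=10.0):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        return None
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if not found:
        return None
    # 3D coordinates of the grid corners in the marker plane (Z = 0).
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_mm
    _, camera_matrix, dist_coeffs, _, _ = cv2.calibrateCamera(
        [objp], [corners], gray.shape[::-1], None, None)
    return camera_matrix, dist_coeffs   # intrinsics and distortion coefficients
```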
Hereafter, configurations that may be employed as features of the trained model generating system 1 according to this embodiment will be described. The trained model generating system 1 can be configured to, in target recognition for recognizing a target, perform first recognition to recognize a target in the standard environment and second recognition to recognize a target in the user environment, and can be configured to improve recognition of the target in the first recognition and increase the robustness of recognition in the user environment in the second recognition by first recognizing the target in the first recognition and then recognizing the target in the second recognition. The trained model generating system 1 may store at least a target recognition algorithm or a target recognition algorithm and a target dataset in the first recognition. In the trained model generating system 1, standard environment target data generating means for the first recognition may consist of lighting and a lighting holding member for holding the lighting, a target and a member for holding the target, and an image conversion system for converting the target into data. The lighting of the standard environment target data generating means of the first recognition may be configured with two or more lights. The lighting of the standard environment target data generating means of the first recognition may be configured such that the color temperature of the lighting can be adjusted. An image conversion system for converting a target of the standard environment target data generating means of the first recognition into data may be configured to generate data based on a two-dimensional color image or a three-dimensional color image and distance data. The standard environment target data generating means of the first recognition may be configured in a virtual environment. In the second recognition, the target recognition algorithm, which was sufficiently trained for target recognition and retained in the first recognition, or the target recognition algorithm and the target dataset may be copied at the beginning of the recognition training in the second recognition. User environment target data generating means of the second recognition may be configured to create the user environment based on the estimation results of the user environment and perform recognition. The user environment based on the estimation results of the user environment produced by the user environment target data generating means of the second recognition may consist of lighting and a lighting holding member for holding the lighting, a target and a member for holding the target, and an image conversion system for converting the target into data. The lighting of the user environment target data generating means of the second recognition may be configured with two or more lights. The lighting of the user environment target data generating means of the second recognition may be configured such that the color temperature of the lighting can be adjusted. The image conversion system for converting the target of the user environment target data generating means of the second recognition into data may be configured to generate data based on a two-dimensional color image or a three-dimensional color image and distance data. In the second recognition, the target may be recognized by configuring a virtual user environment based on the estimation results of the user environment.
In the first recognition and the second recognition, the standard environment and the user environment may share some environmental elements. That is, for example, if lighting is also included as an environmental element when generating the first image data taking the standard environment into account, lighting may also be included as an environmental element when generating the second image data taking the user environment into account. Data representing the standard environment and the user environment may be the same type of data. In this case, for example, the standard environment or the user environment can be used in the same or similar software.
Estimation of the user environment may involve means for measuring the user environment and means for estimating the user environment from information obtained by the measuring means. The means for measuring the user environment may be configured to hold a three-dimensional object and acquire user environment data such as physical information (size, density, reflection) about the object and image data of the three-dimensional object from two or more different angles. User environment information such as the lighting position, number of lights, luminance, light source type, or reflectance may be estimated from the user environment data by the environment estimating means. The means for estimating the user environment from the user environment data may be configured to estimate the user environment geometrically from two sets of image data. A three-dimensional object for measuring the user environment may include a white object. A white object may be disposed in the vicinity of the three-dimensional object. The three-dimensional object for measuring the user environment may include a grayscale density object. A grayscale density object may be disposed in the vicinity of the three-dimensional object. The reflectance of the grayscale density object may be 18%. Image data of the three-dimensional object may include a two-dimensional color image or a three-dimensional color image and distance data.
The trained model generating system 1 may be configured to store or accumulate user environment data. The trained model generating system 1 may be configured to store or accumulate user environment information.
In the trained model generating system 1, the user environment target recognition unit 224 and the standard environment target recognition unit 122 may have identical or similar configurations. Even if the user environment target recognition unit 224 and the standard environment target recognition unit 122 are identical, the training results may differ depending on the input standard environment data or user environment data.
The first trained model generating device 110 and the second trained model generating device 210 may be configured as identical devices. The timing at which the first trained model generating device 110 performs first training to generate the first model and the timing at which the second trained model generating device 210 performs second training to generate the second model may be different timings.
The standard environment target data generator 121 and the user environment target data generator 223 may be configured as a common target data generator. The target data generator, when functioning as the standard environment target data generator 121, generates standard environment target data by reading standard environment information. The target data generator, when functioning as the user environment target data generator 223, generates user environment target data by reading user environment information.
The functions of the user environment estimating unit 326 of the third controller 320 of the user environment estimating device 310 may be realized by the second controller 220 of the second trained model generating device 210. In this case, the functions of the fourth data holding unit 334 of the third storage unit 330 of the user environment estimating device 310 are realized by the second storage unit 230 of the second trained model generating device 210. Let us assume that the second trained model generating device 210 is owned by a vendor that supplies trained models. Let us also assume that the user environment estimating device 310 is owned by a user who performs recognition using a trained model. In this case, the functions of the user environment estimating unit 326 are realized by the second trained model generating device 210, and this allows the user environment to be estimated on the vendor's side. In other words, there is no longer a need for user environment estimation on the user's side. User convenience is improved.
Although embodiments according to the present disclosure have been described based on the drawings and examples, please note that a person skilled in the art can make various variations or changes based on the present disclosure. Therefore, please note that these variations or changes are included within the scope of the present disclosure. For example, the functions and so on included in each constituent part can be rearranged in a logically consistent manner, and multiple constituent parts and so on can be combined into one part or divided into multiple parts.
All of the constituent elements described in the present disclosure and/or all of the disclosed methods or all of the steps of disclosed processing can be combined in any combination, except for combinations in which their features are mutually exclusive. Each of the features described in the present disclosure may be replaced by alternative features that serve the same, equivalent, or similar purposes, unless explicitly stated to the contrary. Therefore, unless explicitly stated to the contrary, each of the disclosed features is only one example of a comprehensive set of identical or equivalent features.
Furthermore, the embodiments according to the present disclosure are not limited to any of the specific configurations of the embodiments described above. The embodiments according to the present disclosure can be extended to all novel features, or combinations thereof, described in the present disclosure, or all novel methods, or processing steps, or combinations thereof, described in the present disclosure.
Part of the methods according to the present disclosure may be performed manually by humans. For example, an instruction to begin work on generating a trained model could be executed manually. In addition, specifying a folder where a training dataset is to be stored could be performed manually. The trained model generating system 1, a trained model generating device such as the first trained model generating device 110 or the second trained model generating device 210, or the user environment estimating device 310 according to the present disclosure may be configured to accept input for operations that a person intends to perform manually. For example, the trained model generating system 1 may include an input device that is communicatively connected to the trained model generating device or the user environment estimating device 310 and accepts user input. The trained model generating device or the user environment estimating device 310, for example, may include an input unit that accepts user input. As a result, the trained model generating system 1 and so on can accept a user's instruction to start work or a user input specifying where to store the learning data during training processing. The input device or input unit may include, for example, a touch panel or a touch sensor, or a pointing device such as a mouse. The input device or input unit may include physical keys or a voice input device such as a microphone.
The present disclosure can also be implemented as a trained model generating device. Specifically, a trained model generating device of an embodiment includes a controller configured to execute the trained model generating method described above.
The present disclosure can also be implemented as a user environment estimating device. Specifically, a user environment estimating device of an embodiment includes a controller configured to execute the user environment estimating method described above.
The present disclosure can also be implemented as a trained model generating program. Specifically, a trained model generating program of an embodiment is configured to cause a trained model generating device to execute the trained model generating method described above.
The present disclosure can also be implemented as a user environment estimating program. Specifically, a user environment estimating program of an embodiment is configured to cause a user environment estimating device to execute the user environment estimating method described above.
The present disclosure can also be implemented as a trained model generating system. Specifically, a trained model generating system of an embodiment includes a trained model generating device configured to perform the trained model generating method and a user environment estimating device configured to perform the user environment estimating method.
Number | Date | Country | Kind
--- | --- | --- | ---
2021-121958 | Jul 2021 | JP | national

Filing Document | Filing Date | Country | Kind
--- | --- | --- | ---
PCT/JP2022/028834 | 7/26/2022 | WO |