This application is a U.S. National Stage Application of International Application No. PCT/IB2018/055978 filed Aug. 8, 2018, which designates the United States.
The disclosures of published patent documents referenced in this application are hereby incorporated in their entireties by reference into this application in order to more fully describe the state of the art to which this invention pertains.
The present invention relates to a system of operation for remotely operated vehicles (“ROV”), and methods for its use. In particular, the present invention provides a system and method of operation for ROVs leveraging synthetic data to train machine learning models.
Exploration of the last frontier on earth, the sea, is largely driven by the continuing demand for energy resources. Because humans are not able to endure the pressures induced at the depths at which energy reconnaissance occurs, we have become increasingly reliant upon technology such as autonomous vehicles and ROV technology. The future of the exploration of the oceans is only as fast, reliable and safe as the available technology. Thus, new innovations in exploration are needed.
The embodiments disclosed herein provide systems and methods such that synthetic data may be used to train machine learning models that still perform well on real data. Machine learning methods generally perform better as the training dataset grows. However, in most cases, annotated data is very costly to obtain and, therefore, there is strong motivation to use simulated data to train the models, both to reduce costs and to increase the dataset's size.
For example, image segmentation generally requires a human annotator to label each pixel of an image with the corresponding pixel class. This is a very time-consuming task. Moreover, different annotators are likely to follow different policies regarding where the boundary between different objects should be placed, which may lead to inconsistent annotations.
With synthetic data, the object to which each pixel belongs is known. It is even possible to obtain precise annotations for more complex problems that a human annotator cannot provide, such as a depth map or surface normals. Also, in some cases, vast amounts of synthetic data may be generated, even of events that are unlikely to occur in the real world.
However, simulations are usually simplifications of the real world. Synthetic images tend to have simplistic textures and lighting that do not exactly mimic reality. This poses some challenges in training a deep learning model on synthetic data that generalizes to real data.
The embodiments disclosed herein solve this problem by replaying real examples in the virtual world. Then, the embodiments constrain the features extracted from the real and virtual images to be equal. These systems and methods work directly with images. Further, the systems and methods also work on videos, for example by dividing the videos into independent frames.
The aforementioned and other aspects, features and advantages can be better understood from the following detailed description with reference to the accompanying drawings wherein:
The invention provides a system for operating a remotely operated vehicle (ROV) leveraging synthetic data to train machine learning models comprising:
The systems and methods disclosed herein may further have one or more of the following additional features, which may be combined with one another or any other feature described herein unless clearly mutually exclusive.
The simulator module may have access to the video dataset, the telemetry dataset, the 3D model dataset, and the synthetic dataset, and the simulator module may include a ROV's piloting simulator.
The machine learning trainer module may have access to the video dataset and the synthetic dataset.
The model module may include an application using a model trained in the machine learning trainer module and the model module may be connected to at least one ROV.
The simulator module may be operable to replay a mission in a ROV's pilot training simulator.
The simulator module may replay the mission by retrieving ROV telemetry from the telemetry dataset and 3D model data from the 3D model dataset, may denoise the telemetry data, and may generate a synthetic video of the mission.
The synthetic training engine may be operable to automatically annotate the real images.
The synthetic training engine may be operable to automatically annotate the real images for object segmentation, depth map estimation, and classifying whether a specific structure is in the real image.
The synthetic training engine may be operable to replay a mission and annotate the real images.
The synthetic training engine may map both the real images and the synthetic video data into a shared feature representation.
The synthetic training engine may have three training settings: (i) a simreal setting where both simulated data and real data are available, (ii) a sim setting where only simulated data is available, and (iii) a real setting where only real data is available.
The synthetic training engine may train on the three training settings simultaneously, randomly sampling one of the three training settings at each training iteration.
The invention also provides a system for undersea exploration comprising:
The simulator module may have access to the video dataset, the telemetry dataset, the 3D model dataset, and the synthetic dataset and the simulator module may include a ROV's piloting simulator.
The machine learning trainer module may have access to the video dataset and the synthetic dataset.
The model module may include an application using a model trained in the machine learning trainer module and the model module may be connected to at least one ROV.
The simulator module may be operable to replay a real mission in a ROV's pilot training simulator.
The simulator module may replay the mission by retrieving ROV telemetry from the telemetry dataset and 3D model data from the 3D model dataset, denoising the telemetry data, and generating a synthetic video of the mission.
The synthetic training engine may be operable to automatically annotate the real images.
The invention also provides a method of leveraging synthetic data to train machine learning models for remotely operated vehicles (ROV) comprising:
The invention also provides a computer program product, stored on a computer-readable medium, for implementing any method according to the invention as described herein.
As mentioned supra, various features and functionalities are discussed herein by way of examples and embodiments in a context of ROV navigation and machine learning for use in undersea exploration. In describing such examples and exemplary embodiments, specific terminology is employed for the sake of clarity. However, this disclosure is not intended to be limited to the examples and exemplary embodiments discussed herein, nor to the specific terminology utilized in such discussions, and it is to be understood that each specific element includes all technical equivalents that operate in a similar manner.
The following terms are defined as follows:
3D elements; 3D objects—Data defining three-dimensional shapes, obtained by modeling sonar-derived input or user-determined input.
Abstraction; layer of abstraction—A characteristic of executable software, wherein differing data formats are standardized into a common format such that components are made compatible.
Data engine—A collection of modules, according to an embodiment of this invention, which is responsible for at least the acquisition, storing and reporting of data collected over the course of a ROV mission.
Fail state—A state, defined by a user or by a standard, wherein the functionality of the system, according to some embodiments of the invention, has decreased to an unacceptable level.
Luminance threshold—A system-determined value of RGB (Red, Green, Blue) pixel color intensity which defines a visible but transparent state for the images depicted by a digital image output device.
Module—A combination of at least one computer processor, computer memory and custom software that performs one or more defined functions.
Navigation engine—A collection of modules, according to some embodiments of this invention, which is responsible for making the Navigation Interface interactive, and for producing data for displaying on the Navigation Interface.
Positioned; geopositioned; tagged—Having a location defined by the Global Positioning System of satellites and/or acoustic or inertial positioning systems, and optionally having a location defined by a depth below sea level.
ROV—A remotely operated vehicle; often an aquatic vehicle. Although for purposes of convenience and brevity ROVs are described herein, nothing herein is intended to be limiting to only vehicles that require remote operation. Autonomous vehicles and semi-autonomous vehicles are within the scope of this disclosure.
Synthetic training engine—A collection of modules, according to some embodiments, which is responsible for leveraging synthetic data to train machine learning models.
Visualization engine—A collection of modules, according to an embodiment of this invention, which is responsible for producing the displayed aspect of the navigation interface.
System
Hardware and Devices
Referring now to the drawings, wherein like reference numerals designate identical or corresponding parts throughout the several views,
As seen from
In one embodiment of the invention, the hardware for the operating system 3 includes a high-end rack computer that can be easily integrated with any ROV control system. The several software modules that further define the operating system will be described in further detail infra.
With reference to
Functional Modules
Rather than developing a different operating system 3 for each brand and model of ROV 1, the embodiments described herein work by abstraction, such that the disclosed operating system 3 and associated hardware work the same way with all ROVs 1. For example, if one component delivers "$DBS,14.0,10.3" as depth and heading coordinates, and another component delivers "$HD,15.3,16.4" as heading and depth coordinates, these data strings are parsed into their respective variables: Depth1=14.0, Heading1=10.3, Depth2=16.4, Heading2=15.3. This parsing allows both systems to work the same way, regardless of the data format details.
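As a minimal illustration of this parsing, the following sketch (in Python, with hypothetical sentence tags and field orders) normalizes both vendor formats into one common telemetry record; it is not the actual driver implementation.

```python
from dataclasses import dataclass

@dataclass
class Telemetry:
    depth: float
    heading: float

def parse_sentence(sentence: str) -> Telemetry:
    """Normalize vendor-specific data strings into a common telemetry record."""
    tag, *fields = sentence.split(",")
    if tag == "$DBS":   # vendor A: depth first, then heading
        return Telemetry(depth=float(fields[0]), heading=float(fields[1]))
    if tag == "$HD":    # vendor B: heading first, then depth
        return Telemetry(depth=float(fields[1]), heading=float(fields[0]))
    raise ValueError(f"Unknown sentence type: {tag}")

print(parse_sentence("$DBS,14.0,10.3"))  # Telemetry(depth=14.0, heading=10.3)
print(parse_sentence("$HD,15.3,16.4"))   # Telemetry(depth=16.4, heading=15.3)
```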
By developing a layer of abstraction of drivers for communication between the operating system 3 and the ROV hardware, the user 4 is provided with seamless data communication, and is not restricted to using particular ROV models. This abstraction further allows users 4 and systems 3 to communicate and network information between several systems and share information among several undersea projects. The use of a single system also allows for cost reduction in training, maintenance and operation of this system.
Visualization Engine
As seen from
A 3D database module 10 includes advanced 3D rendering technology to allow all the stages of ROV operation to be executed with reference to a visually re-created 3D deep-water environment. This environment is composed of the seabed bathymetry and modeled equipment, e.g., structures of ocean energy devices.
As discussed above, the main sources of image data may be pre-recorded 3D modeling of sonar data (i.e., computer-generated 3D video) and possibly other video data; live sonar data obtained in real time; video data obtained in real time; user-determined 3D elements; and textual or graphical communications intended to be displayed on the user interface screen. The geographical position and depth (or height) of any elements or regions included in the image data are known by GPS positioning, by use of acoustic and/or inertial positioning systems, by reference to maps, and/or by other sensor measurements.
In some embodiments, a virtual video generation module 11 is provided for using the aforementioned stored 3D elements or real-time detected 3D elements to create a virtual video of such 3D elements. The virtual video generation module 11 may work in concert with a synchronization module 12.
The synchronization module 12 aligns the position of the virtual camera of the virtual video with the angle and position of a real camera on an ROV. According to some embodiments the virtual camera defines a field of view for the virtual video, which may extend, for example, between 45 and 144 degrees from a central point of view.
As illustrated in
A superimposition module 13, whose function is additionally diagrammed in
Yet another feature of the superimposition module 13 is that either one or both of the virtual 20 or real videos 21 may be manipulated, based upon a luminance threshold, to be more transparent in areas of lesser interest, thus allowing the corresponding area of the other video feed to show through. According to some embodiments, luminance in the Red-Green-Blue hexadecimal format may be between 0-0-0 and 255-255-255, and preferably between 0-0-0 and 40-40-40. Areas of lesser interest may be selected by a system default, or by the user. The color intensity of images in areas of lesser interest is set at the luminance threshold, and the corresponding region of the other video is set at normal luminance. For the example shown in
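As a minimal sketch of this luminance-based superimposition, assuming both feeds are same-sized RGB frames and that the area of lesser interest is given as a boolean mask (the threshold value, frame sizes and mask here are illustrative assumptions, not the patented implementation):

```python
import numpy as np

def superimpose(virtual: np.ndarray, real: np.ndarray,
                lesser_interest: np.ndarray, threshold: int = 40) -> np.ndarray:
    """Blend virtual over real, making low-interest virtual regions transparent."""
    virtual = virtual.astype(np.float32)
    real = real.astype(np.float32)
    # Per-pixel alpha: low-interest areas of the virtual feed become mostly
    # transparent (capped at the luminance threshold), other areas stay opaque.
    alpha = np.where(lesser_interest[..., None], threshold / 255.0, 1.0)
    blended = alpha * virtual + (1.0 - alpha) * real
    return blended.clip(0, 255).astype(np.uint8)

# Usage with dummy 480x640 frames and a rectangular low-interest region.
virtual = np.full((480, 640, 3), 200, dtype=np.uint8)
real = np.full((480, 640, 3), 80, dtype=np.uint8)
mask = np.zeros((480, 640), dtype=bool)
mask[:240, :] = True            # top half marked as lesser interest
out = superimpose(virtual, real, mask)
```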
Navigation Engine
The on-screen, 2D Navigation Interface for the ROV pilot involves superimposing geopositioned data or technical information on a 2D rendering system. Geopositioning or geo-tagging of data and elements is executed by reference to maps or to global positioning satellites. The resulting Navigation Interface, as seen in
The planning module enables engineers and/or supervisors to plan one or several ROV missions. Referring again to
In some embodiments, procedures 35, including timed procedures (fixed position observation tasks, for example), may be included on the Navigation Interface as text. Given this procedural information, a ROV pilot is enabled to anticipate and complete tasks more accurately. A user may also use the system to define actionable areas. Actionable areas are geopositioned areas in the undersea environment that trigger a system action when the ROV enters them, leaves them, or stays in them longer than a designated time. The triggered action could be an alarm, notification, procedure change, task change, etc.
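A minimal sketch of such an actionable-area check, assuming areas are axis-aligned boxes in geopositioned coordinates and that the caller supplies timestamped ROV positions (the class and field names are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ActionableArea:
    name: str
    min_xyz: Tuple[float, float, float]  # (east, north, depth) lower corner
    max_xyz: Tuple[float, float, float]  # (east, north, depth) upper corner
    max_dwell_s: float                   # trigger if the ROV stays longer than this

    def contains(self, pos) -> bool:
        return all(lo <= p <= hi for p, lo, hi in zip(pos, self.min_xyz, self.max_xyz))

def check_area(area: ActionableArea, prev_pos, pos,
               entered_at: Optional[float], now: float):
    """Return (event, entered_at); event is 'enter', 'leave', 'dwell' or None."""
    was_in, is_in = area.contains(prev_pos), area.contains(pos)
    if not was_in and is_in:
        return "enter", now                        # e.g. raise a notification
    if was_in and not is_in:
        return "leave", None
    if is_in and entered_at is not None and now - entered_at > area.max_dwell_s:
        return "dwell", entered_at                 # e.g. raise an alarm
    return None, entered_at

area = ActionableArea("riser touchdown zone", (0, 0, 100), (50, 50, 150), max_dwell_s=300)
event, entered_at = check_area(area, (60, 10, 120), (10, 10, 120), None, now=0.0)
print(event)  # 'enter'
```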
Referring to
With reference to
Data Engine
The data engine, which mediates the data warehousing and data transfer functions of the invention, incorporates the logging and supervising modules.
The logging module logs or records all information made available by the operating system and saves such data in a central database for future access. The available information may include any or all telemetry, sonar data, 3D models, bathymetry, waypoints, checkpoints, alarms or malfunctions, procedures, operations, and navigation records such as flight path information, positioning and inertial data, etc.
An essential part of any offshore operation is providing critical data to the client after the operation is concluded. After the operation, during the debriefing and reporting stage, the debriefing and reporting module may provide a full 3D scenario or reproduction of the operation. The debriefing and reporting module may provide a report on the planned flight path versus the actual flight path, waypoints, checkpoints, any deviations from the plan, alarms given by the ROV (including details of alarm type, time and location), procedures, etc., ready to be delivered to the client. Accordingly, the operating system is configured to provide four-dimensional (three spatial dimensions plus time) interactive reports for every operation. This enables fast analysis and a comprehensive understanding of operations.
Yet another software element that interacts with the Navigation Interface is the supervisor module. Execution of the supervisor module enables one or more supervisors to view and/or utilize the Navigation Interface, and by extension, any ROV 1 being controlled from the interface. These supervisors need not share the location of the ROV pilot or pilots, but rather may employ the connectivity elements depicted in
Leveraging Synthetic Data to Train Machine Learning Models
According to some embodiments, another feature is the ability to leverage synthetic data to train machine learning models. This is further described and shown with respect to
The ROV 71 may be similar to or the same as, and operate in a similar manner to or the same as, ROV 1 described herein and shown in
ROV 71 may be used in several underwater applications, such as inspection and maintenance of oil and gas structures. The ROVs may contain sensors that obtain real-world coordinates, as well as video camera systems, such as a monocular video camera system.
Simulator module 76 may be operable to replay a mission in a ROV's pilot training simulator. To do so, the simulator module 76 may retrieve the ROV's telemetry and the scene's 3D models from the datasets. The simulator module 76 may denoise the ROV's telemetry signal and then, by placing the simulator's camera at the ROV's position, may generate a synthetic video of the mission. In some embodiments, the simulator module 76 may be used to generate synthetic data from views different from those of the real missions, or even from entirely synthetic scenes. The synthetic training engine 70 can use this pairing of synthetic and real videos to train machine learning ("ML") models.
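A minimal sketch of this replay pipeline follows, assuming hypothetical dataset and renderer interfaces (telemetry_dataset.load, model_dataset.load_scene, renderer.set_camera, renderer.render, etc. are placeholders, not the actual simulator API); only the moving-average denoising is concrete.

```python
import numpy as np

def denoise(telemetry: np.ndarray, window: int = 5) -> np.ndarray:
    """Simple moving-average smoothing of a (T, D) telemetry signal."""
    kernel = np.ones(window) / window
    return np.stack([np.convolve(telemetry[:, d], kernel, mode="same")
                     for d in range(telemetry.shape[1])], axis=1)

def replay_mission(telemetry_dataset, model_dataset, video_dataset, renderer, mission_id):
    """Replay one real mission and return paired (real, synthetic) frames."""
    poses = denoise(telemetry_dataset.load(mission_id))    # (T, pose_dim) ROV telemetry
    scene = model_dataset.load_scene(mission_id)           # 3D models of the worksite
    real_frames = video_dataset.load(mission_id)           # (T, H, W, 3) real video
    pairs = []
    for pose, real_frame in zip(poses, real_frames):
        renderer.set_camera(scene, pose)                   # camera placed at the ROV pose
        pairs.append((real_frame, renderer.render()))      # paired training sample
    return pairs
```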
One technological improvement provided by the embodiments disclosed herein is that the synthetic training engine 70 can automatically annotate the real images for several tasks such as object segmentation, depth map estimation, and even classifying whether a certain structure is in the image.
Another technological improvement is that, after making the model (e.g., model 78 or 83) invariant to the domain of the input, the synthetic training engine 70 can train the ML model 80 with synthetic images and the ML model 80 will work on real images.
In some embodiments, ML model 80 can perform some valuable task, such as the detection of integrity threats in underwater oil and gas structures. The model 83 is placed in a computer (e.g., operating system 3) that is connected to ROV 71 as shown in
The synthetic training engine 70 may replay a real mission in the virtual world and annotate the real images. Thus, the synthetic training engine 70 can train a standard convolutional neural network (“CNN”) g to predict a label y for a given real image x.
This is achieved by minimizing a loss function Lr(y, g(x)). Moreover, the dataset can be augmented by using the synthetic images x̂ to train g. Again, this is achieved by minimizing another loss function Ls(y, g(x̂)). Therefore, the full loss function to be minimized is the sum of Lr and Ls:
Lg = Lr + Ls. (Equation 1)
Even though the synthetic image represents the same information as the real image, the pixel values of the two are still different. This may happen due to, for instance, differences in texture and lighting. Therefore, the naïve approach of mixing real and synthetic images into a single dataset and training a model does not work well.
To overcome this technical problem, the synthetic training engine 70 maps the real and synthetic images to a common feature space. For that, the synthetic training engine 70 creates two models: one that extracts features from real images, fr, and another that extracts features from synthetic images, fs. For a given pair of real and synthetic images (x, x̂) depicting the same scene, the output of both feature extraction models should be the same. The two feature extraction models should be trained to minimize the L2 norm of the difference between the real and synthetic features:
Lf = ∥fr(x) − fs(x̂)∥2. (Equation 2)
Then, the synthetic training engine 70 updates the classifier g to, instead of receiving an image as input, receive the output of the feature extraction models. More formally, for a real image x the output of the classifier is given by g(fr(x)) and, for a synthetic image x̂, the output is given by g(fs(x̂)).
The synthetic training engine 70 can use CNNs as functions fr, fs and g. Then, the three CNNs can be trained jointly by minimizing both Equations 1 and 2. A diagram depicting the described model is shown in
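As a minimal PyTorch sketch of this joint objective, assuming paired real/synthetic images of the same scene (the FeatureCNN stand-in, feature dimension, number of classes and equal loss weighting are illustrative assumptions, not the actual architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureCNN(nn.Module):
    """Small stand-in feature extractor, used for both f_r and f_s."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

f_r, f_s = FeatureCNN(), FeatureCNN()     # real and synthetic feature extractors
g = nn.Linear(128, 10)                    # classifier on the shared feature space
opt = torch.optim.Adam([*f_r.parameters(), *f_s.parameters(), *g.parameters()], lr=1e-4)

def simreal_step(x_real, x_syn, y):
    """One joint update minimizing Equation 1 (Lr + Ls) plus Equation 2 (Lf)."""
    feat_r, feat_s = f_r(x_real), f_s(x_syn)
    L_r = F.cross_entropy(g(feat_r), y)              # loss on real images
    L_s = F.cross_entropy(g(feat_s), y)              # loss on synthetic images
    L_f = (feat_r - feat_s).norm(dim=1).mean()       # L2 distance between features
    loss = L_r + L_s + L_f
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```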
Although previously described with reference to the case of classification, this approach can also be used in the segmentation case by changing the architecture of fr and fs. For instance, the synthetic training engine 70 can use a U-Net-like architecture for the feature extraction models as shown herein with respect to
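As an illustration only, a toy U-Net-like feature extractor that produces per-pixel features (rather than a single feature vector) might look like the following sketch; it is not the exact architecture referenced above.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy two-level encoder-decoder with one skip connection."""
    def __init__(self, feat_channels: int = 32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(32, feat_channels, 3, padding=1), nn.ReLU())

    def forward(self, x):
        e1 = self.enc1(x)                        # full-resolution features (skip)
        e2 = self.enc2(e1)                       # half-resolution features
        d = self.up(e2)                          # upsample back to full resolution
        return self.dec(torch.cat([d, e1], 1))   # concatenate skip, per-pixel features

features = TinyUNet()(torch.randn(1, 3, 64, 64))  # -> shape (1, 32, 64, 64)
```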
Although the solution discussed with respect to
The synthetic training engine 70 may have training settings, such as: (i) simreal, where both simulated and real data are available; (ii) sim, where only simulated data is available; and (iii) real, where only real data is available. The real setting is not mandatory but may improve results. For example, the real setting may be used when an agent's state in a video or image is not known and, therefore, cannot be properly simulated, but a human annotator still labeled the video or image. Otherwise, if the agent's state is available, the data is used in the simreal setting.
The simreal setting was previously discussed. In contrast, when only a single image modality is available, the synthetic training engine 70 may use only one branch of the feature extraction model 100 as shown in
In some embodiments, such as the sim setting and the real setting, yet another modification may be used.
Additionally or alternatively, the synthetic training engine 70 may fix the feature extraction models and only update the parameters of the classifier, so that the feature extraction models do not detect domain-specific features. Therefore, both fs and fr are only trained in the simreal setting.
In some embodiments, instead of training the models sequentially in each of these three training settings, the models are trained on all of them at the same time. At each training step, the synthetic training engine 70 randomly samples one of these three training settings t ∈ {1, 2, 3}. Then, a sample is drawn from the dataset corresponding to the training setting t and the models' parameters are updated accordingly.
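Continuing the earlier sketch (reusing the hypothetical f_r, f_s, g and simreal_step defined there, with hypothetical data loaders and step count), the random-sampling procedure might look like the following; in the sim and real settings only the classifier g is updated:

```python
import random
import torch
import torch.nn.functional as F

opt_g = torch.optim.Adam(g.parameters(), lr=1e-4)   # classifier-only optimizer

def single_domain_step(extractor, x, y):
    """sim/real step: features are kept fixed, only the classifier g is trained."""
    with torch.no_grad():                            # do not update f_r / f_s here
        feat = extractor(x)
    loss = F.cross_entropy(g(feat), y)
    opt_g.zero_grad(); loss.backward(); opt_g.step()
    return loss.item()

for step in range(num_steps):                        # num_steps: assumed training budget
    setting = random.choice(["simreal", "sim", "real"])   # sample t at each iteration
    if setting == "simreal":
        x_real, x_syn, y = next(simreal_loader)      # paired real/synthetic batch
        simreal_step(x_real, x_syn, y)
    elif setting == "sim":
        x_syn, y = next(sim_loader)                  # synthetic-only batch
        single_domain_step(f_s, x_syn, y)
    else:
        x_real, y = next(real_loader)                # real-only batch
        single_domain_step(f_r, x_real, y)
```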
This random sampling training procedure avoids known problems with neural networks, such as catastrophic forgetting (also called catastrophic interference). For instance, if the synthetic training engine 70 started training the model in the simreal setting and then moved on to the real setting, after some time the model would start to become worse at generalizing from synthetic to real data.

Thus, there has been shown and described a system and method of operation for ROVs leveraging synthetic data to train machine learning models. The method and system are not limited to any particular hardware or software configuration. The many variations, modifications and alternative applications of the invention that would be apparent to those skilled in the art, and that do not depart from the scope of the invention, are deemed to be covered by the invention.