The subject matter described herein relates in general to driving logs and, more particularly, to the labeling of driving logs.
Labeling a dataset may involve a human manually reviewing an image or video to identify things in the data (e.g., cars, pedestrians, etc.). The identified thing(s) can be labeled by the human reviewer, such as by enclosing them in a virtual bounding box, performing a pixel-by-pixel identification of the thing(s), and/or tagging the thing(s). A user interface tool can enable a human reviewer to perform such labeling. The labeled dataset can be useful for training a machine learning algorithm.
In one respect, the subject matter presented herein relates to a method of automatically labeling driving logs. The method includes receiving one or more unlabeled real-world driving logs. The one or more unlabeled real-world driving logs can include data captured by one or more vehicle sensors. The method can include automatically labeling the one or more unlabeled real-world driving logs to generate one or more labeled real-world driving logs. The automatic labeling can include analysis-by-synthesis on the one or more unlabeled real-world driving logs to generate one or more simulated driving logs. The one or more simulated driving logs include reconstructed driving scenes or portions thereof. The automatic labeling can include simulation-to-real automatic labeling on the one or more simulated driving logs and the one or more unlabeled real-world driving logs to generate one or more labeled real-world driving logs. The method can further include storing the one or more labeled real-world driving logs in one or more data stores.
In another respect, the subject matter presented herein relates to a system for automatically labeling driving logs. The system can include one or more processors. The one or more processors can be programmed to initiate executable operations. The executable operations can include receiving one or more unlabeled real-world driving logs. The one or more unlabeled real-world driving logs can include data captured by one or more vehicle sensors. The executable operations can include automatically labeling the one or more unlabeled real-world driving logs to generate one or more labeled real-world driving logs. The automatic labeling can include analysis-by-synthesis on the one or more unlabeled real-world driving logs to generate one or more simulated driving logs. The one or more simulated driving logs can include reconstructed driving scenes or portions thereof. The automatic labeling can include simulation-to-real automatic labeling on the one or more simulated driving logs and the one or more unlabeled real-world driving logs to generate one or more labeled real-world driving logs. The executable operations can include storing the one or more labeled real-world driving logs in one or more data stores.
Manually labeling vehicle data is inefficient and does not scale to large vehicle fleets. Further, acquiring labeled data can be a significant bottleneck in the development of machine learning models that are accurate and efficient enough to enable safety-critical applications, such as automated driving. Accordingly, arrangements described herein are directed to automatically labeling vehicle sensor data (e.g., video, images, other sensor data, etc.) recorded during a human and/or autonomous driving session and stored in driving logs. “Labels” include supervisory signals required to develop machine learning algorithms. Such machine learning algorithms can be for various driving systems, including, for example, Automated Driver Assistance and Autonomous Driving Systems. Labels can include, but are not limited to, localized semantic information such as bounding boxes or segmentation masks around vehicles, pedestrians, and/or other objects perceived by vehicle sensors.
Arrangements described herein can automate the labeling process using simulation tools that use systems and techniques based on analysis-by-synthesis and/or unsupervised domain adaptation. Arrangements described herein can automatically generate labels for real-world driving logs by simulating real-world driving logs or portions thereof using analysis-by-synthesis techniques. Arrangements described herein can use one or more simulators to automatically generate labels for the simulated driving logs. Arrangements described herein can train a machine learning model, including, for example, deep neural networks, using the simulated driving logs and labels. Arrangements described herein can use the trained machine learning model to predict labels for the real-world driving logs.
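As a non-limiting illustration, the four stages described above (simulate, label in simulation, train, predict on real data) can be sketched in a few lines of Python. Everything in this sketch is a hypothetical stand-in chosen for brevity: a frame is reduced to a single number, the "simulator" assigns ground truth with a simple threshold rule, and the "machine learning model" is a nearest-centroid classifier rather than a deep neural network. The names `Frame`, `synthesize`, `train`, `predict`, and `auto_label` are assumptions and do not come from the arrangements described herein.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Frame:
    feature: float                 # toy stand-in for sensor data such as an image
    label: Optional[str] = None    # None means the frame is unlabeled


def synthesize(real: List[Frame], rng: random.Random, copies: int = 5) -> List[Frame]:
    """Toy analysis-by-synthesis: re-create each real frame several times.

    The simulator knows what it rendered, so every simulated frame comes
    with a ground truth label (assumed here to follow a threshold rule).
    """
    simulated = []
    for frame in real:
        for _ in range(copies):
            feature = frame.feature + rng.uniform(-0.1, 0.1)
            label = "vehicle" if feature >= 0.5 else "pedestrian"
            simulated.append(Frame(feature, label))
    return simulated


def train(simulated: List[Frame]) -> Dict[str, float]:
    """Fit a nearest-centroid model: one mean feature value per label."""
    grouped: Dict[str, List[float]] = {}
    for frame in simulated:
        grouped.setdefault(frame.label, []).append(frame.feature)
    return {label: sum(values) / len(values) for label, values in grouped.items()}


def predict(model: Dict[str, float], frame: Frame) -> str:
    """Return the label whose centroid is closest to the frame's feature."""
    return min(model, key=lambda label: abs(model[label] - frame.feature))


def auto_label(real: List[Frame], seed: int = 0) -> List[Frame]:
    """Simulate, train on simulated labels, then predict labels for real frames."""
    model = train(synthesize(real, random.Random(seed)))
    return [Frame(frame.feature, predict(model, frame)) for frame in real]
```

In this sketch, calling `auto_label([Frame(0.1), Frame(0.9)])` returns new frames that carry predicted labels while leaving the unlabeled inputs unchanged. A practical arrangement would replace each stand-in with the corresponding simulator and model.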
Detailed embodiments are disclosed herein; however, it is to be understood that the disclosed embodiments are intended only as examples. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a basis for the claims and as a representative basis for teaching one skilled in the art to variously employ the aspects herein in virtually any appropriately detailed structure. Further, the terms and phrases used herein are not intended to be limiting but rather to provide an understandable description of possible implementations. Various embodiments are shown in
It will be appreciated that for simplicity and clarity of illustration, where appropriate, reference numerals have been repeated among the different figures to indicate corresponding or analogous elements. In addition, numerous specific details are set forth in order to provide a thorough understanding of the embodiments described herein. However, it will be understood by those of ordinary skill in the art that the embodiments described herein can be practiced without these specific details.
Referring to
Some of the possible elements of the system 100 are shown in
The various elements of the system 100 can be communicatively linked to each other (or any combination thereof) through one or more communication networks 190. As used herein, the term “communicatively linked” can include direct or indirect connections through a communication channel or pathway or another component or system. A “communication network” means one or more components designed to transmit and/or receive information from one source to another. The data store(s) 140 and/or one or more of the elements of the system 100 can include and/or execute suitable communication software, which enables the various elements to communicate with each other through the communication network and perform the functions disclosed herein.
The one or more communication networks 190 can be implemented as, or include, without limitation, a wide area network (WAN), a local area network (LAN), the Public Switched Telephone Network (PSTN), a wireless network, a mobile network, a Virtual Private Network (VPN), the Internet, and/or one or more intranets. The one or more communication networks 190 further can be implemented as or include one or more wireless networks, whether short range (e.g., a local wireless network built using Bluetooth or one of the IEEE 802 wireless communication protocols, e.g., 802.11a/b/g/i, 802.15, 802.16, 802.20, Wi-Fi Protected Access (WPA), or WPA2) or long range (e.g., a mobile, cellular, and/or satellite-based wireless network; GSM, TDMA, CDMA, WCDMA networks or the like). The communication network(s) 190 can include wired communication links and/or wireless communication links. The communication network(s) 190 can include any combination of the above networks and/or other types of networks.
Each of the above noted elements of the system 100 will be described in turn below. The system 100 can include one or more vehicles 110. “Vehicle” means any form of motorized transport, now known or later developed. Non-limiting examples of the vehicle(s) 110 include automobiles, watercraft, aircraft, spacecraft, or any other form of motorized transport. The vehicle(s) 110 may be operated manually by a human driver, semi-autonomously by a mix of manual inputs from a human driver and autonomous inputs by one or more vehicle computers, or fully autonomously by one or more vehicle computers. In at least some instances, the vehicle(s) 110 can be configured to switch between two or more of these operational modes.
The vehicle(s) 110 can include one or more processors, one or more data stores, and one or more sensors. “Sensor” means any device, component and/or system that can detect, determine, assess, monitor, measure, quantify and/or sense something. The one or more sensors can detect, determine, assess, monitor, measure, quantify and/or sense in real-time. As used herein, the term “real-time” means a level of processing responsiveness that a user or system senses as sufficiently immediate for a particular process or determination to be made, or that enables the processor to keep up with some external process.
In arrangements in which the vehicle(s) 110 includes a plurality of sensors, the sensors can work independently from each other. Alternatively, two or more of the sensors can work in combination with each other. In such case, the two or more sensors can form a sensor network.
The vehicle(s) 110 can include any suitable type of sensor. Various examples of different types of sensors will be described herein. However, it will be understood that the embodiments are not limited to the particular sensors described.
The vehicle(s) 110 can include one or more vehicle sensors. The vehicle sensor(s) can detect, determine, assess, monitor, measure, quantify and/or sense information about the vehicle(s) 110 (e.g., position, orientation, speed, driver/computer inputs, settings, etc.). For example, in one or more arrangements, the vehicle sensor(s) can include accelerometers, gyroscopes, inertial measurement unit (IMU) sensors, speedometers, yaw rate sensors, pedal position/pressure sensors, steering wheel position sensors, engine sensors, and/or other suitable sensors. In one or more arrangements, the vehicle sensor(s) can include a global navigation satellite system (GNSS), a global positioning system (GPS), a navigation system, and/or other suitable sensors.
The vehicle(s) 110 can include one or more environment sensors configured to acquire, detect, determine, assess, monitor, measure, quantify and/or sense driving environment data. “Driving environment data” includes any data or information about the external environment in which a vehicle is located or one or more portions thereof. As an example, in one or more arrangements, the environment sensors can include one or more cameras, one or more radar sensors, one or more LIDAR sensors, one or more sonar sensors, and/or one or more ranging sensors. The environment sensors can detect, determine, assess, monitor, measure, quantify and/or sense, directly or indirectly, the presence of one or more obstacles in the external environment of the vehicle(s) 110 and information about such obstacles (e.g., position, distance, speed, etc.).
Driving environment data acquired by one or more sensors of the vehicle 110 can be stored as a driving log 120. Such a driving log may be referred to herein as a real-world driving log, indicating that the driving log includes data acquired in the real world (as opposed to simulation data). The driving logs 120 can include raw sensor data without any associated labels. The driving logs 120 can be sent to one or more other elements of the system 100.
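By way of a non-limiting example, a driving log holding raw sensor data without labels could be represented with a simple record structure such as the one sketched below in Python. The class and field names (`SensorRecord`, `DrivingLog`, `timestamp_s`, `source`, and so on) are hypothetical and are used only to make the distinction between unlabeled and labeled logs concrete.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SensorRecord:
    timestamp_s: float                              # capture time in seconds
    sensor: str                                     # e.g., "camera_front" or "lidar"
    data: Any                                       # raw sensor payload
    labels: Optional[List[Dict[str, Any]]] = None   # None means "not labeled yet"


@dataclass
class DrivingLog:
    log_id: str
    source: str                                     # "real" or "simulated"
    records: List[SensorRecord] = field(default_factory=list)

    @property
    def is_labeled(self) -> bool:
        """True only when the log is non-empty and every record has labels."""
        return bool(self.records) and all(r.labels is not None for r in self.records)
```

In this sketch, an empty label list is treated as "labeled, with no objects found", which is distinct from `None`, meaning that no labeling has been attempted.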
The system 100 can include one or more processors 130. “Processor” means any component or group of components that are configured to execute any of the processes described herein or any form of instructions to carry out such processes or cause such processes to be performed. The processor(s) 130 may be implemented with one or more general-purpose and/or one or more special-purpose processors. Examples of suitable processors include microprocessors, microcontrollers, DSP processors, and other circuitry that can execute software. Further examples of suitable processors include, but are not limited to, a central processing unit (CPU), an array processor, a vector processor, a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic array (PLA), an application specific integrated circuit (ASIC), programmable logic circuitry, and a controller. The processor(s) 130 can include at least one hardware circuit (e.g., an integrated circuit) configured to carry out instructions contained in program code. In arrangements in which there is a plurality of processors 130, such processors can work independently from each other or one or more processors can work in combination with each other.
The system 100 can include one or more data stores 140 for storing one or more types of data. The data store 140 can include volatile and/or non-volatile memory. Examples of suitable data stores 140 include RAM (Random Access Memory), flash memory, ROM (Read Only Memory), PROM (Programmable Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), registers, magnetic disks, optical disks, hard drives, or any other suitable storage medium, or any combination thereof. The data store 140 can be a component of the processor(s) 130, or the data store 140 can be operatively connected to the processor(s) 130 for use thereby. The term “operatively connected,” as used throughout this description, can include direct or indirect connections, including connections without direct physical contact.
In one or more arrangements, the system 100 can include map data. The map data can be stored in one or more of the data stores 140. The map data can include maps of one or more geographic areas. In some instances, the map data can include information or data on roads, traffic control devices, road markings, structures, features, and/or landmarks in the one or more geographic areas. The map data can be in any suitable form. In some instances, the map data can include aerial views of an area. In some instances, the map data can include ground views of an area, including 360 degree ground views. The map data can include measurements, dimensions, distances, and/or information for one or more items included in the map data and/or relative to other items included in the map data. The map data can include a digital map with information about road geometry. In one or more arrangements, the map data can include information about the ground, terrain, roads, surfaces, and/or other features of one or more geographic areas. The map data can include elevation data in the one or more geographic areas. The map data can define one or more ground surfaces, which can include paved roads, unpaved roads, land, and other things that define a ground surface. The map data can be high quality and/or highly detailed.
The system 100 can include one or more modules, at least some of which will be described herein. The modules can be implemented as computer readable program code that, when executed by a processor, implement one or more of the various processes described herein. One or more of the modules can be a component of the processor(s) 130, or one or more of the modules can be executed on and/or distributed among other processing systems to which the processor(s) 130 is operatively connected. One or more of the modules can be stored on one or more data stores 140. The modules can include instructions (e.g., program logic) executable by one or more processor(s) 130. Alternatively or in addition, one or more data stores 140 may contain such instructions.
In one or more arrangements, one or more of the modules described herein can include artificial or computational intelligence elements, e.g., neural network, fuzzy logic or other machine learning algorithms. Further, in one or more arrangements, one or more of the modules can be distributed among a plurality of the modules described herein. In one or more arrangements, two or more of the modules described herein or portions thereof can be combined into a single module.
The system 100 can include one or more auto-labeling modules 150.
The auto-labeling module(s) 150 can generate labels for the unlabeled driving logs using, at least in part, analysis-by-synthesis. Thus, the system 100 can include one or more analysis-by-synthesis modules 152. The analysis-by-synthesis module(s) 152 can include an analysis-by-synthesis engine coupled with a transfer mechanism. The analysis-by-synthesis module(s) 152 can be configured to perform any analysis-by-synthesis technique, now known or later developed. Generally, analysis-by-synthesis refers to a recognition process in which hypotheses are formulated and compared with input data until one of the hypotheses produces a match. The analysis-by-synthesis module(s) 152 can use a simulator (e.g., Unreal Engine available from Epic Games, Inc., Cary, N.C.). The simulator can be a combination of software and hardware configured to create a synthetic or simulated driving log of the external environment of the vehicle. The simulator can create, for example, a simulated image of a cityscape with objects therein (e.g., buildings, roads, people, vehicles, green spaces, etc.). The simulator can also create a set of labels (e.g., semantic segmentation labels) and/or privileged information (e.g., depth information, instance segmentation information, object detection information, optical flow information, etc.) that correspond to the simulated image or data. The simulator may internally compute the relationships between simulated objects depicted within the simulated image. The simulator may use real-world data acquired by one or more vehicle sensors.
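As a non-limiting illustration of the general idea that hypotheses are formulated and compared with input data until one produces a match, the following Python sketch applies an exhaustive hypothesis search to a toy one-dimensional "image". The renderer, the mismatch measure, and the function names are all assumptions made for illustration; an actual analysis-by-synthesis engine would render full scenes with a simulator and would typically use a more efficient search.

```python
from typing import List, Optional, Tuple


def render(position: int, width: int, size: int) -> List[int]:
    """Toy renderer: a 1-D 'image' with one bright object on a dark background."""
    return [1 if position <= i < position + width else 0 for i in range(size)]


def mismatch(a: List[int], b: List[int]) -> int:
    """Number of pixels at which two equally sized images differ."""
    return sum(x != y for x, y in zip(a, b))


def analyze_by_synthesis(observed: List[int]) -> Tuple[int, int]:
    """Return the (position, width) hypothesis whose rendering best matches."""
    size = len(observed)
    best: Optional[Tuple[int, int]] = None
    best_error: Optional[int] = None
    for width in range(1, size + 1):
        for position in range(0, size - width + 1):
            error = mismatch(render(position, width, size), observed)
            if best_error is None or error < best_error:
                best, best_error = (position, width), error
            if error == 0:
                return (position, width)   # exact match, stop searching
    return best
```

Because the winning hypothesis was rendered by the toy simulator, its parameters (here, position and width) are known exactly, which is what allows labels to be read off the reconstruction rather than annotated by hand.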
The analysis-by-synthesis module(s) 152 can also use any suitable computer vision and computer graphics techniques, now known or later developed. Non-limiting examples of computer vision and computer graphics techniques include structure-from-motion algorithms, CAD model to 2D image fitting, de-rendering, and other techniques such as those described in U.S. patent application Ser. No. 15/893,864 and U.S. Pat. No. 10,019,652, which are incorporated herein by reference.
Using real-world unlabeled driving logs, the analysis-by-synthesis module(s) 152 can generate a set of diverse simulated logs. The simulated driving logs can contain photo-realistic renderings of the observed real-world scenario along with ground truth labels computed programmatically. The analysis-by-synthesis module(s) 152 can use the simulator and computer vision and computer graphics techniques together to reconstruct the recorded driving scenes, or relevant parts thereof. The ground truth labels computed by the simulator (e.g., using its rendering and physics engine) can include semantic segmentation, depth, 2D and 3D object detection bounding boxes, object class labels, and other metadata that is directly accessible within the simulation engine. The resulting set of simulated logs along with their ground truth labels (“set of simulated tlogs” in
The auto-labeling module(s) 150 can generate labels for the unlabeled driving logs using, at least in part, simulation-to-real techniques. Thus, the system 100 can include one or more simulation-to-real modules 154. The simulation-to-real module(s) 154 can use any suitable technique for simulation-to-real transfer, now known or later developed. Referring to
As with any unsupervised procedure, the output labels should be of high quality, and the introduction of noisy labels during the process should be avoided. To mitigate this potential side effect, a predictive post-processing step can be incorporated to enhance the overall quality of the automatically generated labels inferred from the output of the learned machine learning model. Accordingly, referring to
In some instances, the system 100 can include one or more external data sources 180. In one or more arrangements, the one or more external data sources 180 can include information about the external environment of the vehicle(s) 110. Examples of such other information include the time of day, weather conditions, road conditions, road construction, maps, etc. Information from the one or more external data sources 180, such as a remote server or data store, can be accessed by one or more elements of the system 100. Such information can be useful in various situations.
Now that the various potential systems, devices, elements and/or components of the system 100 for labeling visual data have been described, various methods, including possible steps of such methods, will now be described. The methods described may be applicable to the arrangements described above, but it is understood that the methods can be carried out with other suitable systems and arrangements. Moreover, the methods may include other steps that are not shown here, and in fact, the methods are not limited to including every step shown. The blocks that are illustrated here as part of the methods are not limited to the particular chronological order. Indeed, some of the blocks may be performed in a different order than what is shown and/or at least some of the blocks shown can occur simultaneously.
Referring to
Labels for these driving logs can be programmatically estimated by the auto-labeling module(s) 150. The driving logs and their automatically generated labels (“set of auto-labeled tlogs”) can be stored and indexed using standard database techniques in a data store, such as a cloud-based data store (e.g., Labeled Data Mart (LDM) 520), for future retrieval and training purposes as described herein. In one or more arrangements, the LDM 520 can be configured to manage all the labels available for any particular driving log (simulated and real-world logs).
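As one non-limiting example of storing and indexing labels using standard database techniques, the following Python sketch uses the standard-library `sqlite3` module. The table layout, the index, and the function names are assumptions made for illustration only and are not a description of the Labeled Data Mart 520 itself; a cloud-based data store would likely use a different backend.

```python
import json
import sqlite3
from typing import Any, Dict, List, Tuple


def open_label_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a label store with an index on (log_id, frame)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS labels ("
        "log_id TEXT NOT NULL, source TEXT NOT NULL, "
        "frame INTEGER NOT NULL, label_json TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_labels_log ON labels (log_id, frame)")
    return conn


def store_labels(conn: sqlite3.Connection, log_id: str, source: str,
                 frame_labels: List[Tuple[int, Dict[str, Any]]]) -> None:
    """Insert one row per (frame, label) pair; source is 'real' or 'simulated'."""
    rows = [(log_id, source, frame, json.dumps(label)) for frame, label in frame_labels]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO labels VALUES (?, ?, ?, ?)", rows)


def labels_for_log(conn: sqlite3.Connection, log_id: str) -> List[Tuple[int, Dict[str, Any]]]:
    """Retrieve all labels for one driving log, ordered by frame number."""
    cursor = conn.execute(
        "SELECT frame, label_json FROM labels WHERE log_id = ? ORDER BY frame",
        (log_id,),
    )
    return [(frame, json.loads(text)) for frame, text in cursor]
```

Keeping the `source` column alongside each label is one simple way to let a single store hold labels for both simulated and real-world logs.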
Referring now to
At block 620, the unlabeled real-world driving log(s) can be automatically labeled to generate one or more labeled real-world driving logs. The automatic labeling can include analysis-by-synthesis on the one or more unlabeled real-world driving logs to generate one or more simulated driving logs. The automatic labeling can further include simulation-to-real automatic labeling on the one or more simulated driving logs and the one or more unlabeled real-world driving logs to generate one or more labeled real-world driving logs. The labeling can be performed by, for example, the auto-labeling module(s) 150, the analysis-by-synthesis module(s) 152, the simulation-to-real module(s) 154, and/or the processor(s) 130. The method 600 can continue to block 630.
At block 630, the one or more labeled real-world driving logs can be stored in one or more data stores of labeled driving logs. The method 600 can end. Alternatively, the method 600 can return to block 610 or to some other block. The method 600 can be repeated at any suitable point, such as at a suitable time or upon the occurrence of any suitable event or condition.
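The flow of the method 600 can be sketched, in a non-limiting way, as three pluggable steps in Python. The sketch assumes that block 610 corresponds to receiving the unlabeled real-world driving logs; the function name `run_method_600`, the dictionary-based log representation, and the callables passed in are hypothetical placeholders for whatever modules perform each block.

```python
from typing import Callable, Dict, Iterable, List

Log = Dict[str, object]   # hypothetical minimal representation of a driving log


def run_method_600(receive: Callable[[], Iterable[Log]],
                   auto_label: Callable[[Log], Log],
                   store: Callable[[Log], None]) -> int:
    """Run one pass of receive -> label -> store and return the number of logs handled."""
    unlabeled: List[Log] = list(receive())              # block 610 (assumed): receive logs
    labeled = [auto_label(log) for log in unlabeled]    # block 620: automatic labeling
    for log in labeled:                                 # block 630: store labeled logs
        store(log)
    return len(labeled)
```

Because each step is passed in as a callable, the same loop can be repeated at any suitable time or on any suitable event, and individual steps can be swapped without changing the overall flow.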
It should be noted that the labeled real-world driving logs can be used for various purposes. For instance, the labeled real-world driving logs can be used in connection with machine learning for training, validation, evaluation, and/or model management purposes. Referring to
Referring to
In some arrangements, a portion of the real-world driving logs can be held out for validating and testing the performance of the unsupervised trained machine learning model. One example of a manual quality assurance process is shown in
It will be appreciated that arrangements described herein can provide numerous benefits, including one or more of the benefits mentioned herein. For example, arrangements described herein can create a more streamlined and/or automated approach to labeling of driving logs. Arrangements described herein can enable the development of high performance algorithms by training supervised machine learning models on very large amounts of automatically labeled data. Arrangements described herein can reduce the need for human labeling of the driving logs. Arrangements described herein can continue to scale the various labeling procedures more efficiently to build a pipeline of labeled images to be processed more quickly and categorically. Arrangements described herein enable a scalable procedure to learn machine learning models using large volumes of unlabeled data collected by a fleet of vehicles, with little to no added cost of acquiring ground truth labels. Arrangements described herein can result in a high quality data set that can be used for various purposes, such as testing vehicle driving software.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
The systems, components and/or processes described above can be realized in hardware or a combination of hardware and software and can be realized in a centralized fashion in one processing system or in a distributed fashion where different elements are spread across several interconnected processing systems. Any kind of processing system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software can be a processing system with computer-usable program code that, when being loaded and executed, controls the processing system such that it carries out the methods described herein. The systems, components and/or processes also can be embedded in a computer-readable storage, such as a computer program product or other data programs storage device, readable by a machine, tangibly embodying a program of instructions executable by the machine to perform methods and processes described herein. These elements also can be embedded in an application product which comprises all the features enabling the implementation of the methods described herein and, which when loaded in a processing system, is able to carry out these methods.
Furthermore, arrangements described herein may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied or embedded, e.g., stored, thereon. Any combination of one or more computer-readable media may be utilized. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The phrase “computer-readable storage medium” means a non-transitory storage medium. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk drive (HDD), a solid state drive (SSD), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber, cable, RF, etc., or any suitable combination of the foregoing. Computer program code for carrying out operations for aspects of the present arrangements may be written in any combination of one or more programming languages, including an object oriented programming language such as Java™, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
The terms “a” and “an,” as used herein, are defined as one or more than one. The term “plurality,” as used herein, is defined as two or more than two. The term “another,” as used herein, is defined as at least a second or more. The terms “including” and/or “having,” as used herein, are defined as comprising (i.e. open language). The phrase “at least one of . . . and . . . ” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. As an example, the phrase “at least one of A, B and C” includes A only, B only, C only, or any combination thereof (e.g., AB, AC, BC or ABC).
Aspects herein can be embodied in other forms without departing from the spirit or essential attributes thereof. Accordingly, reference should be made to the following claims, rather than to the foregoing specification, as indicating the scope of the invention.