Certain aspects of the present disclosure generally relate to systems and methods of driving monitoring. Certain aspects are directed to systems and methods for improving the operation of a vehicle in the vicinity of traffic control devices, such as traffic signs. In addition, certain aspects are particularly directed to systems and methods for constructing a geolocated database of permissible and prohibited U-Turn maneuvers. In addition, certain aspects are particularly directed to improving U-Turn driving behavior monitoring systems and methods, including by using a geolocated database.
Internet-of-things (IOT) applications may include embedded machine vision for intelligent driver or driving monitoring systems (IDMS), advanced driving assistance systems (ADAS), autonomous driving systems, camera-based surveillance systems, smart cities, and the like. A user of IOT systems in a driving context may desire accurate scene comprehension and driving behavior recognition in the vicinity of substantially all traffic control devices. Scene comprehension may be considered a form of image recognition that may, in some cases, involve recognition of objects within an image.
Real-world deployments of such systems may face challenges, however, due to several factors. One class of challenges relates to variability in the placement of traffic control devices, such as U-turn signs, relative to driving locations where the sign is intended to apply. Another class of challenges relates to the use of subtle distinctions in road signage, wherein small differences in the visual appearance of two signs may correspond to very different or even opposite meanings for traffic control.
The present disclosure is directed to methods that may overcome challenges associated with deploying driving systems, which may include IDMS, ADAS, or autonomous driving systems that are expected to work in all geographical locations in a country.
Certain aspects of the present disclosure generally relate to providing, implementing, and using a method of creating and/or using a road location or vehicle trajectory and traffic sign database. Certain aspects are directed to quickly and accurately identifying challenging real-world variations in road scenes, wherein a detected traffic control device may correspond to more than one traffic control device of a visually similar family of traffic control devices. A family of traffic signs that control U-turn driving behaviors, for example, may include a traffic sign that indicates that a U-turn is permissible and a traffic sign that indicates that a U-turn is not permissible, and the two signs may appear similar to each other in certain scenarios. Certain aspects are further directed to constructing and/or using databases of observed traffic control devices, so that the precision and recall of the driving system may improve on the increasingly challenging cases that one encounters in the long-tail distribution of traffic phenomena.
Certain aspects of the present disclosure provide a method. The method generally includes detecting, by at least one processor of a computing device, a traffic control device in an image, wherein the image was captured by a camera, and wherein the camera is mounted on or in a vehicle; determining, by the at least one processor, a location of the vehicle; querying, by the at least one processor, a database to produce a query result, wherein the query is based on the location of the vehicle; and determining, by the at least one processor, whether the traffic control device does not apply to the vehicle or the traffic control device does apply to the vehicle based on the query result.
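The method just described may be sketched in code. The following is a minimal illustrative sketch and not the disclosed implementation; the function names, the coordinate-keyed lookup, and the in-memory database contents are assumptions made for the example.

```python
# Hypothetical sketch of the claimed method: detect a sign, locate the
# vehicle, query a geolocated database, and decide applicability.
# The keys, values, and coordinates below are illustrative assumptions.
SIGN_DATABASE = {
    # (lat, lon) -> whether a detected "No U-Turn" sign applies to
    # vehicles observing it from that location
    (37.7749, -122.4194): {"no_u_turn": True},
    (37.7750, -122.4200): {"no_u_turn": False},  # e.g., sign faces cross traffic
}

def query_database(location):
    """Query the database with the vehicle location; return the record or None."""
    return SIGN_DATABASE.get(location)

def sign_applies(detected_sign, location):
    """Return True if the detected sign applies to the vehicle, False if it
    does not, or None when the database has no entry for the location."""
    record = query_database(location)
    if record is None:
        return None
    return record.get(detected_sign)

result = sign_applies("no_u_turn", (37.7750, -122.4200))  # sign detected but not applicable
```

In a deployed system, the query would more plausibly be keyed on road segment identifiers or a spatial index rather than exact coordinates, as discussed later in the disclosure.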
Certain aspects of the present disclosure provide a computer program product for classifying video data captured by a camera in a vehicle. The computer program product generally includes a non-transitory computer-readable medium having program code recorded thereon, the program code comprising program code to detect a traffic control device in an image, wherein the image was captured by a camera, and wherein the camera is mounted on or in the vehicle; determine a location of the vehicle; query a database to produce a query result, wherein the query is based on the location of the vehicle; and determine whether the traffic control device does not apply to the vehicle or the traffic control device does apply to the vehicle based on the query result.
The detailed description set forth below, in connection with the appended drawings, is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of the various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. In some instances, well-known structures and components are shown in block diagram form in order to avoid obscuring such concepts.
Based on the teachings, one skilled in the art should appreciate that the scope of the disclosure is intended to cover any aspect of the disclosure, whether implemented independently of or combined with any other aspect of the disclosure. For example, an apparatus may be implemented or a method may be practiced using any number of the aspects set forth. In addition, the scope of the disclosure is intended to cover such an apparatus or method practiced using other structure, functionality, or structure and functionality in addition to or other than the various aspects of the disclosure set forth. It should be understood that any aspect of the disclosure disclosed may be embodied by one or more elements of a claim.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
Although particular aspects are described herein, many variations and permutations of these aspects fall within the scope of the disclosure. Although some benefits and advantages of the preferred aspects are mentioned, the scope of the disclosure is not intended to be limited to particular benefits, uses or objectives. Rather, aspects of the disclosure are intended to be broadly applicable to different technologies, system configurations, networks and protocols, some of which are illustrated by way of example in the figures and in the following description of the preferred aspects. The detailed description and drawings are merely illustrative of the disclosure rather than limiting, the scope of the disclosure being defined by the appended claims and equivalents thereof.
Certain aspects of the present disclosure are directed to searching visual data, such as video streams and images, captured at one or more devices. The number of devices may be denoted as N. The value of N may range from one (a single device) to billions. Each device may be capturing one or more video streams, may have captured one or more video streams, and/or may capture one or more video streams in the future. It may be desired to search the video streams in a number of these devices. In one example, a user may desire to search all of the captured video streams in all of the N devices. In another example, a user may desire to search a portion of the video captured in a portion of the N devices. A user may desire, for example, to search a representative sample of devices in an identified geographical area. Alternatively, or in addition, the user may desire to search the video captured around an identified time.
A search query may include an indication of specific objects, objects having certain attributes, and/or a sequence of events. Several systems, devices, and methods of detecting objects and events (including systems and methods of detecting safe and unsafe driving behaviors as may be relevant to an IDMS system) are contemplated, as described in PCT application PCT/US17/13062, entitled “DRIVER BEHAVIOR MONITORING”, filed 11 Jan. 2017, which is incorporated herein by reference in its entirety.
Examples of distributed video search may include an intelligent driver monitoring system (IDMS), where each vehicle has an IDMS device and the user/system may search and retrieve useful videos for constructing a database in support of driver safety functionality. An example of useful videos may be videos in which traffic control devices may appear partially or fully obscured, such as videos of road scenes with visible snow, videos in which there are visible pedestrians, videos in which there are visible traffic lights, or videos corresponding to certain patterns of data on non-visual sensors. Examples of patterns of data in non-visual sensors may include inertial sensor data corresponding to a U-Turn maneuver by a vehicle, which is a change of direction on a roadway. A database may be constructed based on detected U-Turn maneuvers, detected U-Turn signage, and combinations thereof. Other examples of non-visual sensor data may include data from system monitoring modules. A system monitoring module may measure GPU utilization, CPU utilization, memory utilization, temperature, and the like. In some embodiments, a video search may be based solely on data from non-visual sensors, which may be associated with video data. Alternatively, or in addition, a video search may be based on raw, filtered, or processed visual data.
In one embodiment, the number of devices receiving a search query may be limited to a subset of the available devices. For example, the cloud may transmit the search query to devices that are in a particular geographic location. In some embodiments of the present disclosure, the location of a device where video data is stored may be correlated with the location where the video data was captured. In one example, a search query may be broadcast from a number of cell phone towers corresponding to the desired location of the search. In this example, the search query may be restricted to the devices that are within range of the utilized cell phone towers. In another example, the cloud server may keep track of the location of each connected device. Upon receiving a search query, the cloud server may limit the transmission of the search queries to devices that are in a given geographical region. Likewise, the cloud server may restrict the transmission of the search query to devices that were in a given geographical region for at least part of a time period of interest.
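The geographic restriction described above may be sketched as a simple filter over last-known device positions. The device records and the bounding-box query below are illustrative assumptions, not the disclosed tracking mechanism.

```python
# Illustrative geo-filter: restrict transmission of a search query to
# devices whose last reported (lat, lon) falls inside a region of interest.
devices = {
    "dev-1": (32.7, -117.1),  # hypothetical last-known positions
    "dev-2": (40.7, -74.0),
    "dev-3": (32.8, -117.2),
}

def devices_in_region(devices, lat_min, lat_max, lon_min, lon_max):
    """Return the device identifiers inside the bounding box."""
    return [dev_id for dev_id, (lat, lon) in devices.items()
            if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max]

targets = devices_in_region(devices, 32.0, 33.0, -118.0, -117.0)
```

A production system could additionally filter on position histories so that devices that were in the region during the time period of interest are also included.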
In another embodiment, a video or image search request may specify a particular location. For example, a search may request images of all vehicle trajectories in the vicinity of a traffic control device, such as a traffic sign, at a particular time. According to certain aspects of the present disclosure, certain location specific search efficiencies may be realized. For example, a search request may be sent to devices embedded within security cameras on or near the traffic control device in question. Furthermore, a search request may be sent to traffic lights or gas stations in the vicinity of the traffic control device if there were enabled devices at those locations that may have collected video data, as described above. In addition, a search request may be sent to all vehicle-mounted devices that may have travelled near the traffic control device in question around the time of interest.
A centralized database may be partitioned so that videos from different countries or regions are more likely to be stored in data centers that are geographically nearby. Such a partitioning of the data may capture some of the efficiencies that may be enabled according to the present disclosure. Still, to enable a search of one traffic control device and its surrounding environment, it may be necessary to store video data from substantially all traffic control devices that a user might expect to search. If the number of search requests per unit of recorded video is low, this approach could entail orders of magnitude more data transmission than would a system of distributed search in which the video data is stored at locations that are proximate to their capture. In the latter system, only the video data that is relevant to the search query would need to be transferred to the person or device that formulated the query. Therefore, in comparison to a system that relies on searching through a centralized database, a system of distributed video search may more efficiently use bandwidth and computational resources, while at the same time improving the security and privacy of potentially sensitive data.
Certain aspects of the present disclosure may be directed to visual search that is based on certain objects or events of interest without regard to the location where they were collected. Likewise, a search query may request examples of a particular pattern in visual data (which may be images or image sequences), such as data corresponding to on-device detection of traffic control devices belonging to particular families of traffic control devices. Database construction may be based on such visual searches, and a search query may further request that the examples represent a range of geographical locations.
While machine learning has been advancing rapidly in recent years, one hindrance to progress has been the availability of labeled data. In safety critical applications such as autonomous driving, for example, a particular issue relates to the availability of data that reflects rare but important events. Rare but important events may be referred to as “corner cases.” Control systems may struggle to adequately deal with such events because of the paucity of training data. Accordingly, certain aspects of the present disclosure may be directed to more rapidly identifying a set of training images or videos for a training sample in the context of computer vision development. Likewise, such systems may be improved by constructing and maintaining a database of geolocated visual observations, since such a database may be used to improve the precision of scene comprehension at the recorded locations.
In one example, it may be desirable to automatically detect when a driver performs a U-Turn. A deep learning model may be trained to detect such activity from a set of labeled video and associated positional sensor data captured at cars at times that the driver performed a U-Turn or driving maneuvers that resemble a U-Turn.
A U-Turn detection model may be formulated and deployed on devices that are connected to cameras in cars. The device may be configured to detect that a driver has made a U-Turn in the presence of a traffic sign that controls U-Turn behavior in the vicinity. Such traffic signs may expressly prohibit a U-Turn (a No U-Turn traffic sign) or expressly permit a U-Turn (a U-Turn permitted traffic sign) in the vicinity of the sign. In the early development of the detection model, there may be many false alarms. For example, U-Turns may be incorrectly detected, or a U-Turn maneuver in the presence of a U-Turn sign may be incorrectly classified as a U-Turn violation due to a misclassification of the traffic sign. Thus, a putative detection of a U-Turn in the presence of a No U-Turn sign may actually correspond to a safe and normal driving behavior because the detection of the traffic control device as a “No U-Turn” sign was erroneous. In one approach to developing such a detection model, a set of devices with the deployed model may transmit detections (both true and false) to a centralized server for a period of two weeks. Based on the received detections, the model may be iteratively refined and re-deployed in this manner.
In addition, or alternatively, in accordance with certain aspects of the present disclosure, a U-Turn detection model may be deployed on devices. Rather than wait for two weeks, however, the U-Turn detection model could be made part of a search query to each of the devices. Upon receiving the search query, each device may reprocess its local storage of data to determine if there have been any relevant events in the recent past. For example, the device may have a local storage that can accommodate two to four weeks of driving data. In comparison to the first approach described above which had two-week iteration cycles, this approach using distributed video search on the locally stored data of edge devices could return example training videos within minutes or hours.
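The local re-processing step described above may be sketched as follows, assuming a hypothetical ring buffer of clip metadata and a detector function delivered as part of the search query; both are illustrative stand-ins rather than the disclosed data formats.

```python
# Illustrative local search: run a newly received detector over a device's
# buffer of recent clip metadata and return the matching clip identifiers.
local_clips = [
    {"id": "clip-1", "max_heading_change_deg": 182.0},  # hypothetical metadata
    {"id": "clip-2", "max_heading_change_deg": 12.0},
    {"id": "clip-3", "max_heading_change_deg": 175.0},
]

def run_search_query(clips, detector):
    """Apply the query's detector to each stored clip; return matches."""
    return [clip["id"] for clip in clips if detector(clip)]

# A U-Turn-like event is assumed here to involve a large heading change.
u_turn_detector = lambda clip: clip["max_heading_change_deg"] >= 150.0
matches = run_search_query(local_clips, u_turn_detector)
```

Only the matching clips would then be transmitted to the requester, rather than the full two to four weeks of stored data.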
Likewise, subsequent iterations of the detection model may be deployed as search requests to a non-overlapping set of target devices. In this way, each two-week cycle of machine learning development could substantially eliminate the time associated with observing candidate events. Similarly, subsequent iterations of road scene comprehension logic may feature updates to a geolocated U-Turn sign family database. The comprehension logic may rely on the updated database to improve precision at previously (or sufficiently recently, etc.) visited locations.
Furthermore, rather than re-processing all of the video stored on each local device, the search query may be processed based on stored descriptors, as described above. Likewise, a search query may entail re-processing a sample of the locally stored videos, in which the subsample may be identified based on a search of the associated descriptor data.
Certain aspects of the present disclosure are directed to constructing a database of permissible and prohibited U-Turn maneuvers organized by road segment identifiers. Road segment identifiers may indicate portions of a road in which all or substantially all traffic tends to move in one direction. For example, a two-lane road supporting two directions (eastbound and westbound) of traffic may be identified by two road identifiers. The first road segment identifier may be associated with eastbound traffic and the second road segment identifier may be associated with westbound traffic. Road segment identifiers may extend over a behaviorally relevant distance. For example, road segment identifiers may extend up to an intersection, so that if a vehicle travels eastbound through an intersection, the driver will travel from a portion of road associated with a first road segment identifier, and then to a portion of road associated with a third road segment identifier, where the third road segment identifier is associated with a portion of the road on the far (Eastern) side of the intersection. Likewise, if a driver performs a U-Turn at the intersection, he may travel from the portion of the road associated with the first road segment identifier (which is a portion of road associated with eastbound traffic), and to a portion of the road associated with the second road segment identifier (which is a portion of the same road and on the same side of an intersection as the first road segment, but that is associated with westbound traffic).
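The road segment scheme described above may be sketched as a small table of directional segments, with a U-Turn encoded as a transition between opposite-direction segments of the same road on the same side of an intersection. The segment identifiers and attribute names are illustrative assumptions.

```python
# Illustrative directional road segments for a two-lane road at an intersection.
SEGMENTS = {
    "seg-1": {"road": "Main St", "direction": "E", "side": "west_of_intersection"},
    "seg-2": {"road": "Main St", "direction": "W", "side": "west_of_intersection"},
    "seg-3": {"road": "Main St", "direction": "E", "side": "east_of_intersection"},
}

def is_u_turn(from_seg, to_seg, segments):
    """A U-Turn is a transition between opposite-direction segments of the
    same road on the same side of an intersection."""
    a, b = segments[from_seg], segments[to_seg]
    return (a["road"] == b["road"]
            and a["side"] == b["side"]
            and a["direction"] != b["direction"])

u_turn = is_u_turn("seg-1", "seg-2", SEGMENTS)      # eastbound to westbound
through = is_u_turn("seg-1", "seg-3", SEGMENTS)     # straight through the intersection
```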
A driving monitoring system may detect a U-Turn maneuver in various ways. For example, a driving monitoring system may integrate inertial signals to determine that a monitored vehicle made an approximately 180-degree turn in a short time span. Alternatively, or in addition, a driving monitoring system may process a sequence of GPS position and/or heading estimates and thereby determine that a U-Turn was performed. In some embodiments, a driving monitoring system may include a camera system, which may capture images of a scene in front of the vehicle. Processing of image or video data may result in a detection of a traffic sign indicating the permissibility of U-Turns in the vicinity. Common U-Turn signs include “No U-Turn” signs, “U-Turn permitted” signs, and conditional U-Turn signs, which may be of either of the preceding types and may further indicate a condition such as a time of day or days of the week when U-Turns are permitted or prohibited, and/or one or more vehicle classes to which the sign applies. For example, a U-Turn sign may apply only to trucks, may apply to all vehicles except buses, and the like. To determine if such a U-Turn sign applies to the vehicle from which the sign was detected, it may be necessary to determine the class of the vehicle in question.
As part of a driving monitoring system, U-Turn behavior may be monitored. In one example, detecting a U-Turn maneuver in the presence of an applicable “No U-Turn” sign may be detected as a safety violation. Upon detecting such a safety violation, a system may be configured to transmit a video record of the event and/or a data representation of the event to a remote server, to a smartphone app, and the like. Such events may then be incorporated into a driver training program to modify habitual driving behavior in the vicinity of U-Turn signs, may be incorporated into a testing regimen for an autonomous vehicle controller, and the like.
Due to the variations in U-Turn traffic signs, as well as the conditions that may be associated with them, and in further view of challenges associated with properly determining that a detected U-Turn traffic sign is applicable to a vehicle from which the U-Turn traffic sign was observed, there may be significant challenges to accurately detecting driving safety violations with respect to U-Turn traffic signs.
As shown in the Table in
The remaining 62 putative violations were “false positives.” In most of these false positive cases, the driver was correctly determined to have performed a U-Turn and a U-Turn sign was also correctly determined to be visible in video data captured around the time of the U-Turn maneuver. However, for various reasons, some of which are detailed below, the U-Turn maneuver was not a U-Turn violation.
Returning again to the Table in
Solutions that increase the overall precision of road scene classification, particularly with respect to permissible vehicle trajectories in the vicinity of traffic signs, may have multiple technical effects. First, increased precision may improve a navigational system. After a driver misses a turn, for example, a navigational system may indicate where along the current road a U-Turn is or is not permissible. Second, increased precision may improve a safety system that acts to improve habitual driving behaviors. For example, a safety system may issue audible feedback after a driver completes a U-Turn at a location where U-Turns are not permitted. Such immediate feedback may act on brain circuits that underlie habitual behaviors and thereby lessen the valence of an unsafe driving habit. Third, increased precision can make a driver assistance system more effective. If a driver slows on a roadway where other drivers tend to perform U-Turns at an elevated rate, but which is governed by a No U-Turn sign, the driver assistance system could alert the driver, prior to the driver making a U-Turn, that no U-Turn is permitted at that location, and thereby nudge the driver to avoid making an unsafe driving maneuver at that location.
These “rules of the road” may be collected together in a database format in accordance with certain aspects of the present disclosure. In one embodiment, video data, which may include the image 300 associated with the U-Turn false alarm alert, may be reviewed by a human reviewer and then two entries may be made into a database. For example, a driver may interact with video captured around the time of the vehicle event via a smartphone app. The driver may indicate that the vehicle event should be reviewed and/or may reject the automated characterization of the vehicle event. Alternatively, or in addition, such events may be further processed by a neural network that has been trained on examples of true positive and false positive U-Turn violation video data. The table in the bottom right corner of
Based on the vehicle event illustrated in
These same facts may be added to the database in an automated fashion. For example, a visual perception engine may be able to determine that the No U-Turn sign is applicable to trajectories such as trajectory 302b illustrated in the bottom left panel. Here, the bottom left panel illustrates a travel path 302b observed at the same location but at a different time or from a different vehicle. For this travel path 302b, the observable No U-Turn Sign 320 is applicable and indicates that U-Turn 302b was a violation.
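The two database entries described above (one trajectory for which the observed sign is not applicable, and one for which it is) may be sketched as follows. The schema, the segment identifier strings, and the helper function are illustrative assumptions modeled loosely on the segment numbering used in this description.

```python
# Illustrative two-entry record for one reviewed U-Turn event: the same
# No U-Turn sign is visible from both trajectories, but governs only one.
u_turn_rules = []

def add_rule(from_segment, to_segment, sign, applies):
    """Record whether a given sign applies to a segment-to-segment transition."""
    u_turn_rules.append({
        "from": from_segment,
        "to": to_segment,
        "sign": sign,
        "applies": applies,
    })

# Entry 1: the trajectory on which the false alarm fired; sign visible but not applicable.
add_rule("seg-314a", "seg-316a", "no_u_turn", applies=False)
# Entry 2: the trajectory the sign actually governs; a U-Turn here is a violation.
add_rule("seg-314b", "seg-316b", "no_u_turn", applies=True)
```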
According to certain embodiments, a database lookup may be performed in the cloud. The example illustrated in the bottom left panel, from road segment 314b to road segment 316b, may be “confirmed” by a “Second Pass Analytics” (SPA). SPA may refer to a “microservice” that may receive trajectory data and then determine an associated sequence of road segments. The SPA service may then query the database as just described, and may then determine that a U-Turn was or was not permitted. In this example, a subsequent U-Turn as illustrated in the top right panel would be “rejected” by SPA, meaning that the putative alert would be determined to be a “false alarm” and therefore, not a violation. Future U-Turns like the one depicted in the bottom left, however, would be “confirmed” by SPA, meaning that they would be included in a set of U-Turn violations that may be presented to a driver and/or safety manager in the context of an Intelligent Driving Monitoring System, may be used to assess the appropriateness of an autonomous vehicle controller, to provide a navigational instruction, and the like.
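The SPA lookup may be sketched as a function that scans a trajectory's segment transitions against the rules database and confirms or rejects a putative violation. The rule table, the return values, and the segment identifier strings are illustrative assumptions.

```python
# Illustrative "Second Pass Analytics" lookup over a segment-transition
# rules table. Keys are (from_segment, to_segment) pairs.
RULES = {
    ("seg-314a", "seg-316a"): {"no_u_turn_applies": False},
    ("seg-314b", "seg-316b"): {"no_u_turn_applies": True},
}

def second_pass_analytics(segment_sequence):
    """Return 'confirmed' if a transition is a known violation, 'rejected'
    if the database says the sign does not apply, and 'unknown' when no
    transition in the trajectory has a database entry."""
    for from_seg, to_seg in zip(segment_sequence, segment_sequence[1:]):
        rule = RULES.get((from_seg, to_seg))
        if rule is None:
            continue
        return "confirmed" if rule["no_u_turn_applies"] else "rejected"
    return "unknown"

second_pass_analytics(["seg-314a", "seg-316a"])  # false alarm: rejected
second_pass_analytics(["seg-314b", "seg-316b"])  # violation: confirmed
```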
The top left panel of
Here, the database schema, relative to the one shown in
Another error mode is illustrated in the bottom left panel. As per the entries in the second row, a vehicle trajectory from first road segment identifier 414 to second road segment identifier 416 may be a U-Turn, but at a distance away from the location recorded in the first row. While the U-Turn trajectory recorded in the second row is near to the No U-Turn sign 420, that No U-Turn sign does not apply to U-Turns that are performed at this location. A driver monitoring system may still detect a putative alert based on the elapsed time between the detection of the No U-Turn sign and the detection of the U-Turn maneuver. If this occurs, however, an entry may be made into the database to indicate that there is no detectable or applicable U-Turn sign in that location. This may be encoded with a “None” entry for all columns associated with first road segment identifier 414 and second road segment identifier 416. In this way, a database system may be used to aid a navigation system not just to avoid U-Turns at locations where they are not permitted, but also to suggest U-Turns at nearby locations where U-Turns are permitted.
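The “None” entry described above may be sketched by distinguishing three cases: a transition that has never been observed, a transition observed with no detectable or applicable sign, and a transition governed by an applicable sign. The dictionary encoding and segment identifiers are illustrative assumptions.

```python
# Illustrative encoding of "None" entries: an observed segment pair with no
# applicable U-Turn sign is distinct from a pair that was never observed.
rules = {
    ("seg-414", "seg-416"): None,                    # observed: no applicable sign
    ("seg-314b", "seg-316b"): {"no_u_turn": True},   # observed: sign applies
}

def classify_transition(pair, rules):
    """Classify a segment-to-segment transition against the database."""
    if pair not in rules:
        return "unobserved"
    if rules[pair] is None:
        return "no_applicable_sign"  # a candidate location to suggest a U-Turn
    return "sign_applies"

classify_transition(("seg-414", "seg-416"), rules)   # no applicable sign here
```

The distinction matters for navigation: a transition classified as having no applicable sign can safely be suggested as a U-Turn location, whereas an unobserved transition simply lacks data.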
As used herein, the term “determining” encompasses a wide variety of actions. For example, “determining” may include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database, or another data structure), ascertaining, and the like. Additionally, “determining” may include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory), and the like. Furthermore, “determining” may include resolving, selecting, choosing, establishing, and the like.
As used herein, a phrase referring to “at least one of” a list of items refers to any combination of those items, including single members. As an example, “at least one of: a, b, or c” is intended to cover: a, b, c, a-b, a-c, b-c, and a-b-c.
The methods disclosed herein comprise one or more steps or actions for achieving the described method. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is specified, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims.
The processing system may be configured as a general-purpose processing system with one or more microprocessors providing the processor functionality and external memory providing at least a portion of the machine-readable media, all linked together with other supporting circuitry through an external bus architecture. Alternatively, the processing system may comprise one or more specialized processors for implementing the neural networks, for example, as well as for other processing systems described herein.
Thus, certain aspects may comprise a computer program product for performing the operations presented herein. For example, such a computer program product may comprise a computer-readable medium having instructions stored (and/or encoded) thereon, the instructions being executable by one or more processors to perform the operations described herein. For certain aspects, the computer program product may include packaging material.
Further, it should be appreciated that modules and/or other appropriate means for performing the methods and techniques described herein can be downloaded and/or otherwise obtained by a user terminal and/or base station as applicable. For example, such a device can be coupled to a server to facilitate the transfer of means for performing the methods described herein. Alternatively, various methods described herein can be provided via storage means (e.g., RAM, ROM, a physical storage medium such as a compact disc (CD) or floppy disk, etc.), such that a user terminal and/or base station can obtain the various methods upon coupling or providing the storage means to the device. Moreover, any other suitable technique for providing the methods and techniques described herein to a device can be utilized.
It is to be understood that the claims are not limited to the precise configuration and components illustrated above. Various modifications, changes and variations may be made in the arrangement, operation and details of the methods and apparatus described above without departing from the scope of the claims.
This application claims the benefit of and priority to U.S. Provisional Patent Application No. 63/216,469, filed on Jun. 29, 2021, and entitled TRAJECTORY AND TRAFFIC SIGN DATABASE SYSTEM, the contents of which are incorporated herein by reference in their entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/035499 | 6/29/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63216469 | Jun 2021 | US |