Vehicle computers and mobile devices within vehicles often run software that collects and forwards travel-related data to a data collection service. Travel-related data usually includes, among other things, locations and corresponding times captured during travel. The data collection service receives and stores the travel data from the vehicles, and the aggregate travel data from many vehicles or mobile devices may have practical value. Aggregate travel data can be used for traffic analysis, characterizing roads in a road network, supplementing the artificial intelligence of an automated driving system, improving road navigation systems, and so forth.
Along with the many beneficial uses of aggregated travel data come privacy and security concerns. Travel-related data reported from a vehicle may also have information about a vehicle, a driver, or other metadata that may be considered private or sensitive. Even if personally identifiable information is anonymized from travel-related data before its aggregation, it can be difficult to guarantee that personally identifiable information cannot be inferred from the travel data as a whole. Data mining techniques combined with large databases of diverse personally identifiable information can potentially be used to infer the identities of specific persons in aggregated travel data and thus link specific persons to specific travel data, even if that travel data has been theoretically anonymized in isolation.
The reporting of travel-related data has been recognized as a privacy and security concern. In some places, privacy rules reflect a consensus that people prefer not to have it known where they have been and when. Consequently, most software that reports and collects travel-related data for aggregation and long-term use removes personally identifiable information or applies some means of anonymization. Nonetheless, prior techniques for anonymizing travel-related data have not protected some sensitive aspects of users' travel data.
The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein like reference numerals are used to designate like parts in the accompanying description.
Embodiments discussed below relate to partially obscuring travel data. As a computing device travels on a trip from a starting location to a destination location, geographic coordinates are captured and stored. The geographic coordinates are tested against an obscuring condition to determine a portion of the geographic coordinates that satisfy the obscuring condition. Based on the determined portion of geographic coordinates satisfying the obscuring condition, the geographic coordinates in the determined portion of the geographic coordinates are altered to reduce their ability to identify personal information about a driver. The computing device transmits the altered geographic coordinates and the unaltered geographic coordinates that did not satisfy the obscuring condition.
The travel data includes locations of the vehicles 102 captured by the computing devices 100 as they travel and then transmitted to the cloud service 108. The travel data may also include other information such as timestamps of the respective vehicle locations (that is, the times when the location data is captured), vehicle information, sensor information (for example, pictures of road surfaces or vehicle surroundings, vehicle speeds, etc.), driver information, and so forth. In some embodiments, the travel data 104 is anonymized before being transmitted by the computing devices 100. That is, personally identifiable information may be scrubbed from the travel data 104 before transmission to the cloud service 108. For example, identities of persons or vehicles may be replaced with hashes of travel data or ephemeral random identifiers, either of which allows travel data to be associated with a single anonymous person or vehicle. In one embodiment, a travel data 104 record from a vehicle may include only a vehicle location and some identifier such as an identifier of a discrete trip, an identifier of a vehicle, etc.
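As a minimal sketch of the minimal record described above, assuming an ephemeral random identifier is generated once per trip, the following shows records that carry only a location and a trip identifier. The names TravelRecord, new_trip_id, and make_records are hypothetical, as is the choice of 128 random bits for the identifier.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class TravelRecord:
    """A minimal travel data record: a location plus an anonymous identifier."""
    trip_id: str      # ephemeral random identifier, not derived from driver or vehicle
    latitude: float
    longitude: float


def new_trip_id() -> str:
    """Return a fresh random identifier (assumption: 128 random bits, hex encoded)."""
    return secrets.token_hex(16)


def make_records(locations, trip_id=None):
    """Wrap (lat, lon) pairs of one trip in records sharing one ephemeral identifier."""
    trip_id = trip_id or new_trip_id()
    return [TravelRecord(trip_id, lat, lon) for lat, lon in locations]
```

Because the identifier is random rather than derived from a person or vehicle, records of one trip can be grouped together without naming who made the trip.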
The aggregate travel data 110 is stored by the cloud service 108. The cloud service 108 may itself make use of the aggregate travel data 110. The cloud service 108 may provide the aggregate travel data 110 to other applications 112, for example, a road network database, a traffic analysis system, an autonomous-driving support system, and so forth.
As noted above, even if the travel data is anonymized by omitting or replacing personally identifiable information, it may be possible to infer identities of persons within the aggregate travel data 110, and, therefore, the places that identified people travel to and from. For example, trips frequently beginning from a specific home address can be used to associate those locations with a small possible number of computing devices 100, vehicles 102, and households, effectively de-anonymizing the travel data 104. Embodiments described herein may determine to partially obscure (for example, reduce the accuracy or precision of) some locations within the travel data that are determined to meet various conditions such as being near the beginning or ending of trips, being within predefined geographical regions (geofence regions), or others. A geofence region may be at least in part defined prior to the trip starting. Further, the geofence region may be based at least in part on user input (for example, a region selected by a user), automatically defined based on vehicle operation (for example, based on one or both of the engine starting and stopping), defined based on crowd sourcing, defined based on road construction, and/or based on a property boundary (for example, a military base, a campus, etc.).
In the case of a cloud service that uses the travel data to analyze or characterize a road network, for example, it is likely that the reduced precision or accuracy of small portions of aggregate travel data 110 will be compensated for by non-obscured location data at or near the obscured locations. That is to say, for a given trip/vehicle, even if there are areas for the given trip/vehicle that are determined to require location obscuring for portions of that trip/vehicle, those areas may not require location obscuring for other trips/vehicles and therefore, overall, there may be an adequate amount of accurate location data within those areas. For example, if locations at an origin area or destination area of one trip are obscured, locations of other trips that merely pass through those areas may not be obscured (as used herein, “destination” means a trip endpoint, regardless of whether the endpoint was set in advance or whether the endpoint was merely the point where a trip ended). Or, even if one vehicle is associated with a geofenced obscuring area, and therefore, provides less-accurate or less-precise locations for the geofenced area, other vehicles may not have a geofence for that area and may therefore potentially provide accurate locations for that area. Not only may the aggregate travel data 110 be accurate and precise on the whole despite partial location obscuring, as discussed below, locations whose fidelity has been reduced may be flagged to allow those locations to be specially handled by software making use of the aggregate travel data 110.
A location service 126 executes on the computing device 100. The location service 126 may be implemented in many ways. The location service 126 may be an application dedicated to collecting and reporting location data. The location service 126 may be an operating system component configured to provide location data to any applications executing on the computing device 100. The location service 126 may be a software shim operating between the GPS module 120 and some other location-using application executing on the computing device 100 or elsewhere. The location service 126 may be a component of a road-reporting system where vehicles provide location data to a cloud service that characterizes roads in a road network. The location service 126 may be any application that obtains geographic locations 124 and transmits them via the wireless network 106. Regardless of its other functions or whether it is a system component or application, the location service 126 may perform step 130 of partially obscuring some of the geographic locations 124 received from the GPS module 120.
Step 130 includes receiving the geographic locations 124. In some embodiments, for example when it cannot be immediately determined whether to obscure a given geographic location (for example, when the destination of a trip in progress is not yet known), the geographic locations may be cached in a buffer 132. In embodiments where it can be immediately determined whether a geographic location needs to be obscured (for example, a user may have activated an “obscuring mode” during which all captured locations are obscured), the geographic locations 124 may not need to be cached. Step 130 further includes determining whether geographic locations 124 are to be obscured. This may involve, for example, determining whether a given geographic location 124 is within an area associated with an origin of a trip, is within an area associated with a destination of a trip, or is within a geofenced area, to name some examples. Any method for defining location-obscuring areas may be used. In other words, many types of conditions may be used to define when to perform obscuring or to define which geographic locations are to be obscured. Step 130 also involves obscuring the geographic locations 124 that have been determined to need obscuring. That is, whichever geographic locations 124 are determined to need obscuring, any of a variety of techniques (discussed in detail below) may be used to obscure those geographic locations 124. In addition to obscuring those determined locations, the obscured locations may be flagged to inform downstream users that the locations have been obscured. Finally, step 130 involves transmitting the partially obscured location data 132 to the cloud service 108. This transmission may be in real-time or near real-time, periodic (for example, daily or weekly), or based on a trigger (for example, at the conclusion of a trip). Location data that is not obscured, because it does not fall within an obscuring area, may also be transmitted with its original accuracy and precision.
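The check, obscure, and flag sequence of step 130 can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the names Location, truncate, and partially_obscure are hypothetical, the obscuring condition is assumed to be supplied as a predicate, and truncation to two decimal digits stands in for whichever obscuring technique an embodiment uses.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Location:
    lat: float
    lon: float
    obscured: bool = False  # flag for downstream consumers


def truncate(value: float, digits: int = 2) -> float:
    """Drop decimal digits beyond `digits` (toward zero), reducing precision."""
    factor = 10 ** digits
    return math.trunc(value * factor) / factor


def partially_obscure(
    locations: List[Location],
    condition: Callable[[Location], bool],
    digits: int = 2,
) -> List[Location]:
    """Obscure and flag locations satisfying `condition`; pass the rest through."""
    result = []
    for loc in locations:
        if condition(loc):
            result.append(Location(truncate(loc.lat, digits),
                                   truncate(loc.lon, digits), True))
        else:
            result.append(Location(loc.lat, loc.lon, False))
    return result
```

The returned list holds both altered and unaltered locations, so a single transmission can carry the partially obscured trip.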
At step 134 the cloud service 108 receives the partially obscured location data 132. In analyzing or otherwise using the aggregated location data, the cloud service 108 may specially handle any locations flagged as having been obscured. For example, such locations might be disregarded by some algorithms, they might be given less weight if used as training data, and so on. Nonetheless, obscured locations may have practical value for many applications. For example, aggregate latitudes and longitudes within a given area, even when truncated to two digits, may be relevant for determining usage of some ADAS (advanced driver-assist systems) features such as automated parking.
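One way a consumer of the aggregated data might treat flagged locations is sketched below. It assumes, hypothetically, that each record is a dictionary with an "obscured" key; the function names and the weight of 0.25 are illustrative choices, not values from the description.

```python
def high_fidelity_only(records):
    """Keep only records that were not flagged as obscured."""
    return [r for r in records if not r.get("obscured", False)]


def training_weights(records, obscured_weight=0.25):
    """Give flagged records a reduced weight instead of discarding them."""
    return [obscured_weight if r.get("obscured", False) else 1.0 for r in records]
```

An algorithm needing precise positions could use the first function, while a training pipeline could keep every record and pass the second function's output as sample weights.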
Obscuring conditions may be global, specific to individual users or vehicles, specific to individual trips, and so forth. A user may have one or more associated obscuring conditions that are applied to that user's captured trip locations. There may also be obscuring conditions that are applied to all users, or some that are applied to a subset of users.
In some embodiments, obscuring conditions may be managed by a cloud service that provides the obscuring conditions to the computing devices 100. For example, obscuring conditions may come from crowd-sourced data, where an obscuring condition is auto-applied to a location based on a threshold plurality of users having enabled obscuring for that location. Obscuring conditions may also be managed by the computing devices 100. In sum, the location service 126 may perform step 149 of determining which trip locations 148 of the trip 140 stored in the buffer 132 are to be obscured by determining which trip locations 148 satisfy one or more obscuring conditions.
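The crowd-sourced case can be illustrated with a short sketch. It assumes the cloud service receives (user, area) pairs indicating which users have enabled obscuring for which areas; the function name, the area identifiers, and the threshold of three users are hypothetical.

```python
from collections import defaultdict


def crowd_sourced_conditions(user_choices, threshold=3):
    """Return areas for which at least `threshold` distinct users enabled obscuring.

    `user_choices` is an iterable of (user_id, area_id) pairs.
    """
    users_by_area = defaultdict(set)
    for user_id, area_id in user_choices:
        users_by_area[area_id].add(user_id)  # a set ignores repeat votes by one user
    return {area for area, users in users_by_area.items() if len(users) >= threshold}
```

Counting distinct users rather than raw reports keeps one device from triggering a global condition on its own.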
In the example of
Regarding the buffer 132, as a vehicle 102 travels the trip 140, in some embodiments the location service 126 may accumulate the locations 148 into the buffer 132 where they are retained to be checked, partially obscured, and transmitted sometime after the trip has finished. Caching may be helpful because in some cases it may not be possible or desirable to determine whether a trip location needs to be obscured at the time when the trip location is captured. For example, it may not be possible to know which captured trip locations are near the destination because the destination may not be known until the trip has ended. Or, it may be desirable to periodically (for example, daily) batch-process trip location data. Therefore, it may be helpful to store the trip locations 148 in the buffer 132 before obscuring them and transmitting them to the cloud service.
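A buffered approach might be sketched as below, assuming locations are accumulated during the trip and the target locations are selected only once the last location (the destination) is known. The class TripBuffer, its method names, and the one-kilometer radius are hypothetical; distance is computed with the standard haversine formula.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


class TripBuffer:
    """Accumulates trip locations until the trip has ended."""

    def __init__(self, radius_km=1.0):
        self.radius_km = radius_km
        self.locations = []

    def add(self, lat, lon):
        self.locations.append((lat, lon))

    def target_indices(self):
        """After the trip ends, return indices of locations near origin or destination."""
        if not self.locations:
            return []
        origin, destination = self.locations[0], self.locations[-1]
        return [
            i for i, (lat, lon) in enumerate(self.locations)
            if haversine_km(lat, lon, *origin) <= self.radius_km
            or haversine_km(lat, lon, *destination) <= self.radius_km
        ]
```

Calling target_indices before the trip has ended would treat the most recent location as the destination, which is why the selection is deferred.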
While truncation is an effective and efficient technique, other obscuring techniques may be used. A location's coordinates may be truncated and also randomized. A location may be moved in a random direction and a random distance (within some range of distance, for example half a mile to one mile). A coordinate may have its accuracy reduced by a rounding function (which will include at least some truncation). Any technique which reduces the granularity or accuracy of a coordinate will suffice.
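The three techniques named above (truncation, rounding, and random displacement) can be sketched as follows. The function names are hypothetical, and the displacement uses a flat-Earth approximation that is assumed to be adequate for distances of about a mile away from the poles.

```python
import math
import random

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def truncate(value, digits=2):
    """Drop decimal digits beyond `digits` (toward zero)."""
    factor = 10 ** digits
    return math.trunc(value * factor) / factor


def round_coordinate(value, digits=2):
    """Round to `digits` decimal digits."""
    return round(value, digits)


def displace(lat, lon, min_miles=0.5, max_miles=1.0, rng=random):
    """Move a point a random distance in a random direction (flat-Earth approximation)."""
    distance = rng.uniform(min_miles, max_miles)
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dlat = distance * math.cos(bearing) / EARTH_RADIUS_MILES
    dlon = distance * math.sin(bearing) / (
        EARTH_RADIUS_MILES * math.cos(math.radians(lat)))
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def miles_between(lat1, lon1, lat2, lon2):
    """Haversine distance in miles, used here only to check the displacement."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

Passing a seeded random.Random instance as rng makes the displacement repeatable for testing; a deployed system would presumably use an unseeded source.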
Geohashing is another technique that may be used to reduce the granularity or accuracy of a location. With geohashing, some or all trip locations may be encoded as geohashes. A geohash of a vehicle location indicates that the vehicle was within a certain polygon (often a square). A geohash encoding of a location's coordinates indicates that a vehicle was in a specific area and the size of the area can be varied according to the geohash value, thus allowing location obscuring. Non-obscured locations may have precise geohashes, for example, on the order of several meters. Obscured locations may be geohashed to represent larger areas, for example several kilometers. The size of polygons for obscured locations may be specifically established so as to coincide with geographic features such as a town, neighborhood, subdivision, etc. As used herein, “coordinate” and “location” refer to traditional geographical coordinates as well as any encoding of a coordinate, including for example geohashes.
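To illustrate how geohash length controls area size, the following sketch implements the standard public geohash encoding (interleaved longitude and latitude bits, base-32 alphabet). The function name and the choice of 9 characters for precise locations and 5 characters for obscured locations are assumptions; in the standard scheme a 9-character geohash covers roughly a few meters and a 5-character geohash roughly a few kilometers.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    even = True  # even-numbered bits refine longitude, odd-numbered bits latitude
    while len(chars) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


precise = geohash_encode(57.64911, 10.40744, 9)  # non-obscured: small cell
coarse = geohash_encode(57.64911, 10.40744, 5)   # obscured: much larger cell
```

Because a shorter geohash is a prefix of the longer one for the same point, an obscured location can be produced simply by dropping trailing characters.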
Regardless of the obscuring technique, the degree of reduced accuracy or precision may vary depending on factors such as the road density at the target location (higher road density may not require as much obscuring) or a geographic characteristic of the location (for instance, a neighborhood). In one embodiment, displaced locations may be “snapped” to a nearest road. In yet another embodiment, locations may be displaced with a simulated randomized drive, for example, a drive from the original location with random turns that travels within some range of distance. In another embodiment, there may be a grid of pre-defined global “snap-to” locations, and a location is obscured by setting it to the nearest snap-to location. In most cases, the obscured locations will be moved (or expanded, in the case of geohashes) based on their original locations and will remain near their original locations.
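The snap-to grid embodiment might be sketched as follows, assuming, purely for illustration, that the pre-defined snap-to locations form a uniform latitude/longitude grid. The function name and the 0.05 degree spacing are hypothetical.

```python
def snap_to_grid(lat, lon, spacing_deg=0.05):
    """Replace a location with the nearest point of a uniform lat/lon grid."""
    snapped_lat = round(lat / spacing_deg) * spacing_deg
    snapped_lon = round(lon / spacing_deg) * spacing_deg
    # Round away floating-point residue so equal grid points compare equal.
    return round(snapped_lat, 6), round(snapped_lon, 6)
```

Every location within the same grid cell maps to the same snap-to point, so the obscured location stays near the original while hiding the exact position.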
In addition to obscuring coordinates of target locations, the location service 126 may also set flags 162 for the respective target locations that have been obscured. Because other software will likely use the partially obscured location data, it may be helpful to allow software that consumes the aggregate travel data 110 to determine which locations are original and which have been modified. Applications requiring high fidelity location data may disregard flagged locations. The modified buffer 132B contains the unmodified locations, the obscured locations from the first and second target sets, and, optionally, flags distinguishing the modified locations. The location service 126 transmits the modified locations to the cloud service 108 through a communication module 164 (for example a network stack or interface) of the computing device 100. It may be convenient to set the flags 162 when the respective target locations are selected for obscuring, which will allow the later obscuring operation 160 to know which locations to obscure.
Regarding the timing of transmission of partially obscured location data for a trip, in embodiments that obscure locations near a trip origin but not locations near a trip destination, the location data for a trip may be obscured and transmitted as soon as received, and locations may stop being obscured as soon as they pass out of the obscuring area around the origin. However, for embodiments that obscure locations near a trip destination, because the locations that are to be obscured may not be fully known until the trip has ended, it may be more convenient, as discussed above, to accumulate locations of a trip in the buffer, identify and modify the target locations within the buffer after the trip is complete, and then transmit the modified and unmodified locations for the trip. A batch approach may be used to allow many trips to be stored over time and then be periodically (for example, daily) processed and transmitted.
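The origin-only case lends itself to streaming, as the following sketch suggests. It assumes the first captured location is the trip origin and that obscuring stops permanently once the vehicle first leaves the origin area; the generator name, the one-kilometer radius, and two-digit truncation are hypothetical choices.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def truncate(value, digits=2):
    factor = 10 ** digits
    return math.trunc(value * factor) / factor


def stream_with_origin_obscuring(locations, radius_km=1.0, digits=2):
    """Yield (lat, lon, obscured) as each location arrives; no buffering needed."""
    origin = None
    left_origin_area = False
    for lat, lon in locations:
        if origin is None:
            origin = (lat, lon)  # first captured location is taken as the origin
        if not left_origin_area and haversine_km(lat, lon, *origin) > radius_km:
            left_origin_area = True  # stop obscuring from here on
        if left_origin_area:
            yield (lat, lon, False)
        else:
            yield (truncate(lat, digits), truncate(lon, digits), True)
```

Each yielded tuple can be transmitted immediately, in contrast with destination obscuring, which needs the whole trip before target locations can be chosen.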
In addition to obscuring location data, it may be helpful to also obscure timestamps. In some embodiments, as a vehicle travels, timestamps are captured with the respectively captured locations of the vehicle. In addition to obscuring targeted locations, the timestamps for the target locations may also be obscured. Any technique to reduce the accuracy and/or granularity of a timestamp may be used, for example, truncation, partial randomization, addition of random noise, or others.
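Two of the timestamp techniques mentioned above, truncation and addition of random noise, can be sketched as follows. The function names, truncation to the hour, and the 15-minute noise bound are illustrative assumptions.

```python
import random
from datetime import datetime, timedelta


def truncate_to_hour(ts: datetime) -> datetime:
    """Reduce timestamp granularity by discarding minutes, seconds, and microseconds."""
    return ts.replace(minute=0, second=0, microsecond=0)


def add_noise(ts: datetime, max_minutes: float = 15.0, rng=random) -> datetime:
    """Shift a timestamp by a random offset of at most `max_minutes` either way."""
    offset_seconds = rng.uniform(-max_minutes * 60.0, max_minutes * 60.0)
    return ts + timedelta(seconds=offset_seconds)
```

As with locations, these functions would presumably be applied only to the timestamps of target locations, leaving other timestamps intact.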
In one embodiment, the location service 126 executing on a computing device 100 maintains a local set of geofences which are applied to any trips captured for the vehicle. In another embodiment, geofences are obtained from a cloud service by the location service 126.
Geofences may be manually defined by a user or automatically defined. A user may define a geofence with an interactive mapping application that allows a user to interactively define geographic areas (geofences). A geofence may instead be defined by instructing the location service 126 to automatically detect common origins and destinations and generate geofences for them. This approach may also be automatic; the location service 126 may automatically detect common origins, destinations, or visitation locations (for example, where a vehicle stops to drop off a passenger) and create respective geofences at those locations. Geofences may also be generated by one or more cloud services using any of the techniques mentioned above, and, as noted above, the generated geofences may be provided to a vehicle's computing device for use by its location service 126. Using this approach can also allow geofences to be applied to a group of computing devices. For example, there may be a sensitive area (for example, a military base) or a changing area (for example, road construction) where it is preferable to have the location data of all participating vehicles obscured. In sum, the functionality for managing geofences can be at the computing devices, at a cloud service, or both.
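Automatic detection of common origins and destinations might be approximated as in the sketch below. It assumes trip endpoints are binned into grid cells and that a cell becomes a geofence candidate once enough trips start or end in it; the function name, the 0.01 degree cell size, and the threshold of three trips are hypothetical.

```python
import math
from collections import Counter


def detect_common_endpoints(trips, cell_deg=0.01, min_trips=3):
    """Return grid cells (as integer index pairs) where many trips start or end.

    `trips` is a list of trips, each a list of (lat, lon) pairs.
    """
    def cell(point):
        lat, lon = point
        return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

    counts = Counter()
    for trip in trips:
        if not trip:
            continue
        # A set avoids counting a trip twice when it starts and ends in one cell.
        for c in {cell(trip[0]), cell(trip[-1])}:
            counts[c] += 1
    return {c for c, n in counts.items() if n >= min_trips}
```

Each returned cell could then be turned into a geofence, for example a region covering that cell, by the location service or by a cloud service.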
Other obscuring techniques may be used. For example, geofences may or may not be centered over a vehicle's true origin or destination. That is, the geofences and obscuring regions may have a randomized offset and variable range from the true origins and destinations so as to prevent analysis techniques that may attempt to determine a set of candidate de-anonymized origins and destinations of a trip, thus maintaining the user's privacy. Consider also that if all points within a fixed radius are obscured, then it can be possible to find the center of the circle, given sufficient data, from the points where the vehicle leaves the circle. Thus, another obscuring technique is to slightly grow or shrink the radius for different trips to add randomization. That is, the radius could be 0.9 miles on day one, and 1.1 miles on day two. In addition, the center may be randomly moved by 0.1 miles every day, which will make it difficult to determine the center of the circle.
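A sketch of the randomized geofence follows, using the figures from the example above: a radius varying between 0.9 and 1.1 miles and a center shifted by 0.1 miles. The function names are hypothetical, the shift direction is assumed to be uniformly random, and a flat-Earth approximation is assumed to be adequate at this scale.

```python
import math
import random

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def randomized_geofence(center_lat, center_lon, base_radius_miles=1.0,
                        radius_jitter_miles=0.1, center_shift_miles=0.1,
                        rng=random):
    """Return (lat, lon, radius_miles) of a geofence with jittered center and radius."""
    radius = base_radius_miles + rng.uniform(-radius_jitter_miles, radius_jitter_miles)
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dlat = center_shift_miles * math.cos(bearing) / EARTH_RADIUS_MILES
    dlon = center_shift_miles * math.sin(bearing) / (
        EARTH_RADIUS_MILES * math.cos(math.radians(center_lat)))
    return (center_lat + math.degrees(dlat),
            center_lon + math.degrees(dlon),
            radius)


def miles_between(lat1, lon1, lat2, lon2):
    """Haversine distance in miles, used here only to check the shift."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))
```

Generating a fresh geofence per trip or per day means the boundary crossings observed across trips no longer lie on one circle.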
The computing device 100 may have one or more displays 402, a network interface 404 (or several), as well as storage hardware 406 and processing hardware 408. The processing hardware 408 may be a combination of any one or more of: central processing units, graphics processing units, analog-to-digital converters, bus chips, FPGAs, ASICs, Application-specific Standard Products (ASSPs), Complex Programmable Logic Devices (CPLDs), etc. The storage hardware 406 may be any combination of magnetic storage, static memory, volatile memory, non-volatile memory, optically or magnetically readable matter, etc. The terms “computer-readable storage” and “storage hardware”, as used herein, do not refer to signals or energy per se, but rather refer to physical apparatuses and states of matter. The hardware elements of the computing device 100 may cooperate in ways well understood in the art of machine computing. In addition, input devices may be integrated with or in communication with the computing device 100. The computing device 100 may have any form-factor or may be used in any type of encompassing device.
In the above disclosure, reference has been made to the accompanying drawings, which form a part hereof and which illustrate specific implementations in which the present disclosure may be practiced. It is understood that other implementations may be utilized, and structural changes may be made without departing from the scope of the present disclosure. References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such labels or phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, one skilled in the art will recognize such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
While various embodiments of the present disclosure have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in form and detail can be made therein without departing from the spirit and scope of the present disclosure. Thus, the breadth and scope of the present disclosure should not be limited by any of the above-described example embodiments but should be defined only in accordance with the following claims and their equivalents. The foregoing description has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the present disclosure to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. Further, it should be noted that any or all of the aforementioned alternate implementations may be used in any combination desired to form additional hybrid implementations of the present disclosure. For example, any of the functionality described with respect to a particular device or component may be performed by another device or component. Further, while specific device characteristics have been described, embodiments of the disclosure may relate to numerous other device characteristics. Further, although embodiments have been described in language specific to structural features and/or methodological acts, it is to be understood that the disclosure is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as illustrative forms of implementing the embodiments. 
Conditional language, such as, among others, “can,” “could,” “might,” or “may,” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments could include, while other embodiments may not include, certain features, elements, and/or steps. Thus, such conditional language is not generally intended to imply that features, elements, and/or steps are in any way required for one or more embodiments.