The present disclosure relates to systems and methods for processing a transportation service request, and more particularly to, estimating a trip risk associated with a transportation service request using one or more machine learning models.
An online ride-hailing platform (e.g., DiDi™ online) can receive a rideshare service request from a passenger and then route the service request to at least one transportation service provider (e.g., a taxi driver, a private car owner, or the like). After the transportation service request is answered by the driver, the driver will pick up the passenger, and drive the passenger to the requested destination.
The driver and the passenger typically do not know each other prior to the rideshare service. Sometimes, a person may disguise himself as a passenger and request a transportation service in order to commit crimes against the driver, such as robbery, assault, battery, or sexual harassment. For example, the disguised passenger may rob the driver either during the trip or at the destination. Sometimes, a passenger may commit a crime spontaneously, triggered by circumstances during the trip, such as a conflict between the driver and the passenger. For example, the passenger may disagree about the route the driver takes for the trip or the fees charged for the service, and may then attack the driver to force a change of route or a reduction of the fees.
A rideshare vehicle may be equipped with an in-vehicle crime detection system for detecting the above-mentioned crimes in real time during the rideshare service. For example, if an in-vehicle crime is detected, the crime detection system may automatically notify the ride-hailing platform to call nearby police to report the crime. However, a crime detection system can only serve to mitigate the harm after the fact; it cannot prevent the crime from happening. For example, existing ride-hailing platforms with in-vehicle crime detection systems may not allow or block the service request based on an estimated trip risk.
Embodiments of the disclosure provide systems and methods for estimating a trip risk associated with a transportation service request using machine learning models and processing the transportation service request based on the trip risk.
Embodiments of the disclosure provide a system for processing a transportation service request. An exemplary system may include a communication interface configured to receive the transportation service request from a terminal device. The system may further include at least one processor. The at least one processor may be configured to generate a passenger score based on the received transportation service request using a first machine learning model trained with sample passenger data associated with past impacted drivers. The at least one processor may further be configured to generate a trip score based on the generated passenger score and the received transportation service request using a second machine learning model trained with sample trip data associated with the past impacted drivers. The at least one processor may also be configured to allow or block the received transportation service request based on the generated trip score.
Embodiments of the disclosure also provide a method for processing a transportation service request. An exemplary method may include receiving the transportation service request, by a communication interface, from a terminal device. The method may further include generating, by at least one processor, a passenger score based on the received transportation service request using a first machine learning model trained with sample passenger data associated with past impacted drivers. The method may also include generating, by the at least one processor, a trip score based on the generated passenger score and the received transportation service request using a second machine learning model trained with sample trip data associated with the past impacted drivers. The method may additionally include allowing or blocking the received transportation service request, by the at least one processor, based on the generated trip score.
Embodiments of the disclosure further provide a non-transitory computer-readable medium having instructions stored thereon that, when executed by at least one processor, cause the at least one processor to perform a method for processing a transportation service request. An exemplary method may include receiving the transportation service request from a terminal device. The method may further include generating a passenger score based on the received transportation service request using a first machine learning model trained with sample passenger data associated with past impacted drivers. The method may also include generating a trip score based on the generated passenger score and the received transportation service request using a second machine learning model trained with sample trip data associated with the past impacted drivers. The method may additionally include allowing or blocking the received transportation service request based on the generated trip score.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
The disclosed systems and methods automatically estimate a trip risk associated with a transportation service request using machine learning models and process the transportation service request based on the trip risk. In some embodiments, the machine learning models include passenger score models, trip score models, and a rule-based model. The passenger score models are trained with passenger data associated with impacted drivers, i.e., drivers who have been victims of at least one reported safety event. The trip score models are trained with trip data associated with the impacted drivers and passenger scores generated using the passenger score models. In some embodiments, different passenger score models and trip score models are trained for different groups of passengers. The rule-based model is trained with the passenger data and the trip data associated with the impacted drivers. In some embodiments, when the ride-hailing platform receives a transportation service request, a passenger score model is selected among the trained models based on the group to which the passenger requesting the transportation service belongs. The selected passenger score model may be applied to passenger information of the received transportation service request to generate a passenger score. Accordingly, a trip score model trained for the passenger group is selected. The selected trip score model may be applied to trip information of the received transportation service request and the generated passenger score to generate a trip score.
In some embodiments, if the generated trip score is above a predetermined score threshold, a whitelist (e.g., a list of safe passengers and a list of safe scenarios) may be compared with the passenger information and the trip information of the transportation service request. If the passenger information or the trip information matches any key in the whitelist, the transportation service request will be allowed to route to the service providers. Otherwise, the transportation service request will be blocked. If the generated trip score is equal to or below the predetermined score threshold, the rule-based model may be applied on the passenger information and the trip information of the transportation service request. If the rule-based model identifies the passenger or the request scenario as unsafe, the transportation service request will be blocked. Otherwise, the transportation service request will be allowed to route to the service providers.
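The gating logic described above may be sketched as follows. The function and variable names and the threshold value are illustrative assumptions for exposition, not part of the disclosure:

```python
SCORE_THRESHOLD = 0.8  # hypothetical predetermined trip score threshold

def gate_request(trip_score, passenger_id, scenario,
                 whitelist_passengers, whitelist_scenarios, rule_based_unsafe):
    """Return 'allow' or 'block' for a transportation service request."""
    if trip_score > SCORE_THRESHOLD:
        # High-risk score: allow only whitelisted passengers or scenarios.
        if passenger_id in whitelist_passengers or scenario in whitelist_scenarios:
            return "allow"
        return "block"
    # Score at or below the threshold: defer to the rule-based model.
    if rule_based_unsafe(passenger_id, scenario):
        return "block"
    return "allow"
```

A conditionally-allowed path (e.g., requiring additional authentication instead of blocking) could replace the "block" branches without changing the overall structure.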
In some embodiments, system 100 may optionally include a network 106 to facilitate the communication among the various components of system 100, such as training database 101, model training device 102, request processing device 120, and a terminal device 110. For example, network 106 may be a local area network (LAN), a wireless network, a cloud computing environment (e.g., software as a service, platform as a service, infrastructure as a service), a client-server, a wide area network (WAN), etc. In some embodiments, network 106 may be replaced by wired data communication systems or devices.
In some embodiments, the various components of system 100 may be remote from each other or in different locations, and be connected through network 106 as shown in
As shown in
Most trips can be safely completed without reported crimes. For example, only 50 out of 100,000 completed trips may have crimes reported. Therefore, using all the trip data for training the machine learning models may cause training imbalance. In some embodiments, model training device 102 may filter the sample data (e.g., passengers and completed trips) before training the machine learning models. In some embodiments, model training device 102 may select passenger data and trip data associated with impacted drivers for training the machine learning models. For example, the impacted drivers may be drivers who have experienced at least one crime such as a robbery. Data associated with transportation service requests made by passengers who have been served by the impacted drivers are then selected as passenger training data. In some embodiments, the training data is labeled based on whether a crime was committed against the impacted driver. For example, a passenger who has committed a crime against the impacted driver is labeled as a positive sample (e.g., unsafe passenger); otherwise, the passenger is labeled as a negative sample (e.g., safe passenger). Similarly, trips that the impacted drivers have completed are labeled as safe trips and unsafe trips. For example, a safe trip is not associated with any reported crimes, and an unsafe trip is associated with at least one reported crime.
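The labeling step above may be sketched as follows; the record fields (`reported_crimes`, `label`) are hypothetical names chosen for illustration:

```python
def label_samples(trips):
    """Label trips served by impacted drivers.

    Each trip is a dict with a 'reported_crimes' list; a trip (and its
    passenger) is a positive (unsafe) sample iff a crime was reported.
    """
    labeled = []
    for trip in trips:
        label = 1 if trip.get("reported_crimes") else 0  # 1 = unsafe, 0 = safe
        labeled.append({**trip, "label": label})
    return labeled
```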
In some embodiments, the training phase may be performed “online” or “offline.” “Online” training refers to performing the training phase contemporaneously with the request processing phase, e.g., learning the models in real time just prior to estimating the trip risk based on the service request. “Online” training has the benefit of producing the most up-to-date machine learning models based on the training data then available. However, “online” training may be computationally costly to perform and may not always be possible if the training data is large and/or the models are complicated. Consistent with the present disclosure, “offline” training is used, where the training phase is performed separately from the request processing phase. The models trained offline are saved and reused for estimating the trip risk.
Model training device 102 may be implemented with hardware specially programmed by software that performs the training process. For example, model training device 102 may include a processor and a non-transitory computer-readable medium. The processor may conduct the training by performing instructions of a training process stored in the computer-readable medium. Model training device 102 may additionally include input and output interfaces to communicate with training database 101, network 106, and/or a user interface (not shown). The user interface may be used for selecting sets of training data, adjusting one or more parameters of the training process, selecting or modifying a framework of the machine learning models, and/or manually or semi-automatically providing ground-truth associated with the training data.
Trained learning models 105 may be used by request processing device 120 to estimate the trip risk based on the service request that is not associated with a ground-truth. Request processing device 120 may receive trained learning models 105 from model training device 102. Request processing device 120 may include a processor and a non-transitory computer-readable medium (discussed in detail in connection with
Request processing device 120 may communicate with terminal device 110 to receive one or more service requests 115. In some embodiments, terminal device 110 may be a mobile phone, a wearable device, a PDA, etc. used by the user (e.g., the passenger) to make a transportation service request (e.g., service request 115). In some embodiments, service request 115 may include passenger information and trip information. For example, the passenger information may include passenger demographic information (e.g., passenger name, gender, register time, contact, address, and the like). The trip information may include a pickup time and location, and a destination for the trip. Request processing device 120 may use trained learning models 105 received from model training device 102 to perform one or more of: (1) generating a passenger score based on the received transportation service request, (2) generating a trip score based on the received transportation service request and the generated passenger score, and (3) allowing or blocking the received service request based on the generated trip score.
For example,
In some embodiments, as shown in
Communication interface 202 may send data to and receive data from components such as terminal device 110 via communication cables, a Wireless Local Area Network (WLAN), a Wide Area Network (WAN), wireless networks such as radio waves, a cellular network, and/or a local or short-range wireless network (e.g., Bluetooth™), or other communication methods. In some embodiments, communication interface 202 can be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection. As another example, communication interface 202 can be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links can also be implemented by communication interface 202. In such an implementation, communication interface 202 can send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information via a network.
Consistent with some embodiments, communication interface 202 may receive service request 115 from terminal device 110. Communication interface 202 may further provide the received data to storage 208 for storage or to processor 204 for processing.
Processor 204 may be a processing device that includes one or more general processing devices, such as a microprocessor, a central processing unit (CPU), a graphics processing unit (GPU), and the like. More specifically, processor 204 may be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor running other instruction sets, or a processor that runs a combination of instruction sets. Processor 204 may also be one or more dedicated processing devices such as application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), digital signal processors (DSPs), system-on-chip (SoCs), and the like.
Processor 204 may be configured as a separate processor module dedicated to performing processing service request 115 received from terminal device 110. Alternatively, processor 204 may be configured as a shared processor module for performing other functions. Processor 204 may be communicatively coupled to memory 206 and/or storage 208 and configured to execute the computer-executable instructions stored thereon.
Memory 206 and storage 208 may include any appropriate type of mass storage provided to store any type of information that processor 204 may need to operate. Memory 206 and storage 208 may be a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible (i.e., non-transitory) computer-readable medium including, but not limited to, a ROM, a flash memory, a dynamic RAM, and a static RAM. Memory 206 and/or storage 208 may be configured to store one or more computer programs that may be executed by processor 204 to perform service request processing and trip risk estimation disclosed herein. For example, memory 206 and/or storage 208 may be configured to store program(s) that may be executed by processor 204 to generate the passenger score, generate the trip score, and determine allowing or blocking the received service request.
Memory 206 and/or storage 208 may be further configured to store information and data used by processor 204. For instance, memory 206 and/or storage 208 may be configured to store the various types of data such as data related to service requests received from terminal device 110 and data related to estimating the trip risk (e.g., passenger historical behaviors, passenger's travel patterns, and passenger's relationship graph). Memory 206 and/or storage 208 may also store intermediate data such as the generated passenger score and trip score based on service request 115. Memory 206 and/or storage 208 may further store the various machine learning models used by processor 204, such as the passenger score models, the trip score models, the rule-based model, and the like. The various types of data may be stored permanently, removed periodically, or disregarded immediately after each frame of data is processed.
As shown in
In some embodiments, units 240-244 of
In some embodiments, the passenger's cancelation rates may be calculated to measure how often service orders are canceled after they are placed during a certain time window. For example, the time window may be one day, three days, one month, three months, and the like. In some embodiments, a time window close to the pickup time (e.g., within three hours prior to the pickup time) is used for calculating the cancelation rates. In some embodiments, a cancelation rate may be defined as a ratio between the quantity of canceled orders and (a quantity of completed trips+1) in the corresponding time window.
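The cancelation rate defined above may be computed as follows; this is a direct sketch of the stated ratio, with the +1 in the denominator avoiding division by zero for passengers with no completed trips in the window:

```python
def cancelation_rate(canceled, completed):
    """Cancelation rate for a time window, per the definition above:
    quantity of canceled orders / (quantity of completed trips + 1)."""
    return canceled / (completed + 1)
```

For example, a passenger with 3 cancelations and 2 completed trips in the window would have a rate of 1.0, whereas 0 cancelations over 5 completed trips yields 0.0.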
In some embodiments, a frequent cancelation tag of the passenger may be used as a model input feature. For example, if the passenger canceled more than two transportation service orders within two hours prior to the service request, a value of the frequent cancelation tag of the passenger is set to true. A high cancelation rate is usually associated with a high risk of crime. For example, a criminal may cancel an order if he sees that the answering driver is a strong male. The criminal may keep requesting and canceling service requests until he finds a driver whom he considers a suitable target.
In some embodiments, the passenger may frequently change the destination during the trip. A high destination changing rate can also be associated with a high risk of crime, because it reflects that the criminal is looking for a better place to commit a crime. The passenger's destination changing rates may be calculated similarly to the passenger's cancelation rates described above. The calculation may again use different time windows, such as one day, three days, one month, three months, and the like.
In some embodiments, the passenger's travel patterns may include travel dates, travel times, destinations, etc. For example, a safe passenger may use the service to go to the same destinations at similar times (e.g., each Tuesday and Thursday at 9 AM and 5 PM). That is, the trips conducted by a safe passenger usually establish a regular travel pattern. In some embodiments, the passenger's relationship graph may include information of whether a contact method (e.g., phone number) is associated with multiple passenger accounts and whether a known criminal is linked to the passenger through the relationship graph.
In some embodiments, passenger data 301 may include three passenger datasets: a new passenger dataset, an infrequent passenger dataset, and a frequent passenger dataset. The datasets are generated based on the quantity of the trips completed by the passengers in a predetermined time window (e.g., the latest 12 months). Consistent with the present disclosure, passengers whose data are included in the new passenger dataset have not completed any trips. Passengers whose data are included in the infrequent passenger dataset have completed no more than a predetermined threshold number of trips. Passengers whose data are included in the frequent passenger dataset have completed more than the predetermined threshold number of trips.
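The grouping rule above may be sketched as follows; the threshold value is an illustrative assumption, as the disclosure leaves it as a predetermined parameter:

```python
INFREQUENT_THRESHOLD = 5  # hypothetical predetermined threshold number of trips

def assign_group(completed_trips):
    """Assign a passenger to one of the three datasets based on the
    number of trips completed in the predetermined time window
    (e.g., the latest 12 months)."""
    if completed_trips == 0:
        return "new"
    if completed_trips <= INFREQUENT_THRESHOLD:
        return "infrequent"
    return "frequent"
```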
As shown in
In some embodiments, model training device 102 may use one or more tree-based methods (e.g., random forests, XGBoost, LightGBM) to train models 310, 320, and 330, respectively. As a result, models 310, 320, and 330 may be tree-based classification models. In some embodiments, though only data associated with the passengers served by the impacted drivers are included in the training datasets, the training data may still be imbalanced. For example, a ratio of positive training samples (e.g., unsafe passengers) to negative training samples (e.g., safe passengers) may still be very small, such as less than 1/250. To solve the imbalanced training data issue, model training device 102 may further apply a positive scale weight on the imbalanced training data. Additionally, the imbalanced training data may result in an overfitted model. In some embodiments, model training device 102 may employ L1/L2 regularization methods to solve the overfitting issue. In some embodiments, model training device 102 may also use data transformation techniques (e.g., square root, log, 1/x) to reduce the skewness of the generated passenger scores.
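Two of the imbalance-handling steps above can be sketched in isolation: computing a positive scale weight (e.g., the ratio passed to XGBoost's `scale_pos_weight` parameter) and a skew-reducing score transform. The helper names are illustrative, not from the disclosure:

```python
import math

def positive_scale_weight(labels):
    """Ratio of negative to positive samples, used to up-weight the
    rare positive (unsafe) class during tree-based training."""
    pos = sum(labels)
    neg = len(labels) - pos
    return neg / pos

def reduce_skew(scores):
    """Square-root transform to reduce the right-skew of raw scores;
    log or 1/x transforms could be substituted per the text above."""
    return [math.sqrt(s) for s in scores]
```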
As shown in
Consistent with some embodiments, model training device 102 may further receive scores 315, 325, and 335 for training the trip models. In some embodiments, model training device 102 may use one or more tree-based methods (e.g., random forests, XGBoost, LightGBM) based on trip data 401 and scores 315, 325, and 335 to train a trip score model 410 for new passengers (referred to as “model 410” hereafter), a trip score model 420 for infrequent passengers (referred to as “model 420” hereafter), and a trip score model 430 for frequent passengers (referred to as “model 430” hereafter), respectively. In some embodiments, trip data 401 may be imbalanced because most of trip data 401 are associated with safe trips (e.g., negative samples). To solve the imbalanced data issue, model training device 102 may apply another positive scale weight on trip data 401. In some embodiments, model training device 102 may further employ L1/L2 regularization methods to solve the overfitting issue due to the imbalanced data in trip data 401. In some embodiments, model training device 102 may also use data transformation techniques (e.g., square root, log, 1/x) to reduce the skewness of the generated trip scores.
In some embodiments, model training device 102 may jointly train a passenger score model (e.g., for new passengers) and a trip score model (e.g., for new passengers), using passenger data 301 and trip data 401. That is, the set of parameters of the two models are trained together, e.g., using a joint loss function. In some embodiments, the joint loss function may be a weighted combination of a passenger score loss and a trip score loss. For example, the passenger score loss may be a difference between a ground truth passenger score and a predicted passenger score. The trip score loss may be a difference between a ground truth trip score and a predicted trip score. In some embodiments, L1 loss may be used to represent the passenger score loss and the trip score loss. It is contemplated that other forms of differences, such as L2 loss, may be used to construct the loss function. The parameters of the two models may be initially set to certain values. The initial values may be predetermined, selected by an operator, or decided by model training device 102 based on prior data of similar passengers and trips.
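The joint loss described above (a weighted combination of L1 passenger-score loss and L1 trip-score loss) may be sketched as follows; the weights are illustrative hyperparameters, since the disclosure does not fix their values:

```python
def joint_loss(pred_passenger, true_passenger, pred_trip, true_trip,
               w_passenger=0.5, w_trip=0.5):
    """Weighted combination of the L1 passenger score loss and the L1
    trip score loss, used for jointly training the two models."""
    passenger_loss = abs(true_passenger - pred_passenger)  # L1 difference
    trip_loss = abs(true_trip - pred_trip)                 # L1 difference
    return w_passenger * passenger_loss + w_trip * trip_loss
```

Replacing each `abs(...)` with a squared difference would give the L2 variant also contemplated in the text.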
In step S502, request processing device 120 may be configured to communicate with terminal device 110 to receive service request 115. Consistent with some embodiments, service request 115 may include trip information (e.g., pickup time and location) and passenger information (e.g., name, gender, and contact methods). If any passenger or trip data of service request 115 is missing or unknown, request processing device 120 may fill the information based on values of the training data in the corresponding passenger group. For example, if the value of the passenger's gender is missing in service request 115 and the passenger has not completed any trips, request processing device 120 may fill the gender information for the passenger with the gender of the majority in the new passenger training dataset. In some embodiments, passenger score generation unit 240 of request processing device 120 may receive the passenger information of service request 115.
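The missing-value filling in the example above (majority value from the corresponding passenger group's training data) may be sketched as follows; field and variable names are hypothetical:

```python
from collections import Counter

def fill_missing(request, group_training_data, field):
    """If a field is missing in the service request, fill it with the
    most common value for that field in the training data of the
    passenger's group (e.g., the new passenger dataset)."""
    if request.get(field) is None:
        values = [row[field] for row in group_training_data
                  if row.get(field) is not None]
        request[field] = Counter(values).most_common(1)[0][0]
    return request
```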
In step S504, passenger score generation unit 240 of request processing device 120 may be configured to generate a passenger score based on the received passenger information of service request 115. For example,
In some embodiments, passenger score generation unit 240 may further be configured to communicate with memory 206 and/or storage 208 to receive additional passenger related information that may not be included in service request 115. For example, passenger score generation unit 240 may receive passenger travel behaviors and patterns data from memory 206 and/or storage 208 (e.g., passenger frequent cancelation tag, passenger's cancelation rates, passenger's night order rates, passenger's relationship graph). In some alternative embodiments where passenger travel behaviors and patterns data are not stored in memory 206 and/or storage 208, passenger score generation unit 240 may generate the passenger travel behaviors and patterns based on other data stored in memory 206 and/or storage 208. In step S604, passenger score generation unit 240 may be configured to apply the selected passenger score model on the passenger data to generate the passenger score.
Returning to
Returning to
If the passenger or the scenario matches any of the passengers or the scenarios in the whitelist (step S512: Yes), service request 115 will be allowed to route to the transportation service providers in step S516. If none of the passengers or the scenarios in the whitelist matches the passenger or the scenario of service request 115 (step S512: No), service request 115 will be blocked by request processing unit 244 in step S518. In some alternative embodiments, instead of directly blocking service request 115, request processing unit 244 may conditionally allow service request 115 to route to the transportation service providers in step S516. For example, request processing unit 244 may require the passenger to provide authentication information (e.g., a social security number or a driver's license number) and/or a payment method (e.g., payment card number) as part of step S516 in order to continue to route the service request to the transportation service providers.
Returning to step S510, if the generated trip score is not greater than the predetermined trip score threshold (step S510: No), request processing unit 244 may be configured to apply a rule-based model on service request 115. In some embodiments, the rule-based model may identify whether service request 115 includes any unsafe passengers and unsafe scenarios. For example, if the value of passenger frequent cancelation tag is true (e.g., the passenger has a frequent cancelation behavior), the rule-based model may identify the passenger as an unsafe passenger and does not allow service request 115 (step S514: No) to continue. Service request 115 therefore may be blocked or required to provide further information to continue the request in step S518. If the rule-based model does not identify any unsafe passengers and unsafe scenarios in service request 115 (step S514: Yes), request processing unit 244 may allow the ride-hailing platform to route service request 115 to the transportation service providers in step S516.
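A minimal version of the rule-based check above may be sketched as follows, using only the frequent-cancelation rule given as an example; the field name is hypothetical, and a production model would combine many such rules:

```python
def rule_based_unsafe(request):
    """Return True if any rule flags the request as unsafe. Here the
    only rule mirrors the example above: a true frequent-cancelation
    tag marks the passenger as unsafe."""
    rules = [
        lambda r: r.get("frequent_cancelation_tag", False),
    ]
    return any(rule(request) for rule in rules)
```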
Another aspect of the disclosure is directed to a non-transitory computer-readable medium storing instructions which, when executed, cause one or more processors to perform the methods, as discussed above. The computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices. For example, the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed. In some embodiments, the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed system and related methods. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed system and related methods.
It is intended that the specification and examples be considered as exemplary only, with a true scope being indicated by the following claims and their equivalents.