Transportation timetables, such as train schedules, bus schedules and flight schedules, are typically created by the associated transit agency using models to compute an estimated transit time between stops. The models are often complex, considering factors such as distance, type of road or rail, time of day, time of year, day of week, typical weather, expected traffic, equipment to be used, and the like. This approach suffers from two weaknesses. First, this modeling approach often does not produce an accurate schedule. For example, one particular train may typically run 20 minutes late relative to a planned, model-based schedule. As another example, one particular flight might typically arrive 20 minutes early. Prior art schedules may be viewed as “planned” performance. A second weakness is that modeling generally assumes that equipment is ready at the planned departure time. However, in many cases the equipment arrives late, as an incoming train or plane may be delayed. That is, planned timetables do not take into account variations in arrival time of equipment. Prior art includes modifying a planned arrival time for a single trip, using real-time data.
Embodiments of this invention overcome weaknesses of prior art.
Some prior art focuses on updating a single arrival time for an individual route and stop based on a current, that is, real-time, location/time of a vehicle, typically generating a single, updated “expected arrival time.” Other prior art focuses on having a human scheduler adjust for real-time vehicle activity, such as trains passing each other, to change one-time arrival times of specific vehicles. Such updates are of minimal use for passengers and connections because they do not allow for advance planning by those parties.
The problem solved by embodiments of this invention is to create a new, more accurate, fixed schedule based on comprehensive, actual, historical data from an operational transit system that was operating on a previous, fixed schedule.
Often, data about actual operating performance, that is, exact departure and arrival times, for every route and stop, after the fact, are publicly available. The first step is to collect, acquire, or download this data, which is often available on a web site of the transit agency, or via a standardized transit stop feed, such as “Google Transit Feed Specification,” or “General Transit Feed Specification,” or GTFS. We also refer to an initial schedule as “existing,” or a “planned” schedule.
The next step is to continually “scrape” a website to extract actual arrival times. Again this is for every stop for every route, or a selected subset. Although we refer to a “website,” such a data source may be an alternative source, such as an app (application on a personal electronic device, or similar), or a data feed (such as an RSS or GTFS feed). Although initial schedules are usually provided by the transit agency directly, sometimes actual arrival times are provided by a third party. This data needs to explicitly or implicitly show route and stop identification. By “routes,” we mean a regularly scheduled, identified trip. For trains, “train numbers” are used. For buses, a “bus number” is used, although sometimes bus routes are named, instead of numbered. For airlines, flight numbers are the route identification. Service types may be explicit or implicit, as are AM/PM, Inbound/Outbound identification and day of week, for example.
Such web (“initial”) schedule data is collected for all routes and stops for the entire transit fleet, or a selected subset. This is typically done for a time period such as one year, although the time period may be different.
In a first embodiment, the second step is to time-sort the historical arrival times for each stop for each route. Then a “cut-off” in the sorted list is selected based on a desired on-time percentage, such as, “98% of trains will arrive by this time.”
In a second embodiment, the next step is a statistical analysis of each route or trip number, such as a bus route, train number, or flight number, for each station. In a third step, new arrival times, and optionally new departure times, are computed from the statistical analysis that shows “likely” performance, rather than “planned” performance. New arrival times may be computed for a particular statistical likelihood, such as, “98% of trains will arrive by this time.”
The set of newly computed times, for all routes and stops considered, is then published, on paper or electronically, as a new timetable or schedule. Note that schedules typically include additional information beyond a timetable that includes transit number, stop and arrival time. For example, they typically include type of service, which may include special services, such as trains with bicycle cars, or extra busses for major events, as examples.
An alternative embodiment is to provide a range or “time bracket”, such as “90% of trains arrive between this time and that time.” Yet another embodiment has two ranges. The first is “typical,” or “usually,” and might include 75% to 90% of historical arrivals. The second is a “nearly always” time or time bracket that might include 98% or 99% of historical arrival times.
A new or improved timetable or schedule may be printed on paper, posted on boards, displayed on electronic signs, available on web sites, available on apps running on personal electronics, or available for other processing, such as a trip planner, social media site, a navigation service or device, or autonomous vehicles.
Note that in many cases one arrival time affects a subsequent departure time. For example, for many bus and train times, the equipment must first arrive at a station, then depart shortly after a dwell time for unloading and loading. If a train is 20 minutes late arriving, it may also be 20 minutes late departing.
Embodiments are applicable to regularly scheduled transit, including: trains, busses, aircraft, ships, cruises, tours, tourist trips or events, space flights, employee shuttles with schedules, and the like. Embodiments apply to the following modes of transportation only if they are regularly scheduled, with routes and stops identified, and with initial schedule and real-time data available electronically: autonomous vehicle trips, personal bicycle and scooter rentals, car rentals, taxis, space flights, drone flights, and ride sharing services.
Scenarios and options are non-limiting embodiments.
The technical problem to solve is: creating accurate, new transit timetables for a route and stops, based on historical performance.
Collecting data on actual, historical performance of transit agency trips is non-trivial. One method is to look at real-time data of individual trips. This information is continually updated, but only shows “current” trips. Thus, any such web site must be continually monitored in order to collect data on all trips. This is an interactive, on-going process, as typically information, such as a flight number or train number, must be entered into the web site before it will display time data about that trip. An app on a personal electronic device, such as a smart phone, smart watch, tablet, personal computer, virtual reality or augmented reality screen, or heads-up display, is, for the purposes of this patent application, also a web site.
For convenience we refer to any organization with a responsibility for a schedule or operation as a, “transit agency.” Such an agency may or may not be the same agency that owns or operates the equipment. We refer to a, “transit vehicle,” as any vehicle that operates to the schedule. It may be a bus, train or plane, for example. In some cases, rather than a traditional transit agency, another identifier for a group of routes may be used. For example, instead of, “United Airlines,” we might use, “all flights out of SFO airport.” We refer to a, “route,” as any identified regular trip with one or more stops. It might be a bus number, train number or flight number, as examples. We refer to a, “transit stop,” as any location associated with an arrival or departure time on a schedule. We refer to a, “fixed schedule,” as a timetable that is generally repeated for each time the route is traveled, as compared, for example, to a one-time prediction for one particular vehicle for one particular stop, typically in the future, such as an updated expected arrival time for a single flight on the same day. We refer to an, “analysis period,” as a time period when real-time data is scraped, collected, acquired, aggregated, or harvested.
Please refer now to claim 1 and
A first step includes determining a source and format of electronic schedule data for an existing public schedule, and identifying any data conversion necessary to process that data. We refer to this as a fixed schedule, or initial schedule, retrieval protocol. This step includes acquiring and converting this current, fixed-schedule data. We refer to this data as an initial timetable, which may be placed in a database or other non-volatile, convenient electronic storage. See claim 1(a) or 101 in
Often, data about actual operating performance, that is, exact departure and arrival times, for every route and stop, after the fact, are publicly available. The first step is to collect, acquire, or download this data, which is often available on a web site of the transit agency, or via a standardized transit stop feed, such as “Google Transit Feed Specification,” or “General Transit Feed Specification,” or GTFS. GTFS data may be available as a single file. If necessary, data could be keyed or OCR generated from a printed schedule, such as shown in
A second step includes determining a source and format of historical transit data comprising actual arrival times for routes and stops, and identifying any data retrieval protocol and conversion necessary to process that data. We refer to this as actual or real-time transit data and its associated retrieval protocol. The second step includes loading this real-time data retrieval protocol and conversion into a monitoring processor. The second step includes executing this protocol and collecting the retrieved data for an analysis period, such as one year or another time period. A sample protocol is shown in
Collection and analysis of data may be restricted to a subset of all data: a selected set. Typically, if a route or stop changes during the analysis period, that route and stop are deleted from the selected data set.
An exemplary scenario may be to scrape, acquire, collect or download data from a web site or RSS feed, for a train transit district in one city, for 100 trains (“routes,” “train numbers,” or “transit number”), for 50 stops, for a period of one year.
A third step is to sort the acquired data, typically in ascending order of lateness (time), for each stop of each route: a sorted subset. Subsets may be additionally or alternatively sorted based on service. Subsets may be sorted in ascending (or descending) time order. Subsets may be compressed prior to, during, or after sorting. For example, for each minute late, only a count is maintained. Data may be kept in a list, table, database, array, hash table, data structure, (OOP) object or other format known in the art such as GTFS. See claim 1(c) or 104 in
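As a non-limiting illustration of the compression described above, the following Python sketch keeps only a count for each minute late, in ascending order of lateness. The function name, the use of minutes after midnight, and the sample values are assumptions made for this illustration only.

```python
from collections import Counter

def compress_lateness(scheduled_min, actual_mins):
    """Return (minutes_late, count) pairs in ascending order of lateness.

    Times are minutes after midnight (an assumption for this sketch);
    a negative lateness means the vehicle arrived early.
    """
    counts = Counter(actual - scheduled_min for actual in actual_mins)
    return sorted(counts.items())

# Hypothetical subset: one stop of one route, scheduled arrival 8:00 (480).
histogram = compress_lateness(480, [482, 480, 485, 482, 479, 482])
print(histogram)
```

Each pair stands in for all arrivals with the same lateness, so a full year of data for one stop of one route reduces to a short list.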
A fourth step is to select a proposed arrival time, for each subset (e.g., each stop on each route), such that a predetermined percentage of actual arrival times are less than or equal to the proposed arrival time. For example, if 250 arrival times are in one sorted subset, and the desire is that 90% of trains arrive on time (under the new schedule), then a cutoff in the sorted list would be at or about the 225th entry. See claim 1(d) or 105 in
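The cutoff selection of the fourth step may be sketched as follows. This is a minimal Python illustration, not the listing referenced in the figures; the function name and the choice of integer percentages are assumptions. It returns the smallest entry such that at least the target percentage of entries are less than or equal to it.

```python
def proposed_arrival(actual_arrivals, on_time_percent):
    """Pick the arrival time that on_time_percent of entries do not exceed.

    actual_arrivals: arrival times for one stop of one route (any sortable
    unit, e.g. minutes after midnight). on_time_percent: integer 1..100.
    """
    ordered = sorted(actual_arrivals)
    # Ceiling of (percent * n / 100) using integer arithmetic, as a 1-based rank.
    rank = max(1, (on_time_percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# With 250 entries and a 90% target, the cutoff is the 225th sorted entry.
sample = list(range(1, 251))  # hypothetical arrival times
print(proposed_arrival(sample, 90))
```

Integer arithmetic is used for the rank so that the cutoff does not depend on floating-point rounding.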
A fifth step is then computing a proposed time offset, by subtracting the initial scheduled arrival time from the proposed arrival time. This is, in essence, an, “expected late time,” if using the initial, fixed schedule. See claim 1(e) or 105 in
A new timetable or schedule is created using the proposed arrival times for each subset, or each selected route and stop. However, a key element is to first compare the proposed time offset to a predetermined time threshold. If the proposed arrival time differs from the initial fixed scheduled time by less than the time threshold, then the initial fixed scheduled time is still used for the arrival time in the new timetable or schedule. A benefit of this element of a method is minimal changes to a previous schedule that was well known. It may also permit easier memorization, such as, “trains arrive every 20 minutes after the hour,” even if that is true for only some of the trains. Exemplary time thresholds may be one minute for busses, two minutes for trains, and five minutes for airplanes. See claim 1(f) or 106 in
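One reading of the threshold comparison is sketched below in Python: the initial time is retained when the proposed time offset is smaller in magnitude than the time threshold, and the proposed time is used otherwise. The function name, the strict "less than" comparison, and the use of the absolute value of the offset are assumptions of this sketch.

```python
def final_arrival(initial_min, proposed_min, threshold_min):
    """Choose the arrival time to publish for one stop of one route.

    All values are in minutes. The proposed time offset is the proposed
    arrival time minus the initial scheduled arrival time.
    """
    offset = proposed_min - initial_min
    if abs(offset) < threshold_min:
        return initial_min   # change too small: keep the well-known time
    return proposed_min      # change large enough: publish the new time

# Exemplary two-minute threshold for trains.
print(final_arrival(500, 501, 2))  # offset of 1 minute, initial time kept
print(final_arrival(500, 507, 2))  # offset of 7 minutes, proposed time used
```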
The new, or final, computed timetable or schedule of embodiments is then published or available, on paper or electronically, as described above for initial schedules. It may then be displayed on electronic signage, used by apps, such as navigation, travel apps, social networks, and scheduling apps, such as reminder or calendar apps, which may use this data to create or modify a time-to-leave, for example. See claim 1(f) or 106 in
An alternative embodiment uses a statistical model of the collected actual arrival times. Such a model might be a standard distribution, such as a Gaussian distribution, or may be an asymmetric distribution, such as one that includes skew or kurtosis, or one that includes an exponential decay. Fitting data to such a statistical model may include computing a best mean, skew and/or kurtosis, or computing using one or more predetermined skews or kurtoses, or computing an exponential decay time constant. Another alternative embodiment includes creating such a statistical model using more than one subset. For example, subsets may be grouped by route, by stop, by service, by day of week, or another grouping. Such groupings have the advantage of many more data points for fitting. A predetermined transit distribution model may be used. Selecting a cutoff is similar. Again a target on-time arrival percentage is used, and from the curve-fit distribution a proposed time-offset is computed. For example, for a Gaussian distribution shape, at two sigma above the mean about 97.7 percent of trains would have arrived on time.
Yet another embodiment provides a range of times. Typically such a range is provided for either or both departures and arrivals. For this embodiment, two proposed arrival times are used based on two predetermined percentages such as 10% and 90%. The predetermined time threshold may be applied to only one or to both the times in the proposed range.
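For the Gaussian case only, the fit and cutoff may be sketched with the Python standard library class statistics.NormalDist. The function name and sample values are assumptions for illustration; skewed or exponential-decay models would need a different fitting routine and are not shown.

```python
from statistics import NormalDist

def gaussian_offset(minutes_late, on_time_fraction):
    """Fit a Gaussian to observed lateness and return the offset (minutes)
    by which the target fraction of trips would have arrived."""
    model = NormalDist.from_samples(minutes_late)  # sample mean and std dev
    return model.inv_cdf(on_time_fraction)

# The two-sigma figure: fraction of a Gaussian below mean + 2 sigma.
two_sigma_fraction = NormalDist().cdf(2.0)
print(round(two_sigma_fraction, 4))

# Hypothetical lateness values (minutes) for one grouping of subsets.
print(gaussian_offset([1, 2, 3, 4, 5], 0.5))
```

Grouping several subsets before fitting, as described above, simply means passing a longer list of lateness values to the same function.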
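A range of this kind may be sketched in Python by applying the sorted-list cutoff twice, once for each predetermined percentage. The function name and the default percentages of 10 and 90 are illustrative assumptions.

```python
def arrival_bracket(actual_arrivals, low_percent=10, high_percent=90):
    """Return (early_edge, late_edge) of a time bracket for one subset."""
    ordered = sorted(actual_arrivals)
    n = len(ordered)

    def at(percent):
        # 1-based rank: ceiling of percent * n / 100, in integer arithmetic.
        rank = max(1, (percent * n + 99) // 100)
        return ordered[rank - 1]

    return at(low_percent), at(high_percent)

# Hypothetical subset of 100 arrival times.
print(arrival_bracket(list(range(1, 101))))
```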
Yet more alternative embodiments include information on cancellation probabilities and route or stop alternatives. For example, a route may be cancelled 5% of the time, or a stop might be skipped 10% of the time. These probabilities may be included in the final timetable or schedule.
If a “cutoff” late time is not desired, or is zero, the fifth step and comparison step to a predetermined time threshold may not be used. Rounding or truncation to a nearest minute or five minutes, for example, may be used, where rounding or truncation may be upward or downward.
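Rounding or truncation to a multiple of a chosen interval may be sketched as below. The function name, the integer-minute representation, and the five-minute default are assumptions for illustration.

```python
def round_minutes(minutes, step=5, upward=True):
    """Round a time in whole minutes to a multiple of step.

    upward=True rounds up to the next multiple; upward=False truncates down.
    """
    if upward:
        return -(-minutes // step) * step  # ceiling division on integers
    return (minutes // step) * step

print(round_minutes(503))                # rounds up to 505
print(round_minutes(503, upward=False))  # truncates to 500
```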
Yet another embodiment applies these methods to non-scheduled trips, such as car or bicycle rentals, or on-request transit. For these embodiments, rather than transit numbers, trips are broken into units of time, such as every half hour or 15 minutes. Subsets may be organized by source and destination regions, such as from an airport to a particular zip code, as well as by time of day, day of week, and the like.
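The grouping of non-scheduled trips into subsets may be sketched as follows in Python. The tuple layout, region labels, and function name are hypothetical; each resulting subset may then be sorted and given a cutoff in the same manner as a stop on a scheduled route.

```python
from collections import defaultdict

def bucket_trips(trips, bucket_min=30):
    """Group trips by (origin, destination, time slot).

    trips: iterable of (origin, destination, depart_min, duration_min),
    with depart_min in minutes after midnight. Returns a dict mapping
    (origin, destination, slot_start_min) to a sorted list of durations.
    """
    subsets = defaultdict(list)
    for origin, destination, depart_min, duration_min in trips:
        slot = (depart_min // bucket_min) * bucket_min
        subsets[(origin, destination, slot)].append(duration_min)
    return {key: sorted(values) for key, values in subsets.items()}

# Hypothetical trips from an airport to one zip code.
trips = [("SFO", "94301", 545, 42),
         ("SFO", "94301", 555, 38),
         ("SFO", "94301", 575, 50)]
print(bucket_trips(trips))
```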
Although individual steps of embodiments may or may not involve well understood, conventional and routine automated activities, the particular combination of steps generates a novel result: a more accurate transit schedule. Embodiments may be viewed as transforming a historical transit performance into an accurate future fixed schedule.
There is an industry-standard format of encoding schedules into electronic data, known as, “Google Transit Feed Specification,” or “General Transit Feed Specification,” or GTFS. In this specification, a stop comprises a location, a route number, and a service. A “route” may be a bus number (or name), a train number, or a flight number. We refer to these also as transit numbers. The term, “stop,” may thus include more information than just a location. GTFS may be a static block of data, may be or include streaming data, and may be or include retrievable data.
The term, “service,” varies by transit type, agency, and routes. It may include, as non-limiting examples, maximum passenger count, seating and service level options (e.g., “first class,”), inbound or outbound direction, type of equipment, speed limits or speed limit zones, construction activity, ridership levels or type (e.g., bicycles), track sharing, unions or union rules, local jurisdictional rules (e.g., “no train horns”), connecting services, other related passenger services (e.g., rental cars), related parking, associated public events (e.g., ball games) and transit agency or jurisdiction. Any combination of services may be included in selecting, isolating, sorting or computing data subsets. Any combination of services may be included or indicated in a final timetable or schedule.
Turning now to
The challenge in automating the scraping or automatic collection of data from such web sites is significant. The software may have to locate a link, then “click on” the link, then parse another page, find a link or data on that page, then extract an actual arrival time. Changes to web page design, and interference from announcements or ads, must be considered.
Line 12 is exemplary for collecting the initial timetable.
Line 21 is exemplary for scraping real-time data, kept in “$trainView.”
Lines 25-27 are exemplary for copying and converting.
Line 63 is exemplary for sorting arrival times.
Lines 64-65 are exemplary for selecting a proposed arrival time.
Lines 66-68 are exemplary for applying a performance threshold to create a proposed time offset.
Line 70 is exemplary for applying a change threshold.
Line 70 is exemplary for outputting a new, proposed schedule.
Data structures are, indeed, “structures.” Appropriate data structures include GTFS and its contents. Data may be kept in objects, such as used by an OOP (“object-oriented programming”) language, such as PHP. Data may be stored in lists, arrays, a table, or a database (which may be in tabular format, such as in
Reference D1, “Harker,” U.S. Pat. No. 5,177,684 is in the field of train scheduling. The problem Harker is trying to solve is to keep trains from colliding when at least one train is off its planned schedule. Because track, switches and stations are usually shared among multiple trains, one late train will often delay other trains. Harker uses a physical system model, plus real-time data, to warn human operators when trains and switches must be re-directed to avoid collisions. Harker neither uses historical arrival times nor produces a new, revised, fixed schedule. His invention is directed to individual incidents and involves a human in the process.
In reference D2, “Roulland,” publication US 2017/0169373 A1, Roulland collects historical data; however, his only goal and output is to compute a “cost,” which he calls a “metric.” He merely “evaluates reliability,” but does not generate a new, fixed schedule for a route. His “cost” includes: “a perceived waiting cost, a cost of lateness at a final destination, a difference between scheduled arrival time and an actual arrival time, and an annoyance cost,” [abstract]. His invention may be used to assess an overall “performance” of a transit agency but cannot be used, nor is it intended to be used, to generate an improved fixed schedule.
Ideal, Ideally, Optimum and Preferred: Use of the words, “ideal,” “ideally,” “optimum,” “should” and “preferred,” when used in the context of describing this invention, refers specifically to a best mode for one or more embodiments for one or more applications of this invention. Such best modes are non-limiting, and may not be the best mode for all embodiments, applications, or implementation technologies, as one trained in the art will appreciate.
All examples are sample embodiments. In particular, the phrase “invention” should be interpreted under all conditions to mean, “an embodiment of this invention.” Examples, scenarios, and drawings are non-limiting. The only limitations of this invention are in the claims.
May, Could, Option, Mode, Alternative and Feature: Use of the words, “may,” “could,” “option,” “optional,” “mode,” “alternative,” “typical,” “ideal,” and “feature,” when used in the context of describing this invention, refers specifically to various embodiments of this invention. Described benefits refer only to those embodiments that provide that benefit. All descriptions herein are non-limiting, as one trained in the art appreciates.
Embodiments of this invention explicitly include all combinations and sub-combinations of all features, elements and limitations of all claims. Embodiments of this invention explicitly include all combinations and sub-combinations of all features, elements, examples, embodiments, tables, values, ranges, and drawings in the specification and drawings. Embodiments of this invention explicitly include devices and systems to implement any combination of all methods described in the claims, specification and drawings. Embodiments of the methods of invention explicitly include all combinations of dependent method claim steps, in any functional order. Embodiments of the methods of invention explicitly include, when referencing any device claim, a substitution thereof to any and all other device claims, including all combinations of elements in device claims. Claims for devices and systems may be restricted to perform only the methods of embodiments or claims.