The present disclosure relates generally to the handling of high volumes of traffic over the internet. More particularly, this disclosure relates to an external ecommerce system that uses machine learning to monitor and measure the responsiveness of a primary ecommerce platform and to redirect transaction requests to the external system when the primary platform experiences stability or performance issues.
Occasionally, novel products are introduced into the marketplace with demand far exceeding supply. Take, for example, Apple's iPad and iPad 2. These items were launched to crowds of consumers waiting outside stores for hours to purchase a limited supply of products. While the seller (i.e., manufacturer, retailer, etc.) may be overjoyed at the demand, such a situation does not create a good customer experience. Purchasers may stand in line for hours only to be told when they reach the counter that there are no products left for them. Anger flares when some consumers purchase multiple quantities for resale on internet auction sites or in foreign countries where the items are not available via legitimate markets, leaving those waiting in line behind them with nothing.
An ecommerce solution solves some of these problems. Rather than wait in line for hours with uncertain results, consumers may access an online store to purchase the item. The purchaser could even preorder the item in order to receive it when it hits the market. However, business, technical and functional challenges still exist, especially if the ecommerce system infrastructure and surrounding technology are not prepared to handle unusually high demand.
Periods during which online applications receive an exceptionally high volume of transactions, such as holidays or the preorder or release of long-awaited products, present a number of technical and user-experience issues. As those of ordinary skill in the computer arts are aware, all online applications face certain general traffic issues simply by being on the internet. Designing an online system involves designing for performance, and performance is impacted by the interest in, and interaction with, the application online. Whereas a desktop application may involve just one user operating the software at a particular time, users of an online application may number from several to millions of people at the same time.
Online applications are architected and sized for performance under normal or typical conditions and do not work well when traffic spikes. For example, the owner of a population of servers with information that is provided frequently to the internet will want that information cached in memory so that repeated access to it is quick and inexpensive. Under load, that cache can benefit performance. However, if the system does not have enough traffic hitting the cache servers to keep the cache “warm” (where cached items are frequently requested), then users who trickle into the site will experience poor performance. So, when developers design and build an internet application, they build in a cache capacity and a population of web servers sized to the amount of traffic normally expected. That system is then tuned so that in most circumstances users get the benefit of the cache. If a spike event occurs (a period of high volume and high demand on the system), the system breaks down.
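The cache-warming effect described above can be illustrated with a small sketch (hypothetical code, not part of the disclosed system): entries expire after a fixed time-to-live, so steady traffic keeps them warm while sparse traffic always misses.

```python
class TTLCache:
    """Toy cache whose entries expire after `ttl` time units,
    mimicking a cache that stays warm only under steady traffic."""
    def __init__(self, ttl):
        self.ttl = ttl
        self.store = {}                  # key -> (value, expiry time)
        self.hits = self.misses = 0

    def get(self, key, load, now):
        entry = self.store.get(key)
        if entry and entry[1] > now:     # still fresh: a "warm" hit
            self.hits += 1
            return entry[0]
        self.misses += 1                 # expired or absent: reload
        value = load(key)
        self.store[key] = (value, now + self.ttl)
        return value

# Steady traffic (one request per time unit) keeps the entry warm...
warm = TTLCache(ttl=10)
for t in range(30):
    warm.get("catalog", load=lambda k: "page data", now=t)
warm_hit_rate = warm.hits / (warm.hits + warm.misses)      # 27 of 30 hit

# ...while sparse traffic (one request every 15 units) always misses.
cold = TTLCache(ttl=10)
for t in range(0, 60, 15):
    cold.get("catalog", load=lambda k: "page data", now=t)
cold_hit_rate = cold.hits / (cold.hits + cold.misses)      # 0 of 4 hit
```

The same infrastructure thus yields opposite performance profiles depending only on traffic shape, which is the tension the paragraphs above describe.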
It is unrealistic, in time and cost, to grow the infrastructure to meet spike (high demand) loads as they occur, and system issues develop if the system is permanently scaled to operate at the increased level of traffic. Referring again to the caching example above, once the traffic subsides, the typical amount of traffic received will not induce the algorithms behind the scenes to keep the cache populated effectively. The infrastructure required for a spike event is poisonous to normal traffic, and the infrastructure needed for normal traffic is insufficient to meet the needs of the spike.
For online commerce, poor performance of a web store is associated with a very low close rate. Although customers are not physically standing in line, they are in a very real, electronic queue. An electronic queue may be even more congested than a physical queue because of the centralized nature of ecommerce purchasing and the system issues discussed above, resulting in connectivity or communication problems if the system is not sufficiently robust. A high volume of requests headed for the same web server must pass through the network, the network interface to the server and the server's operating system before reaching the web server itself. An overload of requests may cause issues at any one of these points, frustrating the user's efforts to reach the destination web server. An overload of requests hitting the web server will result in an unwanted error page presented to the user, who may give up trying to access the site in frustration, or simply never come back. If limits are placed on the quantity that may be purchased, the system must ensure that the purchaser does not lose his place in the queue, or he risks losing the item and the merchant risks losing the sale. If the merchant loses the sale, the commerce system provider may lose the client merchant, especially if Service Level Agreement criteria are not met.
High volume, high demand and limited supply orders aside, an online store may experience instability or performance issues for any number of technical reasons, and it may happen at the worst of times, such as during holiday shopping periods. Performance issues and inventory issues leave customers so frustrated that they may abandon their online cart and go elsewhere for the product. Instability and downtime are expensive, resulting in lost sales and increased maintenance costs. An effective solution to this problem allows the commerce provider to utilize a system optimized for normal traffic during those periods, but change over to a system designed for spikes in traffic when periods of instability in the normal system are detected. The system and methods described herein provide that solution and offer other advantages over the prior art.
HVTQ (High Volume Transaction Queueing) is a reserve ecommerce solution that automatically engages and queues orders when a primary back-end transaction processing system becomes unresponsive or unstable. Through algorithms such as those described herein, embodiments of the invention apply machine learning techniques to control transaction submission rates by queuing orders and throttling the rate at which they are processed, based on self-awareness and constant monitoring, feedback and health checks of the primary system. When metrics indicate that the third-party system can begin accepting orders again, HVTQ automatically feeds the queued orders, along with real-time orders, at a rate that the third-party system can successfully manage.
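The throttled re-feed described above might look like the following sketch (hypothetical function and parameter names; in practice the rate and the health check are driven by the metrics described later):

```python
from collections import deque

def drain(queue, submit, healthy, rate_limit):
    """Feed queued orders back to the primary system, at most
    `rate_limit` per cycle, requeuing any order that fails and
    stopping early if the health check reports instability."""
    sent = 0
    while queue and sent < rate_limit and healthy():
        order = queue.popleft()
        if submit(order):
            sent += 1
        else:
            queue.appendleft(order)      # keep the order; retry next cycle
            break
    return sent

# One drain cycle against a healthy primary that accepts every order:
queue = deque(range(10))
sent = drain(queue, submit=lambda order: True,
             healthy=lambda: True, rate_limit=4)
```

Running the cycle repeatedly, with `rate_limit` tuned to what the primary system can absorb, gradually empties the queue without re-triggering the overload.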
Many ecommerce platforms have the ability to queue transactions. These queuing solutions notify shoppers, after holding the order for a day or two, that the order has failed or been successfully placed. The shopper does not know, at the time the order is placed, whether it will be successful. HVTQ was designed to maintain a high-quality user experience. Upon placing an order when HVTQ is running, the user receives an order confirmation and an email stating that their order was received and is being processed. Further, once the order is placed, inventory is allocated against it, even though it is in a queued state, to ensure successful completion of the order.
The HVTQ system and method described herein is self-aware. Using algorithms, it constantly monitors the health of the third-party back end system. When system performance degrades to a certain threshold, HVTQ automatically engages (no human intervention is required) and queues orders. Because HVTQ constantly monitors the health of the third-party system, it knows when and at what rate it can begin feeding queued orders back to the third-party system. HVTQ also has an “intermediate” state where it feeds both queued and real-time orders simultaneously to the third-party system. Again, because it is constantly checking the third-party system's health, it knows the velocity at which it can successfully send orders for processing. This HVTQ solution is innovative because it is system agnostic. It can be integrated into any third-party processing system, giving it ultimate flexibility.
The HVTQ solution ensures that ecommerce orders are not lost due to degraded system performance or instability. If a transaction processing system is not performant, orders are not successfully placed and sales are not captured. HVTQ functions almost like an insurance policy: it is there to take over as soon as a third-party transaction processing system goes down, and it ensures that sales are captured and that the shopper has the same high-quality experience they have when all systems are performing normally.
Ecommerce platforms have a need for a system and method that can measure the performance of the system and step in to capture transactions when the ecommerce system experiences instability or poor performance. The solution described herein provides that system and method and offers other improvements over the prior art.
Having described embodiments of the invention in general terms, reference will now be made to the accompanying drawings, wherein:
Embodiments of the present invention may be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. The invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the disclosure may enable one of ordinary skill in the art to make and use the invention. Like numbers refer to like components or elements throughout the specification and drawings.
Embodiments of a high-volume transaction queueing with machine learning system and method (also referred to herein as “external system”) provide an external backup to a primary ecommerce system (also referred to as a “third-party system,” “primary system” or “primary platform”) when the primary system experiences system overload or performance and stability issues, preventing loss of revenue and providing a satisfying customer experience. This situation frequently arises when a new item is offered for sale, when an item is in particularly high demand, or during expected periods of high internet traffic to a merchant site, such as during the holidays.
Embodiments of the invention are designed to monitor the health of the primary (merchant) system by transaction type (e.g., a shopping cart submission, a request to calculate tax, or some other type of request). For the purpose of this document, the terms “transaction,” “request,” “call,” “message,” and “order” are synonymous and may be used interchangeably. Other embodiments may protect other calls processed by the primary system using the same system and methods; although they are not described in detail here, the solution is the same as that applied to the order call.
In some environments operating embodiments of the system, some type of transaction processing takes place on the external system before a transaction is forwarded to the primary system. In other words, the primary system may be integrated with the external system to capture a transaction, perform fraud checks, allocate inventory, etc., before passing an enhanced request on to the primary platform. The external system collects data regarding its attempts to feed transactions to the primary system and, when it determines that the primary system is in distress, it begins queueing transactions. Transactions are queued until the system determines that the primary system is ready to receive new transactions, and then it slowly begins to forward them. The queued transactions may be fed to the primary system gradually to avoid overloading it again.
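The queue-or-forward decision can be sketched as follows (hypothetical function and parameter names; the convention here, matching the state descriptions later in this document, is that an open breaker diverts transactions to the queue):

```python
def handle(order, breaker_state, enrich, forward, enqueue):
    """Pre-process an order on the external system, then either forward
    the enhanced request to the primary platform or hold it in queue."""
    enriched = enrich(order)             # e.g. fraud check, inventory allocation
    if breaker_state == "open":          # primary system in distress
        enqueue(enriched)
        return "queued"
    forward(enriched)
    return "submitted"

queued, forwarded = [], []
status = handle({"sku": "X1"}, breaker_state="open",
                enrich=lambda o: {**o, "fraud_checked": True},
                forward=forwarded.append, enqueue=queued.append)
```

Note that pre-processing runs in either case, so a queued order has already had fraud checks performed and inventory allocated by the time it is retried.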
There are many benefits to this type of system. The order takers 114 can continue to accept transactions while maintenance is done on the ODS 116. Communications between the order taker and the ODS 116 may be two-way—maintenance data changes (look-up data, e.g. site, product, catalog) may be pushed from the ODS 116 to the order taker using a data replication tool. The concept of shared data exists in this type of system as well, by including a shared database. Shared data is data for which there can be only one copy that must be visible to all applications instances at all times. Examples of shared data include originator/user data and Digital Rights. This type of configuration provides parallel, redundant, executing applications. The ODS 116 may provide all of the backend services required for transaction processing, or some may be provided by the ODS 116 and some by the primary platform 108. Each may complete transaction processing and then forward a request for fulfillment to fulfillment center systems 118, such as a physical goods warehouse or a digital downloads center.
In at least one exemplary integration, a web merchant 108 may prefer to perform much of the transaction processing itself while contracting with a global ecommerce services provider 110 for any number of front-end or back-end services. Front-end services might relate to fraud screening and inventory allotment, for example, and back-end services might relate to tax calculations, payment processing, and other services. The global ecommerce services provider 110 performs the contracted services and forwards a message containing the user request, enhanced with data related to the services it has processed, to the primary system 108.
Web sites and services, running on machines with limited resources, may experience instability when a high volume of transactions is submitted during a very short period of time, as can happen with high-demand products sold during the holidays. During periods of peak traffic on a node (e.g., web server, application server, etc.), the ecommerce service provider 110 must be able to continue to accept orders and provide the transaction originator with an acceptable experience while preventing further instability of the primary selling platform 108. When this occurs, the ecommerce system 110 must hold incoming orders until the primary system 108 has regained stability, and then continue to monitor system health.
In some embodiments, a shopper node (order taker server 114 as described above) processes the transaction and submits it to the primary platform 108 while monitoring the health of the primary system 108 to determine a course of action for an incoming transaction. A service monitoring module residing on the shopper node 114 or util node (an ODS server 116 capturing and reprocessing queued transactions) may comprise computer code which, when executed by the processor, collects data on each call made to the primary system, calculates metrics and sets or resets a circuit breaker that directs transactions to queue in times of primary system 108 distress. Processes resident on the util node, discussed below, resubmit queued transactions for processing. As transactions are presented to the primary platform 108, data is collected regarding the health of the primary system 108 and is used to control the flow of transactions to the primary platform 108. Similarly, when an attempt is made to resubmit queued orders, a service monitoring module on the util node 116 collects data and calculates metrics that set the circuit breaker on that node.
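A minimal sketch of such a service monitoring module, assuming a rolling window of call outcomes and a simple failure-rate threshold (both hypothetical simplifications of the collected data and calculated metrics):

```python
from collections import deque

class ServiceMonitor:
    """Tracks recent call outcomes against the primary system and trips
    the circuit breaker to 'open' when the failure rate over a rolling
    window exceeds a configurable threshold."""
    def __init__(self, window=100, failure_threshold=0.5):
        self.results = deque(maxlen=window)    # True = successful call
        self.failure_threshold = failure_threshold
        self.state = "closed"

    def record(self, success):
        self.results.append(success)
        failure_rate = self.results.count(False) / len(self.results)
        if failure_rate > self.failure_threshold:
            self.state = "open"                # start queueing transactions
        return self.state

monitor = ServiceMonitor(window=10, failure_threshold=0.5)
states = [monitor.record(ok) for ok in [True] * 6 + [False] * 6]
```

In this sketch the breaker trips only after failures dominate the window; transitioning back out of the open state is governed by the queue-size and retry-success rules described below, not by the monitor alone.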
Nodes (processing locations) comprise computing devices, such as the user computing device and the servers hosting the ecommerce systems (global ecommerce and primary platform) and HVTQ modules used to implement embodiments of the invention, and may include a communication device, a processing device and a memory device. The processing device is operatively coupled to the communication device and the memory device. As used herein, “processing device” generally includes circuitry used for implementing the communication and/or logic functions of the particular system. For example, a processing device may include a digital signal processor device, a microprocessor device, and various analog-to-digital converters, digital-to-analog converters and other support circuits, alone or in combination. Control and signal processing functions of the system are allocated between these processing devices according to their respective capabilities. The processing device may include functionality to operate one or more software programs based on computer-readable instructions, which may be stored in a non-transitory memory device, typically as modules encapsulating code to implement a particular functionality. The processing device uses the communication device to communicate with the network and the devices and systems on the network, such as, but not limited to, the user computing device, the global ecommerce (external) system, and the primary platform. As such, the communication device generally comprises a modem, server or other device for communicating with other devices on the network.
Referring again to
When the service monitoring module 218 determines that the primary system 108 is unstable or in distress, the circuit breaker 202, 204 is opened. As long as the circuit breaker 202, 204 is open, transactions are pre-processed normally, but instead of being submitted to the primary platform, their status is set to ‘queued’ 212 and the transaction requests are held in queue. All transactions are then stored in the shopper node database 114 as ‘submitted’ or ‘queued’ and will be transferred to the ODS (
Referring again to
As described above, the service monitoring module aggregates data related to the health of the primary platform in order to determine the value of parameters used in setting the circuit breaker, which determines whether transactions are processed normally or are queued to be retried when the primary platform is healthy again. Each shopper node may have one of three states and each util node one of two, as listed in Table 1.
Under normal circumstances, both the util node and the shopper node circuit breakers will be closed. Under high load, if the circuit breakers trip, the size of the queue will grow rapidly. If the queue size reaches a configurable limit (Q1), all circuit breakers on all nodes will be opened. After a configurable cool-down period (S), the circuit breakers on the util nodes will be set to a rate-limited state. After this point, the health of the system will be evaluated by querying the number of orders that were retried successfully versus the number of orders that were retried and requeued since the nodes were all set to open. If a percentage of these requests greater than a configurable threshold (P) are successful, the util node circuit breakers will be set to closed. If the percentage of successful requests is below the threshold, the system is not yet healthy, and the util node circuit breakers will be set to open again. Once the util node circuit breakers are in a closed state, they will remain in this state until the size of the queue drops below a configurable size (Q2), at which point the shopper nodes will be set to a rate-limited state. If instead the size of the queue increases above the limit Q1, indicating that the system is once again unhealthy, the util node circuit breakers will be set to open again. When the shopper nodes are in the rate-limited state and the queue size drops below a low configurable size (Q3), the shopper node circuit breakers will return to a closed state. Table 2 describes these configurable and calculated parameters. The specified configurable values are set in a config properties file.
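The transition rules above can be condensed into a single evaluation step (a simplified sketch: `cooled_down` stands in for the cool-down period S having elapsed, and `success_rate` for the fraction of retried orders that succeeded since the nodes were opened):

```python
def step(util, shopper, queue_size, success_rate, cooled_down,
         Q1, Q2, Q3, P):
    """One evaluation of the util-node and shopper-node breaker states.
    Q1 > Q2 > Q3 are the queue-size thresholds and P the required
    retry success rate described in the text above."""
    if queue_size >= Q1:                  # system unhealthy: trip everything
        return "open", "open"
    if util == "open" and cooled_down:
        util = "rate-limited"             # probe with limited retries
    elif util == "rate-limited":
        util = "closed" if success_rate > P else "open"
    elif util == "closed" and queue_size < Q2:
        shopper = "rate-limited"          # queue draining: admit live orders
    if shopper == "rate-limited" and queue_size < Q3:
        shopper = "closed"                # back to normal operation
    return util, shopper

# A recovery scenario: breakers tripped, then the queue drains.
state = ("open", "open")
for queue_size, rate in [(50, 0.0), (50, 0.95), (30, 0.95), (5, 0.95)]:
    state = step(*state, queue_size=queue_size, success_rate=rate,
                 cooled_down=True, Q1=100, Q2=40, Q3=10, P=0.9)
```

The scenario walks both breakers from fully open back to fully closed as the retry success rate rises and the queue shrinks past Q2 and then Q3.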
Data (primarily time values, counts and response results, i.e., failures or successful attempts and data about them) is collected, and metrics are calculated, at both a global level and a local level, which allows the system to remove a node from service if it appears to be unstable. Configurable settings may first be estimated using historical knowledge of the number of transactions expected, and then recalculated or re-estimated as production data is received. Table 3 provides exemplary global settings, and Table 4 provides exemplary local settings, both with exemplary values. The values listed in Tables 3 and 4 merely provide examples of the configurable values that may be used. Those of ordinary skill in the art will understand that these values should be set to optimize the system practicing the embodiment.
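Since the configurable values live in a config properties file, loading them might look like this sketch (a key=value file format and the parameter names are assumptions for illustration):

```python
def load_settings(text, defaults):
    """Parse key=value lines from a config properties file, falling back
    to defaults; values are converted to float so that thresholds such
    as Q1 can be compared directly against measured queue sizes."""
    settings = dict(defaults)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                       # skip blanks and comments
        key, _, value = line.partition("=")
        settings[key.strip()] = float(value.strip())
    return settings

cfg = load_settings(
    "Q1=1000\nQ2=400\n# re-estimated from production data\nP=0.95",
    defaults={"Q1": 500.0, "Q2": 200.0, "Q3": 50.0, "P": 0.9},
)
```

Keeping defaults separate from the file makes the re-estimation loop described above cheap: operators overwrite only the values that production data has invalidated.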
While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that this invention not be limited to the specific constructions and arrangements shown and described, since various other changes, combinations, omissions, modifications and substitutions, in addition to those set forth in the above paragraphs, are possible. Those skilled in the art will appreciate that various adaptations and modifications of the just described embodiments can be configured without departing from the scope and spirit of the invention. Therefore, it is to be understood that, within the scope of the appended claims, the invention may be practiced other than as specifically described herein.
This application claims the benefit of U.S. Provisional Application No. 62/335,740 filed 13 May 2016, entitled “High Volume Transaction Queueing with Machine Learning,” which is incorporated herein by reference.
Number | Date | Country
---|---|---
20170330267 A1 | Nov 2017 | US
Number | Date | Country
---|---|---
62335740 | May 2016 | US