The present invention relates generally to data processing, and in some embodiments, to detecting anomalies in computer-based systems.
In many cases, enterprises maintain and operate large numbers of computer systems (e.g., servers) that may each run a layered set of software. In some cases, these computer systems provide functionality for the operation of the enterprise or provide outbound services to its customers. In many cases, the enterprise may monitor the hardware and software layers of these servers by logging processing load, memory usage, and many other monitored signals at frequent intervals.
Unfortunately, the enterprise may occasionally suffer disruptions, where some of its services are degraded or even completely unavailable to customers. To resolve these disruptions, the enterprise will perform a post-mortem analysis of the monitored signals in an effort to debug the system. For example, the enterprise may analyze the memory usage to identify a program that may be performing improperly, or view the processing load to determine whether more hardware is needed.
Thus, traditional systems may utilize methods and systems for addressing anomalies that involve debugging a computer system after the anomaly has affected the computer system, and, by extension, the users.
Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which:
The headings provided herein are for convenience only and do not necessarily affect the scope or meaning of the terms used.
Described in detail herein is an apparatus and method for detecting anomalies in a computer system. For example, some embodiments may be used to address the problem of how to monitor signals in a computer system to detect disruptions before they affect users, and to do so with few false positives. Some embodiments may address this problem by analyzing signals for strange behavior that may be referred to as an anomaly. Example embodiments can then scan multiple monitored signals, and raise an alert when the site monitoring system detects an anomaly.
Various modifications to the example embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments and applications without departing from the scope of the invention. Moreover, in the following description, numerous details are set forth for the purpose of explanation. However, one of ordinary skill in the art will realize that the invention may be practiced without the use of these specific details. In other instances, well-known structures and processes are not shown in block diagram form in order not to obscure the description of the invention with unnecessary detail. Thus, the present disclosure is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
For example, in some embodiments, multiple probes (e.g., queries) are executed on an evolving data set (e.g., a listing database). Each probe may return a result. Property values are then derived from a respective result returned by one of the probes. A property value may be a value that quantifies a property or aspect of the result, such as, for example, a number of listings returned, a portion of classified listings, a measurement of the prices in a listing, and the like.
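By way of illustration only, the following is a minimal sketch of this probe-and-derive step in Python. The query_listings helper, the PROBES list, and the listing fields (price, format) are hypothetical stand-ins for whatever interface and schema the evolving data set actually exposes.

```python
from statistics import mean

PROBES = ["iphone 5s", "mountain bike", "vintage camera"]  # example queries

def query_listings(query):
    """Hypothetical probe: stands in for a call to the real listing database."""
    # Illustrative hard-coded result; a real probe would execute the query
    # against the evolving data set and return the matching listings.
    return [
        {"price": 199.99, "format": "auction"},
        {"price": 249.00, "format": "classified"},
    ]

def derive_property_values(result):
    """Derive property values that quantify aspects of one probe's result."""
    prices = [item["price"] for item in result]
    return {
        "num_listings": len(result),
        "avg_price": mean(prices) if prices else 0.0,
        "classified_fraction": (
            sum(1 for item in result if item.get("format") == "classified") / len(result)
            if result
            else 0.0
        ),
    }

# One monitoring iteration: execute every probe and record its property values.
property_values = {probe: derive_property_values(query_listings(probe)) for probe in PROBES}
```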
Surprise scores corresponding to the property values are generated, where each surprise score is generated based on a comparison between a corresponding property value and historical property values. The corresponding property value and the historical property values are derived from results returned from the same probe. Historical surprise scores generated by the anomaly detection engine are accessed. Responsive to a comparison between the plurality of surprise scores and the plurality of historical surprise scores, a monitoring system is alerted of an anomaly regarding the evolving data set.
Each of the device machines 110, 112 comprises a computing device that includes at least a display and communication capabilities with the network 104 to access the networked system 102. The device machines 110, 112 comprise, but are not limited to, remote devices, work stations, computers, general purpose computers, Internet appliances, hand-held devices, wireless devices, portable devices, wearable computers, cellular or mobile phones, portable digital assistants (PDAs), smart phones, tablets, ultrabooks, netbooks, laptops, desktops, multi-processor systems, microprocessor-based or programmable consumer electronics, game consoles, set-top boxes, network PCs, mini-computers, and the like. Each of the device machines 110, 112 may connect with the network 104 via a wired or wireless connection. For example, one or more portions of network 104 may be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a cellular telephone network, a wireless network, a WiFi network, a WiMax network, another type of network, or a combination of two or more such networks.
Each of the device machines 110, 112 includes one or more applications (also referred to as “apps”) such as, but not limited to, a web browser, messaging application, electronic mail (email) application, an e-commerce site application (also referred to as a marketplace application), and the like. In some embodiments, if the e-commerce site application is included in a given one of the device machines 110, 112, then this application is configured to locally provide the user interface and at least some of the functionalities with the application configured to communicate with the networked system 102, on an as needed basis, for data and/or processing capabilities not locally available (such as access to a database of items available for sale, to authenticate a user, to verify a method of payment, etc.). Conversely if the e-commerce site application is not included in a given one of the device machines 110, 112, the given one of the device machines 110, 112 may use its web browser to access the e-commerce site (or a variant thereof) hosted on the networked system 102. Although two device machines 110, 112 are shown in
An Application Program Interface (API) server 114 and a web server 116 are coupled to, and provide programmatic and web interfaces respectively to, one or more application servers 118. The application servers 118 host one or more marketplace applications 120 and payment applications 122. The application servers 118 are, in turn, shown to be coupled to one or more database servers 124 that facilitate access to one or more databases 126.
The marketplace applications 120 may provide a number of e-commerce functions and services to users that access networked system 102. E-commerce functions/services may include a number of publisher functions and services (e.g., search, listing, content viewing, payment, etc.). For example, the marketplace applications 120 may provide a number of services and functions to users for listing goods and/or services or offers for goods and/or services for sale, searching for goods and services, facilitating transactions, and reviewing and providing feedback about transactions and associated users. Additionally, the marketplace applications 120 may track and store data and metadata relating to listings, transactions, and user interactions. In some embodiments, the marketplace applications 120 may publish or otherwise provide access to content items stored in application servers 118 or databases 126 accessible to the application servers 118 and/or the database servers 124. The payment applications 122 may likewise provide a number of payment services and functions to users. The payment applications 122 may allow users to accumulate value (e.g., in a commercial currency, such as the U.S. dollar, or a proprietary currency, such as “points”) in accounts, and then later to redeem the accumulated value for products or items (e.g., goods or services) that are made available via the marketplace applications 120. While the marketplace and payment applications 120 and 122 are shown in
Further, while the system 100 shown in
The web client 106 accesses the various marketplace and payment applications 120 and 122 via the web interface supported by the web server 116. Similarly, the programmatic client 108 accesses the various services and functions provided by the marketplace and payment applications 120 and 122 via the programmatic interface provided by the API server 114. The programmatic client 108 may, for example, be a seller application (e.g., the TurboLister application developed by eBay Inc., of San Jose, Calif.) to enable sellers to author and manage listings on the networked system 102 in an off-line manner, and to perform batch-mode communications between the programmatic client 108 and the networked system 102.
The networked system 102 may provide a number of publishing, listing, and/or price-setting mechanisms whereby a seller (also referred to as a first user) may list (or publish information concerning) goods or services for sale or barter, a buyer (also referred to as a second user) can express interest in or indicate a desire to purchase or barter such goods or services, and a transaction (such as a trade) may be completed pertaining to the goods or services. To this end, the networked system 102 may comprise at least one publication engine 202 and one or more selling engines 204. The publication engine 202 may publish information, such as item listings or product description pages, on the networked system 102. In some embodiments, the selling engines 204 may comprise one or more fixed-price engines that support fixed-price listing and price setting mechanisms and one or more auction engines that support auction-format listing and price setting mechanisms (e.g., English, Dutch, Chinese, Double, Reverse auctions, etc.). The various auction engines may also provide a number of features in support of these auction-format listings, such as a reserve price feature whereby a seller may specify a reserve price in connection with a listing and a proxy-bidding feature whereby a bidder may invoke automated proxy bidding. The selling engines 204 may further comprise one or more deal engines that support merchant-generated offers for products and services.
A listing engine 206 allows sellers to conveniently author listings of items, or allows authors to author publications. In one embodiment, the listings pertain to goods or services that a user (e.g., a seller) wishes to transact via the networked system 102. In some embodiments, the listings may be an offer, deal, coupon, or discount for the good or service. Each good or service is associated with a particular category. The listing engine 206 may receive listing data such as title, description, and aspect name/value pairs. Furthermore, each listing for a good or service may be assigned an item identifier. In other embodiments, a user may create a listing that is an advertisement or other form of information publication. The listing information may then be stored to one or more storage devices coupled to the networked system 102 (e.g., databases 126). Listings also may comprise product description pages that display a product and information (e.g., product title, specifications, and reviews) associated with the product. In some embodiments, the product description page may include an aggregation of item listings that correspond to the product described on the product description page.
The listing engine 206 also may allow buyers to conveniently author listings or requests for items desired to be purchased. In some embodiments, the listings may pertain to goods or services that a user (e.g., a buyer) wishes to transact via the networked system 102. Each good or service is associated with a particular category. The listing engine 206 may receive as much or as little listing data, such as title, description, and aspect name/value pairs, that the buyer is aware of about the requested item. In some embodiments, the listing engine 206 may parse the buyer's submitted item information and may complete incomplete portions of the listing. For example, if the buyer provides a brief description of a requested item, the listing engine 206 may parse the description, extract key terms and use those terms to make a determination of the identity of the item. Using the determined item identity, the listing engine 206 may retrieve additional item details for inclusion in the buyer item request. In some embodiments, the listing engine 206 may assign an item identifier to each listing for a good or service.
In some embodiments, the listing engine 206 allows sellers to generate offers for discounts on products or services. The listing engine 206 may receive listing data, such as the product or service being offered, a price and/or discount for the product or service, a time period for which the offer is valid, and so forth. In some embodiments, the listing engine 206 permits sellers to generate offers from the sellers' mobile devices. The generated offers may be uploaded to the networked system 102 for storage and tracking.
In a further example embodiment, the listing engine 206 allows users to navigate through various categories, catalogs, or inventory data structures according to which listings may be classified within the networked system 102. For example, the listing engine 206 allows a user to successively navigate down a category tree comprising a hierarchy of categories (e.g., the category tree structure) until a particular set of listings is reached. Various other navigation applications within the listing engine 206 may be provided to supplement the searching and browsing applications. The listing engine 206 may record the various user actions (e.g., clicks) performed by the user in order to navigate down the category tree.
Searching the networked system 102 is facilitated by a searching engine 208. For example, the searching engine 208 enables keyword queries of listings published via the networked system 102. In example embodiments, the searching engine 208 receives the keyword queries from a device of a user and conducts a review of the storage device storing the listing information. The review will enable compilation of a result set of listings that may be sorted and returned to the client device (e.g., device machine 110, 112) of the user. The searching engine 208 may record the query (e.g., keywords) and any subsequent user actions and behaviors (e.g., navigations, selections, or click-throughs).
The searching engine 208 also may perform a search based on a location of the user. A user may access the searching engine 208 via a mobile device and generate a search query. Using the search query and the user's location, the searching engine 208 may return relevant search results for products, services, offers, auctions, and so forth to the user. The searching engine 208 may identify relevant search results both in a list form and graphically on a map. Selection of a graphical indicator on the map may provide additional details regarding the selected search result. In some embodiments, the user may specify, as part of the search query, a radius or distance from the user's current location to limit search results.
The searching engine 208 also may perform a search based on an image. The image may be taken from a camera or imaging component of a client device or may be accessed from storage.
In addition to the above described modules, the networked system 102 may further include an anomaly detection engine 212 and a probe module 210 to perform various anomaly detection functionalities or operations as set forth in greater detail below.
As explained above, some example embodiments may be configured to detect anomalies in an evolving data set by comparing surprise scores of property values received from a probe module. However, before describing the methods and systems for detecting anomalies in a computer system in great detail, some simplified examples of analyzing property values are now described to highlight some potential aspects addressed by example embodiments. For example, as a warm-up problem, consider a signal from a high software layer: the number of searches (“srp”) received or performed by the networked system 102 of
But now consider
Another example of detecting anomalies in a computer system is now described with reference to
In some cases, the anomaly detection engine 212 may determine whether a value of a property represents an anomaly caused by a site disruption based in part on calculating surprise scores for the property value. A surprise score may be a measurement used to quantify how out of the norm a value for a property is based on historical values for that property. For example, the anomaly detection engine 212 may quantify the surprise score for a value of a property by computing the (unsigned) deviation of each property value from an expected value. For example, one specific implementation of calculating a surprise score may involve dividing the deviation of a value from the expected value (e.g., the fitted line 504) by the median deviation of all the values. Assuming the deviation for the value 502 is 97.9 and the median deviation for all the values of the property values 500 is 13.4, the anomaly detection engine 212 may assign the value 502 a surprise score of 7.3 (e.g., 97.9/13.4).
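A minimal sketch of that calculation, assuming the expected values come from some fit over the historical values (such as the fitted line described above), might look like the following; the fitting step itself is outside the scope of the sketch.

```python
from statistics import median

def surprise_score(values, expected):
    """Surprise of the most recent value: its unsigned deviation from the
    expected value, divided by the median deviation across all values."""
    deviations = [abs(v - e) for v, e in zip(values, expected)]
    med = median(deviations)
    return deviations[-1] / med if med else 0.0

# With the numbers from the example above (latest deviation 97.9, median
# deviation 13.4), this yields a surprise score of roughly 7.3.
```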
Some embodiments may address the issue of whether a particular surprise score (e.g., a surprise score of 7.3, as discussed above) should trigger an alert that there may be an anomaly in the computer system.
To clarify surprise scores shown in
Incidentally,
One way around the difficulty of avoiding false positives may be through aggregation of surprising values across multiple queries. Since there will always be a few queries with a high surprise, the anomaly detection engine 212 can construct a feature based on the number of surprise scores that deviate from historical norms. A sudden change in the number of high surprise scores, for example, might be a good indicator of a site disruption. This is done separately for each property being monitored by the anomaly detection engine 212. To make this quantitative, instead of counting the number of queries with a high surprise, some embodiments of the anomaly detection engine 212 can examine a quantile (e.g., the 0.9 quantile) of the surprise values for a property. Using the quantiles to detect anomalies is now described.
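As a rough sketch of that aggregation step, the feature for one property could be computed as follows; the nearest-rank quantile used here is just one reasonable choice among several.

```python
import math

def surprise_quantile(surprise_scores, q=0.9):
    """Aggregate per-query surprise scores for one property into a single
    feature: the q-th quantile (nearest-rank) of the scores."""
    ordered = sorted(surprise_scores)
    if not ordered:
        return 0.0
    idx = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[idx]

# Example: one surprise score per probe for a single property type.
feature = surprise_quantile([0.4, 1.1, 0.8, 7.3, 0.2, 1.9, 0.6, 0.9, 1.3, 0.7])
```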
The surprise score of the most recent property value, as computed above with reference to
An example of an anomaly that corresponds to a genuine disruption is now described.
Summarizing the above, it is expected that an individual property for a particular query will have sudden jumps in values. Although these sudden jumps may represent outliers, an outlier, in and of itself, should not necessarily raise an alert. Instead, example embodiments may use the number of queries that have such jumps as a good signal for raising an alert of an anomaly. A selection of features may proceed as follows. For each (probe, property value) pair, the anomaly detection engine 212 computes a measure of surprise and determines whether the latest property value of the property is an outlier. The anomaly detection engine 212 then has a surprise number for each query. It is expected to have a few large surprise numbers, but not too many. To quantify this, the anomaly detection engine 212 may in some embodiments select the 90th quantile of surprise values (e.g., sort the surprise values from low to high and return the value at the 90th percentile, or use a non-sorting function to calculate a quantile or ranking of surprise scores). This quantile is the feature. Any outlier detection method may then be used to raise an alert. For example, in an example embodiment, the anomaly detection engine 212 may take the last 30 days' worth of signals and compute their mean and standard deviation. If the latest quantile of the signal deviates from the mean by more than a threshold (e.g., five standard deviations), the anomaly detection engine 212 raises an alert.
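A sketch of that alerting rule follows, assuming history holds the quantile feature for each of the last 30 days and latest is the current value; the five-standard-deviation threshold is illustrative only, not a fixed requirement of the method.

```python
from statistics import mean, pstdev

def is_anomalous(latest, history, threshold=5.0):
    """Flag the latest feature value if it deviates from the historical mean
    by more than `threshold` standard deviations."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) > threshold * sigma

# if is_anomalous(surprise_quantile(todays_scores), last_30_days_features):
#     ...alert the monitoring system of an anomaly...
```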
A method for detecting an anomaly in a computer system is now described in greater detail. For example,
As
In some embodiments, as part of operation 1202, the probe module 210 is further configured to derive property values for each result returned from the probes. As discussed above, a property value may include data that quantifies a property or aspect of a result. To illustrate, again by way of example and not limitation, where the probe module 210 is configured to transmit a set of queries to the searching engine 208, the property value may represent, for example, a value for the property of the number of items returned in the result, the average list price in the result, a measurement of the number of items that are auctions relative to non-auction items in the result, a number of classified listings in the result, or any other suitable property.
Thus, in some embodiments, the execution of operation 1202 may result in a data table that includes a number of property values that each correspond to one of the probes executed by the probe module 210. Further, as the probe module 210 may monitor more than one property type, the table may include multiple columns, where each column corresponds to a different property type. This is shown in
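One way to hold such a table, and to accumulate it across iterations so that the historical property values used below are available, is sketched here; the keyed storage layout is an assumption rather than a required design.

```python
from collections import defaultdict

# history[(probe, property_type)] -> list of property values, oldest first.
history = defaultdict(list)

def record_iteration(property_values):
    """Append the latest table (probe -> {property_type: value}) to the history."""
    for probe, props in property_values.items():
        for property_type, value in props.items():
            history[(probe, property_type)].append(value)
```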
With reference back to
As discussed above with respect to the operation 1204 shown in
With reference back to
At operation 1208, responsive to a comparison between the plurality of surprise scores and the plurality of historical surprise scores, the anomaly detection engine 212 may alert a monitoring system of an anomaly regarding the evolving data set. With momentary reference to
The comparison used by operation 1208 may be based on a feature derived from the surprise scores. To illustrate,
As part of operation 1208, the feature of the surprise scores may then be compared against historical surprise scores from past iterations of executing the probes. This is shown in
It is to be appreciated that the operations 1206 and 1208 shown in
It is to be further appreciated that although much of this disclosure discusses anomaly detection in the context of a search engine, other example embodiments may use the anomaly detection methods and systems described herein to detect anomalies in other types of computer systems. For example, the computer system may be an inventory data store. In such a case, the probe module 210 may be configured to detect as property types, among other things, the number of items stored per category, the number of auction items per category, and the like.
As another example, the computer system may be a computer infrastructure (e.g., a collection of computer servers). In such a case, the probe module 210 may be configured to detect as property types, among other things, a processor load, bandwidth consumption, thread count, running processes count, memory usage, throughput count, rate of disk seeks, rate of packets transmitted or received, or rate of response.
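For that infrastructure case, a probe might sample host metrics directly; the sketch below assumes the psutil package is available, and the particular metrics chosen are illustrative only.

```python
import psutil

def sample_infrastructure_properties():
    """Sample a few host-level property values for one monitoring iteration."""
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),      # processor load
        "memory_percent": psutil.virtual_memory().percent,  # memory usage
        "process_count": len(psutil.pids()),                # running processes count
    }
```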
In other embodiments, the property values tracked by the anomaly detection engine 212 may include dimensions in addition to what is described above. For example, in the embodiment discussed above, one may conceptualize that a table may be used to store the property values, where the columns are the metrics tracked by the different probe modules, and the rows are the different values for those property types at different times. An extension would be a 3D table or cube. For each (property, value) cell, there may be a series of aspects instead of a single number. An aspect may be thought of as a vertical stack extending out of the page. In one example, the aspect might be different countries. Thus, a cell in a table may be related to a specific query (perhaps ‘iPhone 5S’) and property (perhaps number of results). But using the aspects, the results vary by country, so the single cell is replaced by a stack of entries, one for each country.
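A minimal sketch of that aspect extension is shown below; the country codes and counts are purely illustrative.

```python
# cube[(probe, property_type)] -> {aspect: value}; here the aspect is a country.
cube = {
    ("iPhone 5S", "num_results"): {
        "US": 1840,
        "DE": 412,
        "AU": 97,
    },
}

# Surprise scores would then be computed per (probe, property_type, aspect)
# series rather than per (probe, property_type) pair.
```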
As mentioned in the previous section, the anomaly detection engine 212 may be configured to detect a problem with the search software (a disruption) before users do. In some cases, the property values may be received from the same interfaces and using the same computer systems used by the end users. In such cases, the metric data received from the probe module 210 is a proxy for, if not identical to, the user experience of users of that computer system. Accordingly, it is to be appreciated that when this disclosure states the anomaly detection engine 212 may detect a problem before users do, it may simply mean that the anomaly detection engine 212 can detect and report a problem without intervention from a user. Thus, compared to traditional systems, example embodiments may use the anomaly detection engine 212 to provide comparatively quick detection of site problems.
The example computer system 1800 includes a processor 1802 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory 1804 and a static memory 1806, which communicate with each other via a bus 1808. The computer system 1800 may further include a video display unit 1810 (e.g., liquid crystal display (LCD), organic light emitting diode (OLED), touch screen, or a cathode ray tube (CRT)). The computer system 1800 also includes an alphanumeric input device 1812 (e.g., a physical or virtual keyboard), a cursor control device 1814 (e.g., a mouse, a touch screen, a touchpad, a trackball, a trackpad), a disk drive unit 1816, a signal generation device 1818 (e.g., a speaker) and a network interface device 1820.
The disk drive unit 1816 includes a machine-readable medium 1822 on which is stored one or more sets of instructions 1824 (e.g., software) embodying any one or more of the methodologies or functions described herein. The instructions 1824 may also reside, completely or at least partially, within the main memory 1804 and/or within the processor 1802 during execution thereof by the computer system 1800, the main memory 1804 and the processor 1802 also constituting machine-readable media.
The instructions 1824 may further be transmitted or received over a network 1826 via the network interface device 1820.
While the machine-readable medium 1822 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present invention. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media, and carrier wave signals.
It will be appreciated that, for clarity purposes, the above description describes some embodiments with reference to different functional units or processors. However, it will be apparent that any suitable distribution of functionality between different functional units, processors or domains may be used without detracting from the invention. For example, functionality illustrated to be performed by separate processors or controllers may be performed by the same processor or controller. Hence, references to specific functional units are only to be seen as references to suitable means for providing the described functionality, rather than indicative of a strict logical or physical structure or organization.
Certain embodiments described herein may be implemented as logic or a number of modules, engines, components, or mechanisms. A module, engine, logic, component, or mechanism (collectively referred to as a “module”) may be a tangible unit capable of performing certain operations and configured or arranged in a certain manner. In certain example embodiments, one or more computer systems (e.g., a standalone, client, or server computer system) or one or more components of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) or firmware (note that software and firmware can generally be used interchangeably herein as is known by a skilled artisan) as a module that operates to perform certain operations described herein.
In various embodiments, a module may be implemented mechanically or electronically. For example, a module may comprise dedicated circuitry or logic that is permanently configured (e.g., within a special-purpose processor, application specific integrated circuit (ASIC), or array) to perform certain operations. A module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software or firmware to perform certain operations. It will be appreciated that a decision to implement a module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by, for example, cost, time, energy-usage, and package size considerations.
Accordingly, the term “module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), non-transitory, or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. Considering embodiments in which modules or components are temporarily configured (e.g., programmed), each of the modules or components need not be configured or instantiated at any one instance in time. For example, where the modules or components comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different modules at different times. Software may accordingly configure the processor to constitute a particular module at one instance of time and to constitute a different module at a different instance of time.
Modules can provide information to, and receive information from, other modules. Accordingly, the described modules may be regarded as being communicatively coupled. Where multiples of such modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the modules. In embodiments in which multiple modules are configured or instantiated at different times, communications between such modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple modules have access. For example, one module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further module may then, at a later time, access the memory device to retrieve and process the stored output. Modules may also initiate communications with input or output devices and can operate on a resource (e.g., a collection of information).
Although the present invention has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. One skilled in the art would recognize that various features of the described embodiments may be combined in accordance with the invention. Moreover, it will be appreciated that various modifications and alterations may be made by those skilled in the art without departing from the scope of the invention.
The Abstract is provided to allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.