This application claims the benefit of European Application No. 17172321.6, filed May 22, 2017, in the European Intellectual Property Office, the disclosure of which is incorporated herein by reference.
The embodiments relate to an apparatus and method for generating a multiple-event pattern query for use in the retrieval of event information. Also disclosed are a computer program which, when executed by a computer, causes the computer to perform the method, and a non-transitory computer readable medium comprising the computer program.
In recent years the amount of data generated and available for analysis in various technological areas has increased rapidly. This is particularly the case since the advent of the Internet, which has both significantly increased the amount of information generated in various technological areas and has also simultaneously increased the availability of information. The increase in the availability of information may be useful when seeking to analyze the trends and processes in technological areas, or make predictions about future trends.
Additional aspects and/or advantages will be set forth in part in the description which follows and, in part, will be apparent from the description, or may be learned by practice of the embodiments.
Problems may arise as techniques for obtaining and analyzing available information, and in particular for filtering out useful information for a given purpose from the available information, have not matched the pace at which the amount of available information has increased. This may lead to issues whereby it is not possible to extract relevant information from all of the available information in a useful time frame, and consequently it may be difficult to obtain, filter and analyze all or the majority of the useful information.
Some areas in which the problems of filtering useful information from irrelevant information are particularly acute include financial, medical, transport and logistics, and so on. However, equivalent problems may arise in any technological area in which high volumes of data are generated.
With specific reference to financial analyses, it is necessary to take into account the actions of a large number of different entities such as various companies, banks, government regulators, as well as other factors, in order to fully analyze aspects of financial markets or make useful predictions of future trends. The amount of potentially relevant information generated which may influence financial markets may be large, so filtering the information may be a difficult task. Also, the data may be generated in a large number of disparate forms. For example, and taking only the forms which are widely used on the Internet in relation to financial markets, useful information may be obtained from disparate and heterogeneous sources such as news reports, RSS feeds, stock trackers, blogs, and so on.
Because of the large volume of information available, and also the disparate nature of the sources of the information, it may be very difficult for a user to keep up with the flow of information provided and extract useful information from the flow in a reasonable time frame. As a consequence of this, a user may elect to monitor a subset of the available data sources (a single blog or a single stock tracker, for example). However, this approach may lead to potentially key information not being monitored and therefore being omitted from a subsequent analysis.
For the above reasons, it may be desirable to use automated systems to extract events from disparate and heterogeneous information sources, and to standardize the form of the events to allow relevant events (for a given purpose) to be identified. If events that have been converted into a standardized form are available, it is then desirable to provide a system for receiving an input query from a user, and converting this input query into a form which may be used to interrogate events generated in the standardized form to allow relevant events to be identified.
Embodiments include an apparatus and a method for generating a multiple-event pattern query for use in the retrieval of event information, the apparatus comprising: an inputter configured to receive an input query from a user; a retriever configured to analyze the received input query, and to retrieve an event query template from an event template database on the basis of the analysis; a pattern query formulation unit configured to receive the retrieved event query template, and to modify the event query template using the received input query to formulate the multiple-event pattern query; and a transmitter configured to transmit the multiple-event pattern query to an event information source, such that events matching the multiple-event pattern query may be identified. The apparatus may allow event sources to be accurately and efficiently searched.
The event template database may be configured to obtain the event query templates by training a machine intelligence using results obtained from unstructured information sources. This provides an effective means for automatically providing a source of events.
The inputter may comprise a specialized graphical user interface (GUI), and may be configured to receive the input query from the user via the specialized GUI. This provides an intuitive means for the user to input the input query, while also assisting with accurate query input to facilitate the remaining operations of the system.
If the analysis of the input query indicates that plural event query templates are required, the retriever may be configured to retrieve a plurality of event query templates, and the pattern query formulation unit may be configured to modify the plurality of event query templates to formulate the multiple-event pattern query. This allows the system to provide responses to complex queries which may be divided into several sub-queries.
The inputter may be configured to semantically annotate the input query inputted by the user, such that the multiple-event pattern query formulated by the pattern query formulation unit may be used to more accurately retrieve events relevant to the received input query.
The inputter may be configured to perform an initial check of the structure of the received input query and, if the initial check suggests problems with the structure, to indicate to the user that there may be a problem with the received input query. The inputter may also be further configured to suggest alternative structures for the received input query when the initial check suggests problems with the structure. These features may assist with accurate query entry, particularly for less experienced users.
The multiple-event pattern query formulated by the pattern query formulation unit may specify a chronological ordering of plural events. The multiple-event pattern query may further specify a time interval threshold into which events satisfying the multiple-event pattern query will fall. This allows the responses to the query to be tailored to the specific area of interest, thereby filtering the most useful responses from the available information.
The multiple-event pattern query may comprise instructions for the retrieval of provenance information from the event information source for events matching the multiple-event pattern query. The provenance information may be useful to users wishing to assess the reliability of given information.
The event information may be retrieved from an event stream, allowing the ongoing collection of event information.
These and/or other aspects and advantages will become apparent and more readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to the embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout. The embodiments are described below by referring to the figures.
The nature of the events which may be identified using multiple-event pattern queries formulated by the system varies between different areas of application. Typically, an event is an occurrence or change of situation which may have ramifications for a given technological area. For example, in the technological area of medical research, an event may relate to the release of a new drug, or to an announcement that a particular drug has been found to cause unwanted side effects. In financial and business related areas, an event may relate to some sort of detectable change in a company, sector or customer behavior. An example of this may be the release of a new product by a company, or a company completing a merger with another company. Customer specific events may relate to events such as the issuing of a warning of potentially problematic economic circumstances (which may suppress customer spending) or a fluctuation in a currency market.
By analyzing the response of a technological area to particular events, it may be possible to identify future events following a similar pattern at an earlier stage, and thereby predict future occurrences in the technological area on the basis of early identification of these events. This has clear benefits, for example, in terms of dealing in stocks and shares, allowing a company to predict stock requirements or financial reserves required.
An event pattern is a form of template for ternary information, which maps onto a specific event when generic variables are replaced with specific values. For example, and in a business context, an event pattern may be of the form <unknowncompanyX; sells; unknowncompanyY>. In this event pattern, there are two partially defined entities: unknowncompanyX and unknowncompanyY. These entities are specified to be of the type “company” but are not limited to a specific company. The two partially defined entities unknowncompanyX and unknowncompanyY may also be referred to as the subject and object respectively. The event pattern also contains an operator (or predicate); in this case “sells”. The general form of an event may therefore be abbreviated to <S; P; O>, that is, <Subject; Predicate; Object>. A specific event matching the example event pattern would be of the same general form, but with specific entities in the place of the generic placeholders used above. For example, a specific event matching this event pattern may be of the form <A Ltd.; sells; B Ltd.>. In this event, the subject and object (both entities) are “A Ltd.” and “B Ltd.” respectively (both defined entities of the type “company”), and the predicate is “sells”.
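The relationship between an event pattern and a specific event can be illustrated with a minimal sketch. The class names (`Entity`, `Slot`, `Event`, `EventPattern`) and the matching rule are assumptions made for illustration only and are not taken from the embodiments; a slot stands for a partially defined entity that is constrained by type and, optionally, by name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """A fully defined entity, e.g. the company "A Ltd."."""
    name: str
    type: str


@dataclass(frozen=True)
class Slot:
    """A partially defined entity: fixed type, optional fixed name (hypothetical)."""
    type: str
    name: Optional[str] = None  # None means "any entity of this type"

    def accepts(self, entity: Entity) -> bool:
        return entity.type == self.type and self.name in (None, entity.name)


@dataclass(frozen=True)
class Event:
    """A specific event of the form <Subject; Predicate; Object>."""
    subject: Entity
    predicate: str
    object: Entity


@dataclass(frozen=True)
class EventPattern:
    """An event pattern of the form <S; P; O> with typed placeholders."""
    subject: Slot
    predicate: str
    object: Slot

    def matches(self, event: Event) -> bool:
        return (
            self.predicate == event.predicate
            and self.subject.accepts(event.subject)
            and self.object.accepts(event.object)
        )


# <unknowncompanyX; sells; unknowncompanyY>
pattern = EventPattern(Slot("company"), "sells", Slot("company"))
# <A Ltd.; sells; B Ltd.>
event = Event(Entity("A Ltd.", "company"), "sells", Entity("B Ltd.", "company"))
print(pattern.matches(event))
```

In this sketch the pattern accepts any pair of companies joined by the predicate "sells", and rejects events with a different predicate or with entities of another type.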
As evidenced by the explanation above, the event patterns are a technical format, primarily configured to be understandable by computers. As such, the event patterns may be difficult for a user to understand and to intuitively work with. For this reason, even in the case that a database of event information is available, it may be difficult for a user to successfully query this database in order to retrieve useful information. The same is true where, instead of a database, a stream of event information is retrieved from a plurality of disparate sources, such as news blogs, RSS feeds, and so on.
Typically, a user would formulate a query using natural language, that is, using common terminology and grammar in the same way as the query would be formulated were it to be put to a human respondent. An example of a query in a natural language form would be “how many Spanish companies have had a successful product launch in the last month”. While this natural language query is easily understandable by a human user, for a computer the query would not easily be understood, and it would accordingly be difficult for a computer to reliably and efficiently monitor for events which may be relevant to the answer to this query.
Provided is an apparatus 300 configured to generate a multiple-event pattern query for use in the retrieval of event information, either from an event stream or from a stored database of events. The apparatus 300 is shown in
Optionally the inputter 301 may comprise a graphical user interface (GUI) 302 which is specifically configured to facilitate the entering of the natural language query into the inputter 301 by the user. As an example of the functionality which may be provided by the GUI 302, the GUI 302 may comprise a plurality of drop-down menus from which a user may select a specific entity (for example “A Ltd.”, where A Ltd. is a company), or class of entities (for example “Japanese companies”) to be entered into the query. The GUI 302 may also comprise various other settings, such as an option to set a time window of interest over which matching events should be searched for. The exact settings which may be configured using the GUI 302 will depend on the technological area to which the natural language query relates. As such, the inputter 301 may also be linked to a database of entities, such that the user may select known entities from the field of interest to be inputted into the query. Where a GUI 302 is used in the inputting of the initial natural language query, this may make the subsequent generation of the multiple-event pattern query (as discussed below) far simpler. Alternatively, for more skilled users, it may be possible for a user to input a query using a programming language, such as Structured Query Language (SQL) based languages.
The inputter 301 may be further configured to perform an initial check of the structure of the received input query. That is, the inputter 301 may be configured to ensure that the received input query may be understood, and conforms to known grammatical structures. In the event that the initial check suggests that there may be problems with the structure, this may be indicated to the user so that the user may reformulate the input query into a form which may be understood by the inputter 301. In the event that the input query has been written in a SQL based language, the inputter 301 may check the syntax of the query. Optionally, the inputter 301 may be further configured to suggest alternative structures for the same query to the user, such that it is easier for the user to reformulate the query into a structure that may be understood. Again, this is particularly efficient when paired with the GUI 302, because the alternative structures may easily be indicated (using alternative options in selection menus, and so on).
The inputter 301 may additionally or alternatively be further configured to semantically annotate the input query inputted by the user. That is, the inputter 301 may be configured to either automatically or via further input from the user attach metadata tags to the input query explaining the type of information sought. As an example of this, metadata tags could be added indicating the type of information sought as a class type of the event. Again, semantic annotation of the input query may improve the retrieval of information in response to the query (once the query has been reformatted into the multiple-event pattern query format).
Once the input query has been received by the inputter 301 as discussed above, and any semantic annotation that is to be used has been performed, the query is passed on to the retriever 303. The retriever 303 is connected to an external database, such as an event template database 1. The external database comprises a plurality of event template types, which are annotated semantically with indications of the type of event to which they relate.
The external database 1 may include an event term database, which may be used in the identification of predicates. The event term database operates as a glossary of relevant terminology for a given area, and as such the information contained therein is generally area specific. In addition to identifying relevant predicates, the event term database may be used to identify relationships between predicates (such as equivalence or inverse relationships).
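As a minimal sketch of how such a glossary might behave, the following example stores equivalence and inverse relationships between predicates in two plain dictionaries. The terms, the dictionary layout and the function names are hypothetical and chosen only for illustration; an actual event term database would be area specific and considerably larger.

```python
from typing import Optional, Tuple

# Hypothetical glossary for a business domain: each known term maps to a
# canonical predicate, so equivalent terms share the same canonical form.
CANONICAL = {
    "acquires": "acquires",
    "buys": "acquires",
    "takes over": "acquires",
    "is acquired by": "is acquired by",
    "is bought by": "is acquired by",
}

# Inverse relationships between canonical predicates (also hypothetical).
INVERSE = {
    "acquires": "is acquired by",
    "is acquired by": "acquires",
}

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def canonical_predicate(term: str) -> Optional[str]:
    """Return the canonical predicate for a term, or None if it is unknown."""
    return CANONICAL.get(term.strip().lower())


def invert(triple: Triple) -> Optional[Triple]:
    """Rewrite <S; P; O> as <O; inverse(P); S> when an inverse is known."""
    subject, predicate, obj = triple
    canon = canonical_predicate(predicate)
    if canon is None or canon not in INVERSE:
        return None
    return (obj, INVERSE[canon], subject)


print(canonical_predicate("Buys"))
print(invert(("A Ltd.", "is bought by", "B Ltd.")))
```

Normalizing predicates in this way would let a single event query template match events that different sources describe with different but equivalent wording.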
The retriever 303 is configured to analyze the input query, as shown in step S32 of
Following the analysis of the input query, the retriever 303 may establish that the query appears to relate to multiple event query types. In this event, the retriever 303 may be further configured to retrieve a plurality of event query templates.
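The retrieval of one or several event query templates can be sketched as a lookup against semantically annotated templates. The analysis used here (simple keyword overlap), the template names and the annotation keywords are all assumptions made for illustration; the embodiments do not prescribe a particular analysis technique.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class EventQueryTemplate:
    """A template annotated with keywords describing its event type (hypothetical)."""
    name: str
    annotations: FrozenSet[str]


# Hypothetical contents of an event template database.
TEMPLATE_DB: List[EventQueryTemplate] = [
    EventQueryTemplate("ProductRelease", frozenset({"launch", "release"})),
    EventQueryTemplate("StockPriceRise", frozenset({"successful", "stock", "rise"})),
    EventQueryTemplate("Merger", frozenset({"merger", "acquisition"})),
]


def retrieve_templates(
    query: str, db: List[EventQueryTemplate] = TEMPLATE_DB
) -> List[EventQueryTemplate]:
    """Return every template whose annotations overlap the words of the query."""
    words = set(query.lower().replace("?", " ").split())
    return [template for template in db if template.annotations & words]


query = "how many Spanish companies have had a successful product launch in the last month"
print([template.name for template in retrieve_templates(query)])
```

Under these assumed annotations, the example query relates to two event query types, so two templates are returned for the pattern query formulation unit to combine.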
The pattern query formulation unit 305 receives the event query templates from the retriever 303, and also receives the initial input query (along with any semantic annotation if this is available) from the inputter. The pattern query formulation unit 305 then attempts to formulate a multiple-event pattern query utilizing the information it has received, as shown in step S34 of
A multiple-event pattern query is an event template form containing a plurality of variables. Among the variables, the entities are typically limited by type (for example, an entity known to be a product, an entity known to be a company, and so on) and any predicates involved in the entities may also be similarly limited (for example, related to the acquisition or merger of companies). Returning to the example discussed above of Spanish companies having successful product launches in the last month, a multiple-event pattern query based on this initial input query may include a definition of a successful product launch as, for example, a product launch which is followed chronologically by three consecutive days of increase in the company's stock price. A sample event pattern for a successful product release is shown in
In order to satisfy the consecutive three days stock price increase event pattern, three StockEvents are required, each of which relates to a company (“a”, “b” or “c” in
The “successful product release” pattern further requires that an event that is the release of the product occurs (chronologically) before the consecutive three day stock price increase event. Accordingly, a series of events comprising first the consecutive three day stock price increase and then the release of a product would not satisfy the “successful product release” pattern.
The above example also has the additional requirement of a time interval threshold into which events satisfying the multiple-event pattern query must fall. In this instance, in order to satisfy the consecutive three day stock price increase pattern query, it is necessary for the stock price rise event to occur on three consecutive days (as indicated by the “win:time(3 days)” requirement). Accordingly, if a day of neutral stock price, falling stock price or simply the absence of knowledge of the stock price were to interrupt the three stock price rise days, this would not satisfy the event criteria.
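The chronological ordering and the three day window described above can be sketched as a check over a list of dated events. This is an illustrative sketch only and not the query language of the embodiments: the `TimedEvent` type, the event kinds and the assumption that the release must fall on a day strictly before the first day of the rise are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable


@dataclass(frozen=True)
class TimedEvent:
    kind: str      # "ProductRelease" or "StockRise" (hypothetical labels)
    company: str
    day: date


def is_successful_release(events: Iterable[TimedEvent], company: str) -> bool:
    """True if a release is followed by stock rises on three consecutive days."""
    events = list(events)  # allow generators; the events are scanned twice
    releases = [e.day for e in events
                if e.kind == "ProductRelease" and e.company == company]
    rise_days = {e.day for e in events
                 if e.kind == "StockRise" and e.company == company}
    for start in sorted(rise_days):
        window = {start + timedelta(days=i) for i in range(3)}
        # All three days must be present: a missing day breaks the pattern.
        if window <= rise_days and any(r < start for r in releases):
            return True
    return False


d = date(2017, 5, 1)
events = [TimedEvent("ProductRelease", "A Ltd.", d)] + [
    TimedEvent("StockRise", "A Ltd.", d + timedelta(days=i)) for i in (1, 2, 3)
]
print(is_successful_release(events, "A Ltd."))
```

A day with no recorded rise inside the window, or a release that occurs only after the rise, causes the check to fail, mirroring the two conditions discussed above.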
A further option which may be implemented in the multiple-event pattern query is the retrieval of provenance information for the events, where said provenance information is available. This is different to the chronological requirements and geographical requirements (for example, reference to companies originating from a specific country, such as Spain) as discussed above. This is because the chronological and geographical requirements are hard requirements; if these requirements are not satisfied by an event, the event cannot be considered to satisfy all or part of the pattern query. By contrast, the provenance requirements are typically soft requirements, not hard requirements. That is, in the event that provenance information is not available, this will not cause patterns satisfying the multiple-event pattern query to be rejected. Other hard and soft requirements may also be used.
Provenance information is used to indicate the origin of events which are deemed to satisfy the multiple-event pattern query. This information can be useful, for example, where certain sources of events are known to be more reliable than other sources of events. As an example of this, typically information retrieved from a stock market site would be deemed to be more reliable than a personal blog. This information may then be provided to a user once events satisfying the multiple-event pattern query have been retrieved, such that the user may factor the reliability of the source of the event into a subsequent analysis when determining future actions to take.
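The distinction between hard requirements and soft provenance requirements can be sketched as follows. The `SourcedEvent` type, its fields and the sample data are hypothetical; the point of the sketch is only that a failed hard requirement rejects an event, whereas missing provenance does not.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class SourcedEvent:
    description: str
    country: str
    source: Optional[str] = None  # provenance; None when it is unavailable


def select_events(
    events: Iterable[SourcedEvent], country: str
) -> List[Tuple[str, Optional[str]]]:
    """Apply a hard geographical filter and attach provenance where available."""
    results = []
    for event in events:
        if event.country != country:
            continue  # hard requirement: a mismatch rejects the event
        # Soft requirement: provenance is reported if present, never required.
        results.append((event.description, event.source))
    return results


sample = [
    SourcedEvent("X SA releases product", "Spain", "stock market site"),
    SourcedEvent("Y SA releases product", "Spain"),           # no provenance
    SourcedEvent("Z SARL releases product", "France", "personal blog"),
]
for description, source in select_events(sample, "Spain"):
    print(description, "| source:", source or "unknown")
```

Returning the source alongside each matching event leaves the judgment of reliability to the user, in line with the use of provenance information described above.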
Once the multiple-event pattern query has been generated by the pattern query formulation unit 305, the multiple-event pattern query may then be sent to the transmitter 307 for transmission to an event information source, as shown in step S35 of
In the embodiments, an automated system for receiving input queries and providing queries in multiple-event pattern query form which may be understood and used to interrogate event information sources is provided. When used in conjunction with a system for automatically extracting event information, the combined system allows a far broader range of sources and types of information to be processed than may reasonably be analyzed by alternative techniques (such as processing by a human operator). The system therefore allows useful information to be obtained and therefore predictions of future occurrences to be made in an informed way on the basis of available information.
Applications of the embodiments are primarily related to fields wherein a large amount of data is generated at a high rate. An example of such a field is the financial or business field; however, medical fields, transportation fields and so on may also benefit from applications of the embodiments. Essentially, the embodiments may be applied to any field wherein it is useful to retrieve information from a plurality of disparate sources for analysis, and wherein the process of obtaining this information is unduly difficult or time consuming for a human operator.
An example may be composed of a network of such computing devices, such that components of the apparatus 300 are split across a plurality of computing devices. Optionally, the computing device also includes one or more input mechanisms such as keyboard and mouse or touch screen interface 996, and a display unit such as one or more monitors 995. The components are connectable to one another via a bus 992.
The memory 994 may include a computer readable medium, which term may refer to a single medium or multiple media (e.g., a centralized or distributed database and/or associated caches and servers) configured to carry computer-executable instructions or have data structures stored thereon. The memory 994 may be the same memory 9 as may be used for the storage of the event term database 91, or a separate memory. Computer-executable instructions may include, for example, instructions and data accessible by and causing a general purpose computer, special purpose computer, or special purpose processing device (e.g., one or more processors) to perform one or more functions or operations. Thus, the term “computer-readable storage medium” may also include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methods of the present disclosure. The term “computer-readable storage medium” may accordingly be taken to include, but not be limited to, solid-state memories, optical media and magnetic media. By way of example, and not limitation, such computer-readable media may include non-transitory computer-readable storage media, including Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory devices (e.g., solid state memory devices). In particular, the computer readable medium may comprise a computer program which, when executed on a computer, causes the computer to perform a method for generating a multiple-event pattern query as discussed above.
The processor 993 is configured to control the computing device and execute processing operations, for example executing code stored in the memory to implement the various different functions of the inputter, retriever 303, pattern query formulation unit 305 and transmitter 307 described here and in the claims. The memory 994 stores data being read and written by the processor 993. As referred to herein, a processor may include one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. The processor may include a complex instruction set computing (CISC) microprocessor, reduced instruction set computing (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets. The processor may also include one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. In one or more embodiments, a processor is configured to execute instructions for performing the operations and steps discussed herein.
The display unit 995 may display a representation of data stored by the computing device and may also display a cursor and dialog boxes and screens enabling interaction between a user and the programs and data stored on the computing device. The display unit may also comprise a touch screen interface. The input mechanisms 996 may enable a user to input data and instructions to the computing device, in particular, to input a natural language query to the inputter.
The network interface (network I/F) 997 may be connected to a network, such as the Internet, and is connectable to other such computing devices via the network. The network I/F 997 may control data input/output from/to other apparatus 300 via the network. The network interface may also be used in receiving sample event patterns, retrieving terms information, sending generated refined event patterns, selecting raw information and storing events.
Other peripheral devices such as a microphone, speakers, printer, power supply unit, fan, case, scanner, trackball, etc. may be included in the computing device.
The inputter of
The retriever 303 of
The pattern query formulation unit 305 of
The transmitter 307 of
Exemplary methods may be carried out on one or more computing devices such as that illustrated in
Although a few embodiments have been shown and described, it would be appreciated by those skilled in the art that changes may be made in these embodiments without departing from the principles and spirit thereof, the scope of which is defined in the claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
17172321.6 | May 2017 | EP | regional |