High-throughput data flow processing is commonly implemented by representing data flow using a directed graph, in which nodes represent computation resources and edges represent data transmission paths among the nodes. In such cases, nodes can be decoupled from each other by using asynchronous data transmission. This decoupling allows each computation node to execute as efficiently as possible since it does not have to wait for downstream nodes to complete processing before it can begin processing the next message. In some cases, multiple computation nodes can be executed in parallel and together act as a single computation node, thus processing many units of work simultaneously.
A Staged Event Driven Architecture (SEDA) enhances this approach by inserting bounded queues between computation nodes. When a node A attempts to transfer work to another node B, if the queue between A and B is full, then A blocks until B has consumed some work from the queue. This blocking prevents A from consuming new work, which in turn causes A's input queue to fill, blocking any predecessors. One example of a process that utilizes such a technique is search engine document ingestion, in which multiple forms of documents (emails, PDFs, multimedia, blog postings, etc.) all need to be processed and indexed by a search engine for subsequent retrieval.
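The back-pressure behavior described above can be sketched with two stages connected by a bounded queue. The stage names, queue size, and workloads below are illustrative assumptions, not details from this description:

```python
import queue
import threading

# Minimal SEDA-style sketch: two stages connected by one bounded queue.
# put() blocks when the queue is full, which is the back-pressure
# mechanism described above: a full queue between A and B stalls A
# until B consumes work.

SENTINEL = None

def stage_a(out_q, items):
    for item in items:
        out_q.put(item * 2)   # blocks while B's input queue is full
    out_q.put(SENTINEL)       # signal end of stream

def stage_b(in_q, results):
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        results.append(item + 1)

q_ab = queue.Queue(maxsize=2)   # bounded queue between A and B
results = []
a = threading.Thread(target=stage_a, args=(q_ab, range(5)))
b = threading.Thread(target=stage_b, args=(q_ab, results))
a.start(); b.start()
a.join(); b.join()
print(results)   # [1, 3, 5, 7, 9]
```

Because a single producer feeds a single consumer through a FIFO queue, the output order matches the input order even though the stages run concurrently.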
A scalable system that can process large amounts of data can be provided by using such asynchronous directed graph models. In some applications, documents may need to be processed in order. However, a system based on an asynchronous directed graph model generally cannot guarantee that documents are processed in order. One prior solution to this problem, described in U.S. Patent Publication 2010/0005147, is a system in which all messages are processed in order.
A highly parallel, asynchronous data flow processing system, in which processing is represented by a directed graph, can include processing nodes that generate, and process, groups of dependent messages and process messages within such groups in order. Other messages can be processed in whatever order they are received by a processing node.
To identify a group of dependent messages, a message identifier is applied to each message. Processing of a message may generate child messages. A child message is assigned a message identifier that incorporates the message identifier of its parent message. The message identifier of the parent message is annotated to indicate the number of child messages generated from it.
When a group of messages is to be processed by a processing node in order, the processing node maintains a buffer in which messages in the group are stored. When a message is received, its message identifier indicates whether it is in a group, its parent message, if any, and, if it is itself a parent, the number of child messages it has. From this information, it can be determined whether all messages within the group have been received. When all of the messages within the group have been received, the processing node can process the messages in order.
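One way the "all messages received" determination could work is sketched below, under the assumption that each buffered message carries its identifier and its annotated child count; the function and argument names are hypothetical:

```python
# Hypothetical sketch of the group-completeness test described above.
# Each received message carries its ID and the number of children its
# identifier was annotated with; the group is complete when the root
# is present and every message's declared children have all arrived.

def group_complete(received, root_id='1'):
    """received: dict mapping message ID -> declared number of children."""
    if root_id not in received:
        return False
    for msg_id, declared in received.items():
        # Direct children of msg_id are IDs whose prefix up to the
        # last '.' equals msg_id.
        present = sum(1 for other in received
                      if other != msg_id and other.rsplit('.', 1)[0] == msg_id)
        if present != declared:
            return False
    return True

partial = {'1': 2, '1.1': 2, '1.1.1': 0}   # '1.2' and '1.1.2' still missing
full = {'1': 2, '1.1': 2, '1.2': 0, '1.1.1': 0, '1.1.2': 0}
print(group_complete(partial), group_complete(full))  # False True
```

The check mirrors the tree structure of the identifiers: a node's annotation says how many direct children to expect, so completeness can be verified level by level without any global count.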
The processing and ingestion of documents into an indexed, searchable data store, involves numerous steps, some of which are executed in a particular order and others that are processed in parallel. As used herein, the term document may refer to unstructured data such as .pdf files, video, audio, email, or to structured data such as XML files, .csv files, or data received from database sources. In order to facilitate document processing and ingestion, documents are held within messages having unique message IDs.
Some documents (e.g., emails having multiple attachments) include yet other documents. Such documents introduce the possibilities of multiple processing threads that process the document, differing processing times for different parts of the document, and recursive processing of documents. If committing data from such a document to a database is dependent on the completion of processing of the document, such possibilities introduce significant complexities into the processing of documents.
Such a system is typically implemented using a platform (e.g., a computer with an operating system and application processing framework) that is designed to run applications, created by programmers, that conform to a specification. For example, programmers create applications with interconnected processing nodes that form a highly parallel, arbitrary directed graph.
In such applications, the application programmer will determine whether a particular processing node may generate, or will process, messages in a particular order with respect to each other. Most messages do not need to be processed in order. Processing nodes which do not require ordered message processing can run in parallel with other nodes without consideration of message ordering. However, some processing nodes generate multiple messages, or a group of messages, from a single message, and messages in this group of messages may need to be processed in order with respect to each other. Also, a processing node may receive one or more of the messages from a group of messages, and may need to process the messages it has received in order with respect to each other. Other messages not in the group can be processed in any order with respect to the group. The application programmer would determine whether a processing node will create groups of dependent messages, and whether a processing node will process messages from a group of dependent messages in order.
In general, message ordering is preserved in this system by grouping dependent messages together. As an example, groups of messages may be identified by using message identifiers that represent the dependency among messages as a tree. Characteristics of individual nodes and the messages being processed can be used to direct message flow such that ordering constraints are met and recursive processing is permitted without impacting overall system performance. For example, some processing nodes may not require ordered message processing, and some message types need not be processed in any particular order.
Any processing node can be configured to detect whether it is processing a message from a group of messages requiring ordered processing. For example, in
For example, assume at step 210 a message is received which has a message identifier of ‘1’ and is an email message containing multiple attachments. At 220, an email processing node processes the email message and generates child messages for each attachment. Each attachment is inserted into a message that incorporates the parent's message identifier. In this case, the parent identifier is ‘1’; therefore, if there are two attachments, the first attachment is inserted into a message with a multi-part message identifier of ‘1.1’ and the second attachment into a message with the identifier ‘1.2’. At 240, the message identifier of the parent message is annotated as having two child messages. Further assume that child document ‘1.1’ has two children. At 220, the child messages are generated, and at 230 the child messages are assigned IDs of ‘1.1.1’ and ‘1.1.2’. Child document ‘1.1’ is then, at 240, annotated as having two children.
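The identifier scheme walked through above can be sketched in code; the Message class and its fields are hypothetical illustrations, not part of the described system:

```python
# Sketch of hierarchical message identifiers as in the example above.
# The Message class and its field names are hypothetical.

class Message:
    def __init__(self, msg_id, payload=None):
        self.msg_id = msg_id          # e.g. '1', '1.1', '1.1.2'
        self.payload = payload
        self.num_children = 0         # annotation kept on the parent

    def spawn_child(self, payload=None):
        # The child ID incorporates the parent ID; the parent is
        # annotated with the running count of its children.
        self.num_children += 1
        return Message(f"{self.msg_id}.{self.num_children}", payload)

email = Message('1')
att1 = email.spawn_child()    # ID '1.1'
att2 = email.spawn_child()    # ID '1.2'
sub1 = att1.spawn_child()     # ID '1.1.1'
sub2 = att1.spawn_child()     # ID '1.1.2'
print(email.num_children, att1.num_children)  # 2 2
```

Each identifier thus encodes the full path from the root message, and each parent's annotation records exactly how many children to expect downstream.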
In some instances, a processing node breaks a message down into additional messages that are subjected to further analysis. For example, an email may have one or more attachments, each of which is subjected to different processing at different processing nodes. Furthermore, there may be instances in which, in order to maintain integrity of the document index, documents having more than one component should not be written to the index unless and until all of the components have been successfully processed. In other words, if the processing of certain attachments to an email fails, the text of the email (and other components or attachments that were successfully processed) should not be written to the index. Some implementations may allow for partial document indexing, whereas in others this constraint may be enforced without exception. In some cases, rules may be used to determine which “failures” are considered acceptable, and which are fatal. The message identifiers described above in connection with
In general, each processing node includes a message queue. Message queues are used to store multiple messages awaiting processing at a particular node. The message queue also can reorder messages within a group of dependent messages, based on their message identifiers, when they arrive at a processing node out of order.
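Reordering by multi-part identifier can be sketched as a numeric sort over the dot-separated components; the variable names below are illustrative:

```python
# Sketch of reordering a group of dependent messages by their
# multi-part identifiers. Comparing components numerically ensures
# '1.10' sorts after '1.2' (a plain string sort would not).

def id_key(msg_id):
    return tuple(int(part) for part in msg_id.split('.'))

arrived_out_of_order = ['1.2', '1.10', '1', '1.1.2', '1.1']
in_order = sorted(arrived_out_of_order, key=id_key)
print(in_order)  # ['1', '1.1', '1.1.2', '1.2', '1.10']
```

Tuple comparison also places a parent (e.g. ‘1.1’) before its children (‘1.1.1’, ‘1.1.2’), which matches the parent-first ordering implied by the identifier tree.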
In instances in which the email includes attachments, the message containing the email may be annotated to indicate that there are child messages associated with the email so that the indexer, when processing the index queue 318, knows when all related messages have been received. When a new ‘child’ message is generated for downstream processing, the current message is marked as “having a child” and the new child message is assigned a message ID encoded to incorporate the parent ID. Such cross-referencing of messages allows a message to be held at subsequent processing nodes until all its children (or its parent and sibling messages) arrive at the same node for processing.
Messages requiring processing (e.g., message ID 0 at the email node 306) are processed and forwarded to the index queue 318 and released to the index when it is available.
Referring now to
The processing node generally includes a message processor 400 that processes a current message 418. This message processor 400 may generate results 402 as a result of processing the message. For example, this may include processing the metadata associated with a .zip file. The message processor 400 also may output one or more output messages 404, which are child messages of the current message being processed. For example, if this processing node is for processing .zip files, then a message is generated to delete the previous contents of the .zip file (the results of processing an earlier version of the file). In addition, each document within the .zip file becomes an output message 404. The message processor 400 would then instruct an ID generator 406, through a trigger 408, to generate a message identifier 410 for the child message using the message identifier 412 of the current message. The ID generator 406 then updates the current message identifier 412 to indicate that it has an additional child message. A message tagger 414 tags the output message 404 with the child message identifier 410 and outputs the tagged message 416. If the message processor 400 outputs more than one child message, then as each child message is output, the message identifier 412 of the current message is updated to reflect the number of child messages generated. The current message 418 is then output, augmented with the results 402 and the message identifier as modified by the ID generator 406.
The processing node generally includes a message processor 500 that processes a current message. This message processor 500 may output results 502 as a result of processing the message. A processing node that processes groups of dependent messages in order uses a sorted buffer or queue 506 to store messages from the group until all of the messages of the group are received, and can begin processing the messages in the group after the parent message is received. Until it begins processing messages from a group, it can process other messages from its input queue 504. While processing messages within a group, the message processor 500 also may process messages from the input queue 504.
The processing node processes each input message 508 to determine whether the message is in a group of dependent messages (as indicated by the message identifier), as indicated by module “IsGroup?” 510. If a message is not in a group, it is placed in the input queue 504 and can be processed at any time by the message processor 500. If the message is in a group, it is also determined, from the message identifier, whether the message is the parent node of the group, or a parent node within the group, as indicated by module “IsParent?” 512. Whether a message is in a group and/or is a parent node is indicated to a buffer manager 514. The buffer manager tracks the incoming messages in a group of dependent messages and places them in the buffer 506. The buffer manager also determines whether all of the messages in a group have been received (by examining the message identifiers), and provides a “group complete” indication to the message processor 500. The message processor 500 can then start processing the group's messages from the buffer in order. A flowchart describing how a processing node such as in
The processing node receives a current message and stores the current message identifier in step 600. The processing node begins 602 processing the current message. If a child message is generated by processing the current message, the child message is generated 604. A child message identifier is then generated 606. The current message identifier is then updated 608 to reflect that the current message has at least one child message. If this child message is the first child message, the current message identifier is changed to add an indicator of the number of child messages, which is set to 1. Otherwise the number of child messages is incremented. The child message is then tagged with its message identifier and output in step 610. If additional child messages are detected, as determined at 612, this process repeats steps 604 through 610. Both before and after step 612, the processing of the current message continues and ultimately completes (as indicated at 614).
A flowchart describing how a processing node such as in
In
The processing nodes and their components and modules described throughout the specification can be implemented in whole or in part as a combination of one or more computer programs operating on one or more processors using any suitable programming language or languages (C++, C#, Java, Visual Basic, LISP, BASIC, Perl, etc.) and/or as a hardware device (e.g., ASIC, FPGA, processor, memory, storage, and the like).
An implementation of the method and system of the present invention can be realized in a centralized fashion in one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system, or other apparatus adapted for carrying out the methods described herein, is suited to perform the functions described herein.
A typical combination of hardware and software could be a general purpose computer system with a computer program that, when loaded and executed, controls the computer system such that it carries out the methods described herein. The present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which, when loaded in a computer system, is able to carry out these methods.
Computer program or application in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form. Significantly, this invention can be embodied in other specific forms without departing from the spirit or essential attributes thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting on the invention described herein.
Number | Name | Date | Kind |
---|---|---|---|
5588117 | Karp et al. | Dec 1996 | A |
6341302 | Celis | Jan 2002 | B1 |
6578159 | Kitagawa et al. | Jun 2003 | B1 |
7434225 | Groetzner et al. | Oct 2008 | B2 |
7600131 | Krishna et al. | Oct 2009 | B1 |
7761514 | Popescu et al. | Jul 2010 | B2 |
7836143 | Blocksome et al. | Nov 2010 | B2 |
7856415 | Gatti | Dec 2010 | B2 |
8064446 | Ramakrishnan et al. | Nov 2011 | B2 |
8081628 | Wu et al. | Dec 2011 | B2 |
8194690 | Steele et al. | Jun 2012 | B1 |
8271996 | Gould et al. | Sep 2012 | B1 |
8295203 | Ramakrishnan et al. | Oct 2012 | B2 |
8316443 | Rits et al. | Nov 2012 | B2 |
8495656 | Johnson, III et al. | Jul 2013 | B2 |
8649377 | Ramakrishnan et al. | Feb 2014 | B2 |
20020111986 | Wolfson | Aug 2002 | A1 |
20020128919 | Rime et al. | Sep 2002 | A1 |
20020194327 | DeGilio et al. | Dec 2002 | A1 |
20030110230 | Holdsworth et al. | Jun 2003 | A1 |
20030126294 | Thorsteinson et al. | Jul 2003 | A1 |
20030158883 | Drudis | Aug 2003 | A1 |
20030223466 | Noronha, Jr. et al. | Dec 2003 | A1 |
20040120301 | Kitchin | Jun 2004 | A1 |
20050038824 | Kenntner et al. | Feb 2005 | A1 |
20050138632 | Groetzner et al. | Jun 2005 | A1 |
20060015811 | Tanaka et al. | Jan 2006 | A1 |
20060269063 | Hauge et al. | Nov 2006 | A1 |
20070118601 | Pacheco | May 2007 | A1 |
20070124398 | Parkinson et al. | May 2007 | A1 |
20070143442 | Zhang et al. | Jun 2007 | A1 |
20080259960 | Favor et al. | Oct 2008 | A1 |
20080289039 | Rits et al. | Nov 2008 | A1 |
20090164548 | Hayer et al. | Jun 2009 | A1 |
20090208009 | Hauge et al. | Aug 2009 | A1 |
20100005147 | Johnson, III et al. | Jan 2010 | A1 |
20120096475 | Johnson, III et al. | Apr 2012 | A1 |
20130046442 | Hayama et al. | Feb 2013 | A1 |
Number | Date | Country |
---|---|---|
6-318158 | Nov 1994 | JP |
7-319787 | Dec 1995 | JP |
9-83541 | Mar 1997 | JP |
2000-163372 | Jun 2000 | JP |
2002-314566 | Oct 2002 | JP |
2002-314606 | Oct 2002 | JP |
2003-283539 | Oct 2003 | JP |
2010-524333 | Jul 2010 | JP |
2009154752 | Dec 2009 | WO |
2010093288 | Aug 2010 | WO |
2012051366 | Apr 2012 | WO |
2012051366 | Aug 2012 | WO |
Entry |
---|
Final Office Action received for U.S. Appl. No. 12/456,517, mailed on Sep. 13, 2012, 15 pages. |
Non Final Office Action received for U.S. Appl. No. 12/456,517, mailed on Mar. 16, 2011, 14 pages. |
Non Final Office Action received for U.S. Appl. No. 12/456,517, mailed on Oct. 26, 2011, 14 pages. |
Non Final Office Action received for U.S. Appl. No. 12/905,211, mailed on Oct. 3, 2012, 8 pages. |
Notice of Allowance received for U.S. Appl. No. 12/905,211, mailed on Mar. 21, 2013, 9 pages. |
International Search Report and Written Opinion received for PCT Patent Application No. PCT/US2009/003626, mailed on Nov. 5, 2009, 10 pages. |
International Preliminary Report on Patentability received for PCT Patent Application No. PCT/US2011/056054, mailed on Apr. 25, 2013, 5 pages. |
International Search Report and Written Opinion received for PCT Patent Application No. PCT/US2011/056054, mailed on Jun. 18, 2012, 7 pages. |
Office Action received for Japanese Patent Application No. 2013-533990, mailed on Sep. 5, 2013, 2 pages of Office Action and 2 pages of English translation. |
Office Action received for Japanese Patent Application No. 2011-514608, mailed on Oct. 8, 2013, 3 pages of Office Action and 4 pages of English Translation. |
Second Office Action received for Japanese Patent Application No. 2011-514608, mailed on Jul. 8, 2014, with partial English Translation. |
Takashi Sonoda, “Evaluation of Link Aggregation by Duplication and Unification of Packets for TCP Traffic”, IEICE Technical Report, Japan, The Institute of Electronics, Information and Communication Engineers, Feb. 27, 2004, vol. 103, No. 692, pp. 171-174, with English Abstract. |
Notice of Allowance received in connection with U.S. Appl. No. 12/456,517, mailed on May 29, 2014. |
Number | Date | Country | |
---|---|---|---|
20140052798 A1 | Feb 2014 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 12905211 | Oct 2010 | US |
Child | 13943624 | US |