System and method for storing large messages

Information

  • Patent Grant
  • 7840532
  • Patent Number
    7,840,532
  • Date Filed
    Wednesday, April 25, 2007
  • Date Issued
    Tuesday, November 23, 2010
Abstract
A large message can be stored by separating the message into an envelope portion containing information such as headers, protocols, and addresses, and a payload portion containing items such as file attachments. The envelope portion can be stored in local storage, while the payload can be stored to a persistent store. The message can be processed incrementally, such that the entire message is never in system memory. Once the envelope portion is processed, the payload portion can be read in increments without being processed, and those increments written directly to the persistent store. Alternatively, the payload can be streamed to the persistent store. A pointer in the envelope can then be used to locate and retrieve attachments from persistent storage.
Description
COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.


CROSS-REFERENCED CASES

The following applications are cross-referenced and incorporated herein by reference:


U.S. patent application Ser. No. 10/404,552 filed Apr. 1, 2003, to Mike Blevins et al. and entitled, “COLLABORATIVE BUSINESS PLUG-IN FRAMEWORK”;


U.S. patent application Ser. No. 10/404,684 filed Apr. 1, 2003, to Mike Blevins et al. and entitled, “SYSTEMS AND METHODS FOR BUSINESS PROCESS PLUG-IN DEVELOPMENT”; and


U.S. patent application Ser. No. 10/404,666 filed Apr. 1, 2003, to David Wiser et al. and entitled “Single Servlets for B2B Message Routing.”


FIELD OF THE INVENTION

The present invention relates to the storage of large messages in a computer system or on a computer network.


BACKGROUND

Existing integration and messaging systems have problems handling large messages. Incoming messages are read into memory in their entirety, such that when a number of large messages are received a system can crash due to a lack of available memory. Some systems try to prevent these problems by limiting the size of messages that can be processed through a system, but this approach is undesirable to users needing to send messages that may occasionally exceed that limitation.


Another existing approach utilizes in-database persistence and in-memory caching on a hub. Persistence saves enough data for recovery purposes, and caching allows messages to be serialized to a Java Message Service (JMS). This allows JMS to enqueue faster, and allows a JMS dequeue to request the message from a cache without having to redo expensive operations like deserialization, decryption, and XML parsing. The problem still exists in that it is necessary to read an entire message into memory in order to process the message.


BRIEF SUMMARY

Systems and methods in accordance with embodiments of the present invention can overcome deficiencies in existing messaging systems by changing the way in which messages are processed and stored. An integration component can receive an incoming message, such as from a Web server. The integration component can separate the message into an “envelope” portion, which can contain information such as headers, protocols, and addresses, and a “payload” portion, which can contain items such as file attachments. The integration component can write the envelope portion to local memory, and can write the payload portion to at least one persistent store. A pointer can be placed in the envelope to identify the location of the payload in the persistent store. Applications can then use the envelope to locate the payload in a persistent store.


An integration component can also process a message incrementally. The integration component can process portions of the message until the payload portion is reached. The integration component can then stop processing the message, but can continue to read the message in increments and write those increments to a persistent store. Parsers such as MIME parsers and XML parsers can be used by the integration component to process the message. Alternatively, the integration component can process the message as a stream, or at least write the payload portion to the persistent store as a stream.


Other features, aspects, and objects of the invention can be obtained from a review of the specification, the figures, and the claims.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a diagram of a system in accordance with one embodiment of the present invention.



FIG. 2 is a diagram of a message that can be processed using the system of FIG. 1.





DETAILED DESCRIPTION

In systems and methods in accordance with embodiments of the present invention, “large” messages, such as large business messages in XML format, can be processed in a Web server or integration application. These business messages can be Java Message Service (JMS) messages, for example, which can utilize distributed destinations in a cluster. A large business message can be any message that may have an attachment or a large amount of text, for example, which can have an overall message size of at or above 1 MB, at or above 10 MB, at or above 50 MB, or even at or above 100 MB. For example, company A can send a message to company B that has a file size of 100 MB. The integration system receiving that message will have to process and resend the entire message. In existing systems, it is necessary to read the entire message into memory before writing the message to disk. The read and write are each done in one complete step. Present systems also have to parse the entire message.


As shown in FIG. 1, when a message from a company A 100 is first received by an integration component 104 from a Web server 102, the message can be read into local memory 106. Local memory can be any appropriate storage medium, such as may be located on the Web server itself, in a cluster containing the Web server, or on a network node accessible to the Web server. If several large messages are received by the Web server 102, the server may eventually run out of memory. In a system in accordance with one embodiment of the present invention, portions of the body of each message can be stored in persistent storage 108 instead of being completely stored in local memory 106. There are at least two types of persistent storage, including file-based persistence stores and data-based persistence stores.


Continuing with the example, company B 110 can be working with an integration application. When the message arrives at the Web server 102 for company B 110, the message can arrive on a socket on the network. Portions of the message can be stored somewhat directly to the persistent message store 108 instead of being read entirely into local memory 106. One way to do this is to read the message in increments, or small portions, and write those small portions to storage. For example, the 100 MB message could have a 4 MB portion read into local memory 106, then have that 4 MB portion written to persistent storage 108. Then another 4 MB portion could be read into local memory and written to persistent storage. This process could continue until the entire body portion of the message is in persistent storage 108. Although portions of the entire message may be in memory at one point or another, there would only be up to 4 MB of the message in local memory at any given time. The user can configure the persistent store 108 so that the message is sent to a file or to a database, for example. The portion size can be any size appropriate for the size of the message or the capacity of the system, such as portions of 1 MB, 5 MB, or 10 MB. The portion size can also be a percentage of the overall file size, such as 1%, 5%, 10%, or 25%, for example.
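
By way of illustration only, the incremental read-and-write loop described above can be sketched as follows. The function name `store_incrementally`, the 4 MB default, and the use of file-like objects for the socket and the persistent store are assumptions made for this sketch and do not appear in the specification.

```python
import io


def store_incrementally(source, sink, chunk_size=4 * 1024 * 1024):
    """Copy `source` to `sink` one portion at a time.

    Returns (total_bytes, largest_portion) so a caller can confirm that
    no more than `chunk_size` bytes were held in memory at once.
    Both arguments are assumed to be binary file-like objects.
    """
    total = 0
    largest = 0
    while True:
        portion = source.read(chunk_size)   # at most chunk_size bytes
        if not portion:                     # empty read means end of message
            break
        sink.write(portion)                 # written straight to the store
        total += len(portion)
        largest = max(largest, len(portion))
    return total, largest


if __name__ == "__main__":
    incoming = io.BytesIO(b"x" * 10)        # stands in for the network socket
    store = io.BytesIO()                    # stands in for a file or database blob
    print(store_incrementally(incoming, store, chunk_size=4))
```

In this sketch a 10-byte message copied with a 4-byte portion size is written in portions of 4, 4, and 2 bytes, which mirrors the 100 MB and 4 MB example on a small scale.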


When a message is processed using an integration application or integration server, for example, the message can use a storage method referred to herein as “envelope plus payload.” The message can be processed in the server to separate the contents to be placed in the “envelope” from contents to be placed in the “payload.” This is shown, for example, in the diagram of FIG. 2. Headers 202, 204 of a message 200 can be extracted by an integration server, as the headers may be all the server requires to process the message 200. A header can identify the protocol under which the message is sent, such as an XOCP protocol. The protocol can be used to help identify the headers 202, 204 and the body 206 of the message. It can be important in certain systems to identify the message protocol, as protocols such as RosettaNet and ebXML have different packaging semantics than a protocol such as XOCP. The headers can be placed in the envelope 214, which can be stored in local memory. The body 206 of the message 200, which can contain several attachments 208, 210, 212, for example, can be placed into the payload 218. The payload can be stored in persistent storage on the server, in the cluster, or on the network. The envelope 214 can contain a pointer 216 to the location of the payload 218.
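
The “envelope plus payload” arrangement of FIG. 2 can be sketched, under stated assumptions, as a small in-memory record that carries the headers and a pointer, with the body written to a file. The names `Envelope`, `payload_pointer`, and `store_message`, and the choice of a file path as the pointer, are hypothetical and are used here only for illustration.

```python
import tempfile
import uuid
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Envelope:
    """In-memory portion of a message: headers plus a pointer to the payload."""
    headers: dict
    payload_pointer: str   # assumed here to be a file path in the persistent store


def store_message(headers, body_portions, store_dir):
    """Write the body portions to a new file and return the envelope.

    `body_portions` is any iterable of bytes objects, so the full body
    never has to exist as a single object in memory.
    """
    path = Path(store_dir) / f"{uuid.uuid4().hex}.payload"
    with open(path, "wb") as payload_file:
        for portion in body_portions:
            payload_file.write(portion)
    return Envelope(headers=dict(headers), payload_pointer=str(path))


if __name__ == "__main__":
    store_dir = tempfile.mkdtemp()
    env = store_message({"protocol": "XOCP"}, [b"ab", b"cd"], store_dir)
    print(env.headers, env.payload_pointer)
```

A server using such a record would handle only the `Envelope` object while processing, and would follow `payload_pointer` only when an application asks for the body.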


Since a message can contain a body with multiple parts, the payload can be designed to contain multiple parts as well. While processing a message in the server, however, only the envelope may be needed. The payload can belong to the user of a B2B server, for example, or an application riding on top of an integration server. The payload can be stored to persistent storage, so that the full payload is never stored in memory. A server or any application can simply deal with the envelope, which can contain pointers to the payload. When an application wants to access any portion of the message, the application can view information contained in the envelope, which can include identification information for the payload parts.


An application can use any pointers in an envelope to extract portions of the body of the message stored in the payload. As the application can retrieve the data from this persistence store, it is not necessary to accumulate everything in local memory on the integration server. A message envelope can contain a pointer to the body of the message, whether there is a single message body or a number of portions, or can contain a pointer for each portion of the body in persistent storage. The number of portions can include a number of attachments, for example. It is not necessary for the integration system to process the attachments to a message, so the system can simply write the attachments to storage, either all together in one block of memory or individually. The pointer can point to the location at which a portion of the message body begins in memory, or can point to the boundaries of a given body portion in memory, for example.
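
One possible reading of a pointer that marks the boundaries of a body portion is an offset and a length within a single stored block. The following sketch assumes that representation; `PartPointer`, `write_parts`, and `read_part` are hypothetical names, and a seekable binary file-like object stands in for the persistent store.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class PartPointer:
    """Boundaries of one body portion inside the stored block."""
    name: str
    offset: int
    length: int


def write_parts(store, parts):
    """Append each (name, data) pair to `store` and return one pointer per part."""
    pointers = []
    store.seek(0, io.SEEK_END)              # always append after existing data
    for name, data in parts:
        offset = store.tell()
        store.write(data)
        pointers.append(PartPointer(name, offset, len(data)))
    return pointers


def read_part(store, pointer):
    """Retrieve a single portion without reading the rest of the block."""
    store.seek(pointer.offset)
    return store.read(pointer.length)


if __name__ == "__main__":
    store = io.BytesIO()
    ptrs = write_parts(store, [("a.txt", b"hello"), ("b.bin", b"\x00\x01"), ("c.txt", b"world")])
    print(ptrs[1], read_part(store, ptrs[1]))
```

Storing each attachment individually, the other option mentioned above, would replace the offset and length with one location per attachment.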


An envelope can contain other useful information about a message, such as the address of the sender and/or the address of the recipient. Each of these addresses can be a URL, for example. The envelope can also contain the protocol of the message and possibly the protocol of any body portion, if applicable. The envelope can contain message text. The envelope can also contain information about each attachment in the body, such as title, file type, and historical information.


At least two levels of parsing can be used to process a message. A low-level parsing mechanism can be used to decode transfer protocols such as MIME or UUENCODE. The low-level parser can receive the byte stream and identify the parts of the message, such as a text portion and a binary attachment. A second level of parsing, such as XML parsing, can be used to read headers and body portions, which can be in XML or another appropriate messaging or mark-up language.
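
The two levels of parsing can be illustrated with standard-library parsers, assuming a flat (non-nested) multipart MIME message. For brevity this sketch parses a small message held entirely in memory, which is not how a large message would be handled; it is meant only to show the division of labor between the levels. The function name `parse_two_levels` and the `<protocol>` element are hypothetical.

```python
import email
import xml.etree.ElementTree as ET


def parse_two_levels(raw):
    """Return (protocol, [(content_type, decoded_size), ...]) for a MIME message."""
    msg = email.message_from_bytes(raw)              # level 1: MIME structure
    parts = msg.get_payload() if msg.is_multipart() else [msg]
    protocol = None
    summary = []
    for part in parts:
        ctype = part.get_content_type()
        data = part.get_payload(decode=True)         # undoes base64 and similar encodings
        summary.append((ctype, len(data)))
        if ctype == "text/xml" and protocol is None:
            root = ET.fromstring(data)               # level 2: XML content
            protocol = root.findtext("protocol")
    return protocol, summary


RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="BOUND"\r\n'
    b"\r\n"
    b"--BOUND\r\n"
    b"Content-Type: text/xml\r\n"
    b"\r\n"
    b"<order><protocol>XOCP</protocol></order>\r\n"
    b"--BOUND\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"AAEC\r\n"
    b"--BOUND--\r\n"
)

if __name__ == "__main__":
    print(parse_two_levels(RAW))
```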


A message can arrive from the Internet, for example, and can pass through the Web server into an integration transport layer. First, the message can pass through a MIME parser. Second, the message can be decoded using a second processing layer to determine the appropriate business protocol. The envelope can be created in this transport layer. In the decoding process, which can use the XML parser, the envelope can be filled with headers and other appropriate information. After the headers, a pointer can be placed in the envelope and the MIME parser can stop parsing the message. The MIME parser can know to stop parsing when it hits attachments, for example.
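
The behavior of parsing only until the headers end, and then ceasing to parse, can be sketched as follows. This is not a full MIME parser: it assumes simple `Name: value` header lines terminated by a blank line, does not handle folded headers, and uses a hypothetical `X-Protocol` header in the usage example. The name `split_stream` is likewise an assumption.

```python
import io


def split_stream(stream, payload_sink, chunk_size=64 * 1024):
    """Parse header lines, then copy the remainder unparsed.

    Returns (headers, payload_bytes_written). Header names are lower-cased.
    """
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):   # blank line or EOF ends the headers
            break
        name, _, value = line.decode("ascii").partition(":")
        headers[name.strip().lower()] = value.strip()

    # Parsing stops here; the rest is moved in increments without inspection.
    written = 0
    while True:
        portion = stream.read(chunk_size)
        if not portion:
            break
        payload_sink.write(portion)
        written += len(portion)
    return headers, written


if __name__ == "__main__":
    raw = b"Content-Type: multipart/related\r\nX-Protocol: XOCP\r\n\r\nBINARY\x00DATA"
    sink = io.BytesIO()
    print(split_stream(io.BytesIO(raw), sink), sink.getvalue())
```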


The remainder of the message, which can include at least a portion of the body and any attachments, can then be written directly to persistent storage, either in small data “dumps” or on a data stream. Once the entire message is processed, such that the envelope and payload are created, an application can determine where the message portions reside using pointers in the message envelope. The envelope can be thought of as an “abstract” of the message. Once a user or application gets this abstract, that user or application can extract any portion of the message that is needed. For instance, if there are three attachments, the user or application can choose to extract one or two of the attachments from the persistent storage. When the user deletes the message, the envelope can be used, such as by an integration server or B2B server, to delete the associated portions in the persistent storage.
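
Selective extraction and envelope-driven deletion can be sketched as below, assuming each attachment is stored as its own file and the envelope maps attachment names to file paths. The dictionary layout and the names `save_attachments`, `extract`, and `delete_message` are hypothetical.

```python
import os
import tempfile


def save_attachments(store_dir, attachments):
    """Write each attachment to its own file; return an envelope of pointers."""
    envelope = {"attachments": {}}
    for name, data in attachments.items():
        path = os.path.join(store_dir, name)
        with open(path, "wb") as f:
            f.write(data)
        envelope["attachments"][name] = path
    return envelope


def extract(envelope, name):
    """Retrieve one attachment by following its pointer."""
    with open(envelope["attachments"][name], "rb") as f:
        return f.read()


def delete_message(envelope):
    """Remove every stored portion the envelope points to; return the count removed."""
    removed = 0
    for path in envelope["attachments"].values():
        if os.path.exists(path):
            os.remove(path)
            removed += 1
    envelope["attachments"].clear()
    return removed


if __name__ == "__main__":
    store_dir = tempfile.mkdtemp()
    env = save_attachments(store_dir, {"a.pdf": b"A", "b.pdf": b"B", "c.pdf": b"C"})
    print(extract(env, "b.pdf"))     # only one of the three attachments is read
    print(delete_message(env))
```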


The foregoing description of preferred embodiments of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to one of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.

Claims
  • 1. A system for providing access to a message in a computer environment, comprising: a computer, that includes a memory for in-memory storage of message information; a persistent store for persistent storage of message data; and an integration component that receives a message from a sending application, and then creates both an envelope, and a separate payload, for the message, including reading the message in incremental portions smaller than the total message size, and successively writing those portions to the persistent storage as the payload, stores the payload in the persistent store as a plurality of selectively retrievable portions, creates within the envelope a header information, and pointers to different portions of the payload, stores the envelope in the memory, while the separate payload is stored at the persistent store, provides in-memory access to the envelope by one or more receiving applications, and enables the receiving applications to access the envelope in memory, and to use the pointers in the message envelope to select and retrieve selected portions of the message payload from the persistent store for use by those receiving applications.
  • 2. The system of claim 1, wherein the step of creating a separate envelope and payload includes parsing the message with a protocol parser to identify portions of the message to the integration component.
  • 3. The system of claim 1, wherein the step of creating a separate envelope and payload includes parsing the message with an XML parser to read header and body portions of the message, and to separate the head and body portions into the envelope and payload respectively.
  • 4. The system of claim 1, wherein the step of creating a separate envelope and payload includes processing the message in increments.
  • 5. The system of claim 1 wherein the persistent store is located at the computer that receives the message.
  • 6. The system of claim 1 wherein the persistent store is located at a computer different from the one that receives the message, and wherein the persistent store is shared among a plurality of computers for storage of and access to message payloads for the plurality of computers.
  • 7. A method for providing access to a message in a computer environment, comprising the steps of: providing a computer that includes a memory for in-memory storage of message information, a persistent store for persistent storage of message data, and an integration component; receiving a message from a sending application, at the integration component at the computer; creating both an envelope, and a separate payload, for the message, including reading the message in incremental portions smaller than the total message size, and successively writing those portions to the persistent storage as the payload; storing the payload in the persistent store as a plurality of selectively retrievable portions; creating within the envelope a header information, and pointers to different portions of the payload; storing the envelope in the computer's memory, while the separate payload is stored at the persistent store; and providing in-memory access to the envelope by one or more receiving applications to access the envelope in memory, and to use the pointers in the message envelope to select and retrieve selected portions of the message payload, from the persistent store, for use by those receiving applications.
  • 8. The method of claim 7, wherein the step of creating a separate envelope and payload includes parsing the message with a protocol parser to identify portions of the message to the integration component.
  • 9. The method of claim 7, wherein the step of creating a separate envelope and payload includes parsing the message with an XML parser to read header and body portions of the message, and to separate the head and body portions into the envelope and payload respectively.
  • 10. The method of claim 7, wherein the step of creating a separate envelope and payload includes processing the message in increments.
  • 11. The method of claim 7 wherein the persistent store is located at the computer that receives the message.
  • 12. The method of claim 7 wherein the persistent store is located at a computer different from the one that receives the message, and wherein the persistent store is shared among a plurality of computers for storage of and access to message payloads for the plurality of computers.
  • 13. A computer readable medium including instructions stored thereon which when executed cause the computer to perform the steps of: receiving at a computer, including a memory for in-memory storage of message information, and a persistent store for persistent storage of message data, a message, from a sending application, at an integration component at the computer; creating both an envelope and a separate payload for the message, including reading the message in incremental portions smaller than the total message size, and successively writing those portions to the persistent storage as the payload; storing the payload in the persistent store as a plurality of selectively retrievable portions; creating within the envelope a header information, and pointers to different portions of the payload; storing the envelope in the computer's memory, while the separate payload is stored at the persistent store; and providing in-memory access to the envelope, by one or more receiving applications to access the envelope in memory, and to use the pointers in the message envelope to selectively retrieve selected portions of the message payload from the persistent store, for use by those receiving applications.
  • 14. The computer readable medium of claim 13, wherein the step of creating a separate envelope and payload includes parsing the message with a protocol parser to identify portions of the message to the integration component.
  • 15. The computer readable medium of claim 13, wherein the step of creating a separate envelope and payload includes parsing the message with an XML parser to read header and body portions of the message, and to separate the head and body portions into the envelope and payload respectively.
  • 16. The computer readable medium of claim 13, wherein the step of creating a separate envelope and payload includes processing the message in increments.
  • 17. The computer readable medium of claim 13 wherein the persistent store is located at the computer that receives the message.
  • 18. The computer readable medium of claim 13 wherein the persistent store is located at a computer different from the one that receives the message, and wherein the persistent store is shared among a plurality of computers for storage of and access to message payloads for the plurality of computers.
CLAIM OF PRIORITY

This application is a continuation of U.S. patent application Ser. No. 10/404,865, filed Apr. 1, 2003, entitled “System and Method for Storing Large Messages” which claims priority to U.S. Provisional Patent Application No. 60/376,773, filed May 1, 2002, entitled “System and Method for Storing Large Messages”, which is hereby incorporated herein by reference.

Related Publications (1)
Number Date Country
20070198467 A1 Aug 2007 US
Provisional Applications (1)
Number Date Country
60376773 May 2002 US
Continuations (1)
Number Date Country
Parent 10404865 Apr 2003 US
Child 11740192 US