1. Technical Field
The present invention relates in general to data processing, and in particular, to efficient serialization of mutable program objects.
2. Description of the Related Art
In a typical enterprise Java® environment, a server services many clients contemporaneously. For example, in a typical online banking environment, numerous customers may each utilize a client application executing on a client device to conduct transactions involving one or more of the customer's accounts with a financial institution. Each transaction entails a remote call to the server to effect a desired action, such as a balance inquiry, electronic bill payment, transfer of funds, or withdrawal, which in turn requires data to be communicated between the client device and server.
Distributed computing in a Java Enterprise Edition (Java EE) environment, such as the online banking environment discussed above, makes use of either Remote Method Invocation over the Java Remote Method Protocol (RMI-JRMP) or Remote Method Invocation over the Internet Inter-ORB Protocol (RMI-IIOP), both of which leverage Java serialization. The term “serialization” is used in the art to describe the process of saving an object's state as a sequence of bytes. In Java, whenever a local process wants to send an object to a remote process running on a remote machine, the local process serializes the object into a sequence of bytes (i.e., an intermediate format) and then sends the sequence to the remote process using sockets. The remote process then deserializes the intermediate format to create an exact copy of the object graph on the remote machine.
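The serialize/deserialize round trip described above can be sketched with the standard java.io API. This is a minimal illustration, not the RMI transport itself: an in-memory byte array stands in for the socket, and the Account class is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationRoundTrip {
    // Hypothetical serializable class used only for illustration.
    static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        String owner;
        long balance;

        Account(String owner, long balance) {
            this.owner = owner;
            this.balance = balance;
        }
    }

    // Local side: turn the object graph into a byte sequence (the intermediate format).
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Remote side: rebuild an equivalent object graph from the byte sequence.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Account original = new Account("alice", 100L);
        byte[] wire = serialize(original);            // bytes that would travel over the socket
        Account copy = (Account) deserialize(wire);   // a new object with the same state
        System.out.println(copy.owner + " " + copy.balance); // prints "alice 100"
    }
}
```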
In the prior art, serialization is a resource- and time-consuming process that includes the following principal steps:
In current implementations, the Java serialization process implements stream-based caching of the binary data. However, the stream cache is cleared each time the stream is closed or reset because there is presently no way of knowing whether or not mutable objects have changed. Consequently, the serialization process is carried out for an object even if the object has not changed since it was last communicated between the client and server.
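The tension between per-stream caching and mutable objects can be observed with the standard java.io.ObjectOutputStream, whose back-reference handle table acts as a per-stream cache of objects already written. In the sketch below (the Counter class is hypothetical), writing the same object twice without a reset sends only a back-reference, so the receiver sees stale state; calling reset() clears the table and forces full re-serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class StreamCacheDemo {
    // Hypothetical mutable object.
    static class Counter implements Serializable {
        private static final long serialVersionUID = 1L;
        int value;
    }

    // Writes one object twice, mutating it in between, and returns the
    // value the receiver observes in the second copy.
    static int secondValueSeen(boolean resetBetweenWrites) throws Exception {
        Counter c = new Counter();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            c.value = 1;
            out.writeObject(c);
            c.value = 2;               // mutate after the first write
            if (resetBetweenWrites) {
                out.reset();           // discard the stream's table of already-written objects
            }
            out.writeObject(c);        // without reset: only a back-reference is written
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            in.readObject();                          // first copy
            return ((Counter) in.readObject()).value; // second copy
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(secondValueSeen(false)); // prints 1 (stale state)
        System.out.println(secondValueSeen(true));  // prints 2 (fully re-serialized)
    }
}
```

Because the stream cannot tell whether the object changed, the only safe choice is to reset and pay for full re-serialization, which is the cost the described technique aims to avoid.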
In some embodiments, a method of serialization in a data processing system includes serializing a mutable object utilizing a full serialization process and caching primitive data and metadata regarding the mutable object in binary format in cache. Thereafter, the mutable object is again serialized utilizing an abbreviated serialization process by reference to the cached primitive data and metadata, and the serialized mutable object is communicated to a distributed code element.
With reference now to the figures and with particular reference to
As indicated, client devices 102a-102c, each of which includes a processor, data storage, and other possibly conventional hardware, execute software including a client operating system 105 and a client application 106, such as a web browser. As shown, client application 106 may include an Enterprise Java Bean (EJB) object 107, such as a browser plug-in that facilitates communication with data processing enterprise 110, as discussed further below.
The communication between client devices 102a-102c and data processing enterprise 110 can include data communication, for example, via instant messaging, Simple Mail Transfer Protocol (SMTP), Hypertext Transfer Protocol (HTTP), and/or other known or future developed protocols. In a typical use scenario, the communication between data processing enterprise 110 and client devices 102a-102c includes the transmission of requests from client devices 102a-102c to data processing enterprise 110 and the transmission of responsive data (e.g., in the form of program objects, markup language (e.g., HTML or XML) pages, images, graphics, text, audio, video, and/or files containing such data) from data processing enterprise 110 to client devices 102a-102c.
Still referring to
In the depicted embodiment, data storage 110 stores program code executable by processor(s) 120. The program code includes a server operating system (OS) 130 that manages the hardware resources of server 112a and provides common services to other software executing on server 112a. Server OS 130 may be implemented, for example, with one of the AIX®, Linux®, Android®, or Windows® operating systems. Data storage 110 also stores middleware 134, such as the IBM WebSphere® Application Server (WAS) available from IBM Corporation of Armonk, N.Y. Middleware 134 provides a platform for the development, delivery, and communication of distributed applications 136. In a preferred embodiment, middleware 134 is compliant with the Java Platform, Enterprise Edition (Java EE) 6 Specification, incorporated herein by reference, and serves as a Java EE container providing services such as transaction management, persistence, security, connection pooling, and naming services. Middleware 134 preferably contains at least one EJB component 140, which exposes services that clients can invoke. The invocation of a service results in data exchange between the client and the server. The communication mechanism used to exchange the data relies on a serialization engine, which is part of the Java runtime shipped as a component of middleware 134.
In most embodiments, middleware 134 includes program code (e.g., an HTTP server) to support communication of server 112a with other servers 112 and client devices 102a-102c via communication fabric 114 and communication network(s) 104. Should appropriate communication capabilities not be integrated within middleware 134, data storage 110 may additionally include communication code integrated within server OS 130 or implemented as an application 136 that enables server 112a to communicate with other servers 112 and client devices 102a-102c via communication fabric 114 and communication network(s) 104.
It should be appreciated that the contents of data storage 110 can be localized on server 112a in some embodiments and distributed across the data storage 110 of multiple servers 112a-112n in other embodiments. In addition, the contents depicted in data storage 110 of server 112a (and other associated databases) may optionally reside partially or fully on a storage area network (SAN) 160 of data processing enterprise 110. As shown, SAN 160 includes a switch/controller (SW/C) 162 that receives and services storage requests and multiple data storage nodes 170a-170k, each of which may comprise one or more physical non-volatile memory drives, hard disk drives, optical storage drives, tape drives, etc.
It will be appreciated upon review of the foregoing description that the form in which data processing enterprise 110 is realized can vary between embodiments. All such implementations, which may include, for example, one or more handheld, notebook, desktop, or server computer systems, are contemplated as embodiments of the inventions set forth in the appended claims.
As discussed above, in data processing environments such as that depicted in
In one preferred embodiment, serialization engine 200 traverses the object graph of an object to be serialized and stores each object's serialized primitive data as a separate entry. For example, assuming that an object A to be serialized has references to objects B and C as shown in
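A minimal sketch of per-object caching, assuming a hypothetical graph in which A references B and C: the traversal below gives each object its own cache entry that holds only that object's primitive data, so the entries for B and C exist independently of A's. The class names, the choice of IdentityHashMap as the cache, and the restriction to int, long, and boolean fields are all assumptions made for brevity.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class PerObjectCache {
    // Hypothetical object graph: A references B and C.
    static class B { long y = 2L; }
    static class C { boolean z = true; }
    static class A { int x = 1; B b = new B(); C c = new C(); }

    // Walks the graph rooted at obj, storing each object's primitive data as its own entry.
    // Assumes referenced objects are instances of user-defined classes.
    static void cacheGraph(Object obj, Map<Object, byte[]> cache)
            throws IOException, IllegalAccessException {
        if (obj == null || cache.containsKey(obj)) return; // null or already visited
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        List<Object> referenced = new ArrayList<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) continue;
            f.setAccessible(true);
            Class<?> t = f.getType();
            if (t == int.class) out.writeInt(f.getInt(obj));
            else if (t == long.class) out.writeLong(f.getLong(obj));
            else if (t == boolean.class) out.writeBoolean(f.getBoolean(obj));
            else if (!t.isPrimitive()) referenced.add(f.get(obj)); // follow references later
        }
        out.flush();
        cache.put(obj, bytes.toByteArray()); // this object's primitive data only
        for (Object child : referenced) {
            cacheGraph(child, cache);        // each referenced object gets a separate entry
        }
    }

    public static void main(String[] args) throws Exception {
        Map<Object, byte[]> cache = new IdentityHashMap<>();
        cacheGraph(new A(), cache);
        System.out.println(cache.size()); // prints 3: entries for A, B, and C
    }
}
```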
Consider now the following Java program, which serializes an object “st” of type “SerialTest”, which is derived from the “Parent” class and has a container object “con” as one of its fields:
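An illustrative program of this shape can be sketched as follows. The class names “SerialTest”, “Parent”, and “Contain” and the object names “st” and “con” follow the description; every field other than “con”, and every value, is a hypothetical placeholder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerialTestDemo {
    // Serializable superclass; its field is a hypothetical placeholder.
    static class Parent implements Serializable {
        private static final long serialVersionUID = 1L;
        int parentVersion = 10;
    }

    // Class of the container field "con"; its field is a hypothetical placeholder.
    static class Contain implements Serializable {
        private static final long serialVersionUID = 1L;
        int containVersion = 11;
    }

    // The type of "st": derived from Parent and holding a Contain instance in "con".
    static class SerialTest extends Parent {
        private static final long serialVersionUID = 1L;
        int version = 66;
        Contain con = new Contain();
    }

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SerialTest st = new SerialTest();
        // The stream carries class descriptions for SerialTest, Parent, and Contain plus field data.
        byte[] data = serialize(st);
        SerialTest copy = (SerialTest) deserialize(data);
        System.out.println(copy.version + " " + copy.parentVersion + " " + copy.con.containVersion); // 66 10 11
    }
}
```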
Referring now to
By constructing MetaDataCache 202 and PrimitiveDataCache 204 during the initial serialization of the “st” object, serialization engine 200 eliminates from subsequent serialization operations the costliest part of the serialization process, namely, the reflection operations used to retrieve primitive data. Thus, when the same object “st” is again requested for serialization, serialization engine 200 traverses the caches and constructs the serialized format of the object without actually reading the object through reflection. Consequently, the time required to perform the serialization operation is reduced by the time required to retrieve primitive data through reflection.
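The saving described above can be sketched as follows: the first call reads fields through reflection and caches the resulting bytes, and a repeat call returns the cached bytes without any reflective read. This is a simplified stand-in, not the engine itself: here the metadata cache holds reflected Field objects rather than serialized class descriptions, only int fields are encoded, the Point class is hypothetical, and change detection is deliberately omitted.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class CachingSerializer {
    static class Point { int x = 3; int y = 4; } // hypothetical data object

    // Simplified stand-ins for MetaDataCache 202 and PrimitiveDataCache 204.
    final Map<Class<?>, Field[]> metaDataCache = new HashMap<>();
    final Map<Object, byte[]> primitiveDataCache = new IdentityHashMap<>();
    int reflectiveReads = 0; // number of Field.getInt calls performed

    // Collects a class's instance int fields; computed once per class.
    static Field[] intFields(Class<?> c) {
        List<Field> result = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && f.getType() == int.class) {
                f.setAccessible(true);
                result.add(f);
            }
        }
        return result.toArray(new Field[0]);
    }

    // First call: reflective reads, then cache. Later calls: cached bytes only.
    byte[] serializePrimitives(Object obj) throws IOException, IllegalAccessException {
        byte[] cached = primitiveDataCache.get(obj);
        if (cached != null) return cached; // abbreviated path: no reflection
        Field[] fields = metaDataCache.computeIfAbsent(obj.getClass(), CachingSerializer::intFields);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (Field f : fields) {
            out.writeInt(f.getInt(obj));
            reflectiveReads++;
        }
        out.flush();
        byte[] data = bytes.toByteArray();
        primitiveDataCache.put(obj, data);
        return data;
    }
}
```

Serializing the same Point twice performs two reflective reads in total (one per field, on the first call only).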
Moreover, because the primitive data cache is built at a finer (per-object) granularity than employed in the prior art, serialization engine 200 can retrieve data from caches 202 and 204, instead of serializing the object again, whenever any one of the objects in a previously serialized object graph is requested for serialization. For example, if the “con” object of the “Contain” class is requested for serialization for the first time (alone or as referenced through some other object), serialization engine 200 does not need to carry out the serialization process because the serialized format of “con” is already present in the cache.
Of course, cached information for a mutable object can be utilized in the serialization process only if the mutable object has not changed since it was cached. To differentiate mutable objects that have changed since being cached from those that have not, mutable objects preferably contain a serialization change (SC) field (e.g., a single bit) that indicates whether or not the mutable object has been modified since it was last serialized.
With reference now to
In accordance with one embodiment, object header 510 includes an SC field 512 (optionally implemented in a conventionally unused header field to avoid increasing object size) indicating whether or not the object has been modified since it was last serialized. For example, in one implementation, SC field 512 is reset to a bit value of 0b0 to indicate that the object has not changed since the last serialization and is set to a bit value of 0b1 to indicate that the object has changed since the last serialization.
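The set/reset/test operations on a single-bit SC field can be sketched as bitmask arithmetic on a header word. The 64-bit header and the chosen bit position are hypothetical; actual JVM object header layouts are implementation specific.

```java
public class HeaderBits {
    // Hypothetical position of SC field 512 within a 64-bit header word.
    static final long SC_MASK = 1L << 7;

    // Set SC to 0b1: the object has changed since the last serialization.
    static long setSC(long header) { return header | SC_MASK; }

    // Reset SC to 0b0: the object has not changed since the last serialization.
    static long resetSC(long header) { return header & ~SC_MASK; }

    // True when SC is set.
    static boolean isChanged(long header) { return (header & SC_MASK) != 0; }

    public static void main(String[] args) {
        long header = 0x5L;                    // arbitrary pre-existing header bits
        header = setSC(header);
        System.out.println(isChanged(header)); // true
        header = resetSC(header);
        System.out.println(isChanged(header)); // false
        System.out.println(header == 0x5L);    // true: other header bits are untouched
    }
}
```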
Referring now to
The process begins at block 600 and then proceeds to block 602, which illustrates the JVM of a Java application instantiating a mutable object (e.g., object 502c) in heap 500. As indicated at block 604, if the mutable object is modified, the process passes to block 606 and otherwise passes to block 610, which is described below.
As depicted at blocks 606-608, write barrier code of the object sets SC field 512 (e.g., to 0b1) if any primitive field of the object is mutated, but does not set SC field 512 if a reference field of the object is mutated. As indicated above, when set, SC field 512 indicates the object has changed since the last serialization and must be reserialized through the conventional serialization process rather than by reference to PrimitiveDataCache 204.
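The write-barrier rule (primitive mutation sets SC, reference mutation does not) can be simulated at source level as follows. In the described design the barrier is executed by the JVM and the flag lives in the object header; here a plain boolean field and hand-written setters stand in for both, and the Account class is hypothetical. One plausible rationale, consistent with the per-object caching described above, is that a referenced object's primitive data is cached under its own entry rather than the referrer's.

```java
public class WriteBarrierSketch {
    // Hypothetical mutable object with one primitive and one reference field.
    static class Account {
        boolean sc;              // stand-in for SC field 512: true means changed since last serialization
        private long balance;    // primitive field
        private Account linked;  // reference field

        void setBalance(long b) {
            balance = b;
            sc = true;           // barrier: primitive mutation sets SC
        }

        void setLinked(Account a) {
            linked = a;          // barrier: reference mutation leaves SC unchanged
        }

        long getBalance() { return balance; }
        Account getLinked() { return linked; }
    }

    public static void main(String[] args) {
        Account a = new Account();
        a.setLinked(new Account());
        System.out.println(a.sc); // false
        a.setBalance(5L);
        System.out.println(a.sc); // true
    }
}
```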
Following block 608, the process proceeds to block 610, which illustrates that the process given in
With reference now to
The process illustrated in
First, at block 704, serialization engine 200 gets the class object of object “obj” via the method obj.getClass( ). In addition, serialization engine 200 queries MetaDataCache 202 to get the description of the class returned by the method obj.getClass( ). Serialization engine 200 then writes the definition of the class object and that of any serializable super class of “obj” into the output stream (block 706). The process depicted at block 706 is described further below with reference to
Through reflection, serialization engine 200 gets the container object(s) of “obj”. In one embodiment, serialization engine 200 identifies the container objects of “obj” by calling the method getDeclaredClasses( ) on this class object and using reflection to get the actual objects, as depicted at blocks 710 and 712, respectively. If serialization engine 200 determines at block 714 that the object under consideration has no container object, the serialization process proceeds to block 718, which is described below. Otherwise, the process proceeds to block 716, which depicts repetition of previously described blocks 702-714 for the container object.
As depicted at block 718, the serialization process increments the loop variable “i” and proceeds to block 720. At block 720, serialization engine 200 checks whether or not it has completed serializing all the given objects. If serialization engine 200 determines that at least one object remains to be serialized, the process returns to block 704, which has been described. Otherwise, the process proceeds from block 720 to block 722, which depicts the end of serialization of the current object.
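The control flow of blocks 702-722 (a loop over the given objects with index “i”, descending into each object's container objects before advancing) can be sketched as a recursive walk. This sketch locates referenced objects through Class.getDeclaredFields( ) and Field.get( ); that lookup strategy, the identity set used to avoid revisiting shared objects, and the Holder and Contain classes are assumptions, and recording the class name stands in for writing the class and primitive data.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class GraphWalkSketch {
    static class Contain { int v = 1; }                        // hypothetical container class
    static class Holder { int w = 2; Contain con = new Contain(); }

    // Loop over the given objects (loop variable i), as in blocks 718-720.
    static List<String> serializeAll(Object[] objs) throws IllegalAccessException {
        List<String> order = new ArrayList<>();
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < objs.length; i++) {
            visit(objs[i], order, seen);
        }
        return order;
    }

    // Processes one object, then repeats the same steps for each container object.
    static void visit(Object obj, List<String> order, Set<Object> seen) throws IllegalAccessException {
        if (obj == null || !seen.add(obj)) return;    // skip nulls and objects already written
        order.add(obj.getClass().getSimpleName());    // stands in for writing class and data
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) continue;
            f.setAccessible(true);
            visit(f.get(obj), order, seen);           // recurse into the container object
        }
    }
}
```

Passing a Holder followed by that same Holder's “con” object yields the order Holder, Contain, because the shared Contain instance is processed only once.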
Referring now to
Referring now to block 804, in response to serialization engine 200 determining that MetaDataCache 202 contains the serialized description of the class of “obj”, serialization engine 200 accesses the class description in MetaDataCache 202 and writes it directly to the output stream. As indicated at blocks 820-822, serialization engine 200 recursively writes a serialized description of each super class object of the class of “obj”, if any, until a top-level serializable class is reached. In one preferred embodiment, serialization engine 200 determines the super class, if any, of a class by calling the Class.getSuperclass( ) method. Once the class of “obj” and any serializable super class have been processed, serialization engine 200 terminates the process depicted in
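The class-description step can be sketched as a walk up Class.getSuperclass( ) that stops at the first non-serializable class, consulting a cache before computing each description. ObjectStreamClass.lookup is the standard Java API for obtaining a serializable class's stream descriptor; the map used as the MetaDataCache, the string form of a “description”, and the example classes are hypothetical.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClassDescriptorSketch {
    static class Parent implements Serializable {
        private static final long serialVersionUID = 1L;
        int p;
    }

    static class SerialTest extends Parent {
        private static final long serialVersionUID = 2L;
        int v;
    }

    // Hypothetical MetaDataCache: class -> cached description.
    final Map<Class<?>, String> metaDataCache = new HashMap<>();
    int misses = 0; // descriptions computed because they were not yet cached

    // Emits descriptions for c and each serializable super class, stopping at the
    // first class that is not serializable (the top-level serializable class is last).
    List<String> writeClassDescriptions(Class<?> c) {
        List<String> out = new ArrayList<>();
        for (Class<?> k = c; k != null && Serializable.class.isAssignableFrom(k); k = k.getSuperclass()) {
            String desc = metaDataCache.get(k);
            if (desc == null) {                               // cache miss: build description once
                ObjectStreamClass osc = ObjectStreamClass.lookup(k);
                desc = osc.getName() + "#" + osc.getSerialVersionUID();
                metaDataCache.put(k, desc);
                misses++;
            }
            out.add(desc);                                    // "write" to the output stream
        }
        return out;
    }
}
```

A second call for SerialTest produces the same two descriptions without any further cache misses.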
With reference now to
Block 904 illustrates serialization engine 200 determining whether or not SC field 512 of the object “obj” is reset (e.g., to 0b0) to indicate that “obj” has not been modified since it was last serialized. If not (i.e., SC field 512 of “obj” is set to 0b1), the process passes to block 910, which is described below. If, however, serialization engine 200 determines at block 904 that SC field 512 is reset, serialization engine 200 reads the serially formatted data of the object “obj” from PrimitiveDataCache 204 and appends it to the output stream. Thereafter, the process shown in
Returning now to block 910, if an up-to-date serialization of the object “obj” is not present in PrimitiveDataCache 204, serialization engine 200 retrieves and serializes the data of object “obj” and writes it into the output stream. In addition, serialization engine 200 resets SC field 512 of “obj” (e.g., to 0b0) and updates PrimitiveDataCache 204 with the serialized data of object “obj” (blocks 912 and 914). The process depicted in
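The SC-aware write path of blocks 904-914 can be sketched as follows: cached bytes are appended when SC is reset and a cache entry exists; otherwise the data is serialized in full, SC is reset, and the cache is updated. The Node class, its boolean stand-in for SC field 512, and the single-int encoding are hypothetical simplifications.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.IdentityHashMap;
import java.util.Map;

public class ScAwareSerializer {
    // Hypothetical mutable object with a stand-in for SC field 512.
    static class Node {
        boolean sc;               // true: modified since last serialization
        private int value;

        void setValue(int v) {
            value = v;
            sc = true;            // write-barrier stand-in: primitive mutation sets SC
        }

        int getValue() { return value; }
    }

    final Map<Node, byte[]> primitiveDataCache = new IdentityHashMap<>();
    int fullSerializations = 0;

    // Appends obj's serialized data to out (blocks 904-914).
    void writeObjectData(Node obj, ByteArrayOutputStream out) throws IOException {
        byte[] cached = primitiveDataCache.get(obj);
        if (!obj.sc && cached != null) {
            out.write(cached);                // SC reset: reuse the cached bytes
            return;
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeInt(obj.getValue()); // full serialization (block 910)
        byte[] data = buf.toByteArray();
        out.write(data);
        obj.sc = false;                       // reset SC (block 912)
        primitiveDataCache.put(obj, data);    // update the cache (block 914)
        fullSerializations++;
    }
}
```

Writing an unchanged Node twice performs one full serialization. After setValue is called, the next write performs a second full serialization and refreshes the cache.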
As has been described, in at least some embodiments, a mutable object is serialized utilizing a full serialization process, and primitive data and metadata regarding the mutable object are cached in binary format. Thereafter, the mutable object is again serialized utilizing an abbreviated serialization process by reference to the cached primitive data and metadata, and the serialized mutable object is communicated to a distributed code element. In a preferred embodiment, the mutable object has an object header including a field indicating whether or not the mutable object has changed since it was last serialized, and the abbreviated serialization process is employed in response to the field indicating that the mutable object has not changed since it was last serialized. In a preferred embodiment, object primitive data and metadata are cached at the most granular level, which ensures that, if a mutable object's primitive data changes, the cached data pertaining to other mutable objects in its object graph are unaffected.
While the present invention has been particularly shown and described with reference to one or more preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention. For example, although aspects have been described with respect to a computer system executing program code that directs the functions of the present invention, it should be understood that the present invention may alternatively be implemented as a program product including a tangible, non-transient data storage medium (e.g., an optical or magnetic disk or memory) storing program code that can be processed by a data processing system to perform the functions of the present invention. Further, although preferred embodiments are described herein with reference to serialized objects created in, and used by, Java software products, the disclosed techniques may be adapted for use with other programming languages; thus, references to Java serialization are by way of illustration and not of limitation.
This application is a continuation of U.S. patent application Ser. No. 12/960,891 entitled “EFFICIENT SERIALIZATION OF MUTABLE OBJECTS” by Aruna A. Kalagananam et al. filed Dec. 6, 2010, the disclosure of which is hereby incorporated herein by reference in its entirety for all purposes.
Relation | Number | Date | Country
---|---|---|---
Parent | 12960891 | Dec 2010 | US
Child | 13595508 | | US