This invention relates to the preservation of objects for later recovery and use.
Storing an object for later use by an application is called “object persistence.” Encoding an object for transmission over a distributed network is also called object persistence. Object persistence is also known as “serializing an object.” An “object” is the core concept of an “object-oriented paradigm.”
Object-Oriented Paradigm
A large segment of the computing realm operates under the object-oriented paradigm. This is sometimes called “object technology” or “object-oriented programming.” In general, an object is understood to encapsulate data and procedures (i.e., methods).
Object-oriented programming is a type of programming in which programmers define not only the data type of a data structure, but also the types of operations (i.e., procedures, functions, or methods) that can be applied to the data structure. In this way, the data structure becomes an object that includes both data and functions. In addition, programmers can create relationships between one object and another. For example, objects can inherit characteristics from other objects.
One of the principal advantages of object-oriented programming techniques over procedural programming techniques is that they enable programmers to create modules that do not need to be changed when a new type of object is added. A programmer can simply create a new object that inherits many of its features from existing objects. This makes object-oriented programs easier to modify.
To perform object-oriented programming, one needs an object-oriented programming language (OOPL). “Java,” “C++,” and “Smalltalk” are three of the more popular languages, and there are object-oriented versions of Pascal.
The object-oriented paradigm allows for the fast development of applications to solve real problems. Using this paradigm, applications can interact with other applications (or the operating system) on the same computer. Such an interaction may involve sharing data or requesting execution of a task by another application. For example, the Component Object Model (COM), by the Microsoft Corporation, enables programmers to develop objects that can be accessed by any COM-compliant application on the same computer.
The object-oriented paradigm also allows applications to interact with applications on different computers. This is often called “distributed computing.”
Generally, distributed computing utilizes different components and objects comprising an application that are located on different computers coupled to a network. So, for example, a word processing application might consist of an editor component on one computer, a spell-checker object on a second computer, and a thesaurus on a third computer. In some distributed computing systems, each of the three computers could even be running a different operating system.
One of the requirements of distributed computing is a set of standards that specify how objects communicate with one another. There are currently two chief distributed computing standards: CORBA (Common Object Request Broker Architecture) and DCOM (Distributed Component Object Model).
For example, programmers may use DCOM (by the Microsoft Corporation) to develop objects that can be accessed by any DCOM-compliant application on a different computer. DCOM is an extension of COM to support objects distributed across a network.
Object Serialization
Serialization is the process of saving and restoring objects. More precisely, serialization is the process of saving and restoring the current data and the data structures of objects. The information is extracted from objects so that it is not lost or destroyed. In other words, the transitory status of objects is fixed (often in a file or a database) for the purpose of storage or communications. This process is also called “object persistence.”
If an application using an object is closed, then the object's data and its data structures must be preserved so that the object may be restored into its current state when the program is invoked again. For example, it is often necessary to temporarily store an object so that another application may access it. In another example, sending an object to another computer in a distributed computing environment requires the object be stored, transmitted, received, and recovered. In each of these examples, objects are stored and restored.
When serializing an object, the focus is not so much on how to store an object's data in non-volatile memory (such as a hard drive), but rather on how the in-memory data structure of an object differs from how the data appears once it has been extracted from the object. In memory, the data is located at arbitrary addresses, which are conceptually defined as data structures including data, arrays, objects, methods, and the like. However, these data structures cannot be stored directly.
To store a data structure, it must be broken down into its component parts, which includes simple data types like integers, strings, floating point numbers, etc. In addition, the hierarchical arrangement within each data structure must be stored and maintained. Furthermore, the hierarchical arrangement of data structures themselves must be stored and maintained.
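As a minimal sketch of this decomposition (not the method of the object persister itself), the following Python breaks a nested structure down into simple values while keeping its hierarchical arrangement. The `Customer` and `Address` classes and the `decompose` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Address:  # hypothetical nested data structure
    city: str
    zip_code: str


@dataclass
class Customer:  # hypothetical top-level object
    name: str
    age: int
    balance: float
    address: Address


def decompose(obj):
    """Recursively break a structure into nested dicts/lists of simple values."""
    if is_dataclass(obj):
        # Keep the hierarchy: each field name maps to its decomposed value.
        return {f.name: decompose(getattr(obj, f.name)) for f in fields(obj)}
    if isinstance(obj, (list, tuple)):
        return [decompose(item) for item in obj]
    return obj  # integers, strings, floats: already simple types


customer = Customer("Ada", 36, 12.5, Address("London", "N1"))
parts = decompose(customer)
```

The result contains only integers, strings, floating point numbers, and the containers that record how they nest, which is the information a storage format has to carry.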
The serialized data of an object may be thought of as a “dehydrated object” where all of the water (object functions in this metaphor) has been squeezed out of the object. This leaves only dry potato flakes (the data). Later, when a hungry person wishes to have mashed potatoes (the object with the data), the potato flakes may be rehydrated. To “add water” to a dehydrated object, an empty object is created and the stored data is inserted therein.
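The metaphor can be sketched in a few lines of Python. The `Point` class and the `dehydrate` and `rehydrate` helpers are hypothetical; the sketch assumes the class can be constructed with no arguments so that an empty object is available to receive the stored data.

```python
class Point:  # hypothetical class with data (x, y) and a method
    def __init__(self):
        self.x = 0
        self.y = 0

    def norm1(self):
        return abs(self.x) + abs(self.y)


def dehydrate(obj):
    """Extract only the data; the methods stay behind with the class."""
    return dict(vars(obj))


def rehydrate(cls, data):
    """Create an empty object and insert the stored data into it."""
    obj = cls()
    for name, value in data.items():
        setattr(obj, name, value)
    return obj


original = Point()
original.x, original.y = 3, -4
flakes = dehydrate(original)         # {'x': 3, 'y': -4}
restored = rehydrate(Point, flakes)  # a new object with the same data
```

The restored object is a distinct instance, but its methods operate on the same data that the original held when it was dehydrated.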
Serialization of an object is an effective and important step in exchanging the object between computers. These types of object exchanges are important to a distributed computing environment where computers actively distribute objects across a network. Those of ordinary skill in the art are familiar with object serialization.
Serialization Issues
Separating Data Items: When serializing an object, data items must be separated from each other when they are stored. Otherwise, they will not be properly identified later when reading the data back into a new object during deserialization. Therefore, a serialization scheme must specify how data items are separated from each other.
Preserving Hierarchical Structure: Unless the hierarchical structure of the data is preserved during the serialization process, it cannot be recreated during deserialization. Each data structure is potentially different from every other.
Therefore, a serialization scheme must have a general data format suiting the needs of all potential data structures of an object. Typically, such a scheme accomplishes this by having the capability to delimit arbitrarily nested data, that is, truly hierarchical data structures.
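One way to meet both requirements is to delimit every data item with a pair of tags and to nest the tags to mirror the hierarchy. The sketch below uses Python's standard `xml.etree.ElementTree` module; the `to_element` helper and the `record` data are hypothetical and only illustrate the idea.

```python
import xml.etree.ElementTree as ET


def to_element(tag, value):
    """Wrap a value in an element; nested dicts become nested elements."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_element(key, child))
    else:
        elem.text = str(value)  # a simple data item, delimited by its tags
    return elem


record = {"id": 7, "owner": {"name": "Ada", "city": "London"}}
xml_text = ET.tostring(to_element("record", record), encoding="unicode")
# xml_text == '<record><id>7</id><owner><name>Ada</name><city>London</city></owner></record>'
```

Because each item carries its own opening and closing tag, the items are separated from each other and the nesting depth is unbounded.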
Preserving Object Relationships: Often objects include references to other objects. When in memory, such a reference is often a pointer to the other object. When serializing an object with a reference to another object, the serialized object includes the entire referenced object, as it does for a data structure.
However, if there are multiple references to the same object, then there are redundant inclusions of the same object. Furthermore, if the reference within an object is to itself (directly or indirectly), then the serialization process may fail because it is circularly and potentially infinitely storing object data.
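The failure mode can be reproduced with a deliberately naive serializer that embeds each referenced object in full. The `Node` class and `serialize_inline` function are hypothetical and exist only to show the problem.

```python
class Node:  # hypothetical object holding one reference to another object
    def __init__(self, name):
        self.name = name
        self.peer = None


def serialize_inline(node):
    """Naive scheme: embed the referenced object in full."""
    peer = serialize_inline(node.peer) if node.peer is not None else None
    return {"name": node.name, "peer": peer}


a, b = Node("a"), Node("b")
a.peer = b
acyclic = serialize_inline(a)  # works: a contains a full copy of b

b.peer = a                     # now a refers to b and b refers back to a
try:
    serialize_inline(a)
    failed = False
except RecursionError:
    failed = True              # the inclusion never terminates on its own
```

In Python the runaway inclusion is cut short by the interpreter's recursion limit; in other environments it may exhaust memory or stack space instead.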
Extensible Markup Language (XML)
SGML (Standard Generalized Markup Language) is a generic text formatting language that is widely used for large databases and multiple media projects. It is particularly well suited for works involving intensive cross-referencing and indexing.
HTML (HyperText Markup Language) is a specific implementation of a subset of SGML and is nearly universally used throughout the globe as the foundation for the World Wide Web (“Web”). HTML uses tags to mark elements, such as text and graphics, in a document to indicate how Web browsers should display these elements to the user. HTML tags also indicate how the Web browsers should respond to user actions such as activation of a link by means of a key press or mouse click.
XML (eXtensible Markup Language) is a specific implementation of a condensed form of SGML. XML lets Web developers and designers create customized tags that offer greater flexibility in organizing and presenting information than is possible with the HTML document coding system.
In HTML, both the tag semantics and the tag set are fixed. XML specifies neither semantics nor a tag set. In fact, XML is really a meta-language for describing markup languages. In other words, XML provides a facility to define tags and the structural relationships between them. Since there's no predefined tag set, there are no preconceived semantics. All of the semantics of an XML document will be defined either by the applications that process them or by stylesheets.
As the Internet becomes a serious business tool, HTML's limitations are becoming more apparent. For example, HTML can be used to exchange data, but it is not capable of exchanging objects. To be more precise, HTML cannot be used to exchange serialized objects.
XML does not have a defined protocol for exchanging serialized objects between computers within a distributed computing environment.
The object persister serializes an object to preserve the object's data structure and its current data. The serialized object is encoded using XML and inserted within a message. That message is transmitted to an entity over a network. Such a transmission is performed using standard Internet protocols, such as HTTP. Upon receiving the serialized object, the receiving entity deserializes the object to use it.
Rather than include copies of referenced objects within the serialized object, the object persister includes references to those objects. This avoids redundant inclusion of the same object and potentially infinite inclusion of the object itself that is being serialized.
a is a textual illustration of a typical data structure of an object as represented in pseudocode.
b is a textual illustration of a serialized object generated by an implementation of the exemplary object persister, where the typical data structure shown in
The following description sets forth a specific embodiment of the object persister that incorporates elements recited in the appended claims. This embodiment is described with specificity in order to meet statutory written description, enablement, and best-mode requirements. However, the description itself is not intended to limit the scope of this patent. Rather, the inventors have contemplated that the claimed object persister might also be embodied in other ways, in conjunction with other present or future technologies.
Computer Entities and Object Exchange
Herein, an entity is understood to be a computer component that is capable of exchanging messages containing at least one serialized object with another entity. Such an entity may be in an object-oriented, decentralized, distributed network environment. Alternatively, such an entity may be in a local, object-oriented computing environment. For example, an entity may be a computer, a computer system, a component of a computer, or an application running on a computer.
Herein, an originating entity (i.e., originator) is an entity that serializes an object, inserts it into a message, and sends that message. A destination entity (i.e., ultimate destination) is an entity that receives the message, parses the message, and deserializes the serialized object in the message. The exemplary object persister is implemented by one or more computer entities within a local computing environment or within a distributed network environment.
SOAP
In the primary exemplary embodiment described herein, the object persister is implemented as part of a protocol called Simple Object Access Protocol (SOAP). In addition, the primary exemplary embodiment described herein employs XML (eXtensible Markup Language).
SOAP provides a simple and lightweight mechanism for exchanging structured and typed information between peers in a decentralized, distributed environment using XML. SOAP does not itself define any application semantics such as a programming model or implementation specific semantics; rather it defines a simple mechanism for expressing application semantics by providing a modular packaging model and an encoding mechanism for encoding data within modules. This allows SOAP to be used in a large variety of systems ranging from general messaging systems to object-oriented programming systems to Remote Procedure Calls (RPC).
SOAP consists of two parts:
The SOAP envelope portion (which may be called the “message exchanger”) is described in more detail in appendix A and in co-pending patent application, entitled “Messaging Method, Apparatus, and Article of Manufacture”, which was filed Apr. 27, 2000 and is assigned to the Microsoft Corporation. The co-pending application is incorporated by reference.
The SOAP encoding mechanism includes the primary exemplary embodiment of an object persister described herein. Furthermore, SOAP is described in more detail in Appendix A.
XML and HTTP
Unlike HTML (HyperText Markup Language), XML has sufficient flexibility so that it is possible to exchange serialized objects over a network. XML has no standard mechanism to accomplish this. However, the exemplary object persister provides such a mechanism to accomplish this.
Using the exemplary object persister, an object is serialized and encoded into XML and sent over a network to a destination entity. With the exemplary object persister, the serialized object is inserted into a message and sent over a network using HTTP (HyperText Transfer Protocol). However, other transport protocols may be employed.
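As a rough illustration of carrying serialized XML over HTTP, the sketch below wraps an XML string in an HTTP POST request object using Python's standard `urllib.request` module. The URL, the `build_post` helper, the `count` element, and the choice of a `text/xml` content type are assumptions made for the example; nothing is actually sent.

```python
import urllib.request


def build_post(url, xml_text):
    """Wrap serialized XML in an HTTP POST request object (nothing is sent here)."""
    return urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": 'text/xml; charset="utf-8"'},
        method="POST",
    )


# Hypothetical endpoint and payload, for illustration only.
request = build_post(
    "http://example.com/objects",
    "<Object_label><count>3</count></Object_label>",
)
# Calling urllib.request.urlopen(request) would perform the actual transfer.
```

Because the payload is ordinary text, the same string could be handed to a different transport without changing how the object was serialized.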
Serialization Format
The elements in the serialization format of the exemplary object persister represent different elements in an object data structure. The format is easily readable by humans and machines. The format also compensates for potentially infinite cycles where objects call each other.
In
A parameter may be one of many “datatypes” or “types”. Datatype is a concept understood by those of ordinary skill in the art. There are two main forms of datatypes: simple and complex.
A parameter is a simple datatype when it is defined to be a most fundamental type of data. In other words, a simple datatype cannot be broken down into one or more simpler types. Examples of a simple data type include character, string, integer, and floating point.
A parameter is a complex datatype when it is composed of one or more other datatypes, which may include simple and other complex datatypes. A complex datatype may also be a customized datatype, which is defined within the object or by a reference to a definition outside of the object.
In
The object's data structure also includes a parameter that is itself a data structure at 38. This data structure parameter defines additional parameters. In particular, the additional parameters include paramA_label (an integer), paramB_label (a floating point), and paramC_label (a string).
b illustrates a serialized representation of the exemplary object shown in
As discussed above (and shown in Appendix A), the serialized object of the exemplary object persister is sent within a message over a network.
b shows the XML tags (“<Object_label>” at 54a and “</Object_label>” at 54b) that define the boundaries of the data structure of the serialized object (of
Corresponding to the parameters 34 of
In
b also shows a reference to another object at 58. In this parameter, an object called “ObjectName” is specified and it is located by a reference label “object2_ref”. Rather than including a copy of the “ObjectName” object within the serialized object, the exemplary persister simply includes the reference to the object. Referencing of embedded objects instead of including them lessens the data that must be serialized and sent over a network.
The object being serialized may be quite large and include redundant information if it includes multiple references to another object or if a referenced object includes references to still other objects. Suppose, for example, an object being serialized includes references to ObjectA, ObjectB, ObjectC, ObjectD, and ObjectE. ObjectB includes references to ObjectD and ObjectE. In addition, ObjectE includes references to ObjectsA-D. If all referenced objects were included within the serialized object (as is conventional), then most of the referenced objects would be included multiple times. This is redundant. The exemplary object persister avoids this problem by including references to an object rather than the object itself.
Furthermore, the serialization of an object may become stuck in an infinite loop if the object includes a reference to itself or if a referenced object refers back to the object being serialized. If the serialization process includes the referenced object within the serialized object (as is conventional), then the serialized object may include itself in itself in itself in itself in itself etc. The exemplary object persister avoids this problem by including references to an object rather than the object itself. Thus, an object will simply include a reference to itself.
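The scenario above can be sketched as follows. This is an illustration under stated assumptions, not the persister's actual format: the `Obj` class, the `serialize_graph` function, the `objects`, `object`, and `ref` element names, and the use of `id` and `href` attributes (borrowed from the SOAP convention in Appendix A) are all hypothetical, and object names are assumed to be unique.

```python
import xml.etree.ElementTree as ET


class Obj:  # hypothetical object that holds references to other objects
    def __init__(self, name):
        self.name = name
        self.refs = []


def serialize_graph(root):
    """Emit each object once; every relationship becomes a reference."""
    container = ET.Element("objects")
    labels = {}              # id(obj) -> reference label already emitted
    queue = [root]
    while queue:
        obj = queue.pop(0)
        if id(obj) in labels:
            continue         # already serialized: no redundant copy
        labels[id(obj)] = obj.name + "_ref"
        elem = ET.SubElement(container, "object", {"id": labels[id(obj)]})
        for target in obj.refs:
            # A reference, not a copy, so cycles cannot recurse forever.
            ET.SubElement(elem, "ref", {"href": "#" + target.name + "_ref"})
            queue.append(target)
    return container


main = Obj("Main")
a, b, c, d, e = (Obj("Object" + letter) for letter in "ABCDE")
main.refs = [a, b, c, d, e]
b.refs = [d, e]
e.refs = [a, b, c, d]
a.refs = [a]                 # an object that refers to itself
graph = serialize_graph(main)
```

Six objects are emitted exactly once each, even though ObjectD is referenced three times and ObjectA refers to itself.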
In
The serialized object bounded by tags 54a and 54b may also be called a data structure element or simply “datastruct” element. The tags are part of the datastruct element. Everything within these tags is content of the datastruct element. The parameters (such as 56, 58, and 62) are part of the contents of the datastruct element.
Serialization Example
Below is an example of serialization of an object. The exemplary object's data structure in pseudocode:
Below is a serialized representation of an object (based upon the above structure in pseudocode) generated in accordance with the exemplary object persister:
Exemplary Methodological Implementation of the Object Persister
At 102 of
The contents of the data struct element include one or more data parameter elements (such as parameters 56, 58, and 62 in
At 104 of
At 108 and 110, the destination entity receives the message and parses it. At 112, the serialized object in the message is deserialized. The new object has the same hierarchical structure and arrangement as the original object (that was serialized). It also includes the data of that object at the moment that the object was serialized.
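The overall flow (serialize, insert into a message, parse, deserialize) can be sketched end to end in Python. The `Reading` class, the helper functions, and the `type` attribute used to restore simple datatypes are assumptions for illustration; only the envelope namespace comes from Appendix A.

```python
import xml.etree.ElementTree as ET

ENV = "http://schemas.xmlsoap.org/soap/envelope/"
CASTS = {"str": str, "int": int, "float": float}  # simple datatypes handled here


class Reading:  # hypothetical object to be exchanged
    def __init__(self, sensor="", value=0.0):
        self.sensor = sensor
        self.value = value


def serialize(obj):
    """Originator: turn the object's data into a datastruct element."""
    elem = ET.Element(type(obj).__name__)
    for name, value in vars(obj).items():
        child = ET.SubElement(elem, name)
        child.text = str(value)
        child.set("type", type(value).__name__)  # hypothetical type hint
    return elem


def build_message(payload):
    """Originator: insert the serialized object into a message."""
    envelope = ET.Element(f"{{{ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{ENV}}}Body")
    body.append(payload)
    return ET.tostring(envelope, encoding="unicode")


def deserialize(message, cls):
    """Destination: parse the message and rebuild the object."""
    payload = ET.fromstring(message).find(f"{{{ENV}}}Body")[0]
    obj = cls()
    for child in payload:
        setattr(obj, child.tag, CASTS[child.get("type")](child.text or ""))
    return obj


message = build_message(serialize(Reading("t1", 21.5)))
restored = deserialize(message, Reading)
```

The restored object carries the data the original held at the moment of serialization, with each value returned to its simple datatype.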
Exemplary Computing Environment
Exemplary computing environment 920 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the exemplary object persister. Neither should the computing environment 920 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary computing environment 920.
The exemplary object persister is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the exemplary object persister include, but are not limited to, personal computers, server computers, thin clients, thick clients, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The exemplary object persister may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The exemplary object persister may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
As shown in
Bus 936 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus, also known as Mezzanine bus.
Computer 930 typically includes a variety of computer readable media. Such media may be any available media that is accessible by computer 930, and it includes both volatile and non-volatile media, removable and non-removable media.
In
Computer 930 may further include other removable/non-removable, volatile/non-volatile computer storage media. By way of example only,
The drives and their associated computer-readable media provide nonvolatile storage of computer readable instructions, data structures, program modules, and other data for computer 930. Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 948 and a removable optical disk 952, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROM), and the like, may also be used in the exemplary operating environment.
A number of program modules may be stored on the hard disk, magnetic disk 948, optical disk 952, ROM 938, or RAM 940, including, by way of example, and not limitation, an operating system 958, one or more application programs 960, other program modules 962, and program data 964.
A user may enter commands and information into computer 930 through input devices such as keyboard 966 and pointing device 968 (such as a “mouse”). Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, serial port, scanner, or the like. These and other input devices are connected to the processing unit 932 through a user input interface 970 that is coupled to bus 936, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB).
A monitor 972 or other type of display device is also connected to bus 936 via an interface, such as a video adapter 974. In addition to the monitor, personal computers typically include other peripheral output devices (not shown), such as speakers and printers, which may be connected through output peripheral interface 975.
Computer 930 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 982. Remote computer 982 may include many or all of the elements and features described herein relative to computer 930.
Logical connections shown in
When used in a LAN networking environment, the computer 930 is connected to LAN 977 via a network interface or adapter 986. When used in a WAN networking environment, the computer typically includes a modem 978 or other means for establishing communications over the WAN 979. The modem 978, which may be internal or external, may be connected to the system bus 936 via the user input interface 970, or other appropriate mechanism.
Depicted in
In a networked environment, program modules depicted relative to the personal computer 930, or portions thereof, may be stored in a remote memory storage device. By way of example, and not limitation,
Exemplary Operating Environment
The operating environment is only an example of a suitable operating environment and is not intended to suggest any limitation as to the scope of use or functionality of the exemplary object persister described herein. Other well known computing systems, environments, and/or configurations that may be suitable for use with the exemplary object persister include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Computer-Executable Instructions
An implementation of the exemplary object persister may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
Computer Readable Media
An implementation of the exemplary object persister may be stored on or transmitted across some form of computer readable media. Computer readable media can be any available media that can be accessed by a computer. By way of example, and not limitation, computer readable media may comprise computer storage media and communications media.
Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
Communication media typically embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above are also included within the scope of computer readable media.
Conclusion
Although the object persister has been described in language specific to structural features and/or methodological steps, it is to be understood that the object persister defined in the appended claims is not necessarily limited to the specific features or steps described. Rather, the specific features and steps are disclosed as preferred forms of implementing the claimed object persister.
Appendix A
Appendix A includes a copy of a provisional application filed on Mar. ______, 2000 and submitted to the W3C organization for consideration by its standards committee.
SOAP: Simple Object Access Protocol
W3C Technical Report 23 March 2000
Abstract
SOAP is a lightweight protocol for exchange of information in decentralized, distributed environments. It is an XML based protocol that consists of two parts: an envelope for handling extensibility and modularity and an encoding mechanism for representing types within the envelope. SOAP can potentially be used in combination with a variety of other protocols; however, the only bindings defined in this document describe how to use SOAP in combination with HTTP and HTTP Extension Framework.
Introduction
SOAP provides a simple and lightweight mechanism for exchanging structured and typed information between peers in a decentralized, distributed environment using XML. SOAP does not itself define any application semantics such as a programming model or implementation specific semantics; rather it defines a simple mechanism for expressing application semantics by providing a modular packaging model and an encoding mechanism for encoding data within modules. This allows SOAP to be used in a large variety of systems ranging from messaging systems to RPC.
SOAP consists of two parts:
In addition to the SOAP envelope and encoding, this specification defines two protocol bindings that describe how a SOAP message can be carried in HTTP messages either with or without the HTTP Extension Framework.
Notational Conventions
The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC-2119 [2].
The namespace prefixes “SOAP-ENV” and “SOAP-ENC” are used in this document to represent the prefixes that actually appear in the XML instance and are associated with the SOAP namespaces “http://schemas.xmlsoap.org/soap/envelope/” and “http://schemas.xmlsoap.org/soap/encoding/” respectively.
Throughout this document, the namespace prefix “xsi” is assumed to be associated with the URI “http://www.w3.org/1999/XMLSchema/instance” which is defined in the XML Schemas specification [11].
Namespace URIs of the general form “some-URI” represent some application-dependent or context-dependent URI [4].
This specification uses the augmented Backus-Naur Form (ABNF) as described in RFC-2234 [3] for certain constructs.
Examples of SOAP Messages
In this example, a GetLastTradePrice SOAP request is sent to a StockQuote service. The request takes a string parameter, ticker, and returns a float in the SOAP response. The SOAP Envelope element is the top element of the XML document representing the SOAP message. XML namespaces are used to disambiguate SOAP identifiers from application specific identifiers. The example illustrates the HTTP bindings. It is worth noting that the rules governing XML payload format in SOAP are entirely independent of the fact that the payload is carried in HTTP.
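The exchange described above can be approximated with a short Python sketch. It is not the specification's own example: the `build_request` and `read_price` helpers, the `GetLastTradePriceResponse` and `Price` element names, and the sample values are assumptions, and “Some-URI” stands in for an application-dependent namespace as described under Notational Conventions.

```python
import xml.etree.ElementTree as ET

ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ENC = "http://schemas.xmlsoap.org/soap/encoding/"
APP = "Some-URI"  # placeholder for the application-specific namespace


def build_request(ticker):
    """Build the XML payload of a GetLastTradePrice request."""
    ET.register_namespace("SOAP-ENV", ENV)
    envelope = ET.Element(f"{{{ENV}}}Envelope", {f"{{{ENV}}}encodingStyle": ENC})
    body = ET.SubElement(envelope, f"{{{ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP}}}GetLastTradePrice")
    ET.SubElement(call, "ticker").text = ticker  # the string parameter
    return ET.tostring(envelope, encoding="unicode")


def read_price(response_xml):
    """Extract the float result from a response payload (assumed layout)."""
    envelope = ET.fromstring(response_xml)
    path = f"{{{ENV}}}Body/{{{APP}}}GetLastTradePriceResponse/Price"
    return float(envelope.find(path).text)


request_xml = build_request("DIS")

# A hand-written response with an assumed shape, used only to exercise read_price.
sample_response = (
    f'<SOAP-ENV:Envelope xmlns:SOAP-ENV="{ENV}"><SOAP-ENV:Body>'
    f'<m:GetLastTradePriceResponse xmlns:m="{APP}"><Price>34.5</Price>'
    "</m:GetLastTradePriceResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>"
)
price = read_price(sample_response)
```

The namespace-qualified names keep the SOAP identifiers (Envelope, Body) distinct from the application identifiers, and neither function depends on HTTP, which matches the point that the payload rules are independent of the transport.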
Example 1 SOAP Message Embedded in HTTP Request
Following is the response message containing the HTTP message with the SOAP message as the payload:
Example 2 SOAP Message Embedded in HTTP Response
The SOAP Message Exchange Model
SOAP does not define a specific message exchange pattern, as this is defined by the protocol bindings to a specific protocol such as HTTP. In other words, SOAP itself is not a request/response or a one-way message protocol. However, it can be used within the context of a request/response protocol or a one-way message protocol. For example, in the context of HTTP, SOAP inherits the HTTP message exchange pattern of requests and responses.
SOAP does however define the notion of a message path that consists of the originator of the message, the ultimate destination, and potentially one or more intermediaries that may take part in the message path.
A SOAP application receiving a SOAP message MUST process that message by performing the following actions in the order listed below:
All SOAP messages are encoded using XML.
A SOAP application SHOULD include the proper SOAP namespace on all elements and attributes defined by SOAP in messages that it generates. A SOAP application MUST be able to process SOAP namespaces in messages that it receives. It MUST discard messages that have incorrect namespaces and it MAY process SOAP messages without SOAP namespaces as though they had the correct SOAP namespaces.
SOAP defines two namespaces:
SOAP uses the local, unqualified “id” attribute of type “ID” to specify the unique identifier of an encoded element. SOAP uses the local, unqualified attribute “href” of type “uri-reference” to specify a reference to that value, in a manner conforming to the XML Specification [7], XML Schema Specification [11], and XML Linking Language Specification [21].
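A small sketch of how a receiver might follow such a reference is shown below. The `people`, `person`, `book`, and `author` elements and the `resolve` helper are hypothetical; the sketch assumes the reference is a same-document fragment of the form “#identifier”.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one element carries an id, another refers to it.
doc = ET.fromstring(
    "<people>"
    '<person id="p1"><name>Ada</name></person>'
    '<book><author href="#p1"/></book>'
    "</people>"
)


def resolve(root, elem):
    """Return the element that elem refers to, or elem itself if it has no href."""
    href = elem.get("href")
    if href is None:
        return elem
    target_id = href[1:] if href.startswith("#") else href
    for candidate in root.iter():
        if candidate.get("id") == target_id:
            return candidate
    raise KeyError(href)


author = resolve(doc, doc.find("book/author"))
author_name = author.find("name").text  # "Ada"
```

The referring element stays small, and the referenced value appears only once in the document no matter how many elements point to it.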
With the exception of the SOAP mustUnderstand attribute and the SOAP actor attribute, it is generally permissible to have attributes and their values appear in XML instances or alternatively in schemas, with equal effect. That is, declaration in a DTD or schema with a default or fixed value is semantically equivalent to appearance in an instance.
SOAP Envelope
A SOAP message is an XML document that consists of a mandatory SOAP envelope, an optional SOAP header, and a mandatory SOAP body. This XML document is referred to as a SOAP message for the rest of this specification. The namespace identifier for the elements and attributes defined in this section is “http://schemas.xmlsoap.org/soap/envelope/”. A SOAP message contains the following:
1. Envelope
2. Header
3. Body
The SOAP encodingStyle global attribute can be used to indicate the serialization rules used in a SOAP message. This attribute MAY appear on any element, and is scoped to that element's contents and all child elements not themselves containing such an attribute, much as an XML namespace declaration is scoped. There is no default encoding defined for a SOAP message.
The attribute value is an ordered list of one or more URIs identifying the serialization rule or rules that can be used to deserialize the SOAP message indicated in the order of most specific to least specific. Examples of values are
The serialization rules defined by SOAP are identified by the URI “http://schemas.xmlsoap.org/soap/encoding/”. Messages using this particular serialization SHOULD indicate this using the SOAP encodingStyle attribute. In addition, all URIs syntactically beginning with “http://schemas.xmlsoap.org/soap/encoding/” indicate conformance with the SOAP encoding rules.
A value of the zero-length URI (“”) explicitly indicates that no claims are made for the encoding style of contained elements. This can be used to turn off any claims from containing elements.
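The scoping behavior of encodingStyle can be sketched as a tree walk. The function name is hypothetical, and the sketch assumes that multiple URIs in the attribute value are separated by whitespace.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ENCODING_STYLE = f"{{{SOAP_ENV}}}encodingStyle"


def effective_encoding_styles(root, target):
    """Return the ordered encodingStyle URIs in scope for target ([] if none)."""

    def walk(node, inherited):
        value = node.get(ENCODING_STYLE)
        if value is not None:
            # A zero-length value yields [] and cancels any outer claim.
            inherited = value.split()
        if node is target:
            return inherited
        for child in node:
            found = walk(child, inherited)
            if found is not None:
                return found
        return None

    result = walk(root, [])
    if result is None:
        raise ValueError("target is not part of this tree")
    return result
```

An element that carries its own attribute replaces the inherited list, which mirrors how a nested XML namespace declaration shadows an outer one.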
Envelope Versioning Model
SOAP does not define a traditional versioning model based on major and minor version numbers. A SOAP message MUST have an Envelope element associated with the “http://schemas.xmlsoap.org/soap/envelope/” namespace. If a message is received by a SOAP application in which the SOAP Envelope element is associated with a different namespace, the application MUST treat this as a version error and discard the message. If the message is received through a request/response protocol such as HTTP, the application MUST respond with a SOAP VersionMismatch faultcode message using the SOAP “http://schemas.xmlsoap.org/soap/envelope/” namespace.
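A minimal sketch of this version check, assuming the receiver works on the raw message bytes. The function names and the faultstring text are illustrative.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def version_mismatch_fault() -> bytes:
    """Build a fault message in the expected envelope namespace."""
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    fault = ET.SubElement(body, f"{{{SOAP_ENV}}}Fault")
    ET.SubElement(fault, "faultcode").text = "SOAP-ENV:VersionMismatch"
    ET.SubElement(fault, "faultstring").text = "Unsupported envelope namespace"
    return ET.tostring(envelope)


def check_envelope(message: bytes):
    """Return (root, None) if acceptable, or (None, fault_bytes) on a version error."""
    root = ET.fromstring(message)
    namespace, _, local = root.tag.rpartition("}")
    if local != "Envelope":
        raise ValueError("top element is not an Envelope")
    if namespace.lstrip("{") != SOAP_ENV:
        return None, version_mismatch_fault()
    return root, None
```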
SOAP Header
SOAP provides a flexible mechanism for extending a message in a decentralized and modular way without prior knowledge between the communicating parties. Typical examples of extensions that can be implemented as header entries are authentication, transaction management, payment etc.
The Header element is encoded as the first immediate child element of the SOAP Envelope XML element. All immediate child elements of the Header element are called header entries.
The encoding rules for header entries are as follows:
The SOAP Header attributes defined in this section determine how a recipient of a SOAP message should process the message. A SOAP application generating a SOAP message SHOULD only use the SOAP Header attributes on immediate child elements of the SOAP Header element. The recipient of a SOAP message MUST ignore all SOAP Header attributes that are not applied to an immediate child element of the SOAP Header element.
An example is a header with an element identifier of “Transaction”, a “mustUnderstand” value of “1”, and a value of 5. This would be encoded as follows:
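A minimal Python sketch that produces a header entry of this shape. The namespace `urn:example:transaction` and the function name are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
TX_NS = "urn:example:transaction"  # hypothetical namespace for the header entry


def envelope_with_transaction(value: int) -> ET.Element:
    """Build an envelope whose Header holds one mandatory Transaction entry."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    # Header is the first immediate child of Envelope.
    header = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")
    entry = ET.SubElement(header, f"{{{TX_NS}}}Transaction")
    entry.set(f"{{{SOAP_ENV}}}mustUnderstand", "1")
    entry.text = str(value)
    ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    return envelope
```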
SOAP Actor Attribute
A SOAP message travels from the originator to the ultimate destination, potentially by passing through a set of SOAP intermediaries along the message path. A SOAP intermediary is an application that is capable of both receiving and forwarding SOAP messages. Both intermediaries as well as the ultimate destination are identified by a URI.
Not all parts of a SOAP message may be intended for the ultimate destination of the SOAP message but may be intended for one or more of the intermediaries on the message path. The role of a recipient of a header element is similar to that of accepting a contract in that it cannot be extended beyond the recipient. That is, a recipient receiving a header element MUST NOT forward that header element to the next application in the SOAP message path. The recipient MAY insert a similar header element but in that case, the contract is between that application and the recipient of that header element.
The SOAP actor global attribute can be used to indicate the recipient of a header element. The value of the SOAP actor attribute is a URI. The special URI “http://schemas.xmlsoap.org/soap/actor/next” indicates that the header element is intended for the very first SOAP application that processes the message. This is similar to the hop-by-hop scope model represented by the Connection header field in HTTP.
Omitting the SOAP actor attribute indicates that the sender does not know or does not care which resource should process the header element. The SOAP actor global header element attribute can be used to indicate the entity that is going to act on that header element. The entity identified by the SOAP actor can be thought of as the application resource that a header element is intended for. This can be used, for example, to indicate whether a header element is intended for an entity at the recipient that is part of the application or one that is part of the routing infrastructure.
This attribute MUST appear in the SOAP message instance and not in a schema or DTD in order to be effective.
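The selection of header elements by actor might be sketched as follows. The function names are hypothetical, and the sketch assumes that entries without an actor attribute are handled only by the ultimate destination, which is an interpretation rather than something stated above.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ACTOR = f"{{{SOAP_ENV}}}actor"
NEXT = "http://schemas.xmlsoap.org/soap/actor/next"


def entries_for(header, my_uri, is_ultimate_destination):
    """Return the header entries that this application should act on."""
    targeted = []
    for entry in header:
        actor = entry.get(ACTOR)
        if actor is None:
            # Assumption: untargeted entries go to the ultimate destination.
            if is_ultimate_destination:
                targeted.append(entry)
        elif actor in (NEXT, my_uri):
            targeted.append(entry)
    return targeted


def remove_consumed(header, consumed):
    """Drop processed entries so they are not forwarded along the message path."""
    for entry in consumed:
        header.remove(entry)
```

An intermediary would call `entries_for`, act on the result, and then call `remove_consumed` before forwarding the message.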
SOAP mustUnderstand Attribute
The SOAP mustUnderstand global attribute can be used to indicate whether a header entry is mandatory or optional for the recipient to process. The recipient of a header entry is defined by the SOAP actor attribute. The value of the mustUnderstand attribute is either “1” or “0”. The absence of the SOAP mustUnderstand attribute is semantically equivalent to its presence with the value “0”.
If a header element is tagged with a SOAP mustUnderstand attribute with a value of “1”, the recipient of that header entry either MUST obey the semantics (as conveyed by its element name, contextual setting, and so on) and process correctly to those semantics, or MUST fail processing the message.
The SOAP mustUnderstand attribute allows for robust evolution. Elements tagged with the SOAP mustUnderstand attribute with a value of “1” MUST be presumed to somehow modify the semantics of their parent or peer elements. Tagging elements in this manner assures that this change in semantics will not be silently (and, presumably, erroneously) ignored by those who may not fully understand it.
This attribute MUST appear in the instance and not in a schema or DTD in order to be effective.
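A minimal sketch of the mustUnderstand check, assuming the header entries passed in have already been narrowed to those intended for this recipient. The exception class and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
MUST_UNDERSTAND = f"{{{SOAP_ENV}}}mustUnderstand"


class MustUnderstandFault(Exception):
    """Raised when a mandatory header entry is not understood."""


def check_mandatory_entries(header, understood_tags):
    """Fail processing if any entry flagged "1" is outside understood_tags."""
    for entry in header:
        # An absent attribute is equivalent to the value "0".
        flag = entry.get(MUST_UNDERSTAND, "0")
        if flag not in ("0", "1"):
            raise ValueError(f"invalid mustUnderstand value: {flag!r}")
        if flag == "1" and entry.tag not in understood_tags:
            raise MustUnderstandFault(entry.tag)
```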
SOAP Body
The SOAP Body element provides a simple mechanism for exchanging mandatory information intended for the ultimate recipient of the message. Typical uses of the Body element include marshalling RPC calls and error reporting.
The Body element is encoded as an immediate child element of the SOAP Envelope XML element. If a Header element is present then the Body element MUST immediately follow the Header element, otherwise it MUST be the first immediate child element of the Envelope element.
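The placement rule for the Body element can be checked with a short sketch such as the following; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
HEADER = f"{{{SOAP_ENV}}}Header"
BODY = f"{{{SOAP_ENV}}}Body"


def find_body(envelope):
    """Return the Body element, enforcing its position inside the Envelope."""
    children = list(envelope)
    # Body is at index 1 when a Header is present, otherwise at index 0.
    index = 1 if children and children[0].tag == HEADER else 0
    if len(children) <= index or children[index].tag != BODY:
        raise ValueError("Body must be first or immediately follow Header")
    return children[index]
```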
All immediate child elements of the Body element are called body entries and each body entry is encoded as an independent element within the SOAP Body element.
The encoding rules for body entries are as follows:
While the Header and Body are defined as independent elements, they are in fact related. The relationship between a body entry and a header entry is as follows: A body entry is semantically equivalent to a header entry intended for the default actor and with a SOAP mustUnderstand attribute with a value of “1”. The default actor is indicated by not using the actor attribute.
SOAP Fault
The SOAP Fault element is used to carry error and/or status information within a SOAP message. If present, the SOAP Fault element MUST appear as a body entry and MUST NOT appear more than once within a Body element. The SOAP Fault element defines the following four subelements:
faultcode
The faultcode element is intended for use by software to provide an algorithmic mechanism for identifying the fault. The faultcode MUST be present in a SOAP Fault element and the faultcode value MUST be a qualified name. SOAP defines a small set of SOAP fault codes covering basic SOAP faults.
faultstring
The faultstring element is intended to provide a human readable explanation of the fault and is not intended for algorithmic processing. The faultstring element is similar to the ‘Reason-Phrase’ defined by HTTP (see [5]). It MUST be present in a SOAP Fault element and SHOULD provide at least some information explaining the nature of the fault.
faultactor
The faultactor element is intended to provide information about who caused the fault to happen within the message path. It is similar to the SOAP actor attribute but instead of indicating the destination of the header entry, it indicates the source of the fault. The value of the faultactor attribute is a URI identifying the source. Applications that do not act as the ultimate destination of the SOAP message MUST include the faultactor element in a SOAP Fault element. The ultimate destination of a message MAY use the faultactor element to indicate explicitly that it generated the fault.
detail
The detail element is intended for carrying application specific error information related to the Body element. It MUST be present if the contents of the Body element could not be successfully processed. It MUST NOT be used to carry error information belonging to header entries. Detailed error information belonging to header entries MUST be carried within header entries.
The absence of the detail element in the Fault element indicates that the fault is not related to processing of the Body element. This can be used to distinguish whether the Body element was processed or not in case of a fault situation.
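A receiver could read the four subelements along the following lines. This is a sketch: the `Fault` dataclass and its field names are hypothetical, and the subelements are assumed to be unqualified children of the Fault element.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


@dataclass
class Fault:
    faultcode: str
    faultstring: str
    faultactor: Optional[str]
    relates_to_body: bool  # True when a detail element is present


def parse_fault(body) -> Optional[Fault]:
    """Return the Fault carried in a Body element, or None if there is none."""
    faults = body.findall(f"{{{SOAP_ENV}}}Fault")
    if not faults:
        return None
    if len(faults) > 1:
        raise ValueError("Fault must not appear more than once in a Body")
    element = faults[0]
    code = element.findtext("faultcode")
    text = element.findtext("faultstring")
    if code is None or text is None:
        raise ValueError("faultcode and faultstring are required")
    return Fault(code, text, element.findtext("faultactor"),
                 element.find("detail") is not None)
```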
All immediate child elements of the detail element are called detail entries and each detail entry is encoded as an independent element within the detail element.
The encoding rules for detail entries are as follows:
The faultcode values defined in this section MUST be used in the faultcode element when describing faults defined by this specification. The namespace identifier for these faultcode values is “http://schemas.xmlsoap.org/soap/envelope/”. Use of this space is recommended (but not required) in the specification of methods defined outside of the present specification.
The default SOAP faultcode values are defined in an extensible manner that allows for new SOAP faultcode values to be defined while maintaining backwards compatibility with existing faultcode values. The mechanism used is very similar to the 1xx, 2xx, 3xx, etc. basic status classes defined in HTTP (see [5]). However, instead of integers, they are defined as XML qualified names (see [8]). The character “.” (dot) is used as a separator of faultcode values indicating that what is to the left of the dot is a more generic fault code value than the value to the right. For example:
Client.Authentication
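The dot-separated hierarchy lets a handler that only knows a generic code still classify a more specific one. A minimal sketch with hypothetical helper names, operating on the local part of the faultcode:

```python
def is_subcode_of(faultcode: str, generic: str) -> bool:
    """True if faultcode equals generic or refines it after a dot."""
    return faultcode == generic or faultcode.startswith(generic + ".")


def generalizations(faultcode: str) -> list:
    """List the code itself followed by each more generic ancestor."""
    parts = faultcode.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]
```

A dispatcher can try each entry of `generalizations(code)` in turn until it finds a code it has a handler for.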
The set of faultcode values defined in this document is:
SOAP Encoding
The SOAP encoding style uses a simple, traditional type system. A type either is a simple (scalar) type or is a complex type constructed as a composite of several parts, each with a type. This section defines a rule for serialization of a graph of typed objects. The namespace identifier for the elements and attributes defined in this section is “http://schemas.xmlsoap.org/soap/encoding/”. The encoding samples shown assume all namespace declarations are at a higher element level.
Rules for Encoding Types in XML
XML allows very flexible encoding of data. SOAP defines a narrower set of rules for encoding. This section defines the encoding rules at a high level, and the next section describes the encoding rules for specific types when they require more detail.
To describe encoding, the following terminology is used:
For simple types, SOAP adopts the types found in the section “Built-in datatypes” of the “XML Schema Part 2: Datatypes” Specification [11], along with the corresponding recommended representation thereof. Examples include:
Strings and arrays of bytes are encoded as multi-reference simple types.
String
For the purposes of this encoding discussion, a “String” is any sequence of characters meeting the production for CharData in the XML 1.0 specification. Note that many languages contain a datatype called “string” that permits values that do not match the CharData production. These values must be represented by using some datatype other than xsd:string.
A string is a multi-reference simple type. According to the rules of multi-reference simple types, the containing element of the string value MAY have an ID attribute; additional accessor elements MAY then have matching href attributes.
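On the receiving side, an href can be resolved back to the element carrying the matching id. The sketch below assumes local references of the form `#identifier`; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def resolve(accessor, root):
    """Return the element holding the value for accessor, following href if set."""
    href = accessor.get("href")
    if href is None:
        return accessor  # the value is embedded in the accessor itself
    if not href.startswith("#"):
        raise ValueError("only local references are handled in this sketch")
    target_id = href[1:]
    for element in root.iter():
        if element.get("id") == target_id:
            return element
    raise KeyError(target_id)
```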
For example, two accessors to the same string could appear, as follows:
However, if the fact that both accessors reference the same instance of the string is immaterial, they may be encoded as though single-reference, as follows:
Enumerations
An enumeration is a single reference type whose value is encoded as one of the possible enumeration strings. In the following example EyeColor is an enumeration with the possible values of “Green”, “Blue”, or “Brown”:
Array of Bytes
An array of bytes is encoded as a multi-reference simple type. The recommended representation of an opaque array of bytes is the ‘base64’ encoding defined in XML Schemas [10][11], which uses the base64 encoding algorithm defined in RFC 2045 [13]. However, the line length restrictions that normally apply to Base64 data in MIME do not apply in SOAP.
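In Python, `base64.b64encode` already produces a single unwrapped line, so it fits the relaxed line-length rule directly. The helper names below are hypothetical.

```python
import base64


def encode_bytes(data: bytes) -> str:
    """Encode opaque bytes as one unwrapped base64 line."""
    return base64.b64encode(data).decode("ascii")


def decode_bytes(text: str) -> bytes:
    """Decode base64 text, ignoring whitespace a sender may have inserted."""
    return base64.b64decode("".join(text.split()))
```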
Polymorphic Accessor
Many languages allow accessors that can polymorphically access values of several types, each type being available at run time. A polymorphic accessor MUST contain an “xsi:type” attribute that describes the type of the actual value.
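A receiver might dispatch on the xsi:type attribute as sketched below. The 2001 schema-instance namespace URI, the `xsd:` prefix matching, and the decoder table are assumptions; a complete implementation would resolve the type as a qualified name against the namespace declarations in scope.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"  # assumed schema version
XSI_TYPE = f"{{{XSI}}}type"

# Hypothetical table mapping type names to Python converters.
DECODERS = {"xsd:float": float, "xsd:int": int, "xsd:string": str}


def decode_polymorphic(accessor):
    """Convert the accessor's text using the type named in xsi:type."""
    type_name = accessor.get(XSI_TYPE)
    if type_name is None:
        raise ValueError("polymorphic accessor lacks an xsi:type attribute")
    if type_name not in DECODERS:
        raise ValueError(f"unsupported type: {type_name}")
    return DECODERS[type_name](accessor.text)
```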
For example, a Polymorphic parameter named “cost” with a type of float would be encoded as follows:
as contrasted with a cost parameter whose type is invariant, as follows:
Beyond the simple types, SOAP defines support for the following constructed types:
A complex value contains an ordered sequence of structural members. When the members have distinct names, as in an instance of a C or C++ struct, this is called a “struct,” and when the members do not have distinct names but instead are known by their ordinal position, this is called an “array.”
The members of a complex value are encoded as accessor elements. For a struct, the accessor element name is the member name. For an array, the accessor element name is the element type name and the sequence of the accessor elements follows the ordinal sequence of the members.
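The two naming rules can be sketched as a pair of small encoders. The function names and sample members are illustrative, and only simple member values are handled.

```python
import xml.etree.ElementTree as ET


def encode_struct(name: str, members: dict) -> ET.Element:
    """Encode a struct: each accessor element is named after its member."""
    element = ET.Element(name)
    for member_name, value in members.items():
        ET.SubElement(element, member_name).text = str(value)
    return element


def encode_array(name: str, item_type: str, items: list) -> ET.Element:
    """Encode an array: each accessor is named after the element type,
    and document order carries the ordinal position."""
    element = ET.Element(name)
    for value in items:
        ET.SubElement(element, item_type).text = str(value)
    return element
```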
The following is an example of a struct of type Book:
Below is an example of a type with both simple and compound members. It shows two levels of referencing.
Note that the “href” attribute of the Author accessor element is a reference to the value whose “id” attribute matches; a similar construction appears for the Address.
The form above is appropriate when the Person value and the Address value are multi-reference. If these were instead both single-reference, they SHOULD be embedded, as follows:
If instead there existed a restriction that no two persons can have the same address in a given instance and that an address can be either a Street-address or an Electronic-address, a Book with two authors would be encoded as follows:
Generic Records
There are cases where a struct is represented with its members named and values typed at run time. Even in these cases, the existing rules apply. Each member is encoded as an element with matching name, and each value is either contained or referenced. Contained values MUST have an “xsi:type” attribute giving the type of the value.
Arrays
The representation of the value of an array is an ordered sequence of elements constituting items of the array. The element name for each element is the element type.
As with complex types generally, if the type of an item in the array is a single-reference type, each item contains its value. Otherwise, the item references its value via an href attribute.
The following example is an array containing integer array members. The length attribute is optional.
The following is an example of a two-dimensional array of strings.
The following is an example of an array of two arrays, each of which is an array of strings.
Finally, the following is an example of an array of phone numbers embedded in a struct of type Person and accessed through the accessor “phone-numbers”:
A multi-reference array is always encoded as an independent element whose element name is “SOAP-ENC:Array”. For example an array of order structs encoded as an independent element:
A single-reference array is encoded as an embedded element whose element name is the accessor name.
Note that it is explicitly legal per this specification to follow the style used for serializing arrays and yet not explicitly mark an element as being an array. See the PurchaseLineItems element in the example here:
Partially Transmitted Arrays
SOAP provides support for partially transmitted arrays, known as “varying” arrays, in some contexts [12]. A partially transmitted array indicates in an “offset” attribute the zero-origin index of the first element transmitted; if omitted, the offset is taken as zero.
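A receiver could rebuild the full-length array from the offset as sketched below. The bracketed offset form such as `[2]` and passing the declared size as a parameter are assumptions of this sketch; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"
OFFSET = f"{{{SOAP_ENC}}}offset"


def expand_partial(array_element, size, fill=None):
    """Place the transmitted items at their zero-origin offset in a list of size."""
    raw = array_element.get(OFFSET)
    # An omitted offset is taken as zero.
    offset = int(raw.strip("[]")) if raw else 0
    items = [child.text for child in array_element]
    if offset < 0 or offset + len(items) > size:
        raise ValueError("transmitted items do not fit the declared size")
    result = [fill] * size
    result[offset:offset + len(items)] = items
    return result
```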
The following is an example of an array of size five that transmits only the third and fourth element:
Sparse Arrays
SOAP provides support for sparse arrays in some contexts. Each element contains a “position” attribute that indicates its position within the array. The following is an example of an array of arrays of strings:
Assuming that the only reference to array-1 occurs in the enclosing array, this example could also have been encoded as follows:
Default Values
An omitted accessor element implies either a default value or that no value is known. The specifics depend on the accessor, method, and its context. Typically, an omitted accessor implies a Null value for polymorphic accessors (with the exact meaning of Null accessor-dependent). Typically, an omitted Boolean accessor implies either a False value or that no value is known, and an omitted numeric accessor implies either that the value is zero or that no value is known.
SOAP Root Attribute
The SOAP root attribute can be used to label serialization roots that are not true roots of an object graph so that the object graph can be deserialized. The attribute can have one of two values, either “1” or “0”. True roots of an object graph have the implied attribute value of “1”. Serialization roots that are not true roots can be labeled as serialization roots with an attribute value of “1”. An element can explicitly be labeled as not being a serialization root with a value of “0”.
The SOAP root attribute MAY appear on any subelement within the SOAP Header and SOAP Body elements. The attribute does not have a default value.
Using SOAP in HTTP
This section describes how to use SOAP within HTTP with or without using the HTTP Extension Framework. Binding SOAP to HTTP provides the advantage of being able to use the formalism and decentralized flexibility of SOAP with the rich feature set of HTTP. Carrying SOAP in HTTP does not mean that SOAP overrides existing semantics of HTTP but rather that the semantics of SOAP over HTTP maps naturally to HTTP semantics.
SOAP naturally follows the HTTP request/response message model, providing SOAP request parameters in an HTTP request and SOAP response parameters in an HTTP response. Note, however, that SOAP intermediaries are NOT the same as HTTP intermediaries. That is, an HTTP intermediary addressed with the HTTP Connection header field cannot be expected to inspect or process the SOAP entity body carried in the HTTP request.
HTTP applications MUST use the media type “text/xml” when including SOAP entity bodies in HTTP messages.
SOAP HTTP Request
Although SOAP might be used in combination with a variety of HTTP request methods, this binding only defines SOAP within HTTP POST requests.
The SOAPAction HTTP Header Field
The SOAPAction HTTP request header field can be used to indicate the intent of the SOAP HTTP request. The value is a URI identifying the intent. SOAP places no restrictions on the format or specificity of the URI or that it is resolvable. An HTTP client MUST use this header field when issuing a SOAP HTTP Request.
The presence and content of the SOAPAction header field can be used by servers such as firewalls to appropriately filter SOAP request messages in HTTP. The header field value of empty string (“”) means that the intent of the SOAP message is provided by the HTTP Request-URI. No value means that there is no indication of the intent of the message.
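The three cases for the header value can be distinguished as sketched below. The function name is hypothetical, and the sketch assumes the headers arrive as a plain dictionary keyed by the exact name `SOAPAction`, with a present URI value enclosed in double quotes.

```python
def soap_action_intent(headers: dict, request_uri: str):
    """Return the intent URI, the Request-URI, or None if no intent is given."""
    if "SOAPAction" not in headers:
        raise ValueError("a SOAP HTTP request must carry a SOAPAction field")
    raw = headers["SOAPAction"].strip()
    if raw == "":
        return None  # no value: no indication of intent
    if raw == '""':
        return request_uri  # empty string: intent is the Request-URI
    return raw.strip('"')
```

A filtering server such as a firewall could apply its policy to the returned value without parsing the XML payload.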
Examples:
SOAP HTTP Response
SOAP HTTP follows the semantics of the HTTP Status codes for communicating status information in HTTP. For example, a 2xx status code indicates that the client's request including the SOAP component was successfully received, understood, and accepted etc.
In case of a SOAP error while processing the request, the SOAP HTTP server MUST issue an HTTP 500 “Internal Server Error” response and include a SOAP message in the response containing a SOAP Fault element indicating the SOAP processing error.
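A server could derive the HTTP status from the outgoing envelope as in this sketch. The function name is hypothetical, and 200 is used as a representative 2xx code.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def status_for(envelope_xml: bytes):
    """Return (code, reason): 500 if the Body carries a Fault, else 200."""
    root = ET.fromstring(envelope_xml)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    has_fault = body is not None and body.find(f"{{{SOAP_ENV}}}Fault") is not None
    return (500, "Internal Server Error") if has_fault else (200, "OK")
```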
The HTTP Extension Framework
A SOAP message MAY be used together with the HTTP Extension Framework [6] in order to identify the presence and intent of a SOAP HTTP request.
Whether to use the Extension Framework or plain HTTP is a question of policy and capability of the communicating parties. Clients can force the use of the HTTP Extension Framework by using a mandatory extension declaration and the “M-” HTTP method name prefix. Servers can force the use of the HTTP Extension Framework by using the 510 “Not Extended” HTTP status code. That is, using one extra round trip, either party can detect the policy of the other party and act accordingly.
The extension identifier used to identify SOAP using the Extension Framework is
Example 3 SOAP HTTP Using POST
Example 4 SOAP Using HTTP Extension Framework
Using SOAP for RPC
One of the design goals of SOAP is to encapsulate and exchange RPC calls using the extensibility and flexibility of XML. In order to provide a uniform mechanism for representing a method call and response, SOAP defines a mapping to be used with the SOAP encoding. This mapping is implied whenever the SOAP encoding is used.
Using SOAP for RPC is orthogonal to the SOAP protocol binding. In the case of using HTTP as the protocol binding, an RPC call maps naturally to an HTTP request and an RPC response maps to an HTTP response. However, using SOAP for RPC is not limited to the HTTP protocol binding.
To make a method call, the following information is needed:
RPC method calls and responses are both carried in the SOAP Body element using the following encoding:
Because a result indicates success and a fault indicates failure, it is an error for the method response to contain both a result and a fault.
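A client could enforce this rule when reading a method response, as in the sketch below. The function name is hypothetical, and the sketch simplifies by treating every non-Fault body entry as a result.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
FAULT = f"{{{SOAP_ENV}}}Fault"


def rpc_outcome(body):
    """Return ("result", element) or ("fault", element); reject a mix of both."""
    entries = list(body)
    faults = [entry for entry in entries if entry.tag == FAULT]
    results = [entry for entry in entries if entry.tag != FAULT]
    if faults and results:
        raise ValueError("method response contains both a result and a fault")
    if faults:
        return "fault", faults[0]
    if results:
        return "result", results[0]
    raise ValueError("method response is empty")
```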
RPC and SOAP Header
An example of the use of the header element is the passing of a transaction ID along with a message. Since the transaction ID is not part of the signature and is typically held in an infrastructure component rather than application code, there is no direct way to pass the necessary information with the call. By adding an entry to the headers and giving it a fixed name, the transaction manager on the receiving side can extract the transaction ID and use it without affecting the coding of remote procedure calls.
References for this Appendix
Example 5 Similar to Example 1 but with a Mandatory Header
Example 6 Similar to Example 1 but with a Struct
Sample Encoding of Response
Example 7 Similar to Example 2 but with a Mandatory Header
Example 8 Similar to Example 2 but with a Struct
Example 9 Similar to Example 2 but Failing to Honor Mandatory Header
Example 10 Similar to Example 2 but Failing to Handle Body
This application is a continuation of and claims priority to U.S. patent application Ser. No. 09/635,830, filed Aug. 9, 2000, the disclosure of which is incorporated by reference herein.
Number | Date | Country | |
---|---|---|---|
20040268241 A1 | Dec 2004 | US |
Number | Date | Country | |
---|---|---|---|
60148172 | Aug 1999 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 09635830 | Aug 2000 | US |
Child | 10893731 | Jul 2004 | US |