Specifications are typically used to describe a format for a document. For example, the XML Paper Specification (XPS) describes the XPS document format. An XPS document is a paginated representation of electronic paper described in an XML-based format. Documents based on such document specifications are often encapsulated into a container, or package, to organize data into files for comprehensive document management. Such packages are typically based on packaging conventions that describe the technique for packaging documents and related information in a file format, describing metadata, parts (e.g., markup and binary resources), relationships between parts, etc. One example of such a packaging convention is the Open Packaging Conventions (OPC). Various applications use such packages to exchange, display, print, and package content (e.g., documents, resources, etc.). To produce expected results, these applications rely not only on package conformance to a package specification, but also on document conformance to a document specification.
To determine conformance of a package and encapsulated document information, a user typically needs to manually verify that the package and document information conform to the corresponding specifications. However, document and package specifications are generally very large and complex, typically including extensive and detailed descriptions of abstract representations of each object's characteristics and relationship to other objects. As a result, determining conformance of a package and associated document content with corresponding package and document specifications is generally very time consuming, labor intensive, and prone to human error.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Systems and methods for automatic package conformance validation are described. A package is a logical entity that specifies multiple datastreams for use by an application to render pages and resources associated with one or more documents. In one aspect the systems and methods automatically validate conformance of the package in view of one or more package and document specifications. The specification(s) identify sets of criteria that delineate structural and markup conformance for the package and fixed payload(s) that specify the document(s), resources, etc. The systems and methods validate package and fixed payload(s), and notify a user of whether the package, documents, and/or associated resources passed or failed respective ones of the conformance criteria.
In the figures, the left-most digit of a component reference number identifies the particular Figure in which the component first appears.
Overview
Systems and methods for automatic package conformance validation are described. To this end, the systems and methods combine schema definition validation of package markup with a process that analyzes the structure of the package to determine whether the package and its encapsulated document content are well formed according to corresponding package and document specifications. The structure represents dependencies between respective ones of the package parts/datastreams. The systems and methods provide a user with indications, including verbose error information, of whether the data package (including its encapsulated fixed payload content) conforms to corresponding ones of the specifications.
These and other aspects for automatic package conformance validation are now described in detail.
An Exemplary System
Systems and methods for automatic package conformance validation are described in the general context of computer-executable instructions (program modules) being executed by a computing device such as a personal computer. Program modules generally include routines, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types. While the systems and methods are described in the foregoing context, acts and operations described hereinafter may also be implemented in hardware.
Validation module 112 validates conformance of a package 116 (hereinafter often called a “data package”), including encapsulated package relationships parts 118 (a logical entity identifying other datastreams) and fixed payload parts 120 to respective ones of package and document specifications 122. A fixed payload part 120 is a logical entity encapsulating other datastreams that specify document content, resources, etc. Package(s) 116 may be stored in compressed or uncompressed formats. In this implementation, package and document specifications 122 are respectively based on OPC and XPS. In a different implementation, different package and document specifications 122 are utilized.
As shown in
Relationships parts (e.g., please refer to package parts 118 and 126) explicitly identify relationships between parts (i.e., data streams), and therefore, respective structure of package 116 and fixed payload part 120. A relationship part is attached to another part via a naming convention. An exemplary such naming convention is shown in the following example: /markup/_rels/mypart.xml.rels is the relationships part for /markup/mypart.xml, although other naming conventions could be used. A relationship may be associated with a package 116 as a whole. Additionally, parts may internally reference other parts without defining a relationship.
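The naming convention in the example above can be sketched as follows. This is a minimal illustration that assumes part names are slash-separated paths and that the relationships part always lives in a "_rels" folder beside the source part; the function name relationships_part_name is hypothetical and is not taken from any packaging API.

```python
import posixpath


def relationships_part_name(part_name: str) -> str:
    """Derive the relationships part name for a source part.

    Assumes the convention from the example in the text:
    <folder>/_rels/<file>.rels sits beside <folder>/<file>.
    """
    folder, filename = posixpath.split(part_name)
    return posixpath.join(folder, "_rels", filename + ".rels")


print(relationships_part_name("/markup/mypart.xml"))
```

Because other naming conventions could be used, a validator would likely keep this mapping in one place rather than scattering string manipulation across the code.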
TABLE 1, which is shown immediately below, shows an exemplary package parts relationship declaration, according to one embodiment. In this example, “Target” represents a URI of a referenced part. “ID” uniquely identifies the relationship within the particular relationships part. “Type” specifies a namespace-like definition of the purpose of the relationship.
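As an illustration only (this is not a reproduction of TABLE 1), the following sketch reads a relationships part and checks that each relationship carries the three attributes described above and that identifiers are unique within the part. The namespace URI and the attribute spelling "Id" are assumptions based on the OPC relationships markup; the sample markup, its Type URI, and its targets are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: OPC relationships namespace; not stated in the text above.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hypothetical relationships part; the Type URI and Targets are made up.
SAMPLE_RELS = f"""<Relationships xmlns="{RELS_NS}">
  <Relationship Id="R1" Type="http://example.com/hypothetical-type" Target="/Pages/1.fpage"/>
  <Relationship Id="R1" Type="http://example.com/hypothetical-type" Target="/Pages/2.fpage"/>
  <Relationship Id="R3" Type="http://example.com/hypothetical-type"/>
</Relationships>"""


def read_relationships(xml_text):
    """Return (relationships, errors) for one relationships part."""
    relationships, errors, seen_ids = [], [], set()
    root = ET.fromstring(xml_text)
    for element in root.findall(f"{{{RELS_NS}}}Relationship"):
        rel = {name: element.get(name) for name in ("Id", "Type", "Target")}
        missing = [name for name, value in rel.items() if not value]
        if missing:
            errors.append(f"relationship {rel['Id']!r} is missing {', '.join(missing)}")
            continue
        if rel["Id"] in seen_ids:
            # The identifier must be unique within this relationships part.
            errors.append(f"duplicate relationship Id {rel['Id']!r}")
            continue
        seen_ids.add(rel["Id"])
        relationships.append(rel)
    return relationships, errors


rels, errors = read_relationships(SAMPLE_RELS)
for message in errors:
    print(message)
```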
Each fixed document object 304 represents individual documents, chapters, or other document-defined groups of pages (represented by respective fixed page(s) 306) in the package 116. That is, each fixed document object 304 contains references to one or more fixed page objects 306. A fixed page object 306 includes zero or more document resource parts 128 (e.g., fonts 308, images 310 etc.). In this example, a fixed page object 306 includes, for example, markup describing the page, references to fonts 308, images 310, etc. (e.g., annotation(s), custom resources, etc.). Each of document markup part 124 (e.g., parts 302 through 316) is connected to a different respective part 124 via a relationship 320 (e.g., 320-1 through 320-N) specified in document relationships parts 126 of
In the example of
Exemplary fixed payload relationships (i.e., document relationships parts 126 of
In the example of
TABLE 2 shows an exemplary XML sequence for a fixed document sequence part 302 of
TABLE 3 shows an exemplary XML sequence for a fixed document (“FixedDocument”) part 306 of
Exemplary Conformance Checking
Validation module 112 validates structural and markup conformance of package 116 and package content (e.g., fixed payload(s) 120) in parallel by identifying package parts 118 through 128 and following relationships and in-line markup references specified in respective ones of the package parts. Package specification 122 includes text that defines structural and markup conformance of package 116. Document specification 122 includes text that defines structural and markup conformance of fixed payload(s) 120. Validation module 112 reads package 116 to identify/discover package parts, one-by-one, and builds the structure of the package 116 by processing package part specified relationship data, markup references, and implicit references. In this implementation, validation module 112 determines datastream conformance upon discovering the datastream. In another implementation, validation module 112 determines datastream conformance at any time after discovering the datastream, rather than immediately validating conformance of the datastream upon discovery (e.g., after discovering subsequent datastreams). In one implementation, a well-formed package 116 includes package relationship parts 118, one or more fixed payload parts 120, and for each fixed payload part 120, corresponding document markup parts 124, document relationships parts 126, and document resources parts 128. In a different implementation, a well-formed package 116 may include a different set of package parts. In one implementation, validation module 112 reads package 116 using known packaging APIs such as Windows Presentation Foundation packaging APIs.
To validate package and package content conformance, validation module 112 visits all parts in the package 116 and fixed payload(s) 120. For each package part identified, validation module 112 retrieves the datastream (package part) and validates content of the part based on the content type. Content types include, for example, markup content and non-markup (e.g., binary) content. Markup data includes package markup, relationship markup, and fixed payload markup. Non-markup data includes package 116 resources (thumbnails, digital certificates, etc.) and fixed payload 120 resources (fonts, images, digital certificates, remote resource dictionaries, etc.). When a portion of content encapsulated by a package 116 does not meet a requirement identified in a corresponding specification 122, validation module 112 (or another module employed by validation module 112) logs a corresponding error message in a log file (e.g., conformance validation log 134).
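The content-type-driven validation described above might be organized as a dispatch table, as in the following sketch. It is a simplified illustration under stated assumptions: the two check functions are hypothetical stand-ins (a well-formedness test for markup and a signature test for PNG images) rather than the conformance criteria of any specification, and the content type strings and part names are invented for the example.

```python
import xml.etree.ElementTree as ET


def check_markup(data: bytes):
    """Hypothetical markup check: only verifies the XML is well formed."""
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return f"markup is not well-formed XML: {exc}"
    return None


def check_png(data: bytes):
    """Hypothetical binary check: only verifies the PNG file signature."""
    if not data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image does not start with a PNG signature"
    return None


# Assumed content type strings; a real package declares its own.
VALIDATORS = {
    "application/xml": check_markup,
    "image/png": check_png,
}


def validate_parts(parts):
    """parts maps part name -> (content type, raw bytes); returns log lines."""
    log = []
    for name, (content_type, data) in parts.items():
        check = VALIDATORS.get(content_type)
        if check is None:
            log.append(f"ERROR {name}: unsupported content type {content_type}")
            continue
        problem = check(data)
        if problem:
            log.append(f"ERROR {name}: {problem}")
    return log


sample = {
    "/good.xml": ("application/xml", b"<a><b/></a>"),
    "/bad.xml": ("application/xml", b"<a><b></a>"),
    "/thumb.png": ("image/png", b"not a png"),
}
for line in validate_parts(sample):
    print(line)
```

Keeping the checks in a table keyed by content type lets new part types be supported by registering another function, without changing the loop that visits the parts.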
Referring to
Validation module 112, for each identified fixed document 304, validates the markup of the fixed document 304, and then follows each of one or more page content references (e.g., identified with a “<PageContent>” markup tag) to validate markup of each fixed page 306. For each fixed page 306, validation module 112 discovers any associated resource parts 128 such as fonts 308 and/or images 310 and performs resource validation operations. That is, when there is more than one document reference element (“<DocumentReference>”) in fixed document sequence markup, validation module 112 validates an entire fixed document 304 and all of its associated fixed pages 306 (including any resources associated with each fixed page 306), before continuing to validate a second document reference (i.e., fixed document 304), etc.
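The visiting order described above (each fixed document, then each of its pages, then each page's resources, before moving to the next document reference) can be illustrated with the following sketch. The nested-dictionary representation and all part names are hypothetical; an actual validator would discover this structure from markup and relationships rather than receive it ready-made.

```python
def validation_order(documents):
    """Return part names in the order a depth-first validator would visit them."""
    order = []
    for document in documents:
        order.append(document["name"])          # fixed document markup first
        for page in document["pages"]:
            order.append(page["name"])          # then each referenced page
            order.extend(page["resources"])     # then that page's resources
    return order


# Hypothetical fixed document sequence with two document references.
SEQUENCE = [
    {"name": "/Documents/1/FixedDocument.fdoc",
     "pages": [
         {"name": "/Documents/1/Pages/1.fpage", "resources": ["/Resources/font1.ttf"]},
         {"name": "/Documents/1/Pages/2.fpage", "resources": []},
     ]},
    {"name": "/Documents/2/FixedDocument.fdoc",
     "pages": [
         {"name": "/Documents/2/Pages/1.fpage", "resources": ["/Resources/image1.png"]},
     ]},
]

for part in validation_order(SEQUENCE):
    print(part)
```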
For each resource (e.g., fonts, images, ICC profiles, remote resource dictionaries, etc.) associated with a fixed page 306, validation module 112 performs specific validation for the resource. For example, when determining conformance of a font, validation module 112 determines, for example, whether the font is a non-embeddable font. In another example, when determining conformance of an image, validation module 112 determines whether the content type of the image is incorrect. In yet another example, when determining conformance of a remote resource dictionary, the remote resource dictionary is validated for conformance similar to a fixed page 306 in that the markup of the remote resource dictionary and each of the parts specified in the remote resource dictionary are processed and validated. Subsequent to verifying conformance for a particular resource, validation module 112 determines, for every markup referenced resource part, that there is a corresponding required-resource relationship. A common source of non-conformant documents is FixedPage markup that references a resource (font, image, etc.) without specifying a required-resource relationship to that resource. Without the required-resource relationship, consumers are unable to determine what resources are required to render a FixedPage unless they parse the FixedPage markup, which is not a trivial task for non-rendering consumers. TABLE 4 shows an exemplary markup specifying a required-resource relationship, according to one embodiment.
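The required-resource check described above amounts to comparing two sets: the resources referenced in the FixedPage markup and the targets of the page's required-resource relationships. The sketch below is an illustration only (it is not TABLE 4). It assumes that resource references appear in "FontUri" and "ImageSource" attributes and uses the XPS 2005/06 markup namespace in the sample; the page markup, resource names, and function names are hypothetical, and real markup can reference resources in other ways that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Assumption: attributes that carry resource references in FixedPage markup.
RESOURCE_ATTRIBUTES = ("FontUri", "ImageSource")

# Hypothetical page markup referencing one font and one image.
PAGE = """<FixedPage xmlns="http://schemas.microsoft.com/xps/2005/06">
  <Glyphs FontUri="/Resources/font1.ttf" UnicodeString="Hello"/>
  <Path><Path.Fill><ImageBrush ImageSource="/Resources/image1.png"/></Path.Fill></Path>
</FixedPage>"""


def referenced_resources(fixed_page_xml):
    """Collect resource URIs referenced anywhere in the page markup."""
    references = set()
    for element in ET.fromstring(fixed_page_xml).iter():
        for attribute in RESOURCE_ATTRIBUTES:
            value = element.get(attribute)
            if value:
                references.add(value)
    return references


def missing_required_resources(fixed_page_xml, required_resource_targets):
    """Return referenced resources that lack a required-resource relationship."""
    return sorted(referenced_resources(fixed_page_xml) - set(required_resource_targets))


# Only the font has a relationship here, so the image is reported.
print(missing_required_resources(PAGE, ["/Resources/font1.ttf"]))
```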
In one implementation, during package and package content conformance validation processes, validation module 112 maintains a log (conformance validation log 134) of the validating operations and corresponding results. In one implementation, the log identifies at least a subset of the operations performed and indications of whether respective ones of the conformance validation operations passed or failed. In one implementation, validation module 112 operates in verbose mode, presenting operational messages, including error messages, to a user, for example, on a display device 132, via audio, etc. In another implementation, validation module 112 identifies any package parts that were not validated during conformance validation operations. Such parts may represent one or more documented or non-documented extensions to a respective specification 122. For example, if the package 116 includes package parts that are not in a fixed payload 120, validation module 112 will provide a corresponding message to a user.
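One possible shape for such a log is sketched below, assuming a simple in-memory list of text entries, a verbose flag that echoes entries to standard output, and a set difference to find parts that were never validated. The class name ConformanceLog, the status labels, and the part names are hypothetical and do not describe the format of conformance validation log 134.

```python
class ConformanceLog:
    """Minimal pass/fail log with optional verbose echo (illustrative only)."""

    def __init__(self, verbose=False):
        self.entries = []
        self.verbose = verbose

    def record(self, status, part, detail=""):
        line = f"{status} {part}" + (f": {detail}" if detail else "")
        self.entries.append(line)
        if self.verbose:
            print(line)

    def report_unvalidated(self, all_parts, validated_parts):
        """Log every part in the package that validation never reached."""
        leftover = sorted(set(all_parts) - set(validated_parts))
        for part in leftover:
            self.record("INFO", part, "not validated; possible extension")
        return leftover


log = ConformanceLog(verbose=True)
log.record("PASS", "/Documents/1/Pages/1.fpage")
log.record("FAIL", "/Resources/font1.ttf", "font could not be decoded")
log.report_unvalidated(
    ["/Documents/1/Pages/1.fpage", "/Resources/font1.ttf", "/Custom/extra.bin"],
    ["/Documents/1/Pages/1.fpage", "/Resources/font1.ttf"],
)
```

Unvalidated parts are logged as informational entries rather than failures in this sketch, since such parts may be legitimate extensions.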
We now describe specific operations to validate package parts that include markup content and package parts that include non-markup content.
Validation module 112 determines conformance of package parts that include markup (e.g., package markup, relationship markup, fixed payload and document markup, etc.) by validating the markup prior to parsing (rendering) the markup. Validation module 112 validates markup conformance in view of one or more corresponding schema definitions that formally describe elements in the markup. Validation module 112 identifies the particular schema definition to validate conformance of specific markup based on the particular content type of part associated with the markup. For example, if markup is contained in fixed payload 120, a corresponding document specification 122 provides schema definitions for associated markup. In another example, if markup is contained in package 116 and not encapsulated in fixed payload 120, a corresponding package specification 122 provides the schema definitions for the associated markup. In one implementation, the markup is XML and the schema definitions are XML schema definitions (XSDs).
Validation module 112 validates conformance of identified non-markup data based on content type (e.g., font, image, digital certificate, etc.). Package specification 122 includes definitions for non-markup references such as thumbnails (images), digital signatures, package properties (metadata), etc. In one implementation, validation module 112 uses WPF APIs to validate images by determining whether each image can be successfully decoded. In another example, validation module 112 validates conformance of a font resource in view of identified incoming inline markup reference(s), i.e., reference(s) specified in the markup of the FixedPage part contained in document markup parts 124. When processing a font resource, first the font is decoded using, for example, known WPF APIs, then licensing intent of the font is inspected to ensure that the font has been embedded in the document in accordance with any and all licensing intents (e.g., licensing intents requiring a font to be embedded a certain way, a font not be embedded at all, etc.).
TABLE 5 shows an exemplary set of command line parameters, according to one embodiment.
Referring to FIG. 4, section 404 shows the set of operations used to validate conformance of non-markup resources associated with package 116 in view of the package specification 122.
An Exemplary Procedure
Operations of block 704, responsive to receiving the request, automatically discover datastreams specified by the package 116. Referring to
Operations of block 706, for each discovered datastream, and responsive to discovering the datastream, automatically validate conformance of one or more of the datastreams (e.g., markup data, non-markup data (e.g., binary resources), and/or relationship data) in view of package and/or fixed payload conformance criteria. In one implementation, conformance validation module 112 validates conformance of each datastream as the datastream is discovered/identified. The particular conformance criteria applied to determine conformance of a particular datastream are based on the datastream's particular content type. For example, datastreams associated with the fixed payload 120 are validated in view of the document or paper specification 122 (
Operations of block 708 automatically generate a log of the implemented package conformance-validation operations and associated results. In one implementation, package conformance-validation module 112 automatically generates the log (shown as conformance validation log 134 in
Although systems and methods for automatic package conformance validation have been described in language specific to structural features and/or methodological operations or actions, it is understood that the described implementations are not necessarily limited to the specific features or actions described and do not limit the scope of the appended claims.
For example, in one implementation, computer 102 is coupled across a network 136 to one or more remote computing devices 138. In such a scenario, and in one implementation, one or more of the operations described above with respect to computer 102 implementing conformance validation of a package 116 are distributed to a remote computing device 138 for implementation. In another exemplary alternative embodiment, computing device 102 implements a service that provides automatic (programmatic) package conformance verification services to one or more such remote computing devices 138. In this latter scenario, a remote computing device 138 requests computing device 102 verify conformance of a specific package 116 in view of one or more identified specifications 122.
Accordingly, the specific features and operations of automatic package conformance validation described above with respect to
This application claims priority to U.S. provisional patent application Ser. No. 60/743,136, titled “Package Compliance Validation”, filed on Jan. 17, 2006, and hereby incorporated by reference.
Number | Date | Country | |
---|---|---|---|
20070168264 A1 | Jul 2007 | US |
Number | Date | Country | |
---|---|---|---|
60743136 | Jan 2006 | US |