This invention relates generally to the field of printing systems. More particularly, the invention relates to identifying resources prior to printing.
Print systems include presentation architectures that are provided for representing documents in a data format that is independent of the methods that are utilized to capture or create those documents. One example of a presentation system, which will be described herein, is the Advanced Function Presentation (AFP™) system developed by International Business Machines Corporation. According to the AFP system, documents may include combinations of text, image, graphics, and/or bar code objects in device- and resolution-independent formats. Documents may also include and/or reference fonts, overlays, and other resource objects, which are required at presentation time to present the data properly.
Additionally, documents may include resource objects, such as a document index and tagging elements, that support the search and navigation of document data for a variety of application purposes. In general, a presentation architecture for presenting documents in printed format employs a presentation data stream. To increase flexibility, this stream can be further divided into a device-independent application data stream and a device-dependent printer data stream. A data stream is a continuous ordered stream of data elements and objects that conform to a given formal definition. Application programs can generate data streams destined for a presentation device, archive library, or another application program.
Further, the AFP architecture provides Tag Logical Element (TLE) structured fields for content-based tagging. The indexing information in the TLEs applies to the page or page group containing them. TLEs are effective if the location of the variable data is predictable, for example, if the zip code of an address is always located on the same line of the data. However, TLEs do not work effectively if the location of the data is not always the same. For instance, the zip code portion of an address block is typically in the last line of the address block, which may have a variable number of lines.
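The statement that a TLE's indexing information applies to the page or page group containing it can be made concrete with a small model. The Python sketch below assumes page groups that carry TLEs as attribute name/value pairs and builds a lookup index from them. The `PageGroup` class, the `build_index` function, and the group names are hypothetical illustrations, not part of the AFP MO:DCA specification; a real TLE is a structured field in the data stream, not a Python tuple.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PageGroup:
    """Hypothetical model of an AFP page group: a name, its pages, and its TLEs."""
    name: str
    pages: list
    # (attribute name, attribute value) pairs; each applies to the whole group
    tles: list = field(default_factory=list)


def build_index(groups):
    """Map each (name, value) TLE to the names of the page groups it tags."""
    index = defaultdict(list)
    for group in groups:
        for tle in group.tles:
            index[tle].append(group.name)
    return dict(index)


groups = [
    PageGroup("stmt-001", [1, 2], [("ZIP", "80301"), ("SSN", "123-45-6789")]),
    PageGroup("stmt-002", [3], [("ZIP", "10001")]),
    PageGroup("stmt-003", [4, 5, 6], [("ZIP", "80301")]),
]
index = build_index(groups)
print(index[("ZIP", "80301")])  # ['stmt-001', 'stmt-003']
```

Such an index is only as good as the TLE values it is built from, which is why reliably locating the tagged data matters.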
Currently there are two mechanisms for defining such a TLE. The first method searches an entire page for the data. The second method defines the position of the data together with a threshold around which the data may be located. Neither mechanism is reliable when the position of the data varies from page to page.
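The weakness of the second mechanism can be seen in a small simulation. The Python sketch below assumes a hypothetical `TextRecord` type holding a text string and its (x, y) position, lays out address blocks with a fixed line spacing, and applies a position-plus-threshold rule tuned for a three-line block. The coordinates and units are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRecord:
    """Hypothetical positioned text record, e.g. one line of transparent data."""
    x: int
    y: int  # baseline; larger values are lower on the page (assumed units)
    text: str


def extract_by_threshold(records, expected_y, tolerance):
    """Position-plus-threshold rule: return the first record whose baseline
    lies within +/- tolerance of the expected position, or None."""
    for rec in records:
        if abs(rec.y - expected_y) <= tolerance:
            return rec.text
    return None


def address_block(lines, top=1000, line_height=200, x=500):
    """Lay out address lines top-down at a fixed (assumed) line spacing."""
    return [TextRecord(x, top + i * line_height, t) for i, t in enumerate(lines)]


three_lines = address_block(["J. Smith", "1 Main St", "Boulder CO 80301"])
four_lines = address_block(["J. Smith", "Suite 4", "1 Main St", "Boulder CO 80301"])

# The zip line of the three-line block sits at y = 1000 + 2 * 200 = 1400.
print(extract_by_threshold(three_lines, expected_y=1400, tolerance=50))  # Boulder CO 80301
print(extract_by_threshold(four_lines, expected_y=1400, tolerance=50))   # 1 Main St
```

Widening the tolerance does not help: with `tolerance=250`, the three-line block returns `1 Main St`, because the line above the zip code now falls inside the window first.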
In one embodiment, a method is disclosed. The method includes generating one or more Tag Logical Elements (TLEs) in a variable location within a page of an Advanced Function Presentation (AFP) document. In another embodiment, a printing system is disclosed. The printing system includes a print application to enable a user to generate one or more TLEs in a variable location within a page of an AFP document. In yet another embodiment, the print application includes a graphical user interface (GUI) to enable a user to define the TLEs by drawing a box around a block of data and specifying one or more lines within the box that are used to extract the one or more TLEs.
A better understanding of the present invention can be obtained from the following detailed description in conjunction with the following drawings, in which:
A data extraction mechanism is described. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without some of these specific details. In other instances, well-known structures and devices are shown in block diagram form to avoid obscuring the underlying principles of the present invention.
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
In other embodiments, print application 110 may also provide PostScript (P/S) and PDF files for printing. P/S and PDF files are printed by first passing them through a pre-processor (not shown), which creates resource separation and page independence so that the P/S or PDF file can be transformed into an AFP MO:DCA data stream prior to being passed to print server 120.
According to one embodiment, the AFP MO:DCA data streams are object-oriented streams including, among other things, data objects, page objects, and resource objects. In a further embodiment, AFP MO:DCA data streams include a Resource Environment Group (REG) that is specified at the beginning of the AFP document, before the first page. When the AFP MO:DCA data streams are processed by print server 120, the REG structure is encountered first and causes the server to download any of the identified resources that are not already present in the printer. This occurs before paper is moved for the first page of the job. When the pages that require the complex resources are eventually processed, no additional download time is incurred for these resources.
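The effect of the REG can be sketched as a two-phase plan: first download everything the REG names that the printer lacks, then process pages. In this hypothetical Python model, resources are plain names, the printer's state is a set, and the functions `plan_resource_downloads` and `run_job` are illustrative; the model does not reproduce actual print server or IPDS behavior.

```python
def plan_resource_downloads(reg_resources, printer_resources):
    """Return the REG-named resources the printer lacks, in REG order,
    without duplicates. These are downloaded before the first page."""
    present = set(printer_resources)
    to_download = []
    for name in reg_resources:
        if name not in present:
            to_download.append(name)
            present.add(name)  # avoid downloading the same resource twice
    return to_download


def run_job(reg_resources, pages, printer_resources):
    """Simulate a job: preload REG resources, then process each page.
    Returns (preloaded, downloaded_while_processing_pages)."""
    loaded = set(printer_resources)
    preloaded = plan_resource_downloads(reg_resources, loaded)
    loaded.update(preloaded)
    late = []
    for page_resources in pages:
        for name in page_resources:
            if name not in loaded:  # a mid-job download would stall the page
                late.append(name)
                loaded.add(name)
    return preloaded, late


# Hypothetical resource names; each inner list is one page's references.
reg = ["FONT_A", "OVERLAY_1", "LOGO_IMAGE"]
pages = [["FONT_A"], ["FONT_A", "OVERLAY_1"], ["LOGO_IMAGE"]]
preloaded, late = run_job(reg, pages, printer_resources={"FONT_A"})
print(preloaded)  # ['OVERLAY_1', 'LOGO_IMAGE']
print(late)       # []
```

Running the same pages with an empty REG moves both downloads into page processing, which is the delay the REG is meant to avoid.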
Print server 120 processes pages of output that mix all of the elements normally found in presentation documents, e.g., text in typographic fonts, electronic forms, graphics, images, lines, boxes, and bar codes. The AFP MO:DCA data stream is composed of architected, structured fields that describe each of these elements.
In one embodiment, print server 120 communicates with control unit 130 via an Intelligent Printer Data Stream (IPDS). The IPDS data stream is similar to the AFP data stream, but is built specifically for the destination printer in order to integrate with each printer's specific capabilities and command set, and to facilitate the interactive dialog between print server 120 and the printer. The IPDS data stream may be built dynamically at presentation time, e.g., on the fly in real time. Thus, the IPDS data stream is a device-dependent, bidirectional command/data stream.
According to one embodiment, control unit 130 processes and renders objects received from print server 120 and provides sheet maps for printing to print engine 160. Objects are captured and stored in printer capture storage 180.
In one embodiment, a user of printing system 100 may generate TLEs at print application 110. Particularly, application 110 provides a user interface that enables the user to define a TLE by describing the location of data within a defined area of the page. In such an embodiment, a TLE may be defined on the intermediate or last lines of the area.
For exemplary purposes, the TLE definition process will be described with reference to a United States (US) address block. However, the process may be implemented to define TLEs in any data mining application where text is in a variable location within a specific area of a page. For instance, a US address block typically includes between 3 and 5 lines of data. The positions of the lines may vary in different statements, but the address block usually appears within a defined area on a statement. That is, no address data is placed outside of this area, and no non-address data is placed inside it.
From such an address block, a user of print application 110 may wish to create zip code TLEs and, optionally, City/State TLEs. Further, a user may wish to define TLEs for all intermediate lines. TLEs in an AFP document are typically created based on the position of transparent data (TRNs) on the page. For example, if the value of a social security number (SSN) is always found at a fixed position on a page, the TRN can be used to create an SSN TLE reliably.
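A fixed-position TLE of the kind described for the SSN can be sketched as a lookup of the TRN at a known coordinate. The `TextRecord` and `Tle` types, the `fixed_position_tle` function, and the coordinates below are hypothetical; in an AFP document the TLE would be emitted as a structured field rather than as a Python object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRecord:
    """Hypothetical positioned text record (one TRN)."""
    x: int
    y: int
    text: str


@dataclass(frozen=True)
class Tle:
    """Hypothetical in-memory stand-in for a TLE: an attribute name and value."""
    name: str
    value: str


def fixed_position_tle(records, name, x, y):
    """Create a TLE from the TRN found exactly at (x, y), or None if absent."""
    for rec in records:
        if rec.x == x and rec.y == y:
            return Tle(name, rec.text.strip())
    return None


page = [
    TextRecord(500, 300, "Account statement"),
    TextRecord(4000, 300, "123-45-6789 "),  # SSN always printed here (assumed)
]
print(fixed_position_tle(page, "SSN", 4000, 300))
# Tle(name='SSN', value='123-45-6789')
```

The rule depends entirely on the exact coordinate, which is why it breaks for data, such as a zip code, whose line shifts with the size of the block.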
However, such a process will not work for a TLE like zip code, since the position of the zip code TRN varies with the number of address lines. Nonetheless, the zip code can be guaranteed to appear on a fixed line counted from the end of the address block, such as the last line or the penultimate line.
According to one embodiment, print application 110 facilitates the generation of a bounding box around a block of data and enables specification of one or more lines within the box that are used to extract one or more TLEs. For example, a bounding box may be generated around the address block, and a particular line may be specified from which to extract the zip code.
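The bounding-box approach can be sketched in Python under stated assumptions: each TRN is modeled as a hypothetical `TextRecord` with a baseline position, records sharing a baseline form one line, and lines are counted from the bottom of the box so that 1 selects the last line. The `Box`, `lines_in_box`, `line_from_end`, and `zip_tle` names and the zip code pattern are illustrative, not the print application's actual interface.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRecord:
    """Hypothetical positioned text record (one TRN)."""
    x: int
    y: int  # baseline; larger values are lower on the page
    text: str


@dataclass(frozen=True)
class Box:
    """User-drawn bounding box in the same (assumed) page units."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, rec: TextRecord) -> bool:
        return self.left <= rec.x <= self.right and self.top <= rec.y <= self.bottom


def lines_in_box(records, box):
    """Keep records inside the box, group them by baseline, and return the
    lines top to bottom, joining records on one baseline left to right."""
    rows = {}
    for rec in records:
        if box.contains(rec):
            rows.setdefault(rec.y, []).append(rec)
    return [" ".join(r.text for r in sorted(rows[y], key=lambda r: r.x))
            for y in sorted(rows)]


def line_from_end(records, box, n):
    """Return the n-th line from the bottom of the box (1 = last line),
    or None if the box holds fewer than n lines."""
    lines = lines_in_box(records, box)
    return lines[-n] if 1 <= n <= len(lines) else None


# Assumed US zip pattern at the end of a line: 5 digits, optional +4.
ZIP = re.compile(r"\b\d{5}(?:-\d{4})?(?=\s*$)")


def zip_tle(records, box):
    """Hypothetical zip code TLE rule: the zip code ends the last line."""
    last = line_from_end(records, box, 1)
    match = ZIP.search(last) if last else None
    return ("ZIP", match.group(0)) if match else None


box = Box(left=400, top=900, right=3000, bottom=2000)
header = TextRecord(500, 300, "Account statement")  # outside the box
three = [header,
         TextRecord(500, 1000, "J. Smith"),
         TextRecord(500, 1200, "1 Main St"),
         TextRecord(500, 1400, "Boulder CO 80301")]
four = [header,
        TextRecord(500, 1000, "J. Smith"),
        TextRecord(500, 1200, "Suite 4"),
        TextRecord(500, 1400, "1 Main St"),
        TextRecord(500, 1600, "Boulder CO 80301-1234")]

print(zip_tle(three, box))          # ('ZIP', '80301')
print(zip_tle(four, box))           # ('ZIP', '80301-1234')
print(line_from_end(four, box, 2))  # 1 Main St
```

Because the rule counts from the bottom of the box, the same definition works for both blocks. Intermediate lines could be selected the same way, for example as `lines_in_box(four, box)[1:-1]`.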
The above-described data extraction mechanism provides a way to define the location of the data unambiguously. As a result, fewer extraction errors occur than with the existing whole-page search and position-threshold methods.
Embodiments of the invention may include various steps as set forth above. The steps may be embodied in machine-executable instructions. The instructions can be used to cause a general-purpose or special-purpose processor to perform certain steps. Alternatively, these steps may be performed by specific hardware components that contain hardwired logic for performing the steps, or by any combination of programmed computer components and custom hardware components.
Elements of the present invention may also be provided as a machine-readable medium for storing the machine-executable instructions. The machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards, propagation media, or other types of media/machine-readable media suitable for storing electronic instructions. For example, the present invention may be downloaded as a computer program which may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a carrier wave or other propagation medium via a communication link (e.g., a modem or network connection).
Throughout the foregoing description, for the purposes of explanation, numerous specific details were set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention may be practiced without some of these specific details. Accordingly, the scope and spirit of the invention should be judged in terms of the claims which follow.