Leaking sensitive information is a major concern for corporations. A company may have social security numbers, birth dates, addresses, credit card information, salary information, medical information or other sensitive information about various people. A company may also have company sensitive information that includes company trade secrets, mailing lists, marketing strategy, and other information that a company would like to keep secret. This information may be leaked in many ways including through e-mail, downloading to a portable storage device, printing, and so forth. Guarding against sensitive information leakage in printing is particularly challenging.
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.
Briefly, aspects of the subject matter described herein relate to controlling sensitive information leakage in printing. In aspects, one or more interception units sit in the print path(s) of a device. The interception unit(s) receive print data that is generated on the device and extract information from the print data. The extracted information is used to determine whether the print data includes sensitive information. If the print data includes sensitive information, a policy is applied and the print data may or may not be forwarded towards a printer. Otherwise, the print data is forwarded towards a printer without applying a policy.
This Summary is provided to briefly identify some aspects of the subject matter that is further described below in the Detailed Description. This Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
The phrase “subject matter described herein” refers to subject matter described in the Detailed Description unless the context clearly indicates otherwise. The term “aspects” is to be read as “at least one aspect.” Identifying aspects of the subject matter described in the Detailed Description is not intended to identify key or essential features of the claimed subject matter.
The aspects described above and other aspects of the subject matter described herein are illustrated by way of example, and not by way of limitation, in the accompanying figures in which like reference numerals indicate similar elements and in which:
As used herein, the term “includes” and its variants are to be read as open-ended terms that mean “includes, but is not limited to.” The term “or” is to be read as “and/or” unless the context clearly dictates otherwise. Other definitions, explicit and implicit, may be included below.
Aspects of the subject matter described herein are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, or configurations that may be suitable for use with aspects of the subject matter described herein comprise personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microcontroller-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, personal digital assistants (PDAs), gaming devices, printers, appliances including set-top, media center, or other appliances, automobile-embedded or attached computing devices, other mobile devices, distributed computing environments that include any of the above systems or devices, and the like.
Aspects of the subject matter described herein may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. Aspects of the subject matter described herein may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
With reference to
The computer 110 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 110 and includes both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media.
Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVDs) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer 110.
Communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation,
The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media, discussed above and illustrated in
A user may enter commands and information into the computer 110 through input devices such as a keyboard 162 and pointing device 161, commonly referred to as a mouse, trackball, or touch pad. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, a touch-sensitive screen, a writing tablet, or the like. These and other input devices are often connected to the processing unit 120 through a user input interface 160 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB).
A monitor 191 or other type of display device is also connected to the system bus 121 via an interface, such as a video interface 190. In addition to the monitor, computers may also include other peripheral output devices such as speakers 197 and printer 196, which may be connected through an output peripheral interface 195.
The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in
When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 may include a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160 or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
As mentioned previously, leaking sensitive information is a major concern. It is particularly challenging to control information leakage in printing. Some examples of possible sensitive information have been described previously, but there is no intention for these examples to be all-inclusive or exhaustive. Indeed, based on the teachings herein, those skilled in the art may recognize many other types of sensitive information that may be prevented or hindered from leaking by aspects of the subject matter described herein.
The application 210 may include one or more processes that are capable of communicating with the printing API 215. The term “process” and its variants as used herein may include one or more traditional processes, threads, components, libraries, objects that perform tasks, and the like. A process may be implemented in hardware, software, or a combination of hardware and software. In an embodiment, a process is any mechanism, however called, capable of or used in performing an action. A process may be distributed over multiple devices or a single device. Likewise, the application 210 may have components that are distributed over one or more devices.
The printing API 215 comprises a programming interface that may be used by an application to print data. The printing API 215 may, for example, allow a process to submit print data, control print processes, and receive status about print jobs, printers, other printing information, and the like.
The spooler 220 provides a buffer in which data related to printing may be stored by the printing API 215 and accessed by the printing subsystem 225. This buffer may reside in volatile or non-volatile memory. The spooler 220 allows an application to quickly indicate information to be printed and then frees the application to do other tasks. The spooler 220 provides print information to the printing subsystem 225 at the rate at which the printing subsystem 225 needs the information. The spooler 220 may operate as a background task.
The printing subsystem 225 may include a print processor, a rendering engine, a print driver, or other components. Where needed, these components may assist in transforming print information provided by the spooler 220 into a printer-specific language that can be sent to a printer or back to the spooler 220. Sometimes, the print information buffered by the spooler 220 may already be in a format that is understood by one or more printers. In this case, the printing subsystem 225 may provide a communication path to the printer (e.g., by sending the information to a printer driver).
The interception unit 230 is a component that sits in the print path between the application 210 and the one or more printers 235. The interception unit 230 has an opportunity to examine data (sometimes called “print data”) sent between an application and a printer and is capable of extracting information from the data that may be used to determine whether the data contains sensitive information.
The content extractor 310 may extract information related to print jobs from the data received by the interception unit. Where the data has been transformed into an image or into commands that can be used to generate an image, the content extractor 310 may perform optical character recognition (OCR) on the image to obtain text from the image. Where the data is represented in a printing language, text corresponding to the original text of the document may be obtained from the data while printer commands may be discarded. Where the data includes meta-data or other content, the meta-data or other content may be extracted or may be discarded depending on the needs of the inspection engine 315.
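By way of example, and not limitation, the case in which the data is represented in a printing language might be sketched as follows. The sketch assumes PostScript-like input in which text is painted with the show operator, and it handles only simple string literals without nested or escaped parentheses. The names SHOW_PATTERN and extract_text are hypothetical and are not taken from the description above.

```python
import re

# Assumption: text appears as "(some text) show", as in simple PostScript.
# Nested or escaped parentheses are deliberately not handled in this sketch.
SHOW_PATTERN = re.compile(r"\(([^()\\]*)\)\s*show\b")


def extract_text(print_data: str) -> str:
    """Keep the text operands of 'show' and discard all other printer commands."""
    return " ".join(SHOW_PATTERN.findall(print_data))


sample_job = (
    "/Helvetica findfont 12 scalefont setfont\n"
    "72 720 moveto\n"
    "(Employee salary) show\n"
    "72 700 moveto\n"
    "(SSN 123-45-6789) show\n"
    "showpage\n"
)
extracted = extract_text(sample_job)
```

Here the font selection, positioning, and page commands are dropped, and only the two text fragments remain for inspection.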
The information extracted by the content extractor 310 may be sent to the content inspection service 305. With this information, the inspection engine 315 may determine whether the information includes sensitive information or not. Based on the teachings herein, those skilled in the art may recognize many techniques that may be used on the extracted information to determine whether the extracted information includes sensitive information or not. These techniques and others hereafter developed may be used without departing from the spirit or scope of aspects of the subject matter described herein.
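As one illustrative possibility only, an inspection engine might apply pattern matching to the extracted text. The description above does not prescribe any particular technique; the regular expressions, the Luhn checksum test for card-like numbers, and the function names below are assumptions made for this sketch.

```python
import re

# Hypothetical patterns; real deployments would likely use richer rules.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def contains_sensitive_information(text: str) -> bool:
    """Flag text that looks like it holds an SSN or a payment card number."""
    if SSN_PATTERN.search(text):
        return True
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 16 and luhn_valid(digits):
            return True
    return False
```

The checksum step is included to reduce false positives on digit runs such as order numbers; it is a design choice of the sketch, not a requirement of the subject matter described herein.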
If the content inspection service 305 determines that the data includes sensitive information, the content inspection service 305 may inform the interception unit 230. The interception unit 230 may then enforce a policy with respect to the data. Some exemplary policies include canceling the print job, presenting a user interface that asks the user to confirm that the user wishes to print the sensitive information, sending a message to a system administrator or the like that indicates that sensitive information has been printed, logging information that indicates a user that requested that the sensitive information be printed, establishing an audit trail with respect to the requester who requested to print sensitive information, taking some other action, or the like.
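The exemplary policies above might be modeled, in one non-limiting sketch, as a list of actions that an enforcer applies in order. The types PolicyAction, PrintJob, and PolicyEnforcer are hypothetical names introduced for illustration, and the confirmation callback stands in for whatever user interface an implementation presents.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Tuple


class PolicyAction(Enum):
    CANCEL = auto()
    CONFIRM_WITH_USER = auto()
    NOTIFY_ADMIN = auto()
    LOG_REQUESTER = auto()


@dataclass
class PrintJob:
    job_id: int
    user: str
    data: bytes


@dataclass
class PolicyEnforcer:
    actions: List[PolicyAction]
    confirm: Callable[[PrintJob], bool]  # stand-in for a confirmation dialog
    audit_log: List[Tuple[int, str]] = field(default_factory=list)
    admin_messages: List[str] = field(default_factory=list)

    def enforce(self, job: PrintJob) -> bool:
        """Apply each configured action; return True if the job may be forwarded."""
        forward = True
        for action in self.actions:
            if action is PolicyAction.CANCEL:
                forward = False
            elif action is PolicyAction.CONFIRM_WITH_USER:
                if not self.confirm(job):
                    forward = False
            elif action is PolicyAction.NOTIFY_ADMIN:
                self.admin_messages.append(
                    f"job {job.job_id} from {job.user} contains sensitive information"
                )
            elif action is PolicyAction.LOG_REQUESTER:
                self.audit_log.append((job.job_id, job.user))
        return forward
```

For instance, an enforcer configured with LOG_REQUESTER followed by CONFIRM_WITH_USER records the requester and forwards the job only if the user confirms.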
If the content inspection service 305 determines that the content does not include sensitive information, the content inspection service 305 may so inform the interception unit 230. In response, the interception unit 230 may allow the print data to proceed towards a printer.
In one embodiment, the content inspection service 305 and the inspection engine 315 may be included in the interception unit 230 as subcomponents. In some embodiments, they may reside on a different device than the device upon which the interception unit 230 resides. In other embodiments, they may reside on the same device as the interception unit 230.
Filters are print processing components that may perform various operations on data that travels to or from a printer. For example, a filter may apply a watermark to a page. As another example, a filter may apply color management to a page. In the case of the interception unit 230, the interception unit 230 may extract data from the print jobs and provide this data to the content inspection service 305. If the content inspection service 305 indicates that the data includes sensitive information, the interception unit 230 may take various actions to control leakage as has been described previously.
The interception unit 230 and the filters 405-407 may be ordered such that the interception unit 230 receives print data from the spooler 220 first and then sends the data if appropriate to the filter 405 which then sends the data to the filter 406, and so forth. Although the interception unit 230 is shown as first in the filter stack, in other embodiments, the interception unit 230 may be placed in other locations.
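The ordering described above might be sketched, under stated assumptions, as a chain of callables in which the interception stage runs first and may stop the job before any later filter sees it. The function names and the use of None to signal a stopped job are hypothetical conventions of this sketch, and the watermark and color management filters are placeholders rather than real rendering code.

```python
from typing import Callable, List, Optional

# A stage takes print data and returns it (possibly modified), or None to stop.
Stage = Callable[[bytes], Optional[bytes]]


def run_filter_stack(stages: List[Stage], print_data: bytes) -> Optional[bytes]:
    """Pass data through each stage in order; stop if any stage returns None."""
    data: Optional[bytes] = print_data
    for stage in stages:
        data = stage(data)
        if data is None:
            return None
    return data


def interception_stage(is_sensitive: Callable[[bytes], bool]) -> Stage:
    """Build a stage that withholds data the inspector flags as sensitive."""
    def stage(data: bytes) -> Optional[bytes]:
        return None if is_sensitive(data) else data
    return stage


def watermark_filter(data: bytes) -> bytes:
    return data + b"\n[WATERMARK]"  # placeholder for a real watermark


def color_management_filter(data: bytes) -> bytes:
    return data  # placeholder: no transformation in this sketch


stack = [
    interception_stage(lambda d: b"SECRET" in d),
    watermark_filter,
    color_management_filter,
]
```

Placing the interception stage elsewhere in the list corresponds to the other embodiments mentioned above, in which later stages would then see data that earlier filters have already transformed.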
The printing subsystem 225 may access a filter configuration file that describes how the filters of the printer driver are loaded, how they are called, and how data is passed between filters.
A print processor (e.g., the print processor 505) may be responsible for converting data in a spooled format into a format that can be sent to a print monitor. A print monitor may direct the data to the appropriate port driver. A print processor may also have other responsibilities including handling application requests to pause, resume, and cancel print jobs.
As shown in
If the content inspection service 305 indicates that the data does not include sensitive information, the interception unit 230 may pass the data unchanged to the print processor 505.
The other components 506-507, if any, may comprise a rendering engine, a print driver, or other components as needed or desired.
Returning to
As other examples, in one embodiment, the interception unit 230 may be placed such that it replaces or supplements the printing API. In another embodiment, the interception unit 230 may be implemented as a partial print provider. In yet another embodiment, the interception unit 230 may be implemented as a queue manager with two queues. Applications can print to one queue which the interception unit 230 extracts data from. If a print job is deemed non-sensitive, it may be moved to another print queue that a printer prints from. In yet another embodiment, the interception unit 230 may be placed in a networking stack to monitor for print jobs that are sent via a network.
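The queue manager embodiment might be sketched as follows. The class name TwoQueueManager is hypothetical, and holding sensitive jobs in a separate list is an assumption of this sketch; the description above states only that non-sensitive jobs may be moved to the queue a printer prints from.

```python
from collections import deque
from typing import Callable, Deque, List


class TwoQueueManager:
    """Applications print to one queue; a printer prints from the other."""

    def __init__(self, is_sensitive: Callable[[str], bool]) -> None:
        self.submission_queue: Deque[str] = deque()  # applications print here
        self.printer_queue: Deque[str] = deque()     # a printer prints from here
        self.held: List[str] = []                    # assumption: sensitive jobs are held
        self._is_sensitive = is_sensitive

    def submit(self, job: str) -> None:
        self.submission_queue.append(job)

    def process_pending(self) -> None:
        """Inspect each submitted job and move non-sensitive jobs onward."""
        while self.submission_queue:
            job = self.submission_queue.popleft()
            if self._is_sensitive(job):
                self.held.append(job)
            else:
                self.printer_queue.append(job)


manager = TwoQueueManager(is_sensitive=lambda job: "salary" in job)
manager.submit("cafeteria menu")
manager.submit("salary table")
manager.process_pending()
```

After processing, the first job sits in the printer queue while the second remains held, where a policy could then be applied to it.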
In a given implementation, there may be more than one instance of the interception unit 230. For example, the interception unit 230 may be placed before the spooler 220 and in the printing subsystem 225. As another example, there may be more than one interception unit 230 in the printing subsystem 225 to take care of multiple print paths.
Various of the components above may have responsibilities that include determining whether a print job is to be handled locally or across a network, determining a physical printer that will be used to print a print job, converting a data stream from a spooled format to a format that can be used by a printer, sending a data stream of a print job to a printer, and so forth.
Furthermore, in one embodiment, one or more of the components above may execute in user mode, in kernel mode, some other mode, a combination of the above, or the like.
Although the environment described above includes various components related to printing, it will be recognized that more, fewer, or different components may be employed without departing from the spirit or scope of aspects of the subject matter described herein. Furthermore, the components included in the environment may be configured in a variety of ways as will be understood by those skilled in the art without departing from the spirit or scope of aspects of the subject matter described herein.
Turning to
At block 615, the data is sent towards a print driver of the device. For example, referring to
At block 620, the print data is received at an interception unit on the device. For example, referring to
At block 625, a determination is made as to whether the print data includes sensitive information. If so, the actions continue at block 630; otherwise, the actions continue at block 635. For example, referring to
At block 630, a policy is enforced with respect to the print data. For example, referring to
At block 635, the print data is forwarded towards the printer. For example, referring to
At block 640, other actions, if any, may occur.
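The actions of blocks 620 through 635 might be summarized in a minimal sketch such as the one below. The function handle_print_data and its callback parameters are hypothetical; each callback stands in for a component described earlier (extraction, inspection, policy enforcement, and forwarding towards a printer).

```python
from typing import Callable


def handle_print_data(
    print_data: bytes,
    extract: Callable[[bytes], str],
    inspect: Callable[[str], bool],
    enforce_policy: Callable[[bytes], bool],
    forward: Callable[[bytes], None],
) -> str:
    """Route print data received at an interception unit (block 620)."""
    extracted = extract(print_data)
    if inspect(extracted):              # block 625: sensitive information?
        if enforce_policy(print_data):  # block 630: policy decides the outcome
            forward(print_data)
            return "forwarded_after_policy"
        return "blocked"
    forward(print_data)                 # block 635: no policy applied
    return "forwarded"
```

The two return values on the sensitive branch reflect that, once a policy is applied, the print data may or may not be forwarded towards a printer.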
As can be seen from the foregoing detailed description, aspects have been described related to controlling sensitive information leakage in printing. While aspects of the subject matter described herein are susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit aspects of the claimed subject matter to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of various aspects of the subject matter described herein.