1. Field of the Invention
This invention relates to decentralized virus scanning for stored data, such as for example in a networked environment.
2. Related Art
Computer networking and the Internet in particular offer end users unprecedented access to information of all types on a global basis. Access to information can be as simple as connecting some type of computing device using a standard phone line to a network. With the proliferation of wireless communication, users can now access computer networks from practically anywhere.
Connectivity of this magnitude has magnified the impact of computer viruses. Viruses such as “Melissa” and “I love you” had a devastating impact on computer systems worldwide. Costs for dealing with viruses are often measured in millions and tens of millions of dollars. Recently it was shown that hand-held computing devices are also susceptible to viruses.
Virus protection software can be very effective in dealing with viruses, and virus protection software is widely available for general computing devices such as personal computers. There are, however, problems unique to specialized computing devices, such as, for example, servers, file servers, storage systems, and devices of any kind performing storage and retrieval of data. Off-the-shelf virus protection software will not run on a specialized computing device unless it is modified to do so, and it can be very expensive to rewrite software to work on another platform.
A first known method is to scan for viruses at the data source. When the data is being provided by a specialized computing device the specialized computing device must be scanned. Device-specific virus protection software must be written in order to scan the files on the device.
While this first known method is effective in scanning files for viruses, it suffers from several drawbacks. First, a company with a specialized computing device would have to dedicate considerable resources to creating virus protection software and maintaining up-to-date data files that protect against new viruses as they emerge.
Additionally, although a manufacturer of a specialized computing device could enlist the assistance of a company that creates mainstream virus protection software to write the custom application and become a licensee, this would create other problems, such as reliance on the chosen vendor of the anti-virus software, compatibility issues when hardware upgrades are effected, and a large financial expense.
A second known method for protecting against computer viruses is to have the end user run anti-virus software on their client device. Anti-virus software packages are offered by such companies as McAfee and Symantec. These programs are loaded during the boot stage of a computer and work as a background job monitoring memory and files as they are opened and saved.
While this second known method is effective at intercepting and protecting the client device from infection, it suffers from several drawbacks. It places the burden of detection at the last possible link in the chain. If for any reason the virus is not detected prior to reaching the end user, it is now at the computing device where it will do the most damage (corrupting files and spreading to other computer users and systems).
It is much better to sanitize a file at the source from where it may be delivered to millions of end users rather than deliver the file and hope that the end user is prepared to deal with the file in the event the file is infected. End users often have older versions of anti-virus software and/or have not updated the data files that ensure the software is able to protect against newly discovered viruses, thus making detection at the point of mass distribution even more critical.
Also, hand-held computing devices are susceptible to viruses, but they are poorly equipped to handle them. Generally, hand-held computing devices have very limited memory resources compared to desktop systems. Dedicating a portion of these resources to virus protection severely limits the ability of the hand-held device to perform effectively. Reliable virus scanning at the information source is the most efficient and effective method.
Protecting against viruses is a constant battle. New viruses are created every day, requiring virus protection software manufacturers to come up with new data files (solution algorithms used by anti-virus applications). By providing protection at the source of the file, viruses can be eliminated more efficiently and effectively.
Security of data in general is important. Equally important is the trust of the end user. This comes from the reputation that precedes a company, and companies that engage in web commerce often live and die by their reputation. Just as an end user trusts that the credit card number they have just disclosed for a web-based sales transaction is secure, they want files they receive to be just as secure.
Accordingly, it would be desirable to provide a technique for scanning specialized computing devices for viruses and other malicious or unwanted content that may need to be changed, deleted, or otherwise modified.
The invention provides a method and system for performing specialized services for files at a server, such as scanning files at a storage system, filer, or other server performing storage and retrieval of data, for viruses by secondary computing devices. The server (such as a filer) is connected to one or more supplementary computing devices that scan requested files upon request to ensure they are virus free prior to delivery to end users. When an end user requests a file from the server the following steps occur: The server determines whether the file or other object requested by the user must be scanned before delivery to, or after use by, the user. The server opens a channel to one of the external computing devices and sends the filename (or some other designator of the file or object, such as a file handle or an i-node pointer; “filename,” “file name space” and the like refer to the collection of possible designators for files or other types of object). The external computing device opens the file and scans it. After possibly taking remedial actions (such as, for example, cleaning the file of the virus, quarantining or deleting the file), the external computing device notifies the filer of the status of the file scan operation. The server sends the file to the end user provided the status indicates it may do so.
This system is very efficient and effective, as a file needs only to be scanned one time for a virus unless the file has been modified or new data files that protect against new viruses have been added. Scan reports for files that have been scanned may be stored in one or more of the external computing devices, in one or more servers, and some portion of a scan report may be delivered to end users.
In alternative embodiments of the invention one or more of the external computing devices may be running other supplementary applications, such as data compression and decompression, data encryption and decryption, and database compaction, independently or in some combination.
In the following description, a preferred embodiment of the invention is described with regard to preferred process steps and data structures. Those skilled in the art would recognize after perusal of this application that embodiments of the invention can be implemented using one or more general purpose processors or special purpose processors or other circuits adapted to particular process steps and data structures described herein, and that implementation of the process steps and data structures described herein would not require undue experimentation or further invention.
Lexicography
The following terms refer or relate to aspects of the invention as described below. The descriptions of general meanings of these terms are not intended to be limiting, only illustrative.
One type of storage system is a file server. A file server or filer includes a computer that provides file services relating to the organization of information on writeable persistent storage devices, such as memories, tapes or disks of an array. The filer might include a storage operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on, e.g., the disks. Each “on-disk” file may be implemented as a set of data structures, e.g., disk blocks, configured to store information, such as the actual data for the file. A directory, on the other hand, might be implemented as a specially formatted file in which information about other files and directories is stored. In general, the term “storage operating system” refers to computer-executable code that implements data storage functionality, such as file system semantics, and manages data access. A storage operating system can be implemented as an application program operating over a general-purpose operating system, such as UNIX® or Windows NT®, or as a general-purpose operating system with storage functionality or with configurable functionality that is configured for storage applications, or as a special-purpose operating system dedicated to performing a limited range of functionality including storage and related tasks in storage appliances and other devices.
A storage system may be further configured to operate according to a client/server model of information delivery to thereby allow many clients to access files stored on a server, e.g., the storage system. In this model, the client may comprise an application executing on a computer that “connects” to the storage system over a computer network, such as a point-to-point link, shared local area network, wide area network or virtual private network implemented over a public network, such as the Internet. Each client may request the services of the file system on the storage system by issuing file system protocol messages (in the form of packets) to the system over the network. It should be noted, however, that the storage system may alternatively be configured to operate as an assembly of storage devices that is directly-attached to a (e.g., client or “host”) computer. Here, a user may request the services of the file system to access (i.e., read and/or write) data from/to the storage devices.
Although the invention is described herein with reference to a “filer,” there is no particular limitation of the invention to filers, file servers, storage systems, or similar devices. It would be clear to those skilled in the art, after perusal of this application, how to implement the ideas and techniques described herein for all types of server devices. Such implementations would not require any undue experimentation or further invention, and are within the scope and spirit of the invention.
For example, but without limitation, a particular client device in a first relationship with a first server device, can serve as a server device in a second relationship with a second client device. In a preferred embodiment, there are generally a relatively small number of server devices servicing a relatively larger number of client devices.
For example, but without limitation, the client device and the server device in a client-server relation can actually be the same physical device, with a first set of software elements serving to perform client functions and a second set of software elements serving to perform server functions.
Although the invention is described with regard to a client-server model, there is no particular requirement in the invention that the stored data is maintained and communicated to users using a client-server model. For example, other forms of distributed computing in which a user request for access to data objects triggers decentralized processing by one or more of a set of computing devices would also be within the scope and spirit of the invention.
As noted above, these descriptions of general meanings of these terms are not intended to be limiting, only illustrative. Other and further applications of the invention, including extensions of these terms and concepts, would be clear to those of ordinary skill in the art after perusing this application. These other and further applications are part of the scope and spirit of the invention, and would be clear to those of ordinary skill in the art, without further invention or undue experimentation.
System Elements
A system 100 includes a client device 110 associated with a user 111, a communications network 120, a filer 130, and a processing cluster 140.
The client device 110 includes a processor, a main memory, and software for executing instructions (not shown, but understood by one skilled in the art). Although the client device 110 and filer 130 are shown as separate devices there is no requirement that they be physically separate.
In a preferred embodiment, the communication network 120 includes the Internet. In alternative embodiments, the communication network 120 may include alternative forms of communication, such as an intranet, extranet, virtual private network, direct communication links, or some other combination or conjunction thereof.
A communications link 115 operates to couple the client device 110 to the communications network 120.
The filer 130 includes a processor, a main memory, software for executing instructions (not shown, but understood by one skilled in the art), and a mass storage 131. Although the client device 110 and filer 130 are shown as separate devices there is no requirement that they be separate devices. Moreover, although the invention is described with regard to a single filer 130, the invention is equally applicable to sets of filers 130 operating with the processing cluster 140. A set of multiple filers 130 might each one operate independently and each one make individual use of the processing cluster 140, or might operate in conjunction as a group and make use of the processing cluster 140 as a collective entity, or some combination thereof. Since, as noted below, the processing cluster 140 can include one or more cluster devices 141, the invention can be performed with any set of M filers and any set of N processors. There is no particular requirement that M or N must be fixed; either filers 130 or cluster devices 141 might be added by operator command or by a handshaking protocol while filers 130 and cluster devices 141 are operating. The filer 130 is connected to the communications network 120.
The filer 130 includes a set of configuration information 137 disposed so that a processor for the filer 130 can readily access that configuration information 137. In a preferred embodiment, the filer 130 includes software instructions for reviewing, reporting, editing, or modifying the configuration information 137, as directed by an operator, or possibly by a remote user having designated privileges. The configuration information 137 includes the following:
The mass storage 131 includes at least one file 133 that is capable of being requested by a client device 110. The processing cluster 140 includes one or more cluster devices 141, each including a processor, a main memory, software for executing instructions, and a mass storage (not shown but understood by one skilled in the art). Although the filer 130 and the processing cluster 140 are shown as separate devices, there is no requirement that they be separate devices.
In a preferred embodiment the processing cluster 140 is a plurality of personal computers in an interconnected cluster capable of intercommunication and direct communication with the filer 130. There is no particular requirement that the processing cluster 140 must be organized as a unified cluster, or must be local to the filer 130, or must be homogeneous in the nature of the processing devices, or have any other particular characteristics. For example, in alternative embodiments, the processing cluster 140 includes a set of PC's, workstations, servers, or other devices, coupled to the filer 130 by means of a network such as the Internet.
In a preferred embodiment, cluster devices 141 in the processing cluster 140 register their presence with the filer 130, thus giving the filer 130 knowledge of their availability to perform scanning (or other) operations. While this is preferred, there is no particular requirement for the invention for registration, as the filer 130 may in alternative embodiments be configured to send out “John Doe” requests for cluster devices 141 to process files requested by the user.
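For example, but without limitation, the registration described above might be sketched in Python as follows. The class name `Filer`, its methods, and the device identifiers and addresses are hypothetical names chosen for illustration only; the specification does not prescribe any particular registration interface or message format.

```python
class Filer:
    """Hypothetical stand-in for the filer 130; tracks registered scanners."""

    def __init__(self):
        self._registered = {}  # device id -> network address

    def register(self, device_id, address):
        # A cluster device announces that it is available for scanning.
        self._registered[device_id] = address

    def unregister(self, device_id):
        # Forget a device that has withdrawn or been found unavailable.
        self._registered.pop(device_id, None)

    def available_devices(self):
        # Sorted so that callers see a stable order.
        return sorted(self._registered)


filer = Filer()
filer.register("scanner-a", "10.0.0.11")
filer.register("scanner-b", "10.0.0.12")
filer.unregister("scanner-a")
remaining = filer.available_devices()
```

In this sketch the filer learns of a device only through an explicit call; the alternative embodiment in which the filer sends out unaddressed requests is not modeled.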
The cluster link 135 operates to connect the processing cluster 140 to the filer 130. The cluster link 135 may include non-uniform memory access (NUMA), or communication via an intranet, extranet, virtual private network, direct communication links, or some other combination or conjunction thereof.
Method of Operation
A method 200 includes a set of flow points and a set of steps. The system 100 performs the method 200. Although the method 200 is described serially, the steps of the method 200 can be performed by separate elements in conjunction or in parallel, whether asynchronously, in a pipelined manner, or otherwise. There is no particular requirement that the method 200 be performed in the same order in which this description lists the steps, except where so indicated.
At a flow point 210, the system 100 is ready to begin performing the method 200.
At a step 211, a user 111 utilizes the client device 110 to initiate a request for a file 133. The request is transmitted to the filer 130 via the communications network 120. In a preferred embodiment the filer 130 is an independent file server performing file retrieval and storage in response to a file server protocol such as NFS or CIFS. In alternative embodiments, the filer 130 might be a supplemental storage device or file maintenance server operating at the direction of another server, such as a web server.
At a step 212, the filer 130 receives the request for the file 133 and determines if the file 133 must be scanned for a virus. As part of this step, the filer 130 performs the following sub-steps:
At a step 213, the filer 130, having determined that the file 133 should be scanned, sends the file ID and path of the file 133 to the processing cluster 140 where it is received by one of the cluster devices 141. As part of this step, the filer 130 performs the following sub-steps:
At a step 215, the cluster device 141 uses the file ID and path to open the file 133 in the mass storage 131 of the filer 130.
At a step 217, the cluster device 141 scans the file 133 for viruses. In a preferred embodiment, files are tasked to the processing cluster 140 in a round robin fashion. In alternative embodiments files may be processed individually by a cluster device 141, by multiple cluster devices 141 simultaneously, or some combination thereof. Load balancing may be used to ensure maximum efficiency of processing within the processing cluster 140.
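By way of illustration only, round robin tasking of files to cluster devices might be sketched in Python as follows. The function name `assign_round_robin` and the file and device identifiers are assumptions made for this example; the sketch does not model load balancing or simultaneous processing by multiple devices.

```python
from itertools import cycle


def assign_round_robin(file_ids, device_ids):
    """Pair each file with the next device in a repeating rotation."""
    if not device_ids:
        raise ValueError("no cluster devices available")
    rotation = cycle(device_ids)
    return [(file_id, next(rotation)) for file_id in file_ids]


assignments = assign_round_robin(["f1", "f2", "f3"], ["dev-1", "dev-2"])
```

With three files and two devices, the third file wraps around to the first device.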
In a preferred embodiment, the filer 130 groups cluster devices 141 into one or more classes, such as primary and secondary, where all primary cluster devices 141 are assigned, followed by secondary cluster devices 141. This allows an operator to direct the filer 130 to use a first cluster device 141, such as for example available using a relatively rapid connection, exclusively, but when the first cluster device 141 is unavailable for any reason, to fall back to using a second designated cluster device 141, such as for example available using a much less rapid connection.
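For example, but without limitation, the fallback from primary to secondary cluster devices might be sketched in Python as follows. The function `pick_device`, the availability callback, and the device names are hypothetical; how the filer actually determines that a device is unavailable is not specified here and is represented by a simple lookup.

```python
def pick_device(classes, is_available):
    """Return the first available device, trying classes in priority order.

    classes: list of lists, e.g. [primary_devices, secondary_devices].
    is_available: callable reporting whether a device can accept work.
    """
    for devices in classes:
        for device in devices:
            if is_available(device):
                return device
    return None  # no device in any class can accept work


classes = [["fast-1"], ["slow-1", "slow-2"]]
status = {"fast-1": False, "slow-1": True, "slow-2": True}
chosen = pick_device(classes, lambda device: status[device])
```

Here the primary device is marked unavailable, so the first secondary device is chosen; once the primary device is available again it is preferred.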
In certain embodiments, an operation offloaded by the filer 130 to the cluster 140 may include a plurality of individual processes, each of which may be performed at a separate cluster device 141 in the cluster 140.
There are several vendors offering virus protection software for personal computers, thus the operator of the filer 130 may choose whatever product they would like to use that supports the communication protocol with the filer 130 described herein. They may even use combinations of vendors' products in the processing cluster 140, when those combinations can operate using the communication protocol with the filer 130 described herein. In alternative embodiments, the filer 130 may operate with forms of virus protection software that do not support the communication protocol with the filer 130 described herein, with some features (such as the timeout and ARE-YOU-WORKING? message) not available to those forms of virus protection software. In further alternative embodiments of the invention, continual scanning of every file 133 on the filer 130 may take place.
The processing cluster 140 is highly scalable. The price of personal computers is low compared to dedicated devices, such as filers, therefore this configuration is very desirable. Additionally, a cluster configuration offers redundant systems availability in case a cluster device 141 fails; failover and takeover are also possible within the processing cluster.
The cluster device 141 is assigned a special type of access (herein called “OPEN-FOR-SCANNING”), so that the cluster device 141 can scan the file 133 regardless of whether it is already locked by another user. In a preferred embodiment, OPEN-FOR-SCANNING mode is restricted to those devices the filer 130 can verify are actually cluster devices 141. In a preferred embodiment, the filer 130 can restrict OPEN-FOR-SCANNING mode to devices according to one or more of the following criteria:
In a preferred embodiment, OPEN-FOR-SCANNING mode access is restricted to processes running as an NT “Service” on the cluster device 141. Thus, a selected cluster device 141 might be in use by a user having no particularly special privileges, while the cluster device 141 concurrently operates with a service running as “Administrator” and thus being allowed by the filer 130 to have OPEN-FOR-SCANNING mode access.
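By way of illustration only, the effect of OPEN-FOR-SCANNING mode on file locks might be sketched in Python as follows. The class `FileLocks`, its methods, the path, and the requester names are hypothetical. The set of verified scanners is simply supplied to the constructor; the manner in which the filer verifies a cluster device is not modeled in this sketch.

```python
class FileLocks:
    """Hypothetical lock table kept by the filer for its files."""

    def __init__(self, verified_scanners):
        self._locked_by = {}                     # path -> holder of the lock
        self._verified = set(verified_scanners)  # assumed already verified

    def lock(self, path, holder):
        self._locked_by[path] = holder

    def may_open(self, path, requester, mode="READ"):
        # A verified scanner in OPEN-FOR-SCANNING mode ignores existing locks.
        if mode == "OPEN-FOR-SCANNING":
            return requester in self._verified
        # Ordinary opens succeed only if the file is unlocked or self-locked.
        holder = self._locked_by.get(path)
        return holder is None or holder == requester


locks = FileLocks({"scanner-a"})
locks.lock("/vol/report.doc", "alice")
```

In this sketch, an ordinary user is refused a file locked by another user, while a verified scanner may still open it for scanning; an unverified requester asking for OPEN-FOR-SCANNING mode is refused.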
At a step 219, the cluster device 141 transmits a scan report to the filer 130. The scan report primarily reports whether the file is safe to send. Further information may be saved for statistical purposes (for example, how many files have been identified as infected, was the virus software able to sanitize the file or was the file deleted) to a database. The database may be consulted to determine whether the file 133 needs to be scanned before delivery upon receipt of a subsequent request. If the file 133 has not changed since it was last scanned and no additional virus data files have been added to the processing cluster, the file 133 probably does not need to be scanned. This means the file 133 can be delivered more quickly.
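For example, but without limitation, the decision whether a file needs to be scanned again might be sketched in Python as follows. The functions `needs_scan` and `record_scan`, the use of a dictionary as the database, and the integer version counters for the file and for the virus data files are all assumptions made for illustration; an actual embodiment might use modification times, checksums, or other indicators.

```python
def needs_scan(scan_db, path, file_version, definitions_version):
    """Return True unless a scan of this exact state is already on record."""
    record = scan_db.get(path)
    if record is None:
        return True  # the file has never been scanned
    # Rescan if the file changed or new virus data files were added.
    return record != (file_version, definitions_version)


def record_scan(scan_db, path, file_version, definitions_version):
    """Remember which file state and virus data files a scan covered."""
    scan_db[path] = (file_version, definitions_version)


scan_db = {}
first_request = needs_scan(scan_db, "/vol/a.doc", 1, 7)
record_scan(scan_db, "/vol/a.doc", 1, 7)
repeat_request = needs_scan(scan_db, "/vol/a.doc", 1, 7)
after_update = needs_scan(scan_db, "/vol/a.doc", 1, 8)
```

The repeat request for an unchanged file is the case in which the scan can be skipped and the file delivered more quickly.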
Other intermediary applications may also run separately, in conjunction with other applications, or in some combination thereof within the processing cluster 140. Compression and encryption utilities are some examples of these applications. These types of applications, including virus scanning, can be very CPU intensive, thus outsourcing can yield better performance by allowing a dedicated device like a filer to do what it does best and farm out other tasks to the processing cluster 140.
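By way of illustration only, a cluster device offering more than one supplementary application might dispatch requests by service name, as in the following Python sketch. The table `SERVICES` and the function `run_service` are hypothetical names; compression is shown using the standard `zlib` module merely as a convenient example of a CPU-intensive operation that could be offloaded.

```python
import zlib

# Hypothetical table of services offered by a cluster device.
SERVICES = {
    "compress": zlib.compress,
    "decompress": zlib.decompress,
}


def run_service(name, data):
    """Apply the named service to the data, or reject unknown services."""
    try:
        handler = SERVICES[name]
    except KeyError:
        raise ValueError(f"unsupported service: {name}") from None
    return handler(data)


original = b"filer data " * 100
packed = run_service("compress", original)
restored = run_service("decompress", packed)
```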
As part of this step, the filer 130 might also perform the following sub-steps:
At a step 221, the filer 130 transmits or does not transmit the file 133 to the client 110 based on its availability as reported following the scan by the processing cluster 140. Some portion of the scan report may also be transmitted to the user. As part of this step, the filer 130 performs the following sub-steps:
At this step, a request for a file 133 has been received, the request has been processed, and if possible a file 133 has been delivered. The process may be repeated at step 211 for subsequent requests.
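For example, but without limitation, the steps 211 through 221 might be sketched end to end in Python as follows. The function `handle_request`, its parameters, and the file paths are hypothetical. The scanner shown merely looks for a marker string and stands in for whatever virus protection software runs on a cluster device; it is not a real detection technique.

```python
def handle_request(path, files, must_scan, scan):
    """Return the file contents if delivery is permitted, else None.

    files: dict mapping path -> bytes (stand-in for the mass storage 131).
    must_scan: callable deciding whether the file needs a scan (step 212).
    scan: callable standing in for a cluster device; True means safe.
    """
    if path not in files:
        return None
    if must_scan(path) and not scan(path, files[path]):
        return None        # step 221: the file is withheld
    return files[path]     # step 221: the file is delivered


def toy_scan(path, data):
    # Placeholder scanner: treats a marker string as an infection.
    return b"EVIL" not in data


files = {"/a.txt": b"hello", "/b.txt": b"xxEVILxx"}
clean = handle_request("/a.txt", files, lambda p: True, toy_scan)
blocked = handle_request("/b.txt", files, lambda p: True, toy_scan)
```

Remedial actions such as cleaning or quarantining the file, and the scan report itself, are omitted from this sketch.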
Generality of the Invention
The invention has wide applicability and generality to other aspects of processing requests for files.
The invention is applicable to one or more of, or some combination of, circumstances such as those involving:
Although preferred embodiments are disclosed herein, many variations are possible which remain within the concept, scope, and spirit of the invention, and these variations would become clear to those skilled in the art after perusal of this application.
Number | Date | Country | |
---|---|---|---|
20020103783 A1 | Aug 2002 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 09728701 | Dec 2000 | US |
Child | 10010959 | | US |