1. Technical Field
The present disclosure relates in general to the field of data processing, and, in particular, to computers that utilize software files. Still more particularly, the present disclosure relates to scoring the health integrity of software files.
2. Description of the Related Art
At a high conceptual level, a computer can be understood as hardware that, under the control of an operating system, executes instructions that are in an application program. The application program manipulates data found in data files, which are persistently stored on devices such as hard disk drives. When the application is a database program, the files are known as “database files.” These database files are often maintained by a service provider and utilized by the service provider's customers.
Customers become frustrated when attempting to use a database file only to eventually discover that the database is damaged and cannot be used. A database file may be damaged because the definitional information of the file is corrupted. Alternatively, database files may have been corrupted a long time ago and the damage has remained hidden, only to suddenly surface through an interface that does not detect the damage but presents an odd assortment of information to the user. Examples of such bizarre information surfacing are: column headings in the format being overlaid with invalid data; Default Value structures being out of place; and/or a Structured Query Language (SQL) alias (long name) disappearing from the column/field definition. This corruption can be due to part of a file being damaged and disappearing, hardware problems causing only partial information to be saved on a disk, or software problems resulting in bits and bytes being modified when they should not be.
To address the problems described above associated with corrupted database files, the present invention presents a method, system and computer-readable medium for scoring the health of a database file. In a preferred embodiment, the method includes the steps of: retrieving a plurality of file attributes from a file in a database; determining if at least one of the file attributes is damaged; and creating a health score for the file based on what percentage of the file attributes for the file are damaged.
The above, as well as additional, purposes, features, and advantages of the present invention will become apparent in the following detailed written description.
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further purposes and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, where:
With reference now to the figures, and in particular to
If the database file DOES exist (again at query block 110), then the file definition is processed by verifying that a file header of the file definition shows valid addresses to other objects that can be called to or from the database (block 116). This permits an initial health score (block 118) to be calculated. If there are too many invalid pointers (or addresses) to other objects in the main file definition, then the health score of the database is not acceptable (query block 120), and a “Low Health” message (block 122) is sent to DFHSP 248 for preparation of a final health score for the database file (block 124).
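By way of illustration only, the header verification and initial health score of blocks 116 through 120 might be sketched as follows. The names `FileHeader`, `initial_health_score`, `header_check`, and the `min_score` threshold are hypothetical and do not appear in the disclosure; in particular, the disclosure says only that "too many" invalid pointers make the score unacceptable, so the 50 percent cutoff below is an assumed placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class FileHeader:
    """Hypothetical stand-in for the file header of a main file definition."""
    # Addresses of other objects referenced by the header; None models a missing pointer.
    object_addresses: List[Optional[int]]


def initial_health_score(header: FileHeader, known_addresses: Set[int]) -> float:
    """Return the percentage of header addresses that resolve to a known object."""
    total = len(header.object_addresses)
    if total == 0:
        return 100.0  # nothing to verify, so nothing is invalid
    valid = sum(1 for addr in header.object_addresses if addr in known_addresses)
    return 100.0 * valid / total


def header_check(header: FileHeader, known_addresses: Set[int],
                 min_score: float = 50.0) -> Tuple[float, str]:
    """Score the header and report "Low Health" when the score is below min_score (assumed cutoff)."""
    score = initial_health_score(header, known_addresses)
    status = "OK" if score >= min_score else "Low Health"
    return score, status
```

In this sketch, a header whose addresses mostly fail to resolve yields a "Low Health" status without any further attribute processing, mirroring the early exit at query block 120.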
However, if the health score determined in block 118 is acceptable, then attributes of the database file are determined (block 126). Each file attribute is then processed to determine if that file attribute is valid or damaged (block 128). Attributes that are examined include, but are not limited to, the following ten attributes.
1. Addressability to Composite objects.
The database file is a composite object that can be used by multiple applications. To access the database, the applications must be able to read various information in the file attributes. In an exemplary database file such as the IBM® iSeries™ database file, these addressability attributes include, but are not limited to, the File Control Block (FCB), the File Constraint Space (FCS), the Trigger Definition Space (TDS), the Record Format (FMT), the Column Extension Space (CES), file directories, Member (MBR) name, Data Space, Indexes and associated space, and Group Space (GRPSPC).
FCB is defined as a file system structure that describes the attributes of the database file. Information in the FCB includes the name of the drive from which the database file was retrieved, the file name, the file type, implementation dependent (variable) information and record numbers.
FCS, TDS, and FMT are defined as internal parameters that are needed to call an Application Program Interface (API) that is used to access the database file.
CES, file directories, Data Space, Indexes and associated space, and GRPSPC are defined as various parameters that describe the structure, size and naming of files in the database.
MBR is defined as including the cursor and the Open Data Path (ODP). A member is one of several different sets of data, each having a same format, within the database file. The cursor is a control structure that points to a row of data in the database file. The ODP is a control block that exists only when a file is open, and contains information about merged file attributes and information returned by input or output operations to the database file.
Offsets describe a number of measuring units from an arbitrary starting point in the database file to some other point in the database file. The offsets are evaluated to ensure that they actually point to some other point in the database file, and are not so large that they point to an address outside the database file. Furthermore, the offsets are evaluated to ensure that they do not go beyond the maximum Machine Interface (MI) object size. The maximum MI defines the size of the object that the offset is allowed to traverse. If an offset attempts to point too far up or down within an object, then the offset is deemed corrupt.
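As a non-limiting sketch, the offset evaluation described above could take a form such as the following. The function name `offset_is_valid` and its parameters are hypothetical, and the maximum MI object size is passed in as an argument because the disclosure does not state its value.

```python
def offset_is_valid(offset: int, start: int, file_size: int,
                    max_mi_object_size: int) -> bool:
    """Return True if an offset from `start` lands inside the file and within the MI limit.

    offset             -- signed distance, in measuring units, from the starting point
    start              -- position of the starting point within the file
    file_size          -- size of the database file in the same units
    max_mi_object_size -- assumed upper bound on how far an offset may traverse
    """
    # An offset may point up or down, so compare its magnitude with the MI limit.
    if abs(offset) > max_mi_object_size:
        return False
    # The target must be an actual position inside the database file.
    target = start + offset
    return 0 <= target < file_size
```

An offset that fails either test would be counted as a corrupt attribute.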
The names of objects (files, columns, formats, etc.) in the database file are evaluated for correctness and validity. That is, the names are evaluated to ensure that they are in the proper nomenclature format, and that they do not violate naming protocol (e.g., using a prohibited name, etc.).
The data in the database files are evaluated to ensure that they neither exceed their maximum allowed length nor fall short of their required minimum length. Data that is “too long” or “too short” is assumed to be corrupted.
Different files may have distinct bit patterns, or attributes. For example, a physical file cannot have join file information. If a physical file has such attributes, then it is assumed to be in conflict, and thus invalid.
Pointers and addresses are examined to ensure that they are not NULL (no value) or contain addresses of destroyed (“erased”) data objects.
An SLIC database includes a dataspace object to actually hold data, cursors to point to the dataspace object for reading the data, and dataspace indexes that are used for lookups against entries in a directory. If any of these components is damaged, the SLIC database is unhealthy to some degree.
Constraints are essential requirements of a database file, including an object from which a unique resource set can be inherited. Constraints are evaluated to ensure compatibility between the SLIC database structure, the FCS, and data in a system's cross reference files.
Triggers are defined as code that causes a trigger application, which accesses the database files, to execute. Triggers are evaluated to ensure that 1) the trigger application exists and 2) that the trigger application matches a trigger definition in the TDS.
Data Link information must exist in the Data Link File Manager (DLFM) for a file with FILE LINK CONTROL.
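For illustration, a few of the attribute checks described above (length limits, pointer validity, and conflicting file-type information) might be sketched as simple predicates whose results are collected per attribute. All names below, including the `DESTROYED` sentinel and the `"physical"` file-type label, are hypothetical conveniences and are not taken from the disclosure.

```python
from typing import Dict, List

# Hypothetical sentinel modeling a pointer to a destroyed ("erased") data object.
DESTROYED = object()


def length_ok(value: str, min_len: int, max_len: int) -> bool:
    """Data is valid only if it is neither too short nor too long."""
    return min_len <= len(value) <= max_len


def pointer_ok(pointer: object) -> bool:
    """A pointer is valid only if it is not NULL and does not reference a destroyed object."""
    return pointer is not None and pointer is not DESTROYED


def file_type_ok(file_type: str, has_join_info: bool) -> bool:
    """A physical file must not carry join file information."""
    return not (file_type == "physical" and has_join_info)


def damaged_attributes(results: Dict[str, bool]) -> List[str]:
    """Given attribute name -> True if valid, return the names of damaged attributes."""
    return sorted(name for name, ok in results.items() if not ok)
```

Collecting the results by attribute name makes it possible to report which attributes failed, in addition to how many.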
Referring again to
Besides a raw pass/fail score, each failed file attribute will be returned with additional data for evaluation/correction purposes. This data will include:
If there is not any file attribute corruption at all (query block 134), then a “perfect health” score message is sent (block 136) to an evaluation program, such as DFHSP 248 shown in
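One possible way to derive a health score from the per-attribute results is sketched below. The disclosure states only that the score is based on what percentage of the file attributes are damaged; expressing the score as 100 minus that percentage, and the function name `health_score`, are assumptions made for this example.

```python
from typing import Dict, Tuple


def health_score(results: Dict[str, bool]) -> Tuple[float, float, str]:
    """Compute (damaged_percent, score, message) from attribute name -> True if valid.

    The score is assumed here to be 100 minus the percentage of damaged attributes.
    """
    if not results:
        raise ValueError("at least one file attribute is required")
    damaged = sum(1 for ok in results.values() if not ok)
    damaged_percent = 100.0 * damaged / len(results)
    score = 100.0 - damaged_percent
    # With no corruption at all, a "perfect health" message is reported.
    message = "perfect health" if damaged == 0 else "damaged attributes found"
    return damaged_percent, score, message
```

For example, a file with one damaged attribute out of four would have 25 percent of its attributes damaged and, under the stated assumption, a score of 75.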
With reference now to
Database file server 202 is able to communicate with a client computer 250 via a network 228 using a network interface 230, which is coupled to system bus 206. Network 228 may be an external network such as the Internet, or an internal network such as an Ethernet or a Virtual Private Network (VPN). Client computer 250 requests and utilizes database files 254, which are stored in the hard drive 234 of the database file server 202, from database file server 202.
A hard drive interface 232 is also coupled to system bus 206. Hard drive interface 232 interfaces with the hard drive 234, which, as described above, stores the database files 254 that are the subject of the database file scoring described above.
In a preferred embodiment, hard drive 234 populates a system memory 236, which is also coupled to system bus 206. System memory is defined as a lowest level of volatile memory in database file server 202. This volatile memory includes additional higher levels of volatile memory (not shown), including, but not limited to, cache memory, registers and buffers. Data that populates system memory 236 includes database file server 202's operating system (OS) 238 and application programs 244.
OS 238 includes a shell 240, for providing transparent user access to resources such as application programs 244. Generally, shell 240 is a program that provides an interpreter and an interface between the user and the operating system. More specifically, shell 240 executes commands that are entered into a command line user interface or from a file. Thus, shell 240 (as it is called in UNIX®), also called a command processor in Windows®, is generally the highest level of the operating system software hierarchy and serves as a command interpreter. The shell provides a system prompt, interprets commands entered by keyboard, mouse, or other user input media, and sends the interpreted command(s) to the appropriate lower levels of the operating system (e.g., a kernel 242) for processing. Note that while shell 240 is a text-based, line-oriented user interface, the present invention will equally well support other user interface modes, such as graphical, voice, gestural, etc.
As depicted, OS 238 also includes kernel 242, which includes lower levels of functionality for OS 238, including providing essential services required by other parts of OS 238 and application programs 244, including memory management, process and task management, disk management, and mouse and keyboard management.
Application programs 244 include a browser 246. Browser 246 includes program modules and instructions enabling a World Wide Web (WWW) client (i.e., database file server 202) to send and receive network messages to and from the Internet using HyperText Transfer Protocol (HTTP) messaging, thus enabling communication with client computer 250. In one embodiment of the present invention, client computer 250 and software deploying server 252 may each utilize a same or substantially similar architecture as shown and described for database file server 202.
Also stored with system memory 236 is a Database File Health Score Program (DFHSP) 248, which includes some or all software code needed to perform the steps described in
The hardware elements depicted in database file server 202 are not intended to be exhaustive, but rather are representative to highlight essential components required by the present invention. For instance, database file server 202 may include alternate memory storage devices such as magnetic cassettes, Digital Versatile Disks (DVDs), Bernoulli cartridges, and the like. These and other variations are intended to be within the spirit and scope of the present invention.
Note further that, in a preferred embodiment of the present invention, software deploying server 252 performs all of the functions associated with the present invention (including execution of DFHSP 248), thus freeing database file server 202 from having to use its own internal computing resources to execute DFHSP 248.
It is to be understood that at least some aspects of the present invention may alternatively be implemented in a computer-useable medium that contains a program product. Programs defining functions of the present invention can be delivered to a data storage system or a computer system via a variety of signal-bearing media, which include, without limitation, non-writable storage media (e.g., CD-ROM), writable storage media (e.g., hard disk drive, read/write CD-ROM, optical media), and communication media, such as computer and telephone networks including Ethernet, the Internet, wireless networks, and like network systems. It should be understood, therefore, that such signal-bearing media, including but not limited to tangible computer-readable media, when carrying or encoded with a computer program having computer-readable instructions that direct method functions in the present invention, represent alternative embodiments of the present invention. Further, it is understood that the present invention may be implemented by a system having means in the form of hardware, software, or a combination of software and hardware as described herein or their equivalent.
Thus, in one embodiment, the present invention may be implemented through the use of a computer-readable medium encoded with a computer program that, when executed, performs the inventive steps described and claimed herein.
The current disclosure thus presents a computer-implemented method, system and computer-readable medium for health scoring a database file. In a preferred embodiment, the method includes the steps of: retrieving a plurality of file attributes from a file in a database; determining if at least one of the file attributes is damaged; and creating a health score for the file based on what percentage of the file attributes for the file are damaged. In one embodiment, at least one of the file attributes is an addressability attribute, which includes a File Control Block (FCB) and a Member (MBR), wherein the MBR includes a cursor and an Open Data Path (ODP). In another embodiment, at least one of the file attributes is an offset that points to a descendant object of the file, wherein the offset is limited to a pre-determined maximum Machine Interface (MI). In another embodiment, the method further includes the steps of extracting a file address from a file definition interface to determine if the file exists; and processing a main file definition to ensure that addresses to other objects called by the file are valid.
When the method is implemented by execution of computer-executable instructions stored on the computer-readable medium, the computer-executable instructions are deployable from a software deploying server to a database file server that is at a remote location, preferably on an on-demand basis.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.