Method and apparatus for information storage and retrieval

Information

  • Patent Grant
  • 4276597
  • Patent Number
    4,276,597
  • Date Filed
    Thursday, January 17, 1974
  • Date Issued
    Tuesday, June 30, 1981
Abstract
Method and apparatus for identifying particular desired information bearing records having desired predetermined identifiable characteristics from a set of such records in a base data file. A special retrieval file including arrays of binary coded elements is produced and maintained from the information content of the base data file. Each array of the retrieval file corresponds to a particular predetermined identifiable characteristic of language structure potentially present in or associated with the set of records concerned and each element in such an array corresponds to and is representative of the address or location of a particular one of the records in the base data file. The elements are binary coded to represent the presence or absence of the predetermined identifiable characteristics of language structure associated with that particular array in the corresponding record. Furthermore, the set of predetermined identifiable characteristics is itself chosen, in one exemplary embodiment, to represent the alphabetic value and relative sequential location of information characters in associated groups of characters such as words contained in the records. In this manner, the retrieval file itself represents an irreversible information compression of the language structure and/or information contained in the set of information bearing records.
To locate any particular desired record, the retrieval file is first searched by identifying and selecting those arrays representing desired predetermined identifiable characteristics of language structure and comparing the binary values of respectively corresponding elements in the selected arrays thus identifying which records in the base data file have all the desired predetermined identifiable characteristics of language structure.
Once the desired records in the base data file have been identified in this manner, they are then selected and displayed, copied, etc., as desired to provide the requisite access or retrieval of information that had previously been stored in the base data file. Particular choices and variations in the selection of the set of predetermined identifiable characteristics of language structure to be represented by the arrays in the retrieval file will change the search and retrieval characteristics, capabilities, flexibility, etc., of the system as may be desired for particular types of record sets and particular types of base data file formats, etc.
Description

This invention generally relates to the art of information storage and retrieval with special emphasis on the retrieval aspects of an overall information storage and retrieval system.
The problem and art of retrieving or selecting particular desired records from a set of stored records is an old one and must date back to early times when information bearing records began to be accumulated and stored in sets.
Perhaps one of the oldest and better known techniques is to organize the record set itself in some predetermined manner related to the expected retrieval process so that the record set can itself be manipulated to locate desired records. One simple example of such a system would be the segregation of documents in a typical household filing system so that records relating to current bills are in one file folder, those relating to this year's tax records are in another file folder, etc. In such a system, one may access or retrieve particular documents or records by first sorting through all of the possible categories of segregated documents and selecting those which would probably be most closely related to the particular document desired and then physically manipulating each record in those particular segments of the overall collection of records until the particular desired document is discovered.
In this technique, retrieval accuracy is something less than 100% (unless every record of the entire set is always searched) and retrieval precision is obviously a matter of chance.
Another variation in prior known storage and retrieval systems involving hierarchical organizations of the information bearing records would be the alphabetical organization of topics such as are found in the usual encyclopedia. Here, one must first identify a "key word" relating to the topic of interest and then access the key word organized data file with his knowledge of the alphabet to locate that particular section of the file relating to that particular key word. Depending upon the complexity and extent of information that is to be retrieved in this manner, one or more such manipulations of the information bearing records may suffice.
However, as the complexity of the information stored on each individual record increases and as the number of records increases and as the complexity of the desired accessibility of such records increases, retrieval methods requiring direct manipulations of the information bearing records become cumbersome, time-consuming and counterproductive. For instance, one may often want to retrieve particular records from the data file having some common characteristics that were never before considered to be particularly relevant or important and which were not specifically taken into account when the records were first organized and stored. Thus, there is a need for a more generalized method of retrieval that does not involve the necessity of manipulating the entire base data file each time one wants to search that data file for particular records having desired predetermined characteristics that might never have before been considered important.
To help meet such needs in the past, many different approaches have been attempted and some have worked with various degrees of success for particular applications. For instance, we are all familiar with the device commonly known as an index whereby the subject matter or information contained in a set of records is carefully classified according to a predetermined set of topics. The topics included in the index may of course be quite detailed and, in fact, there is a whole science of taxonomy which can be called upon to help in organizing the hierarchical structure of such an indexing arrangement. However, in practical effect, the index is just a more sophisticated form of the earlier discussed more primitive system whereby the records themselves are physically segregated into sections related to particular topics. Of course, with an index system, the set of records is itself separate and apart from the index and is organized in some predetermined manner such as by sequential numbers, etc. To use such a system of information storage and retrieval, one must take his special and perhaps unique desire for information and attempt to fit it within the predetermined taxonomic organization of the preexisting index and attempt to locate those index topics most closely related thereto. A cross-reference may then be had to the particular page numbers or sequence numbers, etc., of the source documents themselves for access. Of course, since the source documents themselves are not actually included within the index, it is possible to have reference at more than one place in the index to the same source document or documents. However, unless some anticipated and thus already indexed retrieval inquiry is involved, such index systems have obvious limitations.
Besides the simple indexes which are in daily use by the general public, there are, of course, much more sophisticated versions of such indexes used in computerized search and retrieval systems. However, all such known indexing systems are an inherent compromise between the optimum and the practical since the taxonomic organization of the index is necessarily fixed with respect to a whole population of information bearing records and thus not uniquely tailored to any one of them nor is the taxonomic organization of such an index uniquely related to the unknown future requirements of those who will use the index. At best, it is a compromise hopefully representing a usable practical interface between the user and the actual base data file of information bearing records.
Another known attempt to interface a large set of information bearing records with the ultimate user involves the so-called abstracting and/or key word systems wherein abstracts of each information bearing record and/or selected key words from the text of such information bearing records are organized in a smaller abstract or key word file which can be more quickly searched than the entire voluminous file of information bearing records. Often, the actual file of information bearing records is not in machine readable form but is located in an archive someplace. Often, the abstract file or file of key words, etc., together with appropriate pointers to the actual location of corresponding information bearing records in the archive is in machine readable form. Thus, the abstract or key word file may be machine accessed so that the user has to merely supply special key words, etc., whereupon a computer search of the smaller abstract or key word file is initiated in the hope of locating particular corresponding records in the archive having those key words therein.
As can be appreciated, this technique actually involves physical manipulation of the abstract or key word file and thus if this file reaches significant proportions, the search itself may take a significant time and become cumbersome, etc., subject to the same infirmities as the most primitive system wherein all of the information bearing records are themselves manipulated in order to search for and retrieve particular desired ones of those records. Furthermore, these systems are subject to additional infirmities in that the user of the system may not always associate the same key words with a particular information bearing record as did the particular person who wrote the abstract or abstracted the key words from the original text of the document in question. As will be appreciated, the abstract or key words associated with a particular document would no doubt be selected to represent what was then thought to be the significant aspects of that particular document. However, as is often the case, some subsequent user may be searching for that same particular document for a quite different reason which would appear completely secondary and perhaps even unimportant in the context of earlier thought as to what was significant about that particular document.
Many manual, semi-automatic and even automatic systems have been devised to aid in the art of information storage and retrieval in the past. For instance, U.S. Pat. No. 3,354,467 issued in 1967 to Beekley shows a machine for automatically comparing superimposed binary coded tapes wherein each tape represents some particular predetermined characteristic potentially associated with particular documents or information bearing records in an archive file. A selected set of such tapes representing a desired set of such predetermined characteristics is selected and they are simultaneously fed through a photoelectric scanning station such that only those documents having all of the desired predetermined characteristics are identified as to location within the archive file. A somewhat similar semi-automatic or even manual indexing system was involved in the system earlier offered by McBee Systems wherein an abstract card or the like was coded with notches around the periphery of the card and then selected by "pinning" those cards having holes instead of notches at some particular location around the periphery of the card.
However, it will be noted that none of these prior systems have utilized the basic language structure of the information contained on the set of information bearing records to construct a retrieval file which is uniquely custom fitted to each and every one of the documents or records in the base data file and which can therefore be utilized by a user in a manner which is uniquely and custom fitted to his peculiar information retrieval requirements without regard to whether or not those information retrieval requirements necessarily fit within the taxonomic organization of some preexisting index system. It is an object of this invention to provide such a unique and custom fitted retrieval file capability.
That is, the retrieval file of this invention is structured so as to optimally interface between a base data file and a user of that file. It takes into account the individualized language structure for the information content of each record in the base data file. The basic coding or generation of the retrieval file for this invention does not require the classification of the information content as being of this particular type or as of that particular type, although such traditional taxonomic classification may be conveniently incorporated within the retrieval file of this invention if desired.
Rather, in the exemplary embodiments to be described in more detail below, the basic retrieval file is coded so as to take into account the language structure of the information content of the records. For instance, the alphabetic value of informational characters in each record and the relative sequential location of such character values in associated groups of characters such as words in those records are utilized in one preferred exemplary embodiment. Thus, although the resulting retrieval file is irreversible (except in the most simple cases) it is still uniquely representative of the language structure used in the entire informational content of the corresponding information bearing record. If one wants to think of the system of this invention in terms of "key words", then every word in every record would constitute a "key word" in that the language structure of all such words would be coded in the retrieval file. Thus, the user is not restricted to some other person's prior choice of "key words".
Of course, if desired, one may combine the retrieval file and/or other teachings of this invention with earlier types of systems to provide a modified and greatly improved version of such earlier systems. For instance, a very large abstract or key word file itself may be utilized for generating a retrieval file according to this invention rather than using the full text of the information bearing records. In such a case, the retrieval file will reflect, of course, only the full information content of the abstract file and/or key word file from which it was coded. However, it will now be considerably more simple to search the retrieval file of this invention than to perform a sequential search of a more lengthy abstract or key word file, etc., thus making the marriage of the two systems a profitable one for certain applications.
In general, the retrieval file of this invention comprises a plurality of arrays of binary coded elements. Each such array is organized to include a binary coded element respectively corresponding to the address or location of each record in the base data file. In addition, each array in the retrieval file is assigned to correspond to a predetermined identifiable characteristic of language structure potentially present in or associated with such records. In one of the exemplary embodiments, those predetermined identifiable characteristics are themselves specially chosen to represent the alphabetic value and relative sequential location of informational characters in the text of the information bearing records in the base data file for associated groups of characters such as words, etc.
Each binary coded element in any given array is assigned a predetermined binary value to represent the presence or absence of the predetermined identifiable characteristic represented by the given array in the particular record represented by each element. In this manner, the arrays of binary coded elements comprising the retrieval file represent an irreversible data compression of whatever information from the records that has been used to generate the retrieval file. In the preferred embodiments, the full text of each record is utilized for such coding purposes so that the retrieval file itself represents an irreversible data compression of the entire full text of each record in the base data file. On the other hand, if only an abstract or key word set from the information bearing records is utilized for coding a retrieval file according to this invention, then the retrieval file will likewise represent an irreversible data compression of this more limited amount of information from such records.
Once the retrieval file has itself been generated directly from the information bearing records or from extracts thereof, etc., the retrieval file may be utilized by identifying those predetermined identifiable characteristics (i.e. the particular arrays of the retrieval file) associated with desired search or retrieval inquiry data (i.e. any particular word or groups of words, etc., thought to be in the text of the sought after record or its extract, etc.). Those particular arrays are then selected and the binary values of respectively corresponding elements in the selected arrays are compared to identify which records (actually the location or addresses of such records) in the base data file have all the desired predetermined identifiable characteristics.
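The array organization and search procedure just described can be sketched as follows. This is only an illustrative sketch, not the patented apparatus: the function names `build_retrieval_file` and `search`, the abstract characteristic labels `c1` to `c3`, and the use of list positions 0, 1, 2 as record addresses are all assumptions made for the example. Each array is held as a Python integer whose bit i stands for the record at address i.

```python
def build_retrieval_file(record_traits):
    """Return {characteristic: bit array}, with each bit array held as an int.

    Bit i of an array is 1 when the record at address i has that characteristic.
    """
    arrays = {}
    for address, traits in enumerate(record_traits):
        for trait in traits:
            arrays[trait] = arrays.get(trait, 0) | (1 << address)
    return arrays


def search(arrays, wanted, record_count):
    """AND the selected arrays and return the addresses whose bit is still 1."""
    hits = (1 << record_count) - 1  # start with every address selected
    for trait in wanted:
        hits &= arrays.get(trait, 0)  # an uncoded characteristic matches nothing
    return [addr for addr in range(record_count) if (hits >> addr) & 1]


# Hypothetical data: three records, each described by abstract characteristic labels.
record_traits = [{"c1", "c2"}, {"c2", "c3"}, {"c1", "c2", "c3"}]
arrays = build_retrieval_file(record_traits)
print(search(arrays, ["c1", "c3"], len(record_traits)))  # [2]
```

Only the addresses of matching records come out of the search; the records themselves are then fetched from the base data file by address.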
To help in understanding the basic functioning of the invention, it is helpful to reexamine some of the basic characteristics of language structures. Using the English language as an example, it is readily apparent that almost all written information involves considerable redundant usage of a very limited number of alphabetic character values. Of course, besides character values per se, there are other important features of our written language such as the relative sequence of character values, upper and lower case differentiation of the alphabetic character values, punctuation, bold face type, italicized type, the size of printed type, etc. There are also word length differences (i.e. groups of characters associated together in different numbers).
On the typical page of a printed book there may be something on the order of 3,000 characters so that if each of the 26 alphabetic characters of our alphabet were equally used on such a page, there would be about 115 redundancies with respect to each alphabetic character if one were to look only at the alphabetic value of characters. Thus, it is clear that the real information conveying content in any document or record utilizing such a written language is critically dependent upon other language structures such as word length, character sequences, type case, hyphenation, etc.
Thus, this invention is directed to an information storage and retrieval system wherein the retrieval file is custom fitted or keyed to the language structure of the information contained in each record of the base data file. At the same time, the retrieval file of this invention inherently masks language structure redundancies and does not necessarily reference all the information contained in such a document. In short, the exemplary embodiment of the retrieval file is organized so as to conveniently represent at least some of the alphabetic values of informational characters and their relative sequential location in associated groups of characters such as words contained in the records of the base data file. There are many possible ways to take the alphabetic values and sequential location of such values into account. However, in the exemplary embodiment of this invention, the redundancy of character value occurrences within words of various lengths has been utilized to greatly compress the storage of such character values and sequential location data in the retrieval file.
That is, for each letter of the alphabet, there is a specific position of occurrence for each possible character value for each possible word length. Thus, the letter "A" can be a single letter of a single letter word; it can be the first letter of a two-character word or the second letter of a two-character word; it can be the first letter of a three-character word, etc. Thus, a binary coded matrix of character value versus character position within words having particular numbers of characters can be formed and this process can be repeated for any word length. However, it has been found that it is not necessary to include such coding in the retrieval file for all word lengths.
In fact, depending upon the particular application involved, it may often be advantageous to simply make the matrix representative of character value versus character position within a word regardless of the word length. In effect, this would amount to the superposition of and hence further compression of data from the first discussed organization of the matrix. From a practical standpoint, even if the first mentioned more detailed matrix organization is used, it is usually only desirable to include data in the matrix with respect to character values up to 6 or 7 letter words. In other words, it is only necessary in the usual cases to keep track of character values and/or character value sequences for the first few characters of each word to produce an acceptable and usable information retrieval capability based on the language structure of information in the base data file of records.
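The two matrix organizations discussed above can be sketched as sets of characteristics derived from a single word. The function names are hypothetical, positions are counted from 1, and the handling of words longer than the cutoff is an assumption: in this sketch the detailed variant simply does not code them, while the superposed variant codes only their first few characters.

```python
def detailed_characteristics(word, max_len=7):
    """(letter, position, word length) triples for words of up to max_len letters."""
    word = word.upper()
    if len(word) > max_len:
        return set()  # assumption: longer words are left uncoded in this variant
    return {(ch, pos, len(word)) for pos, ch in enumerate(word, start=1)}


def superposed_characteristics(word, max_len=7):
    """(letter, position) pairs regardless of word length, first max_len letters only."""
    return {(ch, pos) for pos, ch in enumerate(word.upper()[:max_len], start=1)}


print(sorted(detailed_characteristics("at")))    # [('A', 1, 2), ('T', 2, 2)]
print(sorted(superposed_characteristics("at")))  # [('A', 1), ('T', 2)]
```

In the superposed form, "A" as the first letter of a two-character word and "A" as the first letter of a five-character word fall into the same array, which is the further compression the text refers to.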
As an example of the information compression involved in such a system, it should be recognized that an entire set of such predetermined language structure characteristics up to the maximum word length of seven characters would involve only 728 binary bits (28 bits per character for relative position information times 26 possible character values) so that for information retrieval purposes, the entire information content of an information bearing record would, in this simple example, be compressed to 728 binary bits of information. Of course, redundancies in character values and sequential locations would be irreversibly lost in the coding process but, as will be seen in the more detailed description given below, this loss of redundancy will not seriously affect most usages of the retrieval file and whatever loss of precision that is caused by such an organization of the retrieval file can be compensated for by other techniques.
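The 728-bit figure can be checked with a short calculation: a letter may occupy 1 position in a one-character word, 2 in a two-character word, and so on up to 7, giving 1 + 2 + ... + 7 = 28 positions per letter. The function name below is a hypothetical helper introduced only for this check.

```python
def retrieval_bits(max_word_length=7, alphabet_size=26):
    """Bits needed to code every (letter, position, word length) combination."""
    positions_per_letter = sum(range(1, max_word_length + 1))  # 1 + 2 + ... + 7 = 28
    return positions_per_letter * alphabet_size


print(retrieval_bits())  # 728
```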
Another possible example of coding representations of the alphabetic value of informational characters and their relative sequential locations within words would be to note the occurrence of alphabetic values as even or odd sequential positions within even or odd length words. For instance, the occurrence of the letter "A" at an even position within an odd word length; as an even position within an even word length; as an odd position within an odd word length; and/or as an odd position within an even word length, etc.
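A minimal sketch of this odd/even coding is given below. The function name and the tuple layout (letter, position parity, word-length parity) are assumptions chosen for illustration, with positions counted from 1.

```python
def parity_characteristics(word):
    """(letter, position parity, word-length parity) triples for one word."""
    word = word.upper()
    length_parity = "even" if len(word) % 2 == 0 else "odd"
    return {
        (ch, "even" if pos % 2 == 0 else "odd", length_parity)
        for pos, ch in enumerate(word, start=1)
    }


print(sorted(parity_characteristics("BLACK")))
```

For "BLACK", a five-character (odd length) word, the letter "A" sits at position 3 and is therefore coded as an odd position within an odd word length. Under this scheme each letter needs only four arrays, or 104 arrays for a 26-letter alphabet.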
As will become more apparent, there are trade offs to be made in deciding how to structure the retrieval file of this invention. For instance, if more detailed and complete language structure information is maintained with respect to character value and/or sequential location within words, then more precision can be obtained in accessing the base data file. On the other hand, more arrays are needed in the retrieval file to maintain such detailed information.
As an example of some of the types of possible limitations of the exemplary embodiment of this invention with respect to the precision of retrieval, consider the following example. Assume that one of the documents in the base data file includes the word "BROWN" and the word "BLACK". Since these words are both five-character words in length, the initial B character value for each word represents a redundancy in information and this would be represented by a single binary bit in the retrieval file showing only that there was at least one word in this document having five characters total length and a first character of value "B". The remaining characters of each word would of course also be binary coded to represent their value and position sequence within these two five-character words. Thus, when the retrieval file is used for retrieval purposes, it would be absolutely 100 percent accurate in that this particular document would always be indicated as including the word "BROWN" or the word "BLACK" respectively. However, the precision of retrieval is not necessarily 100 percent since there are at least the following three search words which would also result in the selection of that document: "BLOWN", "BRACK", and "BRAWN". That this is so can be seen since all are five letter words and include an initial letter character value of "B". Furthermore, the word "BLOWN" is a five-character word having a second character value of "L" (this characteristic would have been coded because the word "BLACK" was coded from the original document); a third character value of "O" (this value would have also been coded because the word "BROWN" was found in the original document); a fourth character value of "W" (also resulting from the word "BROWN") and a fifth character value of "N" (also resulting from the word "BROWN").
Thus, so far as the user of the system is concerned, one would obtain responses from inquiries related to any of the five words BROWN, BLACK, BLOWN, BRACK or BRAWN although only the words BROWN and BLACK are actually found in the original documents. Accordingly, although the retrieval would be absolutely accurate (that is the required document would always be accessed whenever the words "BROWN" or "BLACK" were used for inquiry) it would not be 100 percent precise in that either or both documents might also be erroneously withdrawn in response to three other spurious inquiry words.
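The BROWN/BLACK example can be reproduced with a small sketch that codes each word as (letter, position, word length) triples and treats a query as a hit when all of its triples are present. The function names and the extra query word "GREEN" are assumptions added for illustration.

```python
from itertools import product


def characteristics(word):
    """(letter, position, word length) triples for one word, positions from 1."""
    word = word.upper()
    return {(ch, pos, len(word)) for pos, ch in enumerate(word, start=1)}


def matches(document_words, query):
    """True when every characteristic of the query was coded from the document."""
    coded = set().union(*(characteristics(w) for w in document_words))
    return characteristics(query) <= coded


document = ["BROWN", "BLACK"]
for query in ["BROWN", "BLACK", "BLOWN", "BRACK", "BRAWN", "GREEN"]:
    print(query, matches(document, query))

# Every five-letter string built from the letters coded at each position also hits.
letters_at = [sorted({w[i] for w in document}) for i in range(5)]
all_hits = ["".join(combo) for combo in product(*letters_at)]
print(len(all_hits))  # 16
```

The first five queries are reported as hits and "GREEN" is not. The final count shows why the text says "at least" three spurious words: 16 letter combinations satisfy the coded characteristics, although most of them are not English words that a user would be likely to enter.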
The degree of precision is, of course, a function of the particular algorithm used to choose the language structure characteristics such as alphabetic value and relative sequential location representations in the retrieval file. For instance, the further data compression achieved by taking into account only the odd-even word lengths and odd-even character positions and values therein would further compound such a "mis-hit" problem. On the other hand, other techniques such as the inclusion of upper and lower case as uniquely identifiable character values would increase the precision and restrict the number of improper possible combinations.
In short, although the retrieval technique of this invention is 100 percent accurate, its precision may possibly be less than that unless care is taken in choosing the language structure characteristics used in the retrieval file because some specific character sequence data contained in the original text is lost (in one exemplary embodiment) in the conversion to the retrieval file. This is, of course, necessary in order to reduce the amount of information in the retrieval file from that actually on the original documents. If all of the character value and sequence information were to be retained in the retrieval file, then the file would be reversible and would, in actuality, contain all of the information in the original document. However, since the retrieval file is itself used only to look up and find the location of the actual document in question, it is not desirable to include the entire detailed informational content of the original documents in the retrieval file as should be apparent. Thus, in many cases, it will be necessary to tolerate some small controllable amount of precision less than 100 percent. In fact, in some cases, a lack of precision is desirable. For instance, when the user is not too sure exactly what is to be retrieved (a telephone directory assistance operator without full name or other information), a lack of precision retrieval may actually assist in quickly accessing the actual desired record.
In the exemplary embodiment to be described, each word of data or block of text in a record of the base data file is handled in a uniform manner. All hyphens are considered as spaces, all apostrophes are considered as spaces, and all other punctuation is ignored. Any capitalized word will be considered as containing all capitalized letters and all words starting with lower case letters will be considered lower case throughout. Words containing both numerals and letters will have their word length determined by the total number of numerals and alphabetic characters. However, the position of each digit and alphabetic character will be assigned at their nominal positions within such an overall word.
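These uniform handling rules can be sketched as a small normalization routine. The function name is hypothetical, and several points are assumptions rather than statements from the text: "capitalized" is read as "first character is an upper case letter", words that begin with a digit are folded to lower case, and only ASCII punctuation is considered.

```python
import string


def normalize(text):
    """Split text into words using the uniform handling rules described above."""
    # Hyphens and apostrophes are treated as spaces.
    text = text.replace("-", " ").replace("'", " ")
    # All other (ASCII) punctuation is ignored.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # A word's case follows its first character; digits count toward word length.
    return [w.upper() if w[0].isupper() else w.lower() for w in text.split()]


print(normalize("Mr. O'Brien's well-known B52 bomber"))
# ['MR', 'O', 'BRIEN', 's', 'well', 'known', 'B52', 'bomber']
```

Here "B52" is a three-character word with "B" at position 1, "5" at position 2 and "2" at position 3, which matches the rule that digits and letters keep their nominal positions within the overall word.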
The information storage and retrieval technique of this invention may be conveniently practiced manually, semiautomatically, automatically with special purpose equipment, automatically with specially programmed and conditioned general purpose equipment, etc., as will be described in more detail below.





A more complete understanding and appreciation of this invention may be obtained by reading the following detailed description in conjunction with the accompanying drawings, of which:
FIG. 1 is a schematic depiction of an exemplary embodiment of an information storage and retrieval system incorporating this invention;
FIG. 2 is a more detailed schematic depiction of the binary coded retrieval file for the exemplary embodiment disclosed in FIG. 1;
FIG. 3 is a binary coded matrix showing explicitly the binary coding required for an exemplary embodiment of this invention with respect to a particular text reproduced in the detailed description given below;
FIG. 4 is a composite schematic showing of a few of the arrays in an exemplary system of binary coded arrays corresponding to the binary coding shown in matrix form at FIG. 3;
FIG. 5 is a schematic diagram of an exemplary embodiment of apparatus for practicing an embodiment of this invention;
FIG. 6 is a schematic diagram of another apparatus for practicing an exemplary embodiment of this invention;
FIG. 7 is a block diagram of the conversion, file construction and retrieval processing for data in an exemplary embodiment of this invention;
FIG. 8 is a more detailed block diagram of the conversion block shown in FIG. 7;
FIG. 9 is a more detailed block diagram of Program No. 1 shown in FIG. 8;
FIGS. 10-17 are block diagrams of subroutines entered from Program No. 1 shown in FIGS. 8-9;
FIG. 18 is a more detailed block diagram of Program No. 2 shown in FIG. 8;
FIGS. 19-24 are block diagrams of subroutines entered from Program No. 2 shown in FIGS. 8 and 18;
FIG. 25 is a more detailed block diagram of the file construction block shown in FIG. 7;
FIG. 26 is a schematic representation of the intermediate retrieval file format used in the retrieval file construction;
FIGS. 27-28 show the magnetic disk file organization for the exemplary embodiment of FIGS. 7 et seq.;
FIGS. 29-32 together constitute a block diagram of Program No. 5 shown in FIG. 25;
FIGS. 33-43 are block diagrams of subroutines entered from Program No. 5 shown in FIGS. 25 and 29-32;
FIG. 44 is a more detailed block diagram of Program No. 3 shown in FIG. 25;
FIGS. 45-52 are block diagrams of subroutines entered from Program No. 3 shown in FIG. 44;
FIG. 53 is a more detailed block diagram of Program No. 4 shown in FIG. 25;
FIGS. 54-62 are block diagrams of subroutines entered from Program No. 4 shown in FIG. 53;
FIG. 63 is a more detailed block diagram of the retrieval processing block shown in FIG. 7;
FIGS. 64-68 together constitute a block diagram of Program No. 6 shown in FIG. 63;
FIGS. 69-84 are block diagrams of subroutines entered from Program No. 6 shown in FIGS. 64-68;
FIG. 85 is a schematic/block diagram of a semiautomated exemplary embodiment of this invention;
FIGS. 86-90 are enlarged photographs of PIC arrays for A_{1,1}; A_{1,2}; A_{2,2}; A_{1,3} and A_{2,3}, respectively, for the exemplary embodiment of FIG. 85;
FIG. 91 is an enlarged photograph of a composite array formed by a Boolean AND operation on the arrays of FIGS. 87 and 88; and
FIG. 92 is a block diagram generally showing the interrelationship of computer programs for use in the semiautomated FIG. 85 embodiment to automatically construct the retrieval file arrays.
The following detailed discussion will be generally organized in three sections. The first section deals with generalized concepts and features of an exemplary embodiment of the invention with special emphasis on the binary coded retrieval file, its organization and content, etc. The second section will deal with a detailed description of a specific exemplary embodiment of the invention that has actually been tested on existing general purpose digital computing equipment for accessing telephone directory records in a machine readable base data file. Finally, the third section of this detailed description will relate to an exemplary embodiment of special purpose apparatus for practicing another exemplary embodiment of the invention.
GENERALIZED DISCUSSION OF AN EXEMPLARY EMBODIMENT
Referring to FIG. 1, a base data file 100 of accessible records stored at known addresses is shown. Here, the addresses are indicated as a unique 4 digit number associated with each record, and the first 4 documents are shown in FIG. 1 together with their corresponding 4 digit address numbers. This base data file may comprise an already existing archive of records; it may comprise a machine readable and machine accessible file of records such as might be stored on magnetic tape, magnetic disc, magnetic core, etc.; it may comprise a collection of photographic images such as on microfilm or microfiche stored in conventional equipment such that any given document may be rapidly and automatically retrieved for display upon supplying the appropriate address information, etc. In short, the base data file 100 may comprise any accessible file of records wherein each record may be referenced by some unique address, location, or other equivalent pointer.
This base data file 100 is then utilized directly as indicated at 102 to construct binary coded arrays comprising a retrieval file 104 of such binary coded arrays. Of course, as previously mentioned, modifications of the exemplary embodiment may also be made wherein only extracts such as abstracts or selected key words, etc., from each record are utilized at 102 to construct the retrieval file 104. Exemplary processes and apparatus for constructing the binary coded arrays as indicated at 102 will be further detailed below. Of course the arrays could also be manually constructed as will be apparent. The result of such a construction or generation process will be a plurality of arrays of binary coded elements some of which are schematically illustrated in FIG. 1 within block 104. An enlarged and more detailed version of this portion of FIG. 1 is shown in FIG. 2.
It should be recognized that each array in the retrieval file corresponds to some predetermined identifiable characteristic of language structure (hereafter referenced merely as a "PIC"). Furthermore, there is an element or area of each array which corresponds to each and every one of the record addresses in the base data file.
For instance, as may be seen more clearly in FIG. 2 where element address numerals have been added for clarification, there is an element in each of the arrays which corresponds to the address of record numbers 0000; 0001; 0002; 0003; etc. The cross-hatched elements are the ones that have been shown, for purposes of illustration, as having received a binary valued code representing the presence of a corresponding PIC (corresponding to the particular array) in the respectively corresponding record. For instance, the record stored at location or address 0201 is indicated by the cross-hatching as including PIC #1 while the record located at address 0202 is indicated as having an opposite binary value or code indicating the absence of PIC #1 from that particular record.
Of course, the particular set of PICs is chosen as a function of the language structure of the information contained in the base data file. In the exemplary embodiment, for retrieving from a base data file comprising English language written records such as newspaper clippings, photo captions from newspaper clippings, etc., the set of PICs is chosen to represent the alphabetic value of informational characters in the records and to represent the relative sequential location of such character values in associated groups of characters such as words contained within the records. In particular, the PICs are chosen to correspond to the alphabetic value of the first, second, et seq. character positions in words having one, two, et seq. total characters therein.
This exemplary technique for choosing the set of PICs may be illustrated easily by the following matrix representation wherein each PIC represents an entry in the matrix in the form of .phi..sub.m,n where .phi. equals the alphabetic value; m equals the character position number within a word and n equals the word length. In this exemplary embodiment, .phi. takes on the values a through z; 0 through 9; and A through Z:
Predetermined Identifiable Characteristic = .phi..sub.m,n
.phi. = Alphabetic Value; m = Character Position No.; n = Word Length

.phi..sub.1,1 .phi..sub.1,2 .phi..sub.1,3 .phi..sub.1,4 .phi..sub.1,5 .phi..sub.1,6 .phi..sub.1,7 .phi..sub.1,8 .phi..sub.1,9 .phi..sub.1,10 .phi..sub.1,11 .phi..sub.1,12 .phi..sub.1,13 .phi..sub.1,14 .phi..sub.1,15 .phi..sub.1,16 .phi..sub.1,17 .phi..sub.1,18 .phi..sub.1,19 .phi..sub.1,20 . . .
.phi..sub.2,2 .phi..sub.2,3 .phi..sub.2,4 .phi..sub.2,5 .phi..sub.2,6 .phi..sub.2,7 .phi..sub.2,8+
.phi..sub.3,3 .phi..sub.3,4 .phi..sub.3,5 .phi..sub.3,6 .phi..sub.3,7 .phi..sub.3,8+
.phi..sub.4,4 .phi..sub.4,5 .phi..sub.4,6 .phi..sub.4,7 .phi..sub.4,8+
.phi..sub.5,5 .phi..sub.5,6 .phi..sub.5,7 .phi..sub.5,8+
.phi..sub.6,6 .phi..sub.6,7 .phi..sub.6,8+
.phi..sub.7,7 .phi..sub.7,8+
.phi..sub.8,8+

.phi. = a through z; 0 through 9; A through Z
It will be appreciated that such a matrix has an infinite possible number of entries. However, as indicated in the above representation of the matrix, it has been found sufficient to truncate the matrix and use only a very few of the potential PICs represented by such a matrix. For instance, in the above representation only those 48 entries actually shown are utilized for this exemplary embodiment. Furthermore, it should be noted that the last 7 entries of the last full column of entries in the above matrix have been truncated so as to, in effect, include an overlay of any potential entries to the right of that element in that particular row. For instance, the entry .phi..sub.2,8+ indicates that this particular PIC represents the second character of any word having 8 or more characters therein.
Accordingly, in the exemplary embodiment utilizing only 48 different matrix entries, and where .phi. takes on any one of the 62 possible character values noted above, it follows that there are 2,976 PICs in all (i.e., 48 × 62 = 2,976).
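By way of illustration only, the count of 48 matrix entries and 2,976 PICs may be checked with a short Python sketch. The names `matrix_entries`, `MAX_FIRST_ROW` and `OVERLAY_LENGTH` are hypothetical and do not appear in the exemplary programs; the sketch assumes that the first row stops at word length 20 and that rows 2 through 8 end at the "8+" overlay column, as the matrix above suggests.

```python
import string

MAX_FIRST_ROW = 20   # assumption: row m = 1 covers word lengths 1 through 20
OVERLAY_LENGTH = 8   # rows m >= 2 end at the "8+" overlay column

def matrix_entries():
    """List the (position, length-label) pairs of the truncated matrix."""
    entries = [(1, str(n)) for n in range(1, MAX_FIRST_ROW + 1)]
    for m in range(2, OVERLAY_LENGTH + 1):
        for n in range(m, OVERLAY_LENGTH + 1):
            label = "8+" if n == OVERLAY_LENGTH else str(n)
            entries.append((m, label))
    return entries

# The 62 character values: a through z, 0 through 9, A through Z.
CHARACTER_VALUES = string.ascii_lowercase + string.digits + string.ascii_uppercase

entries = matrix_entries()
pic_count = len(entries) * len(CHARACTER_VALUES)
print(len(entries), len(CHARACTER_VALUES), pic_count)
```

Row 1 contributes 20 entries and rows 2 through 8 contribute 7 + 6 + 5 + 4 + 3 + 2 + 1 = 28 entries, giving the 48 entries referred to above.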
Thus, in the exemplary embodiment depicted by FIGS. 1 and 2, there would actually be 2,976 arrays wherein array No. 1 might correspond to PIC a.sub.1,1 ; array No. 2 might correspond to PIC a.sub.1,2 ; array No. 3 might correspond to PIC a.sub.1,3 ; array No. 4 might correspond to PIC a.sub.1,4 ; etc., on through the remaining 2,972 PICs. As should now be appreciated, in this exemplary embodiment, since there are 2,976 PICs in total, the retrieval file will comprise 2,976 possible binary bits for each record in the base data file.
Accordingly, if the base data file comprises 100,000 records, the retrieval file will, in turn, comprise 297.6 million bits of binary coded information. Of course, as should now be appreciated, one can selectively reduce the number of PICs in the retrieval file for a given application unless an unacceptable or undesirable loss in retrieval precision would accompany such a reduction or reorganization of PICs for the particular data base in question.
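The storage arithmetic above may be sketched as follows. This is only an illustrative calculation; the function name `retrieval_file_bits` is hypothetical, and the sketch assumes one uncompressed bit per PIC per record, as described for the exemplary embodiment.

```python
PIC_COUNT = 48 * 62  # 2,976 PICs in the exemplary embodiment

def retrieval_file_bits(record_count, pic_count=PIC_COUNT):
    """One bit per PIC per record, with no compression assumed."""
    return record_count * pic_count

bits = retrieval_file_bits(100_000)
size_in_bytes = bits // 8
print(bits, size_in_bytes)
```

Passing a smaller `pic_count` shows directly how a reduced set of PICs shrinks the retrieval file, at whatever cost in retrieval precision the particular data base implies.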
As should also be appreciated, the retrieval file might comprise machine readable data such as magnetically encoded on typical conventional magnetic disc drives, magnetic tapes, magnetic core storage arrays, etc. Furthermore the retrieval file may be both machine readable and humanly readable if it is stored in the form of arrays of photographic images, for example wherein each element is coded as the presence or absence of a transparent or non-transparent area in a film, etc.
In any event, once the retrieval file has been constructed, it may be utilized for accessing desired ones of the records in the base data file 100. For example, the input search data depicted at 106 might form still another (but much smaller) file, a search data file 108. Typically, the input search data would comprise a portion of a word, a whole word or a group of words which the user expects to be found within the record he desires to access.
Accordingly, the search data file may itself be processed in much the same manner as was the base data file in the original construction of the retrieval file. That is, the search data file is analyzed with respect to the same set of PICs represented by the set of arrays in the retrieval file 104. Once the particular ones of these PICs present in the search data file have been identified, then the corresponding arrays from the retrieval file representing these particular PICs are selected and individual elements or binary values thereof in all of the selected arrays are compared with one another to identify the addresses of records in the base data file having all the desired PICs.
For example, as shown in FIG. 2, the record stored at location or address 0304 is the only record having all of PICs 1 through 4. On the other hand, the record stored at location or address 0000 does not have PIC 3 but does have PICs 1, 2 and 4. As should now be apparent, the desired set of PICs identified by processing the input search data 106, 108 corresponds to a set of arrays in the retrieval file, and by machine or otherwise comparing the binary values of corresponding elements in each array of the set, it may be easily determined that only records having certain now-identified addresses comply with the request or inquiry at 106. Of course, the inquiry information at 106 may well effectively request the absence of some particular PIC as well as the presence of some particular PIC, etc., as should be appreciated.
Accordingly, as indicated in FIG. 1 at 110, the desired retrieval arrays are selected and compared to identify addresses of desired records whereupon the accessible base data file 100 is accessed as indicated at 112 in FIG. 1 to select the corresponding records and to display them as indicated at 114. Of course, the display may be in a form of a projection of a photographic image, a cathode ray tube or other electronic display of machine readable records, hard copy generated either from machine readable or photographic image records, etc.
Once the input search data 106 is presented, it is thus only necessary to identify the particular PICs corresponding thereto, to select the arrays corresponding to these PICs, to compare the corresponding elements of the selected set of arrays, thus identifying the location of desired records in the base data file, and then to select and display the corresponding records.
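The comparison of corresponding elements may be sketched in Python as a bitwise AND over the selected arrays, with an AND NOT for any PIC whose absence is requested. This is a minimal sketch under stated assumptions: each array is held as one Python integer whose bit k stands for the record at address k, the 4 digit addresses are treated as plain integers, and the bit patterns below are invented so as to agree with the statements made about addresses 0000 and 0304. The names `make_array` and `compare` are hypothetical.

```python
def make_array(addresses):
    """Pack record addresses into one integer; bit k set means record k has the PIC."""
    bits = 0
    for address in addresses:
        bits |= 1 << address
    return bits

def compare(required, excluded=(), record_count=0):
    """Return addresses having every required PIC and none of the excluded PICs."""
    hits = (1 << record_count) - 1      # start with every record as a candidate
    for array in required:
        hits &= array                   # Boolean AND: PIC must be present
    for array in excluded:
        hits &= ~array                  # AND NOT: PIC must be absent
    return [a for a in range(record_count) if (hits >> a) & 1]

RECORD_COUNT = 400
# Invented patterns (assumption), consistent with the text's remarks on FIG. 2.
pic1 = make_array([0, 201, 304])
pic2 = make_array([0, 202, 304])
pic3 = make_array([201, 304])
pic4 = make_array([0, 202, 304])

all_four = compare([pic1, pic2, pic3, pic4], record_count=RECORD_COUNT)
without_pic3 = compare([pic1, pic2, pic4], excluded=[pic3], record_count=RECORD_COUNT)
print(all_four, without_pic3)
```

Holding each array as a single integer keeps the comparison to one machine-level operation per selected array, which mirrors the element-by-element comparison described above.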
If desired, all of these steps may be manually performed. Alternatively, some or all may be machine implemented. For instance, the previously referenced U.S. Pat. No. 3,354,467 to Beekley teaches apparatus that can be utilized to automatically perform the comparison step of these search and retrieval processes.
In the preferred embodiment and preferred mode of the invention, the identification of desired PICs, selection and comparison of corresponding arrays and identification of the addresses of desired records are all automatically accomplished by a programmed general purpose data processor in conjunction with conventional input-output devices and digital information storage devices. Furthermore, in the preferred embodiment of the invention, the accessible base data file of records 100 is itself a conventional storage device which may be automatically accessed by the programmed general purpose data processor such that the selection of a particular record at the identified address and its display are also automatically performed.
As a specific example of the above-described process for constructing the binary coded arrays in the retrieval file, FIGS. 3 and 4 show the coding of the following text, which is set forth here as an example of the text of any one of the records in the base data file 100 in FIG. 1. This text may, of course, be of any given length but will, in the exemplary embodiment, always be encoded in binary form in the retrieval file depicting the presence or absence of each of the individual 2,976 PICs. For instance, as shown in FIG. 3, only a relatively few of the 2,976 potential PICs are present in this exemplary text:
1 million ransom note Police press search for 2 men and missing wife of millionaire AP Minneapolis Minneapolis FBI Virginia Piper Harry C Jr Orono Hennepin County Jaffray Hopwood brokerage investment firms kidnapers Lake Minnetonka Mrs George Partridge III Addision L Tad David Lewis Associates Senate votes to ban war use of rain making AP Washington burning of forests U S weapons Indochina amendment Sen Gaylor Nelson D Wis Defense Secretary meluin R Laird North Vietnam cloud seeding Loas Cambodia Ho Chi Minh Trail weather modification activities House chemical sprays Firestorm operations reported Pentagon Southeast Asian jungles New York Times firestorm operations Sherwood Forest Hot Tip John Tower R Tex military helicopters Ft Sam Houston Tex Ft Lewis Wash Carson Colo Luke Air Force Base Ariz Mountain Home Idaho Work starts on gym Columbia University New York AP gymnasium campus Morningside Park Chicken exemption opposed WASHINGTON AP Cost Living Council price controls
Due to space considerations, and in an effort to show all 2,976 PICs, these entries have been shown in FIG. 3 in two-dimensional form wherein the .phi. or alphabetic value of the character is represented by the ordinate and the character position in words having 1, 2, et seq. total characters is represented by the abscissa.
It should be noted that this particular text example as coded in FIGS. 3 and 4 represents an extract from some full text record. Nevertheless, even this extract contains considerable redundancies which are irreversibly coded in the set of PICs shown in FIG. 3. For instance, the exemplary text includes the words TAD, TIP and TEX. Since these are all three character words with the initial character of T, it follows that there is only a single binary representation for these initial letters of these three words in the PIC matrix as shown in FIG. 3. Furthermore, since the word "SAM" is also included in this text, it is clear that the second letter A of SAM and the second letter A of TAD are likewise represented by but a single binary coded PIC in the matrix shown in FIG. 3. Similar comments can be made for many of the entries as will become apparent by a study of the above quoted text with respect to the explicit binary coding of the PIC matrix shown in FIG. 3.
Actually, the 2,976 PICs in the exemplary embodiment are not associated together as indicated for exemplary purposes in FIG. 3. Rather, as indicated previously in FIGS. 1 and 2, the PICs correspond to individual arrays of elements which elements, in turn, correspond to the address or location of particular records in the base data file. Accordingly, FIG. 4 is simply a redrawing of FIG. 3 showing a portion of several of the 2,976 arrays in the exemplary embodiment of FIGS. 1 and 2 that would result from the encoding of the above noted document or text.
For instance, assume that this particular record is stored in location 4104. Accordingly, as indicated in FIG. 4, the array corresponding to PIC A.sub.1,1 would contain a binary code indicating the absence of this particular PIC in the record located at address 4104. However, the array corresponding to PIC A.sub.1,2 would contain the other value of binary coding (indicated by cross-hatching in FIG. 4) indicating the presence of this particular PIC in the document located at address 4104 (for instance, this coding would result because the word "AP" appears in the text). The other entries for the arrays shown in FIG. 4 may be similarly confirmed by reference to the text and to FIG. 3. Of course, there would be 2,936 additional arrays in the actual retrieval file of FIGS. 1 and 2.
It should also be noted at this point that the arrays of the retrieval file might not be disposed in planar organizations such as schematically indicated in FIGS. 1 and 2. In some embodiments, such a planar organization of arrays might enhance the comparison process for corresponding elements in selected arrays, etc. However, it should be particularly noted that it is not necessary to associate the arrays with planar organizations as indicated for explanatory purposes in FIGS. 1 and 2.
In fact, in the preferred embodiment where the retrieval file comprises arrays stored as machine readable magnetically encoded data on conventional magnetic discs, etc., each array would in fact simply comprise a string of binary bits associated with one another in a conventional manner that might or might not involve any particular predetermined physical locations on the magnetic discs, etc.
As should now be appreciated, each word in the text of a record can be easily analyzed within conventional general purpose data computing equipment to detect its language structure, such as the alphabetic value of each character, its position within the word and the number of characters in the word, etc., thus defining the binary value of each of the 2,976 PICs with respect to that given document. Accordingly, all of the binary coding shown in FIG. 3 may be quickly and easily machine generated from the above quoted text corresponding thereto.
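The word-by-word analysis just described may be sketched in Python as follows. This is an illustrative sketch only and not the assembly language programming of the exemplary embodiment. The names `word_pics` and `text_pics` are hypothetical, and several points are assumptions: words are taken to be runs of letters and digits, upper and lower case letters are treated as distinct (following the 62 character values noted above), characters beyond the eighth position set no PIC, and the first character of a word longer than 20 characters sets no PIC.

```python
import re

OVERLAY_LENGTH = 8   # lengths of 8 or more share the "8+" label for positions 2 to 8
MAX_FIRST_ROW = 20   # assumption: row m = 1 stops at word length 20

def word_pics(word):
    """Return the set of (character, position, length-label) PICs set by one word."""
    n = len(word)
    pics = set()
    for m, ch in enumerate(word[:OVERLAY_LENGTH], start=1):
        if m == 1:
            if n <= MAX_FIRST_ROW:
                pics.add((ch, 1, str(n)))
        else:
            pics.add((ch, m, str(n) if n < OVERLAY_LENGTH else "8+"))
    return pics

def text_pics(text):
    """Return the union of the PICs set by every word in the text."""
    pics = set()
    for word in re.findall(r"[A-Za-z0-9]+", text):
        pics |= word_pics(word)
    return pics

# A few words taken from the quoted text above.
sample = "Senate votes AP Tad Hot Tip John Tower R Tex Ft Sam Houston"
pics = text_pics(sample)
shared_by_t_words = word_pics("Tad") & word_pics("Tip") & word_pics("Tex")
print(("A", 1, "2") in pics, ("A", 1, "1") in pics, shared_by_t_words)
```

Because the result is a set, the three words Tad, Tip and Tex contribute only one entry for their common first letter, which is the redundancy and irreversibility noted in connection with FIG. 3.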
Thus, the construction step 102 shown in FIG. 1 need not be manual but can comprise entirely machine processing once the appropriate information from the base data file is itself in machine readable format. Similarly, all of the records in the base data file 100 can be processed in this manner to construct all of the arrays in the retrieval file 104 as should now be apparent.
In a similar manner, any input words or portions thereof may be automatically analyzed to determine their language structure, such as the character values and their positional locations within a word of a particular size, etc., thus automatically identifying the particular PICs corresponding to the input search data. The arrays corresponding thereto may then be automatically selected by the programmed computer and the corresponding elements therein compared to identify the addresses of desired records.
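Construction of the arrays and subsequent retrieval may be sketched together in Python as follows. This is a minimal sketch under stated assumptions rather than the programming of the exemplary embodiment: records are addressed by their list index, each array is one Python integer with bit k standing for address k, whole words (not word portions) are queried, and the names `word_pics`, `build_retrieval_file` and `search` are hypothetical. The three short records are phrases taken from the quoted text above.

```python
import re

OVERLAY_LENGTH = 8   # lengths of 8 or more share the "8+" label for positions 2 to 8
MAX_FIRST_ROW = 20   # assumption: row m = 1 stops at word length 20

def word_pics(word):
    """Return the set of (character, position, length-label) PICs set by one word."""
    n = len(word)
    pics = set()
    for m, ch in enumerate(word[:OVERLAY_LENGTH], start=1):
        if m == 1:
            if n <= MAX_FIRST_ROW:
                pics.add((ch, 1, str(n)))
        else:
            pics.add((ch, m, str(n) if n < OVERLAY_LENGTH else "8+"))
    return pics

def words(text):
    return re.findall(r"[A-Za-z0-9]+", text)

def build_retrieval_file(records):
    """Map each PIC to an integer bit array indexed by record address."""
    arrays = {}
    for address, text in enumerate(records):
        for word in words(text):
            for pic in word_pics(word):
                arrays[pic] = arrays.get(pic, 0) | (1 << address)
    return arrays

def search(arrays, query, record_count):
    """AND together the arrays for every PIC found in the query."""
    hits = (1 << record_count) - 1
    for word in words(query):
        for pic in word_pics(word):
            hits &= arrays.get(pic, 0)   # a PIC never coded matches no record
    return [a for a in range(record_count) if (hits >> a) & 1]

records = [
    "Senate votes to ban war use of rain making",
    "Work starts on gym Columbia University New York",
    "Chicken exemption opposed Cost Living Council price controls",
]
arrays = build_retrieval_file(records)
gym_hits = search(arrays, "gym", len(records))
price_hits = search(arrays, "price controls", len(records))
print(gym_hits, price_hits)
```

The same `word_pics` routine is applied to both the records and the inquiry, which is what allows the search data file to be processed "in much the same manner" as the base data file.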
Of course, all of these steps could also be carried out in an entirely manual operation and/or various degrees of automatic processing might be substituted for such manual operations. For instance, the arrays might be maintained as plates of film images which are manually or semiautomatically extracted and compared with automatic photoelectric reading of the superimposed binary coded elements, etc.
For some applications, it may be preferable to decrease the number of PICs but to generate and maintain a plurality of retrieval files for such PICs. That is, a first retrieval file for the given set of PICs would correspond to a particular predetermined portion of the information on the base data file or stored records, while the second retrieval file would be coded to reflect the information content of another predetermined portion of such records, etc. For instance, if the base data file comprises telephone directory records, one might want to organize several retrieval files wherein one of the retrieval files relates to the name associated with each telephone directory record and wherein another retrieval file relates to the street address associated with each telephone record, etc. However, as may also be appreciated, these separate retrieval files for a common data base may simply comprise separate groups of PICs within the same retrieval file. In short, the characteristic of language location or meaning within each record may be used in combination with alphabetic values, position within words, etc., as a part of the language structure represented by the PICs in the retrieval file.
That is, this may also be thought of as a single retrieval file wherein a first set of PICs includes the characteristic that it is associated with the name portion of each corresponding record in the base data file whereas another segment of PICs in the retrieval file includes the characteristic that it is associated with the street address portion of such records, etc. In any event, when a retrieval file or files are so organized, the input search data may also be organized accordingly and noted as relating to the name portion of such records or to the street portion of such records, etc., with the corresponding selection of arrays from the retrieval file being made from that particular portion of a retrieval file or a particular retrieval file, as appropriate, thus increasing the precision with which retrieval is accomplished.
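One way to picture PICs that carry a record portion as part of the characteristic is to prefix each PIC with a field name, as in the following Python sketch. The field names "name" and "street", the two directory records, and the function names are all hypothetical and serve only to illustrate the idea; the word analysis makes the same assumptions as stated for the matrix (row 1 to length 20, the "8+" overlay for positions 2 to 8).

```python
import re

OVERLAY_LENGTH = 8
MAX_FIRST_ROW = 20   # assumption: row m = 1 stops at word length 20

def word_pics(word):
    """Return the set of (character, position, length-label) PICs set by one word."""
    n = len(word)
    pics = set()
    for m, ch in enumerate(word[:OVERLAY_LENGTH], start=1):
        if m == 1:
            if n <= MAX_FIRST_ROW:
                pics.add((ch, 1, str(n)))
        else:
            pics.add((ch, m, str(n) if n < OVERLAY_LENGTH else "8+"))
    return pics

def field_pics(field, text):
    """Prefix every PIC with the record portion (field) it came from."""
    pics = set()
    for word in re.findall(r"[A-Za-z0-9]+", text):
        pics |= {(field,) + pic for pic in word_pics(word)}
    return pics

def build(records):
    arrays = {}
    for address, record in enumerate(records):
        for field, text in record.items():
            for pic in field_pics(field, text):
                arrays[pic] = arrays.get(pic, 0) | (1 << address)
    return arrays

def search(arrays, field, query, record_count):
    hits = (1 << record_count) - 1
    for pic in field_pics(field, query):
        hits &= arrays.get(pic, 0)
    return [a for a in range(record_count) if (hits >> a) & 1]

# Hypothetical directory records; the words are borrowed from the quoted text.
directory = [
    {"name": "Lewis David", "street": "Carson"},
    {"name": "Carson Harry", "street": "Lewis"},
]
arrays = build(directory)
by_name = search(arrays, "name", "Lewis", len(directory))
by_street = search(arrays, "street", "Lewis", len(directory))
print(by_name, by_street)
```

Without the field prefix, an inquiry for "Lewis" would identify both records; with it, the name inquiry and the street inquiry each identify only one, illustrating the increase in precision described above.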
If the entire text of each record in the base data file has been utilized in constructing the binary coded arrays in the retrieval file, then every word in the text of each record is a "key word". However, these key words can be used as input queries in any given order, etc., without affecting the output results. Furthermore, the selection of key words is not a function of some intermediate interpreter who has beforehand selected only a few of the words from the text of the record for use as such "key words".
The retrieval file may be made more flexible by modifying the selection of PICs so as to, in effect, overlay smaller character groups on larger character groups in the coding process. In effect, this has already been done in the exemplary embodiment with respect to word lengths of eight or more characters as previously noted. However, it is also possible to carry the process even further, thus decreasing the total number of PICs required for the retrieval file and saving storage space, etc.
Of course, this will increase the coding ambiguities inherent in the retrieval file and thus decrease the precision of retrieval capability to some extent. However, it will at the same time increase the flexibility of the retrieval system since one is then capable of retrieving with input or inquiry words without knowing exactly how many characters are in the finding word.
That is, for instance, if the retrieval file is related to a base data file of telephone directory records, one may not know the exact spelling or the number of characters in a person's name on the particular record desired. However, it is usually possible to provide at least the first few letters of the name for the record being requested, and, if the retrieval file has been coded so as to in effect overlay small character groups on larger character groups, then one may successfully retrieve the desired information even though the total number of characters in the name in question is unknown. Of course, since the coding ambiguities in such a retrieval file are necessarily increased to provide the increased flexibility, the precision with which retrieval may be effected is correspondingly decreased. Accordingly, in addition to the desired name, a retrieval using such a retrieval file may often produce retrieval results calling forth records other than the particular one actually desired. Nevertheless, the accuracy of the recall is still 100 percent in that the desired record will surely be among those that are requested and provided by the retrieval system.
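The trade between flexibility and precision may be illustrated with a Python sketch in which the overlay is carried to its limit, so that a PIC records only a character value and its position and ignores word length entirely. This particular overlay is an assumption chosen for illustration and is not stated to be the coding of the exemplary embodiment; the names `overlay_pics`, `build` and `search_prefix` and the third record are hypothetical.

```python
import re

MAX_POSITION = 8   # assumption: positions beyond the eighth are not coded

def overlay_pics(word):
    """PICs that keep character value and position but ignore word length."""
    return {(ch, m) for m, ch in enumerate(word[:MAX_POSITION], start=1)}

def build(records):
    arrays = {}
    for address, text in enumerate(records):
        for word in re.findall(r"[A-Za-z0-9]+", text):
            for pic in overlay_pics(word):
                arrays[pic] = arrays.get(pic, 0) | (1 << address)
    return arrays

def search_prefix(arrays, prefix, record_count):
    """Retrieve with only the first few letters of a word."""
    hits = (1 << record_count) - 1
    for pic in overlay_pics(prefix):
        hits &= arrays.get(pic, 0)
    return [a for a in range(record_count) if (hits >> a) & 1]

# Hypothetical name records for illustration.
records = ["Partridge George", "Piper Virginia", "Parker Martin"]
arrays = build(records)
result = search_prefix(arrays, "Part", len(records))
print(result)
```

The inquiry "Part" identifies record 0 (Partridge), as desired, without the total length of the name being known. It also identifies record 2, because "Parker" supplies P, a and r in positions 1 to 3 while "Martin" supplies t in position 4. The desired record is always among those identified, but records other than the desired one may accompany it.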
The preferred embodiment involves a retrieval file wherein the PICs are chosen to represent the alphabetic value and relative sequential location of characters in groups of characters such as words contained in records in the base data file. However, it should now be apparent that the selection of PICs can be related to the language structure of the base data file records in other manners as well. For example, if the base data file contains records having many chemical or mathematical formulae, etc., some of the PICs may represent mathematical symbols, operations, quantities, etc. Even particular complete words might be designated as a PIC in a particular retrieval file for a particular application related to some particular base data file of records, for which such a PIC selection would be advantageous. For instance, if the base data file comprises clippings from a newspaper, one of the PICs in the retrieval file might well represent the presence or absence of an accompanying picture associated with the text on a given record, etc. Another PIC might relate to whether or not the record in question is an obituary or whether or not the particular record in question was written by a particular columnist, etc. In short, depending upon the type of base data file involved and the responsiveness desired by the user, one may adapt the set of PICs in a particular retrieval file to the particular language structure of the base data file records in many different ways to achieve particular desired end results as should now be apparent.
One exemplary embodiment of apparatus for practicing this invention is shown in FIG. 5 as comprising a programmed computer 120 together with associated peripheral equipment such as magnetic disc drive and storage units 122, 124; a cathode ray tube display and keyboard input/output unit 126 and possibly a paper tape punch and reader input/output unit 128 and/or other peripherals (not shown) such as magnetic tape drives. The paper tape or magnetic tape peripherals may be utilized, for instance, in inputting control program(s) to the magnetic core of computer 120 and thus conditioning it or adapting it for operation according to this invention. In this particular embodiment, the accessed base data file of records and the retrieval file of binary coded arrays are both stored on the magnetic disc units 122, 124 and/or further magnetic disc storage units as required. The CRT display and keyboard unit 126 is then used for inputting the search data or inquiry information and for displaying the retrieved record as a result of such inquiries.
Another exemplary embodiment of equipment for practicing this invention is shown in FIG. 6. Again, a programmed computer 130 is associated with the magnetic disc device 132 and a CRT display and keyboard input-output device 134. In addition, programmed computer 130 interfaces with a film image storage and retrieval device 136, which is adapted to store optical images of a set of information bearing records at predetermined addresses and is adapted for automatically delivering such information (for example at a display unit 138) corresponding to the information content of any given record when provided with the address of that given record by the computer 130. Such devices are available, for instance, to handle microfiche images from companies such as Image Systems at Culver City, California or Remington Rand. In this exemplary system, the base data file of records in accessible form would be stored at 136 while the retrieval file of binary coded arrays would be stored on the magnetic disc unit(s) 132. Inquiry information would then be input from the CRT display and keyboard unit 134. The programmed computer 130 would then analyze the inquiry information; identify the PICs represented thereby; select and compare the corresponding retrieval arrays from the retrieval file to identify the addresses of the desired records; and provide that address information to the storage and retrieval device 136, whereupon the requested records from the identified addresses would be displayed at 138. Of course, the display 138 might include copying means, etc., as appropriate and/or as desired.
TELEPHONE DIRECTORY ASSISTANCE RETRIEVAL SYSTEM
A. In General
This section describes in detail a specific exemplary implementation of the invention that has actually been experimentally used in retrieving existing machine readable telephone directory records to provide an efficient telephone directory assistance service.
The apparatus for this embodiment corresponds to that shown in FIG. 5 with the addition of three magnetic tape drives interfaced with computer 120 for initial processing of the telephone directory records since the information happened to be readily available in machine readable form on magnetic tapes.
In this exemplary embodiment, computer 120 is a Model PDP 8/E computer available from Digital Equipment Corporation, Maynard, Massachusetts. Magnetic disks 122, 124 comprise a Model DF-32 D disk file (32 K words) and control therefor also available from Digital Equipment Corporation and DD 14/2 disk drives (2314 type) and control therefor available from Diva, Inc., Eatontown, New Jersey. The CRT display and keyboard 126 comprises an S4300 Model CRT display unit available from Ontel Corporation, Plainview, New York. Paper tape punch and reader 128 is a PC 8-E Model paper tape reader/punch and control therefor available from Digital Equipment Corporation. In addition, a Model 1045 NRZI magnetic tape transport (9 track, 800 bpi) and a Model 1045 PE magnetic tape transport (9 track, 1600 bpi), both available from Wangco, Santa Monica, California, are also provided together with a Model 5091-P8 magnetic tape controller available from Datum, Inc., Anaheim, California. Of course, all controls interface with the PDP 8/E mini-computer.
The structure and functioning of this computer equipment, per se, should already be apparent to those in the art. If not, reference may be had to "Small Computer Handbook", published by Digital Equipment Corporation, Maynard, Massachusetts and to "S4200 (4300) User Manual" published by Ontel Corporation, Plainview, New York.
The computer 120 is a "mini" computer and is specially adapted to cooperate with the above-noted peripherals according to this invention by programming it as explained below. The programming is in the standard assembly level programming language recommended by the manufacturer and explained in detail in published literature such as "Introduction To Programming", published by Digital Equipment Corporation, Maynard, Massachusetts; "Software Manual, Magnetic Tape Controller", publication No. 1250.0 by Datum, Inc., Anaheim, California; and "Programming Manual for Diva Disc Systems Used With PDP 8/E" published by Diva, Inc., Eatontown, New Jersey.
The programming or "software" used in this exemplary embodiment is broadly indicated in schematic form at FIG. 7. Since the telephone directory records are, in this instance, already in machine readable form on magnetic tapes 140, this data is first converted to a standard format at 142 before the accessible base data file 100 and retrieval file 104 are constructed at 144 (including the step shown at 102 in FIG. 1) and stored on magnetic disc (122, 124 in FIG. 5).
As will be appreciated, the conversion 142 and construction 144 shown in FIG. 7 are accomplished once in the start-up of the whole information storage and retrieval system and thereafter only as necessary to take into account changes in the base data file. Such changes may be accommodated by a complete reconstruction process based on a whole new input file of records 140 or as a special update modification of the accessible base data file and retrieval file already stored on magnetic disc. Since suitable updating programs for file maintenance purposes are well known, per se, in the art, and since same are actually not necessary to practice this invention, no detailed description of such updating programs will be given.
Once the retrieval file and base data files are in existence, then the system is ready for retrieval processing as shown at 146 in FIG. 7. Here, inquiry inputs from the CRT display and keyboard unit 126 are accepted and retrieved telephone directory data are shortly thereafter displayed thereon.
These three basic areas of this exemplary embodiment will now be explained in more detail.
B. Conversion.
B1. In General.
The conversion 142 is shown more explicitly in FIG. 8. It comprises two programs and merely serves to enter all relevant information into the system in a standardized format from pre-existing available data on magnetic tape.
The pre-existing data base is magnetic tape oriented and comprises two files: listings 150 and captions 152. Both are processed by Program No. 1 to produce standardized files 154, 156 having standard formatting, coding, etc. These standardized tapes are then merged by Program No. 2 into a final file 158 which is formatted for use by the retrieval and base data file construction program segment 144.
To increase the flexibility and speed of the final retrieval process, the standardized file 158 is actually separated into alphabetic segments, each of which is to be considered as a separate base data file 100 and for each of which a retrieval file(s) 104 will be constructed. The alphabetic segments are chosen so as to result in roughly equal sized base data files. In the exemplary embodiment, 16 alpha groupings or segments were chosen:
16 Alpha Groupings
1. E-L
2. Mi-N
3. Br through Bz-W
4. B through Bq-Cr through Cz-Q
5. Z-I-P
6. Ca through Cq
7. U-J-K
8. V-R
9. D-Y-X
10. G
11. H
12. F
13. S through Sn
14. T-A
15. O-So through Sz
16. M-Mh
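The assignment of a record to one of the 16 alpha groupings can be sketched as a lookup on the first one or two letters of the finding name. The following Python sketch is only an illustration of the list above; the names `WHOLE_LETTER_GROUP`, `SPLIT_LETTER_GROUP` and `alpha_group` are hypothetical, and the handling of a missing or non-alphabetic second character (placed in the first half of a split letter) is an assumption not stated in the text.

```python
# Hypothetical lookup tables derived from the 16 alpha groupings listed above.
WHOLE_LETTER_GROUP = {
    "E": 1, "L": 1, "N": 2, "W": 3, "Q": 4, "Z": 5, "I": 5, "P": 5,
    "U": 7, "J": 7, "K": 7, "V": 8, "R": 8, "D": 9, "Y": 9, "X": 9,
    "G": 10, "H": 11, "F": 12, "T": 14, "A": 14, "O": 15,
}

# Split letters: (second letter that starts the second half,
#                 group of the first half, group of the second half)
SPLIT_LETTER_GROUP = {
    "B": ("R", 4, 3),    # B-Bq in group 4, Br-Bz in group 3
    "C": ("R", 6, 4),    # Ca-Cq in group 6, Cr-Cz in group 4
    "M": ("I", 16, 2),   # M-Mh in group 16, Mi-Mz in group 2
    "S": ("O", 13, 15),  # S-Sn in group 13, So-Sz in group 15
}


def alpha_group(name: str) -> int:
    """Return the alpha grouping (1-16) for a name starting with A-Z."""
    letters = name.upper()
    first = letters[0]
    if first in WHOLE_LETTER_GROUP:
        return WHOLE_LETTER_GROUP[first]
    split_char, first_half, second_half = SPLIT_LETTER_GROUP[first]
    # Assumption: a missing or non-alphabetic second character sorts
    # into the first half of a split letter.
    second = letters[1] if len(letters) > 1 and letters[1].isalpha() else ""
    return second_half if second >= split_char else first_half
```

A name that does not begin with a letter raises `KeyError` in this sketch; the text does not say how such records are grouped.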
B2. Description of Input Magnetic Tape Files 150,152
1. Physical Characteristics: 9 track, 800 bpi, regular mode.
2. Structure: Variable length blocks, variable length records per block, maximum characters per block-2048. Record terminator (477.sub.8). Block terminator (75.sub.8).
3. Record Structure:
A. Fixed Length Control Field (58 characters length)
0: N/A
1: field code (47.sub.8)
2-4: N/A
5-7: NPA (exchange)
8: Borough Code
9-15: Telephone #
37: listing type
38-57: N/A
B. Full name field- 0: field code
C. Full title- 0: field code
D. Full desig.- 0: field code
E. Supp Sort Criteria (36 characters length)
0: field code
1-29: N/A
30-36: Tel # or Z for non pub
F. Caption control # (N/A)- 0: field del
G. Listing name
H. Listing title
I. Listing desig.
J. Listing street
K. Listing house #
L. Listing house # suffix
M. Listing locality
O. Remainder of fields: N/A
4. Character Representation
23.sub.8 -34.sub.8 : characters 0-9
40.sub.8 -71.sub.8 : characters A-Z
Note: that these characters may also include parity bit
B3. Description of Standardized Magnetic Tape Files 154, 156, 158
1. Physical Characteristics: 9 track, 1600 bpi, special core dump mode.
2. Structure: Fixed length block (512 words), one record per block.
3. Record Structure:
A. Header (16 words)
1. Locations 2,3,4-double precision sequence count, mpf (N/A).
2. Location 11- non pub info (YES=4000; NO=0)
3. Location 12-listing type (Bus=20; Prof=10; Res=0)
4. Other locations- N/A
B. NPA (area code)- 3 words
C. Telephone #-5
D. Borough- 2 1/2
E. Full Name- 73 1/2
F. Full Title- 7 1/2
G. Full Designation- 19 1/2
H. Listing Name- 67 1/2
I. Listing Title- 7 1/2
J. Listing Designation- 19 1/2
K. Street- 7 1/2
L. House #- 5
M. House # Suffix- 3 1/2
N. Locality- 16 1/2
O. Blank- 248
4. Standardized Coding Structure
______________________________________
Character   Octal     Character         Octal
______________________________________
A           33        0                 65
B           34        1                 66
C           35        2                 67
D           36        3                 70
E           37        4                 71
F           40        5                 72
G           41        6                 73
H           42        7                 74
I           43        8                 75
J           44        9                 76
K           45        Space             77
L           46        @                 0-77*
M           47        (                 0-50*
N           50        )                 0-51*
O           51        "                 0-75*
P           52        :                 0-72*
Q           53        $                 0-44*
R           54        %                 0-45*
S           55        ;                 0-73*
T           56        &                 0-46*
U           57        '                 0-47*
V           60                          0-55*
W           61        *                 0-52*
X           62        .                 0-56*
Y           63        /                 0-57*
Z           64        #                 0-43*
                      ,                 0-54*
                      field delimiter   0-41*
______________________________________
* These characters are double coded.
Note: Blanks are double zeroes.
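The standardized coding structure above can be illustrated with a short encoder. This Python sketch is a hedged reading of the table, not the PUTCHR subroutine itself: the names `DOUBLE_CODED`, `encode_char` and `encode` are hypothetical, lower case letters are folded to upper case as an assumption, and the table entry coded 0-55, whose character is not legible in the table, is deliberately left out.

```python
# Second code of each double coded character (the first code is always 0).
DOUBLE_CODED = {
    "@": 0o77, "(": 0o50, ")": 0o51, '"': 0o75, ":": 0o72,
    "$": 0o44, "%": 0o45, ";": 0o73, "&": 0o46, "'": 0o47,
    "*": 0o52, ".": 0o56, "/": 0o57, "#": 0o43, ",": 0o54,
}
FIELD_DELIMITER = [0, 0o41]  # double coded field delimiter


def encode_char(ch: str) -> list:
    """Return the six bit code(s) for one character."""
    ch = ch.upper()  # assumption: case is folded before coding
    if "A" <= ch <= "Z":
        return [0o33 + ord(ch) - ord("A")]   # A=33 ... Z=64 (octal)
    if "0" <= ch <= "9":
        return [0o65 + ord(ch) - ord("0")]   # 0=65 ... 9=76 (octal)
    if ch == " ":
        return [0o77]
    if ch in DOUBLE_CODED:
        return [0, DOUBLE_CODED[ch]]         # zero prefix marks a double code
    raise ValueError(f"no standardized code for {ch!r}")


def encode(text: str) -> list:
    """Encode a string as a flat list of six bit codes."""
    codes = []
    for ch in text:
        codes.extend(encode_char(ch))
    return codes
```

Because a double coded character occupies two six bit positions, a field containing punctuation consumes more of its fixed allotment than its visible length suggests.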
B5. Functional description of Conversion Programs
1. Program #1
A. Obtain relevant elements from the input tape data base
1. NPA (area code)
2. Borough
3. Telephone number; non-pub info.
4. Full name
5. Full title
6. Full designation
7. Listing name
8. Listing title
9. Listing designation
10. Street
11. House number
12. House number suffix
13. Locality
14. Listing sequence
15. Non-pub info.
16. Listing type
B. Format to Standardized Specifications
1. Magnetic tape output; fixed length (512) blocks; one record per block; 9 track; 1600 bpi; special core dump mode.
2. Standardized codes
2. Program #2
A. Merge the standardized captions and listings files.
B. Add blank records for editing maneuverability.
C. Split into 16 separate and distinct alpha groups.
B6. Detailed Description of Program No. 1
Program No. 1 is shown in block form at FIG. 9 with subroutines utilized therein detailed in FIGS. 10-17. An explicit listing of the assembly level source program language for Program No. 1 and all related subroutines follows. With respect to FIG. 9, it will be noted that program sections I-V correspond to specific listing statement numbers:
______________________________________
Section   Instruction Statements
______________________________________
I         0200-0213
II        0214-0216
III       0217-0260
IV        0261
V         0262-0273
______________________________________
These five program sections are functionally described below as an introduction to the explicit source program listing:
I. Initialize
A. OH, OL are a double precision counter. This counter is used to allow a maximum 24,576 records to be output on a magnetic tape in the initial standardized file.
B. SEQH, SEQL are a double precision record sequence counter. It is stored in each record in position 2 and 3.
C. REOF is a single precision counter. It is used to allow a normal return from XREAD upon encountering the first two tape marks (after file and header labels). XREAD concludes tape processing upon encountering the third.
D. Four XREAD commands are dummy commands. They essentially pass the tape over the header and file labels and their associated tape marks.
II. XREAD: Input tape block. Reads one block of the input tape file. If REOF signals that the third tape mark has been reached, the input tape is rewound and the program halts at location 416; otherwise a normal return is made. XREAD merely sets up the parameters for the magnetic tape operation, sends control to XMAG, which accomplishes the physical magnetic tape operation, and receives control from XMAG at completion and acceptance of the operation. XMAG also delineates tape mark vs normal record.
III. Standardized processing
A. XBUF: pads standardized record before data entry.
1. Pads data portion with spaces
2. Pads remainder with zeroes
B. Sequence # and mpf put into standardized header (locations 2, 3, 4).
C. The desired elements are retrieved from the input record by using the GETFLD and DOFLD subroutines. GETFLD merely positions the input data pointer at the beginning of the next field encountered. DOFLD, assuming the data pointer is at the beginning of the data field, goes to a list to get the number of characters desired transferred and the location of the DRC field to which the NYT field will be transferred. Exit from the subroutine is made upon encountering the next field delimiter in the input data. Also, at the time of transfer, the codes are changed to the standardized code (via PUTCHR) which is a 6 bit code (except for double codes).
IV. XWRITE: Output standardized record. Uses OH and OL as output counters. Upon overflow, the output tape is given a tape mark and rewound and the program halts at location 435. Note that XMAG is the subroutine which actually accomplishes physical magnetic tape operations.
V. Search and Distinguish Terminator. The remainder of the input record is scanned until reaching either an end of block code (75.sub.8) or an end of record code (477.sub.8). End of record causes the program to go to Section III of the mainline, while end of block causes the program to go to Section II. ##SPC1## ##SPC2##
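The double precision counters mentioned in section I (the output record counter and the record sequence counter) span two 12 bit PDP 8/E words. The following Python sketch shows one way such a counter can be carried from the low word into the high word; the function names are hypothetical and the sketch does not claim to reproduce how the listed program actually counts. It also shows that the stated limit of 24,576 records equals a high word of 6 with a low word of 0.

```python
WORD_MASK = 0o7777  # one 12 bit word


def increment_double(high: int, low: int) -> tuple:
    """Add one to a double precision (two word) counter."""
    low = (low + 1) & WORD_MASK
    if low == 0:                      # low word wrapped: carry into high word
        high = (high + 1) & WORD_MASK
    return high, low


def combined(high: int, low: int) -> int:
    """Return the 24 bit value held in the two words."""
    return (high << 12) | low
```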
Operating Instructions for the above listed Program No. 1 are as follows:
1. Set up and load magnetic tapes. Load program into memory bank 0.
Input file-tape unit 1
Scratch-tape unit 0
2. SR=200: load, clear, continue
Halts:
416- Input tape done.
A. To continue- put on next input tape and hit continue.
B. To terminate- set SR=421: load, clear, continue.
435- Output tape at limit: Put on next tape and hit continue.
647- Magtape error: If bit 6 in MQ indicates end of tape set SR=671 load, clear, continue.
652- Retry failure.
B7. Detailed Description of Program No. 2
Program No. 2 is shown in block form at FIG. 18 with subroutines utilized therein detailed in FIGS. 19-24. An explicit listing of the assembly level source program language for Program No. 2 and all related subroutines follows.
This program is written in three different versions (herein called "loaders") to distinguish processing of those alphabetic groupings or segments comprising whole character groups from split character groups.
Loader 1 processes whole character groups. Loader 2 processes the first half of split character groups while loader 3 processes the second half of split character groups.
For example, the letter "A" file can be processed in its entirety, therefore loader 1 would be used. The letter "M", which has more listings than can fit in a single XM block, must be split. Thus Ma-Mh is the 1st half segment and utilizes loader 2, and Mi-Mz is the 2nd half segment and utilizes loader 3. In the split programs (2 & 3) the split defining character (2nd character of the full name field which starts the 2nd half segment-i for Mi-Mz) must be manually entered into the program into location 317 before processing is initiated.
Note that the major difference between the programs is how the "COMP" subroutine handles the search.
Referring to FIG. 18, it should be noted that program sections I-V correspond to specific listing statements as follows:
______________________________________
Loader   Section   Instruction Statements
______________________________________
1        I         0340-0344
1        II        0200-0204
1        III       0205-0217
1        IV        0240-0246
1        V         0300-0310
2        I         0340-0344
2        II        0200-0204
2        III       0205-0217
2        IV        0240-0246
2        V         0300-0310
3        I         0340-0344
3        II        0200-0204
3        III       0205-0217
3        IV        0240-0251
3        V         0300-0310
______________________________________
These five program sections are functionally similar if not identical and are functionally described below as an introduction to the explicit source program listing for loaders 1, 2 and 3 of Program No. 2.
I. Get the appropriate # of blanks from the SR (switch register). Note that this number is set up to be a multiple of 12 for simpler processing during the retrieval file storage. The actual number of blanks is determined manually with regard for the frequency of activity of the particular alpha segment.
II. Get the character to define the alpha group from the SR. Note that the split defining character is entered via toggle switch before processing.
III. Set the proper tape unit in the read command (listing or captions). Search through the file until the desired alpha segment is found, then transfer all subsequent records to the output file until a new alpha segment is encountered.
IV. If the listings file is currently being processed, change the tape unit to the captions file and repeat part III. If the captions file is currently being processed go to part V.
V. After both listings and captions files have been processed, blank records are added to each alpha segment per entry in Part I. Upon completion of this operation, the program is terminated. ##SPC3## ##SPC4## ##SPC5##
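Sections III through V above can be summarized in a few lines of code. The Python sketch below is an illustration only: it models each tape as a sequence of records, uses a caller supplied predicate (here called `in_segment`, a hypothetical name) to stand in for the COMP subroutine, and appends the captions run after the listings run, which is the order the sections describe. Whether the actual loaders interleave records in any finer way is not stated.

```python
from itertools import dropwhile, takewhile


def extract_segment(records, in_segment):
    """Skip ahead to the desired alpha segment, then copy records
    until a record of a different segment is encountered."""
    tail = dropwhile(lambda record: not in_segment(record), records)
    return list(takewhile(in_segment, tail))


def merge_segment(listings, captions, in_segment, blank_count, blank_record=""):
    """Build one alpha segment: listings, then captions, then blank records."""
    merged = extract_segment(listings, in_segment)
    merged += extract_segment(captions, in_segment)
    merged += [blank_record] * blank_count   # room for later editing
    return merged
```

For example, with `blank_count=12` (a multiple of 12, as section I requires) the segment for the letter G would contain every G listing, every G caption and 12 trailing blank records.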
Operating Instructions for the above listed Program No. 2 are as follows:
1. Set up and load magnetic tapes. Load program into memory bank 0. Add toggles for loaders 2 & 3: loc. 317=2nd char. which defines split.
Listings-tape unit 1
Captions-tape unit 2
Scratch-tape unit 0
2. Set Switch Register (SR)=340: load and clear
3. Set SR=# of blank records desired: continue. When program halts (almost instantly) set SR=DRC letter being processed (bits 6-11): continue. The program will halt at location 310 after it has completed the appropriate merge function of the desired alpha grouping:
a. To terminate unit- Set SR=421: load and continue. Program halts at location 435 (program complete).
b. To terminate output tape only- Set SR=421: load and continue. Program halts at location 435. Go to instruction #2 to continue processing.
c. To continue on same output tape- Go to instruction #2 to continue processing.
Halts:
310- see above
406- end of file on input tape. Manually rewind and dismount tape; mount and load next tape on tape drive; SR=401; load, continue.
450, 647, 652- Magtape failures
C. Retrieval File and Base Data File Construction
C1. In General
This portion of the exemplary embodiment is generally depicted in FIG. 25. It comprises Program numbers 3, 4 and 5. Program 5 serves to create an ASCII coded base data magnetic disk file from the standardized converted output 158 of the conversion portion of the system. Program 3 takes this same input data and constructs an intermediate retrieval file on magnetic tape while program 4 uses the intermediate file to temporarily construct a miniature version of a retrieval file before finally constructing the magnetic disk retrieval file. An ASCII base data file and a corresponding retrieval file are constructed and stored on magnetic disk for each of the 16 alphabetic groupings or segments previously discussed.
For this exemplary embodiment, the following set of PICs is used:
Field 1. In Finding Name Field (first full word of standardized name field) and; φ2,2+; φ3,3+; φ4,4+; φ5,5+; φ6,6+; φ7,7+ where φ=A→Z (regardless of case) [Note: The first character value is ignored because the files are already separated based on the first character of this field.]
Field 2. In name field but not finding name (all other words of name field plus full title field) and; φ1,1+; φ2,2+; φ3,3+; φ4,4+; φ5,5+; φ6,6+ where φ=A→Z (regardless of case)
Field 3. In designation field or street, house number, house number suffix, locality fields and φ1,1+; φ2,2+; φ3,3+; φ4,4+ where φ=A→Z (regardless of case) and φ=0→9
Field 4. In business-professional listings or in residential-professional listings.
Thus there are nominally 457 PICs in all associated with the alphabetic values and sequential locations of characters in each base data file record in this exemplary embodiment.
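The nominal total of 457 PICs can be checked arithmetically. The Python sketch below rests on one reading of the field definitions above, stated here as an assumption: each field contributes one PIC per character value per listed position class (six classes for Field 1, six for Field 2, four for Field 3), and Field 4 contributes a single PIC. The function name `pic_counts` is hypothetical.

```python
LETTERS = 26  # A through Z, regardless of case
DIGITS = 10   # 0 through 9, used only in Field 3


def pic_counts() -> dict:
    """Count PICs per field under the assumed reading of Fields 1-4."""
    return {
        "field 1": 6 * LETTERS,             # position classes 2 through 7
        "field 2": 6 * LETTERS,             # position classes 1 through 6
        "field 3": 4 * (LETTERS + DIGITS),  # position classes 1 through 4
        "field 4": 1,                       # assumed: one listing type PIC
    }


total = sum(pic_counts().values())  # 156 + 156 + 144 + 1
```

Under this reading the first three fields account for 456 character/position PICs, which agrees with the figure of 456 used later for the retrieval file blocks and mini-xm segments.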
Program No. 3 first analyzes the above three noted fields and sets up a two-dimensional intermediate retrieval file comprising a bit matrix representing character value versus character position for each field of a given record. The binary bit values in this intermediate file are then transferred by Program 4 into a properly organized retrieval file (one array per PIC, one array element per base data file record, etc.) before being transferred to magnetic disk for actual use. The format of the intermediate retrieval file matrix is shown in FIG. 26.
Here in FIG. 26, the bit matrix is shown as comprising several successive 12 bit words in magnetic core storage. Each of the 457 significant bits is indicated by a decimal numbered grid opening and is equivalent directly to the octal number (after the usual decimal-octal conversion) of a respectively corresponding array in the retrieval file.
The organization of the 16 data base files and associated retrieval files is depicted in FIGS. 27 and 28 for the two disk drives involved in this exemplary embodiment.
As should now be apparent, each alpha segment has 500 disk blocks reserved for storage of the ASCII-coded base data file records. Each block contains 60 fixed length records, each record being 40 words in length (80 six bit characters). Note that the blocks and listings within the block are written in reverse order so that first in will be last out. Also, move cursor commands are embedded in the data before the telephone number for better handling in the retrieval program (program #6). A standardized six bit coding is used except for control codes. A control code is a double coded character: 0 followed by a six bit code which, if 200 is added to it, will produce a CRT control character.
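The record layout just described (80 six bit characters in 40 twelve bit words, 60 records per block, 500 blocks per alpha segment) can be sketched as follows. This Python fragment is illustrative only: `pack_sixbit` and `unpack_sixbit` are hypothetical names, placing the first character of each pair in the high order half of the word is an assumption, and unused positions are padded with zeroes on the basis that blanks are coded as zeroes.

```python
WORDS_PER_RECORD = 40
RECORDS_PER_BLOCK = 60
BLOCKS_PER_SEGMENT = 500
LISTINGS_PER_SEGMENT = RECORDS_PER_BLOCK * BLOCKS_PER_SEGMENT  # 30,000


def pack_sixbit(codes, words_per_record=WORDS_PER_RECORD):
    """Pack six bit codes two per 12 bit word into a fixed length record."""
    capacity = 2 * words_per_record
    if len(codes) > capacity:
        raise ValueError("record holds at most 80 six bit characters")
    padded = list(codes) + [0] * (capacity - len(codes))
    # Assumption: first character of each pair goes in the high order half.
    return [((padded[i] & 0o77) << 6) | (padded[i + 1] & 0o77)
            for i in range(0, capacity, 2)]


def unpack_sixbit(words):
    """Recover the six bit codes from packed 12 bit words."""
    codes = []
    for word in words:
        codes.append((word >> 6) & 0o77)
        codes.append(word & 0o77)
    return codes
```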
Standardized fields transferred to the magnetic disk base data file are:
Listing Name
Listing Title
Listing Designation
House #
House # Suffix
Street
Locality
Telephone #, NPA, Non Pub Information
A brief functional description of program numbers 5, 3 and 4 follows as an introduction to a more detailed description of each:
1. Program #5
A. Obtains the necessary fields for CRT display at time of retrieval.
1. Listing name.
2. Listing title.
3. Listing designation.
4. Street
5. House Number
6. House Number suffix
7. Locality
8. NPA
9. Telephone number; non-pub info
B. Format to retrieval specifications.
1. Fixed length records, blocked, stripped ASCII coding (6 bit) packed
2. Diva disk output; fixed length records and blocks; 60 records per block; 40 computer words per record; maximum of 80 characters per record. The data blocks occupy a maximum of 500 disk blocks for each alpha group, thus allowing 30,000 listings per alpha segment.
2. Program #3
A. Obtains the necessary fields for XM searching capability.
1. Full name.
2. Full title.
3. Full designation.
4. Street.
5. House number.
6. House number suffix.
7. Locality.
8. Listing Type.
B. Format to intermediate retrieval file specifications
1. The data is entered into a bit matrix, the matrix being a character versus character position relation.
2. The data is separated into 4 fields.
a. Finding name- first word of full name field
b. subsequent words- other words of full name field plus full title field.
c. address- full designation, street, house number, house number suffix and locality fields.
d. Listing Type
(i) business-professional
(ii) residential-professional
3. Output to magtape intermediate retrieval file blocks, 12 records per block, 64 computer words per record, 9 track, 1600 bpi, special core dump mode.
3. Program #4
This program takes 12 intermediate retrieval file records and makes one retrieval file word for each character/character position element. These are written onto the small disk (DEC DF32D) until 120 frames (10 computer words) are stored for each element. The disk is then dumped onto magnetic tape (intermediate process). After all intermediate retrieval files are processed in this way, these mini retrieval files (120 bits each) are then input from tape and written at the proper block on the Diva disk. Note that the intermediate file bits are processed in reverse order so that last in will be first out at retrieval time. The retrieval files occupy 456 disk blocks per alpha group; each retrieval file thus allows thirty thousand bit positions.
C2. Detailed Description of Program No. 5
Program No. 5 is shown in block form at FIGS. 29-32 with subroutines utilized therein detailed in FIGS. 33-43. An explicit listing of the assembly level source program language for Program No. 5 and all related subroutines follows. It will be noted that the main program is shown as comprising sections I, II and III in FIG. 29, IV in FIG. 30, V in FIG. 31 and VI in FIG. 32. These program sections correspond to specific listing statements:
______________________________________
Section   Instruction Statement
______________________________________
I         0400-0412
II        0413-0420; 0554-0560
III       0421-0434
IV        0435-0446; 0526-0536; 0561-0564
V         0447-0465; 0515-0524; 0565-0573
VI        0466-0514
______________________________________
These six program sections are functionally described below as an introduction to the explicit program listing:
I. Initialize.
A. Set up the starting disk output block (which is manually entered before program initiation). Note that the blocks are processed in reversed order (as are each record within a block) so that last in will be first out.
B. XSTDSK: set controller and drive #6.
C. XZRO: zero entire output block.
D. XINIT: Set up starting output record pointer (see above); set up the buffer which will store the phone #, NPA, non-pub info, and cursor commands until the end of the record is ready for processing; set up output record per block counter (60 records per block).
E. Indicate that the 1st character will be put in the 1st half of the output buffer word. Reset the character counter for the line. Set the space indicator (for deciding whether the current character is the 1st character of a word).
II. Input.
A. Read record from magnetic tape.
B. Halt on eof, dump last output block manually and halt.
III. Processing.
A. Set output data pointer at beginning of next record.
B. If blank record- ignore any data processing.
C. Check for full output block and output block to disk if full. Then go to process next record (rezeroing buffer if output transfer has occurred).
IV. Processing.
A. XSTORIT: set up output character "linct". Store telephone number characters- npa and either NP (for non pub) or telephone number digits.
B. Set up the list of data fields to be processed. Inclusive in this list are the following fields: listing name, listing title, listing designation, house #, house # suffix, street and locality. Process these fields into the output record then transfer the stored telephone information to the output record.
V. Processing.
Process a field. All spaces between words are removed within a field (with the exception of full 1) and each field is separated by a space except the telephone field which is positioned by move cursor characters.
VI. Processing.
Process double coded characters. Upon encountering a "start of field code", put a space into output record and go to V to process the next field. Also, make sure that the input area is not being overrun. For regular double coded characters, transfer them directly to the output record.
Some of the subroutines not shown in the drawings are briefly explained below:
DSK Subroutine
A general purpose subroutine which drives the disk hardware to write a block (2518.sub.10 words) of data onto the disk at a specified block. (The Disk contains possible 4060 blocks per sector and has 2 sectors for this program.)
format:
JMS I XDSK
(block #)
control resumes here
DIVIDE Subroutine
A general purpose single precision divide subroutine, used by "dsk" to determine the proper head and track from the block # (block #20.sub.10)
format:
JMS I XDIVID
(dividend)
(divisor)
remainder returned here
control resumes here with quotient in accumulator.
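The use of DIVIDE by "dsk" can be illustrated with a single integer division. The Python sketch below assumes that the block number is divided by 20 (decimal), that the quotient selects the track and that the remainder selects the head; this split is an assumption, chosen because 4060 blocks per sector is exactly 203 times 20. The name `head_and_track` is hypothetical.

```python
HEADS_PER_TRACK_POSITION = 20   # assumed divisor (decimal)
BLOCKS_PER_SECTOR = 4060        # stated in the DSK description


def head_and_track(block_number: int) -> tuple:
    """Split a block number into (head, track) by one division."""
    if not 0 <= block_number < BLOCKS_PER_SECTOR:
        raise ValueError("block number outside the sector")
    # quotient -> track, remainder -> head (assumed assignment)
    track, head = divmod(block_number, HEADS_PER_TRACK_POSITION)
    return head, track
```

The pairing mirrors the subroutine's calling format, in which the remainder is returned in line and the quotient is left in the accumulator.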
MAG Subroutine
General purpose subroutine to drive the magnetic tape transport.
format:
JMS I XMAG
(command) *1
(current address)
(word count)
(extension register) *2
control resumes here if tape mark encountered
control resumes here normally.
______________________________________
*1: command bits            *2: extension bits
0 - formatter select        0-4 - N/A
1,2 - unit select           5 - special core dump mode
3-5 - N/A                   6-9 - N/A
6-8 - tape command          10-11 - memory bank
9-11 - N/A
______________________________________
OCDEC Subroutine
A general purpose octal to decimal conversion subroutine (double precision).
______________________________________
format:
JMS I XOCDEC
(high order octal #)
(low order octal #)
ten millions digit
million's digit
hundred thousand's digit      equivalent digits
ten thousand's digit          returned here
thousand's digit
hundred's digit
ten's digit
unit's digit
control resumes here
______________________________________
##SPC6## ##SPC7## ##SPC8##
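The effect of OCDEC can be sketched in a few lines. The Python function below is an illustration under stated assumptions, not a transcription of the subroutine: it treats the two arguments as the high and low 12 bit words of an unsigned 24 bit number and returns the eight decimal digits in the order shown in the calling format (ten millions first, units last). The name `ocdec` is borrowed from the subroutine for readability only.

```python
def ocdec(high: int, low: int) -> list:
    """Convert a double precision (2 x 12 bit) value to 8 decimal digits."""
    value = ((high & 0o7777) << 12) | (low & 0o7777)
    digits = []
    for power in range(7, -1, -1):            # ten millions down to units
        digit, value = divmod(value, 10 ** power)
        digits.append(digit)
    return digits
```

Eight digits are sufficient because the largest 24 bit value is 16,777,215.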
Operating Instructions for the above listed Program No. 5 are as follows:
1. Load Program #5 into MB0.
2. Put input tape on unit 1.
3. Set the following parameters for unit being processed:
______________________________________
72 = sector (0 or 14)
73 = drive (0 or 20)
542 = starting block #      See Sheet
1566 = block limit
______________________________________
4. 400 load, clear, and continue. Do not clear after starting program.
5. Halt at 555 indicates input eof has been read. Load and examine locations 3000-3020. They should all be zero. If not, load and continue at location 20. If these locations are zeroed, go to instruction 6.
(a) If no more input tapes: load and examine loc 63. If loc 63 ≠ 4626, 557 load and continue to dump final data block. Note contents of locs 63 and 1555. Manually rewind tape. If loc 63=4626, note contents of locs 63 and 1555 and rewind tape manually.
(b) If more input tapes: Be sure next tape is on unit. 556 load and continue.
Specifications
1. Input:
a. NYT/DRC tape, 1600 bpi, special core dump mode.
b. Input buffer: 3000-4000 MB0.
2. Output: Diva disk from 0-4726 MB2. 60-80 character (40 location) data records per block.
__________________________________________________________________________
HALT LIST
Halt Location   Reason for Halt               Recovery Procedure
__________________________________________________________________________
247             Bad tape or offline           Check drives
252             Retry failure                 If loc 350=1020, 20 load and continue.
                                              If loc 350=240, 253 load and continue.
502             Buffer overflow on special    Load and examine loc 62. This gives the
                character. (zero already      position following the zero, using the
                in buffer)                    standard 4000 trigger. Locate the Zero in
                                              mb2, and replace it with a space. Return
                                              to mb0. 526 load and continue.
534             Buffer overflow on tel.       Get programming assistance.
                number
554             Input eof.                    See instruction 6.
560             Program completed
1564            Disk limit overflow           Abort, more than 30,000 frames
1642            Output buffer overflow        Abort
2125            Disk error                    Retry
__________________________________________________________________________
UNIT            STARTING BLOCK     BLOCK LIMIT        SECTOR
                (LOCATION 542)     (LOCATION 1566)
__________________________________________________________________________
L,E             1057 (559)         7705 (-59)         SEC 0
N, MI-MZ        3027 (1559)        5735 (-1059)       SEC 0
W, BR-BZ        4777 (2559)        3765 (-2059)       SEC 0
CR, Q, B        6747 (3559)        2015 (-3059)       SEC 0
P, I, Z         1057               7705               SEC 14
C - CQ          3027               5735               SEC 14
K, J, U         4777               3765               SEC 14
R, V            6747               2015               SEC 14
X, Y, D         1057               7705               SEC 0
G               3027               5735               SEC 0
H               4777               3765               SEC 0
F               6747               2015               SEC 0
S-SN            1057               7705               SEC 14
A, T            3027               5735               SEC 14
SO-SZ, O        4777               3765               SEC 14
M               6747               2015               SEC 14
__________________________________________________________________________
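The starting block and block limit values in the unit table are octal, with decimal equivalents in parentheses. The Python sketch below checks one pattern that the listed values happen to satisfy: each block limit is the 12 bit negation of the starting block less 500, which fits the 500 disk blocks reserved for each alpha segment. That relationship is inferred from the numbers, not stated in the text, and the names `neg12` and `block_limit` are hypothetical.

```python
BLOCKS_PER_SEGMENT = 500


def neg12(value: int) -> int:
    """Negate a value modulo one 12 bit word."""
    return (-value) & 0o7777


def block_limit(starting_block: int) -> int:
    """Block limit implied by a starting block (inferred relationship)."""
    return neg12(starting_block - BLOCKS_PER_SEGMENT)


# (starting block, block limit) pairs from the unit table, in octal
UNIT_TABLE = [(0o1057, 0o7705), (0o3027, 0o5735),
              (0o4777, 0o3765), (0o6747, 0o2015)]

# True when every listed pair follows the inferred relationship
table_is_consistent = all(block_limit(start) == limit
                          for start, limit in UNIT_TABLE)
```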
C3. Detailed Description of Program No. 3
Program No. 3 is shown in block form at FIG. 44 with subroutines utilized therein detailed in FIGS. 45-52. An explicit listing of the assembly level source program language for Program No. 3 and all related subroutines follows. It will be noted that the main program is shown as comprising sections I, II, III and IV in FIG. 44. These program sections correspond to specific listing statements.
______________________________________
Section   Instruction Statement
______________________________________
I         0200-0202
II        0203-0204; 0216-0223
III       0205-0206
IV        0207-0215
______________________________________
These four program sections are briefly functionally described below as an introduction to the explicit program listing:
I. Initialize.
A. "REC" is used as a record pointer for the output block. Twelve is the blocking factor. (REC 0-11)
B. "BUF" zeroes the output block area.
II. Read input record.
A. "READ" goes to XMAG to physically read the input tape and indicate whether a tape mark has been reached by its returning position. If a normal exit occurs the program continues to part III (MX processing).
B. If a tape mark is encountered the input tape is rewound and the program outputs the last MX block if there are any records in it and halts. If the operator desires to manually continue, a tape mark will be placed on the output tape (it is also rewound) and again the program halts (processing is complete).
III. Process MX Record. "XMX" translates the input record into MX (PIC matrix) format and stores it in the proper record of the output block (determined by "REC").
IV. Output
A. Set record pointer "REC" to next record.
B. If output block is full, write output block and go to I, otherwise go to II.
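The four sections above amount to a blocking loop with a factor of twelve. The Python sketch below shows that control flow only; `block_records` is a hypothetical name, the translation performed by "XMX" is left to the caller, and a partly filled final block is emitted as it stands, whereas the actual program writes it out of an area previously zeroed by "BUF".

```python
BLOCKING_FACTOR = 12  # MX records per output block


def block_records(mx_records, blocking_factor=BLOCKING_FACTOR):
    """Group translated MX records into output blocks of twelve."""
    block = []
    for record in mx_records:
        block.append(record)              # section III: store at "REC"
        if len(block) == blocking_factor:
            yield block                   # section IV: write the full block
            block = []                    # section I: start a fresh block
    if block:
        yield block                       # tape mark: flush the last block
```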
The XMAG subroutine is not shown in the FIGURES but is described below:
XMAG: general subroutine to drive the magnetic tape transport.
Format:
JMS I XMAG
(command) *2
(current address)
(word count)
(extension register) *3
control resumes here if tape mark encountered
control resumes here normally
______________________________________
*2 command bits:            *3 extension bits:
0 → formatter select        0-4 → N/A
1,2 → unit select           5 → special core dump
3-5 → N/A                   6-9 → N/A
6-8 → tape command          10-11 → memory bank
9-11 → N/A
______________________________________
##SPC9## ##SPC10##
Operating instructions for the above-listed Program No. 3 are as follows:
1. Load Program (MB0).
2. Load Input tape (DRC) - Unit 1. Load Output tape (MX) - Unit 0.
3. SW=200; load, clear, continue.
4. Halt at 221 indicates end of input tape:
A. To continue with same unit, replace input tape, go to instruction #3.
B. To terminate unit, hit continue (this puts eof on output tape between units), program will then halt at 223. Go to instruction #4-A.
Halts:
221, 223: see above.
264, 647: magtape error.
652: Magtape retry failure. Note: to get around retry failure on magtape read, SW=207: load, continue to blank out MX record for this input. To determine whether retry is on read, check loc 752. If read, loc 752 will contain 1020.
C4. Detailed Description of Program No. 4
As previously noted, Program No. 4 constructs a complete series of mini-xm's called "10 mini-xm segments". A series is complete only after 120 MX records have been processed, at which time there are 10 sequential mini-xm words (120 bits) for each significant bit position of the 120 MX records. For ease of manipulation, the segments are built up partially, 12 MX records at a time (12 bits in one computer word). Thus a complete group of "10 mini-xm segments" consists of 456 mini-xm segments, each XM segment occupying 10 computer words (120 bits). The XMs are partially processed as follows:
1. Secure 12 MX records in core (one MX block).
2. Sequentially amass the first bits of each MX and store in a computer word (partial XM.sub.1).
3. Sequentially amass the second bits of each MX and store in a computer word (partial XM.sub.2).
4. Repeat this process for each element until all bits are processed. The resultant is a series of partial mini-xm's [xm (1,1) → xm (1,456)]
5. Repeat the above 4 steps until 10 MX blocks (120 MX records) have been thus processed. Now there are 10 words for each XM element and the mini-xm's are complete [xm (1,1) → xm (10,456)]. These mini-xm elements are stored on tape.
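The five steps above describe a bit matrix transposition: twelve MX records, each a row of PIC bits, become one 12 bit word per PIC, and ten such blocks yield ten words per PIC. The Python sketch below illustrates the idea with each MX record modeled as a list of 0/1 values. The names are hypothetical, and placing the first record of a block in the most significant bit is an assumption; the actual program also reverses the order for last in, first out retrieval, which is not modeled here.

```python
RECORDS_PER_BLOCK = 12   # MX records per block, one bit each per word
BLOCKS_PER_SEGMENT = 10  # words per mini-xm segment


def transpose_block(mx_records, pic_count):
    """Steps 1-4: gather bit `pic` of all 12 records into one word."""
    if len(mx_records) != RECORDS_PER_BLOCK:
        raise ValueError("an MX block holds exactly 12 records")
    words = []
    for pic in range(pic_count):
        word = 0
        for bits in mx_records:
            word = (word << 1) | (bits[pic] & 1)  # assumed: first record in MSB
        words.append(word)
    return words


def build_mini_xm(mx_blocks, pic_count):
    """Step 5: repeat for 10 blocks, giving 10 words (120 bits) per PIC."""
    if len(mx_blocks) != BLOCKS_PER_SEGMENT:
        raise ValueError("a mini-xm segment is built from 10 MX blocks")
    segments = [[] for _ in range(pic_count)]
    for block in mx_blocks:
        for pic, word in enumerate(transpose_block(block, pic_count)):
            segments[pic].append(word)
    return segments
```

In the exemplary embodiment `pic_count` would be 456; a small value is enough to see the pattern.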
Program No. 4 is shown in block form at FIG. 53 with subroutines utilized therein detailed in FIGS. 54-62. An explicit listing of the assembly level source program language for Program No. 4 and all related subroutines follows. It will be noted that the main program is shown as comprising sections I, II, III, IV, V, VI, VII and VIII in FIG. 53. These program sections correspond to specific listing statements:
______________________________________
Section   Instruction Statement
______________________________________
I         0400-0405
II        0406-0411
III       0412-0422
IV        0423-0434
V         0435-0440
VI        0441-0462
VII       0463-0465
VIII      0466-0516
______________________________________
These eight program sections are functionally described below as an introduction to the explicit program listing:
I. Initialize
A. XSETD: set disk controller and drive parameters.
B. Set up the list of data information. Eg: at the end of each input unit the program will store in this list: (1) the number of records dumped onto magtape per unit; and (2) the complement (cia) of the number of unused words from the last dump of 10 words.
C. "MONCT" is used for counting the units (indicating when the fourth is encountered).
II. Initialize small disk parameters
A. "LOOPCT" indicates that 10 input blocks of 12 records each have been processed onto the small disk, thus indicating that the small disk is full and ready to be dumped onto magtape.
B. "STAD" is an indicator for the position of the mini-xm storage on the small disk. With the first input block the first mini-xm is stored at the 10th used position of the small disk and each mini-xm of that input block is displaced 10 positions apart. With the second input block the first mini-xm is stored at the 9th used position of the small disk and again each mini-xm is displaced 10 positions apart and so forth until the disk is full.
III. Input MX Block (PIC matrix)
A. Clear the core input area. This is in effect a dummy operation since the input record overlays this area.
B. Read the MX block into core from magnetic tape. There are 12 records in this block. Each record is 64 words in length. If a tape mark is encountered, go to part VIII.
IV. Fill the small disk (DEC DF32D) with as many mini-xms as will comfortably fit (without stopping processing in the middle of an input block).
A. "XROT"--(see separate description of this subroutine) processes input data into mini-xms in core mb1.
B. "XFILD"--stores the mini-xms on the disk. Note that each mini-xm word is displaced ten words apart from the next. This is done so that ten inputs of 12 records store the mini-xms in sequential order. Eg. After one input and processing of 12 MX records, the small disk contains one mini-xm for each significant MX bit position. But after 10 input and processing cycles the small disk contains 10 adjacent mini-xms for each significant MX bit position, and each "10 mini-xm segment" has its bits sequential in reverse order.
C. After "XROT" and "XFILD" are processed, if the entire unit is done or the small disk is full, part V is entered; if the small disk is not full (10 inputs), go to part III.
V. Transfer mini-xm segments from small disk to magnetic tape storage.
A. "XWRTD" is a physical transfer of the stored 10 "mini-xm segments" on the small disk to magnetic tape.
B. If the unit is completely processed go to part VI; otherwise continue to part II.
VI. End of a Unit Processing
A. Reset unit done trigger.
B. Store # of records processed in unit.
C. Store # of unused mini-xms in the last "10 mini-xm segment".
D. Write tape mark to separate units.
E. If 4 units are done, rewind the tape (of stored mini-xm segments) and go to part VII; otherwise go to part II.
VII. Transfer mini-xm segments to the large disk.
A. "XDO"--(see separate description of this subroutine.)
B. After all xms for 4 units are on the disk, rewind the magtape and halt: Program completed.
VIII. Tape mark encountered on MX input file (end of unit).
A. Set "EOFTRG" to indicate unit complete.
B. "LOOPCT" is queried to see if the last "10 mini-xm segment" has some incomplete elements stored on small disk to be dumped, in which case the last dump is made before going to part VI.
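The "STAD" placement rule described in parts II and IV can be sketched as a small address calculation. This is an illustrative Python sketch only: the function name and the use of 0-based word offsets are assumptions, not taken from the program listing.

```python
SEGMENT_WORDS = 10  # ten input blocks of 12 records fill one "10 mini-xm segment"


def small_disk_offset(input_block, bit_position):
    """0-based small-disk word offset of one mini-xm word.

    input_block:  0 for the first block of 12 MX records, up to 9 for the tenth.
    bit_position: 0-based index of the significant MX bit position.
    The first input block lands in the 10th word of each segment, the second
    in the 9th, and so on, so the words of a segment end up in reverse order.
    """
    return bit_position * SEGMENT_WORDS + (SEGMENT_WORDS - 1 - input_block)
```

After ten input blocks every word of every segment has been written exactly once, which is the "disk full" condition that "LOOPCT" tracks.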
"XROT" Subroutine
To better understand how "XROT" processes the mini-xms, refer to FIG. 26. This figure shows how each MX record is formatted in core.
Each record consists of 64 twelve-bit computer words (the last word of which is not depicted on the diagram and is unused), in which there are both significant and unused bit positions. For example, the first two words consist of all significant bits; the third word, however, has only 2 significant bits and 10 unused bits. This same pattern is continued through the 48th word. The 49th, 50th, and 51st words start a new pattern: one word of 10 significant bits and 2 unused bits, then two words of all unused bits. This pattern is continued through the 60th word. Finally the 61st word is the last word to contain any significant bits--two bits are significant in this word.
"XROT" processes 12 records at one time. It takes a bit from each of the twelve records and puts them together and stores this result as a mini-xm. It does this for each significant MX bit position. Note that only the significant bit positions are processed into mini-xms and that the number of mini-xms produced in this operation is therefore equal to the number of significant bits in the MX record.
This subroutine takes the first MX bit of each record, combines them and stores them as a mini-xm. Then it does the same with the next MX bit from each record, until the first word (12 MX bits) of each MX record is transposed into 12 mini-xm words. "MXCT" indicates when 12 records have each processed one bit. "ROTCT" indicates how many significant bits are to be processed in the current MX word. This indicator is normally set to process 12 bits. "LETLP" indicates when the desired number of MX words have been processed to complete the 1st of the three patterns, so that the processing of only the significant bits is unaltered when the pattern changes. "LP12" indicates that the next MX word to be processed will have unused bits. It is originally set to process 2 words of each record into 24 mini-xms, then indicate done. At this time "ROTCT" is set to indicate that the next input word will have only 2 significant bits to process. "LP12" is then reset to do 3 more words (one containing significant and unused bits and the next two containing all significant bits), and so forth until the first pattern is finished, at which time 416 mini-xms will have been amassed completing the alphabetic portion of the matrix.
When the first pattern is complete "DONTRG" is set to indicate that the alpha portion is done. Notice that "ROTCT" is now set to process only 10 significant bits of the first MX word of the second pattern and the next two MX words are skipped over. This process is repeated until the second (or numeric bits) pattern is completely processed. Notice that the MX input position (data pointer) is used to test when the second pattern is done. Finally the last (or business-residence bits) pattern is processed. "ROTCT" is set to process only 2 significant bits of the next MX word. Again the input position is used to trigger the end of this pattern, at which time the subroutine is exited.
Note that each input MX word containing significant bits is processed in the forward direction (left to right) but the mini-xm bits are stored in reverse order (right to left). This is done to accommodate the first in last out function.
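The three word patterns that "XROT" walks through can be tabulated to check the mini-xm counts quoted above. The Python sketch below is illustrative only; the function names are hypothetical, and the placement of the significant bits at the left end of each word is an assumption drawn from the left-to-right processing order, not from FIG. 26.

```python
def significant_bits_per_word():
    """Number of significant bits in each of the 64 words of an MX record."""
    counts = []
    for _ in range(16):          # words 1-48: alphabetic pattern 12, 12, 2
        counts += [12, 12, 2]
    for _ in range(4):           # words 49-60: numeric pattern 10, 0, 0
        counts += [10, 0, 0]
    counts += [2, 0, 0, 0]       # word 61: business-residence bits; 62-64 unused
    return counts


def significant_bits(record_words):
    """Extract the significant bits of one MX record, left to right.

    Assumes the significant bits occupy the leftmost positions of each word.
    """
    bits = []
    for word, n in zip(record_words, significant_bits_per_word()):
        for i in range(n):
            bits.append((word >> (11 - i)) & 1)
    return bits
```

Summing the table gives 416 alphabetic bits, 40 numeric bits and 2 business-residence bits, 458 in all, which agrees with the 458 bit positions handled by "XDO".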
"XDO" Subroutine
"XDO" is actually a subprogram whose function is to combine each "ten mini-xm segment" from each magnetic tape record group into XM blocks onto a disk pack. It processes all four units processed onto tape.
First, the format of the "ten mini-xm segments" stored on magnetic tape in record groups will be described. Each of 458 bit positions of 120 MX records has been transformed into "10 mini-xm segments"; i.e., 120 bits (10 words) containing the bits of the first bit position of each of the first 120 MX records, followed by 10 words containing the bits of the second bit position of each of the first 120 MX records, and so forth until 10 words containing the bits of the 458th bit position of each of the first 120 MX records. These are put onto magnetic tape as three records, the first and second records each containing 204 of the 458 segments and the third record containing the remaining 50 segments. These three records form a group. The next 120 MX records were also processed into a group of 3 records in the same manner until all records were processed (the last group possibly containing unused bits and/or words). Each of the 4 units was separated by a tape mark.
Thus "XDO" is set up to process (through the "PROC" subroutine) the 4 units into XM blocks in the following manner: With the disk pack initially zeroed (offline process), the first "ten mini-xm segment" (10 mini-xm words) is put onto the first disk block (record), the second "ten mini-xm segment" is put onto the second block, and so forth until the 458th "ten mini-xm segment" has been put onto the 458th disk block. Note that because the bits are in reverse order, the "ten mini-xm segments" will be put into these blocks in reverse order also, i.e., all of these segments are put into the end of the disk block (words 2509-2518). Thus the first group has been processed. The second group is processed in the same manner except that the segments are now displaced 10 words in the XM blocks (words 2499-2508). This is continued until all the groups in the unit have been processed. The other three units are processed likewise, except that the disk blocks are offset 1000 positions for each unit.
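The placement performed by "XDO" reduces to two small address calculations, sketched below in Python. The function names, the 0-based group and unit indices, and the `first_xm_block` parameter (the starting XM block of the first unit, which the program takes from its own list) are assumptions for illustration.

```python
BLOCK_WORDS = 2518       # length of one XM disk block in words
SEGMENT_WORDS = 10       # one "ten mini-xm segment"
BLOCKS_PER_UNIT = 1000   # disk blocks are offset 1000 positions per unit


def segment_word_range(group):
    """1-based inclusive word range inside an XM block for one record group.

    group 0 is the first group of 120 MX records; it fills the end of the block.
    """
    last = BLOCK_WORDS - group * SEGMENT_WORDS
    first = last - SEGMENT_WORDS + 1
    return first, last


def xm_disk_block(unit, bit_position, first_xm_block=0):
    """Disk block holding the XM for one bit position of one unit.

    first_xm_block is a hypothetical parameter standing in for the starting
    XM block number of unit 0.
    """
    return first_xm_block + unit * BLOCKS_PER_UNIT + bit_position
```

For example, `segment_word_range(0)` gives words 2509-2518 and `segment_word_range(1)` gives words 2499-2508, matching the text. The 250th group ends at word 19, so 2500 of the 2518 words are used, enough for 250 x 120 = 30,000 records per unit.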
Some of the subroutines not shown in the FIGURES are summarized below:
MAG: General subroutine to drive the magnetic tape transport.
Format:
JMS I XMAG
(command) *1
(current address)
(word count)
(extension register) *2
control resumes here if tape mark encountered
control resumes here normally
*1. command bits
0: formatter select
1,2: unit select
3-5: N/A
6-8: tape command
9-11: N/A
*2. extension bits
0-4: N/A
5: special core dump mode
6-9: N/A
10,11: memory bank
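The command and extension words above are packed bit fields. The following Python sketch shows one way to unpack them, assuming the DEC convention that bit 0 is the leftmost (most significant) bit of the 12-bit word; the function names and dictionary keys are hypothetical and are not part of the MAG subroutine.

```python
def field(word, first_bit, last_bit):
    """Extract bits first_bit..last_bit of a 12-bit word (bit 0 is leftmost)."""
    width = last_bit - first_bit + 1
    shift = 11 - last_bit
    return (word >> shift) & ((1 << width) - 1)


def decode_mag_command(word):
    """Split a MAG command word into the fields listed under *1."""
    return {
        "formatter": field(word, 0, 0),
        "unit": field(word, 1, 2),
        "tape_command": field(word, 6, 8),
    }


def decode_extension(word):
    """Split an extension register word into the fields listed under *2."""
    return {
        "core_dump_mode": field(word, 5, 5),
        "memory_bank": field(word, 10, 11),
    }
```

Under this assumed numbering, the octal word 1020 decodes to unit 1 with tape command 2.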
DISKS: General subroutine to execute a disk function on the DF32D (small) disk.
Format:
JMS I XDISK
(command) *3
(core address)
(word count)
(block # [0-17])
(disk address)
control resumes here
*3. This command is either DMAR to read or DMAW to write.
DIVIDE: General single precision divide subroutine.
Format:
JMS I XDIVID
(dividend)
(divisor)
remainder returned here
control resumes here with quotient in accumulator
REWIND: General subroutine for rewinding the magnetic tape unit last processed by XMAG.
CLEAR: General subroutine to clear MX input area.
INIT: Initialize parameters at the initial read of the 3 record group of "10 mini-xm segments".
A. "RDCT" is a counter to indicate that there are 3 magtape records to one block of mini-xm segments.
B. "TRCCT" is a counter to indicate that there are 204 mini-xm segments in the first two records of the block and 50 mini-xm segments in the last record of the block. (Set to 50 at some other part of the program)
DSKOP: Finishes executing a disk function. The function is specified by "COMM". In this subroutine the actual disk transfer takes place.
RD: Establishes the parameters for "XMAG" subroutine; goes there to perform a magnetic tape read function per parameters; returns from "XMAG" indicating tape mark if encountered, and offsets return from "RD" if tape mark not encountered.
SETDSK: Sets the constant parameters for the large disk. The parameters in this program which are constant throughout the program are: controller and drive (all 4 units are processed onto one drive). "LST3" is set up at this time also. It is a list of the starting XM disk block No.'s for each unit.
Note: Each XM is a self-contained disk record, 2518 words in length. * When we speak of a disk block (on the large disk), we are denoting one of these records. Note also that there are 4060 of these blocks per disk, with the first 60 blocks used for scratch purposes, followed by 1000 blocks allotted to each unit. The breakdown of the 1000 blocks per unit is: 500 for data, 458 for XM's, 42 for scratch purposes (actually unused). For a pictorial view see the "ascii-xm file disk layout."
*Only 2500 words of the XM are used--this allows 30,000 bits per unit in the XM's.
Operating instructions for the above listed Program No. 4 are as follows:
ABSTRACT: This program takes 12 MX records and makes 1 XM word for each char position. These are written onto the small disk until 120 frames (10 words) are stored for each position. The disk is then dumped onto magtape. After all MXs are processed, the mini-XMs (120 frames each) are then input from tape and written at the proper block on the Diva disk. At the end of the program locations 474-507 contain (1) the # of records dumped onto magtape per unit and (2) the cia of the number of unused words from the last dump of 10 words. Locations 107-115 contain the end locations of each unit within the block.
1. Load NYT MX-XM program into mb0.
2. Put MX tape on unit 1 and scratch tape on Unit 0.
3. Set the following parameters in core:
55=drive (0 or 20)
56=sector (0 or 14)
4. Be sure the drive is ready and the proper disk pack is on.
5. 400 load, clear and continue.
6. Halt at 465 indicates processing complete. Note contents of locations 474-507 and 107-115.
SPECIFICATIONS
1. Input:
(a) MX tape, 1600 bpi, scd mode, 2000 wds per rec., 12 MX per record, eofs between units.
(b) Input buffer--2000 mb0.
(c) Input buffer for mini XMs--4000 mb0.
2. Output:
(a) to small disk from 0-536 mb1.
______________________________________
HALT LIST
Halt Location   Reason for Halt                             Recovery Procedure
______________________________________
247             bad record/offline                          check drives
252             retry failure                               press continue
465             end of processing
646             DF32 retry failure
755,1263        Buffer overflow; over 30,000 frames         abort
1147            Eof on writing on disk                      abort
1152            record # exceeded on outputting from DF32   abort
1236            Block overflow                              abort
1304            unit # exceeded                             abort
1534            DD14 error                                  abort
______________________________________
D. Retrieval Processing
D1. In General
This portion of the exemplary embodiment is generally depicted in FIG. 63. In terms of programming, it comprises Program No. 6. It is the portion of the system which searches and retrieves the information stored on disk. The user accomplishes this function by entering his query through the CRT keyboard into Program #6, which searches the retrieval file portion of the Diva disk to find where the query matches the disk retrieval file information. The relevant data records for these matches are then retrieved from the ASCII portion of the disk and subsequently displayed to the user via the CRT screen.
D2. Detailed Description of Program No. 6
There are two types of searches available in the system via this program. The first is a selective search. In this case, the user must specify the first letter of the "finding name", and only that segment (*1) of the alpha grouping specified is searched. Otherwise, the user can search all 16 groupings (all portions) by eliminating this one word field. This is called a general search. The "finding name" is defined as the first word (Field 1) of a listing in the alphabetical telephone directory. Note that the first character (and also the second character in certain cases) of the "finding name" is the one which defines the grouping (and segment thereof) to which the listing queried will belong. Note that the listings are processed in an alpha-numeric sequence thereafter excepting that the listings and captions are not totally merged (one file merely follows the other) within a segment.
*1. Segment refers to one alphabetic segment within a grouping. Note that some alphabetic segments stand alone within the grouping (eg. "G") whereas others are combined (eg. "Z", "I", "P").
The program allows the user to enter his query in fielded format. The first field searches the first word of the "finding name" and if omitted causes the program to search the entire data base rather than one segment for any matches to subsequent information entered in the query. The second field, called "subsequent words" searches the rest of the finding name plus titles. The third field, called "address" searches the designation and address information.
The functional sequence of control can be summarized as follows:
I. User enters query through CRT
A. Fields
1. Finding name--7 characters max., one word only
2. Subsequent words and titles--6 characters max., no limit on words
3. Designation and Address--4 characters max., no limit on words
B. Control Characters
1. space--field separator
2. comma--word separator
3. carriage return--query terminator for business and professional listings
4. line feed--query terminator for residential and professional listings
5. semi-colon--query deletor
6. period--screen hits roll
7. asterisk--abort signal
II. Program No. 6 enters search and retrieval phases
A. The characters entered from CRT are then translated to pointers for specific retrieval file blocks. The first character is a pointer to the alpha group desired (*1). The first retrieval file segment (upon receipt of the next alphameric character from the CRT) is then brought into core from the Diva disk. All other retrieval file segments are ANDED (boolean operation) with the result from the previous operation. This produces a resultant retrieval file.
*1. The first character received from the CRT in a search is a director to the (a) proper disk pack, (b) sector within a pack, and (c) one of 4 segments (groups) within a sector, ie., it is a pointer to one of the 16 alpha groupings.
B. This resultant of the ANDING process is an indicator for each listing of its relation to the user's query. If a bit is "off", no match is indicated. If a bit is "on", a match with the user's query is indicated. Since the bits are sequential (with relation to the listings) in all retrieval file segments, the position of the "on" bits of the resultant retrieval file are directly proportional to the position of the ASCII data records they represent. Thus, the "hits" shown on the screen are those ASCII records represented by the "on" bits in the resultant retrieval file. For ease of reference, each retrieval file segment will hereafter be termed as an "XM".
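The ANDING process can be illustrated in a few lines of Python. This is a minimal sketch, assuming each XM is represented as a list of 0/1 flags indexed by listing position; the function names are hypothetical.

```python
def and_xms(xms):
    """AND a sequence of equal-length XM bit arrays into a resultant."""
    resultant = list(xms[0])          # the first XM becomes the resultant
    for xm in xms[1:]:
        resultant = [a & b for a, b in zip(resultant, xm)]
    return resultant


def hit_positions(resultant):
    """Positions of the "on" bits, i.e. the listings matching the query."""
    return [i for i, bit in enumerate(resultant) if bit]
```

For instance, ANDING `[1, 1, 0, 1]`, `[1, 0, 0, 1]` and `[1, 1, 1, 1]` leaves only positions 0 and 3 "on", so only the first and fourth listings would be retrieved.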
As an example of the general processing involved, consider the following query:
______________________________________
Smith John, H 4, Park (Return)
______________________________________
Upon receipt of each letter of this query, Program No. 6 will undertake the following actions:
S--indicate split letter segment
M--set disk parameters for alpha grouping #13 (SA-SN). bring XM block 12 into core (this becomes resultant).
I--bring XM block 34 into core; AND it with the resultant, and the result becomes the new resultant.
T--same (XM 71)
H--same (XM 85)
J--same (XM 165)
O--same (XM 196)
H--same (XM 215)
N--same (XM 247)
H--same (XM 163)
4--same (XM 420)
P--same (XM 327)
A--same (XM 338)
R--same (XM 381)
K--same (XM 400)
Return--same (XM 457)--business professional filter
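The XM block numbers in this example can be reproduced from the field starting blocks (0, 156 and 312), the numeric starting block (416) and the character position within each word, as described for Program No. 6. The Python sketch below is an illustration under stated assumptions, not the program's actual logic: the function name and keystroke encoding are hypothetical, the blanks printed after the commas in the query are assumed to be typographical rather than field separators, and the first letter of the finding name is assumed to select the alpha grouping without contributing a block of its own.

```python
ALPHABET = 26
FIELD_START = [0, 156, 312]   # starting XM block of fields 1, 2 and 3
FIELD_LIMIT = [6, 6, 4]       # block-producing characters allowed per word
NUMERIC_START = 416           # starting XM block for numeric characters
BUSINESS_PROFESSIONAL = 457   # filter block ANDed in on carriage return


def xm_blocks(keys):
    """Translate query keystrokes into the XM block numbers to be ANDed.

    keys: upper-case letters and digits, ' ' as field separator and ','
    as word separator.
    """
    blocks, field, pos = [], 0, 0
    expecting_selector = True     # first letter of the finding name
    for key in keys:
        if key == " ":
            field, pos, expecting_selector = field + 1, 0, False
            if field > 2:
                raise ValueError("only three fields are allowed")
        elif key == ",":
            pos = 0
        elif key.isdigit():
            if field != 2:
                raise ValueError("numbers are valid only in the third field")
            if pos < 4:
                blocks.append(NUMERIC_START + pos * 10 + int(key))
            pos += 1
        elif "A" <= key <= "Z":
            if expecting_selector:
                expecting_selector = False
            else:
                if pos < FIELD_LIMIT[field]:
                    blocks.append(FIELD_START[field] + pos * ALPHABET
                                  + ord(key) - ord("A"))
                pos += 1
        else:
            raise ValueError("character not accepted: %r" % key)
    return blocks
```

Calling `xm_blocks("SMITH JOHN,H 4,PARK") + [BUSINESS_PROFESSIONAL]` yields the same sequence of block numbers as the list above, from 12 through 457.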
Program No. 6 is shown in block form at FIGS. 64-68 with some of the subroutines used therein detailed in FIGS. 69-84. An explicit listing of the assembly level source program language for Program No. 6 and all related subroutines follows. It will be noted that the main program is shown as comprising sections I-IX in FIGS. 64-68. These program sections correspond to specific listing statements:
______________________________________
Section   Instruction Statements
______________________________________
I         0200-0202
II        0203-0210
III       0211-0222
IV        0223-0226
V         0400-0473
VI        0227-0315
VII       0316-0345
VIII      1317-1354
IX        0600-0657; 0736-0767
______________________________________
These nine program sections are functionally described below as an introduction to the explicit program listing:
I. Program Initialization
A. XPRINT: print the form on the CRT. The form consists of dashes denoting where the query fields should be entered.
II. Start of Query
A. "6030" clears the keyboard flag if set
B. "FPOS" is set to indicate the program is processing the first field (FPOS=0)
C. "CHAR" is set to indicate that no characters of the query have been processed
D. A buffer is set to store XM coordinates for each query character if the whole disk is to be searched
E. Note that because the field is already set, the new field processing is ignored at this time.
III. New Field
A. Only three fields are allowed, if the fourth is attempted, the program aborts the query processing and reinitializes itself.
B. "CHAR" is queried to see if any query characters have been entered. If not, the program (upon the omission of any characters in the first field) determines that the search will cover the entire file (all 16 segments). If the entire file is to be searched, a resultant XM and the parameters pertinent thereto are stored for each of the 16 segments. "XSTBUF" is an initialization routine which stores one XM (with all bits set) and its initialized parameters for each segment. Briefly, these parameters are: "XMBIT", the bit which is currently being tested for a hit; "XMLOC", the location which contains the bit currently being tested for a hit; and "XMSTRT", which points to the first "on" bit of the resultant so that processing can start at this location rather than covering the entire XM during the anding and searching cycles.
C. "FPOS" is set to indicate the new field (FPOS=1 for second field; FPOS=2 for the third)
IV. New Word--At the beginning of a word, certain items must be reinitialized.
A. "CPOS"--used to denote character position within the word of the current character being processed. Initially set to 0 for the first character.
B. "SADD"--set to the starting XM block # for the current character position within a field. See FIG. 26.
C. "STRG"--When the first character of the finding name field is one which is split among more than one XM group (eg. "S"), "STRG" is set so that the next character can determine the XMs group to search.
D. An exit is made to "XASCIF" to get the next character from the CRT.
V. "XASCIF"
A. "KYBD" subroutine gets a character from the CRT keyboard and stores it in MQ register.
B. Program control is routed per MQ contents. For routing, see the flow chart on this section, and its accompanying legend.
VI. Current Character is an Upper Case Alpha
A. If "CPOS" and "FPOS" combined are equal to zero, this indicates that the current character is the first character of the finding name and
1. "GET1" selects the proper XM group and its disk parameters: drive, sector, and starting cylinder.
2. "STRG" is set if the first character denotes a split group, so that the proper disk parameters can be determined upon encountering the next character.
3. Parameters for the query are initialized (See new field section for description of "XMBIT", "XMLOC" and "XMSTRT").
4. "CHAR" is set to indicate that the search will include only one segment.
B. "STRG" is queried to see if disk parameters must be set on current character (when previous character could not set parameters because of segment split). And if so,
1. "STRG" is rezeroed for remainder of query.
2. "XGET2" sets the proper disk parameters with current character determining the proper XM group to be searched.
C. "LIMST" list tells how many characters are allowed to be processed in each word in field. (Actually it is a list composed of the last valid character position of the field). Field 1--7 characters. Field 2--6 characters Field 3--4 characters. If "CPOS" is greater than that allowed in the "LIMST" list for that field, character processing is ignored for the current character.
D. "FLDST" list is a list of the starting XM block numbers for each field (see FIG. 26 for block numbers). Field 1--Block 0. Field 2--Block 156 (234 octal). Field 3--Block 312 (470 octal).
E. The actual XM block to be processed is determined and put in "STMP". It is determined by adding the starting block of the current field (from the "FLDST" list pointed to by the current contents of "STMP"), the starting block within the field of the current character position (in "SADD") and the stripped character (MQ register contents stripped of its ASCII).
F. "SADD" is then reset for the next character position.
G. "CHAR" is queried to determine whether all segments are to be processed for the current character; and if so, the XM block number to be retrieved is stored in a buffer via auto-index register "R17" (disk processing done at end of query) and immediate processing is bypassed.
H. Finally the block number (stored in "STMP") is divided by 20 (decimal) and this result is added to the starting cylinder for this segment to determine the proper disk record to process. "XDOIT" is the ANDING subroutine which processes this disk record. Method: the records are read into core and ANDED together, the resultant containing only those bits which refer to matches in the query.
I. "CPOS" is incremented to the next character position. "CHAR" information is transferred to "ECHO" for further reference.
J. Program control goes to part IV to get next character from CRT.
VII. Current Character is a Number
A. "FPOS" is queried to see if the current field is the third (FPOS=2)--if not an error condition exists and a "no hits condition" is rendered to the user on the CRT screen.
B. Again "CPOS" is tested to see whether the character position is greater than the 4th (CPOS > 3), and if so, processing is ignored.
C. The block number formulation is similar to that of the alphabetics except that the character position is multiplied by 10 to substitute for the "SADD" element and the starting block number for the field is set to 416 (640 octal)--see "MX-XM formats".
D. Remainder of processing is same as for alphabetics.
VIII. End of query
A. The last key selected in the query determines the type of search. Actually it allows one more ANDING process to occur (either the business--professional block or the residential--professional block noted in "BPRADD").
B. "ECHO" is tested to see if all segments are being searched. If so,
1. This last block # is stored in the XM block buffer and the buffer is terminated (0).
2. "DO16" processes all XM blocks whose numbers are stored in buffer.
3. "XGET16" brings the first segment's resultant and parameters into core. "UNIT" is set to denote first segment.
4. Program control is given to "XHIT".
C. If only one segment is being processed, the last block is ANDED into the resultant via "XDOIT" and program control is given to "XHIT".
IX. Translate "on" bits in query's resultant XM into query answers (original listings) on CRT screen.
A. "XPRINT" at "MOVCUR" moves the cursor to the answer section of the CRT screen.
B. "HITCNT" is set to allow a maximum of 18 hits to be displayed on the CRT screen.
C. The current bit being processed in the resultant is stored in the MQ register.
D. Program control is transferred to "H32" to continue scanning for "on" bits.
E. "H20"--Scan resultant (computer) words for "on" bits.
1. "XMLOC" (current resultant location being scanned) is tested for "on" bits, and if any are encountered program control is given to "H30".
2. "XMLOC" is queried to see if the scanning process has been completed (if "XMLOC" is at beginning of resultant--note scanning done backwards) and if so, program control is given to "ENDH".
3. "XMLOC" is reset back one location to scan next location; program control is then given to "H21".
F. "H30"--The "on" bits are determined and processed.
1. Going from right to left, each bit is tested and if on program control is given to "H40", which processes the "on" bit.
2. After all bits are checked, program control is returned to "H22" to continue bit scan.
G. "H40"--Proper listing which matches "on" bit is written onto CRT screen.
1. "DOHIT"--selects relative ASCII record per the "on" bit in resultant and "XPHIT" prints this record on the CRT screen.
2. "HITCNT" is queried to see if CRT screen is full (18 listings printed), and if not program control is returned to "H32" to continue processing any "on" bits left in resultant.
3. If screen is full, the current bit being processed is returned to storage in "XMBIT"; "HITCNT" is queried to see if any hits were detected in resultant and if not--"no hits indication" is presented to user on CRT screen; the cursor is returned to the start of the query on the CRT screen; program control is returned to "XST2" to process a new query or possibly more hits if selected by user.
H. "ENDH"--end of resultant processing.
1. "ECHO" is queried to see whether more than one segment is being processed, and if not--program control is given to "H41" to terminate processing.
2. After four segments' resultants have been processed it is necessary to change the disk sector and drive parameters, otherwise "XGET16" will bring the resultant (and parameters) specified in call by "UNIT" into core. Program control is returned to "HIT14".
3. If when specifying the new drive-sector, it is determined that all segments have already been processed, program control is given to "H41" to terminate processing.
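The scanning order described in section IX (resultant words taken backwards, bits within a word from right to left, at most 18 hits per screen) can be sketched in Python as follows. The function names are hypothetical, and the mapping from bit position to record number (the rightmost bit of the last word standing for the first record) is an assumption based on the reverse-order storage of the mini-xms.

```python
from itertools import islice

WORD_BITS = 12
HITS_PER_SCREEN = 18


def on_bits(resultant):
    """Yield record numbers for the "on" bits of a resultant XM.

    resultant: list of 12-bit words. Words are scanned from the last to the
    first and each word from right to left, so records come out in order.
    """
    n = len(resultant)
    for loc in range(n - 1, -1, -1):
        word = resultant[loc]
        if word == 0:
            continue                       # no "on" bits in this word
        for bit in range(WORD_BITS):
            if word & (1 << bit):
                yield (n - 1 - loc) * WORD_BITS + bit


def screens(resultant, per_screen=HITS_PER_SCREEN):
    """Group the hits into successive screens of at most per_screen listings."""
    hits = on_bits(resultant)
    while True:
        page = list(islice(hits, per_screen))
        if not page:
            return
        yield page
```

Because the generator keeps its place between screens, asking for the next page corresponds to the user pressing the period key to roll the screen and see more hits.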
Some of the subroutines not explicitly shown in FIGS. 69-84 are briefly summarized below:
KYBD: gets character from CRT keyboard and stores it in MQ register.
PRT: prints character currently residing in accumulator onto CRT screen.
PRINT: prints all characters in list specified in the call format.
format:
JMS PRINT
(location of print characters list)
control resumes here
NOTE: list is terminated by φ.
DIVIDE: general purpose single precision division subroutine.
format:
JMS I XDIVID
(dividend)
(divisor)
remainder returned here
control resumes here with quotient in accumulator
ZMX: extraneous subroutine which is not used in program.
DISK: general purpose subroutine which drives the disk hardware to perform an operation (specified in call).
format:
JMS I XDISK
[memory bank (bits 0-5) & drive (bits 6-11)]
[cylinder]
[sector (bits 0-5) and head (bits 6-11)]
[command]
control resumes here
The CRT keyboard is utilized as follows:
UCA--upper case alpha; query character key
#--number; query character key
LF--line feed; key to indicate end of query and specify residential-professional type search
CR--carriage return; key to indicate end of query and specify business-professional type search
SP--space; key to indicate new field
*--asterisk; key to indicate abort situation
,--comma; key to indicate end of word
.--period; key to indicate screen roll (more hits)
;--semi-colon; key to indicate deletion of query
other--any other characters entered are not penetrable into the system (thus a question mark-backspace is printed for user).
Operating instructions for Program No. 6 are as follows:
1. Load Program through "DEMO". Else, after loading program through other means, set loc. 7000=7402, and start program at location 200 (mb 0).
2. The screen will print dashes to show where query is to be entered. The user may then type in a query via CRT keyboard.
3. CRT Format.
A. Fields
1. Finding name--7 characters, one word
2. Subsequent words and titles--6 characters per word, no limit on words
3. Designations and Address--4 characters per word, no limit on words
B. Control Characters
1. space--field separator
2. comma--word separator
3. carriage return--query terminator for business and professional listings
4. line feed--query terminator for residential and professional listings
5. semi-colon--query deletor
6. period--screen roller
7. asterisk--returns control of program to "DEMO" which it expects is resident in core.
Halts: 1531: disk failure
Note: If the program detects errors, the screen and program are automatically reinitialized.
SPECIAL PURPOSE MANUAL/SEMI-AUTOMATED SYSTEM
Another exemplary embodiment of apparatus/method for practicing this invention is shown in FIG. 85. As will be appreciated from the following description, various degrees of manual and machine processing may be incorporated in this embodiment.
In this example, the base data file comprises typed extracts of the full text copy. Each record in the file is assigned a unique address number from 0001 onward. In the particular example to be described, 8448 records can be maintained and retrieved from such a base data file although, as will be appreciated, much larger files of records could be serviced with the same techniques.
Since this exemplary embodiment has been applied to an actual record file, the text from the first few records of that file is copied below:
Record #0001
Wed Sep 1 1971 CJ Burger asks caution in enforcing bus rule Quotations By PETER MILIUS LA Times Washington Post Service WASHINGTON Chief Justice Warren E Burger federal judges Supreme Court s busing decision school desegregation cities decision school system South U S Winston Salem NC President Nixon Department of Health Educaton and Welfare Fifth U S Circuit Court of Appeals New Orleans black school children Forsyth County Charlotte NC test
Record #0002
Wed Sep 1 1971 CJ Holsclaw resigns Acting chief takes county police reins Quotations By STAN MACDONALD Courier Journal Staff Writer Maj Russell S MacDaniel Jefferson County police chief Thomas R Holsclaw County Judge Todd Hollenbach merit board member county policemen county merit system Pixp Staff Photo by Thomas Mitchell MAJ RUSSELL S McDANIEL right was sworn in as acting chief of the Jefferson County Police Department by County Judge Todd Hollenbach after Chief Thomas R Holsclaw resigned
Record #0003
Wed Sep 1 1971 CJ Hollenbach s McDaniel Criminal Investigation Divisin CID Reports of friction Holsclaw Controversial transfer Capt Fred Roemele New duties for Holsclaw Was acting chief before Merit Board Chairman J Stanley Watson
Record #0004
Wed Sept 1 1971 CJ Nixon blocks disclosure of military ai plans From New York Times and AP Dispatches WASHINGTON President Nixon Senate Foreign Relations Committee the plans foreign military assistance President s military foreign aid program Foreign Relations Committee under Chairman J William Fulbright D Ark Pentagon s Congress Secretary of State William P Roger Defense Secretary Melvic R Laird Mr Nixon Another round in battle Legislation hinted House panel was rebuffed Elmer Staats Asst Gen William H Rehnquist Rep L H Fountain D N C FBI Atty Gen John N Mitchell
Record #0005
Wed Sep 1 1971 CJ As death rate worsens U S mine safety chief apparently will be fired By WARD SINCLAIR Courier Journal & Times Staff Writer WASHINGTON coal mine fatalities U S Bureau of Mines chief of health and safety Deputy Director Henry P Wheeler Jr Director Elburt F Osborn industry Fielf of action limited Iowa Republican Edward D Failor fatality statistics Says urgency is pointed up John F O Leary Farmington W Va It s his prerogative
Record #0006
Wed Sep 1 1971 CJ Postal unions file suit to break wage freeze By FRANK C PORTER LA Times Washington Post Service WASHINGTON representing postal workers filed suit government President Nixon s wage price freeze legal attack labor pay increases negotiated Boston Police Patrolmen s Association federal court Harry Bridges West Coast dock strike Joint Economic Committee Gardner Ackley economic policy AFL CIO President George Meany s National Association of Letter Carriers the American Postal Workers Union Mail Handlers Division of the Laborers International Union National Rural Letter Carriers Association cost of living Sen Fred R Harris D Okla
For this discussion, the 2976 PICs identified above have been utilized.
The retrieval file construction shown in FIG. 85 is virtually automatic; however, as will be appreciated, such construction could be accomplished entirely by manual and photographic steps.
As shown in FIG. 85, the typed data records are scanned and converted to machine readable magnetic tape format where the typed characters/word groupings are still fully and uniquely represented by standard computer readable binary codes on the magnetic tape. A Scan Data Model 100 scanner may be utilized for this conversion to magnetic tape.
The resulting magnetic tape format of the base data file is then computer processed (a PDP 8/E computer with associated peripheral tape drives may be used) in a manner similar to that previously discussed in the telephone directory embodiment to automatically construct the required 2976 binary code arrays comprising the retrieval file.
The magnetic tape format of the retrieval file is then converted to 16 mm microfilm format where each microfilm frame comprises one of the 2976 binary coded arrays. The presence of a particular PIC in a record is coded as an opaque spot on a corresponding portion of the array while the absence of that PIC would be represented by a transparent spot thereat. For instance, a Series F electronic beam recorder (EBR) available from the 3-M Company can be used to make this conversion.
The resulting 16 mm microfilm roll(s) is then placed in a standard 3-M microfilm cartridge(s) to form the accessible retrieval file of 2976 binary coded arrays.
As an example, if the word Smith occurs in record #1, then PIC arrays S.sub.1,5 ; M.sub.2,5 ; I.sub.3,5 ; T.sub.4,5 ; and H.sub.5,5 would carry opaque spots in the position assigned for the record address #1.
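The assignment of PICs to a word, as in the "Smith" example, may be sketched in Python as follows. This is a minimal sketch only: the names `pics_for_word` and `mark_record` are hypothetical, each array is modeled as a set of record addresses carrying opaque spots, and no attempt is made to restrict the result to the particular set of 2976 PICs used in this embodiment.

```python
def pics_for_word(word):
    """Return the PICs of a word as (character, position, length) triples."""
    # Case rule stated elsewhere in this description: a word whose first
    # letter is upper case is treated as entirely upper case.
    if word[:1].isupper():
        word = word.upper()
    length = len(word)
    return [(ch, pos, length) for pos, ch in enumerate(word, start=1)]


def mark_record(retrieval_file, address, text):
    """Place an 'opaque spot' for `address` in the array of every PIC in `text`."""
    for word in text.split():
        for pic in pics_for_word(word):
            retrieval_file.setdefault(pic, set()).add(address)


retrieval_file = {}
mark_record(retrieval_file, 1, "Smith")
for char, pos, length in pics_for_word("Smith"):
    # Printed in the subscript notation used in this description.
    print(f"{char}.sub.{pos},{length}", sorted(retrieval_file[(char, pos, length)]))
```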
Since the Series F EBR used in this embodiment has a normal frame format of 132 characters per line and 64 lines, a total of 8448 (64.times.132=8448) base data file records can be accommodated by this example as should now be appreciated.
Actual photo copies of frames 0001 through 0005 of the microfilm retrieval file for a base data file including records 1-6 previously copied above are shown in FIGS. 86-90. These arrays represent particular PICs as noted below:
______________________________________
PIC          FIGURE
______________________________________
A.sub.1,1    86
A.sub.1,2    87
A.sub.2,2    88
A.sub.1,3    89
A.sub.2,3    90
______________________________________
Records 1-6 are assigned the first 6 successive positions from left to right in line 1 of each array just after the ".phi." reference position marker. Thus, as can be verified, only records #1 and #6 (out of the first 6 records) contain PIC A.sub.1,1. Only records #4 and #5 contain PIC A.sub.1,2, etc.
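The correspondence between a record address and its spot position in a 64-line by 132-column frame may be sketched as follows. The text states only that records 1-6 occupy the first six positions of line 1; the continuation of the same left-to-right order onto lines 2 through 64 is an assumption of this sketch, as are the function names. The reference position marker is not modeled.

```python
LINES = 64
COLUMNS = 132
CAPACITY = LINES * COLUMNS  # 64 x 132 = 8448 record positions per frame


def address_to_position(address):
    """Map a record address (1..8448) to a 1-based (line, column) position."""
    if not 1 <= address <= CAPACITY:
        raise ValueError("address outside the frame")
    line, column = divmod(address - 1, COLUMNS)
    return line + 1, column + 1


def position_to_address(line, column):
    """Inverse mapping, as used when reading a print through the address grid."""
    if not (1 <= line <= LINES and 1 <= column <= COLUMNS):
        raise ValueError("position outside the frame")
    return (line - 1) * COLUMNS + column


print(address_to_position(6))        # sixth position of line 1
print(position_to_address(64, 132))  # last position of the frame
```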
It should be appreciated that the retrieval file could also be manually coded on coding forms organized in the retrieval file format and then placed in the same end-resulting 16 mm microfilm format using conventional microfilm cameras.
However constructed, the microfilm retrieval file may now be utilized by simple operations on a Model 400 3-M Company microfilm viewer and printer as indicated in FIG. 85.
The microfilm retrieval file cartridge is loaded normally into the viewer-printer. The particular PICs contained in an inquiry word are manually identified, and each of the corresponding retrieval file array microfilm frames is registered in the viewer and utilized to multiply expose a common output print. Thus, the multiply exposed print will be exposed wherever there is a transparent portion on any selected array but not exposed wherever all selected arrays have opaque spots. In effect, if a negative image is utilized, the selected arrays are Boolean ANDED to result in an output print having exposed portions only at locations corresponding to the address locations of records in the base data file having all the desired PICs. By overlaying this final print with an address grid (as indicated in FIG. 85), the addresses of the desired records can be readily ascertained and thus the desired records can be quickly and accurately located in the base data file.
As an example of the ANDING operation here contemplated, FIG. 91 is a photograph formed by Boolean ANDING of FIGS. 87 and 88 thus showing coded spots representing all record addresses having both PICs A.sub.1,2 and A.sub.2,2. It will be noted both from FIG. 91 and from an inspection of the above copied records 1-6 that only record number 5 (of the first 6 records) meets this criteria. A.sub.1,2 in record #5 arises from "As" while A.sub.2,2 arises from "Va" (remembering that all letters are treated as upper case if the first letter of the word is upper case). Actually, this is an example of the type of "mis-hit" that is possible with this arrangement thus reducing retrieval precision to less than 100%. That is, presumably the combination of A.sub.1,2 and A.sub.2,2 would correspond to an inquiry word of "AA" or "Aa" neither of which appear in any of records 1-6. However, such an inquiry will, in this example, nevertheless produce an erroneous retrieval result (mis-hit). As previously noted, there are ways to minimize such "mis-hits" with this invention. In any event, retrieval accuracy is always 100% in spite of some possible lack of precision.
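The "mis-hit" just described may be reproduced with a short Python sketch. The record texts below are abbreviated, non-contiguous excerpts of records 1, 4, 5 and 6 chosen for this illustration, not the full records, and all function names are hypothetical. Arrays are modeled as sets of record addresses, so that the Boolean AND becomes a set intersection.

```python
def normalise(word):
    # A word whose first letter is upper case is treated as all upper case.
    return word.upper() if word[:1].isupper() else word


def pics_for_word(word):
    word = normalise(word)
    return {(ch, pos, len(word)) for pos, ch in enumerate(word, start=1)}


def build_arrays(records):
    """One array (set of addresses) per PIC found in the records."""
    arrays = {}
    for address, text in records.items():
        for word in text.split():
            for pic in pics_for_word(word):
                arrays.setdefault(pic, set()).add(address)
    return arrays


def search(arrays, inquiry):
    """Boolean AND of the arrays selected by the PICs of the inquiry word."""
    hits = None
    for pic in pics_for_word(inquiry):
        spots = arrays.get(pic, set())
        hits = set(spots) if hits is None else hits & spots
    return hits or set()


def true_matches(records, inquiry):
    """Addresses of records that really contain the inquiry word."""
    target = normalise(inquiry)
    return {a for a, text in records.items()
            if any(normalise(w) == target for w in text.split())}


# Abbreviated excerpts only (assumption of this sketch).
records = {
    1: "LA Times Washington Post Service",
    4: "From New York Times and AP Dispatches",
    5: "As death rate worsens Farmington W Va",
    6: "By FRANK C PORTER LA Times",
}
arrays = build_arrays(records)

for inquiry in ("AA", "Va"):
    hits = search(arrays, inquiry)
    real = true_matches(records, inquiry)
    # Every real match is among the hits; extra hits are mis-hits.
    print(inquiry, "hits:", sorted(hits), "real:", sorted(real),
          "mis-hits:", sorted(hits - real))
```

In this sketch the inquiry "AA" selects record 5 although no record contains that word, while every record that does contain an inquiry word is always among the hits.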
As an example of an actual inquiry process, consider a search for records containing the word "SMITH". The operator first notes that "SMITH" contains PICs S.sub.1,5 ; M.sub.2,5 ; I.sub.3,5 ; T.sub.4,5 and H.sub.5,5. Thus, the microfilm retrieval file is first advanced until the frame corresponding to S.sub.1,5 is properly registered in the viewing screen. Then, a "hold" switch and an "expose" switch are activated. In this manner, a photosensitive paper would be exposed to the registered S.sub.1,5 frame and this paper would be held for further exposures rather than being ejected.
Next, the operator would advance the retrieval film microfilm to the frame corresponding to M.sub.2,5 and repeat the "hold" and "expose" functions thus again exposing the same piece of photosensitive paper to another properly registered array image. The same procedures are repeated for I.sub.3,5 ; T.sub.4,5 and H.sub.5,5 except that the "hold" function is inactivated for H.sub.5,5 so that the multiply exposed photosensitive paper will be ejected after exposure to all the desired PIC arrays.
Since the developed ejected photosensitive paper contains a negative image, it will actually contain a light area (unexposed) in each of the possible 8448 locations representing addresses of documents having the word "SMITH" contained therein. By overlaying this single resultant print with a transparent grid of 132.times.64, the operator can then determine which of the 8448 records contain the word "SMITH".
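The "hold" and "expose" sequence for the "SMITH" inquiry may be modeled, purely for illustration, as follows. The class `ViewerPrinter` is a toy model and not a description of the actual viewer-printer; the frame contents (which addresses carry opaque spots) are invented for this sketch. Light passes through every transparent position of a registered frame, so a position on the paper stays unexposed only if all selected frames are opaque there.

```python
CAPACITY = 64 * 132  # 8448 positions per frame and per print


class ViewerPrinter:
    """Toy model: one sheet of paper accumulates exposures until ejected."""

    def __init__(self):
        self.paper = None

    def expose(self, opaque_addresses, hold):
        if self.paper is None:
            self.paper = [False] * CAPACITY          # fresh, unexposed sheet
        for address in range(1, CAPACITY + 1):
            if address not in opaque_addresses:      # transparent: light passes
                self.paper[address - 1] = True
        if hold:
            return None                              # sheet kept for next frame
        sheet, self.paper = self.paper, None         # sheet ejected
        return sheet


def read_print(sheet):
    """Addresses of the light (unexposed) areas found under the address grid."""
    return [i + 1 for i, exposed in enumerate(sheet) if not exposed]


# Hypothetical frames: addresses with opaque spots for each PIC of "SMITH".
frames = {
    "S.sub.1,5": {2, 7, 40},
    "M.sub.2,5": {7, 40, 99},
    "I.sub.3,5": {7, 40},
    "T.sub.4,5": {7, 40, 500},
    "H.sub.5,5": {7, 40, 8448},
}

viewer = ViewerPrinter()
labels = list(frames)
sheet = None
for label in labels:
    last = label == labels[-1]
    # "hold" is active for every frame except the last one.
    sheet = viewer.expose(frames[label], hold=not last)

result = read_print(sheet)
print(result)
```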
As previously noted, the retrieval file arrays can be coded and photographed manually. However, it is preferred that this portion of the process be automated by proper programming of the computer shown in FIG. 85. A set of seven programs have been written to perform this task for use with the same mini computer equipment previously described for use in the telephone directory assistance retrieval system. An eighth program has also been developed to assist in the actual retrieval process.
The interrelationship of these programs is briefly illustrated in FIG. 92. In addition, a brief description of all eight of these programs follows, before an explicit listing of the actual source program statements:
I. Program 1--DSC-DRC
A. Converts scan tape to drc formatted tape.
B. Produces a listing of frames containing errors such as incorrect numerical frame sequence and punctuation within the data.
II. Program 2--CRTE
A. Online correction and editing program; allows for display of drc record on CRT screen so that errors listed by Program 1 (DSC-DRC) may be corrected.
III. Program 3--DRCC
A. Combines like numbered drc frames so that only one record exists for each frame number.
B. Removes date information from data portion of record and stores it in special form in the record header portion.
C. Categorizes frames as to general type (editorial, cartoon, obituary, etc.) by setting bits in header position of record.
D. Writes a blank record for any missing drc frames so that each frame number has a corresponding magnetic tape record.
IV. Program 4--DRC-MX
A. Converts drc coded tape to MX formatted tape.
B. Program 4A--DRC-MX OVERLAY
1. Converts drc category bits from header portion of record into MX format.
V. Program 5--MX-XM
A. Converts MX formatted tape into XM format and writes XM's onto retrieval disk.
VI. Program 6--DATREC
A. Writes first 150 characters of each drc record onto the disk so that they may be displayed during the retrieval process when a hit is found.
VII. Program 7--MX Sort
A. Original test program to convert MX formatted records into XM formatted records.
B. Program 7A--MX-EBR
1. Converts XM records into EBR format for printing on microfilm.
VIII. Program 8--RETRIEVAL
A. User-oriented system allowing for retrieval of information from news clippings.
Although only a few embodiments of this invention have been described in detail, those in the art will recognize that the exemplary embodiments may be adapted to many different situations without departing from the substance, spirit or advantages of this invention. Accordingly, all such embodiments are intended to be included within the scope of this invention which scope is defined solely by the appended claims.
Claims
  • 1. An information storage and retrieval system comprising:
  • a set of stored information bearing records having information stored in a language format which, at least in part, has intelligent meaning because of particular groupings of characters or symbols therein,
  • each of said records being disposed at a predetermined address or location,
  • a stored retrieval file for facilitating the retrieval of particular desired records from said set of information bearing records, said retrieval file comprising a plurality of arrays of binary coded elements,
  • each of said arrays including predetermined elements individually and respectively corresponding to the addresses of each of said information bearing records,
  • each of said arrays being formed to indicate the presence or absence of a predetermined identifiable characteristic of the language structure associated with the information content of each of said information bearing records, wherein said plurality of arrays constituting a comprehensive set of arrays correspond to a comprehensive set of said predetermined identifiable characteristics of language structure comprising substantially all such predetermined identifiable characteristics which are to be later utilized in searching for desired information bearing records, and
  • each element in a given array being binary coded in a first manner to represent the presence in the respectively corresponding record of the predetermined identifiable characteristic of language structure corresponding to the given array and being binary coded in a second distinguishable manner to represent the absence in the respectively corresponding record of the predetermined identifiable characteristic of language structure corresponding to the given array
  • whereby particular desired records bearing certain desired information may be located and thus retrieved by first determining the subset of said predetermined identifiable characteristics present in said desired information and then examining the respectively corresponding subset of said arrays to determine the storage address or location of each stored record containing all of said subset of predetermined identifiable characteristics.
  • 2. An information storage and retrieval system as in claim 1 wherein at least some of said predetermined identifiable characteristics of language structure correspond to the identity of characters in said records and to the relative sequential location of such characters in associated groups of characters contained in said records.
  • 3. An information storage and retrieval system as in claim 2 wherein said at least some of said predetermined identifiable characteristics of language structure correspond to the identity of characters and their relative sequential location in groups of characters having a predetermined number of total characters therein.
  • 4. An information storage and retrieval system as in claim 2 wherein said at least some of said predetermined identifiable characteristics of language structure correspond to upper and lower type-case representations of said characters.
  • 5. An information storage and retrieval system as in claim 2 wherein at least some of said predetermined identifiable characteristics of language structure correspond to the identity of characters and their relative sequential location in groups of characters having any arbitrary number of total characters therein.
  • 6. An information storage and retrieval system as in claim 1 further comprising:
  • means for extracting from said retrieval file particular ones of said arrays corresponding to desired particular ones of said predetermined identifiable characteristics of language structure.
  • 7. An information storage and retrieval system as in claim 6 further comprising:
  • means for comparing corresponding binary coded elements of said extracted particular arrays thereby identifying the addresses of any records having all the desired particular ones of said predetermined identifiable characteristics of language structure.
  • 8. An information storage and retrieval system as in claim 7 further comprising:
  • means for extracting the stored information from the ones of said information bearing records corresponding to particular records for which corresponding addresses have been identified.
  • 9. An information storage and retrieval system as in claim 8 further comprising:
  • means for displaying said extracted information from said set of information bearing records.
  • 10. An information storage and retrieval system as in claim 6 further comprising:
  • input means for accepting input search data and for identifying the subset of said particular ones of said arrays corresponding to desired particular ones of said predetermined identifiable characteristics of language structure in response to said search data.
  • 11. An information storage and retrieval system as in claim 10 wherein said input means is adapted to accept search data comprising at least a portion of one group of characters contained in a record for which information retrieval is desired.
  • 12. An information storage and retrieval system as in claim 1 including means for automatically constructing said retrieval file from input data corresponding to the stored information in said set of information bearing records.
  • 13. An information storage and retrieval system as in claim 1 wherein said information bearing records are in machine readable and machine locatable form.
  • 14. An information storage and retrieval system as in claim 1 wherein said retrieval file is in machine readable form.
  • 15. An information storage and retrieval system as in claim 1 wherein said information bearing records and said retrieval file are both in machine readable form.
  • 16. An information storage and retrieval system as in claim 1 including means for automatically locating and displaying at least part of the information content of any desired information bearing record from the addresses of records identifiable by comparative values of corresponding elements of a subset of said arrays having predetermined binary coded values.
  • 17. An information storage and retrieval system as in claim 1 further comprising means for accepting input search data and for automatically identifying particular ones of said arrays corresponding to desired particular ones of said predetermined identifiable characteristics of language structure in response to said search data.
  • 18. An information storage and retrieval system as in claim 17 wherein said means for accepting is adapted to accept search data comprising at least a portion of one group of characters potentially contained in a record for which information retrieval is desired.
  • 19. An information storage and retrieval system as in claim 1 comprising:
  • a plurality of said retrieval files, each retrieval file corresponding to only a particular predetermined portion of the potential information content of any given information bearing record.
  • 20. A method for information storage and retrieval comprising:
  • maintaining a set of information bearing records having information stored in a language format which, at least in part, has intelligent meaning because of particular groupings of characters or symbols therein, each of said records being maintained at a predetermined address or location,
  • generating and maintaining a retrieval file for facilitating the retrieval of particular desired records from said set of information bearing records, said retrieval file being formed as a function of the language structure of information in said records, the retrieval file comprising a plurality of arrays of binary coded elements,
  • said generating and maintaining step including the generation and maintenance of a comprehensive set of arrays corresponding to a comprehensive set of predetermined identifiable characteristics of language structure comprising substantially all such predetermined identifiable characteristics which are to be later utilized in searching for desired information bearing records,
  • providing predetermined elements in said arrays which individually and respectively correspond to the address of each of said information bearing records,
  • forming each of said arrays to indicate the presence or absence of a predetermined identifiable characteristic of language structure associated with the information content of each of said information bearing records,
  • said forming step including binary coding each element in a given array in a first manner to represent the presence in the respectively corresponding record of the predetermined identifiable characteristic of language structure corresponding to the given array and in a second distinguishable manner to represent the absence in the respectively corresponding record of the predetermined identifiable characteristic of language structure corresponding to the given array and,
  • locating and retrieving particular desired records bearing certain desired information by first determining the subset of said predetermined identifiable characteristics present in said desired information and then examining the respectively corresponding subset of said arrays to determine the storage address or location of each stored record containing all of said subset of predetermined identifiable characteristics.
  • 21. A method for information storage and retrieval as in claim 20 wherein at least some of said predetermined identifiable characteristics of language structure are caused to correspond to the identity of characters in said records and to the relative sequential location of such characters in associated groups of characters contained in said records.
  • 22. A method for information storage and retrieval as in claim 21 wherein said at least some of said predetermined identifiable characteristics of language structure are caused to correspond to the identity of characters and their relative sequential location in groups of characters having a predetermined number of total characters therein.
  • 23. A method for information storage and retrieval as in claim 21 wherein said at least some of said predetermined identifiable characteristics of language structure are caused to correspond to upper and lower type-case representations of said characters.
  • 24. A method for information storage and retrieval as in claim 21 wherein at least some of said predetermined identifiable characteristics of language structure are caused to correspond to the identity of characters and their relative sequential location in groups of characters having any arbitrary number of total characters therein.
  • 25. A method for information storage and retrieval as in claim 20 further comprising:
  • extracting information from said retrieval file representing particular ones of said arrays corresponding to desired particular ones of said predetermined identifiable characteristics.
  • 26. A method for information storage and retrieval as in claim 25 further comprising:
  • comparing said extracted information representing corresponding binary coded elements of said particular arrays thereby identifying the addresses of any records having all the desired particular ones of said predetermined identifiable characteristics.
  • 27. A method for information storage and retrieval as in claim 26 further comprising:
  • extracting the stored information from the ones of said information bearing records corresponding to the particular records for which corresponding addresses have been identified.
  • 28. A method for information storage and retrieval as in claim 27 further comprising:
  • displaying said extracted information from said set of information bearing records.
  • 29. A method for information storage and retrieval as in claim 25 further comprising:
  • accepting input search data and for identifying the subset of said particular ones of said arrays corresponding to desired particular ones of said predetermined identifiable characteristics of language structure in response to said search data.
  • 30. A method for information storage and retrieval as in claim 29 wherein said accepting step includes the acceptance of search data comprising at least a portion of one group of characters potentially contained in a record for which information retrieval is desired.
  • 31. A method for information storage and retrieval as in claim 20 including automatically constructing said retrieval file from input data corresponding to the informational content of said set of information bearing records.
  • 32. A method for information storage and retrieval as in claim 20 wherein said information bearing records are maintained in machine readable and machine locatable form.
  • 33. A method for information storage and retrieval as in claim 20 wherein said retrieval file is maintained in machine readable form.
  • 34. A method for information storage and retrieval as in claim 20 wherein said information bearing records and said retrieval file are both maintained in machine readable form.
  • 35. A method for information storage and retrieval as in claim 20 including automatically locating and displaying at least part of the information content of any desired information bearing record from the addresses of records identified by comparing values of corresponding elements of a subset of said arrays having predetermined binary coded values.
  • 36. A method for information storage and retrieval as in claim 20 further comprising accepting input search data and automatically identifying particular ones of said arrays corresponding to desired particular ones of said predetermined identifiable characteristics of language structure in response to said search data.
  • 37. A method for information storage and retrieval as in claim 36 wherein said accepting step includes the acceptance of search data comprising at least a portion of one group of characters potentially contained in a record for which information retrieval is desired.
  • 38. A method for information storage and retrieval as in claim 20 further comprising:
  • maintaining a plurality of said retrieval files, each retrieval file corresponding to only a particular predetermined portion of the potential information content of any given information bearing record.
  • 39. A method for identifying particular desired information bearing records having desired predetermined identifiable characteristics of language structure from a base data file containing a plurality of information bearing records having information stored in a language format which, at least in part, has intelligent meaning because of particular groupings of characters or symbols therein, said method comprising the steps of:
  • maintaining a retrieval file separately disposed with respect to said base data file, said retrieval file comprising a plurality of arrays of binary coded elements wherein said plurality of arrays constituting a comprehensive set of arrays correspond to a comprehensive set of said predetermined identifiable characteristics of language structure comprising substantially all such predetermined identifiable characteristics which are to be later utilized in searching for desired information bearing records, and including the steps of
  • organizing each array to include a binary coded element respectively corresponding to each record in said base data file, and
  • forming each array to correspond to indicate the presence or absence of a predetermined identifiable characteristic of language structure associated with each of said records,
  • said forming step including assigning each binary coded element in any given array a predetermined binary value to represent the presence or absence of said predetermined identifiable characteristic of language structure represented by said given array in the particular record represented by each element,
  • generating search data representing desired predetermined identifiable characteristics of language structure for sought after records,
  • selecting the subset of arrays representing said desired predetermined identifiable characteristics of language structure, and
  • comparing the binary values of respectively corresponding elements in said selected subset of arrays representing said desired predetermined identifiable characteristics of language structure to identify the addresses or locations of all records in the base data file which have all the desired predetermined identifiable characteristics of language structure.
  • 40. A method as in claim 39 wherein at least some of said predetermined identifiable characteristics of language structure are chosen to represent the identity of characters in said records and the relative sequential location of such characters in associated groups of characters contained in said records.
  • 41. A method as in claim 40 wherein at least some of said predetermined identifiable characteristics of language structure are chosen to correspond to the identity of characters and their relative sequential location in groups of characters having a predetermined number of total characters therein.
  • 42. A method as in claim 40 wherein at least some of said predetermined identifiable characteristics of language structure are chosen to correspond to upper and lower type-case representations of said characters.
  • 43. A method as in claim 40 wherein at least some of said predetermined identifiable characteristics of language structure are chosen to correspond to the identity of characters and their relative sequential location in groups of characters having any arbitrary number of total characters therein.
  • 44. A method as in claim 39 further comprising the steps of:
  • extracting information from the base data file corresponding to the particular identified information bearing records.
  • 45. A method as in claim 44 further comprising the step of:
  • displaying said extracted information from the base data file.
  • 46. A method as in claim 39 wherein said generating step comprises processing input search data comprising at least a portion of one group of characters potentially contained in a record for which information retrieval is desired and identifying said desired predetermined identifiable characteristics of language structure from said input search data.
  • 47. A method as in claim 39 wherein said maintaining step is repeated for each of a plurality of retrieval files, each retrieval file corresponding to only a particular predetermined portion of the potential information content of any given information bearing record.
  • 48. Apparatus for identifying particular desired information bearing records having desired predetermined identifiable characteristics of language structure from a base data file containing a plurality of information bearing records having information stored in a language format which, at least in part, has intelligent meaning because of particular groupings of characters or symbols therein, said apparatus comprising:
  • means for maintaining a retrieval file separately disposed with respect to said base data file, said retrieval file comprising a plurality of arrays of binary coded elements wherein said plurality of arrays constituting a comprehensive set of arrays correspond to a comprehensive set of said predetermined identifiable characteristics of language structure comprising substantially all such predetermined identifiable characteristics which are to be later utilized in searching for desired information bearing records, and including
  • means for organizing each array to include a binary coded element respectively corresponding to each record in said base data file, and
  • means for forming each array to indicate the presence or absence of a predetermined identifiable characteristic of language structure associated with each of said records,
  • said means for forming including means for assigning each binary coded element in any given array a predetermined binary value to represent the presence or absence of said predetermined identifiable characteristic of language structure represented by said given array in the particular record represented by each element,
  • means for generating search data representing desired predetermined identifiable characteristics of language structure for sought after records,
  • means for selecting the subset of arrays representing said desired predetermined identifiable characteristics of language structure, and
  • means for comparing the binary values of respectively corresponding elements in said selected subset of arrays representing said desired predetermined identifiable characteristics of language structure to identify the addresses or locations of all records in the base data file which have all the desired predetermined identifiable characteristics of language structure.
  • 49. Apparatus as in claim 48 wherein said means for assigning includes means for causing at least some of said predetermined identifiable characteristics of language structure to represent the identity of characters in said records and the relative sequential location of such characters in associated groups of characters contained in said records.
  • 50. Apparatus as in claim 49 wherein said means for assigning also includes means for causing at least some of said predetermined identifiable characteristics of language structure to correspond to the identity of characters and their relative sequential location in groups of characters having a predetermined number of total characters therein.
  • 51. Apparatus as in claim 49 wherein said means for assigning also includes means for causing at least some of said predetermined identifiable characteristics of language structure to correspond to upper and lower type-case representations of said characters.
  • 52. Apparatus as in claim 49 wherein said means for assigning also includes means for causing at least some of said predetermined identifiable characteristics of language structure to correspond to the identity of characters and their relative sequential location in groups of characters having any arbitrary number of total characters therein.
  • 53. Apparatus as in claim 48 further comprising:
  • means for extracting information from the base data file corresponding to the particular identified information bearing records.
  • 54. Apparatus as in claim 53 further comprising:
  • means for displaying said extracted information from the base data file.
  • 55. Apparatus as in claim 48 wherein said means for generating comprises means for processing input search data comprising at least a portion of one group of characters potentially contained in a record for which information retrieval is desired and means for identifying said desired predetermined identifiable characteristics of language structure from said input search data.
  • 56. Apparatus as in claim 48 wherein said means for maintaining includes means for maintaining a plurality of retrieval files, each retrieval file corresponding to only a particular predetermined portion of the potential information content of any given information bearing record.
  • 57. A computerized information storage and retrieval system for storing and retrieving information stored in a language format which, at least in part, has intelligent meaning because of particular groupings of characters or symbols therein and including a programmed data processor, said system comprising:
  • first machine accessible information storage means adapted for storing a set of information bearing records at predetermined addresses and for delivering the stored information content of any given record when provided with the address of the given record,
  • second machine accessible information storage means adapted for storing a retrieval file comprising a plurality of arrays of binary coded elements,
  • each of said arrays including predetermined elements individually and respectively corresponding to the address of each of said information bearing records in said first machine accessible information storage means,
  • each of said arrays being representative of a predetermined identifiable characteristic associated with each of said information bearing records,
  • said plurality of arrays constituting a comprehensive set of arrays corresponding to a comprehensive set of said predetermined identifiable characteristics of language structure comprising substantially all such predetermined identifiable characteristics which are to be later utilized in searching for desired information bearing records,
  • each element in a given array being binary coded in a first manner to represent the presence in the respectively corresponding record of the predetermined identifiable characteristic corresponding to the given array and being binary coded in a second manner to represent the absence in the respectively corresponding record of the predetermined identifiable characteristic corresponding to the given array,
  • a programmed data processor computer means operatively connected with said first and second machine accessible information storage means, said data processor being adapted for accepting input retrieval search data, for automatically identifying the subset of predetermined identifiable characteristics present in said retrieval search data and for thereafter selecting a corresponding subset of particular ones of said arrays in response to said search data, for automatically comparing the binary values of respectively corresponding elements of said identified particular arrays thus identifying the addresses of particular records for which retrieval is desired and for automatically providing said identified addresses to said first machine accessible information storage means.
  • 58. A computerized information storage and retrieval system as in claim 57 wherein said programmed data processor means is further adapted for automatically generating said retrieval file in the second machine accessible information storage means from the set of information bearing records in the first machine accessible information storage means.
  • 59. A computerized information storage and retrieval system as in claim 57 wherein said programmed data processor means is further adapted to accept search data comprising at least a portion of one group of characters contained in a record for which information retrieval is desired.
  • 60. A computerized information storage and retrieval system as in claim 57 wherein:
  • said second machine accessible information storage means is adapted for storing a plurality of said retrieval files, each retrieval file corresponding to only a particular predetermined portion of the information content of any given information bearing record, and
  • said programmed data processor means is further adapted to accept a plurality of types of input search data, each type respectively corresponding to a predetermined one of said retrieval files.
  • 61. A computerized information storage and retrieval system as in claim 57 wherein said information bearing records comprise photographically recorded images.
  • 62. A computerized information storage and retrieval system as in claim 57 wherein said information bearing records comprise recorded machine readable records.
  • 63. A computerized information storage and retrieval system as in claim 57 wherein said retrieval file comprises photographically recorded images.
  • 64. A computerized information storage and retrieval system as in claim 57 wherein said retrieval file comprises magnetically recorded machine readable data.
  • 65. A computerized information storage and retrieval system as in claim 57 wherein at least some of said predetermined identifiable characteristics correspond to the identity of characters in said records and the relative sequential location of such characters in associated groups of characters contained in said records.
  • 66. A computerized information storage and retrieval system as in claim 65 wherein said at least some of said predetermined identifiable characteristics correspond to the identity of characters and their relative sequential location in groups of characters having a predetermined number of total characters therein.
  • 67. A computerized information storage and retrieval system as in claim 65 wherein said at least some of said predetermined identifiable characteristics correspond to upper and lower type-case representations of said characters.
  • 68. A computerized information storage and retrieval system as in claim 65 wherein said at least some of said predetermined identifiable characteristics correspond to the identity of characters and their relative sequential location in groups of characters having any arbitrary number of total characters therein.
  • 69. A computerized information storage and retrieval method for storing and retrieving information stored in a language format which, at least in part, has intelligent meaning because of particular groupings of characters or symbols therein and which method utilizes a programmed data processor, said method comprising:
  • storing a set of information bearing records at predetermined addresses in a first machine accessible information storage means and delivering the information content of any given record when provided with the address of the given record,
  • storing a retrieval file comprising a plurality of arrays of binary coded elements in a second machine accessible information storage means,
  • arranging each of said arrays to include predetermined elements individually and respectively corresponding to the address of each of said information bearing records in said first machine accessible information storage means,
  • arranging each of said arrays to be representative of a predetermined identifiable characteristic associated with each of said information bearing records wherein said plurality of arrays constituting a comprehensive set of arrays correspond to a comprehensive set of said predetermined identifiable characteristics of language structure comprising substantially all such predetermined identifiable characteristics which are to be later utilized in searching for desired information bearing records,
  • binary coding each element in a given array in a first manner to represent the presence in the respectively corresponding record of the predetermined identifiable characteristic corresponding to the given array and in a second distinguishable manner to represent the absence in the respectively corresponding record of the predetermined identifiable characteristic corresponding to the given array,
  • providing a programmed data processor means operatively connected with said first and second machine accessible information storage means, and adapting said data processor for accepting input retrieval search data, for automatically identifying particular ones of said arrays in response to said search data, for automatically comparing the binary values of corresponding elements of said identified particular arrays thus identifying the addresses of particular records for which retrieval is desired and for automatically providing said identified addresses to said first machine accessible information storage means.
  • 70. A computerized information storage and retrieval method as in claim 69 wherein said programmed data processor means is further adapted for automatically generating said retrieval file in the second machine accessible information storage means from the set of information bearing records in the first machine accessible information storage means.
  • 71. A computerized information storage and retrieval method as in claim 69 wherein said programmed data processor means is further adapted to accept search data comprising at least a portion of one group of characters contained in a record for which information retrieval is desired.
  • 72. A computerized storage and retrieval method as in claim 69 further comprising:
  • adapting said second machine accessible information storage means for storing a plurality of said retrieval files, each retrieval file corresponding to only a particular predetermined portion of the information content of any given information bearing record, and
  • further adapting said programmed data processor means to accept a plurality of types of input search data, each type respectively corresponding to a predetermined one of said retrieval files.
  • 73. A computerized information storage and retrieval method as in claim 69 wherein said first storing step comprises storing said information bearing records as photographically recorded images.
  • 74. A computerized information storage and retrieval method as in claim 69 wherein said first storing step comprises storing said information bearing records as recorded machine readable records.
  • 75. A computerized information storage and retrieval method as in claim 69 wherein said second storing step comprises storing said retrieval file as photographically recorded images.
  • 76. A computerized information storage and retrieval method as in claim 69 wherein said second storing step comprises storing said retrieval file as magnetically recorded machine readable data.
  • 77. A computerized information storage and retrieval method as in claim 69 wherein said second arranging step includes causing at least some of said predetermined identifiable characteristics to correspond to the identity of characters in said records and to the relative sequential location of such characters in associated groups of characters contained in said records.
  • 78. A computerized information storage and retrieval method as in claim 77 wherein said second arranging step also includes causing at least some of said predetermined identifiable characteristics to correspond to the identity of characters and their relative sequential location within groups of characters having a predetermined number of total characters therein.
  • 79. A computerized information storage and retrieval method as in claim 77 wherein said second arranging step also includes causing at least some of said predetermined identifiable characteristics to correspond to upper and lower type-case representations of said characters.
  • 80. A computerized information storage and retrieval method as in claim 77 wherein said second arranging step also includes causing at least some of said predetermined identifiable characteristics to correspond to the identity of characters and their relative sequential location within groups of characters having any arbitrary number of total characters therein.
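The retrieval scheme recited in the claims above (one array of binary coded elements per characteristic, one element per record address, and an element-by-element comparison of the selected arrays) can be illustrated with a minimal sketch. It is not the patented implementation. The function names, the use of a Python integer as a bit array, the choice of (character, position-within-word) pairs as the characteristics, and the sample records are all assumptions made for illustration only.

```python
from collections import defaultdict


def characteristics(word):
    """Hypothetical characteristic set: (character, position) pairs of one word."""
    return {(ch, i) for i, ch in enumerate(word.lower())}


def build_retrieval_file(records):
    """Build one bit array per characteristic.

    Each array is a Python int (an assumption of this sketch); bit n is 1
    when the record at address n contains the characteristic, else 0.
    """
    arrays = defaultdict(int)
    for address, text in enumerate(records):
        for word in text.split():
            for c in characteristics(word):
                arrays[c] |= 1 << address
    return dict(arrays)


def search(arrays, num_records, query):
    """AND together the arrays selected by the query's characteristics.

    Returns the addresses whose bit is 1 in every selected array.
    """
    result = (1 << num_records) - 1  # start with every record as a candidate
    for c in characteristics(query):
        result &= arrays.get(c, 0)   # a missing array means no record has it
    return [a for a in range(num_records) if result >> a & 1]


# Illustrative base data file; the list index plays the role of the address.
records = ["smith john", "smyth jane", "jones mary", "cat dog"]
retrieval_file = build_retrieval_file(records)
```

In this sketch a partial word such as "sm" selects only the arrays for its leading characters, so it acts as a prefix search. Because the arrays record only that a characteristic occurs somewhere in a record, the query "cog" also selects the record "cat dog", which is one consequence of the compression being irreversible. A caller would typically check the returned candidates against the base data file.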
US Referenced Citations (2)
Number Name Date Kind
3354467 Beekley Nov 1967
3408631 Evans et al. Oct 1968
Non-Patent Literature Citations (5)
Entry
Summit, Roger K., Proceedings of 22nd Natl. Conference of the Assoc. for Computing Machinery, 1967, pp. 51-56. (L-7140-137).
Davis, D. R. et al., Communications of the ACM, vol. 8, issue 4, Apr. 1965, pp. 243-246. (L-7140-701).
Heiner, Joseph A., Jr., et al., Proceedings of the 21st Natl. Conf. of the Assoc. for Computing Machinery, 1966, pp. 339-345. (L-7140-894).
Salton, G., Proceedings of the 19th Natl. Conference of the Assoc. for Computing Machinery, 1964, pp. L2.3-1-L2.3-20 (L-7140-1235).
Fossum, Earl G. et al., UNIVAC Techn. Status Report No. 5, Contract AF-49 (638) 1194, Mar. 30, 1965, pp. 1-28 (L-7140-2348).