Claims
- 1. Apparatus for storing and processing a plurality of data items each comprising supplied data values organized in one or more fields, at least a given one of said supplied data values comprising a sequence of character values representing natural language text, said apparatus comprising, in combination,
a random access memory for storing an array of fixed-length binary integers, first data conversion means for transforming each of said data values into a sequence of zero or more fixed-length integers and for storing each such sequence in said random access memory, means for accessing a selected sequence of integers representing said given one of said supplied data values, and second data conversion means for transforming said selected sequence of integers into said sequence of character values representing natural language text.
- 2. Apparatus as set forth in claim 1 wherein at least some of said fields contain variable length data values and wherein said apparatus further includes means for storing the number of integers in the sequence of integers representing the contents of each field containing said variable length data values.
- 3. Apparatus as set forth in claim 2 wherein each of said fields contains typed data and wherein said random access memory contains metadata specifying the data type of each of said fields.
- 4. Apparatus as set forth in claim 1 wherein each given one of said data items is represented by a group of one or more integer sequences stored in said random access memory, each of said sequences comprising one or more fixed length integers representing the supplied data value of one of said fields.
- 5. Apparatus as set forth in claim 4 wherein at least some of said data items include one or more link values each of which identifies a referenced data item.
- 6. Apparatus as set forth in claim 5 wherein at least some of said link values specify parent-child relationships between different data items.
- 7. Apparatus as set forth in claim 5 wherein at least some of said link values specify linked listings of different data items.
- 8. Apparatus as set forth in claim 1 wherein character values representing natural language text are expressed in Extensible Markup Language and wherein said first data conversion means transforms said character values into a sequence of nested groups of integer values each representing an XML element.
- 9. The method of storing and processing character data expressed in the Extensible Markup Language which comprises, in combination, the steps of:
parsing said character data into substrings, converting each of said substrings into a sequence of fixed length integers, storing said fixed length integers in a random access memory, and retrieving selected ones of said integers from said memory and transforming said selected ones of said integers into the character data.
- 10. The method set forth in claim 9 wherein said character data represents a nested collection of XML elements each of which is represented by a sequence of one or more fixed length integers.
- 11. The method of storing and processing markup text comprising the steps of:
parsing said markup text into nested logical subdivisions each consisting of character data values and markup metadata values, converting said logical subdivisions into correspondingly nested sequences of fixed length integers, storing said integers in a random access memory, retrieving a selected one of said sequences from said random access memory, and converting said selected one of said sequences into a character data value.
- 12. The method of storing and processing markup text as set forth in claim 11 wherein at least one of said subdivisions comprises natural language text consisting of a sequence of natural language words and wherein said step of converting said logical subdivisions includes means for converting each of said natural language words into one of said integers having a value which uniquely specifies a natural language word.
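By way of illustration only, the word-to-integer conversion recited in claims 1, 2 and 12 might be sketched as follows. This is a minimal sketch and not the disclosed implementation. The class name `WordStore`, its methods, and the choice of a length integer placed ahead of each field are assumptions made for the example. It also assumes that words are separated by whitespace, so runs of whitespace are not preserved on retrieval.

```python
from array import array


class WordStore:
    """Hypothetical sketch: stores text fields as sequences of fixed-length integers."""

    def __init__(self):
        self.codes = {}            # word -> integer code (assumed dictionary scheme)
        self.words = []            # integer code -> word
        self.memory = array("I")   # flat array of fixed-length unsigned integers

    def _code(self, word):
        # Assign the next unused integer to a word seen for the first time,
        # so that each integer value uniquely specifies one word.
        if word not in self.codes:
            self.codes[word] = len(self.words)
            self.words.append(word)
        return self.codes[word]

    def store(self, text):
        """Append one variable-length field and return its offset in memory."""
        tokens = text.split()
        offset = len(self.memory)
        self.memory.append(len(tokens))                  # number of integers in the field
        self.memory.extend(self._code(t) for t in tokens)
        return offset

    def load(self, offset):
        """Convert the integer sequence stored at offset back into character data."""
        count = self.memory[offset]
        sequence = self.memory[offset + 1 : offset + 1 + count]
        return " ".join(self.words[c] for c in sequence)
```

In this sketch, calling `store` on a text value returns an offset, and passing that offset to `load` reproduces the stored words in their original order. A repeated word occupies one integer per occurrence but only one dictionary entry.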
CROSS REFERENCE TO RELATED PATENT APPLICATIONS
[0001] This application claims the benefit of the filing date of U.S. provisional patent application Ser. No. 60/255,807 filed on Dec. 15, 2000, and further claims the benefit of the filing date of U.S. patent application Ser. No. 09/793,267 filed on Feb. 26, 200 entitled “Methods and apparatus for storing and manipulating natural language text data as a sequence of fixed length integers,” the disclosure of which is hereby incorporated by reference.
Provisional Applications (1)
| Number | Date | Country |
| --- | --- | --- |
| 60255807 | Dec 2000 | US |