The present invention relates to a computer program and, more particularly, to a computer program for collating linguistic data.
One of the greatest challenges in the globalization of computer technologies is to properly handle the numerous written languages used in different parts of the world. Languages may differ greatly in the linguistic symbols they use and in their grammatical structures. Consequently, it can be a daunting task to support most, if not all, languages in various forms of computer data processing.
To facilitate the support of different languages by computers, a standardized coding system, known as Unicode, was developed to uniquely identify every symbol in a language with a distinct numeric value, i.e., codepoint, and a distinct name. Codepoints are expressed as hexadecimal numbers with four to six digits. For example, the English letter “A” is identified by the codepoint 0041, while the English letter “a” is identified by codepoint 0061, the English letter “b” is identified by the codepoint 0062, and the English letter “c” is identified by the codepoint 0063 in the Unicode system.
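As a brief illustration, the codepoints quoted above can be reproduced with Python's built-in `ord` function and the standard `unicodedata` module. The helper name `describe` is hypothetical and is used here only for illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the codepoint (at least four hex digits) and Unicode name of one character."""
    return f"{ord(ch):04X} {unicodedata.name(ch)}"


# Print the codepoint and name of each letter mentioned in the text.
for ch in "Aabc":
    print(describe(ch))
```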
A fundamental operation on linguistic characters (or graphemes) of a given language is collation, which may be defined as sorting strings according to a set of rules that is culturally correct to users of a particular language. Collation is used any time a user orders linguistic data or searches for linguistic data in a logical fashion within the structure of a given language.
Support of collation on a computer requires an in-depth understanding of the language. Specifically, there must be a good understanding of the graphemes used in the language and the relationship between the graphemes/phonemes and the Unicode codepoints used to construct them. For example, in English, a speaker expects a word starting with the letter “Q” to sort after all words beginning with the letter “P” and before all words starting with the letter “R.” As another example, in Traditional Chinese, the ideographs are often sorted according to their pronunciations based on the “bopomofo” phonetic system as well as by the numbers of strokes in the characters. Further, the proper sorting of the graphemes also has to take into account variations on the graphemes. Common examples of such variations include casings (upper or lower case) of the symbols and modifiers (diacritics, Indic matras, vowel marks) applied to the symbols.
Collation, i.e., sorting, is one of the most fundamental features that a user expects to simply work. Ideally, collation should be transparent. People simply expect that when they click on the top of a column in Windows® Explorer, the column will be sorted according to their linguistic expectations. Such an expectation may be easy to meet from a technical perspective for simple languages, such as English; however, when support for additional languages is needed, such support can be more complicated.
The challenges in achieving proper collation are due to several factors. For example, people usually have a clear idea of how the information they choose to collate should be ordered. However, few people can really describe the rules by which collation works for any but the simplest of languages, such as English. To make the matter even more complicated, collations that are appropriate for one language are often not appropriate for another; in fact, many collation schemes contradict each other.
Furthermore, people who understand the technical issues of collation generally do not understand the language or its linguistic structure. Conversely, experts in languages often lack the technical expertise to provide collation in a form that can be used in a traditional, multi-weighted collation format. In addition, existing platforms providing collation extensibility require full collation information as input. This requires extensive technical skill, knowledge of internal methodology and structures, and overt collation knowledge.
Usually, collation is done manually by professional collation providers, such as professional linguists.
Additionally, different institutions often need the capability of collating data in a linguistically appropriate fashion. Such institutions, for example, the U.S. Homeland Security Agency, may prefer not to share data with a professional collation provider. Therefore, there is a need to provide automated collation support so as to allow data to be collated in a private manner.
In summary, proper collation support requires a comprehensive understanding of the language and its linguistic structure. Relying on collation information manually input by professional collation providers, such as linguists, limits the ability to add collation support for linguistic data. As a result, there is a need to automate the collation process such that collation support can be easily extended for any given language and collation can be done by a general user when privacy is preferred. The invention described below is directed to addressing this need.
The invention is directed to a tool that automatically establishes collation support for sorted linguistic data. The tool analyzes the sorted linguistic data to identify the underlying collation rules (“collation creation”). During the collation creation process, the tool may present the user who provided the sorted linguistic data, through a user interface, iterative questions concerning the sorted linguistic data, thus collaborating with the user in reaching a correct collation support for the sorted linguistic data. The tool may further test the resultant collation support by sorting test data provided by the user through the user interface.
One aspect of the invention includes a user interface that enables a user providing the sorted linguistic data to interact with the collation creation process. The collation creation process sends a query to the user interface concerning the sorted linguistic data. Such a query can ask for clarification of behavior of a character, or for confirmation of a collation pattern inherent in the sorted linguistic data. The user may answer the query by, for example, providing additional data or modifying the sorted linguistic data. The user's input is preferably integrated into the collation creation process in real time to generate the collation support anticipated by the user. The user may also enter test data through the user interface to verify whether the collation support resulting from the collation creation process collates the test data properly.
In accordance with one aspect of the invention, the user interface contains a main window that displays the sorted linguistic data. The user interface may also attach visual cues to the sorted linguistic data after applying the identified collation support to the sorted linguistic data. The visual cues may indicate distinctions between two compared strings in the collated linguistic data. For example, the visual cue may indicate the break point of a string and the type of the weight difference at the break point. A break point of a string identifies the part of the string that actually caused the string to sort in its particular location.
In accordance with another aspect of the invention, the user interface may display queries concerning the sorted linguistic data. A query gives the user providing the sorted linguistic data an opportunity to confirm the collation and/or clarify the sorted linguistic data to produce correct collation support.
In accordance with yet another aspect of the invention, the user interface includes an advanced window that provides additional information concerning the sorted linguistic data. Such information includes Unicode codepoints for the characters in each string in the sorted linguistic data. Such information may also include character properties concerning each character in a string in the sorted linguistic data.
In accordance with a further aspect of the invention, the user interface includes a test surface, which uses sorted or unsorted test data from the user to test the identified collation support. The user may adjust the collated test data to suggest the correct collation support.
In summary, the invention provides a user interface that facilitates automatic generation of collation support based on sorted linguistic data. The invention also enables the user providing sorted linguistic data to guide the collation creation process through this user interface, fully utilizing the user's knowledge of the sorted linguistic data and the user's expectation of the collation support to be generated.
The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same become better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings, wherein:
Embodiments of the invention provide a tool for automatically creating collation support, i.e., collation creation, for linguistic data. In contrast to conventional collation creation, which requires the work of a professional collation provider, such as a linguist, the invention enables a general user to create collation support for a human language.
For example,
In embodiments of the invention, TOOL 204 contains two major components: a collation engine and a user interface. The collation engine performs an automatic collation creation process by analyzing custom data to identify collation rules controlling the ordering of the custom data. The user interface can be used to receive custom data from a CU. The user interface can also be used by the collation engine to present queries concerning the custom data. The user interface can further be used to test collation rules identified by the collation engine. One advantage of the user interface is that the complexity of the underlying collation creation process is completely hidden under the user interface. Another benefit of the user interface is that throughout the collation creation process, iterative queries are sent to the user interface so that the CU can clarify the custom data to ensure proper collation creation based on the custom data. Thus, the user interface enables an interactive approach that engages the CU in real time to collaboratively create the desired collation support for the custom data.
The following description first describes an exemplary implementation of a user interface for TOOL 204. An exemplary collation creation process illustrating functions of the collation engine is then described. The illustrative examples provided herein are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Similarly, any steps described herein may be interchangeable with other steps or combinations of steps in the same or different order to achieve the same result.
In embodiments of the invention, TOOL 204 includes a user interface that provides appropriate interactions with a CU.
User interface 300 allows a CU to add new strings to first column 304 by actuating the “Load” button 303. In embodiments of the invention, there are three ways for inputting data into the ordered list 106 contained in first column 304. First, data can be inserted in an order chosen by a CU. As part of the insertion process, the CU ensures the data is verified, i.e., the data is sorted and the ordering is consistent with the target collation the CU is attempting to emulate. Secondly, a CU can have the TOOL 204 insert the data in a manner consistent with what the current, validated custom data demonstrates. In embodiments of the invention, custom data is validated after it goes through a validation process that ensures that the custom data is both consistent in ordering and complete in coverage.
The user interface 300 further includes an “Analyze” button 305, the actuation of which initiates a collation creation process that analyzes the list of sorted strings contained in first column 304 to identify the underlying collation rules. In embodiments of the invention, the collation engine component of the TOOL 204 performs the analysis function.
User interface 300 also permits a CU to save the complete custom data by actuating the “Save” button 306. In embodiments of the invention, user interface 300 may also permit a CU to save the collation support information resulting from the collation creation process in a binary file. A CU may exit the user interface 300 by actuating the “Quit” button 308.
In embodiments of the invention, user interface 300 also provides visual cues such as underline, color, shading, etc., to indicate some of the important distinctions between two compared strings. Such important distinctions include the break point of a string, i.e., the part of the string that actually caused the string to sort in its particular location. For example, when comparing “Cathy” and “Catherine,” the break point for each string would be the letter “y” and the letter “e,” respectively, such that the string “Catherine” sorts before the string “Cathy.” In some embodiments of the invention, the break point of a string is underlined.
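The notion of a break point can be sketched in a few lines. The function below is a minimal illustration only: the name `break_point` is hypothetical, and it compares raw characters, whereas the collation engine described here would compare collation weights.

```python
def break_point(a: str, b: str) -> int:
    """Return the index of the first position at which the two strings differ.

    If one string is a prefix of the other, the index equals the length of
    the shorter string.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))


i = break_point("Cathy", "Catherine")
print(i, "Cathy"[i], "Catherine"[i])  # index of the break and the two differing letters
```

A user interface could then underline the character at the returned index in each string.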
User interface 300 may also provide visual cues indicating the type of weight difference at a break point. Generally, there are three types of weight differences: primary, secondary, and tertiary. Primary differences are generally alphabetic weights among characters. For example, the difference at the previously mentioned exemplary break points for the strings “Cathy” and “Catherine” (“y” versus “e”) would be a primary difference. Secondary differences are generally diacritic weights. For example, when comparing the string “resume” and “resumé,” the difference between the letter “e” and the letter “é” is a secondary difference. Tertiary differences are generally casing weights. For example, when comparing the string “Spam” and the string “spam,” the difference in capitalization would be a tertiary difference. In an exemplary implementation of user interface 300, the break point of a string is colored differently to reveal the type of weight difference at the break point. For example, a red-colored break point implies a primary difference, a blue-colored break point implies a secondary difference, and a yellow-colored break point implies a tertiary difference.
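One way to approximate the three types of weight difference is to compare two characters after stripping diacritics and case. The sketch below is an assumption-laden illustration, not the engine's actual weighting: the names `strip_marks`, `difference_type`, and `CUE_COLORS` are hypothetical, and Unicode decomposition and case folding stand in for real collation weights.

```python
import unicodedata

# Color cues taken from the exemplary implementation described in the text.
CUE_COLORS = {"primary": "red", "secondary": "blue", "tertiary": "yellow"}


def strip_marks(s: str) -> str:
    """Remove combining marks from a canonically decomposed (NFD) string."""
    return "".join(c for c in s if not unicodedata.combining(c))


def difference_type(a: str, b: str) -> str:
    """Classify the difference between two characters or strings."""
    na = unicodedata.normalize("NFD", a)
    nb = unicodedata.normalize("NFD", b)
    if strip_marks(na).casefold() != strip_marks(nb).casefold():
        return "primary"      # different base letters
    if na.casefold() != nb.casefold():
        return "secondary"    # same letters, different diacritics
    if na != nb:
        return "tertiary"     # same letters and diacritics, different case
    return "equal"


for pair in [("y", "e"), ("e", "\u00e9"), ("S", "s")]:
    kind = difference_type(*pair)
    print(pair, kind, CUE_COLORS[kind])
```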
One unique feature of the user interface in the TOOL 204 is to enable a CU to interact with the collation engine while it analyzes custom data to identify collation rules. In an exemplary embodiment of the invention, the interaction is realized by the collation engine posing questions to the CU through user interface 300 and by the CU answering the questions and/or correcting the problems identified by the questions.
In embodiments of the invention, main window 302 in user interface 300 provides a “Show Codepoints” button 314, the actuation of which changes main window 302 into an advanced window 402 that contains additional information, such as Unicode codepoints and Unicode properties.
Advanced window 402 contains multiple columns. Besides containing first column 304 that includes sorted strings to be analyzed, additional columns are provided for more advanced users, and are therefore optional. The additional columns supply supplementary information, such as the actual Unicode codepoints that comprise the string in question. For example, as illustrated in
The Unicode codepoints can help a user understand the linguistic structure of a string and how certain characters impact collation weighting. As noted in the Background of the Invention section, Unicode identifies each symbol in a language with a distinct numerical value and name. The numerical value is called a codepoint. Advanced window 402 displays the codepoints of each symbol in a string. For example, as illustrated in
In addition, advanced window 402 includes a checkbox 404 for “Unicode Property Info.” Upon the selection of checkbox 404, user interface 300 provides information about character properties for the characters in a string. Such information about character properties provides better understanding of the string. In embodiments of the invention, typical character properties include General_Category, Bidi_Class, Canonical_Combining_Class, Decomposition_Type, Decomposition_Mapping, Numeric_Type, and Numeric_Value. For a detailed description about character properties, please see Unicode Character Database, http://unicode.org/public/unidata/ucd.html.
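Several of these properties can be looked up with Python's standard `unicodedata` module, as in the following sketch. The function name `property_info` and the dictionary layout are assumptions made for illustration; the module reports the decomposition type and mapping together as one string and exposes only the numeric value, not the numeric type.

```python
import unicodedata


def property_info(ch: str) -> dict:
    """Collect a few Unicode character properties for a single character."""
    return {
        "General_Category": unicodedata.category(ch),
        "Bidi_Class": unicodedata.bidirectional(ch),
        "Canonical_Combining_Class": unicodedata.combining(ch),
        # Decomposition type (if any) and mapping, as one string of hex codepoints.
        "Decomposition": unicodedata.decomposition(ch),
        # None when the character has no numeric value.
        "Numeric_Value": unicodedata.numeric(ch, None),
    }


for ch in ["\u00e9", "\u0301", "5"]:
    print(f"{ord(ch):04X}", property_info(ch))
```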
Furthermore,
In embodiments of the invention, user interface 300 further displays strings in different normalization forms. As those skilled in the art or related fields know, normalization is the process of removing alternative representations of equivalent sequences from textual data in order to convert the textual data into a form that can be compared for equivalency. In the Unicode standard, normalization refers specifically to processing to ensure that canonical-equivalent and/or compatibility-equivalent strings have unique representations. For more information on normalization in the Unicode standard, please see Unicode Normalization Forms, http://www.unicode.org/report/tr15/. Generally, there are four Unicode normalization forms, namely, Normalization Form C, Normalization Form D, Normalization Form KC, and Normalization Form KD. User interface 300 gives a CU the option to decide which normalization form(s) will be displayed. For example, as illustrated in
In embodiments of the invention, TOOL 204 provides a CU with the ability to test collation rules identified by the collation engine on applicable data that is not part of the custom data being used for collation creation. A CU can use the testing feature to determine if the collation engine has identified the expected collation rules. A CU can input test data into a test user interface (hereinafter “Test Surface”) to have the collation rules applied to the test data to determine if the collation of the test data is correct. In such embodiments of the invention, user interface 300 therefore further includes a test surface.
In embodiments of the invention, test surface 700 can receive a correctly-sorted list of strings from a CU. By inputting a correctly sorted list of strings to test, a CU can verify whether applying the current collation rules keeps the current order of the test strings intact. If the current order of the test data is changed, the changes can be highlighted so that they may be resolved by the CU. Test surface 700 can also accept an unsorted list of strings as test data. TOOL 204 can then collate the test data upon the CU actuating “Sort” button 702. The CU can then indicate whether the resultant collation of the test data was correct. If it is not, the CU can assist in the resolution of the problem by correcting the ordering of the collated test data, which is then used to produce correct collation rules.
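The first test mode, checking whether a correctly sorted list stays intact under the current rules, can be sketched as follows. This is a minimal illustration under stated assumptions: `check_order` is a hypothetical name, and an ordinary Python sort key stands in for the collation rules identified by the engine.

```python
def check_order(test_strings, sort_key):
    """Re-sort the test strings with the given key and report any that moved.

    Returns (is_intact, moved), where moved is a list of
    (string, original_index, new_index) tuples.
    """
    resorted = sorted(test_strings, key=sort_key)
    moved = [
        (s, i, resorted.index(s))
        for i, s in enumerate(test_strings)
        if resorted[i] != s
    ]
    return (not moved), moved


data = ["apple", "Banana", "cherry"]

# A case-insensitive key keeps the user's order intact.
print(check_order(data, str.casefold))

# Raw codepoint order moves "Banana" to the front, so two entries are flagged.
print(check_order(data, lambda s: s))
```

The flagged entries correspond to the changes that the test surface would highlight for the CU to resolve.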
By using test surface 700, a CU can test the collation rules prior to building a collation binary file. After viewing the collated test data, a CU can identify problems and make corrections to the sorting of test data. The corrections will trigger TOOL 204 to adjust the collation rules accordingly. The collated test data may be added to the custom data as soon as it is verified by the CU.
For example,
In summary, user interface 300 enables a CU to interact with the collation creation process executed by the collation engine component of TOOL 204, in real time, so as to ensure creation of the collation support expected by the CU. User interface 300 also provides an engaging and straightforward way for the CU to participate in the collation creation process by hiding the complexity of the collation creation process that is discussed in detail below.
After receiving custom data from a user interface, such as user interface 300 illustrated in
More specifically, process 800 first receives custom data, for example, through a user interface, such as user interface 300 of TOOL 204. See block 802. As mentioned above regarding user interface 300, there are essentially three different approaches to input custom data. The first approach considers the received custom data to have been verified by a CU. This means that the custom data has been sorted and the ordering is consistent with the target collation the CU attempts to emulate. Inputting sorted custom data can be done all at once, in batches, or one entry at a time.
The second approach, on the other hand, relies on the existing collation information the collation engine is holding. No additional custom data will be used until the collation engine has validated custom data it currently holds. As noted earlier, validation is a process that the collation engine uses to determine whether the custom data is both consistent in ordering and complete in coverage. This process usually occurs before the collation engine analyzes the custom data to identify the underlying collation rules.
The third approach is specific to languages that use ideographic systems. Such languages are primarily Chinese, Japanese, and Korean. The third approach is similar to the first approach in that custom data is considered verified. In embodiments of the invention, the collation engine has a basic understanding of many of the phonetic, stroke-based, and other indexing systems. Thus, a CU with a dictionary implementing such an indexing system in electronic form can pass the information in the dictionary directly to the collation engine. In general, under the third approach, it does not matter whether the custom data is in a sorted order or not because explicit collation support for the custom data is already available. Such existing collation support includes pronunciation-based ordering such as the “bopomofo” system for collating Traditional Chinese. Such existing collation support may also include stroke count-based orderings. For example, one such ordering is based on the total stroke count within a Han character. Other existing collation supports include government or industry encoding standard-based ordering, such as the GB official standard of the People's Republic of China. In other cases, combinations of the various orderings are used. For example, the “bopomofo” pronunciation-based ordering for Traditional Chinese could be used along with all ideographs that have identical pronunciations sorted in stroke order. Another example is the Kanji dictionary, which allows a Japanese reader to easily look up Chinese ideographic characters used in Japanese. Generally, Kanji ideographic characters are ordered by radical (an element in the ideograph that can represent a pronunciation or a core concept) and by stroke (the number of brush strokes needed to draw the character).
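A combined ordering of this kind, pronunciation first and stroke count as the tie-breaker, can be expressed as a two-part sort key. The sketch below is illustrative only: the `Entry` type and `combined_key` function are hypothetical, the four sample entries (readings given without tone marks) stand in for a dictionary supplied by the CU, and comparing readings by raw codepoint is a simplifying assumption rather than a full bopomofo collation.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    char: str      # the ideograph
    reading: str   # bopomofo reading, tone mark omitted for simplicity
    strokes: int   # total stroke count


# Small illustrative sample; a real dictionary would supply this data.
entries = [
    Entry("衣", "ㄧ", 6),
    Entry("巴", "ㄅㄚ", 4),
    Entry("一", "ㄧ", 1),
    Entry("八", "ㄅㄚ", 2),
]


def combined_key(entry: Entry):
    """Order by pronunciation first, then by stroke count within a pronunciation."""
    return (entry.reading, entry.strokes)


ordered = [e.char for e in sorted(entries, key=combined_key)]
print(ordered)
```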
Because a given character may have multiple pronunciations in pronunciation sorts, embodiments of TOOL 204 support a frequency count, which identifies the number of pronunciations a given character may have. At one given time, TOOL 204 may enable only one pronunciation. TOOL 204 may leave the alternate pronunciations in a disabled state indicating that they are not being used.
Upon receiving custom data under any of the three approaches, process 800 executes a routine 804 to analyze the custom data and identify collation rules manifested by the ordering of the custom data.
If the answer to decision block 806 is YES, meaning that the custom data has been verified, process 800 proceeds to check if the CU has input more custom data. See decision block 810. If the answer is YES, process 800 loops back to block 802 to receive the additional custom data, which is then analyzed and checked for verification. If the answer to the decision block 810 is NO, meaning that there is no additional custom data from the CU, process 800 proceeds to check if the CU wants to test the current collation rules identified by executing routine 804. See decision block 812. If the answer is YES, process 800 executes a routine 814 that tests current collation rules upon receiving test data from the CU.
If the answer to decision block 812 is NO, meaning that process 800 receives no request to test current collation rules, process 800 may proceed to build the current collation rules into a binary file. The resultant collation information can be used in the future for collating other linguistic data. See block 816. In some embodiments of the invention, process 800 also allows the CU to save the complete custom data, preferably along with other information. For example, process 800 may save the custom data, possibly along with its Unicode codepoints.
After executing process 830 that validates and normalizes the custom data, routine 804 proceeds to Phase 1, which is the first step of identifying collation rules based on the ordering in the custom data. In this phase, routine 804 compares the ordering of the custom data with existing collation support schemes. For example, in the exemplary embodiment of the invention, routine 804 compares the ordering of the custom data with the Windows® default sorting table. See block 832. The Windows® default sorting table is a flat table of 32-bit values that contains the default sort weight for each character whose Unicode codepoint is in the range of 0000-FFFF. The Windows® default sorting table is the basis for all collations. Currently, more than 70 locales are supported by the Windows® default sorting table. In general, a locale is a unique combination of language, region, and script that defines a set of preferences for formatting and sorting linguistic data. Thus, it is possible that the desired collation for the custom data may be covered in the Windows® default sorting table. In such a case, no further processing will be required. As illustrated in
If there is no matching collation in the Windows® default sorting table, routine 804 proceeds to Phase 2. Phase 2 determines if any of the available compression and exception tables matches the differences resulting from the comparison that occurred in Phase 1, i.e., the differences between the Windows® default sorting table and the ordering of the custom data. See block 836. As known to those of ordinary skill in the art or other related fields, an exception table lists changes that are to be made to the Windows® default table for a given language. An exception table should be a minimal subset of characters that must have their assigned weights changed for the sake of the given language's collation. Meanwhile, a compression table registers each type of compression, i.e., sort elements that contain more than one Unicode codepoint. In embodiments of the invention, the knowledge that a particular compression or exception table resembles the custom data may help the collation engine formulate clarifying questions to be presented to the CU. In situations where the custom data closely matches an existing exception or compression table, the possibility of a mistake will be presented to the CU.
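The roles of the two table types can be sketched with plain dictionaries and sets. Everything in the sketch below is an assumption made for illustration: the function names, the weight values, and the two-letter sort element "ch" are hypothetical and do not reflect the actual layout of the Windows® tables.

```python
def minimal_exceptions(default, desired):
    """Keep only the characters whose desired weight differs from the default."""
    return {ch: w for ch, w in desired.items() if default.get(ch) != w}


def apply_exceptions(default, exceptions):
    """Overlay an exception table on the default weights."""
    return {**default, **exceptions}


def split_sort_elements(text, compressions):
    """Split text into sort elements, preferring the longest registered compression."""
    by_length = sorted(compressions, key=len, reverse=True)
    elements, i = [], 0
    while i < len(text):
        for comp in by_length:
            if text.startswith(comp, i):
                elements.append(comp)
                i += len(comp)
                break
        else:
            # No compression starts here, so the single character is the element.
            elements.append(text[i])
            i += 1
    return elements


default_weights = {"a": 10, "b": 20, "c": 30}   # hypothetical default table
desired_weights = {"a": 10, "b": 35, "c": 30}   # hypothetical target ordering

exceptions = minimal_exceptions(default_weights, desired_weights)
print(exceptions)
print(apply_exceptions(default_weights, exceptions))
print(split_sort_elements("chico", {"ch"}))
```

Keeping only the entries that differ from the default is what makes the exception table a minimal subset.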
If there is a match between the differences resulting from the comparison that occurred in Phase 1 and the information in one of the compression and exception tables (see decision block 838), routine 804 returns to process 800 (
As noted above,
In some embodiments of the invention, the collation engine sends messages concerning the problems it finds in the custom data only when a certain point is reached, i.e., when there are too many problems for the collation engine to proceed further.
In most situations, custom data received by the collation engine will contain primarily valid data with only minor discrepancies. Thus, the collation engine assumes that the custom data is accurate information. The iterative nature of questions and answers during process 830 is collaborative, working with the CU in real time to determine the proper collation support for the custom data.
In some embodiments of the invention, when the quantity of the custom data and its coverage is acceptable to the collation engine, i.e., that nothing is incomplete or inconsistent, the collation engine sends a message to a user interface, such as user interface 300, to indicate to the CU that the data has been validated. As illustrated in
After identifying the problems in custom data (
Additionally, small differences from an existing collation support scheme may exist in the custom data. In this case (see decision block 862), routine 846 sends user interface 300 a message that points out the similarity, and prompts the CU to verify the difference. In some embodiments of the invention, the message does not reference the specific language with which the similarity exists so as to avoid any potential geo-politically sensitive issues. See block 864. This occurs when there appears to be specific variances to the collation used elsewhere, such as a script sorting uppercase before lowercase, despite the usual converse policy.
At times, additional information may be needed for a script or range of characters. This occurs when there appears to be missing information that may or may not be important. For example, if a CU is using the Latin script, but is missing letters within the Latin range, the collation engine may suggest a position in the collation rules for a missing letter. The collation engine then prompts the CU to confirm the suggested position, or to reject the position and suggest an appropriate position. In such a case (see decision block 866), routine 846 sends a message to user interface 300 to ask for the specific information needed. See block 868.
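Detecting such gaps can be as simple as a set difference between an expected alphabet and the characters observed in the custom data. In the sketch below, `missing_letters` is a hypothetical name and the 26 basic Latin letters serve as a stand-in for whatever script range the collation engine would actually check.

```python
import string


def missing_letters(custom_data, alphabet=string.ascii_lowercase):
    """Return the letters of the alphabet that never occur in the custom data."""
    seen = {ch for s in custom_data for ch in s.casefold()}
    return [ch for ch in alphabet if ch not in seen]


# Each missing letter could become a query asking the CU where it should sort.
print(missing_letters(["abcdefghijklm", "NOPQRSTUVWXY"]))
```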
Furthermore, custom data may treat two equivalent strings as if they are not equal. For example, two strings may be equivalent because of the Unicode character properties and/or Unicode normalization. However, the custom data treats them as if they are not equal. In this case (see decision block 870), routine 846 sends a message to user interface 300 to prompt the CU to choose which position is correct. See block 872. Upon a user selecting a position, the other position is removed.
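Strings that are equivalent under Unicode normalization can be found by grouping the custom data on a normalized form. The sketch below uses Python's standard `unicodedata.normalize`; the function name `equivalent_groups` and the choice of Normalization Form C as the grouping key are assumptions made for illustration.

```python
import unicodedata
from collections import defaultdict


def equivalent_groups(custom_data, form="NFC"):
    """Return lists of positions whose strings are equal after normalization."""
    groups = defaultdict(list)
    for position, s in enumerate(custom_data):
        groups[unicodedata.normalize(form, s)].append(position)
    return [positions for positions in groups.values() if len(positions) > 1]


# The first and third entries are the same word, precomposed versus decomposed.
data = ["caf\u00e9", "cafe", "cafe\u0301"]
print(equivalent_groups(data))
```

Each returned group marks positions between which the CU would be prompted to choose.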
Because correct data is the essential premise of any effective collation creation effort, custom data usually needs some adjustment in order for it to be correct data for collation creation. Therefore, routine 846 may be invoked at any time for the CU to adjust custom data during the collation creation process.
During the execution of process 840, the collation engine may send clarifying questions to a CU because if any problem with the custom data occurs in process 840, it is likely that more information is needed to generate collation support that is completely correct. For example, if process 840 wants to confirm a specific behavior of a certain character, process 840 may ask the CU to input more strings containing the character to exemplify the behavior of the character. The query may also specify the options of positioning a character, and ask a CU to choose an option. Further, process 840 displays visual cues in the custom data to indicate the collation support. A CU can thus adjust the ordering of the strings to provide the collation engine instant feedback about the collation support.
In an exemplary embodiment of the invention, at each action in process 840, the current representation of the relationship between codepoints and sort weights, as described by the custom data and validated by the collation engine, is stored. The collation engine can then reference stored collation data at any time, thus enabling the CU to continue to refine the collation data.
In embodiments of the invention, when analyzing the collation patterns, for example, the weighting structures in the custom data, the collation engine first starts with the Windows® default table. The collation engine then goes to the existing exception and compression tables, and then creates internal exception and/or compression tables as well as additional data when necessary. The goal of the collation engine is to create the minimum subset of the collation support required to capture the ordering in the custom data. Therefore, if a CU knows what the minimum subset is, the CU may present it to TOOL 204 directly. The majority of the complexity of the collation engine's analysis work comes from the fact that a CU rarely has the minimum subset concerning a given language.
More specifically, as shown in
After finding the break point and the nature of the break based on the pointer character in each string, process 840 determines if there are other characters in the strings. See decision block 882. If the answer is YES, process 840 advances the pointer in each string to the next character in each string or to NULL if there is no further character in a string. See block 884. From there, process 840 returns to block 876 and begins to group strings based on the primary, secondary, or tertiary difference of the pointer character in each string. At the end of the loop, process 840 identifies both the first break point for each string and an initial ordering of the initial characters in the strings.
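The pointer-advancing loop can be illustrated with a minimal sketch that walks two strings in step and reports the first position at which the pointer characters differ, together with the nature of the difference. Several assumptions are made here: Unicode canonical decomposition stands in for a real weight table (base letter as the primary difference, combining marks as the secondary, case as the tertiary), each character is a single precomposed codepoint, and an exhausted string is reported with the label "length". The function names are hypothetical.

```python
import unicodedata


def keys(ch):
    """Return (primary, secondary, tertiary) for one character."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0]
    return (base.lower(), decomposed[1:], base.isupper())


def first_break(a, b):
    """Return (index, level) of the first difference, or None if there is none."""
    for i in range(max(len(a), len(b))):
        # A pointer past the end of a string plays the role of NULL.
        ka = keys(a[i]) if i < len(a) else None
        kb = keys(b[i]) if i < len(b) else None
        if ka == kb:
            continue  # no difference here: advance the pointer in both strings
        if ka is None or kb is None:
            return i, "length"
        for level, name in enumerate(("primary", "secondary", "tertiary")):
            if ka[level] != kb[level]:
                return i, name
    return None
```

For example, `first_break("cab", "cat")` reports a primary difference at index 2, while `first_break("c", "C")` reports a tertiary difference at index 0.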
In embodiments of the invention, process 840 treats each character as being a unique sorting element and waits until an apparent contradiction is found in the data prior to looking for any expansions, compressions, and other constructs that cause collation to be more complicated. In embodiments of the invention, during one grouping section, if a difference appears to be ignored at some level, it will be ignored by the collation engine for the rest of this grouping section. For example, process 840 may examine the following custom data:
In this sample, there are variations in case and diacritics. The first grouping (block 876) groups the data into a “c” grouping based on the alphabetic weight of the first character. It ignores the variations in case and diacritics. However, during the second grouping (block 878), process 840 notices that the lower case “č” comes after the plain lower case “c.” During the third grouping (block 880), process 840 further notices that the lower case “c” comes before the upper case “C.” Therefore, by analyzing this sample data, process 840 identifies these collation rules: lower case “c” comes before upper case “C,” and the plain lower case “c” comes before the lower case “č.”
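One way rules of this kind could be read off an ordered sample is sketched below. The sample list is hypothetical and is not the custom data referred to above. The sketch also assumes that Unicode canonical decomposition is an acceptable stand-in for separating the base letter, the diacritic, and the case of each character, and it only records a rule when exactly one of diacritic or case differs between two characters.

```python
import unicodedata
from itertools import combinations


def split(ch):
    """Split one precomposed character into (base letter, marks, is_upper)."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0]
    return base.lower(), decomposed[1:], base.isupper()


def derive_rules(ordered):
    """Return (level, earlier, later) rules implied by an ordered sample."""
    rules = []
    for earlier, later in combinations(ordered, 2):
        base_a, marks_a, upper_a = split(earlier)
        base_b, marks_b, upper_b = split(later)
        if base_a != base_b:
            continue  # primary difference: left to the first grouping
        if marks_a != marks_b and upper_a == upper_b:
            rules.append(("secondary", earlier, later))
        elif marks_a == marks_b and upper_a != upper_b:
            rules.append(("tertiary", earlier, later))
    return rules


# Hypothetical sample, already in the order the CU considers correct.
sample = ["c", "C", "\u010d", "\u010c"]  # c, C, č, Č
rules = derive_rules(sample)
```

With this sample, the derived rules include a tertiary rule placing “c” before “C” and a secondary rule placing “c” before “č,” matching the two rules identified in the text.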
During Phase 3, the presence of special collation rules is determined and analyzed as well. The special collation rules include, for example, the “REVERSE DIACRITIC” rule for collation in French. In French, diacritics are evaluated in a string from back to front. Therefore, the word “côte” sorts before the word “coté” in French, while other languages would not sort the words this way. Another example is the “DOUBLE COMPRESSION” rule seen in Hungarian, where the existence of a grapheme such as “dzs” implies that the grapheme “ddzs” is treated as “dzsdzs” for collation purposes. In embodiments of the invention, these special rules are saved as additional data for the collation support of the custom data.
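The two special rules can be illustrated with a minimal sketch. It assumes a simplified two-level sort key (base letters first, then diacritics) built from Unicode canonical decomposition, which is not the weighting used by any particular product. The function names are hypothetical. The double-compression helper takes the multi-letter grapheme from the caller and uses a naive string replacement, so it would misfire wherever the doubled spelling arises by coincidence.

```python
import unicodedata


def sort_key(word, reverse_diacritics=False):
    """Build a (base letters, diacritics) key for one word."""
    primary, secondary = [], []
    for ch in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", ch)
        primary.append(decomposed[0].lower())
        secondary.append(decomposed[1:])
    if reverse_diacritics:
        # Compare diacritics starting from the end of the word.
        secondary.reverse()
    return (primary, secondary)


def expand_doubled(word, grapheme):
    """Rewrite a doubled grapheme (first letter repeated) as two full copies."""
    doubled = grapheme[0] + grapheme
    return word.replace(doubled, grapheme + grapheme)


words = ["cot\u00e9", "c\u00f4te", "cote", "c\u00f4t\u00e9"]  # coté, côte, cote, côté
french = sorted(words, key=lambda w: sort_key(w, reverse_diacritics=True))
forward = sorted(words, key=sort_key)
```

With the reversed comparison, `french` places “côte” before “coté”, while `forward` places “coté” first. As a usage example for the second helper, `expand_doubled("hosszú", "sz")` rewrites the doubled Hungarian digraph “sz” and yields “hoszszú”; that digraph is chosen here only for illustration.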
If the answer to decision block 882 is NO, meaning that process 840 has processed all the characters in each string, process 840 performs a meta-analysis of the groupings. See block 886. The meta-analysis examines the way that specific characters such as diacritics and other combining marks, as well as scripts in general, are handled as compared with existing Windows® sorts. For example, the meta-analysis may note the different behavior of the use of Anusvara across many of the Indic languages within Windows® and the custom data. The meta-analysis will use similarity to guide decisions about the custom data. If the decision is incorrect, the CU can override it in later review of the collated custom data.
After identifying collation rules for the custom data, in some embodiments of the invention, the collation engine may test the collation rules.
More specifically, as shown in
If the answer to decision block 892 is NO, meaning that the CU does not approve the collation support, routine 814 proceeds to present an interface for receiving corrections from the CU to the current ordering of the collated test data. The test data will then be regarded as verified by the CU. See block 896. In some embodiments of the invention, the test surface 700 allows the CU to drag and drop a string to its proper place. Routine 814 then proceeds to insert the verified but invalidated test data back into the custom data. See block 898. In this situation, the collation creation routine 804 (
While the preferred embodiment of the invention has been illustrated and described, it will be appreciated that various changes can be made therein without departing from the spirit and scope of the invention.