The present disclosure relates to technical documentation and functional products. More particularly, the present disclosure relates to systems and methods that help automate the detection of errors in technical documentation for functional products, such as devices and/or services.
As the value and use of information continues to increase, individuals and businesses seek additional ways to process and store information. One option available to users is information handling systems. An information handling system generally processes, compiles, stores, and/or communicates information or data for business, personal, or other purposes thereby allowing users to take advantage of the value of the information. Because technology and information handling needs and requirements vary between different users or applications, information handling systems may also vary regarding what information is handled, how the information is handled, how much information is processed, stored, or communicated, and how quickly and efficiently the information may be processed, stored, or communicated. The variations in information handling systems allow for information handling systems to be general or configured for a specific user or specific use, such as financial transaction processing, airline reservations, enterprise data storage, or global communications. In addition, information handling systems may include a variety of hardware and software components that may be configured to process, store, and communicate information and may include one or more computer systems, data storage systems, and networking systems.
Ever increasing demands for data and communications have resulted in vast arrays of ever expanding networks that comprise information handling systems. As these networks evolve and expand, new features and functionality are added at different times and for different reasons.
When new features are added to a product, new documentation needs to be generated that describes the new features and how to implement or execute those features. Because several changes may be made in a new version of a product or in a new product, the corresponding amount of documentation can also be quite voluminous.
Regardless of the amount of documentation, it is critical that the documentation accurately describe the product and its functionalities. If the documentation is incorrect (e.g., fails to include descriptions of new features, fails to exclude descriptions of features that are no longer supported, has omissions, has typographical errors, or has other errors), then customers are likely to become frustrated.
Frustrated customers are a serious concern to any business. Costs increase due to added technical support calls. Engineering talent is diverted from developing new products to troubleshooting. And, sales can be negatively impacted. Thus, any mismatches between a product's functionality and its corresponding documentation can have severe consequences to a company's profitability.
Given the complexity of today's information handling systems, not only is the documentation vast, but it is also highly technical, which makes it quite difficult and laborious to check for errors. Furthermore, the engineers developing the products are often a different group than the ones that develop the documentation. While these groups try to work closely together, there are still opportunities for information to be missed and for other errors to enter.
Accordingly, what is needed are systems and methods that help automate the process of checking for errors in technical documentation.
References will be made to embodiments of the invention, examples of which may be illustrated in the accompanying figures. These figures are intended to be illustrative, not limiting. Although the invention is generally described in the context of these embodiments, it should be understood that it is not intended to limit the scope of the invention to these particular embodiments.
In the following description, for purposes of explanation, specific details are set forth in order to provide an understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these details. Furthermore, one skilled in the art will recognize that embodiments of the present invention, described below, may be implemented in a variety of ways, such as a process, an apparatus, a system/device, or a method on a tangible computer-readable medium.
Components, or modules, shown in diagrams are illustrative of exemplary embodiments of the invention and are meant to avoid obscuring the invention. It shall also be understood that, throughout this discussion, components may be described as separate functional units, which may comprise sub-units, but those skilled in the art will recognize that various components, or portions thereof, may be divided into separate components or may be integrated together, including integrated within a single system or component. It should be noted that functions or operations discussed herein may be implemented as components. Components may be implemented in software, hardware, or a combination thereof.
Furthermore, connections between components or systems within the figures are not intended to be limited to direct connections. Rather, data between these components may be modified, re-formatted, or otherwise changed by intermediary components. Also, additional or fewer connections may be used. It shall also be noted that the terms “coupled,” “connected,” or “communicatively coupled” shall be understood to include direct connections, indirect connections through one or more intermediary devices, and wireless connections.
Reference in the specification to “one embodiment,” “preferred embodiment,” “an embodiment,” or “embodiments” means that a particular feature, structure, characteristic, or function described in connection with the embodiment is included in at least one embodiment of the invention and may be in more than one embodiment. Also, the appearances of the above-noted phrases in various places in the specification are not necessarily all referring to the same embodiment or embodiments.
The use of certain terms in various places in the specification is for illustration and should not be construed as limiting. A service, function, or resource is not limited to a single service, function, or resource; usage of these terms may refer to a grouping of related services, functions, or resources, which may be distributed or aggregated. Furthermore, the terms memory, database, information base, data store, tables, hardware, and the like may be used herein to refer to a system component or components into which information may be entered or otherwise recorded.
The terms “packet,” “datagram,” “segment,” or “frame” shall be understood to mean a group of bits that can be transported across a network. These terms shall not be interpreted as limiting embodiments of the present invention to particular layers (e.g., Layer 2 networks, Layer 3 networks, etc.); and, these terms along with similar terms such as “data,” “data traffic,” “information,” “cell,” etc. may be replaced by other terminologies referring to a group of bits, and may be used interchangeably. The terms “include,” “including,” “comprise,” and “comprising” shall be understood to be open terms, and any lists that follow are examples and not meant to be limited to the listed items. Any headings used herein are for organizational purposes only and shall not be used to limit the scope of the description or the claims.
Furthermore, it shall be noted that: (1) certain steps may optionally be performed; (2) steps may not be limited to the specific order set forth herein; (3) certain steps may be performed in different orders; and (4) certain steps may be done concurrently.
Aspects of the current patent document include systems and methods to extract data using natural language expressions in technical documents related to a product and to verify that data against formal structured data associated with source code for that product.
In embodiments, the command template database (CT-DB), command context database (CC-DB), or both may be used to verify information from a document against a command definition data set associated with the product. For example, a command definition data set, such as a YANG (“Yet Another Next Generation”) data model, may be included with the source code of a product release, whether a new product release or an update release. A YANG model explicitly determines or defines the structure, semantics, and syntax of data, which can be configuration and state data. It should be noted that while references are made in this patent document to YANG models, other data models, schema, and the like (which may be referred to herein generally as a “structured data set,” a “definition data set,” or the like) may also be used.
In embodiments, a command template database (DB) is consulted by the document verification system to look up a command template for the particular product that is the closest match to the command input selected from a structured data file associated with the particular device or platform. In embodiments, a term frequency-inverse document frequency (TF/IDF)-based ranking function is used to get the most relevant match for a command query input. In embodiments, the APACHE LUCENE index engine may be used to index commands (e.g., CLIs and REST APIs) for template lookup.
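By way of illustration only, the TF/IDF-based lookup described above may be sketched as follows. This is a minimal sketch and not the LUCENE scoring function: it assumes whitespace tokenization, a smoothed IDF weight, and cosine similarity as the ranking measure, and the function names and sample templates are hypothetical.

```python
import math
from collections import Counter


def tokenize(command):
    # Assumption: templates are tokenized on whitespace, case-insensitively.
    return command.lower().split()


def build_idf(templates):
    # Smoothed inverse document frequency for every token in the templates.
    n = len(templates)
    df = Counter()
    for template in templates:
        df.update(set(tokenize(template)))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def vectorize(command, idf):
    # Sparse TF-IDF vector; tokens that the index has never seen are ignored.
    tf = Counter(tokenize(command))
    return {tok: cnt * idf[tok] for tok, cnt in tf.items() if tok in idf}


def cosine(a, b):
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, templates):
    # Return (score, template) pairs, most relevant first.
    idf = build_idf(templates)
    q = vectorize(query, idf)
    scored = [(cosine(q, vectorize(t, idf)), t) for t in templates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical command templates extracted from documentation.
TEMPLATES = ["interface eth <>", "interface vlan <>", "show interface status"]
ranked = rank("interface vlan <>", TEMPLATES)
best_score, best_template = ranked[0]
```

In this sketch the identical template ranks first with a score of approximately 1, and templates that merely share tokens with the query follow as candidate close matches.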
As shown in embodiment depicted in
Returning to
In embodiments, a command context DB is consulted by the document verification system to check if the semantic context of the command captured in the technical document matches that of the command definition file captured from the source code. The semantic context of a command in the technical document is usually the combined information entropy of the command, present in the description, examples, and references.
As shown in
Embodiments of the query/input lookup are presented below. In embodiments, the query may be done against the command template database, the command context database, or both. Also, it shall be noted that while embodiments involve one or more commands of the definition data set being compared against the data from the documentation (e.g., the command template database and the command context database), one skilled in the art shall recognize that data from the documentation may be compared against data generated using the definition data set, or may be compared both ways.
In embodiments, one of the commands from the query set of commands is selected (710) to be tested against the command template database. This selected command is queried (715) against the indexed command template database to find a set of matches, which may include commands that closely match. In embodiments, a term frequency-inverse document frequency (TF/IDF)-based ranking function is used to obtain the most relevant matches for a query command input.
The returned results may be examined to ascertain if one or more errors exist. In embodiments, the match results are checked to determine (720) whether or not there is at least one exact match. In embodiments, if there is not at least one exact match, then that query command, which exists in the code for the device, does not exist in the documentation, which is an error. Thus, in embodiments, it may be logged (725) that this command is missing from documentation.
As shown in
If there are no close matches or if the close matches have been logged for that command, in embodiments, a check is performed (740) whether there are any remaining commands in the query set. If not all of the commands in the query set of commands have been queried against the command template database, then, in embodiments, the process returns to step 710 in which the next command from the query set of commands is selected.
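By way of illustration only, the loop over the query set (steps 710 through 740) may be sketched as follows. The sketch assumes that `difflib.get_close_matches` from the Python standard library stands in for the indexed lookup, and the log entry layout, the cutoff value, and the sample commands are hypothetical.

```python
import difflib

MISSING = "missing from documentation"
CLOSE = "close match"


def check_templates(query_commands, documented_templates, cutoff=0.6):
    """Return an error log for commands queried against documented templates."""
    error_log = []
    for command in query_commands:  # select the next command (710)
        # Query for ranked matches (715); difflib stands in for the index.
        matches = difflib.get_close_matches(
            command, documented_templates, n=5, cutoff=cutoff
        )
        if command not in matches:  # exact-match check (720)
            error_log.append({"command": command, "error": MISSING})  # (725)
        close = [m for m in matches if m != command]
        if close:
            # Close matches are logged so that they can be examined later.
            error_log.append(
                {"command": command, "error": CLOSE, "matches": close}
            )
    return error_log  # the loop ends when no commands remain (740)


# Hypothetical data: the documentation misspells "vlan" as "vlna".
query_set = ["interface eth <>", "interface vlan <>"]
documented = ["interface eth <>", "interface vlna <>"]
log = check_templates(query_set, documented)
missing = [entry["command"] for entry in log if entry["error"] == MISSING]
```

Here only "interface vlan <>" is logged as missing, while the misspelled template is retained as a close match for later examination.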
As shown in
Given the error logs, in embodiments, these error logs may be further examined to reduce false positive error detections, to classify errors, or both.
In embodiments, a reverse lookup of the close matches in the error log may be performed (805) against the set of flattened plain text commands to remove logged commands from the error log that are actually correct commands. Because a command may be similar to other valid commands, these commands may appear as close matches but are not actually errors. By checking whether these close matches in the error log match actual commands in the command definition data set, errors that are false positives can be readily removed.
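By way of illustration only, the reverse lookup (805) may be sketched as follows. The sketch assumes that each log entry is a dictionary with hypothetical "error" and "matches" fields and that the flattened plain text commands are available as a list of strings.

```python
def remove_false_positives(error_log, definition_commands):
    """Drop close matches that are themselves valid commands."""
    valid = set(definition_commands)
    cleaned = []
    for entry in error_log:
        if entry["error"] != "close match":
            cleaned.append(entry)  # other error types pass through unchanged
            continue
        # Keep only close matches that do not exist in the definition set.
        remaining = [m for m in entry["matches"] if m not in valid]
        if remaining:
            cleaned.append({**entry, "matches": remaining})
    return cleaned


# Hypothetical flattened commands and error log.
definition_commands = ["interface eth <>", "interface vlan <>"]
error_log = [
    {
        "command": "interface vlan <>",
        "error": "close match",
        "matches": ["interface eth <>", "tagged interface vlan <>"],
    },
    {
        "command": "interface eth <>",
        "error": "close match",
        "matches": ["interface vlan <>"],
    },
]
cleaned_log = remove_false_positives(error_log, definition_commands)
```

The second entry disappears entirely because its only close match is a valid command, and the first entry keeps only the template that has no counterpart in the definition data set.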
In embodiments, a check is made whether there are any errors remaining in the log once the false positive errors have been removed. If there are no errors, an output of the results can be performed (820) showing that there are no errors.
If there are remaining errors in the error log, in embodiments, error classification may be performed on one or more of the remaining errors in the error log. In embodiments, the error classification may include checking one or more of the following: keywords, sequence of keywords, data types of values, range of values, and the like. For example, in embodiments, each command from the definition data set may be compared with a corresponding command template from the CT-DB. In embodiments, the comparison may be performed on the following categories:
(A) Keywords: check if the keywords in the query command and the corresponding command from the CT-DB are identical;
(B) Sequence of keywords: check if keywords in the query command and the corresponding command from the CT-DB appear in the same sequence;
(C) Data-Types of values: check if the data-types of the values for each key are the same between the query command and the corresponding command from the CT-DB; and
(D) Range of values: check if the ranges of values in the query command and the corresponding command from the CT-DB are the same.
In embodiments, every comparison failure may be logged into the error log, the output of which may be provided (820) to a user for review and for appropriate action, such as correcting the documentation.
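By way of illustration only, the four comparison categories (A) through (D) may be sketched as follows. The sketch assumes that both the query command and the CT-DB template have already been parsed into an ordered list of parameters; the `Param` structure, the error labels, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Param:
    keyword: str
    dtype: Optional[str] = None  # None for a bare keyword without a value
    value_range: Optional[Tuple[int, int]] = None


def classify(query: List[Param], template: List[Param]) -> List[str]:
    """Compare a query command with a documented template, category by category."""
    errors = []
    q_keys = [p.keyword for p in query]
    t_keys = [p.keyword for p in template]
    if set(q_keys) != set(t_keys):
        errors.append("keywords")  # (A)
    elif q_keys != t_keys:
        errors.append("sequence of keywords")  # (B)
    by_key = {p.keyword: p for p in template}
    for p in query:
        other = by_key.get(p.keyword)
        if other is None:
            continue  # already reported under (A)
        if p.dtype != other.dtype:
            errors.append(f"data type of '{p.keyword}'")  # (C)
        elif p.value_range != other.value_range:
            errors.append(f"range of '{p.keyword}'")  # (D)
    return errors


# Hypothetical data: the documentation states a different upper bound.
query = [Param("interface"), Param("vlan", "uint16", (1, 4094))]
doc_wrong_range = [Param("interface"), Param("vlan", "uint16", (1, 4095))]
doc_wrong_order = [Param("vlan", "uint16", (1, 4094)), Param("interface")]
range_errors = classify(query, doc_wrong_range)
order_errors = classify(query, doc_wrong_order)
```

Each returned label corresponds to one comparison failure that could be written to the error log for review.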
Consider, by way of illustration, the following example where the keywords [‘tagged’, ‘untagged’] are included in the documentation, but do not exist in the command definition data set in the code. This is a possible documentation error in which a feature that is not supported in the released product is present in its technical document.
Test Input: “interface vlan <>”, [“interface eth <>”,“interface vlan <>”]
CT-DB Output: “interface vlan <>”, [“interface eth <>”, “tagged interface vlan <>”, “untagged interface vlan <>”].
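By way of illustration only, the mismatch in this example can be reproduced with a simple set difference over keywords. The sketch assumes whitespace tokenization and treats "<>" as a value placeholder rather than a keyword; the helper name is hypothetical.

```python
def keywords(commands):
    # Collect every token except the "<>" value placeholder.
    return {tok for command in commands for tok in command.split() if tok != "<>"}


definition_set = ["interface eth <>", "interface vlan <>"]
ct_db_output = [
    "interface eth <>",
    "tagged interface vlan <>",
    "untagged interface vlan <>",
]

# Keywords that are documented but absent from the command definition data set.
extra_keywords = sorted(keywords(ct_db_output) - keywords(definition_set))
print(extra_keywords)  # ['tagged', 'untagged']
```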
In embodiments, one of the commands from the query set of commands is selected (910) to be tested against the command context database. This selected command is queried (915) against the command context database to determine whether there are any semantic mismatches with the query command relative to the data model of the command context.
For example, in embodiments, a semantic relevance test may be performed on each command line. In embodiments, a classifier iterates over each query command from the query set and queries the CC-DB for irrelevant words in the query command. In embodiments, the classifier may rely on the property of a vectorizer, such as Word2Vec, to generate vectors for words in the input corpus, which satisfy the property that the semantic similarity between words varies linearly with the cosine similarity between the vectors. Hence, the vectors of words which are semantically related are closer to each other than the vectors corresponding to unrelated words. In embodiments, the gensim library function Word2VecModel.doesnt_match( ) may be used to perform this semantic comparison and return unrelated words. In embodiments, a command context classifier makes use of this function to find semantic outliers present in the query set of commands, but not covered in the technical document. The semantic mismatches may be logged into a “Context Error Log”.
Consider the following example: model.doesnt_match ([‘config-mode’, ‘interface’, ‘vlan’]). If it returns an empty set, then all words are semantically related; that is, from the documentation it can be inferred that ‘interface vlan’ is available in “config” mode. If the text of the documentation had erroneously represented ‘interface vlan’ to be an “exec” mode command, the test would return ‘config-mode’ as an outlier. In embodiments, this error may not be caught in the command template comparison since the mode of configuration is typically not part of the command and hence may not be present in the block of text representing the command in the technical document.
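By way of illustration only, the outlier test in this example may be sketched as follows. The sketch does not call the gensim library: it uses hand-made two-dimensional vectors in place of trained embeddings, and it adds a similarity threshold (an assumption) so that an empty result is possible when all words are related. All names and values are hypothetical.

```python
import math


def unit(vector):
    # Scale a vector to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def outliers(words, vectors, threshold=0.5):
    """Return the words whose cosine similarity to the mean vector is low."""
    units = {w: unit(vectors[w]) for w in words}
    dim = len(next(iter(units.values())))
    mean = unit([sum(u[i] for u in units.values()) / len(units) for i in range(dim)])
    return [
        w for w in words
        if sum(a * b for a, b in zip(units[w], mean)) < threshold
    ]


QUERY = ["config-mode", "interface", "vlan"]

# Toy vectors standing in for a model trained on correct documentation.
CORRECT_DOC = {
    "config-mode": [0.9, 0.1],
    "interface": [0.8, 0.2],
    "vlan": [0.85, 0.15],
}

# Toy vectors standing in for documentation that wrongly places the
# command in "exec" mode, which pushes 'config-mode' away from the others.
ERRONEOUS_DOC = {
    "config-mode": [-0.9, 0.3],
    "interface": [0.8, 0.2],
    "vlan": [0.85, 0.15],
}

related = outliers(QUERY, CORRECT_DOC)
mismatched = outliers(QUERY, ERRONEOUS_DOC)
```

With these toy vectors, the first call returns an empty list and the second returns 'config-mode', which mirrors the two cases described above.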
Returning to
Whether or not there are mismatches, in embodiments, a check is performed (930) whether there are any remaining commands in the query set. If not all of the commands in the query set of commands have been queried against the command context database, then, in embodiments, the process returns to step 910 in which the next command from the query set of commands is selected.
As shown in
Given the error logs, in embodiments, these error logs may be further examined.
Presented here are some examples to help illustrate the usefulness of the document verification system. These examples are provided by way of illustration only and shall not be used to limit the scope of the present patent document.
The CT-DB did not detect a key named “name” in the VLAN creation template. This leads us to infer documentation did not cover this newly introduced parameter.
The CT-corpus has detected an incorrect order of execution for the CLI. This leads us to infer documentation has covered the execution order incorrectly.
It shall also be noted that, in certain embodiments depicted herein, the query set of commands is selected from data obtained from the command definition data set and compared with the CT-DB, the CC-DB, or both. However, one skilled in the art shall recognize that the system may be configured to select one or more query commands from the technical documentation system 1100 and compare them against commands from the command definition data set. In yet another alternative embodiment, the system may check commands in both ways as a cross-verification. It shall also be noted that as the system 1100 is used or as it is provided more documentation, it becomes more robust.
1. Database Generator System
In embodiments, the documentation 1120 is provided to database generator 1130, which takes the documentation and generates a command template database 1135. In embodiments, the database generator 1130 may obtain the command template database 1135 by performing the methods described above with reference to
In embodiments, the database generator 1130 also takes the documentation and generates a command context database 1145. In embodiments, the database generator 1130 may obtain the command context database 1145 by performing the methods described above with reference to
In embodiments, the database generator 1130 also takes the command definition data set 1125 and generates the normalized command set 1140, which may be the flattened plain text command set. In embodiments, the database generator 1130 may obtain the normalized commands 1140 by performing the methods described above with reference to
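By way of illustration only, generating a flattened plain text command set may be sketched as follows. The sketch assumes that the command definition data set has already been parsed into a nested dictionary, which is a hypothetical stand-in for a parsed data model, and it uses "<>" as the value placeholder seen in the examples above.

```python
def flatten(tree, prefix=()):
    """Walk a nested command tree and emit one plain text command per leaf."""
    commands = []
    for keyword, child in tree.items():
        path = prefix + (keyword,)
        if child is None:
            # A leaf without a value, e.g. "show version".
            commands.append(" ".join(path))
        elif isinstance(child, str):
            # A leaf that takes a typed value; the type name is dropped here
            # and replaced by the "<>" placeholder.
            commands.append(" ".join(path + ("<>",)))
        else:
            commands.extend(flatten(child, path))
    return commands


# Hypothetical parsed definition data: keyword -> subtree, value type, or None.
definition_tree = {
    "interface": {"eth": "string", "vlan": "uint16"},
    "show": {"version": None},
}
normalized_commands = flatten(definition_tree)
```

The resulting list can serve as the query set and as the reference set for the reverse lookup of close matches.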
2. Command Template Query System
In embodiments, the command template query system 1110 comprises a command template classifier 1150 that accesses the command template database 1135 and the normalized commands 1140. In embodiments, the command template classifier 1150 receives a set of query commands and compares those against the command template database 1135 to obtain a command template error log or logs 1160. In embodiments, the command template classifier 1150 performs the methods described above with reference to
3. Command Context Query System
In embodiments, the command context query system 1115 comprises a command context classifier 1155 that accesses the command context database 1145 and the normalized commands 1140. In embodiments, the command context classifier 1155 receives a set of query commands and compares those against the command context database 1145 to obtain a command context error log or logs 1165. In embodiments, the command context classifier 1155 performs the methods described above with reference to
Finally, in embodiments, the template error log(s) 1160 and the context error logs 1165 may be combined by the system 1100 and output 1170.
Aspects of the present patent document are directed to information handling systems. For purposes of this disclosure, an information handling system may include any instrumentality or aggregate of instrumentalities operable to compute, calculate, determine, classify, process, transmit, receive, retrieve, originate, route, switch, store, display, communicate, manifest, detect, record, reproduce, handle, or utilize any form of information, intelligence, or data for business, scientific, control, or other purposes. For example, an information handling system may be a personal computer (e.g., desktop or laptop), tablet computer, mobile device (e.g., personal digital assistant (PDA) or smart phone), server (e.g., blade server or rack server), a network storage device, or any other suitable device and may vary in size, shape, performance, functionality, and price. The information handling system may include random access memory (RAM), one or more processing resources such as a central processing unit (CPU) or hardware or software control logic, ROM, and/or other types of nonvolatile memory. Additional components of the information handling system may include one or more disk drives, one or more network ports for communicating with external devices as well as various input and output (I/O) devices, such as a keyboard, a mouse, touchscreen and/or a video display. The information handling system may also include one or more buses operable to transmit communications between the various hardware components.
A number of controllers and peripheral devices may also be provided, as shown in
In the illustrated system, all major system components may connect to a bus 1216, which may represent more than one physical bus. However, various system components may or may not be in physical proximity to one another. For example, input data and/or output data may be remotely transmitted from one physical location to another. In addition, programs that implement various aspects of this invention may be accessed from a remote location (e.g., a server) over a network. Such data and/or programs may be conveyed through any of a variety of machine-readable media including, but not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and holographic devices; magneto-optical media; and hardware devices that are specially configured to store or to store and execute program code, such as application specific integrated circuits (ASICs), programmable logic devices (PLDs), flash memory devices, and ROM and RAM devices.
Embodiments of the present invention may be encoded upon one or more non-transitory computer-readable media with instructions for one or more processors or processing units to cause steps to be performed. It shall be noted that the one or more non-transitory computer-readable media shall include volatile and non-volatile memory. It shall be noted that alternative implementations are possible, including a hardware implementation or a software/hardware implementation. Hardware-implemented functions may be realized using ASIC(s), programmable arrays, digital signal processing circuitry, or the like. Accordingly, the “means” terms in any claims are intended to cover both software and hardware implementations. Similarly, the term “computer-readable medium or media” as used herein includes software and/or hardware having a program of instructions embodied thereon, or a combination thereof. With these implementation alternatives in mind, it is to be understood that the figures and accompanying description provide the functional information one skilled in the art would require to write program code (i.e., software) and/or to fabricate circuits (i.e., hardware) to perform the processing required.
It shall be noted that embodiments of the present invention may further relate to computer products with a non-transitory, tangible computer-readable medium that has computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind known or available to those having skill in the relevant arts. Examples of tangible computer-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and holographic devices; magneto-optical media; and hardware devices that are specially configured to store or to store and execute program code, such as application specific integrated circuits (ASICs), programmable logic devices (PLDs), flash memory devices, and ROM and RAM devices. Examples of computer code include machine code, such as produced by a compiler, and files containing higher level code that are executed by a computer using an interpreter. Embodiments of the present invention may be implemented in whole or in part as machine-executable instructions that may be in program modules that are executed by a processing device. Examples of program modules include libraries, programs, routines, objects, components, and data structures. In distributed computing environments, program modules may be physically located in settings that are local, remote, or both.
One skilled in the art will recognize that no computing system or programming language is critical to the practice of the present invention. One skilled in the art will also recognize that a number of the elements described above may be physically and/or functionally separated into sub-modules or combined together.
It will be appreciated by those skilled in the art that the preceding examples and embodiments are exemplary and not limiting to the scope of the present invention. It is intended that all permutations, enhancements, equivalents, combinations, and improvements thereto that are apparent to those skilled in the art upon a reading of the specification and a study of the drawings are included within the true spirit and scope of the present invention.