The present invention relates to XBRL (eXtensible Business Reporting Language) and, in particular, to an XBRL application or program.
The present invention is directed to a method that applies an Evolutionary Optimization algorithm to the task of automated XBRL data mapping, and to a computer program that manages the following processing steps:
The use of Evolutionary Optimization for the task of XBRL data mapping is the core of the invention. The search for the document locations of data values presented in XBRL filings can be interpreted as a combinatorial optimization task. Most of the values presented in XBRL Instance documents can correspond to more than one text object in the initial document, and an average XBRL filing contains over a hundred data items. This makes the number of possible mapping variants enormous and impractical to enumerate exhaustively.
The Evolutionary Data Mapping algorithm proposed in this invention reaches the best possible variant of data localization in several hundred steps. With the support of in-memory data caching, the algorithm finds the required mapping solution in minutes, even on a personal computer with modest processing power.
The method starts with the generation of random mapping solutions. According to the generic Evolutionary Optimization schema, an initial population of random solutions must be generated. Using the XBRL and HTML Utilities, we create a list of possible document locations for every XBRL data item. A random mapping solutions generator then produces complete variants of data mapping, combining random locations for every data item.
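For illustration only, the following sketch shows how such a generator might look, assuming the candidate document locations for every data item have already been collected by the XBRL and HTML Utilities; the class and field names (RandomSolutionGenerator, candidateLocations) are not taken from the invention and merely stand in for the corresponding components.

```java
import java.util.List;
import java.util.Random;

// Illustrative sketch: a "solution" is one chosen candidate location per data item,
// encoded as an index into that item's candidate list. Every item is assumed to have
// at least one candidate location.
public class RandomSolutionGenerator {
    private final List<List<Integer>> candidateLocations; // per-item candidate location ids
    private final Random rng = new Random();

    public RandomSolutionGenerator(List<List<Integer>> candidateLocations) {
        this.candidateLocations = candidateLocations;
    }

    /** Builds one complete random mapping: a random candidate for every data item. */
    public int[] generate() {
        int[] solution = new int[candidateLocations.size()];
        for (int item = 0; item < solution.length; item++) {
            solution[item] = rng.nextInt(candidateLocations.get(item).size());
        }
        return solution;
    }

    /** Builds the initial population of random mapping solutions. */
    public int[][] initialPopulation(int size) {
        int[][] population = new int[size][];
        for (int i = 0; i < size; i++) {
            population[i] = generate();
        }
        return population;
    }
}
```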
The population plays a very important role in the Evolutionary Optimization process. It maintains a restricted set of the best solution variants and thus serves as a store of features that have proved to be more useful than average.
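The invention does not prescribe a particular data structure for the population; the following is one plausible sketch of a fixed-capacity population that retains only the best-scoring mapping solutions, with all names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: a fixed-capacity population that keeps only the best-scoring
// mapping solutions seen so far (higher score = more plausible mapping).
public class Population {
    public static final class Member {
        public final int[] solution;
        public final double score;
        public Member(int[] solution, double score) {
            this.solution = solution;
            this.score = score;
        }
    }

    private final int capacity;
    private final List<Member> members = new ArrayList<>();

    public Population(int capacity) {
        this.capacity = capacity;
    }

    /** Inserts a solution and evicts the worst member once capacity is exceeded. */
    public void offer(int[] solution, double score) {
        members.add(new Member(solution, score));
        members.sort(Comparator.comparingDouble((Member m) -> m.score).reversed());
        if (members.size() > capacity) {
            members.remove(members.size() - 1); // drop the worst member
        }
    }

    public Member best() { return members.get(0); }

    public Member randomMember(java.util.Random rng) {
        return members.get(rng.nextInt(members.size()));
    }
}
```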
After creating the initial population of random solutions, the algorithm starts the main loop of Evolutionary Optimization. At every step of the main loop, the algorithm creates a new variant of the mapping solution by combining locations of data items from two parents, randomly selected members of the population. Two mutually complementary modification methods, crossover and mutation, transfer the best parent solutions' features to a new offspring solution and restore missing features.
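A hedged sketch of such a main loop is given below. It reuses the Population sketch above, assumes the population has been seeded with the initial random solutions, and leaves the crossover and mutation operators as hooks that the next two paragraphs describe; the fitness function and all identifiers are illustrative assumptions.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Illustrative main loop: at every step two parents are drawn from the population,
// combined by crossover, perturbed by mutation, scored, and offered back to the population.
public class EvolutionaryMappingLoop {
    /** Combines document locations from two parent solutions (see crossover sketch below). */
    public interface Crossover { int[] combine(int[] parentA, int[] parentB); }
    /** Randomly perturbs a solution in place (see mutation sketch below). */
    public interface Mutation { void apply(int[] solution); }

    private final Population population;
    private final Crossover crossover;
    private final Mutation mutation;
    private final ToDoubleFunction<int[]> fitness; // composed mapping-plausibility estimate
    private final Random rng = new Random();

    public EvolutionaryMappingLoop(Population population, Crossover crossover,
                                   Mutation mutation, ToDoubleFunction<int[]> fitness) {
        this.population = population;
        this.crossover = crossover;
        this.mutation = mutation;
        this.fitness = fitness;
    }

    /** Runs a fixed number of steps and returns the best mapping found. */
    public int[] run(int steps) {
        for (int step = 0; step < steps; step++) {
            int[] parentA = population.randomMember(rng).solution;
            int[] parentB = population.randomMember(rng).solution;
            int[] child = crossover.combine(parentA, parentB);
            mutation.apply(child);
            population.offer(child, fitness.applyAsDouble(child));
        }
        return population.best().solution;
    }
}
```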
Crossover takes two solutions and combines their features, which in our case are document locations for the same data items. The whole purpose of crossover is the propagation of promising features found at prior steps of the evolutionary process and saved in the population. To enhance the productivity of crossover, we calculate and save an individual estimation for every data link in the solution. These estimations allow better links to be selected with higher probability. Thus, crossover represents the conservative side of optimization, saving the best findings of past trials and passing them on to new generations.
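The following sketch illustrates one possible way to bias crossover by the saved link estimations; the roulette-style choice and the linkEstimation callback are assumptions made for illustration, and the combine method matches the Crossover hook of the main-loop sketch above.

```java
import java.util.Random;
import java.util.function.ToDoubleBiFunction;

// Illustrative crossover: for every data item the child inherits the document location
// of one of the two parents, preferring the link with the higher individual estimation.
public class EstimationBiasedCrossover {
    private final ToDoubleBiFunction<Integer, Integer> linkEstimation; // (item, candidate) -> plausibility
    private final Random rng = new Random();

    public EstimationBiasedCrossover(ToDoubleBiFunction<Integer, Integer> linkEstimation) {
        this.linkEstimation = linkEstimation;
    }

    public int[] combine(int[] parentA, int[] parentB) {
        int[] child = new int[parentA.length];
        for (int item = 0; item < child.length; item++) {
            double scoreA = linkEstimation.applyAsDouble(item, parentA[item]);
            double scoreB = linkEstimation.applyAsDouble(item, parentB[item]);
            // Roulette-style choice: the better-estimated link is inherited with higher probability.
            double total = scoreA + scoreB;
            boolean takeA = total <= 0 ? rng.nextBoolean() : rng.nextDouble() * total < scoreA;
            child[item] = takeA ? parentA[item] : parentB[item];
        }
        return child;
    }
}
```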
Mutation does quite the opposite: it produces new solutions with minor random deviations from the mainstream of features existing in the population. The idea behind mutation is the following: crossover alone can only combine the parents' features, so it would never be able to include in a new solution a link that is missing from the population. Mutation closes this gap, supplying new solutions with all the variations of links that exist for the corresponding data items. It uses the individual link plausibility estimations to optimize convergence: the links with the worst estimations are mutated most frequently.
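A matching sketch of estimation-biased mutation is shown below; the exact weighting of the mutation probability is an assumption, chosen only to illustrate that poorly estimated links are perturbed more often.

```java
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleBiFunction;

// Illustrative mutation: each link may be replaced by another candidate location for the
// same data item; the worse the current link's estimation, the higher the mutation chance.
public class EstimationBiasedMutation {
    private final List<List<Integer>> candidateLocations; // per-item candidate location ids
    private final ToDoubleBiFunction<Integer, Integer> linkEstimation; // (item, candidate) -> [0,1]
    private final double baseRate; // mutation probability for a perfectly estimated link
    private final Random rng = new Random();

    public EstimationBiasedMutation(List<List<Integer>> candidateLocations,
                                    ToDoubleBiFunction<Integer, Integer> linkEstimation,
                                    double baseRate) {
        this.candidateLocations = candidateLocations;
        this.linkEstimation = linkEstimation;
        this.baseRate = baseRate;
    }

    public void apply(int[] solution) {
        for (int item = 0; item < solution.length; item++) {
            double estimation = linkEstimation.applyAsDouble(item, solution[item]);
            // Poorly estimated links (estimation near 0) are mutated up to twice as often.
            double mutationChance = baseRate * (2.0 - estimation);
            if (rng.nextDouble() < mutationChance) {
                solution[item] = rng.nextInt(candidateLocations.get(item).size());
            }
        }
    }
}
```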
In order to support XBRL data mapping, the program comprises, in addition to the Evolutionary Mapping classes, all the classes and utility components required for input and output format conversions and in-memory processing. Among them are specialized classes and utility methods for loading the XBRL document schema and the basic taxonomy presentation and calculation structures referenced from the schema. The taxonomy structures are spread across multiple XML files saved on internet sites. The structure-loading classes traverse these files, load them, and save the structures as a collection of in-memory objects for further use.
The program further comprises data, presentation and calculation conversion classes and utility methods for XBRL Instance files. They support the creation of in-memory instance objects and structures from the instance XML files and the basic structures loaded, as reviewed above.
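As one hedged illustration of such structure loading, the sketch below follows xlink:href references from an entry-point taxonomy file and caches every document, so each of the inter-linked XML files is fetched and parsed only once. The class name and the traversal details are assumptions rather than the actual loader of the program.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative loader: starting from an entry-point schema URL it follows xlink:href
// references (e.g. linkbaseRef elements) and keeps every loaded file as an in-memory DOM object.
public class TaxonomyLoader {
    private static final String XLINK_NS = "http://www.w3.org/1999/xlink";
    private final Map<String, Document> cache = new HashMap<>();
    private final DocumentBuilder builder;

    public TaxonomyLoader() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        this.builder = factory.newDocumentBuilder();
    }

    /** Loads a schema or linkbase file and recursively follows its references. */
    public Document load(String url) throws Exception {
        Document cached = cache.get(url);
        if (cached != null) {
            return cached;                      // already traversed
        }
        Document doc = builder.parse(url);      // DocumentBuilder accepts a URI string
        cache.put(url, doc);
        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            String href = element.getAttributeNS(XLINK_NS, "href");
            int hash = href.indexOf('#');
            if (hash >= 0) {
                href = href.substring(0, hash); // drop fragment identifiers
            }
            if (!href.isEmpty() && (href.endsWith(".xsd") || href.endsWith(".xml"))) {
                load(resolve(url, href));       // traverse referenced presentation/calculation files
            }
        }
        return doc;
    }

    private static String resolve(String base, String href) throws Exception {
        return new java.net.URI(base).resolve(href).toString();
    }
}
```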
One more part of the program that is essential for the mapping process is the HTML conversion utility. To provide successful mapping of data items to initial document locations, it must be able to:
The HTML Utility supports all these actions by creating an in-memory presentation of the HTML document and providing methods for loading, manipulation and modification.
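The source does not name the HTML parsing library; the sketch below assumes a generic HTML parser such as jsoup and shows, for illustration only, an in-memory document with indexed text locations that can be searched for data values and later marked up for the output forms.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Illustrative HTML utility: loads the source business document into an in-memory tree
// and exposes the text-bearing elements as indexed candidate locations, so a mapping
// solution can refer back to the exact place a data value came from.
public class HtmlDocumentUtility {
    private final Document document;
    private final List<Element> textLocations = new ArrayList<>();

    public HtmlDocumentUtility(File htmlFile) throws Exception {
        this.document = Jsoup.parse(htmlFile, "UTF-8");
        for (Element element : document.getAllElements()) {
            if (!element.ownText().isEmpty()) {
                textLocations.add(element);     // every element with its own text is a candidate
            }
        }
    }

    /** Returns the indices of locations whose text contains the given value, e.g. "1,234.5". */
    public List<Integer> findCandidates(String value) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < textLocations.size(); i++) {
            if (textLocations.get(i).ownText().contains(value)) {
                candidates.add(i);
            }
        }
        return candidates;
    }

    /** Marks a mapped location in place, e.g. to build the linked output forms. */
    public void highlight(int locationIndex, String anchorId) {
        textLocations.get(locationIndex).attr("id", anchorId).attr("style", "background-color: #ffff99");
    }

    public String render() {
        return document.outerHtml();
    }
}
```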
The last part of the program to be mentioned is the Mapping Request class, which plays the role of an interface between the user or an automatic script and the program. It allows specifying the files containing all parts of the instance filing:
XBRL (eXtensible Business Reporting Language) has become a de facto standard for business and financial data representation (http://xbrl.org/frontend.aspx?clk=LK&val=20). It normalizes data hidden in report texts, providing unified semantic tags for data items and a structure covering the relations between data categories. It is hard to overestimate the importance of such standardization, as it allows the collection and fast processing of financial data from various sources.
At the same time, the step to XBRL representation doesn't come free. Text representation of financial data is more familiar to human readers, and it takes substantial effort for those preparing filings to create an appropriate mapping of the data to the more computer-oriented XBRL representation. The size of the XBRL structure (over 13,000 categories) and the subjective interpretation of data elements make mapping highly tedious and imprecise.
One of the filing process problems is the lack of visibility. The XBRL format doesn't preserve links to the data's location in the initial business report document, and thus the user loses the ability to verify the correctness of data extraction.
With reference to
In the course of building a mapping solution, the mapping application first loads the essential parts of the XBRL Taxonomy 102, consisting of:
After the basic Taxonomy structures have been loaded, the mapping application loads the XBRL Instance files 104 and converts them into in-memory structures. The instance files include:
The next data source required for building a mapping solution is the HTML document file 106. The mapping application loads the HTML document and converts it into an in-memory structure. It saves links between the parts of the in-memory structure and the HTML document for further use at output form generation time.
Statistical models 108 help to better identify the most plausible locations of data items. The models contain statistical relations between text objects, built from a review of multiple precedents of XBRL data locations. The mapping application loads statistical models for every data item category, including end terms and abstract text objects.
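The internal form of these models is not detailed here; purely as an illustration, the sketch below shows a per-category model that accumulates word statistics from precedent locations and scores a new candidate by how familiar its surrounding words are. The real models may capture richer relations between text objects.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative per-category statistical model: records how often words observed near
// precedent locations of a category appeared, and scores a new candidate location by
// how many of its surrounding words were seen in those precedents.
public class CategoryStatisticalModel {
    private final Map<String, Integer> labelWordCounts = new HashMap<>();
    private int precedentCount = 0;

    /** Adds one observed precedent: the words surrounding a confirmed location of this category. */
    public void addPrecedent(String[] surroundingWords) {
        precedentCount++;
        for (String word : surroundingWords) {
            labelWordCounts.merge(word.toLowerCase(), 1, Integer::sum);
        }
    }

    /** Scores a candidate location: average observed frequency of its surrounding words. */
    public double score(String[] surroundingWords) {
        if (precedentCount == 0 || surroundingWords.length == 0) {
            return 0.0;
        }
        double total = 0.0;
        for (String word : surroundingWords) {
            total += labelWordCounts.getOrDefault(word.toLowerCase(), 0);
        }
        return total / (precedentCount * surroundingWords.length);
    }
}
```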
After processing, the mapping application converts the resulting solution into output forms 110. Depending on the input parameters, the output forms can be created as a set of linked HTML files or as a combination of HTML and Microsoft Excel files.
With reference to
Additionally, the XBRL Utility 204 provides the ability to import the XBRL taxonomy 216 and instance XBRL files 214. The utility is able to browse through multiple inter-linked schema, presentation and calculation files, load the required ones, and convert them into in-memory objects.
The Mapping Request Manager 202 controls the processing of the other parts of the data mapping application by loading the names of the XBRL Instance and HTML document files.
Subsequently, the Mapping Request Manager checks the availability and correctness of all specified data files and, if the check is successful, starts the Evolutionary Mapping Engine 206. The Evolutionary Mapping Engine, in its turn, imports the statistical Text Mining models and performs the Evolutionary Mapping Algorithm in a separate thread.
After the optimal mapping has been built, the Output forms generator 208 creates the output forms 220 as a set of interlinked HTML files for the source business document, presentation and calculations.
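A minimal sketch of this control flow is given below, assuming hypothetical interfaces standing in for the Evolutionary Mapping Engine 206 and the Output forms generator 208; it illustrates only the file checks and the separate mapping thread described above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative orchestration: validate the specified files, run the mapping engine on a
// separate thread, and hand the finished mapping to the output forms generator.
public class MappingRequestRunner {
    public interface MappingEngine { int[] buildMapping() throws Exception; }
    public interface OutputFormsGenerator { void generate(int[] mapping) throws Exception; }

    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public void run(java.io.File[] inputFiles, MappingEngine engine,
                    OutputFormsGenerator generator) throws Exception {
        for (java.io.File file : inputFiles) {
            if (!file.isFile()) {
                throw new java.io.FileNotFoundException(file.getPath()); // availability check
            }
        }
        // The evolutionary search runs in its own thread, as described above.
        Callable<int[]> mappingTask = engine::buildMapping;
        Future<int[]> mapping = executor.submit(mappingTask);
        generator.generate(mapping.get());  // build linked HTML output forms when done
        executor.shutdown();
    }
}
```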
With reference to
The next class, XBRLDataSolution 302, extends the generic abstract class EvSolution 308. Each instance of this class contains a complete variant of the mapping of instance data items to locations in the document text. In the course of optimization, the Evolutionary Search generates several thousand such variants. The first several hundred of them serve as a source of random features that should be generated as uniformly as possible. XBRLDataSolution generates random variants at the initial stage of the search in the method fillRandomly( ). Further convergence of the search to the best variant depends on how the solution variants selected into the population are used for the creation of new solutions. XBRLDataSolution combines the features of a pair of selected population members in the method crossover( ). One more method requiring implementation is mutation( ). It updates the variants created by crossover( ), supplying them with random deviations.
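The following skeleton, with assumed field names and empty method bodies, merely illustrates the described relationship between the generic abstract class and its data-mapping specialization.

```java
// Illustrative skeleton only: the generic base class provides the Evolutionary Search hooks
// and XBRLDataSolution fills them in for the data-mapping problem.
public abstract class EvSolution {                          // generic abstract class 308
    public abstract void fillRandomly();                    // create an initial random variant
    public abstract EvSolution crossover(EvSolution other); // combine features of two parents
    public abstract void mutation();                        // apply random deviations
}

class XBRLDataSolution extends EvSolution {                 // concrete mapping solution 302
    // One chosen document location per XBRL instance data item (assumed representation).
    private int[] itemLocations;

    @Override public void fillRandomly() { /* pick a random candidate location for every item */ }
    @Override public EvSolution crossover(EvSolution other) { /* inherit better-estimated links */ return this; }
    @Override public void mutation() { /* perturb the worst-estimated links */ }
}
```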
One more class that requires implementation for the given optimization problem is EvTask 310. It is meant for the calculation of the optimization criteria. XBRLDataTask 304 implements the estimation of a data mapping variant. The composed estimation criterion for the data mapping optimization combines the following partial estimations:
With reference to
With reference to
With reference to
XBRLUtility 612 provides a set of utility methods used by the other conversion utilities.
With reference to
With reference to
With reference to