A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the United States Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
Data analysis is a process involving the organization, examination, display, and analysis of collected data using narratives, figures, structures, charts, graphs and tables. Data analyses are aided by data-analysis processors, which are computational engines, implemented in either hardware or software, that can execute the data analysis process. High-end data-analysis processors typically have a language component, such as the R, S, SAS, MATLAB®, Python, and Perl families of languages. The availability of a language component facilitates data analysis in numerous ways including the following: providing arbitrary data transformations; applying one analysis result to results from another; abstraction of repeated complex analysis steps; and development of new methodology.
A principal challenge in using data-analysis processors is communicating the results of data analysis to data owners. Generation of reports as part of a data analysis project typically employs two separate steps. First, the data are analyzed using a data-analysis application based on a data-analysis processor. Second, the data analysis results (tables, graphs, figures) are used as the basis for a report document written in a word processor application. Although many data analysis applications try to support this process by generating pre-formatted tables, graphs and figures that can be easily integrated into a report document using copy-and-paste from the data analysis application to the word processor application, the basic paradigm is to construct the report document around the results obtained from data analysis.
Another approach for integration of data analysis and report document generation is to embed the data analysis itself into the report document. The concepts of “literate programming systems”, “literate statistical practice” and “literate data analysis” are major efforts in this area. Proponents of this approach advocate software systems for authoring and distributing these dynamic data-analysis documents that contain text, code, data, and any auxiliary content needed to recreate the computations. The documents are dynamic in that the contents, including figures, tables, etc., can be recalculated each time a view of the document is generated. The advantage of this integration is that it allows readers to both verify and adapt the data analysis process outlined in the document. A user can readily reproduce the data analysis at any time in the future, and a user can present the data analysis results in a different medium. Accordingly, a need exists for computer-implemented applications, methods and systems that enable users to integrate data analysis and data-analysis results generation using familiar software applications like a word processor application.
Whatever the precise merits and features of the prior art in this field, the earlier art does not achieve or fulfill the purposes of the present invention. The prior art does not provide for the following:
A computer-implemented method for generating data-analysis results in a word processor application is disclosed. The method may entail providing a data-analysis template wherein the data-analysis template comprises a word processor document and a data-analysis parts container, including at least one data-analysis part in the data-analysis parts container, communicating the data-analysis parts container to a data-analysis processor for generating a data-analysis results collection using the data-analysis parts container, and generating a data-analysis results document.
The method may entail the following: using a word processor document comprising a data structure wherein presentation content and data content may be separated; using a word processor document comprising a Microsoft Word document; using a data-analysis parts container comprising an extensible markup language data structure; using at least one data-analysis part selected from a group comprising an object, a code block and an expression; using a data-analysis processor comprising one or more of the following: a language interpreter, a library of methods, and a runtime environment; using a data-analysis processor selected from a group of data-analysis processors; using a data-analysis processor provided by the local machine, a network server or a web service; generating a data-analysis results document comprising a word processor document comprising information from the data-analysis template and information from the data-analysis results collection.
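By way of illustration only, and not limitation, the relationship among a data-analysis template, its data-analysis parts container and the data-analysis parts may be sketched in Python as follows. All class and field names here (PartKind, DataAnalysisPart, DataAnalysisPartsContainer, DataAnalysisTemplate) are hypothetical and are not taken from any embodiment; the presentation content is reduced to a plain string for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class PartKind(Enum):
    """The three kinds of data-analysis part named in the text."""
    OBJECT = "object"
    CODE_BLOCK = "code_block"
    EXPRESSION = "expression"


@dataclass
class DataAnalysisPart:
    kind: PartKind
    name: str
    source: str  # text to be interpreted by a data-analysis processor


@dataclass
class DataAnalysisPartsContainer:
    parts: List[DataAnalysisPart] = field(default_factory=list)

    def include(self, part: DataAnalysisPart) -> None:
        """Include a data-analysis part in the container."""
        self.parts.append(part)


@dataclass
class DataAnalysisTemplate:
    # Hypothetical simplification: presentation content as a plain string.
    presentation_content: str
    container: DataAnalysisPartsContainer = field(
        default_factory=DataAnalysisPartsContainer
    )


template = DataAnalysisTemplate(presentation_content="Mean response: ...")
template.container.include(
    DataAnalysisPart(PartKind.EXPRESSION, "mean_response", "sum(values) / len(values)")
)
```

Keeping the container separate from the presentation content in this sketch mirrors the separation of presentation content and data content described above.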
The method may further entail the following: storing the data-analysis document as an electronic document file; storing the data-analysis document file in a format selected from a list of file formats; modifying the data-analysis results document; editing the data-analysis template; generating the data-analysis template; managing the data-analysis template in an electronic document management system; and managing the data-analysis results document in an electronic document management system.
The method may also operate on a computer readable medium having computer readable information or a computing apparatus.
Referring now to the drawings, in which like numerals represent like elements through several figures, aspects of the present invention and the exemplary operating environment will be described.
The steps of the claimed method and apparatus are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the methods or apparatus of the claims include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The steps of the claimed method and apparatus may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and other computer instructions or components that perform particular tasks or implement particular abstract data types. The methods and apparatus may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network, such as web services. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
With reference to
Computer 110 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 110 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 110. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
The system memory 130 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 131 and random access memory (RAM) 132. A basic input/output system 133 (BIOS), containing the basic routines that help to transfer information between elements within computer 110, such as during start-up, is typically stored in ROM 131. RAM 132 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 120. By way of example, and not limitation,
The computer 110 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media discussed above and illustrated in
The computer 110 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 180. The remote computer 180 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 110, although only a memory storage device 181 has been illustrated in
When used in a LAN networking environment, the computer 110 is connected to the LAN 171 through a network interface or adapter 170. When used in a WAN networking environment, the computer 110 typically includes a modem 172 or other means for establishing communications over the WAN 173, such as the Internet. The modem 172, which may be internal or external, may be connected to the system bus 121 via the user input interface 160, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 110, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
When reading the discussion of the routines presented herein, it should be appreciated that the logical operations of the various embodiments of the present invention are implemented as (1) computer-executable instructions, such as program modules, being executed by a computer and/or (2) as interconnected machine logic circuits or circuit modules within the computing system. The implementation is a matter of choice dependent on performance requirements of the computing system implementing the invention. Accordingly, the logical operation illustrated in
Referring now to
Illustrative data-analysis templates for generating data-analysis results include but are not limited to the following: data-analysis templates for assembly as electronic laboratory notebooks (for example: templates in chemistry discovery, biology discovery, chemical development, bioprocess development, formulation development, analytical development and clinical development); data-analysis templates for life sciences (for example: genomic analysis, microarray analysis, TaqMan analysis, cheminformatics analysis, clinical trial design and analysis, biostatistics analysis, health services and outcomes analysis, process analytical technology analysis); data-analysis templates for economics and finance (for example: loan portfolio valuation analysis, portfolio optimization analysis, risk management analysis, trading strategies analysis, consumer behavior analysis); data-analysis templates for manufacturing (for example: design and analysis of experiments, reliability and life expectancy analysis, field failure analysis, supply chain optimization analysis, demand forecasting optimization analysis, statistical process control analysis, six sigma analysis); and data-analysis templates for business performance analysis (for example: customer churn analysis, fraud detection analysis, data quality management analysis, marketing campaign analysis, customer behavior analysis).
The routine 200 continues from operation 210 to operation 220, wherein the method entails including at least one data-analysis part in the data analysis parts container.
Management and retrieval of the data-analysis parts container and its included data-analysis parts may be achieved by the use of program modules 270. Implementation of such program modules may be through the use of smart document technology, which provides an architecture to build context-sensitive data-analysis templates. Smart document solutions associate an electronic document like a word processor document 262 with an XML schema, so that presentation content 263 like a paragraph of text may be distinguished from data content 264 like a string of text corresponding to a data-analysis parts container 266. It is important to note that the base functionality of the word processor application is retained in a smart document solution. Smart document solutions allow programmatic customization for searching within and operating on extensible markup language (XML) nodes within a data-analysis template, which comprises a data-analysis parts container. Data-analysis templates may be documents in a word processor application or may be files that can be opened by a word processor application such as Word developed by Microsoft Corporation.
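By way of illustration only, distinguishing presentation content from data content by means of XML markup may be sketched in Python using the standard xml.etree.ElementTree module. The namespace URI, the element names (document, p, partsContainer, part) and the attribute names are hypothetical stand-ins; they do not reproduce the schema of any smart document solution or of any word processor file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for data-analysis markup (an assumption).
NS = {"da": "urn:example:data-analysis"}

DOCUMENT_XML = """
<document xmlns:da="urn:example:data-analysis">
  <p>Summary of assay results.</p>
  <da:partsContainer>
    <da:part kind="expression" name="mean_value">sum(values) / len(values)</da:part>
    <da:part kind="codeBlock" name="fit_model">model = fit(values)</da:part>
  </da:partsContainer>
  <p>Prepared by the analytics group.</p>
</document>
""".strip()


def split_content(xml_text):
    """Return (presentation paragraphs, data-analysis parts) from the XML."""
    root = ET.fromstring(xml_text)
    # Presentation content: ordinary paragraphs outside the da namespace.
    presentation = [p.text for p in root.findall("p")]
    # Data content: parts located by searching the namespaced XML nodes.
    parts = [
        (part.get("kind"), part.get("name"), part.text)
        for part in root.findall("da:partsContainer/da:part", NS)
    ]
    return presentation, parts


presentation, parts = split_content(DOCUMENT_XML)
```

Because the parts are located by namespace rather than by position, the surrounding paragraphs can be edited freely without disturbing the data-analysis parts container.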
Smart document solutions may be created using many modern programming systems such as Microsoft Visual Basic™ 6.0, Microsoft Visual Basic .NET™, Microsoft Visual C#™.NET, Microsoft Visual J#™ or Microsoft Visual C++™ development systems. Creation of smart document solutions may be assisted by use of software development tools such as Visual Studio Tools for Office developed by Microsoft Corporation. Smart document solutions may be deployed over a corporate intranet, over the Internet, or through Web sites. Further descriptions and details for the creation of smart document solutions may be found in the book by Eric Carter and Eric Lippert entitled “Visual Studio Tools for Office: Using C# with Excel, Word, Outlook, and Infopath,” Addison Wesley Professional, Microsoft .NET Development Series, 2006.
A user may create a smart document solution as a dynamic-link library (DLL) or as an XML file. An example of the data-analysis template development cycle using the DLL approach may be as follows:
Including at least one data-analysis part in the data-analysis parts container may be performed in a variety of ways including but not limited to the following: include the data-analysis part in the data-analysis template; modify a data-analysis part included in the data-analysis template; and insert a data-analysis part into an empty data-analysis template.
The routine 200 continues from operation 220 to operation 230, wherein the method entails communicating the data-analysis parts container to a data-analysis processor 280 for generating a data-analysis results collection using the data-analysis parts container. The data-analysis results collection may comprise a collection of computer-readable objects, a collection of serialized objects such as disk files, or a combination of both. Initiating communication is illustrated in
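By way of illustration only, communicating a data-analysis parts container to a data-analysis processor and receiving a data-analysis results collection may be sketched in Python as follows. The class LocalPythonProcessor, its process method and the tuple layout of a part are hypothetical. The Python interpreter itself stands in for the language component of the processor, which is an assumption of this sketch rather than a feature of any embodiment.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical part layout: (kind, name, source text).
Part = Tuple[str, str, str]


class LocalPythonProcessor:
    """Hypothetical local processor that interprets parts as Python source."""

    def __init__(self, data: Dict[str, Any]):
        # The runtime environment shared by all parts in one container.
        self.env: Dict[str, Any] = dict(data)

    def process(self, container: List[Part]) -> Dict[str, Any]:
        """Evaluate each part in order and return the results collection."""
        results: Dict[str, Any] = {}
        for kind, name, source in container:
            if kind == "code_block":
                # Code blocks run for their side effects on the environment.
                exec(source, self.env)
            elif kind == "expression":
                # Expressions contribute a named value to the results.
                results[name] = eval(source, self.env)
            else:
                raise ValueError(f"unsupported part kind: {kind}")
        return results


container = [
    ("code_block", "prepare", "total = sum(values)"),
    ("expression", "mean_value", "total / len(values)"),
]
processor = LocalPythonProcessor({"values": [2.0, 4.0, 6.0]})
results = processor.process(container)
```

Note that exec and eval run arbitrary code, so a sketch of this kind is suitable only for trusted templates. A processor provided by a network server or a web service could expose the same process interface while performing the evaluation remotely.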
Construction of program modules for communication of the data-analysis parts container with the data-analysis processor and for generation of the data-analysis results document may be aided by the use of an object-oriented framework of cooperating components. Such a framework and its use are described in co-pending U.S. patent application entitled “Object-Oriented Framework for Data-Analysis Having Pluggable Platform Runtimes and Export Services,” the disclosure of which is incorporated herein, in its entirety.
The routine 200 continues from operation 230 to operation 240, wherein the method entails generating a data-analysis results document 290. In one embodiment, the data-analysis results document comprises a word processor document comprising information from the data-analysis template and information from the data-analysis results collection. In such an embodiment, a data-analysis results collection of objects is returned by the data-analysis processor and merged with presentation content from the word processor document to generate a data-analysis results document in accordance with the specifications of the data-analysis template. The routine 200 then ends.
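By way of illustration only, merging a data-analysis results collection with presentation content may be sketched in Python using the standard string.Template class. The function name generate_results_document and the ${name} placeholder convention are assumptions of this sketch; in an actual embodiment the results would be merged into a word processor document rather than into a plain string.

```python
from string import Template


def generate_results_document(presentation_content, results):
    """Substitute each named result into its placeholder in the text."""
    formatted = {
        # Floats are shown with two decimals; other values as plain text.
        name: format(value, ".2f") if isinstance(value, float) else str(value)
        for name, value in results.items()
    }
    # substitute() raises KeyError if a placeholder has no matching result.
    return Template(presentation_content).substitute(formatted)


template_text = "Assay report\nMean response: ${mean_value}\nSamples: ${n}"
document = generate_results_document(template_text, {"mean_value": 4.0, "n": 3})
```

Using substitute rather than safe_substitute makes a missing result fail loudly instead of leaving an unfilled placeholder in the results document.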
Again referring to
A user may be able to store the data-analysis results document files resulting from “Export Document Contents” action 560 to an electronic document management system (EDMS). It should be understood that an embodiment of the present invention may serve as a knowledge management system for applications including but not limited to the following: an electronic laboratory notebook (ELN) system; an electronic data analysis notebook (EDAN) system; and a laboratory information management system (LIMS). An EDMS is a computer system or set of computer programs used to track and store electronic documents, like those in the embodiments of the present invention. An EDMS commonly provides storage, versioning, metadata, security, indexing, and retrieval capabilities. Also, an EDMS may provide workflow and collaboration capabilities. For example, if the word processor application used in an embodiment of the present invention is Word developed by Microsoft Corporation, a user may export and store word processing documents in a Document Workspace, a shared workspace which is part of a Microsoft Windows SharePoint Services site. Within such a shared workspace a user may be provided with the following EDMS features: a central shared area for storing documents; automatic indexing; document check-in/check-out; automatic versioning of documents; and document status information including version, check-out status, and last modified date. In an analogous manner, a user may also be able to manage data-analysis templates in a document management system.
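By way of illustration only, the check-in/check-out, versioning and status features listed above may be sketched in Python as a minimal in-memory store. The classes ManagedDocument and DocumentStore are hypothetical and do not represent the interface of Windows SharePoint Services or of any other EDMS.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ManagedDocument:
    name: str
    versions: List[bytes] = field(default_factory=list)
    checked_out_by: Optional[str] = None


class DocumentStore:
    """Hypothetical central shared area with automatic versioning."""

    def __init__(self) -> None:
        self._docs: Dict[str, ManagedDocument] = {}

    def check_in(self, name: str, content: bytes, user: str) -> int:
        """Store a new version and release any check-out held by the user."""
        doc = self._docs.setdefault(name, ManagedDocument(name))
        if doc.checked_out_by not in (None, user):
            raise PermissionError(f"{name} is checked out by {doc.checked_out_by}")
        doc.versions.append(content)
        doc.checked_out_by = None
        return len(doc.versions)  # version number of the stored content

    def check_out(self, name: str, user: str) -> bytes:
        """Reserve the document for one user and return the latest version."""
        doc = self._docs[name]
        if doc.checked_out_by is not None:
            raise PermissionError(f"{name} is already checked out")
        doc.checked_out_by = user
        return doc.versions[-1]

    def status(self, name: str) -> dict:
        doc = self._docs[name]
        return {"version": len(doc.versions), "checked_out_by": doc.checked_out_by}


store = DocumentStore()
store.check_in("results.docx", b"first draft", "ana")
latest = store.check_out("results.docx", "ben")
store.check_in("results.docx", b"revised", "ben")
status = store.status("results.docx")
```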
A user may also be permitted to edit data-analysis templates that are communicated to the word processor application. For example, a user may open a data-analysis template and modify its contents. A user may modify the text (for example the title and opening paragraph) and formatting (for example font size and styles) of the data-analysis template using the standard editing capabilities of the word processor application for standard templates. The user may modify the contents of the data-analysis parts container using an embodiment of the present invention. One possible embodiment of the present invention, which allowed a user to modify data-analysis parts, was illustrated in
A user may be able to create entirely new data-analysis templates. For example, a user may open a copy of a data-analysis template and insert new data-analysis parts in addition to standard static text and formatting.
In another embodiment of the invention, the word processor application may work entirely in the background without interaction with a user. In an illustrative example, the user may employ a custom application to initiate a data analysis request yet never see the word processor application. In such an embodiment, the custom application may employ the following method: select a data-analysis template; open the data-analysis template in the word processor application; include at least one data-analysis part in the data-analysis parts container by insertion or modification; communicate the data-analysis parts container to the data-analysis processor for generation of the data-analysis results collection; generate the data-analysis results document comprising information from the data-analysis template and information from the data-analysis results collection; and return a word processor file to the user. The word processor application may work entirely in the memory and processor of the computer, and there may be no visual indication to the user that a word processor application was involved in generating the document. In another embodiment, the word processor application may be replaced by a system capable of reading, writing and manipulating word processor documents but lacking a user interface. Said background operation may occur on a local machine or on a remote machine.
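By way of illustration only, the background method enumerated above may be sketched in Python as a single function that runs without any user interface. The template store TEMPLATES, the function run_headless and the use of a plain text file in place of a word processor file are all assumptions of this sketch, and the Python interpreter again stands in for the data-analysis processor.

```python
import tempfile
from pathlib import Path
from string import Template

# Hypothetical template store keyed by template name.
TEMPLATES = {"summary": "Mean response: ${mean_value}"}


def run_headless(template_name, parts, data, out_dir):
    """Run the whole method in the background and return the output path."""
    # Select and open the data-analysis template.
    text = TEMPLATES[template_name]
    # Communicate the included parts to the processor; here each part is
    # a named Python expression evaluated against the supplied data.
    env = dict(data)
    results = {name: eval(expr, env) for name, expr in parts.items()}
    # Generate the results document from template text and results.
    document = Template(text).substitute({k: str(v) for k, v in results.items()})
    # Return a file to the caller (plain text stands in for a word
    # processor file in this sketch).
    path = Path(out_dir) / f"{template_name}_results.txt"
    path.write_text(document, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    out = run_headless(
        "summary",
        {"mean_value": "sum(values) / len(values)"},
        {"values": [1, 2, 3]},
        tmp,
    )
    content = out.read_text(encoding="utf-8")
```

The same function could be invoked on a local machine or behind a remote service, since nothing in it depends on a display.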
Although the foregoing text sets forth a detailed description of numerous different embodiments, it should be understood that the scope of the patent is defined by the words in the claims set forth at the end of the patent. The detailed description is to be construed as exemplary only and does not describe every possible embodiment because describing every possible embodiment would be impractical, if not impossible. Numerous alternative embodiments could be implemented, using either current technology or technology developed after the filing date of this patent, which would still fall within the scope of the claims.
Thus, many modifications and variations may be made in the techniques and structures described and illustrated herein without departing from the spirit and scope of the present claims. Accordingly, it should be understood that the methods and apparatus described herein are illustrative only and not limiting upon the scope of the claims.
U.S. patent application Attorney Docket No. BLUEREF-001, filed on Jan. 3, 2007 and entitled “Method and Apparatus for Utilizing an Extensible Markup Language Data Structure For Defining a Data-Analysis Parts Container For Use in a Word Processor Application,” U.S. patent application Attorney Docket No. BLUEREF-002, filed on Jan. 3, 2007 and entitled “Method and Apparatus for Managing Data-Analysis Parts in a Word Processor Application,” and U.S. patent application Attorney Docket No. BLUEREF-003, filed on Jan. 3, 2007 and entitled “Object-Oriented Framework for Data-Analysis Having Pluggable Platform Runtimes and Export Services,” which are assigned to the same assignee as the present invention, are hereby incorporated, in their entirety, by reference.