The present invention relates generally to the field of spreadsheet applications. More particularly, the invention relates to the reorganization of data in a spreadsheet.
To survive in the information age, businesses and organizations require useful information about their organizations, markets, and operations. Useful business information is usually a condensed collection of data that is easy to understand and easy to absorb. To create useful information, businesses and organizations generally acquire vast collections of data that come from numerous and disparate sources then condense and organize the data into more useful forms and formats. Businesses and organizations usually rely on software packages to condense, manipulate, or formulate the data into useful information. The software packages make the manipulation of the vast data collections possible and efficient. One of the most common software packages used to derive more useful information from data collections is the spreadsheet application, such as the Microsoft® Excel spreadsheet application offered by Microsoft® Corporation.
Some spreadsheet applications allow some manipulation of data using database operations. Database operations, such as sort or filter, allow the user to format, condense, or organize data within a spreadsheet. Within spreadsheet applications, such as the Microsoft® Excel spreadsheet application, the database operations are only available to the user as a one-time operation. The database operation is generally accessed through a drop down menu in the spreadsheet application. The menu selection generally opens an operation user interface to step the user through the process of using the database operation. Unfortunately, the user can employ the operations only once, and the user is given only a few options to change the way the database operations function. To repeat the database operation, the user must again select the operation from the menu and enter the options again. Thus, the database operations are generally not automatically repeatable and are not flexible.
These problems are exacerbated when the user is a large organization that wants to manipulate large collections of data that originate from several, disparate sources. Users who require complex, table-based, database operations in a spreadsheet application often turn to manually manipulating the data or creating macro code to prepare the data. The first option is often time-consuming, expensive, and error-prone because the solution requires manual intervention every time the data changes. To automate this process, a user could employ macrocode, such as that created by Visual Basic for Applications® program. Unfortunately, the syntax and organization of macrocode is difficult for most users of spreadsheet applications. The macrocode can also be difficult to maintain. Most users cannot decipher the actual code, and thus, users cannot correct errors and cannot adjust the solution in the future.
In addition, the users often introduce new data to the spreadsheet on a recurring or periodic basis. For instance, monthly sales data is incorporated into spreadsheet applications every month. The users must manipulate the data monthly to create the useful information about business performance during the past month. To accomplish the manipulation of the monthly data, the users must either use the numerous database operations every month, which is extremely time consuming and wasteful, use the manual operations mentioned above, which the user must often change to work with the new data, or create the macro code mentioned above that may work with a new set of data but is difficult to create.
The present invention relates to a suite of database functions that can be used to create “database formulas” that can manipulate data in a spreadsheet. Each database function has a number of definable parameters or arguments, and the database functions can be combined with other database functions, other spreadsheet functions, and other operators to create a customizable output. Thus, users may enter numerous options that can cover the user's desire for output data in certain forms or formats.
In embodiments of the present invention, the input into each database function is a set of data. A set of data can be an array of cells occupying two or more rows, two or more columns, or two or more columns and two or more rows. In embodiments of the present invention, the output of each database function can be a set of data. Thus, each database function can create new tables of data that output “bulk” data, data that has more than one data value. The database functions allow the user to create sets of data that vary in organization and size depending on how the database function operates on the input data.
In one embodiment of the present invention, a method for manipulating a table of data in a spreadsheet application comprises receiving one or more database functions, receiving a selection for input data from within the table of data, manipulating the input data according to the one or more database functions, and presenting the output data in the spreadsheet. In some embodiments, two or more of the database functions are received in an embedded database formula.
The present invention may include inputs into a spreadsheet and outputs from the spreadsheet. In one embodiment, the one or more of the database functions have one or more arguments. In a further embodiment, one of the arguments has a semantic reference. In some embodiments, the user provides the one or more arguments. In further embodiments, the one or more of the arguments include a column token. In one embodiment, one or more of the arguments includes a mathematical expression.
In some embodiments, the input data is a table of data. In embodiments of the present invention, the output data is a table of data. In some embodiments, presenting the output data causes one or more items of pre-existing data to move. In some embodiments, the output data occupies two or more cells in the spreadsheet.
The present invention also includes exemplary embodiments of database functions. In one embodiment, the database functions include a filter function, a sort formula, a remove duplicates formula, a table append formula, or a add column formula. In some embodiments, the filter function comprises comparing at least one item of data to a filter argument, and returning output data without any item from the input data that meets a filter argument. In some embodiments, the sort function comprises returning an organized set of output data according to one or more sort arguments. In some embodiments, the duplicate removal function comprises determining that two items of data in the input data, designated in an argument, are the same, and returning output data, wherein one of the two items of like data are removed. In some embodiments, the table append function comprises retrieving two sets of input data, and returning output data, wherein at least a portion of the first set of input data is appended to at least a portion of the second set of input data. In some embodiments, the column add function comprises creating a column of data from the input data according to one or more arguments, and returning output data, wherein the output data includes the input data and the column of data.
In embodiments of the present invention, manipulating the input data comprises receiving one or more arguments with one or more of the database functions, parsing out the one or more arguments and the one or more database functions, determining if one or more arguments or one or more database functions could be applied row-by-row, if one or more arguments or one or more database functions could be applied row-by-row, applying those arguments or those database functions to each row separately, and applying any remaining arguments or database functions. In further embodiments of the present invention, a first intermediate set of data is created from applying the arguments and database functions to each row separately. In some embodiments, one or more subsequent intermediate sets of data are created because of each remaining argument or database function applied to the first intermediate set of data or to one or more of the subsequent intermediate sets of data.
In embodiments of the present invention, a user interface for displaying output data provides for receiving one or more database functions to apply to a set of input data, receiving a selection of input data, and displaying output data after the database function has been applied to the input data.
The invention may be implemented as a computer process, a computing system or as an article of manufacture such as a computer program product. The computer program product may be a computer storage medium readable by a computer system and encoding a computer program of instructions for executing a computer process. The computer program product may also be a propagated signal on a carrier readable by a computing system and encoding a computer program of instructions for executing a computer process.
A more complete appreciation of the present invention and its improvements can be obtained by reference to the accompanying drawings, which are briefly summarized below, and to the following detailed description of presently exemplary embodiments of the invention, and to the appended claims.
The present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which embodiments of the invention are shown. This invention, however, may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art.
In general, the present invention relates to embodiments of methods of applying database functions in spreadsheet applications. In accordance with embodiments of the present invention, the methods described herein may be executed as a set of computer instructions read and performed on a single, stand-alone computer system.
The present invention may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically the functionality of the program modules may be combined or distributed as desired in various embodiments.
An exemplary spreadsheet application having aspects of the present invention is shown in
An exemplary spreadsheet 100 is shown in
A database formula having one or more database functions can be entered into the spreadsheet 100. In embodiments of the present invention, a database function is a function in a spreadsheet application that can retrieve an input set of data, manipulate that input data into another form or format, and produce an output set of data. A range of data or set of data is one or more items of data that occupy one or more cells of a spreadsheet. The database functions change the appearance or organization of the data. In general, the database functions reorganize the input data, eliminate items of data within the input data, or add data to the input data. In embodiments of the present invention, some database functions comprise sort, filter, table append, remove duplicates, and add a column. These particular embodiments will be explained in grater detail below, but the invention is not limited to these specific embodiments. In some embodiments of the present invention, the database functions are repeatable. In other words, as the input data changes, the database functions can be automatically rerun to provide updated output data.
In some embodiments, the input data need not have a set size but can change and update, and the database functions will adjust to the changes. In one embodiment, the database function can employ semantic designations. Semantically designating or a semantic designation refers to a particular way of selecting a set of data. Instead of giving a cell, row, or column reference, the user may use some type of language descriptor that the spreadsheet may use to find the spreadsheet data the user wants to select. For instance, if column C included all the ages of a group of children, the user could use a semantic designation to column C in an expression, like “[age]=3.” The semantic designation, “age,” is the column token in the argument. In embodiments of the present invention, a column token is a kind of argument that refers to a particular column. In the example above, the term “age” is the column token. The column token may be phrased in any manner that allows the spreadsheet to recognize a certain column and apply a formula or argument to that column. Semantic designations are explained in more detail in related U.S. patent application, “Method and Implementation for referencing dynamic data in spreadsheet applications,” filed on Sep. 30, 2004, which is hereby incorporated by reference in its entirety.
The output of the database functions may be a set of data that can occupy one or more cells. Output data is the final set of data created from one or more applied database functions. In other words, after all database functions have been applied to the input data, the output data forms the final set of data. In one embodiment, the user or spreadsheet application does not need to define the size of the output data, the range of cells the output data will occupy, but the size of the output data is unbounded and can be completely reflective of the input and the effect of the database functions on that input data.
Each database function may have one or more customizable arguments or parameters that can adjust or adapt the execution of the database function. In embodiments of the present invention, an argument or arguments refers to the one or more expressions or modifiers to a database function. The arguments may adjust or adapt the formula to execute in a certain manner. Arguments can be in algebraic, Boolean, or other format. In one embodiment, an argument may refer to a cell. In embodiments of the present invention, an expression is a mathematical or logical symbol or a meaningful combination of symbols. In this description, an expression may refer to either a database function or an argument within the database formula.
In embodiments of the present invention, a database formula is a formula that makes use of one or more database functions to perform a series of database operations in a single formula definition. In embodiments of the present invention, an embedded database formulas or nested database formulas refers to the ability to place one or more database functions inside the syntax of a database formula. For example, a database formula may have the expression: “SORT(table 1(FILTER)).” In the example of a database formula, the FILTER function is embedded within the SORT function. Embedding functions allows the user to create complicated expressions that manipulate the data in several ways without creating separate formulas.
An exemplary database formula 102 having a database function is shown in cell G5112 of
Referring to
In some embodiments, the process of manipulating data with the database function may end once the output data is displayed. In other embodiments, the output data from one database formula, having one or more database functions, may form the input for other spreadsheet functions or other database formulas, having one or more database functions. In some embodiments, the database formula or database function may be rerun repeatedly if the selected input data changes. Thus, the database formula or function reapplies and recreates the output data after any change to the designated input data. In essence, database functions can operate like other spreadsheet functions that can be recalculated upon a change in input data. The reapplication of the database functions and formulas is unlike the single application of database operations. Reapplication of the database formulas and database functions is described more fully above.
A suitable operating environment, in which the invention may be implemented, is depicted in
With reference to
Device 200 may also contain communication connection(s) 212 that allow the device 200 to communicate with other devices. Communication connection(s) 212 is an example of communication media. Communication media typically embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media, such as a wired network or direct-wired connection, and wireless media, such as acoustic, RF, infrared, and other wireless media.
Device 200 may also have input device(s) 214 such as keyboard, mouse, pen, voice input device, touch input device, etc. Output device(s) 216 such as a display, speakers, printer, etc. may also be included. All these devices are well know in the art and need not be discussed at length here.
Computing device 200 typically includes at least some form of computer readable media. Computer readable media can be any available media that can be accessed by processing unit 202. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Combinations of the any of the above should also be included within the scope of computer readable media. The term computer readable media as used herein includes both storage media and communication media.
A functional diagram of a spreadsheet application 300 having aspects of the present invention is shown in
The spreadsheet application 300 can accept database functions 314, whether in a database formula 312 or separate from the database formula 312, and perform database operations according to the database functions 314. In other words, the user can select or provide a database function 314 that manipulates the input data's form, format, organization, structure, or content. In some embodiments, the database functions 314 can be applied to an array of input data 304 including whole tables and can provide output data 310 as a separate array of data. In other embodiments, the database functions 314 are dynamic, in that the database functions 314 will rerun when the underlying input data 304 changes. Thus, the user need not manually reapply the database functions 314, but the database functions 314 will continually generate output data 310 from any ever-changing source of input data 304.
An embodiment of a method 400 for applying one or more database functions (such as database function 314) to a set of input data (such as input data 304) is shown in
Next, an accept operation 404 accepts the user's entry of the database function. In embodiments of the present invention, a spreadsheet module can store the database function into the cell where the user entered the database function. In other embodiments, the spreadsheet module can store a database formula (such as database formula 312) containing a database function into a cell where the user entered the database formula.
Next, an accept operation 406 accepts one or more arguments. In embodiments of the present invention, an argument is an expression that adjusts or adapts the execution of the database function. In one embodiment, the arguments may be part of a larger database formula. In other embodiments, the arguments may be part of the database function syntax. The spreadsheet module can store the arguments with the database function.
Next, an accept operation 408 accepts a selection of a set of input data. The input data may be one or more cells of data. In one embodiment, the user may input a cell range for the input data. In another embodiment, the user may employ a semantic designation to refer to a set of input data. In still other embodiments, the user may employ a mouse or other graphical user interface input device to highlight and select the input data. The spreadsheet module can store this input data selection with the database function. Any correction needed to the database function, arguments, or input data selection may be prompted by the spreadsheet module and made by the user. In some embodiments, the user may enter the database function, input data selection, and one or more arguments as a single expression.
Next, manipulate operation manipulates the input data according to the one or more database functions. In embodiments of the present invention, the spreadsheet module executes the database functions, with the arguments, on the selected input data. The spreadsheet module manipulates the input data according to the type of database function and the accompanying arguments. As explained, manipulating the input data may mean any type of change to the organization or format of the input data. In one embodiment, a first database function or argument (meaning the first listed in any embedded database formula or first argument in a series of arguments) is executed on the input data. Then, the spreadsheet module proceeds through each successive function or argument in series order. In some embodiments, the spreadsheet outputs a set of intermediate data (such as intermediate set of data 316) after the application of each function or argument. The final function or argument creates the output data (such as output data 310).
For example, if a FILTER function and a sort formula are applied to the same table of data, the spreadsheet can first create an intermediate set of data, such as a second table of data after the filter operation but before the sort operation. The spreadsheet may store this intermediate set of data. Then, the spreadsheet may apply the sort formula to the intermediate set of data to create the output data. Any data created and/or stored between sequential or subsequent executions of formulas or expressions is an intermediate set of data. A more detailed description of an embodiment of a method for manipulating data with one or more database functions is described with reference to
Next, a send operation 412 sends the output data to the spreadsheet. The output data may be a new table of data or set of data. In embodiments of the present invention, the spreadsheet module forms the output data. In one embodiment, the spreadsheet module produces the table with column and row headers. In one embodiment, the spreadsheet module may name the output data.
In one embodiment, the spreadsheet module determines the size and configuration of the output data. The spreadsheet module may determine how many rows and how many columns comprise the output data. Then, the spreadsheet module may determine if enough space exists in the spreadsheet to place the output data into the spreadsheet. In one embodiment, the spreadsheet module may find the cell where the database function or formula is anchored. The spreadsheet module may then check the rows and columns around the anchor cell for any existing data. If existing data exists, the existing data may be shifted to make room for the output data.
Next, a present operation 414 presents the output data. In embodiments of the present invention, the spreadsheet module displays the output data in the spreadsheet. The output data may form one cell of data, a range of data in a column or row, a table of data, or any other arrangement of data in any arrangement of cells or spreadsheets.
A further embodiment of a method 500 for manipulating input data (such as input data 304) with one or more database functions (such as database function 314) is shown in
In some embodiments, the embedded database formula can require the spreadsheet to parse the different expression, whether those expressions are database functions or arguments of the database functions. The database formulas, whether embedded or independent, may have one or more arguments. In one embodiment, the spreadsheet module can treat each expression as a separate database operation. Thus, each expression is separated and evaluated individually. The collection of database operations may be applied to the input data in any order and at any time in the process. Thus, rather than execute the expressions in series order, the spreadsheet module may change the order and/or the method of application of the database operations to optimize the execution of the database formula. In some embodiments, the parsed expressions may be stored for later access.
Next, a retrieve operation 504 retrieves a parsed expression. In some embodiments, the spreadsheet module can retrieve one of the stored parsed expressions. The spreadsheet module can access memory to call the expression and provide the expression for further evaluation. By retrieving the expressions one at a time, the spreadsheet module can reorder the execution of the database functions and arguments.
Next, determine operation 506 determines the class of the expression. In embodiments of the present invention, the expression may have one of two classes. First, the expression may be applied to a single row of data or be applied row-by-row. Second, the expression may be applied to a “whole table” of data and not to a single row. Certain database operations can be operated on a single row. For example, a filter operation that looks at a data value in one cell can be applied row-by-row. In contrast, certain database operations can be applied only to entire or “whole” tables of data. For example, a sort operation can be applied only to a whole table of data. Thus, in embodiments of the present invention, the spreadsheet module checks the expressions against a set of expressions that the spreadsheet module knows can be applied to single rows of data. If the expression is in the list, that expression may be applied immediately.
If the expression can be applied row-by-row, then separate operation 508 separates the row-by-row expressions. In one embodiment, the spreadsheet module separates out the expressions and stores the expressions as a separate group. In some embodiments, rather than store the expressions, the expressions are immediately applied in series order to each row of the input data. In other embodiments, the spreadsheet module stores every row-by-row expression for further evaluation before applying the expressions. Therefore, the spreadsheet module may separate out the expressions for later processing.
Next, determine operation 510 determines if there are other expressions to review. In embodiments of the present invention, the spreadsheet module may determine if all expressions have been reviewed for class, row-by-row or “whole table.” If an expression has not been reviewed the process 500 returns to retrieve operation 504. However, if there are no other expressions needing review, the process 500 proceeds to operation 512.
Next, an optional order operation 512 orders the row-by-row expressions. In embodiments of the present invention, the spreadsheet module may order the expressions rather than apply the row-by-row expressions immediately. Ordering may include placing the expressions into an organization that is most efficient for processing. In one embodiment, the order may be from the most restrictive to the least restrictive expression. For example, the expression “C=5” is more restrictive than the expression “C>5” because the first expression has only one possibility, a value equal 5, and the second expression has numerous possibilities, all values above 5. Ordering the expressions is more efficient because applying the more restrictive expressions first can eliminate the row from further processing before applying the less restrictive expressions.
Next, evaluate operation 514 evaluates each row by all the ordered, row-by-row expressions. In embodiments of the present invention, the spreadsheet module applies the parsed, separated, ordered, row-by-row expressions, as a set, to each row individually. The spreadsheet module can evaluate each row either serially or at random. If any expression applies to the row, the data within that row may be eliminated, if necessary. Any expression that has yet to be applied to the eliminated row need not be applied, and the spreadsheet module can move to the next row for evaluation of the expressions. Thus, the row-by-row expressions can operate like an array formula rather than a table formula. While the present embodiment applies the expression row-by-row, it may also be possible to apply expressions column-by-column. Thus, in some embodiments, the separating, ordering, and evaluating may be completed for column-by-column expressions. After the spreadsheet module applies the row-by-row expressions to each row, the spreadsheet module creates a first intermediate data set 516 (such as intermediate set of data 316).
If the determine operation 506 determines the parsed expression is not a row-by-row expression, the process 500 may proceed to optional order operation 518. Order operation 518 may order the “whole table” expressions rather than apply the “whole table” expressions immediately. In some embodiments, the “whole table” expressions can be ordered from most restrictive to least restrictive, as explained above. Other ordering methodologies are completed for both row-by-row expressions and “whole table” expressions because processing of the expressions may be more efficient or effective if some expressions follow other expressions. For example, processing may be more effective if a sort precedes a “take the top ten” operation.
Next, an apply operation 520 applies the “whole table” expressions. In one embodiment, the spreadsheet module retrieves, from memory, the first intermediate data set created by the row-by-row expressions. Then, the spreadsheet module applies the first “whole table” expression. The spreadsheet module creates a subsequent set of intermediate data 522 and stores the subsequent set of intermediate data 522.
Next, determine operation 524 determines if other “whole table” expressions need to be applied. If another “whole table” expression has yet to be applied to a subsequent intermediate data set, the process 500 returns to apply operation 520. If no other “whole table” expression needs to be applied, the last set of subsequent intermediate data 522 stored in memory becomes the output data (such as output data 310).
Referring to receive operation 526, if the underlying input data changes, receive operation 526 receives the change. In one embodiment, a database formula or function may have already been entered within a spreadsheet. The pre-existing database formula or function has a selected set of input data to which the expressions are applied. The stored selection of the set of input data represents “dependencies” of the database formula or database function. A user may change some part of the “dependent” cells that were designated as inputs to a database formula or function. The change to the input data may come from editing the input data, adding to the input data, refreshing the input data, or other actions. If the input data is changed, the process 500 of applying the expressions may start at block 502.
In some embodiments, the spreadsheet module receives the change. When the change effects the input data in the dependent cells, the change results in process 400 being rerun starting at manipulate operation 410. In one embodiment, the database functions are re-evaluated or rerun to return a correct output data set using the changed input data. Thus, database formulas and database functions, unlike database operations, can be recalculated.
Several exemplary database formulas containing database functions are presented in
The FILTER database function 602 can remove items of data from a spreadsheet according to one or more arguments. As shown in
In embodiments of the present invention, the exemplary FILTER function has a structure and syntax as follows: “FILTER(Table, Filter Expression 1, [Filter Expression Expression2], . . . ).” The FILTER database function 602 returns all items (rows) from a set of data that meet all filter conditions as embodied by the arguments. In embodiments of the present invention, the arguments can be algebraic, mathematical, Boolean, or other types of expressions that may contain one or more references to sets of data within the input data. The arguments may be applied, and the FILTER database function 602 returns a list of rows that meet all the arguments. Only items (rows) where all arguments are true are included as a part of the output data.
For example, an exemplary table, shown below, may be called “MyTable.” The table follows:
The FILTER function may be entered into the spreadsheet as the following expressions: =FILTER(MyTable, [Bar]>1, [Foo]=“C”)
The result of this FILTER function will be the following table:
The FILTER function used semantic references for column tokens, for instance, the FILTER function used the name of the columns “Foo” and “Bar” and the name of the table, “MyTable,” which is a named range for the entire table.
Another example of a database function is the SORT function 604, as shown in
Another example of a database function may be the REMOVE DUPLICATES function 606 shown in
As an example, the selected data, shown below, is called “MyTable”:
The example REMOVE DUPLICATES function 606 is shown below: =REMOVEDUPLICATES(MyTable, [Foo], [Bar]).
The REMOVE DUPLICATES function 606 returns a set of output data as follows:
Still another example of a database function is the TABLE APPEND function 608 shown in
A final example of a database function may be the ADD COLUMN function 610 shown in
Number | Name | Date | Kind |
---|---|---|---|
5557787 | Shin et al. | Sep 1996 | A |
5603021 | Spencer et al. | Feb 1997 | A |
5842180 | Khanna et al. | Nov 1998 | A |
5890174 | Khanna | Mar 1999 | A |
6341292 | Cho et al. | Jan 2002 | B1 |
6460059 | Wisniewski | Oct 2002 | B1 |
6510420 | Cessna et al. | Jan 2003 | B1 |
6631497 | Jamshidi et al. | Oct 2003 | B1 |
6738770 | Gorman | May 2004 | B2 |
6754677 | Cho et al. | Jun 2004 | B1 |
6757867 | Bauchot et al. | Jun 2004 | B2 |
7003504 | Angus et al. | Feb 2006 | B1 |
7058631 | Pal | Jun 2006 | B1 |
7082569 | Voshell | Jul 2006 | B2 |
7089261 | Hladik, Jr. | Aug 2006 | B2 |
7139745 | Nakamura et al. | Nov 2006 | B2 |
7191184 | Laborde et al. | Mar 2007 | B2 |
7266763 | Peyton-Jones et al. | Sep 2007 | B2 |
20020143780 | Gorman | Oct 2002 | A1 |
20020169799 | Voshell | Nov 2002 | A1 |
Number | Date | Country | |
---|---|---|---|
20060075328 A1 | Apr 2006 | US |