TECHNICAL FIELD
The present invention is directed to the field of software application customization, and, more particularly, to the field of user-customization of software applications.
BACKGROUND
Spreadsheet applications like MICROSOFT EXCEL permit their users to perform mathematical calculations. In particular, spreadsheet applications enable users to create worksheet documents that are divided into cells, which are typically arranged in rows and columns. The user can specify contents for any cell in a worksheet. Where a cell contains a mathematical formula, the spreadsheet application attempts to evaluate the mathematical formula to obtain a result for the cell.
As one example, the user may specify that a cell contains a mathematical formula that is a single constant, which evaluates to itself:
A1: 5=5 (1)
The user may also specify that a cell contains a mathematical formula that is comprised of one or more operands having constants as their operands:
A2: 40/5=8 (2)
The user may also specify that a cell contains a mathematical formula that references the results of one or more other cells:
A3: A1+A2=5+8=13 (3)
The user may also specify that a cell contains a mathematical formula that includes one or more built-in functions, such as the square root (SQRT) function:
A2: SQRT (9)=3 (4)
Typically, a set of built-in functions is defined for a spreadsheet application by the developers of the spreadsheet application. Users of the spreadsheet application can use any of these built-in functions in a worksheet. In some spreadsheet applications, users cannot define their own built-in functions. Other spreadsheet applications permit users to define their own built-in functions, but only in ways that require significant familiarity with computer languages and, in some cases, other sophisticated software development tools. In these spreadsheet applications, a user defines his or her own built-in function not by editing a spreadsheet, but rather by creating a separate programmatic entity that is called by the spreadsheet program.
A cell reference included in an formula contained in a cell may either be an absolute reference or a relative reference. If the cell reference is absolute, when the formula is copied from its source cell to a target cell, the referenced cell is the same as the same in the target cell as in the source cell. For example, if the formula below for cell B1, which includes an absolute reference to cell A1 (“$A$1”), is copied to cell B2, cell B2 afterward has the same contents and result as cell B1:
B1: $A$1+10=15
B2: $A$1+10=15 (5)
On the other hand, if the cell reference is relative, when the formula is copied, the referenced cell in the target cell bears the same relationship to the target cell that the referenced cell in the source cell bears to the source cell. For example if the formula below for cell C1, which includes a relative reference to cell A1 (“A1”), is copied to cell C2, cell C2 afterward has the contents and result shown below:
C1: A1+10=15
C2: A2+10=18 (6)
It can be seen that, when the contents of cell C1 were copied to cell C2, the relative reference to cell A1 in cell C1 was changed to a reference to cell A2 in cell C2.
Cell formulas are commonly copied in this manner to perform the same set of operations on each of a number of different sets of input data. For example, in a spreadsheet where each of 500 rows corresponds to a different shipment of items, the first column may contain the total weight of the shipment, and the second column may contain the number of items in the shipment. To calculate the average weight per item for each shipment, the user may, at the cell in the second row and third column, specify an formula that is the truncated quotient of the first and second columns:
C2: TRUNC (A2/B2, 0) (7)
The user can then copy that formula, containing relative references to cells A2 and B2, to each of the other cells in the third column:
C3: TRUNC (A3/B3, 0)
C501: TRUNC (A501/B501, 0) (8)
FIG. 1A is a spreadsheet diagram showing a portion 100 of such a spreadsheet with the formulas contained by each cell displayed. FIG. 1B is a spreadsheet diagram showing a portion 150 of the same spreadsheet with the results of the formulas contained by each cell displayed.
If the user later decides that the quotients contained in the third column should be rounded rather than truncated, the user must first modify the contents of row C2 as follows:
C2: ROUND (A2/B2, 0) (9)
The user must then re-copy these new contents of cell C2 to the other cells in the third column as follows:
C3: ROUND (A3/B3, 0)
C501: ROUND (A501/B501, 0) (10)
FIG. 2A is a spreadsheet diagram showing a portion 200 of such a spreadsheet with the formulas contained by each cell displayed. FIG. 2B is a spreadsheet diagram showing a portion 250 of such a spreadsheet with the results of the formulas contained by each cell displayed.
The process outlined above can in many situations require a substantial amount of manual effort. Throughout this process, there is a risk that the user will make an error, compromising the accuracy of portions of the spreadsheet.
In view of the foregoing, a spreadsheet application that permitted a user to specify and use a user-defined function using native formula syntax and formula evaluation, and update the behavior of the function everywhere the function is used with a single modification, would have significant utility.
BRIEF DESCRIPTION OF THE DRAWINGS
FIGS. 1A-2B are spreadsheet documents showing a conventional approach to performing the same set of operations on each of a number of different sets of input data in the spreadsheet.
FIG. 3 is a block diagram illustrating an example of a suitable computing system environment in which the facility may be implemented.
FIGS. 4A-5B are spreadsheet diagrams depicting the use of the facility to prepare a spreadsheet similar to the one shown in FIGS. 1A-2B.
FIG. 6 is a flow diagram showing steps typically performed by the facility when the function F is called.
FIG. 7 is a stack diagram showing the state of the evaluation context stack after the facility performs step 603 for the invocation of function F.
FIG. 8 is a flow diagram showing steps typically performed by the facility each time the function R is called.
FIGS. 9A-9B show a second example, in which the arguments X and Y are passed to a first user-defined function, which in turns calls a second user-defined function to which it passes a separate parameter X.
FIGS. 10A-10B are stack diagrams showing the contents of the evaluation context stack at different points in the second example.
FIGS. 11A-11B are spreadsheet diagrams showing a third example.
FIG. 12 is a stack diagram showing the contents of the evaluation context stack during the third example.
DETAILED DESCRIPTION
A software facility that enables users to specify user-defined functions for use in a spreadsheet using native formula syntax and formula evaluation (“the facility”) is described. In some embodiments, the facility adds to a set of built-in functions shipped with conventional spreadsheet application two additional functions: (1) a built-in function called from a spreadsheet cell to invoke a user-defined function and optionally pass arguments to the user-defined function—in some embodiments named F—and (2) a built-in function called by a user-defined function to retrieve arguments passed to the user-defined function—in some embodiments named R.
A user specifies a user-defined function by entering an formula for the function into a spreadsheet cell. This cell is referred to herein as the function definition cell. As part of the formula, the user includes calls to R to retrieve the result of each argument needed to evaluate the function. For example, in the formula that follows, R is invoked to retrieve results for the two arguments of the user-defined function, total_weight and number_items. These arguments are divided, and the result is truncated to zero decimal places:
A1: TRUNC (R(“total_weight”)/R(“number_items”), 0) (11)
A user calls the user-defined function by entering a call to F into a spreadsheet cell. This cell is referred to herein as the function calling cell. A call to F passes in (a) a reference to the function definition cell for the user-defined function being called and (b) zero or more name/result pairs each corresponding to an argument of the user-defined function. For example, in the formula that follows, F is called to evaluate the user-defined function defined in cell A1, with the argument name/result pairs (total_weight, A4) and (number_items, B4):
C4: F(A1, “total_weight”, A4, “number_items”, B4) (12)
Some spreadsheet applications enable users to name individual cells. If, for example, the user named cell A1 “AverageWeight”, the function calling cell could instead contain the following argument, in which the cell name “AverageWeight” is substituted for the column/row cell reference A1:
C4: F(AverageWeight, “total_weight”, A4, “number_items”, B4) (13)
In some embodiments, the facility stores user-defined function argument results by name on a stack, called an evaluation context stack. The arguments are said to be stored on the stack in argument “registers.”
In some embodiments, the facility provides scoping services for the arguments of the user-defined function in connection with the evaluation context stack. These scoping services (1) ensure that any local references in the function definition are translated into references that are global enough to identify the referenced cell from within the calling cell's evaluation context; (2) ensure that the most recently-passed result for a given argument name is used in the function's evaluation; and/or (3) where (a) a first function is defined to call a second function, and (b) no result for a given argument name is passed from the first function to the second function, enable the second function to use a result for that argument name passed to the first function, or an ancestor in the chain of functions that called the first function, known as “dynamic scoping.” In some embodiments, however, the facility instead implements static scoping, in which a register will only be found if it was passed in the evaluation context for the current function call.
By enabling users to specify user-defined functions for use in a spreadsheet in some or all of the manners outlined above, the facility permits users to more efficiently and straightforwardly create, maintain, and exploit reusable formulas.
FIG. 3 is a block diagram illustrating an example of a suitable computing system environment 300 in which the facility may be implemented. The computing system environment 300 is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the facility. Neither should the computing environment 300 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment 300.
The facility is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the facility include, but are not limited to: personal computers, server computers, hand-held or laptop devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The facility may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so forth, which perform particular tasks or implement particular abstract data types. The facility may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in local and/or remote computer storage media including memory storage devices.
With reference to FIG. 3, an exemplary system for implementing the facility includes a general purpose computing device in the form of a computer 310. Components of the computer 310 may include, but are not limited to, a processing unit 320, a system memory 330, and a system bus 321 that couples various system components including the system memory to the processing unit 320. The system bus 321 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus also known as Mezzanine bus.
The computer 310 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 310 and includes both volatile and nonvolatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can accessed by the computer 310. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of the any of the above should also be included within the scope of computer-readable media.
The system memory 330 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 331 and random access memory (RAM) 332. A basic input/output system 333 (BIOS), containing the basic routines that help to transfer information between elements within computer 310, such as during start-up, is typically stored in ROM 331. RAM 332 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 320. By way of example, and not limitation, FIG. 3 illustrates operating system 334, application programs 335, other program modules 336 and program data 337.
The computer 310 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only, FIG. 3 illustrates a hard disk drive 341 that reads from or writes to non-removable, nonvolatile magnetic media, a magnetic disk drive 351 that reads from or writes to a removable, nonvolatile magnetic disk 352, and an optical disk drive 355 that reads from or writes to a removable, nonvolatile optical disk 356 such as a CD ROM or other optical media. Other removable/non-removable, volatile/nonvolatile computer storage media that can be used in the exemplary operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 341 is typically connected to the system bus 321 through a non-removable memory interface such as interface 340, and magnetic disk drive 351 and optical disk drive 355 are typically connected to the system bus 321 by a removable memory interface, such as interface 350.
The drives and their associated computer storage media, discussed above and illustrated in FIG. 3, provide storage of computer-readable instructions, data structures, program modules and other data for the computer 310. In FIG. 3, for example, hard disk drive 341 is illustrated as storing operating system 344, application programs 345, other program modules 346 and program data 347. Note that these components can either be the same as or different from operating system 334, application programs 335, other program modules 336, and program data 337. Operating system 344, application programs 345, other program modules 346, and program data 347 are given different numbers herein to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 310 through input devices such as a tablet, or electronic digitizer, 364, a microphone 363, a keyboard 362 and pointing device 361, commonly referred to as mouse, trackball or touch pad. Other input devices not shown in FIG. 3 may include a joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 320 through a user input interface 360 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 391 or other type of display device is also connected to the system bus 321 via an interface, such as a video interface 390. The monitor 391 may also be integrated with a touch-screen panel or the like. Note that the monitor and/or touch screen panel can be physically coupled to a housing in which the computing device 310 is incorporated, such as in a tablet-type personal computer. In addition, computers such as the computing device 310 may also include other peripheral output devices such as speakers 395 and printer 396, which may be connected through an output peripheral interface 394 or the like.
The computer 310 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 380. The remote computer 380 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 310, although only a memory storage device 381 has been illustrated in FIG. 3. The logical connections depicted in FIG. 3 include a local area network (LAN) 371 and a wide area network (WAN) 373, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet. For example, in the present facility, the computer system 310 may comprise source machine from which data is being migrated, and the remote computer 380 may comprise the destination machine. Note however that source and destination machines need not be connected by a network or any other means, but instead, data may be migrated via any media capable of being written by the source platform and read by the destination platform or platforms.
When used in a LAN networking environment, the computer 310 is connected to the LAN 371 through a network interface or adapter 370. When used in a WAN networking environment, the computer 310 typically includes a modem 372 or other means for establishing communications over the WAN 373, such as the Internet. The modem 372, which may be internal or external, may be connected to the system bus 321 via the user input interface 360 or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 310, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, FIG. 3 illustrates remote application programs 385 as residing on memory device 381. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
While various functionalities and data are shown in FIG. 3 as residing on particular computer systems that are arranged in a particular way, those skilled in the art will appreciate that such functionalities and data may be distributed in various other ways across computer systems in different arrangements. While computer systems configured as described above are typically used to support the operation of the facility, one of ordinary skill in the art will appreciate that the facility may be implemented using devices of various types and configurations, and having various components.
FIGS. 4A-5B are spreadsheet diagrams depicting the use of the facility to prepare a spreadsheet similar to the one shown in FIGS. 1A-2B. FIG. 4A shows a spreadsheet portion 400 in the first state with the formulas, or “formulas,” of each cell displayed. It can be seen that cell A1 contains the function definition shown on line (11) above. It can further be seen that cells C4-C6 and spreadsheet portion 400 include calls to the user-defined function in cell A1 similar to the one shown on line (12) above.
FIG. 4B shows the same spreadsheet portion 450 with the results of the formulas contained by each cell displayed. For example, it can be seen that the formula in cell 4C evaluated to the result 11, truncated from 11.{overscore (6)}, the quotient of 35 over 3. It can also be seen in FIG. 4B that the function definition cell A1 evaluates to the result #N/A since it contains calls to the R function that are executed with empty evaluation contexts that do not contain the registered names passed in the calls to the R function. In some embodiments, where the R function is implemented in a way that returns 0 for register names not found in the evaluation context function definition cell A1 would contain the error message #DIV/0! Rather than the error message #N/A, as it would evaluate to TRUNC (0/0, 0). In some embodiments, the R function is implemented to take a second, optional argument that specifies a default value to be returned by the R function if the registered name specified in the R function call is not found in the evaluation context.
FIG. 5A shows the same spreadsheet portion 500 after the function definition in cell A1 has been changed to the formula below.
ROUND (R(“total_weight”)/R(“number_items”), 0) (14)
Because cells C4-C6 all refer to the function definition in cell A1, none of these needs to be changed in order to take advantage of the revised function definition shown in FIG. 5A.
FIG. 5B shows the same spreadsheet portion 550 with the results of the formulas contained by each cell displayed. It can be seen, for example, that cell C4 now has the result 12, obtained by rounding the quotient 11.{overscore (6)} to 12.
FIG. 6 is a flow diagram showing steps typically performed by the facility when the function F is called. In step 600, the function F is called. Each such call passes, at a minimum, a cellReference parameter that is a reference to a cell containing the user-defined function to be evaluated. Each call to the function F can further include one or more name/result pairs for arguments defined for the function to be evaluated. In step 601, the facility retrieves the formula for the user-defined function from the function definition cell identified by the cellReference parameter. For example, for the invocation of the function F shown in line (12) above, the facility retrieves the formula from cell A1. In step 602, the facility translates any local references contained by the retrieved formula into global references that can be used in the function evaluation context. In step 603, the facility pushes each pair of registeredName and registeredResult parameters onto the evaluation context stack as a new stack entity.
FIG. 7 is a stack diagram showing the state of the evaluation context stack after the facility performs step 603 for the invocation of function F shown above on line (12). It can be seen that the top entry on stack 700 contains the result 3, retrieved from cell B4 shown in FIG. 5B, for a register name “number_items.” The stack also contains a result 35, retrieved from cell A4 shown in FIG. 5B, for a register name “total_weight”.
Returning to FIG. 6, in step 604, the facility evaluates the formula against the evaluation context established with the evaluation context stack. For each reference to the function R encountered while evaluating the formula, the facility typically performs the steps shown in FIG. 8, discussed below. In step 605, the facility pops the stack entry for each passed registerName and registerValue pair off of the evaluation context stack. In the example, after the facility performs step 605, the evaluation context stack is empty. In step 606, the facility returns the result obtained by evaluating the formula.
FIG. 8 is a flow diagram showing steps typically performed by the facility each time the function R is called. In step 800, the function R is called with a single parameter, registerName. In steps 801-804, the facility loops through each stack entry in the evaluation context stack, starting at the top of the stack and progressing downward in the stack. In step 802, if the registerName passed as a parameter to the function R matches the registerName of the current stack entry, then the facility continues in step 803 to return the registerValue in the current stack entry, else the facility continues in step 804. In step 804, if additional stack entries remain to be processed, then the facility continues in step 801 to process the next stack entry, else the facility continues in step 805. In step 805, the facility returns 0.
FIG. 8 shows a linear search through a stack for the matching register. In some embodiments, the facility uses a potentially more efficient search through a tree of registers, such as a balanced, binary red-black tree of registers.
FIG. 8 shows the application of dynamic scoping. In some embodiments, however, the facility uses static scoping, and this search in the R function is only performed for arguments pushed onto the stack for the current function evaluation.
FIGS. 9A-9B show a second example, in which the arguments X and Y are passed to a first user-defined function, which in turns calls a second user-defined function to which it passes a separate parameter X. FIG. 9A shows a spreadsheet portion 900 containing these two functions and a call to the first with the formulas contained by each cell displayed. It can be seen that cell A5 calls a user-defined function whose definition is in cell A2, and that the function definition in cell A2 calls a user-defined function whose definition is in cell A1.
FIG. 10A is a stack diagram showing the contents of the evaluation context stack after the function F is called in cell A5. It can be seen that, as specified in this call to function F, the stack contains entries reflecting the result 3 for a register X and reflecting the result 4 for a register Y.
FIG. 10B is a stack diagram showing the contents of the stack 1050 after the call to function F in cell A2. It can be seen that, in accordance with the call to function F in cell A2, the top entry on the stack now contains the result 25 for the register X.
When the function R is called in cell A2 to return the result of register X, it returns the result for the top-most occurrence of register X, the result 25 in stack entry number 1. Accordingly, it can be seen that the function R returns the result most recently passed as a parameter having a matching name. The indication of the function F in cell A2 returns the result 5, which is in turn returned by the invocation of the function F in cell A5. FIG. 9B, which shows the results of the formulas contained by each cell, shows that the formula contained by cell A5 evaluates to the result 5.
FIGS. 11A-11B are spreadsheet diagrams showing a third example.
FIG. 11A shows the formulas contained in each cell. It can be seen that cell A5 calls the user-defined function defined in cell A2, passing it the result 1 for an X parameter.
FIG. 12 is a stack diagram showing the evaluation context stack 1200 after the call to the user-defined function defined in cell A2 is made in cell A5. It shows that a register named X has the result 1. Returning to FIG. 11A, when the user-defined function defined in cell A2 is called, it uses the R function to retrieve the result 1 from the evaluation context stack for the parameter X, and calls the user-defined function defined in cell A1 without passing the user-defined function defined in cell A1 any parameters. Accordingly, when the user-defined function defined in cell A1 is called, the evaluation context stack is in the same state shown in FIG. 12, i.e., no further arguments have been pushed onto the stack by the call in cell A2. Accordingly, when the user-defined function defined in cell A1 twice calls the R function to retrieve a result for the argument X, the R function returns the register result 1 from stack entry number 1, provided by the function invocation in cell A5. Thus, a later-called user-defined function in a calling chain can take advantage of a parameter result specified for an earlier-called user-defined function in the chain.
FIG. 11B shows the spreadsheet portion of FIG. 11A with the results of the formulas contained by each cell displayed. It can be seen that cell A5 has a result that is based upon the evaluation of both of the user-defined functions defined in cells A1 and A2.
It will be appreciated by those skilled in the art that the above-described facility may be straightforwardly adapted or extended in various ways. For example, the facility may be used with spreadsheet applications and other similar applications that use data structures other than cells, and/or those that can be referenced using various other techniques and reference types than those discussed above. Further, the facility may attribute different names to the built-in functions F and R. Alternatively, the facility may operate without such built-in functions, and perform additional parsing or other techniques to identify user-defined function definitions and invocations in a spreadsheet. While the foregoing description makes reference to preferred embodiments, the scope of the invention is defined solely by the claims that follow and the elements recited therein.