Method and system for defining semantic categories and actions

Information

  • Patent Grant
  • Patent Number
    7,716,163
  • Date Filed
    Tuesday, July 17, 2001
  • Date Issued
    Tuesday, May 11, 2010
Abstract
An XML (eXtensible Markup Language) schema to define a list of terms to be recognized as semantic categories is disclosed. Using an instance of the XML schema, a user may easily define terms to be recognized as semantic categories without having to compile a complete recognizer plug-in. The XML schema may be written using any editing tool or XML schema editing tool to create an instance of the schema. An instance of the semantic category list schema is also referred to as a semantic category list file. Typically, the user completes all of the properties of the XML schema and specifies the list of terms to recognize directly in the XML schema. In addition to specifying terms directly in the XML schema, the user may use a binary representation of the list of terms because of size constraints. The user may also define actions in the semantic category list file.
Description
TECHNICAL FIELD

This invention relates to a method and system for defining semantic categories to recognize in electronic documents and defining actions for those semantic categories.


BACKGROUND OF THE INVENTION

Electronic documents typically include semantic information that would be helpful if the information were recognized as such. Recognition and use of this semantic information could result in increased interoperability between desktop software applications and other desktop applications and/or web-based applications. Recognition of this semantic information may also provide benefits in electronic commerce. Independent third parties should also be able to easily develop lists of terms for recognition without the need to create compiled dynamic link libraries (DLLs).


Independent software developers and individual users are often in the best position to determine the semantic information that needs to be recognized in electronic documents. For example, a corporation's IT department knows the format of part numbers, employee numbers, and other semantic information that may be important to individuals in their corporation. Thus, there is a need for a system and method that allows users to define the format of semantic information to be recognized and to provide actions based on the defined semantic information. There is a further need to make this method and system as simple as possible so that the population of developers is increased and so that those who are best able to define semantic information are able to do so.


SUMMARY OF THE INVENTION

The present invention is used in association with a method and system for semantically labeling strings and providing actions for those semantically labeled strings. A string is defined as a data structure composed of a sequence of characters usually representing human-readable text. Strings are recognized and annotated, or labeled, with a semantic category, in particular a type label. After the strings are annotated with a type label, application program modules may use the type label and other metadata to provide users with a choice of actions. If the user's computer does not have any actions associated with that type label, the user may be provided with the option to surf to a download Uniform Resource Locator (URL) and download action plug-ins for that type label.


The present invention, in one embodiment, uses an XML (eXtensible Markup Language) schema to define a list of terms to be recognized as semantic categories. The XML schema in an embodiment of the present invention is also referred to herein as a semantic category list schema. Using an instance of the XML schema, a user may easily define a recognizer to recognize semantic categories without having to compile a complete recognizer plug-in. The XML schema may be written using any editing tool or XML schema editing tool to create an instance of the schema. An instance of the semantic category list schema is also referred to herein as a semantic category list file.


Typically, to prepare the semantic category list file, the user completes all of the properties of the XML schema and specifies the list of terms to recognize directly in the XML schema. In addition to specifying terms directly in the XML schema, the user may use a binary representation of the list of terms because of size constraints. The user may also define actions in the semantic category list file.


These and other features, advantages, and aspects of the present invention may be more clearly understood and appreciated from a review of the following detailed description of the disclosed embodiments and by reference to the appended drawings and claims.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a block diagram of a computer that provides the exemplary operating environment for the present invention.



FIG. 2 is a block diagram illustrating an exemplary architecture for use in conjunction with an embodiment of the present invention.



FIG. 3 is a flow chart illustrating a method for semantically labeling strings during creation of an electronic document.



FIG. 4 is an illustration of a display of a semantic category and its associated dropdown menu.



FIG. 5 is a flowchart illustrating a method for creating a semantic category list file in accordance with an embodiment of the present invention.



FIG. 6 is a flowchart illustrating a method for performing an update of a semantic category list file with a semantic category list update file in accordance with an embodiment of the present invention.



FIG. 7 is a flowchart illustrating a method for downloading semantic category terms in accordance with an embodiment of the present invention.



FIG. 8 is a block diagram illustrating an exemplary architecture for use in conjunction with an embodiment of the present invention.





DETAILED DESCRIPTION

The present invention is used in association with a method and system for semantically labeling strings and providing actions for those semantically labeled strings. A string is defined as a data structure composed of a sequence of characters usually representing human-readable text. Strings are recognized and annotated, or labeled, with a semantic category, in particular a type label. After the strings are annotated with a type label, application program modules may use the type label and other metadata to provide users with a choice of actions. If the user's computer does not have any actions associated with that type label, the user may be provided with the option to surf to a download Uniform Resource Locator (URL) and download action plug-ins for that type label.


The present invention, in one embodiment, uses an XML (eXtensible Markup Language) schema to define a list of terms to be recognized as semantic categories. The XML schema in an embodiment of the present invention is also referred to herein as a semantic category list schema. Using an instance of the XML schema, a user may easily define a recognizer to recognize semantic categories without having to compile a complete recognizer plug-in. The XML schema may be written using any editing tool or XML schema editing tool to create an instance of the schema. An instance of the semantic category list schema is also referred to herein as a semantic category list file.
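As an illustration only, the following Python sketch shows how a small semantic category list file might be read and applied to text. The element and attribute names (semanticCategoryList, typeLabel, downloadUrl, term, action), the example.com URLs, and the part numbers are all assumptions made for this sketch; they are not the semantic category list schema itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance of a semantic category list file. Every element and
# attribute name below is an illustrative assumption, not the actual schema.
LIST_FILE = """
<semanticCategoryList>
  <name>Part numbers</name>
  <typeLabel>urn:example:parts#partnumber</typeLabel>
  <downloadUrl>http://example.com/actions</downloadUrl>
  <terms>
    <term>PN-1001</term>
    <term>PN-1002</term>
  </terms>
  <actions>
    <action caption="Look up part" url="http://example.com/parts?q={TEXT}"/>
  </actions>
</semanticCategoryList>
"""


def load_list_file(xml_text):
    """Parse the list file into a plain dictionary of its properties."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("name"),
        "type_label": root.findtext("typeLabel"),
        "download_url": root.findtext("downloadUrl"),
        "terms": [t.text for t in root.iter("term")],
        "actions": [(a.get("caption"), a.get("url")) for a in root.iter("action")],
    }


def recognize(text, terms):
    """Return (start, end, term) for every occurrence of a listed term."""
    hits = []
    for term in terms:
        start = text.find(term)
        while start != -1:
            hits.append((start, start + len(term), term))
            start = text.find(term, start + 1)
    return sorted(hits)
```

In this sketch, the user edits only the XML text; no recognizer code is compiled in order to add or remove terms.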


Typically, to prepare the semantic category list file, the user completes all of the properties of the XML schema and specifies the list of terms to recognize directly in the XML schema. In addition to specifying terms directly in the XML schema, the user may use a binary representation of the list of terms because of size constraints. A binary representation of the list has at least two advantages: the size of the XML schema file is smaller because a Trie structure is used to compress the list of terms and searching the binary file is faster than searching a non-binary file. The user may also define actions in the semantic category list file.
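The following Python sketch illustrates why a Trie can both compress a list of terms and speed up searching: terms that share a prefix share nodes, and a lookup walks the text one character at a time. It is a minimal in-memory sketch under assumed names (Trie, longest_match, count_nodes); it does not reproduce the binary file format, which is not specified here.

```python
class Trie:
    """Minimal character trie; the empty-string key marks the end of a term."""

    def __init__(self):
        self.root = {}

    def insert(self, term):
        node = self.root
        for ch in term:
            node = node.setdefault(ch, {})
        node[""] = True  # end-of-term marker

    def longest_match(self, text, start=0):
        """Return the longest listed term beginning at text[start], or None."""
        node = self.root
        best = None
        i = start
        while True:
            if "" in node:
                best = text[start:i]
            if i >= len(text) or text[i] not in node:
                return best
            node = node[text[i]]
            i += 1


def count_nodes(node):
    """Count trie nodes (including the root), ignoring end-of-term markers."""
    return 1 + sum(count_nodes(child) for key, child in node.items() if key != "")


trie = Trie()
for term in ("Seattle", "Seaside", "Sears"):
    trie.insert(term)
```

The three example terms contain 19 characters in total but, because they share the prefix "Sea", the trie stores them in 13 character nodes plus the root.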


Having briefly described an embodiment of the present invention, an exemplary operating environment for the present invention is described below.


Exemplary Operating Environment


Referring now to the drawings, in which like numerals represent like elements throughout the several figures, aspects of the present invention and the exemplary operating environment will be described.



FIG. 1 and the following discussion are intended to provide a brief, general description of a suitable computing environment in which the invention may be implemented. While the invention will be described in the general context of an application program that runs on an operating system in conjunction with a personal computer, those skilled in the art will recognize that the invention also may be implemented in combination with other program modules. Generally, program modules include routines, programs, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Moreover, those skilled in the art will appreciate that the invention may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, cell phones, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.


With reference to FIG. 1, an exemplary system for implementing the invention includes a conventional personal computer 20, including a processing unit 21, a system memory 22, and a system bus 23 that couples the system memory to the processing unit 21. The system memory 22 includes read only memory (ROM) 24 and random access memory (RAM) 25. A basic input/output system 26 (BIOS), containing the basic routines that help to transfer information between elements within the personal computer 20, such as during start-up, is stored in ROM 24. The personal computer 20 further includes a hard disk drive 27, a magnetic disk drive 28, e.g., to read from or write to a removable disk 29, and an optical disk drive 30, e.g., for reading a CD-ROM disk 31 or to read from or write to other optical media. The hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to the system bus 23 by a hard disk drive interface 32, a magnetic disk drive interface 33, and an optical drive interface 34, respectively. The drives and their associated computer-readable media provide nonvolatile storage for the personal computer 20. Although the description of computer-readable media above refers to a hard disk, a removable magnetic disk and a CD-ROM disk, it should be appreciated by those skilled in the art that other types of media which are readable by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, and the like, may also be used in the exemplary operating environment.


A number of program modules may be stored in the drives and RAM 25, including an operating system 35, one or more application programs 36, a word processor program module 37 (or other type of program module), program data 38, and other program modules (not shown).


A user may enter commands and information into the personal computer 20 through a keyboard 40 and pointing device, such as a mouse 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 21 through a serial port interface 46 that is coupled to the system bus, but may be connected by other interfaces, such as a game port or a universal serial bus (USB). A monitor 47 or other type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48. In addition to the monitor, personal computers typically include other peripheral output devices (not shown), such as speakers or printers.


The personal computer 20 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 49. The remote computer 49 may be a server, a router, a peer device or other common network node, and typically includes many or all of the elements described relative to the personal computer 20, although only a memory storage device 50 has been illustrated in FIG. 1. The logical connections depicted in FIG. 1 include a local area network (LAN) 51 and a wide area network (WAN) 52. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet.


When used in a LAN networking environment, the personal computer 20 is connected to the LAN 51 through a network interface 53. When used in a WAN networking environment, the personal computer 20 typically includes a modem 54 or other means for establishing communications over the WAN 52, such as the Internet. The modem 54, which may be internal or external, is connected to the system bus 23 via the serial port interface 46. In a networked environment, program modules depicted relative to the personal computer 20, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.



FIG. 2 is a block diagram illustrating an exemplary architecture 200 for use in conjunction with an embodiment of the present invention. The architecture includes an application program module 205, such as word processor program module 37 (FIG. 1). The application program module 205 is able to communicate with a recognizer dynamic-link library 210 (hereinafter recognizer DLL) and an action dynamic-link library 215 (hereinafter action DLL) as a user is creating, editing, viewing, etc. an electronic document. The recognizer DLL 210 controls a number of recognizer plug-ins 220. The action DLL 215 controls a number of action plug-ins 225. The action DLL also controls a type-action database 230.


In a preferred embodiment, the action plug-ins and recognizer plug-ins are Automation Servers. Automation Servers are well-known software components which are assembled into programs or add functionality to existing programs running on the Microsoft WINDOWS® operating system. Automation Servers may be written in a variety of computing languages and may be un-plugged from a program at run time without having to recompile the program. It should also be understood that, in a preferred embodiment, the action DLL and recognizer DLL are merged into a single DLL.


The recognizer DLL 210 handles the distribution of strings from the electronic document running on the application program module 205 to the individual recognizer plug-ins 220. The recognizer plug-ins 220 recognize particular strings in an electronic document, such as a word processing document, a spreadsheet document, a web page, etc.


The recognizer plug-ins 220 may be packaged with the application program module 205 or they may be written by third parties to recognize particular strings that are of interest. Typically, the recognizer DLL 210 passes strings to the recognizer plug-ins 220 in one paragraph or cell value increments.


As part of recognizing certain strings as including semantic information, the recognizer plug-ins 220 determine which strings are to be labeled and how they are to be labeled. After receiving these results from the various recognizer plug-ins 220, the recognizer DLL 210 sends semantic categories to the application program module. In a preferred embodiment, a semantic category comprises the recognized string, a type label, and a download URL. A semantic category may also comprise metadata. The recognizer plug-ins 220 each run separately and the recognizer DLL 210 is responsible for handling the asynchronicity that results from different recognizer plug-ins returning results with different delays.


After a string is labeled by a recognizer plug-in 220 and a semantic category is sent to the application program module 205, the user of the application program module 205 will be able to execute actions that are associated with the type label of the semantic category. The action DLL 215 manages the action plug-ins 225 that are run to execute the actions. As with the recognizer plug-ins 220, the action plug-ins 225 may be packaged with the application program module 205 or written by third parties to perform particular actions that are of interest to the third party. The action plug-ins provide possible actions to be presented to the user based upon the type label associated with the string. The action DLL 215 determines what type label the semantic category includes and cross-references the type label in the type-action database 230 with a list of actions to determine what actions to present to the user. It should be understood that, in a preferred embodiment, the type-action database is not used. Instead, the list of actions is dynamically generated for each type by looking in the registry to determine which actions are installed and then querying the action DLLs to determine which types they apply to.
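The dynamic generation of the action list may be sketched as follows in Python. The ActionPlugin class and its actions_for method are a hypothetical stand-in for querying installed action plug-ins, and the INSTALLED list stands in for the registry lookup; neither is the actual plug-in interface.

```python
class ActionPlugin:
    """Stand-in for an installed action plug-in (hypothetical interface)."""

    def __init__(self, name, actions_by_type):
        self.name = name
        self.actions_by_type = actions_by_type  # type label -> action captions

    def actions_for(self, type_label):
        return self.actions_by_type.get(type_label, [])


def build_action_list(type_label, installed_plugins):
    """Ask each installed plug-in which actions it offers for a type label."""
    actions = []
    for plugin in installed_plugins:
        for caption in plugin.actions_for(type_label):
            actions.append((plugin.name, caption))
    return actions


# Stand-in for the set of plug-ins found installed on the user's computer.
INSTALLED = [
    ActionPlugin("contacts", {"Person name": ["Add to contacts", "Show contact info"]}),
    ActionPlugin("mail", {"Person name": ["Send mail to"], "E-mail": ["Send mail to"]}),
]
```

With this approach, installing or removing a plug-in changes the list of actions without any separate database needing to be updated.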


After the user chooses an action, the action DLL 215 manages the appropriate action plug-ins 225 and passes the necessary information between the action plug-ins and the application program module 205 so that the action plug-in may execute the desired action. Typically, the application program module sends the action DLL an automation request to invoke the action the user has selected.


As described above, the combination of the recognized string, type label, metadata and download URL is referred to herein as a semantic category. The type label is a semantic information label. The semantic category may also comprise metadata, which are hidden properties of the semantic category. An example of a semantic category may clarify the definition. Suppose a user enters the text “Gone With the Wind” into an electronic document. The string “Gone With the Wind” may be identified as a semantic category of type label “Book Title” and of type label “Movie Title”. In addition, metadata such as the ISBN number may be returned by the recognizer plug-in to the application program module as part of the semantic category. A download URL may be provided with the type labels “Book Title” and “Movie Title” in case the user's machine has not stored action plug-ins for these type labels. For example, an action for the type label “Book Title” may be “Buy this book” from an online retailer. If the user does not have the action plug-in DLL 225 corresponding to “Buy this book”, then the download URL may be used to navigate the user's web browser to an appropriate website to download this action plug-in. In other implementations of the invention, multiple download URLs may be provided for a single type label.
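The parts of a semantic category can be pictured as a simple record. The Python sketch below uses assumed field names (text, type_label, download_url, metadata), a placeholder download URL, and a placeholder ISBN value; it is one possible representation, not the one used by the recognizer DLL.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticCategory:
    text: str                 # the recognized string
    type_label: str           # the semantic information label
    download_url: str = ""    # where action plug-ins may be obtained
    metadata: dict = field(default_factory=dict)  # hidden properties


def label(text):
    """Toy recognizer: returns every semantic category for one known title."""
    if text != "Gone With the Wind":
        return []
    url = "http://example.com/actions"  # placeholder download URL
    return [
        SemanticCategory(text, "Book Title", url, {"isbn": "PLACEHOLDER"}),
        SemanticCategory(text, "Movie Title", url),
    ]


def needs_download(category, installed_type_labels):
    """True when no action plug-in is installed but a download URL exists."""
    return (category.type_label not in installed_type_labels
            and bool(category.download_url))
```

Here a single string yields two semantic categories, one per type label, and the download URL is consulted only for a type label with no installed actions.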


It should also be understood that the present invention, in a preferred embodiment, also recognizes sequences of capitalized words that contain function words, and which are likely to be special, but for which there is no type label information. These strings are typically labeled by a grammar checker program module.


The actions provided for a semantic category may utilize both the type label and the text of the recognized string. For example, a word processor program module may use a grammar checker as a recognizer plug-in to label strings that are person names. After a string has been labeled as a person's name, the word processor program module may, through a standard user interface mechanism, allow users to execute pertinent actions, such as looking up the person's name in the contacts folder in a personal information manager program module, sending electronic mail, or searching for the person's name in an HR database.


Having described an exemplary architecture, an exemplary method 300 for semantically labeling strings during document creation will be described below in reference to FIGS. 2 and 3.


Method for Semantically Labeling Strings During Document Creation



FIG. 3 is a flow chart illustrating a method 300 for semantically labeling strings during creation of an electronic document. Those skilled in the art will appreciate that this is a computer-implemented process that is carried out by the computer in response to input from the user and instructions provided by a program module.


Referring to FIG. 3, the method 300 begins at start step 305 and proceeds to step 310 when a user opens an electronic document in application program module 205. In a preferred embodiment, the electronic document is a word processing document or a spreadsheet document. However, the method is not limited to either of these specific types of electronic documents.


At step 310, the application program module 205 receives a new string, such as when the user enters a new paragraph into the electronic document or edits a previously entered paragraph. The method 300 then proceeds to step 315.


At step 315, the paragraph containing the new string is passed from the application program module 205 to the recognizer DLL 210. The recognizer DLL is responsible for communicating with the application program module, managing the jobs that need to be performed by the recognizer plug-ins, receiving results from the recognizer plug-ins and sending semantic category information to the application program module. At boot time, the recognizer DLL communicates with its recognizer plug-ins to determine what languages it supports, what types it can apply, etc. It should be understood that, in a preferred embodiment, a paragraph is passed to the recognizer DLL at step 315. However, in alternative embodiments, a sentence, the contents of a spreadsheet cell, a section of the document, the entire document, etc. may be passed to the recognizer DLL. In other words, the present invention is not limited to simply passing a paragraph to the recognizer DLL. The method 300 then proceeds to step 320.


Still referring to step 315, the application program module 205 typically sends one paragraph at a time to the recognizer DLL. In addition, in a preferred embodiment, a grammar checker program module sends all semantic categories (without type labels) to the recognizer DLL that have been identified by the grammar checker program module. Passing these semantic categories (without type labels) to the recognizer DLL is important because doing so saves each recognizer plug-in from needing to decide whether something is a capitalized string interspersed with function words (a task that would require writing a number of regular expressions: Cap Cap Unc Cap; Cap Unc Cap; etc.). If a label is applied by a recognizer plug-in to a string the grammar checker program module labeled, the grammar checker label will then be removed.
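To indicate the kind of pattern each recognizer plug-in would otherwise need, the Python sketch below folds the Cap Cap Unc Cap style patterns into one regular expression. The function-word list and the definition of a capitalized word are assumptions chosen for brevity, and the expression also accepts runs of capitalized words with no function word in between.

```python
import re

# Assumed, deliberately short list of function words ("Unc" tokens).
FUNCTION_WORDS = ("of", "the", "and", "with", "in", "for")

CAP = r"[A-Z][a-z]+"                       # a capitalized word
UNC = "(?:" + "|".join(FUNCTION_WORDS) + ")"  # an uncapitalized function word

# Cap, then one or more groups of (optional function words, then Cap).
CAP_SEQUENCE = re.compile(rf"\b{CAP}(?:\s+(?:{UNC}\s+)*{CAP})+\b")


def capitalized_sequences(paragraph):
    """Return candidate strings that may deserve a semantic category."""
    return CAP_SEQUENCE.findall(paragraph)
```

For instance, capitalized_sequences("I read Gone With the Wind yesterday") returns the single candidate "Gone With the Wind".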


At step 320, during idle time, the paragraph (and information from the grammar checker program module) is passed to the recognizer plug-ins. The method then proceeds to step 325.


It should be understood that, in a preferred embodiment, the recognizer DLL 210 maintains a job queue. If the user edits the paragraph before the recognizer DLL 210 sends it to the recognizer plug-ins 220, then the queued job containing the earlier version of that paragraph is deleted and is not sent to the recognizer plug-ins. A new job then enters the queue at step 315 after the edited paragraph is received at step 310. This job deletion prevents the recognizer plug-ins from performing unnecessary work on a paragraph that has since been edited.
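The job queue behavior may be sketched as follows in Python. The class name, the use of a paragraph identifier as the job key, and the choice to place the replacement job at the back of the queue are assumptions of this sketch.

```python
from collections import OrderedDict


class RecognizerJobQueue:
    """Holds at most one pending job per paragraph, in arrival order."""

    def __init__(self):
        self._jobs = OrderedDict()  # paragraph id -> paragraph text

    def submit(self, paragraph_id, text):
        # Delete any pending job for this paragraph: it holds stale text.
        self._jobs.pop(paragraph_id, None)
        self._jobs[paragraph_id] = text

    def next_job(self):
        """Return the oldest pending (paragraph_id, text), or None if idle."""
        if not self._jobs:
            return None
        return self._jobs.popitem(last=False)
```

If paragraph 1 is submitted, edited, and submitted again before being dispatched, only the edited text is ever handed to the recognizer plug-ins.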


At step 325, the recognizer plug-ins are executed on the paragraph to recognize keywords or perform other actions defined by the recognizer plug-in. As part of executing the recognizer plug-in, the paragraph may be broken into sentences by the recognizer plug-in. However, each recognizer plug-in is responsible for its own sentence-breaking. After the keywords are found at step 325, then the method proceeds to step 330.


At step 330, the results from each of the recognizer plug-ins are received by the recognizer DLL. The method then proceeds to decision step 335.


At decision step 335, it is determined whether the paragraph that has been reviewed by the recognizer plug-ins has been edited after the paragraph was sent to the recognizer DLL. If so, then the method 300 returns to step 315 and the edited paragraph is received by the recognizer DLL from the application program module. If not, then the method proceeds to step 340.


At step 340, the results from the recognizer plug-ins are compiled into semantic categories by the recognizer DLL and the semantic categories are sent to the application program module. At step 345, the application program module displays the semantic categories to the user in the electronic document. The method 300 then ends at step 399.
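Steps 315 through 340 can be summarized in a short Python sketch. The get_paragraph callable, the version number used to detect edits, and the (start, end, type label) result format are assumptions made for illustration; the plug-ins are called in sequence here rather than asynchronously.

```python
def label_paragraph(get_paragraph, recognizers):
    """Run every recognizer on a paragraph, retrying if it was edited.

    get_paragraph() returns (version, text); each recognizer returns a list
    of (start, end, type_label) tuples for the text it is given.
    """
    while True:
        version, text = get_paragraph()               # step 315
        results = []
        for recognize in recognizers:                 # steps 320 through 330
            results.extend(recognize(text))
        if get_paragraph()[0] == version:             # decision step 335
            # Step 340: compile results into (string, type label) pairs.
            return [(text[s:e], type_label) for s, e, type_label in results]


def find_bob_smith(text):
    """Toy recognizer plug-in that labels one fixed person name."""
    i = text.find("Bob Smith")
    return [(i, i + len("Bob Smith"), "Person name")] if i != -1 else []
```

Results computed on a paragraph that changed in the meantime are discarded, which corresponds to returning to step 315 from decision step 335.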


As should be understood from the above description, the architecture for recognizing semantic categories permits third parties to develop recognizer plug-ins to identify strings of one or more particular types. The recognizer plug-ins communicate with the application program module and receive a string from the application program module. The recognizer plug-ins may apply recognition algorithms to the string and communicate the identity of recognized strings back to the application program module.


After a string is labeled with a particular type label, the user will be able to execute action plug-ins pertaining to that type label. The action plug-ins preferably are COM objects that are executed via communication between the application program module and the action DLL. Parameters necessary to execute the action (the HTML of the string labeled as being of a particular type, the HTML of the string representing the current selection) will be passed from the application program module to the action DLL and, in turn, passed to the action plug-in.


Actions Assigned to Type Labels


An architecture for identifying and executing a set of actions associated with a semantic category may also be provided. This architecture comprises actions that apply to a particular type label (e.g. an action for book titles may be “Buy this book from shop.Microsoft.com”) and executing those actions when the user so desires. An action is a user-initiated function applied to a typed string. For example, adding a name to the contacts folder is one action possible for a type label “Person name”.


There is power and flexibility that results from allowing third party vendors, such as IT professionals, to design and write recognizer plug-ins and action plug-ins for deployment within an organization or for deployment on the World Wide Web. Some example actions that may be executed include:

    • Schedule a meeting
    • Create task
    • Display calendar
    • Add to contacts folder
    • Look up in contacts folder, address book, Windows Address Book (WAB), Global Address List (GAL), etc.
    • Insert address into document
    • Send mail to
    • Display EXPEDIA map
    • Stock quote lookup
    • Send instant message to


Different actions may be assigned to different type labels and these type label-action assignments may be stored in the type-action database 230. Table 1 below illustrates some possible type label-action pairings.












TABLE 1

Type Labels     Actions
------------    ----------------------------
Person name     Show contact info
                Add to contacts
                E-mail
                Insert address into document
                Send instant message to
Date            Show calendar for that day
                New task with that due date
                Schedule meeting that day
Place           Display EXPEDIA map
                Add to contacts
Address         Add to contacts
Phone number    Add to contacts
E-mail          Add to contacts
Date            Schedule a meeting
Task            Schedule a task
Meeting         Schedule a meeting









For each type label, the type-action database 230 may store a download URL specified by the creator of the type label. Users who do not have action plug-ins or recognizer plug-ins for that semantic category type can go to the download URL to get action plug-ins and/or recognizer plug-ins. For example, the download URL for the type label “Book Title” might be microsoft.com/semanticcategories.asp. Once at that web page, a user may be offered downloads of various action plug-ins and recognizer plug-ins. There may also be an option on the user interface to navigate to the download URL so that recipients of documents with semantic categories can easily get the action plug-ins for those semantic categories.
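A type-action database of this kind may be pictured as two lookup tables, one for actions and one for download URLs. The Python sketch below fills them with the pairings of Table 1 and the “Book Title” example; the dictionary layout and the lookup function are assumptions of this sketch, and the two “Date” rows of Table 1 are merged into one entry.

```python
# Type label -> actions, taken from the pairings of Table 1.
TYPE_ACTIONS = {
    "Person name": ["Show contact info", "Add to contacts", "E-mail",
                    "Insert address into document", "Send instant message to"],
    "Date": ["Show calendar for that day", "New task with that due date",
             "Schedule meeting that day", "Schedule a meeting"],
    "Place": ["Display EXPEDIA map", "Add to contacts"],
    "Address": ["Add to contacts"],
    "Phone number": ["Add to contacts"],
    "E-mail": ["Add to contacts"],
    "Task": ["Schedule a task"],
    "Meeting": ["Schedule a meeting"],
}

# Type label -> download URL specified by the creator of the type label.
DOWNLOAD_URLS = {"Book Title": "microsoft.com/semanticcategories.asp"}


def lookup(type_label):
    """Return (actions, download_url) for a type label."""
    return TYPE_ACTIONS.get(type_label, []), DOWNLOAD_URLS.get(type_label)
```

A type label with no installed actions but a known download URL, such as “Book Title” in this sketch, is the case in which the user would be directed to the download page.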


Storing Semantic Categories


Semantic categories may be stored as part of the electronic document along with other document information and may be available when a document is transmitted from one computer to another computer. In a preferred embodiment, storing semantic categories in an electronic document is controlled by an “Embed semantic categories” checkbox. The checkbox is on by default. Turning it off will prevent semantic categories in the document from being saved. The state of the checkbox is per document. The same checkbox controls saving for both .htm and .doc documents.


Checking a “Save semantic categories as XML properties” checkbox (off by default) will write out the text of all of the semantic categories in the document and their labels in the header of the HTML file in XML (that is, using the same tags as are used inline, but surrounded by <xml> and </xml>) for easy identification and parsing by search engines and knowledge management systems.


Semantic categories may be saved as a unique namespace plus a tag name. A namespace is an XML construct for uniquely identifying a group of XML tags that belong to a logical category. Thus, every semantic category is uniquely identified by its tag name (e.g., “streetname”) in addition to its namespace (e.g., “schemas-microsoft-com:outlook:contact”).
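The namespace plus tag name pairing can be demonstrated with Python's standard xml.etree.ElementTree module, which writes a qualified name as {namespace}tagname. The helper names tag_string and identify, and the sample street name, are assumptions of this sketch; the exact markup written into a saved document is not shown here.

```python
import xml.etree.ElementTree as ET

NAMESPACE = "schemas-microsoft-com:outlook:contact"


def tag_string(text, namespace, tag_name):
    """Wrap a recognized string in an element named by namespace + tag name."""
    element = ET.Element(f"{{{namespace}}}{tag_name}")
    element.text = text
    return element


def identify(element):
    """Split an element's qualified name back into (namespace, tag name)."""
    namespace, _, tag_name = element.tag[1:].partition("}")
    return namespace, tag_name


street = tag_string("Main Street", NAMESPACE, "streetname")
# Serializing and re-parsing preserves both parts of the identifier.
round_tripped = ET.fromstring(ET.tostring(street))
```

Two semantic categories that happen to share a tag name remain distinct as long as their namespaces differ.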


Although the method 300 described above is one method for identifying semantic categories, there may be other mechanisms for identifying semantic categories. One mechanism is a grammar checker program module (not shown) connected to word processor program module 37. Another mechanism is receiving a semantic category from another electronic document. For example, when text containing a semantic category is copied from one electronic document and pasted into another electronic document of the word processor program module 37, the information identifying the semantic category is preserved and copied along with the copied text.


Displaying Semantic Categories to the User


Referring now to FIG. 4, an illustration of a display of a semantic category 400 and its associated dropdown menu 405 will be described. It should be understood that FIG. 4 is an illustration of a semantic category 400 and dropdown menu 405 as displayed to a user by the application program module 205.


The string 410 associated with semantic category 400 is the string “Bob Smith”. As shown in FIG. 4, the string 410 of a semantic category 400 may be identified to the user by brackets 415. Of course, many other devices such as coloring, underlining, icons, etc. may be used to indicate to the user that a particular string is a semantic category.


In a preferred embodiment, when the user hovers a cursor over the string 410 or places the insertion point within string 410, then dropdown menu 405 is displayed to the user. The dropdown menu may display a list of actions associated with a semantic category. The dropdown menu may appear above and to the left of the semantic category string.


Typically, the first line of the dropdown menu indicates which string is the semantic category string (Bob Smith in FIG. 4) and what type the semantic category is (Person name in FIG. 4). Listed below the first line are actions 420 available for the semantic category type, such as “Send mail to . . . ”, “Insert Address”, and “Display contact information . . . ”.


The first item on the drop down menu below the separator line is “Check for new actions . . . ” 425. “Check for new actions . . . ” 425 will appear only for semantic categories whose download URL is available to the application program module. If selected, “Check for new actions . . . ” 425 uses the semantic category download URL to navigate the user's web browser to the homepage for the semantic category type applied to the string. For example, suppose new actions have been defined for the semantic category type “person name”. If so, then new actions will be downloaded to the user's computer after selecting “Check for new actions . . . ” 425. “Check for new actions . . . ” 425 will be grayed out if a download URL is unavailable for the semantic category.


If selected, the “Remove this semantic category” item 430 deletes the semantic category label from the string. If selected, the “Semantic categories” item 435 navigates the user to the semantic categories tab of the autocorrect dialog.


It should be understood that the application program module sends a request to the action DLL to determine which actions are shown with each semantic category type.


Actions Performed in Association with Semantic Categories


There are a number of functions that users perform on typed data that preferred word processor program module 37 and semantic categories will make easier. The functions fall into three primary categories:

    • 1) interacting with personal information manager contacts, tasks, meetings, and mail;
    • 2) interacting with properties on the World Wide Web or a corporate intranet; and
    • 3) interacting with other applications on the client machine.


A single string may be associated with multiple semantic categories. Every semantic category has a type label with one or more action plug-ins defined for the type label. For example, the “Address” type label may have the “Open in MapPoint”, “Find with Expedia Maps” and “Add to my Address Book” actions associated with it and each of these actions may have a different action plug-in to execute the action.


The actions assigned to type labels also depend on the computer that the application program module is running on. Thus, if a computer has three actions registered for the type label “Address”, then all strings with an “Address” type label will be assigned to three actions. However, if one of these semantic categories is sent to a computer which has only two actions registered for the “Address” type label, then the user will only be exposed to two actions for this semantic category.
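By way of illustration only, the machine-dependent assignment of actions described above may be sketched as a lookup from type label to registered actions. The registry dictionaries and the function name `actions_for` are hypothetical and are not part of the disclosed embodiment.

```python
# Hypothetical per-machine registries: type label -> registered action captions.
MACHINE_A = {
    "Address": ["Open in MapPoint", "Find with Expedia Maps", "Add to my Address Book"],
}
MACHINE_B = {
    "Address": ["Open in MapPoint", "Add to my Address Book"],
}


def actions_for(registry, type_labels):
    """Return the actions a given machine exposes for each type label of a string."""
    return {label: list(registry.get(label, [])) for label in type_labels}


# The same "Address" semantic category exposes a different number of actions
# depending on which machine opens the document.
on_a = actions_for(MACHINE_A, ["Address"])
on_b = actions_for(MACHINE_B, ["Address"])
```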


Nesting of Semantic Categories


In an embodiment of the present invention, semantic categories may be nested inside each other. For example, the string “George Washington” may include a semantic category with type label “Person Name” for the span “George Washington” and a semantic category with type label “State” for the span “Washington”. Moreover, two semantic categories may cover exactly the same span. For example, the string “George Washington” may include a semantic category with type label “Person Name” and a semantic category with type label “President”.


Because the preferred application program module 37 will support labeling a single string with multiple type labels (e.g. Bob Smith could be a semantic category labeled as a “Person Name” and labeled as a “Microsoft employee”), the preferred application program module 37 will use cascade menus on the dropdown menu if multiple semantic category types are assigned.


For example, the cascade menu may include a list of the type labels included in the recognized string. This list may include a type label “Person Name” and a type label “Microsoft employee”.


It should be understood that a cascade menu may be used to allow the user to select which type label the user is interested in and to further select an action after selecting the type label.


In-document User Interface to Indicate Semantic Categories


As described above with reference to FIG. 4, the application program module may include the option to display an in-document user interface to indicate the location of semantic categories. This in-document user interface may use a colored indication to indicate the location of a semantic category, such as the brackets 415 in FIG. 4. The in-document user interface will also be able to show nesting of semantic categories. For example, if Michael Jordan is labeled as a semantic category with type label “Person Name”, Michael is a semantic category with type label “First Name” and Jordan is a semantic category with type label “Last Name”, the document may look like this with the brackets indicating semantic categories:

    • [[Michael][Jordan]]
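The bracketed display of nested semantic categories may be sketched as follows. This is a minimal sketch that assumes each semantic category is given as a (start, end) character span with an exclusive end and that the spans are properly nested; the function name `render_brackets` is hypothetical.

```python
def render_brackets(text, spans):
    """Surround each (start, end) span of text with brackets; end is exclusive."""
    opens = {}
    closes = {}
    for start, end in spans:
        opens[start] = opens.get(start, 0) + 1
        closes[end] = closes.get(end, 0) + 1

    out = []
    for i, ch in enumerate(text):
        out.append("]" * closes.get(i, 0))  # close spans that end before this character
        out.append("[" * opens.get(i, 0))   # open spans that start at this character
        out.append(ch)
    out.append("]" * closes.get(len(text), 0))  # spans that end at the end of the text
    return "".join(out)


# "Person Name" covers the whole string; "First Name" and "Last Name" are nested.
labeled = render_brackets("Michael Jordan", [(0, 14), (0, 7), (8, 14)])
```

In this sketch `labeled` is "[[Michael] [Jordan]]"; the space between the two names is kept because it belongs to the outer span only.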


Of course, the in-document user interface may be any sort of indication. For example, in the “EXCEL” spreadsheet application program, the interface comprises a triangle in the lower right hand portion of a cell to indicate that one or more semantic categories are present in the cell.


Although the present invention has been described as implemented in a word processing program module, it should be understood that the present invention may be implemented in other program modules, including, but not limited to, HTML authoring programs and programs such as the “POWERPOINT”® presentation graphics program and the “OFFICE” program module, both marketed by Microsoft Corporation of Redmond, Wash.


As described above, the semantic category may also include metadata returned by the recognizer plug-ins. For example, a recognizer plug-in that recognizes the titles of books may return as metadata an ISBN book number when it recognizes the title of a book. The ISBN book number metadata may then be used to provide actions. Metadata may also be used to disambiguate for actions and searches. For example, suppose a recognizer DLL is linked to a corporate employee database to recognize names. When the recognizer DLL recognizes “Bob Smith”, it may store “employeeID=12345” as metadata in the background. Then, when an action is fired, the text in question will be known to reference Bob Smith, employee no. 12345 rather than Bob Smith, employee no. 45678. Also, the metadata may allow searches to be performed independent of the actual text in a document. So, a search may be conducted on “Robert Smith” by looking for employee 12345 in the employee databases and by performing a search on the metadata for employee number 12345 to find documents with “Bob Smith” in them. There are also numerous other functions for metadata. For instance, DHTML could be inserted so special tricks may be performed within a web browser. Additionally, data used by other actions may be inserted such as someone's e-mail address that could be used by the send-mail-to action, a normalized version of the date could be stored to easily interact with a personal information manager, etc.


Defining a List of Terms to be Recognized


The present invention, in one embodiment, uses an XML (eXtensible Markup Language) schema to define a list of terms to be recognized as semantic categories. The XML schema in an embodiment of the present invention is also referred to herein as a semantic category list schema. Using an instance of the XML schema, a user may easily define terms to be recognized as semantic categories without having to compile a complete recognizer plug-in. The XML schema may be written using any editing tool or XML schema editing tool to create an instance of the schema. An instance of the semantic category list schema is also referred to herein as a semantic category list file.


Typically, in preparing the semantic category list file, the user completes all of the properties of the XML schema and specifies the list of terms to recognize directly in the XML schema. In addition to specifying terms directly in the XML schema, the user may use a binary representation of the list of terms because of size constraints. A binary representation of the list has at least two advantages: the size of the XML schema file is smaller because a Trie structure is used to compress the list of terms and searching the binary file is faster than searching a non-binary file. The user may also define actions in the semantic category list file.
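The binary file format itself is not detailed here. Purely to illustrate why a trie suits a list of terms, the following sketch builds a generic in-memory trie in which terms that share a prefix share nodes, and a lookup costs one step per character. The names `build_trie` and `contains`, and the case-insensitive matching, are assumptions of this sketch.

```python
END = ""  # sentinel key marking the end of a term; never equal to a single character


def build_trie(terms):
    """Build a nested-dictionary trie; common prefixes are stored only once."""
    root = {}
    for term in terms:
        node = root
        for ch in term.lower():
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def contains(trie, word):
    """Return True if word is one of the stored terms (case-insensitive)."""
    node = trie
    for ch in word.lower():
        node = node.get(ch)
        if node is None:
            return False
    return END in node


# "thyrotoxicosis" and "thalassemia" share the nodes for "t" and "h".
trie = build_trie(["thyrotoxicosis", "thalassemia", "cold", "cough"])
```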


Semantic Category List File


In one embodiment of the present invention, each semantic category list file adheres to a semantic category list schema. The schema specifies the XML tag names allowed or required in the list file and their syntax. The semantic category list file in Table 2 below is for a fictitious company called “A.Datum Corporation”. The list file comprises “medical condition” terms to be recognized. Note that the “FL” in the list file of Table 2 is used to map the semantic categories to an XML namespace declaration at the top of the semantic category list file. In one embodiment of the invention, “FL” is required and must map to the appropriate namespace for semantic categories, such as “urn:schemas-microsoft-com:smarttags:list”.











TABLE 2

<FL:smarttaglist xmlns:FL="urn:schemas-microsoft-com:smarttags:list">
  <FL:name>Medical Condition Terms</FL:name>
  <FL:lcid>1033</FL:lcid>
  <FL:description>A list of medical conditions for recognition, as well as a set of actions that work with them.</FL:description>
  <FL:moreinfourl>http://www.adatum.com/moreinfo</FL:moreinfourl>
  <FL:updateable>true</FL:updateable>
  <FL:autoupdate>true</FL:autoupdate>
  <FL:lastcheckpoint>100</FL:lastcheckpoint>
  <FL:lastupdate>0</FL:lastupdate>
  <FL:updateurl>http://www.adatum.com/smarttags/listupdate.xml</FL:updateurl>
  <FL:updatefrequency>20160</FL:updatefrequency>
  <FL:smarttag type="urn:schemas-adatum-com:medical#condition">
    <FL:caption>A. Datum Corporation</FL:caption>
    <FL:terms>
      <FL:termlist>allergy, cough, arthritis, headache, migraine, heartburn, high blood pressure, digestive disorder, diarrhea, cold, thyrotoxicosis, thalassemia, bloating, nausea, bronchitis</FL:termlist>
    </FL:terms>
    <FL:actions>
      <FL:action id="CompanyInfo">
        <FL:caption>&amp;A. Datum Corporation Company Reports</FL:caption>
        <FL:url>http://www.adatum.com</FL:url>
      </FL:action>
      <FL:action id="CompanyHomePage">
        <FL:caption>View A. &amp; Datum Website</FL:caption>
        <FL:url>http://www.adatum2.com/home.asp?String={TEXT}</FL:url>
      </FL:action>
    </FL:actions>
  </FL:smarttag>
</FL:smarttaglist>










The elements of the exemplary semantic category list file of Table 2 will be described below.


Semantic Category List Schema


The semantic category list file of Table 2 adheres to a semantic category list schema in accordance with an embodiment of the present invention. The elements of the semantic category list schema are individually described below.

  • smarttaglist—a schema namespace declaration.
  • name—a user-friendly name for this semantic category recognizer.
  • lcid—a comma separated list of “LocaleIDs” or language identifiers of languages in which items in the list will be recognized. If the value of this tag is *, 0 or is not specified, it is assumed that the list works in all locales. Sometimes the host application does not specify the lcid and, in that case, the value is ignored. In some applications, language auto-detection determines what the language is.
  • description—a longer string that describes this semantic category.
  • moreinfourl—a URL for more information on this recognizer.
  • updateable—a Boolean flag that specifies whether this list is updateable. If this element is not specified, it is assumed that the list is not updateable.
  • autoupdate—a Boolean flag that specifies whether this recognizer should auto-update. If this element is not specified, it is assumed that the list does not auto-update.
  • lastcheckpoint—an ID specifying the last semantic category list update. It is an integer that serves as the “version number” for the last update. If the server has a higher version number than the lastcheckpoint ID when an auto-update is performed, an update occurs. If the operation is successful, lastcheckpoint is updated to the higher version number.
  • lastupdate—an integer that specifies the time when the last update occurred. It is a long integer that represents the number of minutes since 1970. Normally, this value is initially set to zero. The lastupdate value is used to determine whether it is time to check for updates. For example, it may be inefficient to check for an update if the last update was obtained one day ago.
  • updateurl—a URL to check for updates to the list of terms to be recognized. If this element is not specified, the list is not designed to be updateable.
  • updatefrequency—an integer that specifies in minutes how often a list should be updated. If this element is not specified, assume a default value of 10080 (7 days). updatefrequency is used in conjunction with lastupdate.
  • smarttag type—a unique namespace, specified as namespaceURI#tagname. The namespaceURI ensures that the smarttag type is globally unique and unambiguous. Two semantic categories with the same tag name can therefore be differentiated using namespaces. For example, two booksellers may use the tagname “Books” as long as they use different namespace URIs.
  • caption—specifies the title caption for the semantic category to be displayed.
  • terms—a collection of terms to recognize.
  • termfile—A link to the binary file that includes terms to be recognized (not shown in Table 2).
  • termlist—the contents of this element should be a comma-separated list of terms to be recognized.
  • property—a name and value pair to be attached to the property bag if the term is recognized. The property element allows users to attach metadata to the semantic category. For example, for a semantic category entitled “Books”, uniform metadata such as Booktype=fiction may be attached using the property element.
  • actions—a list of new or revised action identifiers.
  • action—this element has one attribute called id. id is a required alphanumeric string that uniquely identifies the action that applies to a particular type label.
  • url—specifies the URL to activate for an action. The URL supports a number of tokens that serve as parameters to the HTTP fire. These tokens are described in further detail below.
  • caption—an action caption.
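As a hedged illustration of how a list file adhering to the schema elements above might be read, the following sketch parses an abbreviated copy of the Table 2 file with Python's standard `xml.etree.ElementTree` module. The function name `load_list`, the shape of the returned dictionary, and the abbreviated sample are assumptions of this sketch and do not describe the semantic category list tool itself.

```python
import xml.etree.ElementTree as ET

NS = {"FL": "urn:schemas-microsoft-com:smarttags:list"}

# Abbreviated copy of the Table 2 list file, for illustration only.
SAMPLE = """<FL:smarttaglist xmlns:FL="urn:schemas-microsoft-com:smarttags:list">
  <FL:name>Medical Condition Terms</FL:name>
  <FL:lcid>1033</FL:lcid>
  <FL:updatefrequency>20160</FL:updatefrequency>
  <FL:smarttag type="urn:schemas-adatum-com:medical#condition">
    <FL:caption>A. Datum Corporation</FL:caption>
    <FL:terms>
      <FL:termlist>allergy, cough, high blood pressure</FL:termlist>
    </FL:terms>
    <FL:actions>
      <FL:action id="CompanyHomePage">
        <FL:caption>View A. &amp; Datum Website</FL:caption>
        <FL:url>http://www.adatum2.com/home.asp?String={TEXT}</FL:url>
      </FL:action>
    </FL:actions>
  </FL:smarttag>
</FL:smarttaglist>"""


def load_list(xml_text):
    """Extract the name, type, terms, actions and update frequency of a list file."""
    root = ET.fromstring(xml_text)
    tag = root.find("FL:smarttag", NS)
    termlist = tag.findtext("FL:terms/FL:termlist", default="", namespaces=NS)
    return {
        "name": root.findtext("FL:name", namespaces=NS),
        "type": tag.get("type"),
        "terms": [t.strip() for t in termlist.split(",") if t.strip()],
        "actions": {
            a.get("id"): a.findtext("FL:url", namespaces=NS)
            for a in tag.findall("FL:actions/FL:action", NS)
        },
        # 10080 minutes (7 days) is the default named in the schema description.
        "updatefrequency": int(
            root.findtext("FL:updatefrequency", default="10080", namespaces=NS)
        ),
    }


info = load_list(SAMPLE)
```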


One of the more important properties defined in the semantic category list file is the tag name (the tag name is the attribute contained within the “FL:smarttag type” tag). Here, the list provider can specify the tag name under which the terms are to be recognized. This gives list providers the flexibility to define custom semantic categories that they can mix and match with different actions.


Semantic category terms can be defined literally in the semantic category list file within the <FL:termlist> tag. Semantic category terms can also be encoded into a custom binary file format optimized for parsing speed and memory overhead. Binary semantic category terms can be “pointed at” with the <FL:termfile> tag.


For literal semantic category terms, the schema allows list creators to recognize terms in a case insensitive manner (e.g., both “cold” and “ColD” are recognized) or a case sensitive manner (e.g., only “cold” is recognized). In order to recognize terms in a case sensitive manner, the list creator can encapsulate terms within quotes. Terms not encapsulated within quotes are recognized in a case insensitive manner.
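The quoting convention for literal terms may be sketched as follows. The sketch assumes straight double quotes around case sensitive terms and assumes that no term contains a comma; the names `parse_termlist` and `is_recognized` are hypothetical.

```python
def parse_termlist(termlist):
    """Split a comma-separated termlist into case sensitive and insensitive sets."""
    sensitive, insensitive = set(), set()
    for raw in termlist.split(","):
        term = raw.strip()
        if len(term) >= 2 and term[0] == '"' and term[-1] == '"':
            sensitive.add(term[1:-1])       # quoted: must match exactly
        elif term:
            insensitive.add(term.lower())   # unquoted: compared in lower case
    return sensitive, insensitive


def is_recognized(word, sensitive, insensitive):
    """Return True if word matches a quoted term exactly or an unquoted term in any case."""
    return word in sensitive or word.lower() in insensitive


sensitive, insensitive = parse_termlist('"cold", cough, high blood pressure')
```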


In other embodiments, the schema may provide broader support for defining recognition in a semantic category list file. Recognition need not occur via strict lists of terms that are recognized in case sensitive or insensitive fashion. In one embodiment, recognition is specified via context-free grammars (CFGs). CFGs provide a mechanism for specifying a text pattern that items being recognized can match or not match. If text patterns meet a specified CFG, then they are recognized. For example, the recognition might occur via XML plug-ins to a CFG recognition engine.


Semantic Category List Actions


To be useful to the broadest range of end users, semantic categories placed into documents should be associated with some actions. In recognition of this, the invention, in one embodiment, not only makes it easy to specify lists of terms that should be recognized but also makes it easy to supply actions to be associated with those recognized terms.


In one embodiment, the present invention allows a creator to specify multiple actions within the semantic category list file. For example, an action to open a web browser program module and navigate to a particular URL may be specified in the semantic category list file. The semantic category list file may also be used to define an action that is defined in a separate action plug-in.


More specifically, the present invention, in one embodiment, allows users to specify web page navigation actions in a semantic category list file. More than one action can be supplied per semantic category list file by adding more than one <FL:action> tag within the <FL:actions> collection.


In one embodiment, the present invention may replace tokens in the supplied URL with data that is specific to the semantic category being acted upon. In effect, it enables parameterized URLs to be used. The tokens are URL encoded so as to work in most browsers. In one embodiment, the present invention supports the following tokens:

  • {TEXT}—this token is replaced with the semantic category value. For example, for a stock ticker symbol {TEXT} might be “MSFT”;
  • {TAG}—this token is replaced with the tag name for the semantic category;
  • {PROP:VALUE}—this token is replaced with metadata from a semantic category property bag. VALUE is the name of a property bag key. So, if the property bag for a semantic category contains a property called “Company” with a value of “Microsoft”, the token {PROP:Company} will be replaced with Microsoft or Company=Microsoft, for example; and
  • {LCID}—an integer corresponding to the user's current UI language lcid.
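The token replacement described above may be sketched as follows. This is a minimal sketch, assuming that each token is replaced by its URL-encoded value only (for {PROP:Company}, the value “Microsoft” rather than “Company=Microsoft”) and that unknown tokens are left untouched; the function name `expand_url` is hypothetical.

```python
import re
from urllib.parse import quote


def expand_url(template, text, tag, props, lcid):
    """Replace {TEXT}, {TAG}, {PROP:name} and {LCID} tokens with URL-encoded values."""

    def replace(match):
        token = match.group(1)
        if token == "TEXT":
            value = text
        elif token == "TAG":
            value = tag
        elif token == "LCID":
            value = str(lcid)
        elif token.startswith("PROP:"):
            value = props.get(token[len("PROP:"):], "")
        else:
            return match.group(0)  # unknown token: leave as written
        return quote(value, safe="")

    return re.sub(r"\{([^{}]+)\}", replace, template)


url = expand_url(
    "http://www.adatum2.com/home.asp?String={TEXT}",
    "high blood pressure",
    "urn:schemas-adatum-com:medical#condition",
    {},
    1033,
)
```

Here `url` is "http://www.adatum2.com/home.asp?String=high%20blood%20pressure".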


Creating a Semantic Category List File


Referring now to FIG. 5, a method 500 for creating a semantic category list file in accordance with an embodiment of the present invention will be described. At step 505, the user begins by launching an editing tool such as the “NOTEPAD” tool marketed by Microsoft Corporation of Redmond, Wash. The user may save the file he is working on as a text file which will be the source file (the semantic category list file). The source file can be returned to and updated as necessary. The method then proceeds to step 510.


At step 510, the semantic category list schema is completed using the values for different elements determined by the user to form the semantic category list file. The method then proceeds to step 515.


At step 515, the semantic category list file is stored in a directory. In one embodiment of the invention, the directory is one of a few specific directories which are searched to find semantic category list files. These directories are described below.


Deploying Semantic Category List Files in a Directory


As described above, in one embodiment, the present invention requires a semantic category list file conforming to a specific semantic category list schema in order to implement user-defined recognition of terms and/or actions. In one embodiment, the present invention searches for these semantic category list files by looking for .XML files located in one of three directories on a file system. One directory is located in a per-machine location, another directory is located in a per-user location, and another directory may be defined by the user by writing a registry key that points to a custom location.
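The directory search may be sketched as follows. The actual per-machine, per-user, and custom directory locations are not specified here, so the caller supplies them; the function name `find_list_files` is hypothetical, and directories that do not exist are simply skipped in this sketch.

```python
from pathlib import Path


def find_list_files(directories):
    """Return every .xml file found directly inside the given directories."""
    found = []
    for directory in directories:
        d = Path(directory)
        if not d.is_dir():
            continue  # e.g. no custom directory has been configured
        found.extend(
            sorted(p for p in d.iterdir() if p.is_file() and p.suffix.lower() == ".xml")
        )
    return found
```

A caller would pass the three locations in whatever order of precedence the embodiment requires, for example `find_list_files([per_machine_dir, per_user_dir, custom_dir])`, where the three variable names are placeholders.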


Taken together, these directories give the user the flexibility to install semantic category list files that affect just one user or all users on a given machine. The user may use the custom directory functionality to specify a department or corporate file share which contains common semantic category list files.


Updating Semantic Category List Files


Semantic category terms may change over time. Stock ticker symbols, for example, change as companies enter and leave stock exchanges. Thus, if a semantic category list file to recognize stock ticker symbols is created then it will need to be periodically updated. For this reason, in one embodiment, the present invention provides support for communicating with a server that supports Hypertext Transfer Protocol (HTTP) to determine if a new update is necessary.


The server is given an opportunity to define whether a new update exists and when it should be downloaded. It does this by defining an update description file, using an XML schema instance referred to herein as the semantic category list update file. An exemplary semantic category list update file is illustrated in Table 3 below:











TABLE 3

<FLUP:smarttaglistupdate xmlns:FLUP="urn:schemas-microsoft-com:smarttags:listupdate">
  <FLUP:checkpoint>400</FLUP:checkpoint>
  <FLUP:smarttaglistdefinition>foo.xml</FLUP:smarttaglistdefinition>
</FLUP:smarttaglistupdate>










The exemplary semantic category list update file of Table 3 indicates that semantic category terms exist on the server with a checkpoint value of 400. It also specifies which list of semantic category terms should be downloaded: either the XML file that represents the list, its binary list representation, or both. In the example of Table 3, the semantic category terms entitled foo.xml are to be downloaded.


Central to the notion of an update is the checkpoint value. The checkpoint value can be considered a version number for the current list definition stored on the server. If the checkpoint is greater than the lastcheckpoint of the currently installed semantic category terms, then the newer files are downloaded via HTTP to replace their existing counterparts. Then, the installed semantic category terms are updated to match what was received from the server.


To place a semantic category list update file on a server, a file with the “FLUP” (or some other namespace shorthand alias) is placed on the appropriate server. The new semantic category terms may also be stored in the same directory along with the semantic category list update file.


This semantic category list update file is named to match the file pointed to by the updateurl element in the semantic category list file. For example, referring to Table 2, the semantic category list update file would need to be stored as “listupdate.xml” to match the updateurl element in the semantic category list file.


The semantic category list update file adheres to an XML schema (the semantic category list update schema) as will be described below. The elements in one embodiment of the semantic category list update schema are described below:

  • smarttaglistupdate—contains the update schema namespace declaration.
  • checkpoint—this value has to be greater than the lastcheckpoint value for an update to occur. It is also the new version number (lastcheckpoint value) to record in the semantic category list file if any files are updated.
  • smarttaglistdefinition—points to the new semantic category terms to be downloaded to replace the existing (old) semantic category terms on the client's computer. For every smarttaglistdefinition element supplied by the semantic category list update file, corresponding replacement semantic category terms are downloaded.
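Reading a semantic category list update file and comparing its checkpoint with the installed lastcheckpoint may be sketched as follows, using Python's standard `xml.etree.ElementTree` module. The sample mirrors Table 3; the names `read_update` and `needs_update` are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"FLUP": "urn:schemas-microsoft-com:smarttags:listupdate"}

SAMPLE_UPDATE = """<FLUP:smarttaglistupdate xmlns:FLUP="urn:schemas-microsoft-com:smarttags:listupdate">
  <FLUP:checkpoint>400</FLUP:checkpoint>
  <FLUP:smarttaglistdefinition>foo.xml</FLUP:smarttaglistdefinition>
</FLUP:smarttaglistupdate>"""


def read_update(xml_text):
    """Return the checkpoint and the list definitions named in an update file."""
    root = ET.fromstring(xml_text)
    checkpoint = int(root.findtext("FLUP:checkpoint", namespaces=NS))
    definitions = [
        e.text.strip() for e in root.findall("FLUP:smarttaglistdefinition", NS)
    ]
    return checkpoint, definitions


def needs_update(checkpoint, lastcheckpoint):
    """An update occurs only when the server checkpoint is strictly greater."""
    return checkpoint > lastcheckpoint


checkpoint, definitions = read_update(SAMPLE_UPDATE)
```

With the list file of Table 2 (lastcheckpoint of 100) and the update file of Table 3 (checkpoint of 400), `needs_update(400, 100)` is true, so foo.xml would be downloaded.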


Referring now to FIG. 6, a method 600 for performing an update of a semantic category list file with a semantic category list update file in accordance with an embodiment of the present invention will be described. At step 602, the method begins as the application program module is booted and begins running.


At decision step 605, it is determined whether the user has initiated an action associated with a semantic category list file. If so, then the method proceeds to decision step 610. Performing an update check when action code of a semantic category list file is called eliminates the need to have a separate background process that periodically checks for updates. Also, in one embodiment, the present invention only checks for updates when actions fire to ensure that updating is performed only for users who use an action. For example, all possible users do not need updates from a web server if only a small minority of users would want to use a particular action functionality. Checking for updates when actions fire ensures that only people who actively use the action incur the overhead of checking for updates. Of course, in other embodiments, updates may be periodically triggered or triggered by the user.


At decision step 610, it is determined whether the interval specified by the updatefrequency element in the semantic category list file on the client's computer has elapsed. Typically, the difference between the current time and the lastupdate value is determined. If the difference is greater than the update frequency, then the method proceeds to decision step 612. If not, the method 600 returns to decision step 605.
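The elapsed-interval test of decision step 610 may be sketched as follows, with lastupdate and updatefrequency both expressed in minutes as in the schema description. The function names are hypothetical, and the treatment of a difference exactly equal to the update frequency (no check) is an assumption of this sketch.

```python
import time


def minutes_since_1970(now_seconds=None):
    """Convert a time in seconds since 1970 into whole minutes since 1970."""
    if now_seconds is None:
        now_seconds = time.time()
    return int(now_seconds // 60)


def update_check_due(lastupdate, updatefrequency=10080, now_minutes=None):
    """Return True when more than updatefrequency minutes have passed since lastupdate."""
    if now_minutes is None:
        now_minutes = minutes_since_1970()
    return now_minutes - lastupdate > updatefrequency
```

Because lastupdate is normally initialized to zero, the first action fired against a newly installed list file always triggers an update check in this sketch.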


At decision step 612, it is determined whether the web server is available so that the user does not have to wait on the web server. If the web server is available, the method proceeds to step 615. If not, then the method ends at step 699.


At step 615, the URL specified in the semantic category list file in the updateurl element is called. The method then proceeds to step 620.


At step 620, the web server's update manifest file is retrieved and its checkpoint value is determined. The method then proceeds to decision step 625.


At decision step 625, it is determined whether the checkpoint value of the semantic category update file is greater than the lastcheckpoint value of the semantic category list file. If so, then the method 600 proceeds to step 630. If not, then the method proceeds to step 627, where the lastupdate value is set equal to the current time, and the method ends at step 699.


At step 630, the semantic category terms from the semantic category update file are downloaded to replace the existing semantic category terms in the semantic category list file. An embodiment for replacing the existing semantic category terms is described in reference to FIG. 7. The method then proceeds to step 640.


At step 640, the lastcheckpoint value in the semantic category list file is updated to be equal to the checkpoint value of the semantic category update file. The lastupdate value of the semantic category list file may also be set to the current time. The method then returns to step 602.
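The decision flow of method 600, from decision step 610 through step 640, may be summarized in the following sketch. Network access is abstracted behind caller-supplied functions so that the sketch stays self-contained; the function name `run_update`, the `state` dictionary, and the returned status strings are all hypothetical.

```python
def run_update(state, now_minutes, server_available, fetch_manifest, download_terms):
    """Run one update check.

    state holds lastupdate, lastcheckpoint, updatefrequency and updateurl.
    server_available() -> bool; fetch_manifest(url) -> (checkpoint, definitions);
    download_terms(definitions) replaces the installed terms.
    """
    # Decision step 610: has the update interval elapsed?
    if now_minutes - state["lastupdate"] <= state["updatefrequency"]:
        return "not due"
    # Decision step 612: do not make the user wait on an unavailable server.
    if not server_available():
        return "server unavailable"
    # Steps 615 and 620: call the update URL and read the checkpoint.
    checkpoint, definitions = fetch_manifest(state["updateurl"])
    # Decision step 625 and step 627: nothing newer on the server.
    if checkpoint <= state["lastcheckpoint"]:
        state["lastupdate"] = now_minutes
        return "up to date"
    # Step 630: download the replacement terms.
    download_terms(definitions)
    # Step 640: record the new version and the time of the update.
    state["lastcheckpoint"] = checkpoint
    state["lastupdate"] = now_minutes
    return "updated"
```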


Referring now to FIG. 7, a flowchart illustrating a method for downloading semantic category terms (step 630 in FIG. 6) in accordance with an embodiment of the present invention will be described.


At step 631, the update manifest file is retrieved from the web server. The method 630 then proceeds to step 632.


At step 632, the non-semantic category list files (i.e. those marked by “<smarttaglistfile>” in the update manifest) are determined. Step 632 is performed first such that these auxiliary files are in place before the semantic category list file(s) are updated (which typically reference these auxiliary files). Also these auxiliary files tend to be larger, so they are slightly more likely to fail in downloading.


At step 633, each <smarttaglistfile> found at step 632 is downloaded into the same directory as the semantic category list file which triggered this update.


At decision step 634, it is determined whether the download was successful. If not, the method ends. If the download was successful, then the method proceeds to step 635.


At step 635, all listed semantic category list files (i.e. those marked by “<smarttaglistdefinition>” in the update manifest) are determined and downloaded.


Although not shown in FIG. 7, at step 640 (FIG. 6) for each file, the lastcheckpoint value is updated to match that in the update manifest.
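The download ordering of FIG. 7 may be sketched as follows: auxiliary files are fetched first, and the semantic category list files are fetched only if every auxiliary download succeeded. The manifest is modeled as a plain dictionary and `download` is a caller-supplied function returning True on success; both are assumptions of this sketch, as is the choice to stop when a list file download fails.

```python
def download_update(manifest, download):
    """Download auxiliary files first, then list definitions; stop on first failure."""
    # Steps 632 to 634: auxiliary files (<smarttaglistfile>) go first so that they
    # are in place before the list files that reference them are replaced.
    for name in manifest.get("smarttaglistfile", []):
        if not download(name):
            return False
    # Step 635: the semantic category list files (<smarttaglistdefinition>).
    for name in manifest.get("smarttaglistdefinition", []):
        if not download(name):
            return False  # assumption: a failed list file download also ends the update
    return True
```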


Semantic Category List Tool


Referring now to FIG. 8, a block diagram illustrating an exemplary architecture 700 for use in conjunction with an embodiment of the present invention will be described. Many of the elements are the same as in FIG. 2 and are indicated by the same numerals. The architecture 700 includes a semantic category list tool 705 connected to the action DLL 215 and recognizer DLL 210. In one embodiment, the semantic category list tool 705 is a wrapper recognizer DLL and action DLL. A wrapper is essentially a class (for example a C++ class) that contains an object to which the class provides an interface. A wrapper class is so called because it encapsulates, or “wraps,” the code involved in certain tasks, such as getting and releasing interface pointers and working with strings. In one embodiment of the invention, the semantic category list tool wraps around the ISmartTagRecognizer and ISmartTagAction APIs which implement semantic category recognition and actions so that the single plug-in may be used to implement both recognition and action.


The semantic category list tool 705 provides several services including maintaining lists of terms associated with any number of type labels, acting as a recognizer that works with multiple lists of terms, providing HTTP-based actions that work with any number of type labels, using HTTP-based communications to keep in contact with a web server and update its list of terms and actions, etc.


The semantic category list tool 705 may be used by any individual or organization to maintain their own list of terms and HTTP-based actions for those recognized terms. Users generate semantic category list files 710 and store them in one of a number of predefined directories. The semantic category list tool searches these directories and reads the contents of the semantic category list files. If the list files conform to the semantic category list schema, then the semantic category list tool is able to use these list files to generate the appropriate APIs and populate the fields of the API with the values from the semantic category list file. Thus, the semantic category list tool is able to use the semantic category list files to perform user-defined recognition and actions. The files are parsed and the data specified for each element of the schema is used.


The semantic category list tool 705 is also able to communicate with server 715 to update the semantic category list files 710 using the semantic category list update files 720, the semantic category terms 725, and the semantic category term file 730.


It should be understood from the foregoing description that for use in international settings, the semantic category list file in accordance with an embodiment of the invention may be written in Unicode. This allows any extended character to be specified in the termlist.


It should be understood that in one embodiment of the invention the XML Data Interchange Format is used to define a semantic category list schema and file. However, other languages and formats known to those skilled in the art may also be used in other embodiments of the invention.


It should also be understood that the present invention may be used to define actions that work in conjunction with a recognizer plug-in developed using another method. For example, an XML list may be used to define simple actions that work in conjunction with a recognizer plug-in developed using another method. The converse is also true: an XML list may be used to define a list of terms to recognize and actions may be defined using a more complicated tool such as Visual Basic.


Although the present invention has been described above as implemented in preferred embodiments, it will be understood that alternative embodiments will become apparent to those skilled in the art to which the present invention pertains without departing from its spirit and scope. Accordingly, the scope of the present invention is defined by the appended claims rather than the foregoing description.

Claims
  • 1. In a computer system for recognizing a semantic category in an electronic document, a semantic category list file stored in a directory associated with the computer system, the file comprising:
    a semantic category list tool for creating the semantic category list file that includes elements of a semantic category list schema, wherein the semantic category is utilized to present a user with choices of actions that are executed in reference to a text and a type label of a string in the electronic document labeled with the type label as belonging to the semantic category, wherein the semantic category list schema is an XML schema and wherein the XML schema includes a binary representation of the elements;
    a list of terms in the semantic category list file, wherein the terms are strings in the electronic document that are recognized as belonging to the semantic category, wherein the strings in the electronic document are labeled with a type label associating each string with a semantic category, and wherein an updated list of terms for the semantic category list file is stored in a semantic category term file; and
    a list of a plurality of actions in the semantic category list file, wherein the plurality of actions are actions performed in reference to each term in the list of terms in the semantic category list file,
    wherein the list of terms and the list of a plurality of actions in the semantic category list file are defined according to an Extensible Markup Language (XML) schema,
    wherein the XML schema is utilized to define a recognizer for recognizing the strings belonging to the semantic category,
    wherein the list of the plurality of actions is utilized to present a user with choices of actions that are executed in reference to each term, based on the type label associated with a text of each string,
    wherein an update Universal Resource Locator (URL) of a web server is called to locate a semantic category update file,
    wherein a lastcheckpoint value of the semantic category list file is sent to the web server,
    prior to the updated list of terms for the semantic category list file is stored in a semantic category term file, determine whether a new update exists wherein a checkpoint value of the semantic category update file is greater than the lastcheckpoint value of the semantic category list file, and, if so, then download a plurality of semantic category terms from the semantic category update file to replace a plurality of semantic category terms in the semantic category list file,
    if no update exists, leave the semantic category list file unchanged, and
    store the updated semantic category list file in the directory.
  • 2. The semantic category file of claim 1 wherein the semantic category term file is a compressed binary file.
  • 3. The semantic category file of claim 1 wherein the semantic category file further comprises a localeID identifying a language in which the terms are to be recognized.
  • 4. The semantic category file of claim 3, wherein the lastcheckpoint value identifies a version number of the last update of the semantic category file.
  • 5. The semantic category file of claim 3 further comprising a lastupdate value, wherein the last update value identifies a time of the last update of the semantic category file.
  • 6. The semantic category file of claim 5 wherein the update URL is a website address to check for updates to the list of terms.
  • 7. The semantic category file of claim 6 further comprising an update frequency value, wherein the update frequency value specifies how often the list of terms are updated.
  • 8. The semantic category file of claim 1 further comprising an action identifier uniquely identifying the action that applies to the semantic category.
  • 9. The semantic category file of claim 8 further comprising an action URL specifying the URL to activate for the action.
  • 10. A computer-implemented method for creating a semantic category list file for recognizing a semantic category in an electronic document, the method comprising:
    using a semantic category list tool for creating the semantic category list file that includes elements of a semantic category list schema, wherein the semantic category is utilized to present a user with choices of actions that are executed in reference to a text and a type label of a string in the electronic document labeled with the type label as belonging to the semantic category, wherein the semantic category list schema is an XML schema and wherein the XML schema includes a binary representation of the elements;
    calling an update Universal Resource Locator (URL) of a web server to locate a semantic category update file;
    sending a lastcheckpoint value of the semantic category list file to the web server;
    determining whether a new update exists prior to performing the update by determining whether a checkpoint value of the semantic category update file is greater than the lastcheckpoint value of the semantic category list file, and, if so, then downloading a plurality of semantic category terms from the semantic category update file to replace a plurality of semantic category terms in the semantic category list file;
    if no update is available, leaving the semantic category list file unchanged; and
    storing the semantic category list file in a directory.
  • 11. The computer-implemented method of claim 10 wherein the semantic category list file comprises the following elements:
    a list of terms, wherein the terms are strings that are recognized as the semantic category; and
    a plurality of actions, wherein the plurality of actions are actions that are performed in reference to the semantic category.
  • 12. A computer-implemented method for performing an update to a semantic category list file, comprising:
    using a semantic category list tool determining whether to proceed with the update to the semantic category list file, wherein each semantic category in the semantic category list file is utilized to present a user with choices of actions that are executed based on a text and a type label of a string in an electronic document belonging to each semantic category, and wherein each referenced string in the electronic document is labeled with the type label associating the string with a semantic category;
    calling an update Universal Resource Locator (URL) of a web server to locate a semantic category update file;
    sending a lastcheckpoint value of the semantic category list file to the web server;
    determining whether a new update exists prior to performing the update by determining whether a checkpoint value of the semantic category update file is greater than the lastcheckpoint value of the semantic category list file, and, if so, then downloading a plurality of semantic category terms from the semantic category update file to replace a plurality of semantic category terms in the semantic category list file;
    if no update is available, leaving the semantic category list file unchanged; and
    storing the updated semantic category list file in a directory.
  • 13. The method of claim 12 further comprising updating the lastcheckpoint value in the semantic category list file that is equal to the checkpoint value of the semantic category update file.
  • 14. The method of claim 13 further comprising setting a lastupdate value of the semantic category list file to a current time setting.
  • 15. The method of claim 12 wherein determining whether to proceed with an update comprises determining whether the user has initiated an action associated with a semantic category list file, and, if so, then determining to perform an update.
  • 16. The method of claim 12 wherein determining whether to proceed with an update comprises determining whether an interval of time specified in the semantic category list file has elapsed, and, if so, then determining to perform an update.
  • 17. The method of claim 16 wherein the interval of time comprises an update frequency element.
REFERENCE TO RELATED APPLICATIONS

This is a continuation-in-part of U.S. patent application Ser. No. 09/588,411, entitled “METHOD AND SYSTEM FOR SEMANTICALLY LABELING STRINGS AND PROVIDING ACTIONS BASED ON SEMANTICALLY LABELED STRINGS”, filed Jun. 6, 2000, which is incorporated by reference herein.

Related Publications (1)
Number Date Country
20020029304 A1 Mar 2002 US
Continuation in Parts (1)
Number Date Country
Parent 09588411 Jun 2000 US
Child 09907418 US