The present disclosure is directed generally to modeling and, more particularly, to constructing activity models (prototypes) from the automated (unsupervised) review of textual documents describing those activities.
Modeling human activities is useful for building a variety of intelligent systems, such as common-sense driven search (Liu et al. 2002) and human daily activity monitoring (Wyatt et al. 2005). A human activity can be defined as consisting of a number of possibly sequenced steps for achieving a certain goal. Being able to model activities provides the opportunity for computers to assist humans in the activity. For example, if the activity is accurately modeled, and the person performing the activity is on step 3, a computer could infer that step 4 is next and provide the materials or instrumentalities needed for step 4. Computers could be used to monitor the elderly or infirm to determine if they are performing an activity correctly. Many other possibilities are found in the literature.
Activity models have been studied from the early days of AI and common sense knowledge systems in the forms of frames and scripts (e.g., Minsky 1975; Schank and Abelson, 1977). Both models promote the use of relatively large and prototypical structures for representing activities as a type of common sense knowledge. To deal with the knowledge acquisition bottleneck, recently, researchers have gone to the Web for common sense knowledge acquisition, relying either on public input (Singh et al. 2002; Matuszek et al. 2005) or on particular genres of Web documents (Perkowitz et al. 2004; Wyatt et al. 2005).
Recent research on constructing or extracting activity models from text builds upon the assumption that there is a mapping between human activities and textual descriptions of these activities, and thus models of human activities can be constructed or extracted from text. The process of constructing or extracting is sometimes referred to as mining. In the prior art, all activities are assumed to have similar structures and their models are assumed to be amenable to similar methods of construction. Our empirical analysis of textual activity descriptions shows that descriptions of activities are not all alike; for instance, they vary in the sequencing characteristics of the steps.
The disclosed method and apparatus are directed to the automated, or unsupervised, construction of activity prototypes (i.e., models of activities comprised of a number of steps) from a plurality of textual documents. One embodiment of the method is comprised of: extracting prototypical steps from a plurality of textual documents; sequencing the extracted steps; aligning the sequenced steps; and storing the aligned steps. In an alternative embodiment, the steps may be labeled. In another embodiment, a model is built from the stored, aligned steps. The model may take the form of a step vs. position matrix. The matrix may identify the prototypical steps that make up the activity and provide the probability of each step occupying each position within the activity. The model thus constitutes common sense knowledge that encodes the stereotypical steps of an activity and the stereotypical sequencing of the steps.
According to another aspect of the present invention, an apparatus is disclosed for performing the method of the present invention.
For the present invention to be readily understood and easily practiced, various embodiments will now be described, for purposes of illustration and not limitation, in conjunction with the following figures, wherein:
An activity consists of steps that can be described in text in a variety of ways. Some documents concentrate on the steps comprising the activity, while other documents provide more background and elaboration along with the description of the steps.
An activity prototype (model) consists of the prototypical steps of an activity and the prototypical sequencing of the steps. While activity descriptions may vary in content and style, the activity prototype (model) captures what the variant descriptions have in common.
Certain definitions will now be introduced. The following definitions are not intended to be the only manner in which an activity prototype may be defined or expressed, but are provided as one embodiment of a definition and expression of the activity prototype.
An activity sequence s may consist of a sequence of k steps s: {t1, . . . , tk} in a specific order, where k is the length of s.
Multiple sequence alignment: Let T be a finite set of steps. Let the character “-” represent inserted gaps. Let s1, . . . , sk be k sequences over T with lengths n1, . . . , nk. A multiple sequence alignment of s1, . . . , sk is a k×l matrix A with the following properties:
A[i][j] ∈ T ∪ {“-”}, for 1 ≤ i ≤ k and 1 ≤ j ≤ l.
The ith row without blanks equals si.
No column consists entirely of blanks.
As an illustration, a multiple sequence alignment of eight activity sequences (with the letters A through I denoting the steps sauté onion (A), add ingredients (B), heat/boil (C), simmer (D), blend/puree (E), add cream (F), heat (G), season (H), serve (I)) for the activity “making pumpkin soup” may be represented as follows:
Activity Prototype (P): Let T be a finite set of m steps T: {t1, . . . , tm} including the character “-” representing inserted gaps. Let A be a multiple sequence alignment of length l over k sequences, i.e., l is the number of positions in the global alignment and k is the number of documents (sequences). The prototype P of A is a matrix of dimension m×l with the following properties:
For the examples shown above, the prototype for “making pumpkin soup” is as follows:
This definition of an activity prototype is based on a multiple sequence alignment of the activity sequences, where each cell in the matrix represents the probability of observing a certain step at a particular location in the global alignment. An ideal profile has one cell with probability 1.0 in each column, while a perfectly useless profile has all cells of equal probabilities.
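By way of illustration and not limitation, the step vs. position matrix described above may be sketched as follows. The sketch assumes that each cell is simply the fraction of aligned sequences showing a given symbol in a given column; the function name build_prototype and the toy alignment are hypothetical and are not taken from the pumpkin soup example.

```python
GAP = "-"  # gap character, as in the definitions above


def build_prototype(alignment):
    """Return (symbols, matrix) for a list of equal-length aligned rows.

    matrix[r][j] is the fraction of rows that carry symbols[r] in column j.
    Assumption: probabilities are plain relative frequencies per column.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one row")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    k = len(alignment)  # number of documents (sequences)
    symbols = sorted({sym for row in alignment for sym in row})
    matrix = [
        [sum(1 for row in alignment if row[j] == sym) / k for j in range(length)]
        for sym in symbols
    ]
    return symbols, matrix


# Hypothetical toy alignment of three documents over steps A-D.
toy_alignment = ["AB-C", "ABDC", "A-DC"]
symbols, prototype = build_prototype(toy_alignment)
for sym, row in zip(symbols, prototype):
    print(sym, [round(p, 2) for p in row])
```

Under this assumption every column sums to 1.0, and a column with a single cell equal to 1.0 corresponds to the ideal case mentioned above.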
Given an activity, the process of constructing its prototype 19 from a corpus of textual documents 20 involves several steps as shown in
After the classifier 13 is built, the web 12 is searched at 16 to retrieve a large number of documents 15. The documents 15 are reviewed at step 18 by the classifier 13, and those documents that are determined to be relevant are added to the corpus of textual documents 20. The manually retrieved documents 11 from step 10 can also be added to the corpus 20.
As is known, text descriptions of the same activity can vary in style and in content. Some texts are more concise, while others include more background and elaboration. We anticipate that a candidate prototypical step of an activity should be a step that is distributed/described in many different documents and is a step that is represented in different documents by semantically similar text units.
Returning to
The foregoing procedure is illustrated in
Once the documents are partitioned into candidate steps, we use at step 30 in
where X and Y represent the set of key words in two sentences. Clustering can be based on complete link, single link, or average link. A similarity threshold can be used for stopping linking of clusters with similarity scores below the threshold. A variety of features can be used as term features for clustering, such as simplex NPs, included sub-terms, verbs, and adjectives (excluding stopwords).
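By way of illustration and not limitation, agglomerative clustering with a similarity threshold may be sketched as follows. The similarity formula itself is not reproduced in this text, so the sketch assumes a Jaccard-style overlap between the key word sets X and Y; average link is chosen arbitrarily among the three linkage options named above. All function names and the toy key word sets are hypothetical.

```python
def jaccard(x, y):
    """Assumed similarity: size of the intersection over size of the union."""
    x, y = set(x), set(y)
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def average_link(c1, c2, items):
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(jaccard(items[i], items[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(items, threshold):
    """Merge the most similar pair of clusters until the best score drops
    below the threshold. Returns clusters as lists of item indices."""
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = average_link(clusters[a], clusters[b], items)
                if score > best_score:
                    best_score, best_pair = score, (a, b)
        if best_score < threshold:
            break  # no remaining pair is similar enough to link
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical key word sets, one per candidate sentence.
keyword_sets = [
    {"onion", "butter"},
    {"onion", "butter", "pan"},
    {"cream"},
    {"cream", "soup"},
]
clusters = agglomerate(keyword_sets, threshold=0.4)
print(clusters)
```

Swapping average_link for the maximum or minimum pairwise similarity would give single link or complete link, respectively.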
It is desirable for sentences to cluster together based on word overlap that is due to genuine semantic relatedness. Noise can be caused, however, by word overlap from spurious, idiosyncratic word choice of individual authors. We introduce two measures to nominate clusters as candidate prototype steps.
The first measure is Diversity (d) which captures the number of documents that are covered by the cluster. A prototype step needs to cover more than d documents (e.g., d>3).
The second measure is ClusterSize (g, h): A prototype step should have between g and h items in the cluster, discarding clusters that are too small or too big. Values for g and h are a function of the number of documents and the average number of sentences per step.
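By way of illustration and not limitation, the two measures may be applied as a filter over clusters, as sketched below. The sketch assumes that each cluster is a list of (document identifier, sentence) pairs, that Diversity must strictly exceed d, and that the ClusterSize bounds g and h are inclusive; the boundary conventions and the name nominate are assumptions.

```python
import math


def nominate(clusters, d, g, h=math.inf):
    """Keep clusters that cover more than d documents and hold g..h items.

    clusters: list of clusters, each a list of (doc_id, sentence) pairs.
    Assumption: the size bounds g and h are treated as inclusive.
    """
    kept = []
    for cluster in clusters:
        diversity = len({doc_id for doc_id, _ in cluster})
        size = len(cluster)
        if diversity > d and g <= size <= h:
            kept.append(cluster)
    return kept


# Hypothetical clusters over three documents.
spread = [("doc1", "Saute the onion."), ("doc2", "Cook onion in butter."),
          ("doc3", "Fry the onion.")]
one_author = [("doc1", "a"), ("doc1", "b"), ("doc1", "c")]
too_small = [("doc1", "x"), ("doc2", "y")]

candidates = nominate([spread, one_author, too_small], d=2, g=3)
print(len(candidates))
```

In this toy case only the first cluster survives: the second draws all of its sentences from one document, which is the kind of idiosyncratic overlap the Diversity measure is meant to screen out.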
The following table illustrates a segment of auto-extracted prototype steps with d>2, g>2 and h=∞ for the “making pumpkin soup” activity.
Optionally, the clusters 32, 34 through n can be labeled for ease of interpretation. We used a “most frequent words” label, but many alternative techniques are available (e.g., Treeratpituk & Callan 2006). For example, in
In general, accurate sequencing 44 of activity steps 21 can require complex temporal reasoning about time points and intervals, such as when activities are described in a narrative style. Because we restrict the genre to “how-to” texts, we simplify by equating the order of the steps in the text to their sequence.
In the procedure illustrated in Table 2, we represent each document with a sequence of cluster labels that is ordered by the appearance of the clusters' constituent sentences in the original document text. For example, in
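By way of illustration and not limitation, the conversion of each document into a sequence of cluster labels may be sketched as follows. The sketch assumes that each document is available as an ordered list of sentence identifiers and that a mapping from sentence identifier to cluster label exists for the sentences that fall in nominated clusters; all names and data are hypothetical.

```python
def to_label_sequences(documents, label_of):
    """Map each document to the labels of its clustered sentences, kept in
    the order in which the sentences appear in the document text.

    documents: dict of doc_id -> ordered list of sentence ids.
    label_of:  dict of sentence id -> cluster label (clustered sentences only).
    Sentences without a label (background, elaboration) are skipped.
    """
    return {
        doc_id: [label_of[s] for s in sentences if s in label_of]
        for doc_id, sentences in documents.items()
    }


# Hypothetical input: two documents and four labeled sentences.
documents = {"doc1": ["s1", "s2", "s3"], "doc2": ["s4", "s5"]}
label_of = {"s1": "A", "s3": "C", "s4": "A", "s5": "B"}

sequences = to_label_sequences(documents, label_of)
print(sequences)
```

The sketch does not collapse repeated labels when several sentences of one document fall in the same cluster; whether to do so is left open here.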
For the alignment step (see step 45,
We use, for example, the T-COFFEE MSA software to compute alignment scores and visualize the prototype steps. The reader is referred to Notredame et al. (2000) for details of alignment computation.
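By way of illustration only, the insertion of gaps during alignment can be seen in a pairwise global alignment of two label sequences, sketched below with a basic dynamic programming scheme of the Needleman-Wunsch type. This is not the T-COFFEE procedure, which aligns many sequences at once; the scoring values (match 1, mismatch -1, gap -1) and the function name align_pair are assumptions.

```python
GAP = "-"


def align_pair(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two step sequences; returns two equal-length lists."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(GAP); i -= 1
        else:
            out_a.append(GAP); out_b.append(b[j - 1]); j -= 1
    return out_a[::-1], out_b[::-1]


# Hypothetical sequences: the second document omits step C.
row_a, row_b = align_pair("ABCD", "ABD")
print("".join(row_a))
print("".join(row_b))
```

The two returned rows satisfy the alignment properties defined earlier: each row without gaps equals its input sequence, and no column consists entirely of gaps.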
In
After the steps are aligned at 45 (
Prototypes or models may be categorized into four types or topologies depending upon whether all steps are required and whether steps need to be critically ordered, as shown below:
Sequential instructions comprise a series of steps that must be performed in order. An example is a standard recipe, like this one for pumpkin soup:
Order is critical, and all steps are important for activity completion.
Non-sequential instructions consist of steps that must all be performed, but whose order is unimportant. An example is this set of instructions for performing 50,000-mile maintenance on a car:
1. Perform a general tune-up—check the plugs, plug wires, belts, coolant, filters and timing.
2. Change the oil and oil filter.
3. Check the tires for wear. Replace as necessary.
4. Inspect the brakes. Service as necessary.
5. Change windshield wiper blades.
6. Touch up any scratched paint or minor body damage.
7. Check for rust.
While every step is necessary, the steps can be performed in any order. There is no logical reason that the oil must be changed before the tires or brakes are inspected.
Escalating instructions involve steps that should be followed in order, but only until success. For example, here are some instructions for shutting off a car alarm (abbreviated to save space):
6. As a last resort, disconnecting the battery's negative terminal will stop the alarm, but it will also keep your car from starting.
While Steps 3 through 5 here are sequential, Step 1, Step 2, the sequence of Steps 3-5, and Step 6 constitute alternatives. Try Step 1 first (Step 1 here is actually a preventive step—this is something you should do before the situation arises). If Step 1 is successful, there is no need to try any additional steps; but if it is unsuccessful, you should try Step 2. If Step 2 is successful, there is no need to go on; if it is unsuccessful, you should try the sequence of Steps 3 through 5. If that is successful, there is no need to go on; if unsuccessful, you should try Step 6. The steps are usually ordered from the easiest/safest alternative to the most difficult/risky.
Non-sequential suggestions need not be performed in order, nor is it necessary to complete all of the steps. A person can pick and choose whichever “steps” seem easiest or most promising. For example, here are “instructions” for teaching a child to clean his or her room:
A parent might be successful in this endeavor using only steps 2 and 3. If the parent is successful, there is no need to follow the remaining steps.
A given set of instructions may not fall neatly into a single category. Sequential or non-sequential instructions may have optional steps, often towards the end. Some lists may appear to be escalating instructions for some sub-sequences but non-sequential suggestions for others; also, a reader may reorder escalating instructions if he or she disagrees with the writer's assessment of which steps are more difficult and risky. This knowledge of topologies is not required for practicing the method set forth in
We manually constructed prototypes of 8 activities as Gold Standard (GS) prototypes from the text descriptions of activities—2 different activities for each type based on the typology described above. For a given activity, first, we collected 4-8 different “how-to” Web pages. Then the Web pages were manually aligned with labels denoting activity steps that represented similar prototypical actions (e.g., sautéing ingredients) across the multiple descriptions. Then, we filtered out all steps that did not occur in at least two descriptions of the activity. Finally, we discarded background, clarification, or elaboration sentences, leaving only the central sentences in each step. The GS prototype of an activity thus consists of a set of clusters representing activity steps, each of which consists of sentences from different documents representing the step. The following discussion and the evaluation results reported below are based on the 8 activities with a GS.
Table 3 provides the statistics of the corpus and the GS prototypes. On average, a transformation from general text descriptions of an activity to its prototype involves a 73.9% reduction in content. This reduction rate is comparable to that reported in existing multi-document summarization work (Goldstein et al. 1999).
Our analysis shows that although most activity steps are described in text by more than one sentence, the steps can be sufficiently represented or summarized by single sentences; most other sentences only provide background, elaboration, and clarification. In the manually prepared Gold Standards, more than 75% of the steps are represented by single sentences from texts.
We evaluate the clustering results against the manual classification of the activity steps in the GS. The first measure is the F-measure. Suppose there are k classes in the GS and m clusters extracted by the system. Let ni be the number of sentences in a particular class Li, nr the number of sentences in a particular cluster Sr, and nir the number of sentences of gold standard class Li in Sr. Then the F score of this class and cluster is defined to be:
where R(Li, Sr) is the recall value defined as nir/ni and P(Li, Sr) is the precision value defined as nir/nr for the cluster Sr against the class Li. The F score of the cluster Sr is the maximum F score value attained against all classes:
The F score of the entire clustering solution is the sum of the individual cluster F scores weighted according to the cluster size (n is the total number of sentences):
To evaluate whether semantically similar sentences are grouped into clusters, we use the purity metric, often used in evaluations of clustering:
Intuitively, a cluster whose items come from few GS classes will have higher purity than a cluster that mixes many GS classes.
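By way of illustration and not limitation, the two evaluation measures may be computed as sketched below. Because the formulas themselves are not reproduced in this text, the sketch assumes the customary forms: the F score of a class and cluster is taken as the harmonic mean 2RP/(R+P), and purity is taken as the sum over clusters of the largest class overlap, divided by n. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def clustering_scores(gold, pred):
    """Return (F, purity) for parallel lists of GS classes and cluster ids."""
    n = len(gold)
    class_sizes = Counter(gold)         # ni: sentences in class Li
    cluster_sizes = Counter(pred)       # nr: sentences in cluster Sr
    overlap = Counter(zip(gold, pred))  # nir: sentences of Li found in Sr

    f_total, purity_total = 0.0, 0.0
    for r, n_r in cluster_sizes.items():
        best_f, best_overlap = 0.0, 0
        for i, n_i in class_sizes.items():
            n_ir = overlap[(i, r)]
            if n_ir == 0:
                continue
            recall = n_ir / n_i
            precision = n_ir / n_r
            # Assumed form: harmonic mean of recall and precision.
            f = 2 * recall * precision / (recall + precision)
            best_f = max(best_f, f)
            best_overlap = max(best_overlap, n_ir)
        f_total += (n_r / n) * best_f    # weight each cluster by its size
        purity_total += best_overlap / n
    return f_total, purity_total


# Hypothetical labels for six sentences.
gold = ["A", "A", "A", "B", "B", "C"]
pred = [0, 0, 1, 1, 1, 1]
f_score, purity = clustering_scores(gold, pred)
print(round(f_score, 3), round(purity, 3))
```

In this toy case cluster 0 contains only class A sentences, while cluster 1 mixes three classes, which lowers both scores.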
We evaluated our procedure over the activity corpus described above. We compared four runs for clustering: All-GS and NP-GS (using all features, Simplex NP+Verb+Adj, and only Simplex NP features, respectively, over sentences from the GS); and All-Sys and NP-Sys (using all features and NP features, respectively, over all sentences from the corpus). The cluster size bounds were set to g>2 and h=∞. As it was not clear from the experiments what the optimal diversity was, the results were based on averages over diversity d ranging from 1 to the total number of documents of an activity.
For alignment, the Manual baselines were computed according to the human labeled step sequences. All other alignments were computed based on sequences built upon their respective step clusters.
When clustering is applied to the GS sentences for automatically grouping them into activity steps, we have observed that purity and F scores are ordered in the sequence NI>EI>NS>SI (
Find the oil drain plug [oil drain plug]
Place the drain pan underneath the plug [drain pan, plug]
Using your wrench unscrew the drain plug [wrench, drain plug]
Screw the plug back in [plug]
Contrast this with an excerpt from a “winterizing car” description (NI):
Check antifreeze mixture [antifreeze mixture]
Carry an emergency kit inside the car [emergency kit, car]
Inspect the wipers and wiper fluid [wipers, wiper fluid]
Check the battery [battery]
Change the engine oil and adjust the viscosity grade [engine oil, viscosity grade]
As we can see, SI type instructions impose strong sequencing and semantic coherence constraints; the semantic distances between subsequent steps are therefore small, and the steps are harder for clustering to separate. In contrast, in NI and EI type instructions the steps are generally quite independent, so the semantic distances between the steps are large and the steps are easier to separate via clustering.
Turning to
As mentioned earlier, we compute MSA using default T-COFFEE settings. T-COFFEE computes an alignment metric (Notredame & Abergel 2003) that can be used to assess the quality of an MSA. First, the alignment metric shows that some types of activities generally align better than others; MSA over the gold standard produces higher alignment scores for sequential and escalating instructions than for non-sequential instructions and suggestions: SI>EI>NI>NS. It is not surprising that the latter two types, where the order of steps is not critical, align less well. When clustering is used to extract steps automatically, the alignment scores suffer, as expected, because noise is introduced into the step clusters. Also observe that, with automated clustering, the alignment scores decrease significantly with the complete corpus (All-Sys, NP-Sys) compared with the GS corpus (All-GS, NP-GS), respectively (α<0.001 for both). This suggests that improving clustering is the first step toward better step alignment.
See
In evaluating both clustering and alignment, we compared two types of features: All (including simplex NPs, verbs, and adjectives) and NP (simplex NPs only). Across the F, purity, and alignment scores, there are overall no statistically significant differences between the two types of features. This empirically supports the observation by Perkowitz et al. (2004) that activity steps can be effectively modeled based on the set of objects involved at the respective steps.
Residing within computer 112 is a main processor 124 which is comprised of a host central processing unit 126 (CPU). Software applications 127, such as the method of the present invention, may be loaded from, for example, disk 128 (or other device), into main memory 129 from which the software application 127 may be run on the host CPU 126. The main processor 124 operates in conjunction with a memory subsystem 130. The memory subsystem 130 is comprised of the main memory 129, which may be comprised of a number of memory components, and a memory and bus controller 132 which operates to control access to the main memory 129. The main memory 129 and controller 132 may be in communication with a graphics system 134 through a bus 136. Other buses may exist, such as a PCI bus 137, which interfaces to I/O devices or storage devices, such as disk 128 or a CDROM, or provides network access.
While the present invention has been described in conjunction with preferred embodiments thereof, those of ordinary skill in the art will recognize that many modifications and variations are possible. For example, the present invention may be implemented in connection with a variety of different hardware configurations. Various extraction, sequencing, labeling, and alignment techniques, among others, may be used and still fall within the scope of the present invention. Such modifications and variations fall within the scope of the present invention which is limited only by the following claims.