System and method for automatic segmentation of ASR transcripts

Information

  • Patent Application
  • 20090067719
  • Publication Number
    20090067719
  • Date Filed
    September 07, 2007
  • Date Published
    March 12, 2009
Abstract
Text segmentation based on topic boundary detection has been an industry problem in automating information dissemination to targeted users. A system for automatic segmentation of ASR output text involves boundary identification based on “topic” changes. The proposed approach is based on building a weighted graph to determine dependency in input sentences based on bi-directional analysis of the input sentences. Furthermore, the input sentences are segmented based on the notion of segment cohesiveness and the segmented sentences are merged based on preamble and postamble analyses.
Description
FIELD OF THE INVENTION

The present invention relates to text analysis in general, and more particularly, to text analysis of automated speech recognizer outputs. Still more particularly, the present invention relates to a system and method for analyzing input text to determine a dependency graph based on bidirectional analysis and to segment and merge input text based on the determined dependency graph.


BACKGROUND OF THE INVENTION

Identification of coherent sections of sentences is a form of text segmentation. Processing of output from an automatic speech recognition (ASR) system is a widely applicable scenario for such text segmentation. In a variety of applicable scenarios, the plain text does not contain any title or annotation to hint about the subtopics discussed. Further, there is a need to segment ASR transcripts to determine a group of sentences wherein such a group need not have temporal cohesiveness. Text segmentation has been widely applied in topic identification, text summarization, categorization, information retrieval and dissemination.


Consider a scenario of broadcast news packaging for registered users. The users' profile provides information about the kind of news packages that need to be delivered to the various users. As news is broadcast, it is required to analyze the generated ASR transcripts, identify news segments, and combine multiple segments as a package of audio and video for delivery. Another scenario of interest is scene based segmentation of a video. While it is interesting to determine scenes based on video analysis, it is not completely error-free. In order to complement such an approach, it is useful to analyze the associated audio and convert the same to text form using an ASR system, and the segmentation of the generated text could assist in scene segmentation.


DESCRIPTION OF RELATED ART

U.S. Pat. No. 6,928,407 to Ponceleon; Dulce Beatriz (Palo Alto, Calif.), Srinivasan; Savitha (San Jose, Calif.) for “System and method for the automatic discovery of salient segments in speech transcripts” (issued on Aug. 9, 2005 and assigned to International Business Machines Corporation (Armonk, N.Y.)) describes a system and associated method to automatically discover salient segments in a speech transcript and focus on the segmentation of an audio/video source into topically cohesive segments based on Automatic Speech Recognition (ASR) transcriptions using the word n-grams extracted from the speech transcript.


U.S. Pat. No. 6,772,120 to Moreno; Pedro J. (Cambridge, Mass.), Blei; David M. (Oakland, Calif.) for “Computer method and apparatus for segmenting text streams” (issued on Aug. 3, 2004 and assigned to Hewlett-Packard Development Company, L.P. (Houston, Tex.)) describes a computer method and apparatus for segmenting text streams based on computed probabilities associated with a group of words with respect to a topic selected from a set of predetermined topics.


U.S. Pat. No. 6,529,902 to Kanevsky; Dimitri (Ossining, N.Y.), Yashchin; Emmanuel (Yorktown Heights, N.Y.) for “Method and system for off-line detection of textual topical changes and topic identification via likelihood based methods for improved language modeling” (issued on Mar. 4, 2003 and assigned to International Business Machines Corporation (Armonk, N.Y.)) describes a system (and method) for off-line detection of textual topical changes that includes at least one central processing unit (CPU), at least one memory coupled to the at least one CPU, a network connectable to the at least one CPU, and a database, stored on the at least one memory, containing a plurality of textual data set of topics. The CPU executes first and second processes in first and second directions, respectively, for extracting a segment having a predetermined size from a text, computing likelihood scores of a text in the segment for each topic, computing likelihood ratios, comparing them to a threshold, and defining whether there is a change point at the current last word in a window.


U.S. Pat. No. 6,104,989 to Kanevsky; Dimitri (Ossining, N.Y.), Yashchin; Emmanuel (Yorktown Heights, N.Y.) for “Real time detection of topical changes and topic identification via likelihood based methods” (issued on Aug. 15, 2000 and assigned to International Business Machines Corporation (Armonk, N.Y.)) describes a method for detecting topical changes and topic identification in texts in real time using likelihood ratio based methods.


“LEXTER, a Natural Language Tool for Terminology Extraction” by Bourigault D., Gonzalez I., and Gros C. (appeared in Proceedings of the seventh EURALEX International Congress, Goteborg, Sweden, 1996), describes the use of natural language processing to extract phrases by means of syntactical structures.


“Word Association Norms, Mutual Information and Lexicography” by Church, K. and Hanks, P. (appeared in Computational Linguistics, Volume 16, Number 1, 1991), describes the use of statistical occurrence measures for the purposes of phrase extraction.


“TextTiling: Segmenting Text into Multi-Paragraph Subtopic Passages” by Hearst, M. (appeared in Computational Linguistics, Volume 23, Number 1, 1997), “Advances in Domain Independent Linear Text Segmentation” by Choi F. (appeared in Proceedings of the North American Chapter of ACL, 2000), “SeLeCT: A Lexical Cohesion Based News Story Segmentation System” by Stokes N., Carthy J., and Smeaton A. F. (appeared in Journal of AI Communications, Volume 17, Number 1, 2004) describe the methodologies based on linguistic techniques such as lexical cohesion for text segmentation.


“Query Expansion Using Local and Global Document Analysis” by Xu J. and Croft W. B. (appeared in Proceedings of the Nineteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1996), and “Text Segmentation by Topic” by Ponte J. M. and Croft W. B. (appeared in Proceedings of the First European Conference on Research and Advanced Technology for Digital Libraries, 1997) describe the approaches based on local context analysis.


“Segmenting Conversations by Topic, Initiative, and Style” by Ries K. (appeared in Proceedings of ACM SIGIR'01 Workshop on Information Retrieval Techniques for Speech Applications, Louisiana, 2001) describes the segmentation of speech recognizer transcripts based on speaker initiative and style to achieve topical segmentation.


“Automatic extraction of key sentences from oral presentations using statistical measure based on discourse markers” by Kitade T., Nanjo H., and Kawahara T. (appeared in Proceedings of International Conference on Spoken Language Processing (ICSLP), 2004) describes the use of discourse markers at the beginning of sections in presentations for detecting section boundaries.


“Domain-independent Text Segmentation Using Anisotropic Diffusion and Dynamic Programming” by Ji X. and Zha H. (appeared in Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2003) describes a domain-independent text segmentation method that identifies the boundaries of topic changes in long text documents and/or text streams based on anisotropic diffusion technique applied to an image representation of sentence-distance matrix.


“Minimum Cut Model for Spoken Lecture Segmentation” by Malioutov I. and Barzilay R. (appeared in Proceedings of the 21st International Conference on Computational Linguistics of the Association for Computational Linguistics, 2006) describes the task of unsupervised lecture segmentation and applies graph partitioning to identify topic sentences. The similarity computation presented is based on exponential cosine similarity.


The known systems do not address the various issues related to text segmentation including the dependence on lexicon for enforcing syntactic and semantic structures, determining of inter-sentence relationship based on bi-directional (forward and reverse) analysis, assessing of segment cohesiveness, and merging of related segments. The present invention provides a system for addressing these issues in order to achieve more effective text segmentation.


SUMMARY OF THE INVENTION

The primary objective of the invention is to determine a plurality of sentence segments given an input text of plurality of sentences.


One aspect of the present invention is to determine a weighted graph based on bi-directional analysis of a plurality of sentences.


Another aspect of the present invention is to determine cohesiveness of a sentence segment.


Yet another aspect of the present invention is to determine a plurality of sentence segments based on segment cohesiveness.


Another aspect of the present invention is to determine a plurality of preamble segments given a plurality of sentence segments.


Yet another aspect of the present invention is to determine a plurality of postamble segments given a plurality of sentence segments.


Another aspect of the invention is to merge a preamble segment with a sentence segment of a plurality of sentence segments.


Yet another aspect of the invention is to merge a postamble segment with a sentence segment of a plurality of sentence segments.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 depicts an overview of the text segmentation system.



FIG. 2 depicts an illustrative input text.



FIG. 3 depicts an illustrative frequency matrix.



FIG. 4 provides an algorithm for dependency graph generation.



FIG. 5 depicts an illustrative dependency graph.



FIG. 6 provides an algorithm for cohesiveness based graph segmentation.



FIG. 6a provides an algorithm for segment cohesiveness analysis.



FIG. 6b provides an algorithm for segment grouping.



FIG. 7 depicts illustrative graph segments.



FIG. 8 depicts illustrative text segments.



FIG. 9 depicts illustrative merged text segments.





DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Text segmentation has been widely applied in topic identification, text summarization, categorization, information retrieval and dissemination. The plain text under consideration does not contain any title or annotation to hint about the subtopics discussed. It is assumed that sentences of the plain text are separated by periods; however there are no paragraph demarcations. Each sentence needs to be parsed, in an iterative manner, to check if some incoherence exists between sentences. Continuity of a topic, discussed in consecutive sentences, can be identified by means of certain frequency measures of the constituent words across the sentences. The task is analogous to shot detection in video.



FIG. 1 depicts an overview of the text segmentation system. The main objective of the present invention is to analyze an input text and divide it into cohesive text segments. It is further envisaged to achieve this objective without a lexicon providing information about syntactic and semantic substructures. The text under consideration is a set of sentences. An important first step is tokenization (100). In the tokenization phase, the input sentences are decomposed into tokens that are either words or atomic terms. The noise words are filtered out based on a list of stop-words. This list is customized to exclude pronoun related words. The second step is stemming (102). In order to eliminate duplicate words, stemming is performed, resulting in the root words. Gramming is done to correct spelling mistakes or errors due to speech-to-text conversion. This is performed by evaluating trigrams, that is, sets of three consecutive characters. The third step is to build the Frequency Matrix (104). The Frequency Matrix consists of the frequencies of tokens in different sentences. The sequence of sentences is maintained in the same order as they occur in the given text. The tokens are clubbed together based on the (syntactic) relationship of words.
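
The first three steps can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the stop-word list, the suffix-stripping `stem` function, and the function names are assumptions made for illustration, pronouns are simply left out of the illustrative stop-word list, and trigram-based gramming is omitted. Sentences are split on periods, as the description assumes.

```python
import re

# Illustrative stop-word list (an assumption); pronouns are deliberately absent.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "are", "was"}

def tokenize(sentence):
    """Lower-case a sentence, split it into word tokens, and drop stop-words."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]

def stem(word):
    """Naive suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def frequency_matrix(text):
    """Return (sentences, vocabulary, matrix); matrix[i][k] counts token k in sentence i."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    rows = [[stem(w) for w in tokenize(s)] for s in sentences]
    vocab = sorted({t for row in rows for t in row})
    column = {t: c for c, t in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in rows]
    for r, row in enumerate(rows):  # sentence order is preserved
        for t in row:
            matrix[r][column[t]] += 1
    return sentences, vocab, matrix
```

Calling `frequency_matrix("Oil prices rose. The bank raised rates. Oil exports dropped.")` yields three rows, one per sentence, with the token `oil` counted in the first and third rows.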


Boundary detection involves constructing a graph representation (106) of a given set of sentences with edge weights that depict syntactic and semantic relationships among sentences. The dependency graph is then analyzed and segmented (108) based on temporal characteristics and, where appropriate, the identified segments are grouped (110) based on spatial characteristics.



FIG. 2 depicts an illustrative input text. Note that the input text is general news text and is based on the text available at the site:


http://kdd.ics.uci.edu/databases/reuters21578/reuters21578.html


The sentences in the input text are demarcated by a period.



FIG. 3 depicts an illustrative frequency matrix. The depicted Frequency Matrix is related to the input text depicted in FIG. 2. Please note that only a subset of tokens is provided for illustrative purposes.



FIG. 4 provides an algorithm for dependency graph generation. Graphically, each sentence is represented by a node. The weights on directed edges between the nodes indicate the degree of coherence between the corresponding sentences. These values are derived from the m × n frequency matrix, where m is the number of sentences and n is the number of tokens or filtered words in the given text. Given m sentences, there could be at most (m−1) boundaries. These are initially referred to as candidate boundaries. We assess the strengths of these boundaries B1, B2, . . . , Bm−1 by means of linking succeeding sentences with similar tokens. This is based on the neighborhood effect and is defined as (FM(k)*log(M/Dj))/Ti, wherein FM(k) is the frequency of token k in a sentence, M is the total number of sentences, Dj is the distance to a similar token in a subsequent sentence, and Ti is the number of tokens in the sentence. The intuition behind the above formulation is that similar tokens present in neighboring sentences should be given higher weightage than those in sentences that are far apart. This is achieved by incorporating the distance based on the sequence of a sentence in the text. In the above approach, the distance measured by linking forward needs to be unidirectional; otherwise the effect would be reduced. Reverse linking is considered separately, by replacing Dj with the distance to a preceding sentence with the token. Typically, sentences containing anaphoric references are assigned higher weighted links to preceding sentences with the entities, while those with cataphoric references are assigned higher weighted links to the succeeding sentences. Hence, directions are important in distinguishing the segments. The final weight of an edge is computed based on both the forward and the reverse linking.
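
The weighting described above can be sketched in Python as follows. This is an illustrative reading, not the algorithm of FIG. 4: the function names are hypothetical, the distance Dj is taken as the difference in sentence positions, and averaging the forward and reverse contributions into one edge weight is an assumption (the text only says the final weight is based on both).

```python
import math

def proportional_weight(fm, i, j, k):
    """FM(k) * log(M / D) / T for token k linking sentence i to sentence j."""
    m = len(fm)
    if i == j or fm[i][k] == 0 or fm[j][k] == 0:
        return 0.0  # token k must occur in both sentences
    t_i = sum(fm[i])  # number of tokens in sentence i
    return fm[i][k] * math.log(m / abs(i - j)) / t_i

def directed_weight(fm, i, j):
    """Sum the proportional weights over all tokens shared by sentences i and j."""
    return sum(proportional_weight(fm, i, j, k) for k in range(len(fm[0])))

def edge_weights(fm):
    """Combine forward (i to j) and reverse (j to i) linking for every pair i < j."""
    m = len(fm)
    weights = {}
    for i in range(m):
        for j in range(i + 1, m):
            forward = directed_weight(fm, i, j)
            reverse = directed_weight(fm, j, i)
            combined = (forward + reverse) / 2  # assumed combination rule
            if combined > 0:
                weights[(i, j)] = combined
    return weights
```

Because log(M/D) shrinks as D grows, two sentences that share a token receive a larger weight when they are adjacent than when they are far apart, which is the neighborhood effect the description relies on.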



FIG. 5 depicts an illustrative dependency graph for the input text depicted in FIG. 2. Observe that the edge weights are normalized and that almost the entire set of sentences of the input text appears as a single connected graph.



FIG. 6 provides an algorithm for cohesiveness based graph segmentation. In order to identify segment boundaries, it is required to cut the single connected graph so that multiple segments present in the input text can be determined. The graph segmentation leads directly to input text segmentation as each node in the graph represents an input sentence. While edge weights of a graph play a role in the segmentation process, it is required to assess the cohesiveness of the sentences represented by the graph in order to decide whether the graph needs to be further segmented or not. Successive segmentation of the graph leads to smaller and smaller subgraphs, and finally, the sentences represented by each subgraph that remains form a cohesive text segment.
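
The successive segmentation can be sketched as a recursive procedure in Python. This is a hedged illustration rather than the algorithm of FIG. 6: the cohesiveness measure is passed in as a function, `mean_weight` is only a placeholder for it, and removing the weakest edges one at a time until the graph splits is an assumed cutting strategy.

```python
def components(nodes, edges):
    """Connected components of an undirected graph, each as a sorted list of nodes."""
    adjacent = {n: set() for n in nodes}
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, part = [start], []
        while stack:
            current = stack.pop()
            part.append(current)
            for neighbor in adjacent[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        result.append(sorted(part))
    return result

def mean_weight(nodes, edges):
    """Placeholder cohesiveness measure (an assumption): the average edge weight."""
    return sum(edges.values()) / len(edges) if edges else 0.0

def segment(nodes, edges, cohesiveness, threshold):
    """Split a graph until every remaining subgraph is cohesive enough."""
    sub = {e: w for e, w in edges.items() if e[0] in nodes and e[1] in nodes}
    parts = components(nodes, sub)
    if len(parts) == 1:
        if len(nodes) < 2 or cohesiveness(nodes, sub) >= threshold:
            return [sorted(nodes)]
        # Not cohesive: drop the weakest edges until the graph falls apart.
        while len(components(nodes, sub)) == 1:
            del sub[min(sub, key=sub.get)]
        parts = components(nodes, sub)
    segments = []
    for part in parts:
        segments.extend(segment(part, sub, cohesiveness, threshold))
    return sorted(segments)
```

For a chain of four sentences with edge weights 0.9, 0.1 and 0.8 and a threshold of 0.7, the weak middle edge is cut and the result is the two segments `[0, 1]` and `[2, 3]`.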



FIG. 6a provides an algorithm for segment cohesiveness analysis. The assessment of cohesiveness of a graph is based on the extent of support that each sentence represented by a node of the graph provides for the rest of the sentences represented by the graph. This support is computed based on the shortest path between two nodes in the graph and the edge weights of this shortest path. The cohesiveness is then computed based on the normalized pair-wise overall weight of the shortest weighted path across all of the nodes of the graph.
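
One possible reading of this computation, following the quantities named in claim 8, is sketched below in Python. It is an assumption-laden illustration: edge weights are treated directly as path lengths, shortest paths are found with the Floyd-Warshall algorithm, and the function and parameter names are hypothetical.

```python
import math

def cohesiveness(nodes, edges, total_sentences):
    """Sum of pairwise shortest-path weights, divided by the total edge weight
    of the graph and by the number of sentences in the whole input text."""
    index = {n: p for p, n in enumerate(nodes)}
    size = len(nodes)
    dist = [[0.0 if a == b else math.inf for b in range(size)] for a in range(size)]
    for (a, b), weight in edges.items():
        i, j = index[a], index[b]
        dist[i][j] = dist[j][i] = min(dist[i][j], weight)
    # Floyd-Warshall: relax every pair through every intermediate node.
    for k in range(size):
        for i in range(size):
            for j in range(size):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    total_weight = sum(edges.values())
    if total_weight == 0:
        return 0.0
    # Each unordered pair (i, j) with j > i is counted once; unreachable pairs are skipped.
    pair_sum = sum(
        dist[i][j]
        for i in range(size)
        for j in range(i + 1, size)
        if dist[i][j] < math.inf
    )
    return pair_sum / total_weight / total_sentences
```

For three sentences linked in a chain with weights 1.0 and 2.0, the pairwise path weights are 1.0, 2.0 and 3.0, the total edge weight is 3.0, and the resulting cohesiveness for a three-sentence text is 2/3.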



FIG. 6b provides an algorithm for segment grouping. Segment grouping is needed because there are inter-segment relationships that are not based on segmental neighborhood properties. A distinct kind of segment grouping that has practical applications is based on identifying three portions in the input text: Preamble Text Segments (also called header segments), Body Text Segments (also called main segments), and Postamble Text Segments (also called footer segments). A preamble segment is then associated with one or more body segments and, similarly, a postamble segment is associated with one or more body segments. With reference to FIG. 6b, preamble identification is based on the observation that successive segments in the preamble are of similar size (that is, number of sentences in a segment) and differ drastically in size from the segments in the body. A similar distinction holds between postamble segments and body segments as described in FIG. 6b. FIG. 6b also describes spatial merging, in which a preamble or postamble segment is merged with one or more body segments based on the term co-occurrence between the two segments under consideration. Note that as this spatial merging is a special case of general segment merges, an underlying assumption is that a preamble or postamble segment gets merged with at least one body segment.
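
The grouping step can be sketched in Python along the following lines. The sketch is illustrative only: the threshold values, the function names, and the choice to normalize co-occurrence by the smaller of the two token sets are assumptions, and each header or footer is attached to its single best-matching body segment.

```python
def leading_run(sizes, max_size, max_diff):
    """Indices of the initial run of small, similarly sized segments."""
    if not sizes or sizes[0] >= max_size:
        return []
    run = [0]
    for idx in range(1, len(sizes)):
        if abs(sizes[idx] - sizes[run[-1]]) >= max_diff:
            break  # size changes drastically: the body starts here
        run.append(idx)
    return run

def split_roles(segments, max_size=3, max_diff=2):
    """Classify segment indices as preamble, body, or postamble by segment size."""
    sizes = [len(s) for s in segments]
    preamble = leading_run(sizes, max_size, max_diff)
    last = len(sizes) - 1
    # The postamble is found by scanning the same way from the end of the text.
    postamble = sorted(last - i for i in leading_run(sizes[::-1], max_size, max_diff))
    postamble = [i for i in postamble if i not in preamble]
    body = [i for i in range(len(sizes)) if i not in preamble and i not in postamble]
    return preamble, body, postamble

def cooccurrence(tokens_a, tokens_b):
    """Shared distinct tokens, normalized by the smaller token set (assumed)."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0

def best_body(segment_tokens, index, body):
    """Body segment whose tokens co-occur most with the given header or footer."""
    return max(body, key=lambda b: cooccurrence(segment_tokens[index], segment_tokens[b]))
```

With segment sizes 2, 1, 6, 5, 7, 1, 2, the first two segments are classified as preamble, the last two as postamble, and the three large segments in between as body. If every segment has a similar size, this simple rule would classify all of them as preamble, so a practical version would need a guard for that case.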



FIG. 7 depicts illustrative graph segments. Observe how the header, main, and footer segments stand out after the process of segmentation. One of the header segments involves sentences 1 and 2, and sentences 8 through 13 form a main segment.



FIG. 8 depicts illustrative text segments. In the illustration, header, main, and footer segments are also indicated.



FIG. 9 depicts illustrative merged text segments. Note that a header segment, Segment H1, is merged with a main segment, Segment A. Similarly, Segment H2 is merged with Segment B, Segment F1 is merged with Segment A, and Segment F2 is merged with Segment C.


Thus, a system and method for text based analysis of automated speech recognizer transcripts is disclosed. Although the present invention has been described particularly with reference to the figures, it will be apparent to one of ordinary skill in the art that the present invention may appear in any number of systems that perform textual analysis for segmentation and segment merging. It is further contemplated that many changes and modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the present invention.

Claims
  • 1. A text segmentation system, TSS, for segmenting a plurality of sentences, said system comprising: (a) Dependency Graph Construction Element for determining a weighted-graph based on bi-directional analysis of said plurality of sentences; (b) Graph Segmentation Element for determining a plurality of sentence segments of said plurality of sentences based on segment cohesiveness; and (c) Graph Merging Element for grouping said plurality of sentence segments.
  • 2. The system of claim 1, wherein said Dependency Graph Construction Element comprises a procedure to compute proportional edge weight between two nodes (first node and second node) of a graph with respect to a token K, wherein said first node is associated with sentence I of said plurality of sentences and said second node is associated with sentence J of said plurality of sentences, said computing comprises: computing FMIK as the number of occurrences of said token K in said sentence I, computing FMJK as the number of occurrences of said token K in said sentence J, computing M as the number of sentences in said plurality of sentences, computing TI as the number of tokens in said sentence I, and computing proportional edge weight PEIJ as FMIK*LOG(M*|I−J|)/TI if FMJK is >0.
  • 3. The system of claim 2, wherein said Dependency Graph Construction Element further comprises a procedure to compute proportional forward edge weight with respect to a node I of a graph, a token K, and a plurality of nodes of said graph, said computing comprises: computing sum of proportional edge weight with respect to said node I and each of node J of said plurality of nodes of said graph, wherein J is >I.
  • 4. The system of claim 2, wherein said Dependency Graph Construction Element further comprises a procedure to compute proportional reverse edge weight with respect to a node I of a graph, a token K, and a plurality of nodes of said graph, said computing comprises: computing sum of proportional edge weight with respect to said node I and each of node J of said plurality of nodes of said graph, wherein J is <I.
  • 5. The system of claim 2, wherein said Dependency Graph Construction Element further comprises a procedure to compute forward edge weight with respect to a node I of a graph, said computing comprises: computing sum of proportional forward edge weight with respect to each of token K of sentence I associated with said node I, and a plurality of nodes of said graph.
  • 6. The system of claim 2, wherein said Dependency Graph Construction Element further comprises a procedure to compute reverse edge weight with respect to a node I of a graph, said computing comprises: computing sum of proportional reverse edge weight with respect to each of token K of sentence I associated with said node I, and a plurality of nodes of said graph.
  • 7. The system of claim 2, wherein said Dependency Graph Construction Element further comprises a procedure to compute edge weight between a node I and a node J of a graph, said computing comprises: computing FIJ as forward edge weight between said node I and said node J, computing RIJ as reverse edge weight between said node I and said node J, computing M as the number of sentences in said plurality of sentences, computing MX as the maximum number of tokens in any sentence of said plurality of sentences, computing MN as the minimum number of tokens in any sentence of said plurality of sentences, computing MIN as LOG(M/(M−1))/MX, computing MAX as LOG(M)/MN, and computing edge weight between said node I and said node J as (FIJ+RIJ)/(2*(MAX−MIN)).
  • 8. The system of claim 1, wherein said Graph Segmentation Element comprises a procedure to compute cohesiveness of a sentence segment of said plurality of sentence segments, wherein said computing comprises: determining G as a graph associated with said sentence segment, computing M as the number of sentences in said plurality of sentences, computing TW as the sum of edge weights of said G, computing W as the sum of edge weights of the weighted shortest path between a vertex of said G and another vertex of said G, computing RS as the sum of W between vertex I of said G and each of vertex J (>I) of said G, computing TS as sum of RS associated with each vertex V of said G divided by TW, and computing cohesiveness of said G as TS divided by M.
  • 9. The system of claim 8, wherein said Graph Segmentation Element further comprises a procedure to segment a sentence segment of said plurality of sentence segments, wherein said segmenting comprises: computing the cohesiveness of said sentence segment, determining a plurality of edges of a graph associated with said sentence segment with least edge weights, and partitioning said graph based on said plurality of edges if the cohesiveness of said graph is < a predefined threshold.
  • 10. The system of claim 1, wherein said Graph Merging Element comprises a procedure to determine a plurality of preamble segments of said plurality of sentence segments, wherein said determining comprises: determining the first segment of said plurality of segments and making the first segment part of said plurality of preamble segments if the size of the first segment is less than a predefined threshold, determining the last segment PS1 of said plurality of preamble segments, determining SIZE1 as the number of sentences in said last segment PS1, determining a segment PS2 of said plurality of segments that is successor to said segment PS1, determining SIZE2 as the number of sentences in said segment PS2, and making said segment PS2 part of said plurality of preamble segments if |SIZE2−SIZE1| < a predefined threshold.
  • 11. The system of claim 10, wherein said Graph Merging Element further comprises a procedure to determine a plurality of postamble segments of said plurality of sentence segments, wherein said determining comprises: determining the last segment of said plurality of segments and making the last segment part of said plurality of postamble segments if the size of said last segment is less than a predefined threshold, determining the first segment PS1 of said plurality of postamble segments, determining SIZE1 as the number of sentences in said segment PS1, determining a segment PS2 of said plurality of segments that is predecessor to said segment PS1, determining SIZE2 as the number of sentences in said segment PS2, and making said segment PS2 part of said plurality of postamble segments if |SIZE2−SIZE1| < a predefined threshold.
  • 12. The system of claim 10, wherein said Graph Merging Element further comprises a procedure to merge a preamble segment with a sentence segment of said plurality of sentence segments, wherein said merging comprises: computing co-occurrence frequency count of said preamble segment with said sentence segment of said plurality of sentence segments, and merging said preamble segment with said sentence segment of said plurality of sentence segments if the normalized said frequency count exceeds a predefined threshold.
  • 13. The system of claim 10, wherein said Graph Merging Element comprises a procedure to merge a postamble segment with a sentence segment of said plurality of sentence segments, wherein said merging comprises: computing co-occurrence frequency count of said postamble segment with said sentence segment of said plurality of sentence segments, and merging said postamble segment with said sentence segment of said plurality of sentence segments if the normalized said frequency count exceeds a predefined threshold.