The present invention relates to a computer-controlled method of capturing and structuring information from a meeting, and to a computer-controlled system programmed to perform such a method.
A diagram-based method of capturing an integrated design information space is described in Aurisicchio, M. and Bracewell, R., 2013, "Capturing an integrated design information space with a diagram-based approach", Journal of Engineering Design, Vol. 24, pp. 397-428, ISSN 0954-4828 (hereinafter referred to as "Aurisicchio"). Various diagrams are described, each comprising a plurality of nodes connected by links.
Conventional methods of capturing information from a meeting include taking minutes (where detail and context are often lost), making a transcription, or making an audio/video recording. A transcription and a recording do not apply any structure to the captured information, and require a reviewer to watch or listen to the whole meeting in order to retrieve information.
A first aspect of the invention provides a method according to claim 1. A further aspect of the invention provides a system according to claim 10.
The invention provides a computer-controlled method of capturing and structuring information from a meeting. Audio data is captured with one or more microphones, and optionally also with one or more cameras. The information is then structured by storing information timestamps associated with the audio data, generating, on the basis of diagram inputs, a diagram which reflects the content of the meeting, and generating event timestamps associated with the diagram inputs. The diagram provides the means to structure the otherwise unstructured audio/text as a form of knowledge model, which describes what the audio/text means and where the data fits in the context of the overall meeting. The timestamps enable a reviewer to use the diagram as a tool to find the parts of the meeting which are of interest to him (i.e. to contextualise the unstructured data), without having to listen to the whole meeting, and also to extract information and knowledge from the discussion for future re-use.
Each information timestamp indicates a time associated with the audio data. For instance a stream of information timestamps may be generated automatically as the audio data is recorded. Alternatively the audio data is partitioned into distinct utterances, and each information timestamp is an utterance timestamp indicating a time of receipt of a respective utterance (for instance the beginning or end of the utterance).
The diagram is generated in accordance with a series of diagram inputs received from a human operator via an input device. These diagram inputs are typically received during the course of the meeting, the human operator being a participant in the meeting. Alternatively the diagram inputs may be received after the meeting, the human operator using the audio data to listen to what was said in the meeting and creating the diagram accordingly.
Each event timestamp indicates a time associated with a diagram input which creates, edits or deletes a node or link. If the diagram inputs are received during the course of the meeting, then each event timestamp may indicate a real time of receipt of an associated diagram input. If the diagram inputs are received after the meeting, then each event timestamp may indicate a virtual time of receipt of an associated diagram input within the virtual timeframe of the meeting being played back to the human operator.
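Purely by way of illustration, the event-timestamping described above may be sketched in Python as follows. The class and field names are illustrative assumptions rather than part of the claimed method; for diagram inputs received after the meeting, the virtual playback time would be recorded in place of the wall-clock offset.

```python
import time
from dataclasses import dataclass, field


@dataclass
class DiagramEvent:
    """One diagram input: the creation, editing or deletion of a node or link."""
    action: str      # "create", "edit" or "delete"
    element_id: int  # identifier of the node or link affected
    timestamp: float # event timestamp, in seconds from the start of the meeting


@dataclass
class EventLog:
    """Records an event timestamp for each diagram input as it is received."""
    meeting_start: float = field(default_factory=time.time)
    events: list = field(default_factory=list)

    def record(self, action: str, element_id: int) -> DiagramEvent:
        # Real-time capture: the event timestamp indicates the real time of
        # receipt of the diagram input, relative to the start of the meeting.
        event = DiagramEvent(action, element_id, time.time() - self.meeting_start)
        self.events.append(event)
        return event
```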
The diagram may comprise one of the diagrams described in Aurisicchio, or any other diagram comprising a plurality of nodes connected by links.
Various preferred but non-essential features of the invention are set out in the dependent claims.
Embodiments of the invention will now be described with reference to the accompanying drawings, in which:
A speech-to-text engine 9 is programmed to automatically convert the audio data 6 captured by the microphone array 2 into text data 10, which provides a text transcription of the audio data 6 and is also stored on the data server 8. This automatic text conversion may be performed in real time during the meeting phase, or after the meeting.
The engine 9 not only converts the audio data 6 into text, but also automatically partitions the text data 10 into distinct blocks or “utterances”, each utterance containing text from only a single one of the participants 5. The engine 9 generates and stores in the server 8 a single information timestamp for each utterance, indicating a time of receipt of the start of the utterance. An information timestamp associated with an utterance is referred to below as an “utterance timestamp”.
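By way of example only, an utterance and its utterance timestamp may be represented as follows (the field names and the example text are invented for illustration):

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """A block of text data attributed to a single participant."""
    speaker_id: int   # the participant to whom the utterance is attributed
    text: str         # the transcribed text of the utterance
    timestamp: float  # utterance timestamp: time of receipt of the start of
                      # the utterance, in seconds from the start of the meeting


# The stored transcript is then simply a chronologically ordered list of
# utterances:
transcript = [
    Utterance(speaker_id=1, text="The casing keeps cracking under load.", timestamp=6.0),
    Utterance(speaker_id=2, text="Could we switch to a tougher alloy?", timestamp=14.5),
]
```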
The speech-to-text engine 9 uses a speaker diarisation technique which enables each utterance to be attributed to a single one of the participants 5. This can be done through the use of beamforming techniques, as described for example in WO-A-2013/132216 and in Zwyssig, E., Renals, S. and Lincoln, M., 2012, "On the effect of SNR and superdirective beamforming in speaker diarisation in meetings", ICASSP, pp. 4177-4180, IEEE. Each utterance starts when a new participant starts to speak, and ends when another participant starts to speak.
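By way of illustration only, the attribution of audio to participants via beamforming can be reduced to the following deliberately crude sketch; the function names are illustrative assumptions, and a real diarisation system such as those cited above is considerably more sophisticated:

```python
import numpy as np


def attribute_frames(beam_energies: np.ndarray) -> np.ndarray:
    """Attribute each audio frame to the participant whose steered beam has
    the highest output energy.

    beam_energies has shape (n_frames, n_participants): column j holds the
    output energy of a beam steered at participant j.
    """
    return beam_energies.argmax(axis=1)


def utterance_boundaries(frame_speakers: np.ndarray) -> list:
    """Find the frame indices at which new utterances start: an utterance
    starts when a new participant starts to speak, and ends when another
    participant starts to speak."""
    boundaries = [0]
    for i in range(1, len(frame_speakers)):
        if frame_speakers[i] != frame_speakers[i - 1]:
            boundaries.append(i)
    return boundaries
```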
In an alternative embodiment, the text transcription and partitioning of the text data into utterances may be performed manually by a human (rather than automatically by the engine 9) either during or after the meeting phase.
One, or possibly more than one, of the human participants 5 acts as a draftsman, providing diagram inputs to the user machine 4 during the course of the meeting in order to generate a diagram reflecting the issues discussed in the meeting. The diagram is generated by the user machine 4, stored in the server 8, and displayed on client viewers 11 as it is created during the meeting. An example of a diagram is shown in
The diagram displayed by the data server on the client viewers 11 changes during the course of the meeting phase in response to the diagram inputs to the user machine 4 so that it has a plurality of intermediate forms and a final form. The snapshot shown in
Node 40 is a "problem" node with a graphic element 41 (indicating that the node is a "problem" node) and a text element 42. Node 43 is connected to node 40 by a link 44. Node 43 is a "solution" node with a graphic element 45 (indicating that the node is a "solution" node) and a text element 46. Node 47 is connected to node 40 by a link 48. Node 47 is also a "solution" node with a graphic element 49 (indicating that the node is a "solution" node) and a text element 50.
Node 51 is a “pro” node indicating an advantage associated with the solution node 43, to which it is connected by a link 52. Node 51 has a graphic element 53 (indicating that the node is a “pro” node) and a text element 54. Node 55 is a “pro” node indicating an advantage associated with the solution node 47, to which it is connected by a link 56. Node 55 has a graphic element 57 (indicating that the node is a “pro” node) and a text element 58.
Node 60 is a “con” node indicating a disadvantage associated with the solution node 43, to which it is connected by a link 61. Node 60 has a graphic element 62 (indicating that the node is a “con” node) and a text element 63. Node 64 is a “con” node indicating a disadvantage associated with the solution node 47, to which it is connected by a link 65. Node 64 has a graphic element 66 (indicating that the node is a “con” node) and a text element 67.
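By way of example only, the node and link structure of the diagram described above may be modelled as follows. The integer identifiers reuse the reference numerals of the figures, and the text elements (not given in the description) are left as placeholders:

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    node_type: str  # "problem", "solution", "pro" or "con"; selects the
                    # graphic element which indicates the type of the node
    text: str       # the node's text element


@dataclass
class Link:
    link_id: int
    source: int     # node_id at one end of the link
    target: int     # node_id at the other end of the link


# The example diagram described above, expressed in this model:
nodes = [
    Node(40, "problem", "..."), Node(43, "solution", "..."),
    Node(47, "solution", "..."), Node(51, "pro", "..."),
    Node(55, "pro", "..."), Node(60, "con", "..."), Node(64, "con", "..."),
]
links = [
    Link(44, 40, 43), Link(48, 40, 47), Link(52, 43, 51),
    Link(56, 47, 55), Link(61, 43, 60), Link(65, 47, 64),
]
```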
In a retrieval phase after the meeting phase, a review tool shown in
The scroll bar 33 has a slider 37 which can be moved by a user up and down the scroll bar in order to move in time to a particular point in the virtual timeframe of the meeting. The diagram snapshot that is shown for that point in time is synchronised with the audio/video playback and the display of the speech transcription. In
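One possible way of reconstructing the diagram snapshot for a selected point in time is sketched below, purely by way of illustration. It replays the event log sketched earlier; the `payload` attribute, assumed here to describe the node or link affected by each event, is an illustrative extension of that sketch:

```python
def snapshot_at(events, t):
    """Reconstruct the intermediate form of the diagram at time t by
    replaying every diagram input whose event timestamp does not exceed t.

    `events` is a chronologically ordered list of diagram events.
    """
    elements = {}
    for event in events:
        if event.timestamp > t:
            break  # later diagram inputs are not yet part of the snapshot
        if event.action in ("create", "edit"):
            elements[event.element_id] = event.payload
        elif event.action == "delete":
            elements.pop(event.element_id, None)
    return elements
```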
During the course of the meeting, the diagram evolves through various forms, and
The text transcript pane 32 displays text to a human reviewer via the client viewer 11 in a manner which will now be described in further detail with reference to
If the reviewer is interested in the problem node 40 then he selects that node by clicking on it via the diagram pane 31, and the text transcript pane is updated as shown in
The node 40 has two diagram inputs associated with it: a creation event with an event timestamp of 00:00:06, and an edit event with an event timestamp of 00:34:20. The text transcript pane 32 shown in
The displayed text gives the reviewer only a rough idea of each utterance, since it displays only an extract of the text of the utterance. If the reviewer is interested in more information about the node, then he can either click on a selected one of the utterances displayed in the text transcript pane 32 (to be presented with a full transcript of the selected utterance via the pane 32, and/or a video recording of that utterance via the video/audio pane 30, and/or an audio recording of that utterance via the loudspeaker), or he can click a play button 38 on the video/audio pane 30. If he clicks the play button 38 then the video/audio pane 30 sequentially outputs the video data 7 and/or the audio data 6 associated with all seven utterances shown in
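The selection of utterances for display against a selected node may, purely by way of example, be implemented as follows. The 30-second window is an illustrative assumption: the description requires only some notion of an utterance timestamp being "close to" an event timestamp:

```python
def utterances_near(transcript, event_timestamps, window=30.0):
    """Return the utterances whose utterance timestamps fall within a fixed
    window around any of the selected node's event timestamps."""
    return [u for u in transcript
            if any(abs(u.timestamp - t) <= window for t in event_timestamps)]


# Example for the problem node 40, whose diagram inputs have event
# timestamps 00:00:06 and 00:34:20 (converted here to seconds):
nearby = utterances_near(transcript, [6.0, 34 * 60 + 20])
```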
If the reviewer is interested in the solution node 43 then he clicks on that node and is then presented with the text transcript pane shown in
If the reviewer is interested in the “con” node 60 then he clicks on that node and is presented with the text transcript pane shown in
Thus the review tool of
In the examples given above a reviewer has clicked on a node to be presented with information associated with that node. Alternatively the reviewer can click on a link to be presented with information associated with that link.
If the reviewer is interested in the problem node 40 then he clicks on that node and is presented with the text transcript pane shown in
The displayed text may give the reviewer only a rough idea of the speech, and if he is interested in more information about the node, then he can either click on one of the text boxes displayed in the text transcript pane 32 (to be presented with a full transcript of that five second section of text via the pane 32 and/or a video of that five second section via the video/audio pane 30) or he can click a play button 38 on the video/audio pane 30. If he clicks the play button 38 then the video/audio pane 30 sequentially displays the video data and audio data associated with all twenty seconds shown in
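The fixed-length sections of audio/video described above might be derived from the event timestamps as in the following minimal sketch; the placement of each section relative to its timestamp, and the assumption of one section per event timestamp, are illustrative design choices rather than requirements:

```python
def media_sections(event_timestamps, section_length=5.0):
    """Return (start, end) intervals of the audio/video data to be replayed
    for a selected element: one fixed-length section starting at each of
    the element's event timestamps."""
    return [(t, t + section_length) for t in event_timestamps]
```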
Another way of utilising the review tool is to move the slider 37 to the right so that the diagram displayed in the diagram pane 31 follows a sequence of intermediate forms of the diagram as shown in
The text transcript pane 32 now displays utterances with utterance timestamps close to the selected point in time. For instance
In the example above the reviewer has selected a point in time by using the slider 37, rather than selecting a node. Alternatively the reviewer can use the slider 37 to select a node rather than a point in time, as follows. If the slider 37 is frozen at a point in time after the "con" node 60 has been created or edited but before the next diagram input, then the reviewer is deemed to have selected the currently displayed intermediate form of the diagram (and the "con" node 60 which is associated with it). So rather than displaying a transcript pane associated with a selected point in time, the transcript pane 32 instead displays all utterances associated with that selected "con" node 60 as shown in
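By way of illustration only, mapping a frozen slider position to a deemed node selection may be implemented as follows, again using the event log sketched earlier:

```python
def deemed_selection(events, t):
    """Map a frozen slider position at time t to a deemed selection: the
    node or link affected by the most recent diagram input at or before t.

    Returns None if the slider is frozen before the first diagram input.
    """
    current = None
    for event in events:
        if event.timestamp > t:
            break
        current = event.element_id
    return current
```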
Although the invention has been described above with reference to one or more preferred embodiments, it will be appreciated that various changes or modifications may be made without departing from the scope of the invention as defined in the appended claims.
Foreign application priority data: Number 1406070.1; Date: Apr 2014; Country: GB; Kind: national.