In modern business environments, presentations that cover even the most interesting subject matter can often be presented in a manner that appears dry and uninteresting to the audience. This can be especially true when the audience feels disengaged from the presenter and has little or no control over the content or flow of the subject matter being presented. In these instances, the presenter may appear to drone on and on with his or her monologue while the audience drifts off, paying less and less attention to the presentation as time goes on. This represents a significant waste of time and resources for both the audience and the presenter.
When audience members are free to ask questions of the presenter, a more lively discussion can result. However, especially when larger audiences are present, it is not always easy for all of the audience members to hear the questions being asked of the presenter. Thus, the ability to interact can make the presentation livelier for some audience members, but it does not benefit those audience members who cannot hear the questions, because even the most credible answer is meaningless to a listener who has not heard the question. This problem can be partially solved by placing microphones at strategic locations around the room; however, this requires some audience members to wait in line while other audience members monopolize the microphones.
At other times, multiple audience members may try to simultaneously speak, especially during more controversial portions of the presentation. The resulting unintelligible stream of voices can preclude any type of meaningful communication between the members of the audience and the presenter. This loss of control over the audience represents another source of dismay for both the audience members and the presenter.
In the context of the present invention, the term “content” encompasses a broad range of information constructs having meaning to at least some portion of audience 110 or to session manager 100. Thus, content may include predetermined material assembled by the session manager, such as slides that contain bulleted verbal information, bar graphs, charts, and so forth. Content may also include video clips accompanied by audio and other multimedia information. Content may also include any type of information posted on a publicly-available Internet website, or posted to a website available to only individuals within a particular commercial or government enterprise. Finally, content may also include text that corresponds to questions and comments spoken by the session manager or by one or more members of audience 110.
In the embodiment of
It is contemplated that one or more of a variety of wired or wireless interfaces may exist between microphones 120 and speaker recognition device 140. Thus, in some embodiments, each one of microphones 120 is mapped to a unique logical or physical communications channel by which the microphone conveys voice inputs to speaker recognition device 140. Thus, in an embodiment wherein each one of microphones 120 is wired to a particular input channel of speaker recognition device 140, the presence of a signal on the particular input channel may be sufficient for speaker recognition device 140 to determine that a particular member of audience 110 has begun speaking.
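The wired, one-channel-per-microphone arrangement can be illustrated with a minimal sketch. The channel table, the member names, the RMS level measure, and the threshold value below are assumptions made for illustration; the description only requires that the presence of a signal on a channel identify the speaker.

```python
from typing import Dict, Optional, Sequence

# Hypothetical wiring table: input channel index -> audience member.
CHANNEL_TO_MEMBER: Dict[int, str] = {0: "Alice", 1: "Bob", 2: "Carol"}


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one block of audio samples."""
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5


def active_speaker(blocks: Dict[int, Sequence[float]],
                   threshold: float = 0.05) -> Optional[str]:
    """Return the member wired to the loudest channel above the threshold.

    Returns None when no channel carries a signal above the threshold.
    """
    best_channel, best_level = None, threshold
    for channel, samples in blocks.items():
        level = rms(samples)
        if level > best_level:
            best_channel, best_level = channel, level
    if best_channel is None:
        return None
    return CHANNEL_TO_MEMBER.get(best_channel)
```

For example, a block in which only channel 1 carries a signal resolves to the member assigned to that channel, and a block of silence resolves to no speaker.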
In another embodiment, each of microphones 120 transmits bandlimited audio that represents the audience member's voice using frequencies in the range of 300 to 3000 Hz. This allows each microphone to be assigned a unique nonaudible signal (for example, less than 300 Hz or greater than 3000 Hz) that associates the audience member to speaker recognition device 140. The nonaudible signal can be a pure tone or a combination of tones. The unique nonaudible tone accompanies the voice transmission and is therefore present as long as the speaker's microphone continues to transmit.
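One way such an out-of-band identification tone might be detected is with the Goertzel algorithm, which measures signal power at a single frequency. The following is a sketch only: the 8000 Hz sampling rate, the tone frequencies, the microphone labels, and the power threshold are assumptions, and the description does not name a detection method.

```python
import math
from typing import Dict, Optional, Sequence

SAMPLE_RATE = 8000  # Hz; assumed sampling rate

# Hypothetical pilot tones above the 300-3000 Hz voice band, one per microphone.
PILOT_TONES: Dict[float, str] = {
    3200.0: "microphone A",
    3400.0: "microphone B",
    3600.0: "microphone C",
}


def goertzel_power(samples: Sequence[float], freq: float,
                   rate: int = SAMPLE_RATE) -> float:
    """Squared magnitude of the signal at freq, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def identify_microphone(samples: Sequence[float],
                        min_power: float = 1.0) -> Optional[str]:
    """Return the microphone whose pilot tone is strongest, or None if absent."""
    powers = {freq: goertzel_power(samples, freq) for freq in PILOT_TONES}
    best = max(powers, key=powers.get)
    if powers[best] < min_power:
        return None
    return PILOT_TONES[best]
```

Because the tone accompanies the voice for as long as the microphone transmits, the same check can be repeated on each successive block of samples.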
In another embodiment, speaker recognition device 140 possesses a number of logical addresses, with each logical address being assigned to a particular one of microphones 120. In another embodiment, speaker recognition device 140 analyzes an incoming voice signal and determines the speaker's identity by determining the audience member's Mel Frequency Cepstral Coefficients and comparing them with an appropriate database. In another embodiment, other attributes of the audience member's voice (e.g. spectrum, frequency range, pitch, cadence, interphoneme pause length, and so forth) are compared with a database that contains the attributes of the voices of all of the members of audience 110.
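The database comparison can be sketched as a nearest-neighbor lookup over stored voice profiles. This sketch assumes the feature vector (for example, averaged cepstral coefficients) has already been extracted; the enrolled names, the numeric profiles, the Euclidean distance measure, and the rejection threshold are all hypothetical.

```python
import math
from typing import Dict, Optional, Sequence

# Hypothetical enrolled profiles: member -> feature vector (made-up values).
ENROLLED: Dict[str, Sequence[float]] = {
    "Dave": [12.1, -3.4, 5.0, 0.8],
    "Ed": [9.7, 1.2, -2.3, 4.1],
    "Fay": [-4.0, 6.5, 2.2, -1.9],
}


def identify_speaker(features: Sequence[float],
                     database: Dict[str, Sequence[float]] = ENROLLED,
                     max_distance: float = 3.0) -> Optional[str]:
    """Return the enrolled member whose profile is closest to the features.

    Returns None when no profile lies within max_distance.
    """
    best_name, best_dist = None, max_distance
    for name, profile in database.items():
        dist = math.dist(features, profile)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name
```

The rejection threshold keeps a voice that matches no enrolled profile from being attributed to the nearest member by default.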
In the embodiment of
In another example, it may be desirable that each audience member be allowed to add content at least one time during the presentation. In this example, speaker priority manager 150 initially assigns all audience members an equal level of relative privilege. As each audience member adds content, the level of relative privilege of that audience member is reduced so that all of the audience members who have not yet spoken have priority over those members who have spoken. In a variation of this example, two or more audience members engaged in a healthy debate may have their levels of relative privilege alternately raised and lowered as each member takes a turn responding to the other's questions or comments.
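The equal-start, reduce-after-speaking policy can be sketched as follows. The initial level of 0.5, the fixed penalty of 0.1, and the function names are illustrative assumptions; the description does not specify how much the level is reduced.

```python
from typing import Dict, Iterable, List


def make_privileges(members: Iterable[str], initial: float = 0.5) -> Dict[str, float]:
    """Give every audience member the same starting level of relative privilege."""
    return {member: initial for member in members}


def record_contribution(privileges: Dict[str, float], member: str,
                        penalty: float = 0.1) -> None:
    """Lower a member's privilege after that member has added content."""
    privileges[member] = max(0.0, privileges[member] - penalty)


def next_speaker(privileges: Dict[str, float], requesting: List[str]) -> str:
    """Among members asking to speak, pick the one with the highest privilege."""
    return max(requesting, key=lambda member: privileges[member])
```

After one member has spoken, any member who has not yet spoken outranks that member when both request the floor.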
In another example, a member of audience 110 who has not previously spoken may be assigned a higher level of relative privilege (such as 0.75) until that member has spoken. In the event that the member's questions or comments lose relevance, that member's relative privilege level may be reduced, thus allowing other members of audience 110 to ask questions and provide comments. In another example, in which a presentation is being given to a charitable organization, those members who have recently made donations to the organization are given higher relative privilege than those members who have not made donations. Therefore, in the event that a donating member and a non-donating member speak simultaneously, the donating member's content will be converted to text and presented to audience 110, while the non-donating member's content is not presented until the donating audience member has finished speaking.
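The handling of simultaneous speakers in the donation example can be sketched as an ordering by relative privilege. The tuple layout, the privilege values, and the sample utterances are assumptions for illustration only.

```python
from typing import List, Tuple

# Each utterance: (member, relative privilege, converted text).
Utterance = Tuple[str, float, str]


def presentation_order(utterances: List[Utterance]) -> List[str]:
    """Order simultaneous utterances so higher-privilege content is shown first.

    Lower-privilege content is deferred, not discarded.
    """
    ranked = sorted(utterances, key=lambda u: u[1], reverse=True)
    return [f"{member}: {text}" for member, _, text in ranked]


simultaneous = [
    ("non-donor", 0.25, "What about costs?"),
    ("donor", 0.75, "How are the funds used?"),
]
for line in presentation_order(simultaneous):
    print(line)
```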
In another embodiment, the dynamic reassignment of levels of relative privilege is the result of direct influence by the session manager. For example, in the event that an audience member, who has initially been assigned a high level of relative privilege, becomes unruly or has attempted to steer the presentation in a counterproductive direction, the session manager 100 may manually reduce the audience member's level of relative privilege to preclude the member from adding any type of content.
Session manager 100 retains the highest level of relative privilege throughout the entire presentation, although nothing prevents the dynamic reassignment of the relative privilege of the session manager at one or more times during the presentation. This may be beneficial in those instances where inputs from certain audience members are deemed to be more important than the comments of the session manager, whose role might be more in line with facilitating a discussion among audience members, rather than presenting content.
In the embodiment of
In addition to members of audience 110 being allowed to present content in the form of questions and comments that are converted to text and displayed to the audience, a member of audience 110 may also redirect content manager 200 to import content from content repository 170 or from an Internet server (not shown) by way of network interface 180. To enable this feature, content manager 200 may include VoiceXML technology (outlined at http://www.w3.org/Voice/Guide/) to permit the audience member to import content from either content repository 170 or a server interfaced to network 190 by way of network interface 180. Other embodiments of the invention may make use of Speech Application Language Tags (http://www.microsoft.com/speech/evaluation/) to provide the ability to redirect content manager 200. Thus, for example, in the event that breaking news is relevant to the presentation, session manager 100 or an audience member having sufficient relative privilege can redirect content manager 200 to import content from an appropriate website. The audience member can invoke the feature by merely speaking the URL, or its grammar semantic attachment (for example, “google dolphins” is substituted with “http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=dolphins”), to display the website at which the content resides.
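The substitution of a spoken phrase with a URL can be illustrated with a small lookup. This is not VoiceXML or SALT; it is a plain sketch of the mapping only, and the grammar table and function name are hypothetical. The URL template is the one given in the example above.

```python
from typing import Dict, Optional
from urllib.parse import quote_plus

# Hypothetical grammar: leading keyword -> URL template for the remaining words.
GRAMMAR: Dict[str, str] = {
    "google": "http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q={query}",
}


def resolve_spoken_command(phrase: str) -> Optional[str]:
    """Map a recognized phrase such as 'google dolphins' to the URL to import.

    Returns None when the phrase does not begin with a known keyword
    or carries no search terms.
    """
    keyword, _, rest = phrase.strip().partition(" ")
    template = GRAMMAR.get(keyword.lower())
    rest = rest.strip()
    if template is None or not rest:
        return None
    return template.format(query=quote_plus(rest))


print(resolve_spoken_command("google dolphins"))
```

A phrase that matches no keyword returns no URL, so it can be treated as ordinary speech to be converted to text.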
The relative privilege of the various members of audience 110 can also be used to determine those members who can import content versus those who are not allowed to do so (see
In a previously mentioned example, as the time allocated for the presentation grows shorter and shorter, the relative privilege of all audience members may be reduced, so that the session manager has sufficient uninterrupted time to complete all of the material in the presentation. In a related example, the ability of the audience to import content from content repository 170 or from network 190 may also be restricted as the presentation nears the end of the allocated time, thus allowing a particular audience member to quickly add content in the form of a voice input without importing an entire slide, which might take several minutes to discuss.
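A time-based reduction of this kind might look like the sketch below. The linear ramp, the choice to begin reducing privilege in the final quarter of the allocated time, and the function name are assumptions; the description does not specify a schedule.

```python
def time_scaled_privilege(base: float, elapsed_min: float, total_min: float,
                          protected_fraction: float = 0.25) -> float:
    """Scale an audience member's privilege down as the presentation nears its end.

    Privilege is unchanged until only protected_fraction of the allocated
    time remains, then falls linearly to zero at the end of the presentation.
    """
    remaining = max(0.0, total_min - elapsed_min) / total_min
    if remaining >= protected_fraction:
        return base
    return base * remaining / protected_fraction
```

The same scaled value could be compared against a separate, higher threshold for importing content than for adding a voice input, so that imports are cut off earlier than spoken comments.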
As the embodiment of
In one embodiment, an archive function is implemented using frame/sound capture device 230 reading picture elements (i.e. pixels) from the memory array within a frame buffer (not shown) of display device 220. These picture elements are transmitted to a data converter (not shown), which converts the picture elements to a standardized format such as Joint Photographic Experts Group (JPEG), Graphics Interchange Format (GIF), or bitmap (BMP). The audio recorded during the presentation can be stored as well. Frame/sound capture device 230 can also be implemented using a digital camera or camcorder, which, under the control of content manager 200, occasionally photographs and archives the image presented by way of display device 220 as well as the accompanying sound files.
The microphone of
Microphone 120 additionally includes “next slide” button 302, and “previous slide” button 303. These allow an audience member having sufficient relative privilege to take control of a portion of the presentation. Thus, for example, during a presentation on worldwide sales, several presenters from various sales regions may each wish to present the content that represents the results from each presenter's region. Microphone 120 may also include additional user interfaces for controlling the content and the way in which the content is presented.
In another embodiment, the functions performed by “next slide” button 302 and “previous slide” button 303 are invoked by way of voice commands from the audience member, in which either of these two commands leads to an immediate interrupt of the voice-to-text conversion process. Additional control commands can also be implemented, thus allowing the session manager or an audience member to jump forward or backward to a particular section of the presentation. For some applications, the use of voice commands can bring about a larger command repertoire than would be possible if each command were implemented by way of a discrete button or switch on microphone 120.
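The way a recognized command bypasses the voice-to-text display can be sketched as follows. The command table, the exact-match rule, and the clamping of the slide index to the valid range are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical command table: recognized phrase -> change in slide index.
COMMANDS: Dict[str, int] = {"next slide": 1, "previous slide": -1}


def handle_utterance(text: str, slide_index: int,
                     slide_count: int) -> Tuple[int, Optional[str]]:
    """Return (new slide index, text to display).

    A recognized command moves the slide and suppresses display of the
    utterance; any other utterance is passed through as displayable text.
    """
    step = COMMANDS.get(text.strip().lower())
    if step is None:
        return slide_index, text
    new_index = min(max(slide_index + step, 0), slide_count - 1)
    return new_index, None
```

Adding an entry to the command table extends the repertoire without any change to the microphone hardware.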
In
In another embodiment, speaker priority manager 150 gradually reduces an audience member's level of relative privilege as the member speaks. Thus, when audience member Ed begins speaking, his level of relative privilege is gradually reduced as time progresses. As Ed's relative privilege decreases to a level below that of Dave (to 0.75 for example), Dave may be able to interrupt. This provides Ed with some opportunity to add content without allowing Ed to monopolize the presentation.
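The gradual reduction can be sketched with a linear decay. Ed's starting level of 1.0, Dave's level of 0.8, and the rate of 0.01 per second are hypothetical values chosen so that Ed reaches the 0.75 mentioned above after 25 seconds; the description does not give a decay rate.

```python
def decayed_privilege(initial: float, seconds_speaking: float,
                      rate_per_second: float = 0.01, floor: float = 0.0) -> float:
    """Privilege of the current speaker after speaking for the given time."""
    return max(floor, initial - rate_per_second * seconds_speaking)


def can_interrupt(challenger_privilege: float, speaker_initial: float,
                  seconds_speaking: float) -> bool:
    """True once the speaker's decayed privilege falls below the challenger's."""
    return challenger_privilege > decayed_privilege(speaker_initial, seconds_speaking)


ED_INITIAL, DAVE = 1.0, 0.8  # hypothetical levels
for t in (10, 25):
    print(t, decayed_privilege(ED_INITIAL, t), can_interrupt(DAVE, ED_INITIAL, t))
```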
The table of
The method of
In the event that the decision of step 440 indicates that a command to import content is not present in the voice inputs, step 470 is executed in which content in the form of text that corresponds to the received voice inputs is displayed to the audience. In step 470, the content is displayed in a predetermined location of a slide presented to the audience, such as near the bottom of the slide as shown in
Some embodiments of the invention may include only a few steps of the method of
The method continues at step 520 in which the relative privilege levels of the first and second audience members are determined. Step 530 is then executed, in which content is presented to the audience from one of the first and second audience members depending on the determined relative privilege of the first and second audience members.
In conclusion, while the present invention has been particularly shown and described with reference to the foregoing preferred and alternative embodiments, those skilled in the art will understand that many variations may be made therein without departing from the spirit and scope of the invention as defined in the following claims. This description of the invention should be understood to include all novel and non-obvious combinations of elements described herein, and claims may be presented in this or a later application to any novel and non-obvious combination of these elements. The foregoing embodiments are illustrative, and no single feature or element is essential to all possible combinations that may be claimed in this or a later application. Where the claims recite “a” or “a first” element or the equivalent thereof, such claims should be understood to include incorporation of one or more such elements, neither requiring nor excluding two or more such elements.