Communication can be challenging for many people, especially in high-pressure situations like public speaking, interviewing, teaching, and debates. Further, some people find communication more difficult in general because of a language difference, a personality trait, or a disability. For example, a nervous person may often use filler words, such as “umm” and “uhh,” instead of content-rich language, or may speak very quickly. Other people may have a speech impediment that requires practice or may have a native-language accent when they wish to communicate with others of a differing native language. Even skilled public speakers without physical or personality barriers to communication tend to develop communication habits that can be damaging to the success of the communication. For example, some people use non-inclusive language or “up talk” (raising the tone of their voice at the end of a statement rather than a question).
Because communication is such a critical skill for success across all ages and professions, some people choose to engage with communication improvement tools such as communication or speech/speaker coaches or skill improvement platforms to help them improve their communication skills. These tools tend to track metrics like pace, voice pitch, and filler words but lack an ability to drive real skill-specific growth. Rather, they tend to be good at helping users rehearse specific content but not at improving their underlying communication skills. Such coaches and platforms tend to be communication-event specific—rehearsing for a speech, for example—rather than targeting improvement in a particular communication skill. People who engage with these coaches and platforms find they improve their presentation for their intended specific purpose but lack the growth they would like to enjoy by improving the foundational skills that are ubiquitous to all good communication.
What is needed in the industry is a tool for improving communication skills that allows users to enhance their foundational communication abilities.
Non-limiting and non-exhaustive embodiments of the invention are described with reference to the following drawings. In the drawings, like reference numerals refer to like parts throughout the various figures, unless otherwise specified, wherein:
The subject matter of embodiments disclosed herein is described here with specificity to meet statutory requirements, but this description is not necessarily intended to limit the scope of the claims. The claimed subject matter may be embodied in other ways, may include different elements or steps, and may be used in conjunction with other existing or future technologies. This description should not be interpreted as implying any particular order or arrangement among or between various steps or elements except when the order of individual steps or arrangement of elements is explicitly described.
The disclosed systems and methods train users to improve their communication skills. Communication is critical to every facet of success in life, so it touches all human beings, whether they communicate with small groups or in front of large crowds. People suffer from various factors that substantially affect their ability to communicate effectively, including stage fright, medical conditions, language barriers, and the like. Some people who wish to improve their communication skills hire expensive communications coaches or spend hours in groups designed to help improve an aspect of communication, such as public speaking. Often, the people who engage in the hard work to improve their communication skills tend to have a particular event in mind for which they wish to prepare. That results in an event-specific outcome for those people.
For example, a person hires a communication coach to help them prepare for an important speech. They practice with the coach for months, working on the structure and content of the speech itself, nervous tics, bad speaking habits or posture, and the like. At the end of this work, the person has a more polished speech ready to give because of the intense, repetitive practice they did specific to the particular speech to be given and the venue at which it is to be given. The person also might enjoy some incremental improvement in their general communication skills as a result of the immense amount of practice. However, that person was never focused on improving the communication skill itself, but instead was focused on improving the quality of a single speech or communication event. The person might receive feedback from the communication coach that they say filler words or hedging words too often, slouch their shoulders when they become tired, or speak too quickly when they are nervous. However, the coach is unable to give them tangible, data-driven feedback that is focused on the verbal, visual, and vocal content of the person's communication skills rather than a single performance.
The disclosed systems and methods provide users with feedback over time on the verbal, visual, or vocal content of their communication skills. Verbal content includes the words actually spoken by the person—the content and its organization. For example, verbal content includes non-inclusive language, disfluencies (e.g., filler words or hedging words), specific jargon, or top key words. Specifically, disfluencies are any words or phrases that indicate a user's lack of confidence in the words spoken. Filler words such as “umm” or “uhhh” and hedging words such as “actually,” “basically,” and the like tend to indicate the user is not confident in the words they are currently speaking. Any single type of disfluency can be included in verbal content, or a grouping of disfluencies—multiple types or the category as a whole—can also be included as verbal content. Visual content includes the body language or physical position, composure, habits, and the like of the user. For example, visual content includes eye contact, posture, body gesture(s), and user background(s)—the imagery of the audience's view of the user, the user's motion(s) and movement(s), and their surroundings or ambient environment. Vocal content includes features or characteristics of the user's voice, such as tone, pitch, volume, pacing, and the like. The disclosed systems and methods can be powered by artificial intelligence (AI) that compares current input content to previously stored content—either user-stored content or content from a sample, such as a speaker who excels in a skill the user wishes to focus on. Standard AI techniques can be used to compare a current content sample to the existing content. When the current content sample is compared to a user's prior content, the user can begin to learn where they are improving (or not) over time. Their progress can be tracked, and they can set goals and standards they wish to meet based on the comparison of their content to past content.
In the example in which the user's current content is compared to a speaker that has a good communication skill the user wishes to learn, emulate, or adopt, the user's current content can be compared to the exemplary speaker in at least one feature or characteristic, such as tone, up talk, physical presence or position, filler or hedging word rate, or any other verbal, visual, or vocal characteristic.
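As a non-limiting illustration, the comparison described above can be reduced to a simple feature-by-feature difference between the user's current content and a stored baseline. The following sketch assumes hypothetical feature names (filler rate, up talk ratio, eye contact) and placeholder baseline values; it is not a definitive implementation of the disclosed AI comparison.

```python
# Minimal sketch: comparing a user's current content features to a stored
# baseline (prior user content or an exemplary speaker). Feature names,
# and baseline values are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class ContentFeatures:
    filler_rate: float      # filler words per minute (verbal)
    uptalk_ratio: float     # fraction of statements ending in rising pitch (vocal)
    eye_contact: float      # fraction of time looking at camera/audience (visual)

def compare_to_baseline(current: ContentFeatures, baseline: ContentFeatures) -> dict:
    """Return per-feature deltas; a positive delta means the current sample
    exceeds the baseline for that feature."""
    return {
        "filler_rate": current.filler_rate - baseline.filler_rate,
        "uptalk_ratio": current.uptalk_ratio - baseline.uptalk_ratio,
        "eye_contact": current.eye_contact - baseline.eye_contact,
    }

# Example: compare today's practice run against an exemplary speaker profile.
exemplar = ContentFeatures(filler_rate=0.5, uptalk_ratio=0.05, eye_contact=0.85)
today = ContentFeatures(filler_rate=3.2, uptalk_ratio=0.20, eye_contact=0.60)
print(compare_to_baseline(today, exemplar))
```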
The user, third parties, or a content analysis algorithm can provide feedback to the user on the content provided. The user can input feedback about their own content by replaying the content or adding notes into the disclosed system. Third parties can do the same. The content analysis algorithm also generates feedback from the user's content. This feedback can be asynchronous with, or in real-time during, the communication event. In some systems, some of the feedback is asynchronous and other feedback is output in real-time to the user. For example, the content analysis algorithm provides real-time feedback to the user during the communication event, and the user then reviews the content after the event concludes. Third party mentors and friends can provide their feedback both in real-time and asynchronously in this example.
Turning now to
The system maintains a user profile for each user. In this example, the system creates a new user profile 110 if the user communication relates to a user that is not already stored in the existing system library of user profiles. The system makes this determination in any conventional manner, such as comparing user identification information to user communication data stored for multiple users that have already input user communication data. The system can store any suitable number of user profiles, as needed. When the system determines that received user communication relates to an existing user profile, it updates the user profile 110 with the new user communication in the respective category—verbal content, visual content, vocal content, or some combination of these types of content (correlating with the type(s) of information that was received in the user communication). The update 110 allows the AI algorithm to incorporate the analyzed user communication into the user profile so the system can generate empowered feedback. AI algorithms of any kind can be used for this purpose—any AI technique that is able to discern differences between the existing data set in the user profile and the new data set in the analyzed user communication can be used. Over time, the AI algorithm can discern increasingly smaller differences between the existing user profile data set and the analyzed data set to fine tune the generated feedback.
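A minimal sketch of the create-or-update decision is shown below, assuming a simple in-memory dictionary keyed by a hypothetical user identifier; a deployed system would rely on persistent storage and the AI-driven matching described above.

```python
# Illustrative sketch of the create-or-update decision for user profiles.
# The profile store is a plain dictionary here for simplicity.
profiles: dict[str, dict] = {}  # user_id -> profile with per-category history

def update_profile(user_id: str, category: str, analyzed_data: dict) -> dict:
    """Create a new profile if this user is unknown, otherwise append the
    analyzed communication data to the matching content category."""
    profile = profiles.setdefault(
        user_id, {"verbal": [], "visual": [], "vocal": []}
    )
    profile[category].append(analyzed_data)
    return profile

# Example: a new user's verbal analysis creates a profile; a second event updates it.
update_profile("user-42", "verbal", {"filler_rate": 3.2})
update_profile("user-42", "verbal", {"filler_rate": 2.7})
print(len(profiles["user-42"]["verbal"]))  # 2 analyzed events stored
```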
After the AI algorithm produces differences between the analyzed data and the existing data set for the user profile, the system then generates either real-time feedback 112 or receives or generates asynchronous feedback 114. The real-time feedback 112 is generated by the system and then output 116 to the user during a live communication event. The real-time feedback 112 can also be received from third parties and integrated with the algorithm feedback in another example. Third parties can include human coaches or other audience members and third party algorithms. The third party data can be output to the user in real-time 116 either integrated or compiled with the algorithm data or as separately output data. In an alternative example, the algorithm is not triggered to activate or analyze any user communication data, but instead the third party data is received or analyzed by the system and output to the user in real-time 116.
The asynchronous feedback 114 is generated by the AI algorithm or received from a third party in a similar way to the real-time feedback but is instead output to the user after the communication event ends 118. In this example, the third party feedback may not be analyzed by the system and could simply be passed through and compiled with the AI algorithm feedback or simply output to the user in the form in which it was received by the system.
The user can also input asynchronous feedback to the system about their own communication event, such as a self-reflection or notes for growth or edits to content, for example. In this example, the system can ingest any one or multiple of AI algorithm analyzed data and feedback, third party analyzed data and feedback, or user analyzed data and feedback relating to the user's communication event. Like the real-time feedback, in an example in which asynchronous feedback is received from multiple sources—the AI algorithm, third parties, or the user—the feedback can be analyzed and output 118 separately or can be integrated and analyzed in groups or sub-groups, as needed.
In some example systems, the system can output both real-time 116 and asynchronous feedback 118 to the user in any of the forms of data that was received or analyzed. Here, the system would output the real-time feedback 116 during the communication event and the asynchronous feedback after the communication event 118. The real-time feedback during the communication event can differ from the type and substance of the asynchronous feedback after the event because of the source of the received data (AI algorithm, third party, or user) and the depth or type of analysis performed by the system on the received data.
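One possible way to route the feedback described above by timing and source is sketched below; the FeedbackItem fields and source labels are assumptions used only for illustration.

```python
# Hedged sketch of routing feedback by timing and source; not the patent's
# actual data model.
from dataclasses import dataclass, field

@dataclass
class FeedbackItem:
    source: str      # "algorithm", "third_party", or "user"
    text: str
    realtime: bool   # True -> deliver during the event, False -> after it

@dataclass
class FeedbackRouter:
    live_queue: list = field(default_factory=list)
    post_event_queue: list = field(default_factory=list)

    def route(self, item: FeedbackItem) -> None:
        # Real-time items go to the live display; everything else is held
        # for the post-event review.
        (self.live_queue if item.realtime else self.post_event_queue).append(item)

router = FeedbackRouter()
router.route(FeedbackItem("algorithm", "Pace is above your target range.", True))
router.route(FeedbackItem("third_party", "Strong close; tighten the intro.", False))
print(len(router.live_queue), len(router.post_event_queue))  # 1 1
```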
The user communication detection module 202 can also include or integrate with third party systems that ingest user data that is transmitted to the communication skills training system 200 shown in
The server 206 of the communication skills training system 200 has a memory 218, a processor 220, and a transceiver 234. The memory 218 stores various data relating to the user, third party feedback, a library of comparison data relating to communications skills training, the algorithms applied to any data received, and any other data or algorithms relating to or used to analyze data regarding training users on communication skills. For example, the memory 218 includes a user communication profile 222 in the system shown in
The user communication profile 222 also includes algorithm analyzed feedback 230, as shown in
The memory 218 also includes a communication skills library 232 that can include skilled-communicator examples comprising data relating to one or more video or image segment(s) of skilled communicators. These examples can be used to train a user by simply allowing the user to replay a video of a skilled communicator, such as a famous person or an expert. This library content 232 can also be used as a comparison tool to evaluate against a communication event of the user. The library content can also include examples of poor communication skills, if desired, to show or evaluate a user's performance on defined objective or created subjective measurements of skill level, improvement, or growth.
The processor 220 of the communication skills training system 200 shown in
For example, the verbal content module 238 can identify top key words or generate a transcript of the communication event. In particular, the verbal content module 238 can identify certain words like hedging words (e.g., basically, very, or actually) or non-inclusive words and provide real-time and post-event asynchronous feedback on such metrics. Still further, the verbal content module 238 can identify words that the user emphasizes by pausing or changing the pace of the word as it is spoken, for example. Such verbal metrics can be mapped to a substantive structure of a user's communication event that is either predetermined or generated post-event.
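As one illustrative approach, filler and hedging words can be counted directly from a transcript segment, as in the following sketch; the word lists are hypothetical, and a production verbal content module would apply a more nuanced analysis.

```python
# Simple sketch: counting filler and hedging words in a transcript segment.
# The word lists are illustrative only (e.g., "very" is not always hedging).
import re
from collections import Counter

FILLERS = {"umm", "uhh", "um", "uh"}
HEDGES = {"basically", "actually", "very", "kind", "sort"}

def count_disfluencies(transcript: str) -> Counter:
    words = re.findall(r"[a-z']+", transcript.lower())
    return Counter(w for w in words if w in FILLERS | HEDGES)

print(count_disfluencies("Umm, basically I think, uh, the results are actually very good."))
# Counter({'umm': 1, 'basically': 1, 'uh': 1, 'actually': 1, 'very': 1})
```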
A user could, in an example, upload an outline of key points to address in the communication event. The verbal content module 238 can then map key words it identifies during the communication event to each key point in the uploaded outline and provide metrics to the user either in real-time or post-event regarding the frequency, depth, and other measures relating to the user addressing the key points of the outline. This can also be blended with the verbal content module 238 tracking filler words, such as “uhhh” or “ummm,” either as a standalone metric or in combination with the key points of the outline to see during which of the key outline points the user said more filler words. The verbal content module 238 can measure and analyze any data relating to the content spoken by the user.
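The outline-mapping example above could, under simple assumptions, be implemented by associating each outline point with a set of key words and counting key-word hits and filler words per spoken segment, as in this hypothetical sketch.

```python
# Sketch of mapping spoken words to an uploaded outline. Each outline point is
# a set of key words; per-point hit and filler counts are accumulated as the
# transcript streams in. All key words here are made up for illustration.
import re

OUTLINE = {
    "market size": {"market", "growth", "revenue"},
    "product demo": {"feature", "demo", "interface"},
}
FILLERS = {"umm", "uhh", "um", "uh"}

def score_segment(point: str, segment: str) -> dict:
    words = re.findall(r"[a-z']+", segment.lower())
    return {
        "key_word_hits": sum(w in OUTLINE[point] for w in words),
        "filler_count": sum(w in FILLERS for w in words),
    }

print(score_segment("market size", "Umm, the market growth, uh, revenue is up."))
# {'key_word_hits': 3, 'filler_count': 2}
```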
The verbal content module 238 can also output reminders in response to tracking the verbal, spoken content and word choice. Output reminders can be generated and output to the user in real-time during the communication event. For example, if a user is repeating themselves beyond a particular allowable threshold—identified by similarity techniques such as natural language processing or keyword detection—the system 200 then triggers an output to the user during the event that the user should progress to the next topic or point in the communication. In another example, the verbal content module 238 can identify a missed point the user wished to make during the communication event based on a pre-defined set of points the user wanted to address during the communication event. If a missed point is identified by the verbal content module 238, then it generates a user prompt to note the missed point and optionally suggests to the user a time or way to bring up the missed point later during the communication event. The suggestion could be timed based on a similarity of the missed point to another point the user wished to make during the communication event that would be part of the pre-defined set of points the user wanted to address.
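A hedged sketch of the repetition check follows: if a key phrase recurs more than an allowable threshold, a real-time prompt to move on is generated. The threshold value and phrase-matching approach are assumptions.

```python
# Sketch of the repetition check: if the same key phrase recurs more than an
# allowable threshold, emit a real-time prompt to move to the next topic.
from collections import Counter

def repetition_prompt(key_phrases: list[str], threshold: int = 3) -> str | None:
    counts = Counter(key_phrases)
    repeated = [p for p, n in counts.items() if n > threshold]
    if repeated:
        phrase = repeated[0]
        return (f"You have repeated '{phrase}' {counts[phrase]} times; "
                "consider moving to your next point.")
    return None

spoken = ["pricing model"] * 4 + ["launch date"]
print(repetition_prompt(spoken))
```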
Even further, the verbal content module 238 can track a user's point for introduction, topics and sub-topic points, supporting evidence or explanation, and conclusion. This tracking can be done by comparing the verbal content received with either the pre-defined content the user inputs or common words used for introductions, argument or point explanatory development, and conclusions, for example. The tracking can also be used to help prompt a user to move on to the next phase of the point—move from introduction to explaining detail for a first topic, for example. The system can start by identifying key words typically associated with introductions. If the system tracks that the user speaks too many sequential sentences that include typical introduction key words, then the verbal content module 238 can generate a user prompt to encourage the user to progress to the next portion of the point. This can be accomplished by detecting a number of introduction sentences that exceeds a threshold, for example, such as three or more sentences identified as introduction content. When the system detects that the user has exceeded the threshold number of introduction sentences, it triggers a user prompt to progress the content to the next portion of the point.
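For illustration only, the introduction-length check could classify each sentence by assumed introduction key words and trigger a prompt once a threshold number of consecutive introduction sentences is reached, as in the sketch below; the key-word list is made up, and the threshold of three sentences follows the example above.

```python
# Sketch of the introduction-length check: classify each sentence as
# "introduction" by key words and prompt the user once a threshold number of
# consecutive introduction sentences is exceeded.
import re

INTRO_KEY_WORDS = {"today", "overview", "agenda", "begin", "introduce"}

def needs_progress_prompt(sentences: list[str], threshold: int = 3) -> bool:
    consecutive = 0
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & INTRO_KEY_WORDS:
            consecutive += 1
            if consecutive >= threshold:
                return True   # trigger "progress to the next portion" prompt
        else:
            consecutive = 0
    return False

print(needs_progress_prompt([
    "Today I will introduce our results.",
    "The agenda has three parts.",
    "To begin, let me give an overview.",
]))  # True
```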
Still further, the user's pre-defined content, such as speaking notes for example, can be mapped to the user's real-time verbal content. The communication skills training system 200 can display an outline of the pre-defined content that is visually shown as having been addressed or not yet addressed during a communication event. Each point in the pre-defined content can be marked addressed or not addressed during the communication event, which appears on the display seen by the user. The display of this tracking of pre-defined content gives the user a visual cue on the remaining content to discuss during the communication event.
In an example, the verbal content module 238 creates a real-time or post-event transcript of the user's verbal content—the precise, ordered words spoken—during a communication event. If the verbal content module 238 creates a real-time transcript, it can also display it for the user or third parties during the communication event. For the post-event transcript example, the transcript can be edited by the user or a third party and can be optionally displayed in simultaneous play with a video capture replay of the communication event. In some examples, the communication skills training system 200 creates both a real-time and a post-event transcript.
The visual content module 240 can identify visual features or parameters of the user during the communication event, which can include the user's position within a speaking environment, for example. The user's position can be on a screen if the communication event occurs virtually or can be within a particular ambient environment for the user during a live event. The visual features or parameters can also include body language and position, such as gestures, head tilt, crossed arms or legs, shoulder shrug, body angling, movements typically associated with a nervous demeanor (e.g., foot or hand tapping, rapid eye movement, etc.), and the like. The visual content module 240 can compare captured frames received from the user communication detection module 202 with prior frames of a similar or time-mapped segment of a prior user communication event. Alternatively or additionally, the visual content module 240 can track visual content throughout the entire communication event and compare it to a prior event, an expert event, or a famous person's prior communication event.
The user communication module 226 stores the communication event data 231 and the feedback produced by the content analysis algorithm 236 and the third party feedback analysis module 244. Users or third parties can access the stored communication event data 231 about any one or more communication events. For example, the stored communication event data 231 can be video and transcripts of multiple communication events. The user and any authorized third parties can access that stored communication event data 231 to analyze it for feedback. Some examples allow the user or third parties to manipulate the stored communication event data 231 by applying edits or changes to any of the stored communication event data 231 when it is replayed or reviewed, such as removing or decreasing filler words, increasing or decreasing the speed of the user's speech, adding or removing pauses, and the like.
The communication skills training system 200 can also include a simulated interactive engagement module 246. The simulated interactive engagement module 246 includes a simulated person or group of people with whom the user can simulate a live interaction during the communication event. For example, the simulated person could be an avatar or a simulated audience. The content analysis algorithm 236 includes a feature in one or more of its verbal content module 238, visual content module 240, or vocal content module 242 that detects spoken language cues or body language that the system then equates with a likelihood that another person, group of people, or an audience would react in a positive, constructive, or negative manner. For example, if the user is talking too fast (measuring speech speed) or repeating the same point several times (key word detection), the verbal content module 238 would detect that the speed of the user's speech or the key word frequency is above a threshold rate or value. If the speed or key word frequency breaches the threshold, the verbal content module 238 generates an avatar or members of a simulated audience, for example, to appear to be confused or disengaged. If the user is instead maintaining the speed of their speech within an optimal range and mentioning key words at an optimal frequency, the verbal content module 238 generates the avatar or members of the simulated audience to appear engaged and curious.
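One simplified way to express the reaction logic described above is a threshold check on speaking pace and key-word repetition mapped to an avatar state; the numeric ranges in the following sketch are illustrative assumptions.

```python
# Hedged sketch of the simulated-audience reaction logic: speaking pace and
# repeated-key-word frequency are checked against threshold values and mapped
# to an avatar state. The numeric values are illustrative only.
def avatar_reaction(words_per_minute: float, key_word_repeats: int) -> str:
    TOO_FAST_WPM = 180          # assumed upper bound of the optimal pace range
    MAX_REPEATS = 3             # assumed allowable repetitions of one point
    if words_per_minute > TOO_FAST_WPM or key_word_repeats > MAX_REPEATS:
        return "confused_or_disengaged"   # audience appears to lose the thread
    return "engaged_and_curious"          # pace and repetition within range

print(avatar_reaction(words_per_minute=200, key_word_repeats=1))  # confused_or_disengaged
print(avatar_reaction(words_per_minute=150, key_word_repeats=1))  # engaged_and_curious
```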
The same concept can be applied to the visual content module 240 and the vocal content module 242. The simulated avatar or audience can appear to react in a manner that correlates to the analyzed data relating to the user's body language, position, and movements and also to the users' vocal features and parameters like the user's voice volume, pauses, tone, and the like.
This same simulated interactive engagement module 246 can be useful for training users in multiple types of communication events. The user may wish to practice for an interview, for example, with one or more other people. The communication skills training system 200 can receive input from a user about an interview, such as a sample list of topics or interview questions. The simulated interactive engagement module 246 poses the list of questions or topics to the user in a simulated live communication event. As the user progresses through the list of sample questions or topics, the simulated interviewer(s) can be instructed by the simulated interactive engagement module 246 to respond differently depending on the user's metrics in a previous question or topic. For example, the simulated interactive engagement module 246 tracks key words that a user selected to answer a first question. If the user exceeded a threshold value of the number of times or the variation of the key words used, for example, the simulated interviewer(s) could respond with a pleasant smile or an approving nod.
The transceiver 234 of the server 206 permits transmission of data to and from the server 206. In the example shown in
The communication skills training system 200 also includes a user interface 208 that has a display 246, an audio output 248, and user controls 250 in the example shown in
In an example, the communication skills training system can help a user align themselves within a display. The display can be a virtual display in some examples or can be a communication event live environment. The virtual display can include a screen in some examples.
Some portions of the display may have different criteria than others. For example, a user may ideally want their head to be centered in a central portion of the display, not positioned too far to the left or right within the display. A camera or other optical imaging device detects the user's position within the central portion of the display. A sub-optimal user position in the central portion would include, for example, the user's body not appearing or only partially appearing (below a measured threshold) in the central portion. The criterion evaluated for the central portion is whether the user is detected to be physically located in greater than a majority or a certain percentage (e.g., 80%) of the central portion. If the user is positioned properly, then the comparison of the user's position in the central portion to the sub-optimal position would not be determined to meet the criterion for generating a recommendation. However, if the user is not centrally positioned and is instead askew to the right or left, exceeding the respective display portion threshold, then the criterion is met. In this example, a position adjustment recommendation is generated 308 based on the comparison meeting the criterion—the user is not centrally positioned. The position adjustment recommendation is then output 310 to the user, a third party, the system content analysis algorithm, or some combination. The output can be in real-time in some examples or asynchronous in other examples.
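A minimal sketch of the central-portion check follows, assuming a detected user bounding box in normalized display coordinates and an illustrative 80% coverage threshold; it is not the only way to implement the comparison described above.

```python
# Sketch of the central-portion check: estimate what fraction of the detected
# user bounding box falls inside the central display portion and generate a
# position adjustment recommendation when coverage drops below a threshold.
# Box coordinates are normalized (0-1); the 0.8 threshold is illustrative.
def overlap_fraction(user_box, region_box) -> float:
    """Fraction of the user box area that lies inside the region box."""
    ux0, uy0, ux1, uy1 = user_box
    rx0, ry0, rx1, ry1 = region_box
    ix = max(0.0, min(ux1, rx1) - max(ux0, rx0))
    iy = max(0.0, min(uy1, ry1) - max(uy0, ry0))
    user_area = (ux1 - ux0) * (uy1 - uy0)
    return (ix * iy) / user_area if user_area else 0.0

CENTRAL_PORTION = (0.33, 0.0, 0.67, 1.0)   # middle third of the display

def position_recommendation(user_box, threshold: float = 0.8) -> str | None:
    if overlap_fraction(user_box, CENTRAL_PORTION) < threshold:
        return "Move toward the center of the frame."
    return None

print(position_recommendation((0.05, 0.2, 0.40, 0.9)))  # user far left -> recommendation
```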
In the example shown in
An optimal or sub-optimal position can be determined by any objective or subjective criteria. An optimal position, for example, could be set as a position of the user within a central portion of the display with 70% of the user's face positioned in the central display portion 412. The optical imaging device could identify the user's face and its detected perimeter to discern between the user's face and hair. It then determines if at least 70% of the user's face is positioned within the central display portion 412. The user's face is positioned askew of the central portion 412 in
Notably, the example shown in
Additionally or alternatively, multiple display portions could be evaluated for user position. For example, the user in
In
The communication skills training system can also receive input user data regarding the user's size, shape, contour, or other image characteristics or parameters. For example, the user can input text to describe themselves or can input a sample photo or video of the user from which the system calculates baseline data about the user's general physical features. The system then adjusts ratios based on the parameters or characteristics of the input data, such as the ratio of face to hair, the size of the user's face, and the like.
In the example system shown in
Values are discussed above in reference to
Turning now to
The characteristic of the verbal content segment can be determined not to meet a criterion 608. The criterion can be a set value, such as a threshold, or a range within which the measured characteristic ideally should fall. The method 600 generates a recommendation output based on the characteristic of the verbal content segment being determined not to meet the criterion 608. The output can relate to suggested or recommended improvements to the characteristic of the verbal content segment or a related characteristic. The recommendation output is then output 612, such as to the user or a third party. The recommendation output can be transmitted to a display or a third party, for example.
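The criterion check described for method 600 can be illustrated generically as a range test that yields a recommendation string when the measured characteristic falls outside the target range; the labels and numeric ranges below are placeholders.

```python
# Generic sketch of the criterion check used by the feedback methods: a
# measured characteristic is tested against an allowable range, and a
# recommendation string is produced when it falls outside that range.
def check_criterion(value: float, low: float, high: float, label: str) -> str | None:
    if low <= value <= high:
        return None  # criterion met; no recommendation needed
    direction = "reduce" if value > high else "increase"
    return f"{label} is outside the target range ({low}-{high}); try to {direction} it."

print(check_criterion(4.1, 0.0, 2.0, "Filler words per minute"))  # recommendation
print(check_criterion(1.2, 0.0, 2.0, "Filler words per minute"))  # None
```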
Turning now to
The parameter or characteristic of the visual or vocal content is compared to a visual or vocal content communication standard or goal 706. The visual or vocal communication standard or goal can be determined by the user or a third party like a coach or mentor. The visual or vocal communication standard or goal can also be determined by an objective measure, such as a comparison to a communicator who is skilled in a particular communication skill related to the standard or goal or an objective goal or standard defined by an expert communicator.
The characteristic of the visual or vocal content segment can be determined not to meet a criterion 708. The criterion can be a set value, such as a threshold, or a range within which the measured characteristic ideally should fall. The method 700 generates a recommendation output based on the characteristic of the visual or vocal content segment being determined not to meet the criterion 708. The output can relate to suggested or recommended improvements to the characteristic of the visual or vocal content segment or a related characteristic. The recommendation output is then output 712, such as to the user or a third party. The recommendation output can be transmitted to a display or a third party, for example.
In some examples, the user data can include both visual content and vocal content. The visual content and vocal content are compared against respective characteristics of visual and vocal content standards or goals. If one or both of those comparisons are determined not to meet a criterion, then improvement output is generated based on the determination that the comparison of one or both of the visual content or the vocal content did not meet the respective criterion.
Though certain elements, aspects, components, or the like are described in relation to one embodiment or example, such as an example communication skills training system or method, those elements, aspects, components, or the like can be included with any other system or method described herein, such as when it is desirous or advantageous to do so.
The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the disclosure. However, it will be apparent to one skilled in the art that the specific details are not required in order to practice the systems and methods described herein. The foregoing descriptions of specific embodiments are presented by way of examples for purposes of illustration and description. They are not intended to be exhaustive of or to limit this disclosure to the precise forms described. Many modifications and variations are possible in view of the above teachings. The embodiments are shown and described in order to best explain the principles of this disclosure and practical applications, to thereby enable others skilled in the art to best utilize this disclosure and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of this disclosure be defined by the following claims and their equivalents.
This application is related to U.S. Non-Provisional application Ser. No. ______, entitled ______, filed ______, which is incorporated herein by reference in its entirety for all purposes.