The present invention relates to an intelligent keyboard that provides suggested phrases to start conversations on computing devices. Electronic communication via texting, iMessage, email, Slack, Microsoft Teams, dating applications and many other applications has become extremely commonplace in today’s virtual world. While word suggestions, auto-complete and auto-correction are well known in today’s mobile devices such as Apple’s iPhone, Google’s Android devices and various laptop/desktop computer applications, today’s devices do not suggest entire phrases to start conversations based on the category of conversation and intent of the user. The present invention relates to intelligent keyboards for installation on mobile operating systems like Apple iOS and Google Android, or on desktop/laptop operating systems like Apple MacOS, Microsoft Windows, Linux and Unix.
The present invention is an intelligent keyboard for mobile devices. This intelligent keyboard provides the user with suggestions of relevant words or phrases that can be used to start or continue a conversation on text message, email and/or various web applications. The intelligent keyboard provides conversation suggestions that are appropriate for given application contexts, categories and conversation types. The intelligent keyboard uses user generated content from application users, usage history, profile data, dialogue data, platform generated content from the system managers/owners, content collected from various websites/integrations and natural language content generated by artificial intelligence. Content is ranked by preference, contextual suitability and performance. Content is further tagged for application context. User behavior, user data and artificial intelligence models continuously update the system so that the relevance and performance of keyboard content is optimized.
The following detailed description refers to the preferred embodiment of the disclosed invention as shown in the attached figures and in the below description. This detailed description is not meant to limit the scope of the invention in any way but is intended to disclose the preferred embodiment/best mode of the invention at the time of filing this application.
Phrases on the platform are also organized by Category. Categories include Favorites, Advice, Openers, Banter, Connect/Disconnect and others. Under each Category, there are several conversation types, known as Intents. These Intents include, but are not limited to, curious, ridiculous, flirty, dog, nature, challenge, basic, date at home and today. These are examples of Categories and Intents included in the preferred embodiment, but any number of additional categories or types are anticipated in the invention. Under each Category, several complete phrases are provided to start or continue specific conversations.
The keyboard is designed to be an intent-based communication assistant. The purpose of the keyboard is to assist a user who has an idea of what they want to say, i.e. an intent, but not the specific words to make the statement. In this situation, the user can tap an intent and be presented with a list of content to choose from, and messages to send. The user can then choose the content they prefer and send the message populated with the chosen content.
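The intent-based lookup described above can be illustrated with a minimal sketch. The catalog contents, the `Phrase` class and the `suggestions_for` function are hypothetical names chosen for illustration only; the disclosure does not specify how Categories, Intents and phrases are stored.

```python
from dataclasses import dataclass, field


@dataclass
class Phrase:
    text: str
    tags: list = field(default_factory=list)


# Hypothetical catalog: Category -> Intent -> phrases (illustrative data only).
CATALOG = {
    "Openers": {
        "curious": [Phrase("What is the best thing that happened to you this week?")],
        "dog": [Phrase("Your dog is adorable. What is their name?", ["pets"])],
    },
    "Banter": {
        "ridiculous": [Phrase("Pineapple on pizza: genius or crime?")],
    },
}


def suggestions_for(category, intent):
    """Return the phrase texts shown when the user taps an Intent in a Category."""
    intents = CATALOG.get(category, {})
    return [phrase.text for phrase in intents.get(intent, [])]
```

In this sketch, tapping the "ridiculous" Intent under "Banter" would call `suggestions_for("Banter", "ridiculous")`, and the user would then pick one of the returned phrases to populate the message.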
The message lifecycle, i.e. the cognitive steps for sending a message, is generally the same for each message. The user’s cognitive steps for each message are: (1) determine intent for the message, (2) develop specific words that express the intent, (3) type those letters/words, (4) edit and (5) send the message. The intelligent keyboard simplifies steps 2 through 5 of the process. Using the intelligent keyboard, the user can start with an intent and skip directly to sending an appropriate message.
User generated content 13 includes content generated by users on the platform. The platform collects and tags content generated by all the users on the platform and utilizes ranking algorithm 14 to determine the top performing phrases across the platform. Phrase performance on the platform is determined by a number of factors, including category views, interaction frequency for each category, phrase views, phrase sends, peak usage times (by day of week and time of day), user voting on content by an up or down vote and many other factors. Top performing phrases are published across the platform so that any user can access the phrase from the keyboard as a conversation suggestion.
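One way such a ranking could combine a few of the listed factors is sketched below. The scoring formula, the weights and the names `PhraseStats`, `performance_score` and `top_phrases` are assumptions made for illustration; the disclosure does not state how ranking algorithm 14 weighs its inputs.

```python
from dataclasses import dataclass


@dataclass
class PhraseStats:
    text: str
    views: int
    sends: int
    upvotes: int
    downvotes: int


def performance_score(stats, send_weight=0.7, vote_weight=0.3):
    """Blend send rate and vote approval into one score (weights are assumed)."""
    send_rate = stats.sends / stats.views if stats.views else 0.0
    votes = stats.upvotes + stats.downvotes
    # With no votes yet, treat approval as neutral.
    approval = stats.upvotes / votes if votes else 0.5
    return send_weight * send_rate + vote_weight * approval


def top_phrases(all_stats, n=3):
    """Return the texts of the n best-scoring phrases, best first."""
    ranked = sorted(all_stats, key=performance_score, reverse=True)
    return [stats.text for stats in ranked[:n]]
```

A production ranking would presumably also account for category views, interaction frequency and peak usage times, which this sketch omits for brevity.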
For purposes of this application, message data means any amount of data, no matter how large or small, that forms all or part of a message. Message data stored in human generated content 2 is stored in structured databases well known in the art. These structured databases allow easy search, location and access to message data utilized on the platform. Part or all of these databases may be stored on remote servers or on the user’s local device, depending on the configuration of the software and other factors such as frequency of access by user, and speed of data access required.
The type of artificial intelligence used in this system is utilized for language tasks in natural language processing, such as classification, part-of-speech tagging and text generation. Specifically, when a set of words or a phrase is sent to the AI API, one or more Artificial Intelligence Models will return the most appropriate set of words or phrases in response to what is sent relative to the configurations set in AI configuration 15. In this way, the AI can generate sentences that, in context, mimic a normal human conversation. This AI generated content is constantly tagged, categorized and saved. The present system uses this natural language capability to generate conversation content and responses that are presented in the intelligent keyboard. The system further uses AI in content tagging 16 to tag keyboard phrases with various keywords that allow the system to present the most effective phrases for a given context on a user keyboard.
Artificial Intelligence Models API 17 provides an API connection for the platform to communicate with Artificial Intelligence Models 3. Artificial Intelligence Models API 17 receives specific configurations and content. It returns generated text, classifications, tags and other content. For example, if “How are you?” is sent to the Artificial Intelligence Models API, it might respond “I am doing well, thank you.” The Artificial Intelligence Models use content to optimize such natural language responses so that they seem as close to human conversation as possible. The intelligent keyboard uses such natural language AI responses to generate phrases for the keyboard.
Generally an application programming interface (API) is queried by sending data structured in a specific format to the API. Upon receipt of such an appropriately formed API request, an API will process the request and return data as an output. Content received by the Artificial Intelligence Models API can include configuration data that is needed by Artificial Intelligence Models 3 to generate a response, including dialogue history, profile data, and other data used by the Artificial Intelligence Models 3 to generate predictions. Predictions can take the form of: 1) classifications, 2) text responses such as single words, phrases, sentences or paragraphs, or 3) vectors, tensors and other mathematical objects.
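The request and response pattern described above might look like the following sketch. The JSON field names (`content`, `dialogue_history`, `profile`, `config`, `type`, `value`) and both function names are hypothetical; the actual wire format of Artificial Intelligence Models API 17 is not given in this disclosure.

```python
import json

# The three forms of prediction named in the description.
ALLOWED_TYPES = {"classification", "text", "vector"}


def build_request(content, dialogue_history, profile, config):
    """Serialize the content and configuration data into a structured request."""
    return json.dumps({
        "content": content,
        "dialogue_history": dialogue_history,
        "profile": profile,
        "config": config,
    })


def parse_prediction(raw):
    """Decode a response and check that it is one of the expected prediction forms."""
    data = json.loads(raw)
    kind = data.get("type")
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unexpected prediction type: {kind!r}")
    return kind, data["value"]
```

For example, a response body of `{"type": "text", "value": "I am doing well, thank you."}` would be decoded into the pair `("text", "I am doing well, thank you.")`.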
Message data stored in artificial intelligence models 3 is stored in structured databases well known in the art. These structured databases allow easy search, location and access to message data utilized on the platform. Part or all of these databases may be stored on remote servers or on the user’s local device, depending on the configuration of the software and other factors such as frequency of access by user, and speed of data access required.
Content scraping 18 includes the code and logic to collect and process content from the user’s mobile device. Content scraping 18 is designed to take collected content from various different applications and integrations, then structure and tag the content in a consistent manner so it can be utilized across the intelligent keyboard application. The logic in content scraping 18 is developed with the specific applications/integrations disclosed in this preferred embodiment but could be extended to include nearly any application or integration that is added to the intelligent keyboard product in the future.
Email integration 19 allows the system to collect conversation from the user’s email application to determine their language style. Such email applications include the native iOS mail application as well as other third-party email applications such as Gmail. Email is an excellent location to collect data about a user’s conversational style and content, so it is a primary data source for the intelligent keyboard application. While email tends to be longer form conversation, iMessage or SMS tends to show a user’s conversational style for shorter form conversation. For this purpose, iMessage/SMS integration 20 collects conversational data from the user’s Apple iMessage and/or text message accounts. This conversational data is vitally important for the intelligent keyboard application because it reveals the user’s style and content for shorter form content.
Calendar 21, maps 22, search 23 and contacts 24 collect data from the calendar app, map app, search engines and contacts applications respectively. These apps provide useful structural data to enhance the system’s understanding of the user’s conversational style and content, but also location data, search history and frequent contacts. In the preferred embodiment, the intelligent keyboard is able to use the enhanced context provided by calendar 21, maps 22, search 23 and contacts 24 to add events to a user’s preferred calendar application without leaving the intelligent keyboard.
Third-party apps 25 allows the system to collect data from specific third-party apps such as Yelp, Foursquare, OpenTable, Calendly, Instagram, or Facebook to provide hyper-specific content, recommendations, opportunities or offers relevant to the user. While the preferred embodiment of the disclosed invention includes integrations with these third-party apps, many others could be included and are consistent with the invention disclosed here.
Message data stored in integrations 4 is stored in structured databases well known in the art. These structured databases allow easy search, location and access to message data utilized on the platform. Part or all of these databases may be stored on remote servers or on the user’s local device, depending on the configuration of the software and other factors such as frequency of access by user, and speed of data access required.
Content scraping 26 is a code block designed to capture, process and tag content collected via public API from key websites. The key function is to capture unstructured content data, then process it into a structured dataset that can be utilized to populate the intelligent keyboard with useful content for given user contexts. In the preferred embodiment, this functionality is optimized for sites like Reddit 27, Twitter 28 and Instagram 29, but also includes other web content 30 and third-party apps 31. Many websites with or without public APIs are envisioned by the present invention. Reddit, Twitter and Instagram all support public APIs that allow the system to make structured data requests of those platforms so that conversational and other data can be captured and processed for the intelligent keyboard product. Web content 5 is connected to the internet 6. As described in relation to other message data, web content 5 is stored in databases well known in the art.
Activity tracking 37 monitors the activity of the user to build a stronger profile of the user. This information is used to build, improve and optimize the user profile described above. Activity tracking 37 specifically tracks the type of content engaged with by the user and in what contexts. Activity tracking 37 tracks the following interactions: which Categories and Intents are most and least used by a user, when users are most or least active during the day and week, how often a user accesses the intelligent keyboard during a period of time, engagement patterns over time including peaks and troughs throughout the year, and various internal improvements, such as the effect on user engagement of releasing new features or device notifications. This allows the system to more accurately predict the conversational phrases preferred by the user, but also to display additional equivalent and previously unused/under used, opening and response options. Thus activity tracking 37 can be utilized to build the behavior criteria utilized by the platform to suggest content. Configuration block 38 allows the user to control key features of the intelligent keyboard including keyboard activation, notifications and keyboard content including favorites, openers, banter, connect, disconnects and key questions. The intelligent keyboard configuration options are discussed in more detail in
Keyboard toggle 41 allows the user to toggle between keyboard types in iOS, including the intelligent keyboard, standard keyboard and other enabled keyboards. Intelligent keyboard content 42 is a listing of the suggested content for a specific category 40. The user will tap one of these content bubbles 45 to populate the text input window 39 with words and phrases. Help button 43 provides conversational guidance from the intelligent keyboard. Back button 44 inputs the most recently viewed content into the text input window 39 in reverse chronological order.
The intelligent keyboard has two modes: 1) Pro mode and 2) AI mode. In Pro mode, the keyboard content is written for the platform owners by experts in platform generated content 12 and stored in application database 10. In AI mode, all keyboard content is generated by artificial intelligence models 3.
In
Once the intelligent keyboard is activated, the user chooses the keyboard mode: Pro mode or AI mode. In step 63, the user selects Pro mode. In step 71, the user could select AI mode, which is shown in detail in
While the user is selecting content in the above steps, the intelligent keyboard is monitoring the user’s context, i.e. the location the keyboard is used, such as iMessage/SMS/text, email, dating apps, and others. The intelligent keyboard is also monitoring the keyboard content tapped, selected or sent by the user (user interaction). This information is passed to the system to optimize the content presented to the present user as well as other users of the intelligent keyboard. This first occurs at step 62 when the intelligent keyboard is enabled. At this step, all known configurations for the intelligent keyboard are loaded. These configurations are stored on the local user application 8, the servers 7 and application database 10. These configurations are updated in real time based on utilization of the intelligent keyboard by the present user and all users of the platform across the internet. Further, as a user interacts with content category step 65, intent step 66, text box step 67 and send step 68, the platform tracks this interaction and sends to ranking algorithm 14 and activity tracking 37. As this data is tracked, it is passed to database 10, web app 9 and to the system’s proprietary algorithm 74. The activity data is processed and used to further optimize the content displayed to users in specific contexts, categories and intents. The system also monitors if any of the stored metadata is changed/updated/affected in step 69. These updates to metadata are generally due to user activity such as storing a favorite or interacting with content. These changes are routed through the system elements such as the user’s local application storage 8, server 7, database 10, proprietary algorithm 74 and web application 9.
System software typically includes operating system 207, which in this case is generally Apple’s iOS mobile operating system or Google’s Android operating system. Any suitable operating system could be used, including Microsoft Windows, Apple MacOS, Linux, Unix or any other operating system known in the art. Intelligent keyboard application 208 is the local version of the software running on a user’s device. This version of the software displays the intelligent keyboard and stores the user’s preferences as discussed previously in relation to
The intelligent keyboard system 1 shows the remote servers related to the functions of the intelligent keyboard. These individual modules and databases have been discussed in depth above, but this representation shows the remote nature of the server-based software modules and databases. The software modules and databases can be executed on any suitable server connected to the internet, but in the preferred embodiment, servers hosted on Amazon Web Services are utilized. The intelligent keyboard application 208 communicates through network 209 to servers that host intelligent keyboard system 1. These servers further communicate through network 209 with the various internet services discussed prior including Reddit 27, Twitter 28, Instagram 29, other suitable web content 30 and third-party apps 31.
Third party user devices 211 also connect through network 209. These are other user devices that receive communications from the user of the intelligent keyboard application. While the figure indicates that third party user devices 211 are mobile phones, such devices could also be desktop or mobile computers receiving communication.
Lastly, web store/application store 210 is connected to network 209 and thus to user device 200. In the preferred embodiment, web store/application store 210 is an online store for downloadable applications or other downloadable content such as the Apple App Store or Google Play Store. While this aspect of the invention is described in relation to these existing web stores/application stores 210, any suitable type of online marketplace could be utilized. Here, the user could download the intelligent keyboard application 208 or paid extensions to the intelligent keyboard application 208. It is anticipated that libraries of keyboard content in particular voices could be sold in-app for purchase and download through these web stores/application stores 210.
A screenshot in this context is any image captured of the screen of a computing device, including any mobile device, smartphone, tablet computer, laptop/notebook computer and/or desktop computer. Many devices include built-in functionality to capture such an image, i.e. a screenshot. Further, many software applications exist to capture screenshots of a computing device screen. A screenshot may include many different types of data including without limitation, name, user name, profile name, emojis, date/time stamps, conversation text, images, message status, and many others. This is not an exhaustive list and any data point that may be captured in a screenshot could potentially be utilized as an input for the intelligent keyboard.
Screenshot 311 is an input screenshot from a computing device. This screenshot is passed to OCR segmentation 312. OCR segmentation 312 draws boxes around messages and other segments of text, and then recognizes the text in each such box. Segmentation classifier 313 includes an advanced neural network (discussed in
User specification 345 relates to specific instructions provided by the user about the type or tone of response to be generated. App classifier 346 identifies the application utilized for the textual communication. Examples include iMessage, Facebook Messenger, Tinder, other dating apps, whatsapp, snapchat, SMS/text messaging or any other communication platform utilized on a computing device including any smartphone or tablet.
Compilation and contextualization module 306 generates input for generative AI model 307. In the disclosed preferred embodiment, the compilation and contextualization module uses persona-based generation, which is described in detail in relation to
The Compilation component is the reconstruction of a conversation from a list of segments of text and information about who sent them and when. The Contextualization component describes adding instructions, prompts, and other information to tune the instructions/input to generative AI model 307 for the context of the conversation. An example of contextualization is the sentence “This conversation is taking place on a smartphone in the messaging app iMessage between Jessica and Amy” where “iMessage”, “Jessica” and “Amy” are contextual information that has been identified and added from various processes. Specifically, app classifier 346 has identified “iMessage” as the application. The name “Jessica” has been identified in a screenshot by OCR segmentation 312 and classified as a name by segmentation classifier 313. Lastly, “Amy” is identified as the name of the user in the application database as collected at sign-up.
The information used to compile and contextualize an input includes one or more of the following: 1) text from a conversation, irrespective of where that text originates, such as from a screenshot analyzed by the system or directly from an application, 2) user-specified inputs as to the desired suggestions, such as requesting that suggestions are “witty” and/or “professional”, 3) metadata about where the conversation is taking place, such as the application identified by app classifier 346, 4) profile information about any of the conversants, such as their name, age, location, demographic information, hobbies, interests, or any other relevant data points as specified in the system and well known in the art, and 5) past conversations including one or more of the conversants. This information is compiled into a list of instructions and data that are passed to generative AI model 307.
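A minimal sketch of compilation followed by contextualization is given below, reusing the iMessage example from the description. The function names, the prompt layout and the closing speaker cue are assumptions for illustration; the disclosure does not fix the exact text passed to generative AI model 307.

```python
def compile_conversation(segments):
    """Rebuild a transcript from (sender, text) segments in the order given."""
    return "\n".join(f"{sender}: {text}" for sender, text in segments)


def contextualize(transcript, app, user, other, tones=()):
    """Prepend context and user-specified tone requests to the transcript."""
    lines = [
        f"This conversation is taking place on a smartphone in the messaging "
        f"app {app} between {other} and {user}."
    ]
    if tones:
        # User-specified inputs such as "witty" and/or "professional".
        lines.append("Suggestions should be " + " and ".join(tones) + ".")
    lines.append(transcript)
    # Hypothetical cue asking the model to continue as the user.
    lines.append(f"{user}:")
    return "\n".join(lines)
```

Here `app` would come from app classifier 346, `other` from the classified screenshot segments, and `user` from the application database, as described above.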
Many generative text models are probabilistic models, meaning that they predict the probability of words or phrases, or parts of words or phrases, rather than the words or phrases themselves. It is then necessary to choose the words or phrases by sampling from the probability distribution. Sampling parameter algorithm 347 adjusts sampling parameters dynamically for each set of text suggestions. In the preferred embodiment, sampling parameter algorithm 347 adjusts a sampling parameter called temperature. Temperature is a term well known in the art related to adjusting the randomness of the sampling. Increased randomness of the sampling has the effect of choosing a greater variety of words, thus increasing the creativity of the produced text. The algorithm categorizes the conversation based on the length of the conversation, measured in words. There can be two or more categorizations, for example short, medium and long. Each category corresponds to a non-overlapping range of words, e.g. short is for conversations less than 40 words, medium for 40-60 words, and long for conversations with more than 60 words. A sampling method is then specified for each categorization to generate the suggestions. The optimal configuration for the sampling parameter algorithm 347 operates in the following manner with two categorizations: For long conversations (those with 64 words or more): one of the suggestions is generated with a low temperature (≤ 1.0) and two of the suggestions are generated with a medium temperature (1.1). For short conversations (those with 64 words or less): one of the suggestions is generated with a low temperature (≤ 1.0) and two of the suggestions are generated with high temperatures (1.3 or greater).
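The two-category configuration can be sketched as follows, together with a small illustration of how temperature reshapes a probability distribution. This is a sketch under stated assumptions: a conversation of exactly 64 words is treated as long, the low temperature is taken as 1.0 and the high temperature as 1.3, and all function names are hypothetical.

```python
import math


def conversation_category(text, threshold=64):
    """Classify by word count; exactly `threshold` words counts as long (assumed)."""
    return "long" if len(text.split()) >= threshold else "short"


def suggestion_temperatures(text):
    """One low-temperature suggestion plus two at medium (long) or high (short)."""
    if conversation_category(text) == "long":
        return [1.0, 1.1, 1.1]
    return [1.0, 1.3, 1.3]


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Raising the temperature lowers the probability of the most likely token and raises the others, which is why short conversations, where there is little context to constrain the reply, receive the more varied high-temperature suggestions.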
Suggested text 308 is generated by feeding the output of compilation and contextualization 306 to generative AI model 307 using the sampling parameters selected by the sampling parameter algorithm 347. Generative AI model 307 is typically a large language model that is based on a transformer neural network. Example generative AI models 307 are: 1) open source models that are fine-tuned on conversational data, such as gpt-j, 2) commercially available models such as OpenAI’s models, including GPT-3.5 text-davinci-003, gpt-3.5-turbo and gpt-4, and 3) commercially available models that have been fine-tuned on application conversational data, such as OpenAI’s text-davinci-003. In the preferred embodiment, generative AI model 307 includes each of the three models listed above, where simultaneous requests are sent to each to generate candidate responses.
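Sending simultaneous requests to several models can be sketched with a thread pool. The model callables here are local stand-ins supplied by the caller, not real model clients, and `generate_candidates` is a hypothetical name; no vendor API is invoked in this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_candidates(prompt, models, temperature):
    """Query every model concurrently and collect one candidate per model.

    `models` maps a label to a callable taking (prompt, temperature) and
    returning text. It must contain at least one entry.
    """
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {
            name: pool.submit(generate, prompt, temperature)
            for name, generate in models.items()
        }
        # result() blocks until that model's response is available.
        return {name: future.result() for name, future in futures.items()}
```

In a deployment, each callable would wrap a network request to one of the model types listed above, so the total latency approaches that of the slowest model instead of the sum of all three.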
Suggestions filter 331 filters suggested words or phrases before displaying them to a user. The suggested words or phrases may be filtered on: (1) how closely they match conversation metadata, (2) whether the words or phrases contain violent, graphic, or inappropriate content, (3) the rating of the words or phrases provided by an artificial intelligence system according to the match between the words or phrases and the conversation metadata, or a rating based on the quality, appropriateness, popularity or other characteristic of the words or phrases. When suggestions are filtered on any of the listed criteria, or other criteria as determined by the system, those filtered suggested words/phrases are fed back to compilation and contextualization module 306. This feedback loop ensures that the generative AI model is provided information that enables training of the model for improved suggested words and phrases. Suggested text 308 may be presented directly to the user or tagged with further information.
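A simple form of this filtering step is sketched below. The blocklist, the rating threshold and the function name are assumptions for illustration; a real system would likely rely on a trained classifier for inappropriate content and on the AI-provided rating described above.

```python
# Hypothetical blocklist standing in for a real content-safety check.
BLOCKLIST = {"kill", "gore"}


def filter_suggestions(rated_suggestions, min_rating=0.5):
    """Split (text, rating) pairs into kept and rejected suggestion texts."""
    kept, rejected = [], []
    for text, rating in rated_suggestions:
        words = {word.strip(".,!?").lower() for word in text.split()}
        if words & BLOCKLIST or rating < min_rating:
            rejected.append(text)
        else:
            kept.append(text)
    return kept, rejected
```

The `rejected` list corresponds to the filtered suggestions that are fed back to compilation and contextualization module 306, while `kept` holds the suggestions eligible for display.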
Tagged suggested text is suggested words or phrases that are tagged with information about the suggested text. One or more of these tags may be displayed as plain text adjacent to the suggestion. For example, a suggestion that reads “You’re doing great!” may have a tag of “Encouraging” and “Positive” next to the phrase in a manner which enables the user to easily characterize each word or phrase. The intent of this design is to enable a user of an intelligent keyboard to quickly identify which words or phrases are likely to fit their needs without having to read the full text of the suggestions. Any tag representing any information related to a suggested word or phrase may be included in the present system.
Input compiler 331 takes the requested data inputs and compiles them into a format appropriate for passing to the AI model such that the model will output the optimal suggestions.
Configuration data 324 may include conversation metadata which comprises conversant information, conversation type, conversation information, conversation history and screenshot information. Conversation metadata is any information about a conversation on a smartphone or other computing device that is not the text of the conversation or a screenshot of the conversation. This conversation metadata is utilized to assist in generating suggested words or phrases. Conversant information includes information about the people who are communicating with each other, including their name(s), location(s), a description of their relationship, demographic profiles, their communication history, an analysis of their previous communication, and relevant facts about one or more of the person(s) in the conversation. Conversation type includes information about the desired suggested words or phrases generated by an intelligent keyboard, including the tone, intent, relevant factual information or other details that characterize the desired suggested words or phrases. Conversation information includes information about the conversation contained in the smartphone, tablet or other computing device, including the application in which the conversation took place (if applicable), send and receipt times for individual messages in a conversation, send and receipt status for messages in a conversation, including user behavior before, during, or after the conversation. Conversation history data includes past conversations that one or more of the people in the conversation have had, either with each other or with other people. Lastly, screenshot information includes data related to the screenshots sent by user as part of the conversation. Specific information includes whether the screenshot is cropped, what app the screenshot depicts, and the theme of the app depicted in the screenshot (e.g. “dark mode”, “light mode” or “custom”).
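The five groups of conversation metadata could be represented by a structure like the one below. The class and field names are hypothetical and the field types are deliberately loose; the disclosure describes what the metadata contains, not how it is stored.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ScreenshotInfo:
    cropped: bool = False
    app: str = ""
    theme: str = "light mode"  # e.g. "dark mode", "light mode" or "custom"


@dataclass
class ConversationMetadata:
    # Names, locations, relationship and other facts about the conversants.
    conversants: list = field(default_factory=list)
    # Desired tone, intent and other traits of the suggestions.
    conversation_type: dict = field(default_factory=dict)
    # Application, send/receipt times and message status.
    conversation_info: dict = field(default_factory=dict)
    # Past conversations involving one or more conversants.
    history: list = field(default_factory=list)
    screenshot: ScreenshotInfo = field(default_factory=ScreenshotInfo)
```

Calling `asdict` on an instance yields a plain nested dictionary, which is one convenient form for passing the metadata onward when generating suggestions.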
In many mobile applications, a user may create a profile including information such as name/screenname, contact information, employment details, birthdate/age, photographs/avatars, hobbies/interests, places they visit, opening questions and other personal information. The intelligent keyboard of the present invention will analyze a screenshot taken by the user of a potential conversation partner’s profile page and make suggestions on how the user can begin a conversation with that person.
Profile information can also be gathered from a screenshot using the OCR segmentation module 312 and segmentation classifier 313, which use the same technology and process as advanced screenshot parsing 302 to collect profile information from a user profile screenshot. In this case, the screenshot 311 is an input screenshot of a user profile from a computing device. This screenshot is passed to OCR segmentation 312. OCR segmentation 312 draws boxes around relevant profile information and then recognizes the text in each such box. Segmentation classifier 313 includes an advanced neural network and classifies each segment of text according to its purpose in the profile. Relevant profile data that may be collected includes: name, age, hobbies, education, workplace, location/residence, interests, profile description, tags, pets, distances and last online. Additional data points may be available in other profiles not included in this list.
Profile data may also be collected by means other than screenshot, including direct collection from a user profile, API or other method of gathering profile data. This profile data may include any of the data points listed above or any other information included in a user profile to be utilized in the intelligent keyboard system.
Further, the intelligent keyboard may utilize conversation metadata when rephrasing words or phrases. A user may request tone, subject matter, or other details desired for the rephrased words or phrases. For example, a user can select text in their text box and choose “Witty” from a dropdown menu; the intelligent keyboard rephrases the selected text in a witty way.
Rephrase module 332 is comprised of conversation metadata 324, text 334 (text to be rephrased) and requested characteristics 335. Conversation metadata is as previously described and includes any information about a conversation on a smartphone or other computing device that is not the text of the conversation or a screenshot of the conversation, including conversant information, conversation type, conversation information, conversation history and screenshot information. Conversant information includes information about the people who are communicating with each other, including their name(s), location(s), a description of their relationship, demographic profiles, their communication history, an analysis of their previous communication, and relevant facts about one or more of the person(s) in the conversation. Text 334 is the text to be rephrased as provided by the user. Requested characteristics 335 are the parameters for tone, content or other characteristics requested by the user. This information is provided to compilation and contextualization 306 for passing to generative AI model 307 to create new rephrased text suggestions.
Seeds refer to compressed information that is provided to a large language model in order to improve the quality of outputs or to increase the speed at which outputs are provided. The intelligent keyboard application may compress conversation metadata into one or more seeds. The intelligent keyboard application then may use these seeds when generating suggested or rephrased words or phrases by providing these seeds to an artificial intelligence system.
“Your task is to compress the following text such that {model name} can reconstruct it as close as possible to the original. It does not need to be human readable. Use language mixing, abbreviations, symbols (unicode and emojis) to aggressively compress it, while still keeping ALL the information to fully reconstruct it.\n\n ##Text to compress:\n\n {text to compress}”.
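Filling the compression prompt above for a given model and text can be sketched as follows. The function name is hypothetical, and plain string replacement is used because the placeholders in the quoted prompt contain spaces.

```python
SEED_PROMPT_TEMPLATE = (
    "Your task is to compress the following text such that {model name} can "
    "reconstruct it as close as possible to the original. It does not need to "
    "be human readable. Use language mixing, abbreviations, symbols (unicode "
    "and emojis) to aggressively compress it, while still keeping ALL the "
    "information to fully reconstruct it.\n\n ##Text to compress:\n\n "
    "{text to compress}"
)


def build_seed_prompt(model_name, text):
    """Substitute the model name and the text to compress into the template."""
    prompt = SEED_PROMPT_TEMPLATE.replace("{model name}", model_name)
    return prompt.replace("{text to compress}", text)
```

The model's reply to this prompt would be stored as the seed and later supplied in place of the full conversation metadata when generating suggested or rephrased text.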
Conversation history 338 includes the user’s general conversation history as well as the user’s history with the specific other party to the communication. Prompts and instructions 339 are comprised of: 1) a task description, which describes the task and the people in it, 2) task instructions, which provide explicit instructions on what to generate (e.g. “Write a response...”), 3) response guidance that provides information on what is a good or bad response, including (a) characterization of what a suggestion should be (e.g. “the response should be thoughtful”), (b) specific requirements that a suggestion should have (e.g. “include an emoji”), (c) optional requirements that a suggestion may have (e.g. “feel free to use slang and emojis”), and (d) things that should be omitted from a response (e.g. “Do not use a formal tone.”, “Do not use exclamation points.”), and 4) a ‘start response’ prompt. For models such as gpt-4 and davinci-text, it is best to phrase the task description (1) in the second person (use “you”). The following are optimal for prompts and instructions for a dating context. Each element is modified by “Further Specifications”. The “Further Specifications” are the optimal list, but it is not exhaustive and can be extended:
Although the present invention has been described in relation to the above disclosed preferred embodiment, many modifications in design, implementation, systems and execution are possible while still maintaining the novel features and advantages of the invention. The preferred embodiment is not meant to limit the scope of the patent in any way, and it should be given the broadest possible interpretation consistent with the language of the disclosure on the whole.
Relation | Number | Date | Country
---|---|---|---
Parent | 17592414 | Feb 2022 | US
Child | 18342736 | | US