Computing devices may provide content (e.g., user interfaces, audio content, textual content, video content) in a variety of different languages based on language settings for those computing devices. For example, a user might modify the language of their operating system using a configuration menu, and/or might watch foreign language video content with subtitles enabled. As a wider variety of users consume an increasingly varied quantity of content, it is increasingly likely that computing device language settings are misconfigured. For example, a computing device might inadvertently display subtitles in a language that cannot be read by a viewer, and/or might output audio data too quickly to be consumed by a hearing-impaired listener.
The following summary presents a simplified summary of certain features. The summary is not an extensive overview and is not intended to identify key or critical elements.
Systems, apparatuses, and methods are described for modifying the language preferences of computing devices based on audio data. A computing device may receive audio content corresponding to the speech of one or more different users. Based on processing that audio content, the computing device may determine language settings for the display of content. For example, based on detecting that a viewer speaks Spanish, English, and combinations thereof, the computing device may disable subtitles when displaying Spanish-language content, but may enable subtitles when displaying Japanese-language content. As another example, based on detecting that a viewer speaks in Spanish and determining that the viewer speaks a command (and, e.g., not the title of content), the computing device may change language settings to Spanish. The computing device may store a user profile indicating such language preferences. Moreover, based on the processing of that audio content, accessibility features may be implemented. For example, the speed of audio content may be modified based on detecting that a user speaks with a slow cadence.
These and other features and advantages are described in greater detail below.
Some features are shown by way of example, and not by limitation, in the accompanying drawings. In the drawings, like numerals reference similar elements.
The accompanying drawings, which form a part hereof, show examples of the disclosure. It is to be understood that the examples shown in the drawings and/or discussed herein are non-exclusive and that there are other examples of how the disclosure may be practiced.
The communication links 101 may originate from the local office 103 and may comprise components not shown, such as splitters, filters, amplifiers, etc., to help convey signals clearly. The communication links 101 may be coupled to one or more wireless access points 127 configured to communicate with one or more mobile devices 125 via one or more wireless networks. The mobile devices 125 may comprise smart phones, tablets or laptop computers with wireless transceivers, tablets or laptop computers communicatively coupled to other devices with wireless transceivers, and/or any other type of device configured to communicate via a wireless network.
The local office 103 may comprise an interface 104. The interface 104 may comprise one or more computing devices configured to send information downstream to, and to receive information upstream from, devices communicating with the local office 103 via the communications links 101. The interface 104 may be configured to manage communications among those devices, to manage communications between those devices and backend devices such as servers 105-107 and 122, and/or to manage communications between those devices and one or more external networks 109. The interface 104 may, for example, comprise one or more routers, one or more base stations, one or more optical line terminals (OLTs), one or more termination systems (e.g., a modular cable modem termination system (M-CMTS) or an integrated cable modem termination system (I-CMTS)), one or more digital subscriber line access modules (DSLAMs), and/or any other computing device(s). The local office 103 may comprise one or more network interfaces 108 that comprise circuitry needed to communicate via the external networks 109. The external networks 109 may comprise networks of Internet devices, telephone networks, wireless networks, wired networks, fiber optic networks, and/or any other desired network. The local office 103 may also or alternatively communicate with the mobile devices 125 via the interface 108 and one or more of the external networks 109, e.g., via one or more of the wireless access points 127.
The push notification server 105 may be configured to generate push notifications to deliver information to devices in the premises 102 and/or to the mobile devices 125. The content server 106 may be configured to provide content to devices in the premises 102 and/or to the mobile devices 125. This content may comprise, for example, video, audio, text, web pages, images, files, etc. The content server 106 (or, alternatively, an authentication server) may comprise software to validate user identities and entitlements, to locate and retrieve requested content, and/or to initiate delivery (e.g., streaming) of the content. The application server 107 may be configured to offer any desired service. For example, an application server may be responsible for collecting, and generating a download of, information for electronic program guide listings. Another application server may be responsible for monitoring user viewing habits and collecting information from that monitoring for use in selecting advertisements. Yet another application server may be responsible for formatting and inserting advertisements in a video stream being transmitted to devices in the premises 102 and/or to the mobile devices 125. The local office 103 may comprise additional servers, such as the audio processing server 122 (described below), additional push, content, and/or application servers, and/or other types of servers. Although shown separately, the push server 105, the content server 106, the application server 107, the audio processing server 122, and/or other server(s) may be combined and/or server operations described herein may be distributed among servers or other devices in ways other than as indicated by examples included herein. Also or alternatively, one or more servers (not shown) may be part of the external network 109 and may be configured to communicate (e.g., via the local office 103) with other computing devices (e.g., computing devices located in or otherwise associated with one or more premises 102). Any of the servers 105-107, and/or 122, and/or other computing devices may also or alternatively be implemented as one or more of the servers that are part of and/or accessible via the external network 109. The servers 105, 106, 107, and 122, and/or other servers, may be computing devices and may comprise memory storing data and also storing computer executable instructions that, when executed by one or more processors, cause the server(s) to perform steps described herein.
An example premises 102a may comprise an interface 120. The interface 120 may comprise circuitry used to communicate via the communication links 101. The interface 120 may comprise a modem 110, which may comprise transmitters and receivers used to communicate via the communication links 101 with the local office 103. The modem 110 may comprise, for example, a coaxial cable modem (for coaxial cable lines of the communication links 101), a fiber interface node (for fiber optic lines of the communication links 101), a twisted-pair telephone modem, a wireless transceiver, and/or any other desired modem device. One modem is shown in
The gateway 111 may also comprise one or more local network interfaces to communicate, via one or more local networks, with devices in the premises 102a. Such devices may comprise, e.g., display devices 112 (e.g., televisions), other devices 113 (e.g., a DVR or STB), personal computers 114, laptop computers 115, wireless devices 116 (e.g., wireless routers, wireless laptops, notebooks, tablets and netbooks, cordless phones (e.g., Digital Enhanced Cordless Telephone—DECT phones), mobile phones, mobile televisions, personal digital assistants (PDA)), landline phones 117 (e.g., Voice over Internet Protocol—VoIP phones), and any other desired devices. Example types of local networks comprise Multimedia Over Coax Alliance (MoCA) networks, Ethernet networks, networks communicating via Universal Serial Bus (USB) interfaces, wireless networks (e.g., IEEE 802.11, IEEE 802.15, Bluetooth), networks communicating via in-premises power lines, and others. The lines connecting the interface 120 with the other devices in the premises 102a may represent wired or wireless connections, as may be appropriate for the type of local network used. One or more of the devices at the premises 102a may be configured to provide wireless communications channels (e.g., IEEE 802.11 channels) to communicate with one or more of the mobile devices 125, which may be on- or off-premises.
The mobile devices 125, one or more of the devices in the premises 102a, and/or other devices may receive, store, output, and/or otherwise use assets. An asset may comprise a video, a game, one or more images, software, audio, text, webpage(s), and/or other content.
Although
As described herein, language settings and/or recording settings of a computing device (e.g., any of the devices described above with respect to
In step 301, the computing device may receive audio data. Receiving audio data may comprise receiving data that indicates all or portions of vocalizations made by a user. For example, the computing device may receive audio data corresponding to speech of a user. The audio data may be received from one or more sources, such as via a microphone of the computing device, a microphone of a different computing device, or the like. For example, the audio data may be received, via a network, from a smartphone, voice-enabled remote control, and/or similar computing devices. The audio data may be in a variety of formats. For example, the audio data may comprise a recording (e.g., an .mp3 file) of a user's speech. As another example, the audio data may comprise speech-to-text output of an algorithm that has processed the speech of a user.
A microphone or similar audio capture device may record multiple different users, such that the audio data may comprise the speech of one or more different users. For example, the audio data may correspond to speech of a plurality of different users. In turn, the audio data may comprise a variety of different spoken languages, speech cadences, and the like. For example, a multilingual family may speak in both English and Chinese, or a combination thereof. As another example, within a multigenerational family, older family members might have difficulty hearing and speak with a slower but louder cadence, whereas younger family members might speak more quickly but more quietly.
In step 302, the computing device may process the received audio data to determine one or more properties of speech by one or more users. For example, the computing device may process the audio data to determine one or more properties of the speech of the user. The one or more properties of the speech may comprise any subjective or objective characterization of the speech, including but not limited to a language of the speech, a cadence of the speech, a volume of the speech, one or more indicia of communication limitations indicated by the speech, or the like.
The one or more properties of speech may indicate a language spoken by a user. To determine a language spoken by the user, the computing device may process the audio data using one or more algorithms that compare sounds made by the user to sounds associated with various languages. Additionally and/or alternatively, to determine a language spoken by the user, the computing device may use a speech-to-text algorithm to determine one or more words spoken by a user, then compare the one or more words to words associated with different languages. The language spoken by the user may correspond to both languages (e.g., English, Spanish, Japanese) as well as subsets of those languages (e.g., specific regional dialects of English). For example, the computing device may process the audio data to determine a particular regional dialect of English spoken by a user. As will be described below, this language information may be used to modify user interface elements (e.g., to switch user interface elements to a language spoken by one or more users), to select content (e.g., to play an audio track corresponding to a language spoken by the one or more users), or the like.
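As a non-limiting illustration of the word-comparison approach described above, the sketch below (in Python) scores transcribed words against small per-language vocabularies and selects the best-matching language. The vocabularies, function name, and unknown-language behavior are illustrative assumptions rather than required elements, and a speech-to-text step is assumed to have already produced the transcribed words.

```python
# Minimal sketch of word-based language detection, assuming the audio data
# has already been converted to text (e.g., by a speech-to-text algorithm).
# The vocabularies below are illustrative placeholders, not complete lexicons.
LANGUAGE_VOCABULARIES = {
    "en": {"play", "pause", "the", "movie", "search"},
    "es": {"reproducir", "pausa", "la", "película", "buscar"},
    "ja": {"再生", "一時停止", "映画", "検索"},
}

def detect_language(transcribed_words):
    """Return the language whose vocabulary overlaps most with the spoken words."""
    scores = {
        language: sum(1 for word in transcribed_words if word.lower() in vocabulary)
        for language, vocabulary in LANGUAGE_VOCABULARIES.items()
    }
    best_language = max(scores, key=scores.get)
    # If no word matched any vocabulary, the language is treated as unknown.
    return best_language if scores[best_language] > 0 else None

print(detect_language(["Reproducir", "la", "película"]))  # -> "es"
```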
The one or more properties of speech may indicate a speech pattern of a user. The particular loudness, cadence, and overall tenor of the speech of a user may suggest information about a user's relationship to a language. For example, mispronunciations, slow speech, and mistakes in use of certain terms may suggest that a user has a limited understanding of a particular language. In such a circumstance, and as will be described below, it may be desirable to provide simplified forms of this language for the user. As one example, the one or more properties of speech may suggest that a user has only a basic understanding of Japanese, such that user interface elements should be displayed in hiragana or katakana instead of kanji. In this manner, the one or more properties of speech may indicate stuttering, speech impediment(s), atypical speech patterns (e.g., those associated with hearing loss), slurred speech, or the like.
The one or more properties of speech may indicate communicative limitations of a user. In some circumstances, the manner in which a user speaks may suggest that they have difficulty communicating. For example, certain speech patterns may suggest that a user may be wholly or partially deaf. In such a circumstance, and as will be described below, it may be desirable to modify presentation of content (by, e.g., turning on subtitles/captions, increasing the volume of content, or the like).
Processing the audio data may comprise determining a language spoken by two or more of a plurality of different users. The audio data may correspond to speech by a plurality of different users. For example, multiple users may speak in a living room, and the audio data may comprise portions of both users' speech. In such a circumstance, the computing device may be configured to determine one or more portions of the audio data that correspond to each of a plurality of different users, then determine one or more properties of the speech of each user of the plurality of different users. This processing may be used to determine language and/or recording settings for multiple users. For example, a majority of users captured in the audio data may speak Spanish, but one of the plurality may speak Portuguese. In such a circumstance, the computing device may determine (e.g., based on a count of the one or more users speaking Spanish versus those speaking Portuguese) whether to modify the language settings to Spanish or Portuguese.
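As a non-limiting illustration of determining language settings for multiple users, the sketch below assumes that speaker separation has already produced a per-user language determination and simply selects the language spoken by the majority of users. The function name and input format are illustrative assumptions.

```python
from collections import Counter

# Sketch: choose language settings based on a count of users speaking each
# language, as described above. Speaker separation is assumed to have already
# produced one language determination per user.
def choose_language_for_group(per_user_languages):
    """per_user_languages: list like ["es", "es", "es", "pt"]; returns the majority language."""
    counts = Counter(lang for lang in per_user_languages if lang is not None)
    if not counts:
        return None
    language, _ = counts.most_common(1)[0]
    return language

print(choose_language_for_group(["es", "es", "es", "pt"]))  # -> "es"
```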
Processing the audio data may comprise determining whether one or more words correspond to a command. A user's speech might comprise a command (e.g., “Play,” “Pause”) and/or the title of content (e.g., the name of a movie), such that the audio data might comprise a combination of both a command and a title of content (e.g., “Play [MOVIE NAME],” “Search for [SHOW NAME]”). The computing device may determine which words, of one or more words spoken by a user, correspond to a command. The computing device may additionally and/or alternatively determine which words, of the one or more words spoken by the user, correspond to a title of content, such as the title of a movie, the title of television content, or the like. The computing device may be configured to modify language settings based on the language used by a user for commands, but not the language used by a user for content titles. For example, the computing device may change language settings to English based on determining that a user used the English word “Play” as a command, but the computing device might not change language settings when a user uses the Spanish title of a movie. To determine whether one or more words correspond to a command, the computing device may process the audio data to identify one or more words, then compare those words to a database of words that comprises commands in a variety of languages. In this manner, the computing device may determine not only that a word is a command (e.g., “Play”), but that the command is in a particular language (e.g., English).
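A minimal sketch of comparing spoken words against a database of commands in a variety of languages is shown below. The command table and title list are illustrative placeholders, not a required vocabulary.

```python
# Sketch of distinguishing commands from content titles by comparing words
# against a database of commands in multiple languages. The command table and
# title list below are illustrative placeholders.
COMMANDS_BY_LANGUAGE = {
    "en": {"play", "pause", "search"},
    "es": {"reproducir", "pausa", "buscar"},
}
KNOWN_TITLES = {"play the football game"}

def classify_word(word):
    """Return ("command", language), ("title", None), or (None, None)."""
    for language, commands in COMMANDS_BY_LANGUAGE.items():
        if word.lower() in commands:
            return "command", language
    if word.lower() in KNOWN_TITLES:
        return "title", None
    return None, None

print(classify_word("Play"))        # -> ("command", "en")
print(classify_word("Reproducir"))  # -> ("command", "es")
```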
As art part of processing the audio data to determine one or more properties of speech indicated in audio data in step 302, the computing device may train a machine learning model to identify the speech properties of users. To perform this training, training data may be used with respect to the machine learning model. That training data may comprise, for example, associations between audio content corresponding to speech of a plurality of different users and properties of the speech of the plurality of different users. In this manner, the trained machine learning model may be configured to receive input (e.g., audio data) and provide output (e.g., indication(s) of the one or more properties of the speech indicated by the audio data). An example of a neural network which may be used to implement such a machine learning model is provided below with respect to
In step 303, the computing device may compare the one or more properties determined in step 302 to language settings. This comparison may determine whether there is a difference between the current language settings of a computing device versus the one or more properties of the speech of the user. For example, the computing device may compare the one or more properties of command-related words included in the speech of the user to language settings of the computing device to determine, e.g., if the user interface is displaying text in a language that is the same as that spoken by the users captured in the audio data. In this manner, step 303 may comprise determining whether the language settings of one or more computing devices are consistent with the one or more properties of speech determined as part of step 302.
In step 304, the computing device may determine, based on the comparing in step 303, whether to modify the language settings. Language settings may comprise any settings which govern the manner of presentation of content, such as the language with which video, audio, or text is displayed, the speed at which video, audio, or text is displayed, or the like. As indicated above, step 303 may comprise determining whether the language settings of one or more computing devices are inconsistent with the one or more properties of speech determined as part of step 302. If such an inconsistency exists (e.g., if the language settings are inconsistent with the one or more properties of speech), the computing device may determine to modify the language settings (e.g., to switch a display language of a user interface element, to turn on subtitles/captions, or the like). If the computing device determines to modify the language settings, the computing device may perform step 305. Otherwise, the computing device may perform step 306.
Determining whether to modify the language settings may comprise determining whether one or more portions of the speech correspond to a command. Speech may comprise indications of commands, but might additionally and/or alternatively comprise indications of the titles of content. Moreover, a user might speak one language, but refer to the title of content (e.g., a movie, a television show, a song, a podcast) in another language. In turn, the language a user uses for commands might be different than the language used by the same user for the title of content. For example, a user might speak English to issue a command (e.g., “Play,” “Pause”), but may speak the Spanish name of a Spanish television show. In such an example, it may be preferable to maintain the language settings in English and not switch the settings to Spanish. In contrast, if that same user provided commands (e.g., “Play”) in Spanish, whether or not the user used the Spanish or English language title of a content item, the user's use of Spanish may indicate that the language settings should be switched to Spanish. Accordingly, if a user provides a command in a language, the computing device may determine to modify the language settings based on that language. In contrast, if the user recites the name of content, the language settings might not be changed.
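The decision described above may be sketched as follows, where language settings follow the language of a spoken command while the language of a content title is ignored. The function and parameter names are illustrative assumptions.

```python
# Sketch of the decision described above: language settings may follow the
# language of a spoken command, while the language of a content title is ignored.
def should_modify_language_settings(current_language, command_language, title_language=None):
    """Return the new language if a change is warranted, else None."""
    # Titles are intentionally ignored: a Spanish movie title spoken inside an
    # English command should not switch the settings to Spanish.
    del title_language
    if command_language is not None and command_language != current_language:
        return command_language
    return None

print(should_modify_language_settings("en", command_language="es"))                         # -> "es"
print(should_modify_language_settings("en", command_language="en", title_language="es"))    # -> None
```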
In step 305, the computing device may modify the language settings. For example, based on the comparing in step 303, on whether or not the speech corresponds to a command, and/or on the one or more properties determined in step 302, the computing device may modify its language settings to, e.g., turn subtitles/captions on or off, change the subtitle language, switch an audio track of content to a particular language, implement accessibility features, implement machine translation, or the like. As part of this process, the computing device may determine a language indicated by the one or more properties. For example, the computing device may determine a language indicated by the one or more properties, then modify the language settings of the computing device based on that language. As part of step 305, the computing device may determine (e.g., based on the one or more properties determined in step 302) a language used to display video content, audio content, and/or textual content. For example, the computing device may modify a language setting (e.g., for subtitles/captions, for user interface element(s), for an audio track) to match a language spoken by one or more users. More examples of how the language settings of the computing device may be modified are described below in connection with
Modifying the language settings may comprise prompting a user to modify the language settings. For example, the computing device may cause display of a user interface element providing an option, to a user, to modify the language settings. Additionally and/or alternatively, in certain circumstances, modification of language settings may be performed automatically. For example, a computing device displaying content on a television in a public area (e.g., an office lobby) might be configured to automatically modify language settings based on captured speech of those in the office lobby. As another example, if the one or more properties determined in step 302 indicate a single language (and/or a predominant language), the indicated language may be automatically selected and language settings automatically modified based on that automatic selection.
As part of modifying the language settings, the computing device may create and/or store a user profile. A user profile may comprise a data element which may store all or portions of language settings and/or recording settings for one or more users. That user profile may be used by the computing device and/or one or more other computing devices to implement language settings and/or recording settings. For example, the computing device may store a user profile that indicates the one or more properties of the speech of the user, and then provide, to one or more second computing devices, the user profile. In this manner, one computing device may determine (for example) that a user speaks Spanish, create a user profile that indicates that the user speaks Spanish, and that user profile may be used by a wide variety of devices to configure their user interfaces to display Spanish text. Further description of user profiles is provided below with respect to
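As a non-limiting illustration, a user profile such as that described above might be represented and persisted as follows, so that other computing devices can reuse it. The field names and JSON storage format are illustrative assumptions.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field, asdict

# Sketch of a user profile storing language preferences, roughly as described
# above. The field names and on-disk format are illustrative assumptions.
@dataclass
class UserProfile:
    user_id: str
    languages: list = field(default_factory=list)       # e.g., ["es", "en"]
    subtitle_language: str | None = None                 # e.g., "en"
    subtitles_enabled: bool = False
    accessibility: dict = field(default_factory=dict)    # e.g., {"playback_speed": 0.5}

def save_profile(profile: UserProfile, path: str) -> None:
    """Persist the profile so other computing devices can reuse it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(profile), f, ensure_ascii=False, indent=2)

profile = UserProfile(user_id="user-1", languages=["es"], subtitles_enabled=True)
save_profile(profile, "user-1-profile.json")
```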
In step 306, the computing device may determine, based on the comparing in step 303 and/or based on whether the speech corresponds to a command, whether to modify recording settings. Recording settings may comprise settings that control the manner of capture of audio data, such as the audio data received in step 301. The one or more properties determined in step 302 may indicate, for example, that modifications to recording settings should be made to better capture speech of a user. For example, if a user speaks quietly but slowly, the computing device may increase its gain and record for a longer duration so as to better capture the voice of a user. This may be particularly useful where one or more computing devices implement voice commands, as modification of the recording settings may enable the computing device to better capture voice commands spoken by a user. If the computing device determines to modify the recording settings, the computing device may perform step 307. Otherwise, the computing device may perform step 308.
In step 307, the computing device may modify recording settings. For example, the computing device may modify, based on the one or more properties of the speech of the user determined in step 302, recording settings of the user device. Modifying the recording settings may comprise, for example, modifying a gain of a microphone of one or more computing devices, modifying a duration with which audio content is recorded by one or more computing devices, modifying one or more encoding parameters of an encoding of audio data captured by a computing device, modifying pitch/tone control of a microphone used to capture audio data, implementing voice normalization algorithms, or the like.
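A minimal sketch of such recording-setting adjustments is shown below, assuming that a volume level and speech cadence have already been measured from the audio data. The thresholds and setting names are illustrative assumptions.

```python
# Sketch of adjusting recording settings based on detected speech properties,
# as described above. The thresholds and setting names are illustrative.
def adjust_recording_settings(speech_volume_db, words_per_minute, settings):
    """Return a copy of the settings adapted to a quiet and/or slow speaker."""
    updated = dict(settings)
    if speech_volume_db < -30:     # quiet speaker: raise the microphone gain
        updated["microphone_gain_db"] = settings.get("microphone_gain_db", 0) + 6
    if words_per_minute < 100:     # slow speaker: record for a longer duration
        updated["max_recording_seconds"] = settings.get("max_recording_seconds", 5) + 3
    return updated

print(adjust_recording_settings(-35, 80, {"microphone_gain_db": 0, "max_recording_seconds": 5}))
# -> {"microphone_gain_db": 6, "max_recording_seconds": 8}
```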
In step 308, the computing device may determine whether to revert the language settings and/or the recording settings. It may be desirable to reset a computing device back to default language and/or recording settings after a period of time has expired. Accordingly, if the computing device determines to revert the language settings and/or the recording settings (e.g., because an elapsed time has satisfied a threshold associated with reverting the settings), the computing device may perform step 309. Otherwise, the method may proceed to the steps depicted in
In step 309, the computing device may revert the language and/or recording settings. Reverting the language and/or recording settings may comprise modifying the language and/or recording settings to a state before step 305 and/or step 307 were performed. After step 309, the steps depicted in
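As a non-limiting illustration of steps 308-309, the sketch below reverts to default settings once an elapsed time satisfies a threshold. The threshold value and function names are illustrative assumptions.

```python
import time

# Sketch of reverting to default settings after a threshold period of time,
# as described in steps 308-309. The threshold value is an illustrative choice.
REVERT_THRESHOLD_SECONDS = 30 * 60

def maybe_revert(settings, default_settings, last_modified_time, now=None):
    """Return the default settings if the elapsed time satisfies the revert threshold."""
    now = time.time() if now is None else now
    if now - last_modified_time >= REVERT_THRESHOLD_SECONDS:
        return dict(default_settings)   # revert to the pre-modification state
    return settings

defaults = {"subtitles": False, "language": "en"}
current = {"subtitles": True, "language": "es"}
print(maybe_revert(current, defaults, last_modified_time=0, now=REVERT_THRESHOLD_SECONDS))
# -> {"subtitles": False, "language": "en"}
```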
In step 401, the computing device may determine whether to modify subtitles. Subtitles may comprise captions and/or any other text for display that corresponds to audio and/or video content. If a user speaks a different language than an audio track, and/or if a user has hearing difficulties, it may be desirable to turn on subtitles for that user. Similarly, if a user speaks a particular language, it may be desirable to switch the subtitle language to a language spoken by a user. If the computing device decides to modify the subtitles, the computing device may perform step 402. Otherwise, the computing device may perform step 403.
In step 402, the computing device may modify subtitles. For example, the computing device may modify subtitle settings of the computing device to turn subtitles on or off, change a language of subtitles, or the like. To modify the subtitles, the computing device may transmit instructions to a video player application to enable subtitles, disable subtitles, modify a language of subtitles, or the like. Multiple subtitles may be shown. For example, based on determining that one user speaks Spanish but another user speaks English, both English and Spanish subtitles may be shown simultaneously.
In step 403, the computing device may determine whether to implement accessibility features. Accessibility features may comprise, for example, slowing down audio and/or video, modifying a size of displayed text, implementing colorblindness modes, or any other settings that may be used to make content more easily consumed by users (such as visually, aurally, and/or physically impaired users). If the computing device decides to implement accessibility features, the computing device may perform step 404. Otherwise, the computing device may perform step 405.
In step 404, the computing device may implement accessibility features. For example, the computing device may modify a playback speed of content, may modify a size of displayed text, may simplify words and/or controls displayed by an application, or the like.
In step 405, the computing device may determine whether to modify content. Modifying content may comprise selecting content for display, changing content currently displayed by a computing device, ending the display of content, or the like. If the computing device decides to modify content, the computing device may perform step 406. Otherwise, the computing device may perform step 407.
In step 406, the computing device may modify content. Modifying the content may comprise selecting content for display. For example, the computing device may select content based on the one or more properties of the speech of the user determined in step 302, and then cause display of that selected content. In this way, a computing device might select a version of a movie in a language that is spoken by a user. This selection process may be used for other purposes as well: for example, the computing device might select a Spanish television show for display based on determining that a user speaks Spanish, and/or might select a particular notification and/or advertisement based on the language spoken by a user.
One example of how content may be modified is in the selection of content for display. A user may speak, using a voice remote, a command requesting that a movie be played. That command may be in a particular language, such as Chinese. Based on detecting that the command is in Chinese, the computing device may determine a version of the movie in Chinese, then cause display of that movie. This process might be particularly efficient where the same movie might have different titles in different languages, as identifying the language spoken by the user might better enable the computing device to retrieve the requested movie.
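A minimal sketch of this selection is shown below, where the language detected in a spoken command is used to choose among available language versions of a content item. The catalog structure and identifiers are illustrative assumptions.

```python
# Sketch of the content selection described above: the language detected in the
# spoken command is used to pick among language versions of the same movie.
# The catalog structure and identifiers are illustrative placeholders.
CATALOG = {
    "movie-123": {"zh": "movie-123-zh.mp4", "en": "movie-123-en.mp4"},
}

def select_version(content_id, command_language, default_language="en"):
    """Return the asset matching the command language, falling back to a default."""
    versions = CATALOG.get(content_id, {})
    return versions.get(command_language) or versions.get(default_language)

print(select_version("movie-123", "zh"))  # -> "movie-123-zh.mp4"
```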
In step 407, the computing device may determine whether to implement machine translation. In some circumstances, content in a particular language might not be available. For example, a movie might have English and Spanish subtitles, but not Korean or Japanese subtitles. Similarly, a user interface might be configured to be displayed in English and Spanish, but not Korean or Japanese. In such circumstances, the computing device may use a machine translation algorithm to, where possible, translate content to a language spoken by a user. For example, the computing device may use a machine translation algorithm to translate English subtitles into Korean subtitles. If the computing device decides to implement machine translation, the computing device may perform step 408. Otherwise, the computing device may perform step 409.
In step 408, the computing device may implement machine translation. For example, the computing device may perform machine translation of text content (e.g., subtitles, user interface elements, or the like).
In step 409, the computing device may determine whether to modify display properties. Display properties may comprise any aspect of the manner with which content is displayed, including a size of user interface elements, a resolution of content displayed on a display screen, or the like. If the computing device decides to modify display properties, the flow chart may proceed to step 410. Otherwise, the flow chart may proceed to step 306 of
In step 410, the computing device may modify display properties. For example, the computing device may modify display properties of a user interface provided by the computing device by, e.g., lowering a display resolution of content displayed by the computing device (e.g., to increase an overall size of user interface elements displayed on a display device), increasing the size of user interface elements displayed by the computing device, or the like.
The process described with respect to
In step 501, a machine learning model may be trained to identify speech properties of users. For example, the computing device may train a machine learning model to output, in response to input comprising audio data, indications of one or more properties of the speech contained in that audio data. The machine learning model may be trained by training data. The training data may be tagged, such that it comprises information about audio data that has been tagged to indicate which aspects of that audio data correspond to properties of speech. In this manner, the computing device may train, using training data, a machine learning model to identify speech properties of users.
The training data may indicate associations between speech and properties of that speech. In this manner, the training data may be tagged data which has been tagged by, e.g., an administrator. For example, the training data may comprise associations between audio content corresponding to speech of a plurality of different users and properties of the speech of the plurality of different users. The audio content corresponding to speech of the plurality of different users may correspond to commands spoken by the plurality of different users. The properties of the speech of the plurality of different users may indicate a language of the commands spoken by the plurality of different users.
In step 502, the computing device may provide the audio data (e.g., from step 302) as input to the trained machine learning model. The audio data may be preprocessed before being provided to the trained machine learning model. For example, the audio data may be processed using a speech-to-text algorithm, such that the input to the trained machine learning model may comprise text data. As another example, various processing steps (e.g., noise reduction algorithms) may be performed on the audio data to aid in the clarity of the audio data.
In step 503, the computing device may receive output from the trained machine learning model. The output may comprise one or more indications of one or more properties of speech in the audio data provided as input in step 502. For example, the computing device may receive, as output from the trained machine learning model, an indication of one or more properties of the speech of the first user. The one or more properties indicated as part of this output may be the same or similar as discussed with respect to step 302 of
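As a non-limiting illustration of steps 502-503, the sketch below preprocesses a speech-to-text transcript and passes it to a trained model to obtain indications of speech properties. The TrainedModel stand-in is a hypothetical placeholder for whatever model is actually trained in step 501.

```python
# Sketch of the inference flow in steps 502-503: preprocess the audio data
# (here, assumed to already be a speech-to-text transcript), provide it as
# input to the trained model, and receive property indications as output.
# TrainedModel is a hypothetical placeholder, not a real trained model.
class TrainedModel:
    def predict(self, tokens):
        # Placeholder logic only: a real model would be trained on tagged audio data.
        return {"language": "es" if "reproducir" in tokens else "en"}

def infer_speech_properties(transcript, model):
    tokens = transcript.lower().split()    # step 502: simple preprocessing
    return model.predict(tokens)           # step 503: indications of speech properties

print(infer_speech_properties("Reproducir la película", TrainedModel()))  # -> {"language": "es"}
```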
Steps 504 and 505 describe a process which may occur any time after step 302 whereby the trained machine learning model may be further trained based on later information about a user. The trained machine learning model may output incorrect and/or inaccurate information. For example, the trained machine learning model might incorrectly identify the language spoken by a particular user. In such a circumstance, subsequent activity by a user (e.g., the user changing a language setting back) may indicate that the trained machine learning model provided incorrect output. This information (e.g., that the output was incorrect) may be used to further train the trained machine learning model, helping avoid such inaccuracy in the future. As such, the process described in steps 504 and 505 may be used to, after the training performed in step 501, improve the accuracy of the trained machine learning model.
In step 504, the computing device may determine whether a user modified the language settings. For example, the computing device may, after modifying the language settings of the computing device, receive an indication that the first user further modified the language settings of the computing device. Such a modification may indicate, as discussed above, that the trained machine learning model provided incorrect output. If the computing device determines that a user modified the language settings, the computing device may perform step 505. Otherwise, the method may proceed back to one or more of the steps of
In step 505, the computing device may, based on determining that a user modified the language settings, further train the trained machine learning model. This training may be configured to indicate that the output from the trained machine learning model received in step 503 was incorrect in whole or in part. For example, the computing device may cause the trained machine learning model to be further trained based on the indication that the first user further modified the language settings of the computing device. In this manner, the trained machine learning model may procedurally learn, based on subsequent user activity, to better identify one or more properties of the speech of a user. After step 505, the method may proceed back to the one or more steps of
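A minimal sketch of this feedback loop is shown below: corrections made by the user are collected as new training examples and, once enough accumulate, the model is further trained. The example format and the retrain callable are illustrative assumptions.

```python
# Sketch of the feedback loop in steps 504-505: if the user changes the language
# settings back after an automatic modification, that correction is recorded as
# a new training example so the model can be further trained.
feedback_examples = []

def record_user_correction(transcript, predicted_language, user_selected_language):
    """Store a corrected example when the model's output proved incorrect."""
    if user_selected_language != predicted_language:
        feedback_examples.append({"input": transcript, "label": user_selected_language})

def maybe_retrain(model, retrain, min_examples=10):
    """Further train the model once enough corrections have accumulated."""
    if len(feedback_examples) >= min_examples:
        retrain(model, feedback_examples)   # retrain() is a hypothetical callable
        feedback_examples.clear()
```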
User profiles, such as those discussed with respect to step 305 of
In step 601, the computing device may store a user profile for one or more first users based on one or more properties of the audio data received in step 301. The creation of the user profile may comprise processing the audio data to determine one or more properties of the audio data, such as is described with respect to step 302 of
The user profile may be configured to indicate one or more languages. In this manner, the user profile may indicate one or more languages associated with a user, such as one or more languages spoken by the user. The user profile may indicate a proficiency of the user with respect to the one or more languages, and/or may indicate a preference as to which language(s) should be used when displaying content for a user. For example, the user profile may be configured to cause display of video content in a first language, and may be configured to cause display of subtitles corresponding to a second language.
In step 602, the computing device may receive second audio data. That second audio data might not necessarily correspond to speech by one or more first users, as may have been the case with respect to the audio data received in step 301 and referenced in step 601. For example, speech corresponding to the second audio data may be from an entirely different user. This step may be the same or similar as step 301 of
The computing device may receive the second audio data from a different device as compared to the device from which the first audio data was received (e.g., in step 301). As indicated with respect to step 301 of
In step 603, the computing device may compare the user profile stored in step 601 to one or more properties of the second audio data. For example, the computing device may compare the language settings with one or more second properties of the second audio data. This step may be the same or similar as step 303 of
As part of comparing the user profile to the properties of the second audio data, the computing device may determine whether the second audio data is associated with the same user that is associated with the first audio data. For example, the computing device may determine whether the second audio data is associated with the first user. If the second audio data is received from the same user as the first audio data, then this may indicate that the user profile should be modified based on the second audio data. In this manner, for example, if a user begins speaking Spanish, then the computing device may modify that user's user profile to add Spanish to a list of languages, such that Spanish-language content is selected and provided to the user. If the second audio data is not received from the same user as the first audio data, then this may indicate that a new user profile (e.g., for a second user associated with the second audio data) should be created and stored.
In step 604, the computing device may determine whether to modify the user profile stored in step 601. This decision may be based on the comparing described in step 603 and/or based on determining whether the second audio data corresponds to a command. If the computing device determines to modify the user profile, the computing device may perform step 605. Otherwise, the method may end. Additionally and/or alternatively, the method may be repeated (e.g., based on the receipt of additional audio data).
In step 605, the computing device may modify the user profile. For example, the computing device may modify, based on the comparing of step 603 and the one or more second properties of the second audio data, the user profile. In this manner, languages may be added, removed, and/or altered in the user profile. For example, the first audio data may indicate that a user has a basic understanding of English, but the second audio data may indicate that the same user has a strong understanding of English. In such a circumstance, the user profile for that user may be modified such that an “English (Basic)” designation for languages is replaced with an “English (Advanced)” designation.
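As a non-limiting illustration, the proficiency upgrade described above might be sketched as follows. The profile layout and proficiency labels are illustrative assumptions.

```python
# Sketch of the profile update described above: a proficiency designation is
# upgraded when later audio data indicates stronger command of a language.
def update_language_proficiency(profile, language, new_proficiency):
    """profile: dict like {"languages": {"en": "Basic"}}; replaces the designation."""
    languages = profile.setdefault("languages", {})
    languages[language] = new_proficiency
    return profile

profile = {"languages": {"en": "Basic", "es": "Advanced"}}
print(update_language_proficiency(profile, "en", "Advanced"))
# -> {"languages": {"en": "Advanced", "es": "Advanced"}}
```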
Modifying the user profile may comprise adding, to the user profile, an indication of an accessibility feature to be implemented via the computing device. As described with respect to step 403 and step 404 of
A second computing device may use the user profile and/or the modified user profile. In this manner, the user profile may be used by a plurality of different computing devices, rather than just the computing device that created the user profile. For example, a second computing device may display content based on the modified user profile. That second computing device might be, for example, in a call center. In this manner, information about a user's language settings for their computer might be used by a call center system to route the user to a customer representative that speaks their language.
The differences between
The first user profile 800a shows that a first user speaks two languages (English and Spanish), with one (English) being preferred for subtitles, and the other (Spanish) being only understood at a basic level by the first user. The first user profile 800a also shows that the first user prefers that subtitles be on for all content. The first user profile 800a further shows accessibility settings that provide that audio is to be played back at half speed, and that enlarged fonts are to be displayed (e.g., for user interface elements and subtitles).
The second user profile 800b shows that a second user speaks three languages (Korean, Chinese, and English), with one (Korean) being preferred for all content, and the other two (Chinese and English) being understood at only a basic level. The second user profile 800b also shows that the second user prefers that subtitles be enabled for content in languages that the second user does not speak (that is, languages other than Korean, Chinese, and English). The second user profile 800b further shows recording settings specifying that recording gain should be increased when recording audio data associated with the second user.
An artificial neural network may have an input layer 910, one or more hidden layers 920, and an output layer 930. A deep neural network, as used herein, may be an artificial network that has more than one hidden layer. The example neural network architecture 900 is depicted with three hidden layers, and thus may be considered a deep neural network. The number of hidden layers employed in deep neural network 900 may vary based on the particular application and/or problem domain. For example, a network model used for image recognition may have a different number of hidden layers than a network used for speech recognition. Similarly, the number of input and/or output nodes may vary based on the application. Many types of deep neural networks are used in practice, such as convolutional neural networks, recurrent neural networks, feed forward neural networks, combinations thereof, and others.
During the model training process (e.g., as described with respect to step 501 and/or step 508 of
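One possible realization of such a deep neural network, assuming the PyTorch library, is sketched below. The input feature size, the three hidden-layer sizes, the number of output classes, and the single illustrative training step (on placeholder tensors) are assumptions for illustration only.

```python
import torch
from torch import nn

# One possible realization of the deep neural network described above, assuming
# PyTorch. The input size (e.g., an audio-derived feature vector), the three
# hidden-layer sizes, and the output size (number of candidate languages) are
# illustrative assumptions, not required values.
model = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(),   # input layer -> hidden layer 1
    nn.Linear(64, 64), nn.ReLU(),    # hidden layer 2
    nn.Linear(64, 32), nn.ReLU(),    # hidden layer 3
    nn.Linear(32, 5),                # output layer: scores for 5 candidate languages
)

# During training, weights are adjusted so the output better matches the tagged
# training data (random placeholder tensors stand in for real examples here).
features = torch.randn(16, 128)          # batch of 16 feature vectors
labels = torch.randint(0, 5, (16,))      # tagged language indices
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = nn.functional.cross_entropy(model(features), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```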
Steps 301-302 of
Step 1001 through step 1004 recite a loop whereby words are evaluated and, based on determining that one or more words correspond to a command, the computing device decides whether to modify language settings. Given the wide variety of different content titles, it might not be immediately apparent which words are intended to be commands and which words are intended to correspond to content titles. As such, the loop depicted in step 1001 through step 1004 may be repeated for different permutations and/or combinations of words in the speech of the user, such that commands might be distinguished from titles, stop words, and the like. For example, the user might say “Play Play The Football Game,” with “Play The Football Game” being the title of a movie. In such a circumstance, the loop depicted in step 1002 might analyze each word individually (“Play,” “Play,” “The,” “Football,” “Game”) and words in combination (“Play Play,” “The Football,” “Football Game,” “Play The Football,” “Play The Football Game,” etc.). Based on such testing of various permutations, the computing device might ultimately correctly identify that “Play” corresponds to a command, whereas “Play The Football Game” is the title of a movie. Such a process might be particularly useful where, for instance, a user is prone to stuttering or repeating words.
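A non-limiting sketch of such a loop is shown below: contiguous word groupings are first matched against known titles so that, for example, “Play” is recognized as the command in “Play Play The Football Game” rather than as part of the title. The title and command lists are illustrative placeholders.

```python
# Sketch of the loop in steps 1001-1004: contiguous word groupings are tested
# against known titles before individual words are tested against known
# commands, so that title words are not mistaken for commands.
KNOWN_TITLES = {"play the football game"}
KNOWN_COMMANDS = {"play": "en", "reproducir": "es"}

def find_command(words):
    """Return (command_word, language) for a non-title word that is a known command."""
    # Longest contiguous groupings first, so full titles are recognized before
    # their individual words are considered as candidate commands.
    title_words = set()
    for length in range(len(words), 0, -1):
        for start in range(len(words) - length + 1):
            phrase = " ".join(words[start:start + length]).lower()
            if phrase in KNOWN_TITLES:
                title_words.update(range(start, start + length))
    for index, word in enumerate(words):
        if index not in title_words and word.lower() in KNOWN_COMMANDS:
            return word, KNOWN_COMMANDS[word.lower()]
    return None, None

print(find_command(["Play", "Play", "The", "Football", "Game"]))  # -> ("Play", "en")
```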
In step 1001, the computing device may identify one or more words in the speech of the audio data received in step 301. The computing device may subdivide speech of the user (e.g., a full sentence spoken by a user) into discrete portions (e.g., individual words or phrases), and the identified portion in step 1001 may be one of those subdivided portions. For example, the phrase “play [MOVIE NAME]” may be divided into two portions: a first portion corresponding to “play,” and a second portion corresponding to “[MOVIE NAME].” In this example, the loop depicted from step 1001 to step 1004 might be repeated twice: once for “play,” and once for “[MOVIE NAME].”
In step 1002, the computing device may determine if the one or more words identified in step 1001 correspond to a title. If those words do correspond to a title, the flow chart proceeds to step 1004. Otherwise, the flow chart proceeds to step 1003.
In step 1003, the computing device may determine if the one or more words identified in step 1001 correspond to a command. This may be effectuated by comparing the identified words to a list of words known to be associated with commands. That list of words might indicate commands in different languages, such as the word “play” in a variety of different languages. In turn, as part of step 1003, the computing device might not only determine that the words correspond to a command, but also the language in which the user spoke the command. If those words do correspond to a command, the flow chart proceeds to step 304. Otherwise, the flow chart proceeds to step 1004.
Step 304 in
In step 1004, the computing device may determine whether there are more words to process. As indicated above, step 1001 through step 1004 may form a loop whereby the computing device may iteratively process different portions of user speech to determine whether one or more of those words correspond to a command. In turn, as part of step 1004, if there are additional words and/or permutations of words to process, the computing device may return to step 1001. Otherwise, the flow chart may end.
Although examples are described above, features and/or steps of those examples may be combined, divided, omitted, rearranged, revised, and/or augmented in any desired manner. Various alterations, modifications, and improvements will readily occur to those skilled in the art. Such alterations, modifications, and improvements are intended to be part of this description, though not expressly stated herein, and are intended to be within the spirit and scope of the disclosure. Accordingly, the foregoing description is by way of example only, and is not limiting.