Learning a subject, skill or discipline has long been limited to traditional methods that cannot provide on-demand immersive, and accessible experiences that replicate communication with a human instructor, subject matter expert or tutor. Traditional methods, such as, classroom instruction or sell-study often struggle to provide consistent and practical access to on-demand tuition and conversations with subject matter experts which are essential for developing skills in a student's target subject of study. The proposed solution leverages advances in artificial intelligence, large language models, speech recognition, speech-to-text, text-to-speech, and multilingual speech synthesis technologies integrated with telephony to address these limitations and provide an accessible, personalized approach to learning a target subject of study through conversation with emulated human tutors via telephony communication systems.
An automated learning system that comprises a set of designated phone numbers that allow users to place and receive voice telephone calls to interact in real-time with artificial intelligence large language models by using natural language and more specifically the sound of a user's voice during one to one learning sessions. In various embodiments, the automated learning system provides personalized ways for users to learn a subject of study through natural conversations with an emulated human tutor in a realistic environment. Users can participate in role-playing exercises, engage in everyday conversation or request lessons on specific subjects. Moreover, the system can provide real-time feedback, correction, hints, and explanations related to the user's target subject of study, promoting better understanding and retention. During a call the system sets the spoken language by the emulated human instructor to the user's preferred language, with flexibility that allows combining the users native and target languages to be used interchangeably when the subject of study is set to learning foreign languages. The system adapts to the user's proficiency level and each of the received utterances of speech during a call, this allows users with different proficiency levels and learning needs to engage with the system and the process of learning a target subject of study at their own convenience.
To facilitate seamless communication over voice calls, the system utilizes a combination of artificial intelligence programs such as, large language models for generating contextually relevant voice interactions between a user and a human tutor voice emulated by speech synthesis programs.
The automated learning system employs programmable communication tools for making and receiving phone calls, managing inbound and outbound connections and allowing to deliver human-like tuition over traditional analog and digital phone lines, voice over internet protocol lines and web real-time communication.
The system also comprises an online system user interface where users can register for services, place calls through web real-time communication programs, set their preferences, such as, subject of study native language, target language, target subject of study level of proficiency, inbound and outbound calling preferences, a database for storing learning data, user preferences, user interactions, conversation transcripts and learning analytics in order to measure user progress and create a unique and personalized learning experience for each user.
The invention has additional benefits and characteristics that will be revealed on the upcoming detailed description of the invention and the associated claims, when taken in conjunction with the accompanying drawings, in which:
The client device 102A: can be any hardware or software component that enables users to access and interact with the automated learning system via telephony systems. This may include landline phones, cellular phones, voice over internet protocol devices, or applications on smartphones, computers, or tablets. The client device allows users to place and receive voice calls, send text messages, and participate in real-time interactions with the automated learning system and emulated human tutors.
Phone 103A: refers to the user's phone, which can be landline phones, cellular phones, voice over internet protocol phones, smartphones or similar devices that enable the electronic transmission of speech or other data via telephonic exchange communication with the automated learning system 111. Users use their personal phone numbers registered with the system to access learning sessions.
User Authorized Phone Numbers 106: Represents the list of telephone numbers belonging to a user of the automated learning system 111. These phone numbers have been authorized by the user using the graphical user interface to make and receive calls to and from the system designated phone number lines 105. The system uses caller id to validate active participants in the automated learning system 111.
Network 104: Refers to the communication network that connects client devices 102 with the automated learning system infrastructure. This can include traditional analog landlines, digital cellular networks, or internet-based connections like voice over internet protocol and web real-time communication. In one implementation, the network comprises the users' carrier networks, network exchange providers, and the internet, potentially incorporating dedicated or private communication links like WAN, MAN, or LAN, which may not necessarily be part of the internet. The network 104 facilitates real-time audio interactions and the exchange of information between users 101 and the automated learning system 111.
User's Carrier Network 104A: The telecommunications network operated by the user's telephony service provider carrier, which may be a specific mobile, fixed- line, or voice over internet protocol operator responsible for providing voice and data services to the user's phone. It helps establish connections between the user's device 102 and the automated learning system 111 through telephony channels.
Exchange Provider 104B: The organization, system or company that provides the infrastructure for exchanging voice calls and text messages between the automated learning system and the user's carrier network. This can include, traditional telecom operators or internet-based service providers. Their role is to interconnect the communication between user's carrier networks and the automated learning system 111 cloud infrastructure.
System Designated Phone Numbers lines 105: A set of dedicated phone numbers managed by the automated learning system, which users are instructed to call for accessing one-on-one learning sessions with artificial intelligence tutors. These numbers serve as the interface for users to interact with the emulated instructor and artificial intelligence model during real-time conversations.
Automated learning system 111: It encompasses the entire technology stack necessary for providing immersive, and accessible learning experiences via voice phone calls and text messaging interactions, wherein the technology stack uses modules that process a specific function, such as, web servers, artificial intelligence, large language models, multilingual speech recognition, speech-to-text, text-to-speech, multilingual speech synthesis, databases, graphical user interfaces, and programmable telephony communication exchange and designated phone numbers.
In some embodiments, the automated learning system utilizes web real-lime communication technology for facilitating voice calls. web real-time communication allows direct audio connections between the user's browser or client device 102B and the automated learning system 111, via the user interface which depending on the implementation can be in the form of a mobile or web application.
The user interface module 113 is designed to provide users with a graphical user interface and an accessible platform for managing their learning experience. Users can register for services, set preferences, such as, native language, target subject of study, subject of study proficiency level, inbound and outbound calling preferences, and access various tools to measure progress and tailor personalized learning experiences unique to each student. It can be implemented as a mobile or web application, catering to the convenience of users accessing the system via different devices. The user interface also enables seamless communication between the automated learning system and clients through web real-time communication channels.
The network exchange processing module 118 is responsible for managing inbound and outbound phone call and text message connections between the automated learning system infrastructure and the user's carrier networks. This module ensures efficient handling of audio streams and text messages transmissions between users and the emulated human instructor via traditional analog and digital phone lines as well as voice over internet protocol and web real-time communication. It oversees connecting user's carrier networks with the system designated phone numbers, streamlining communication across various telephony channels.
The speech recognition module 108, the client device 102 receives an electrical signal representing a person's voice and converts that signal into a digital signal, the automated learning system's speech recognition module 108 receives that signal for preprocessing it to enhance the speech signal while mitigating any noise. The speech recognition module analyzes the signal using acoustic modeling to register phonemes, distinct units of speech sound that represent and distinguish one word from another.
The language recognition module 109 is responsible for identifying the language used in the input audio stream, setting the language context for subsequent processing by the speech-to-text module 110.
The speech-to-text module 110 works in parallel with the speech recognition module 108 and the language recognition module 109 to convert spoken language into text format, the speech phonemes are constructed into understandable words and sentences using language modeling facilitating real-time speech-to-text conversion.
The large language model inference module 112 employs artificial intelligence technology including but not limited to generative pre-trained transformers or bidirectional encoder representations from transformers to generate contextually rich and coherent responses during subject of study learning sessions that pair with speech synthesis modules that emulate human sounding tutors.
The inference processing module 115 formats the large language model inference module 112 responses to include speech synthesis markup language and get further process by the multilingual text-to-speech module 116.
The multilingual text-to-speech Module 116 generates human-like synthesized speech in multiple languages based on text inputs and speech synthesis markup language inserted by the inference processing module 115, enabling the system to speak in one or more languages interchangeably during learning sessions.
The speech streaming module 117 manages the raw audio stream of phone calls, ensuring real-time communication between users and the automated learning system using websockets.
The learning management module 114 tracks user progress, interaction data, conversation transcripts, and analytics to measure user proficiency improvement over time. This component analyzes performance metrics and generates personalized learning strategies based on individual needs, ensuring skill acquisition and development.
The web server 119 hosts the online user interface, allowing users to sign up for services, manage their preferences, such as, subject of study, native language, target language, proficiency level, inbound and outbound calling and text messaging preferences, and access various analytics that measure progress and help tailor learning experiences uniquely to each student. It acts as a central hub for managing interactions between all the system modules, user accounts, settings, and storing learning data.
Those of skill in the art will appreciate that the automated learning system 111 may contain additional modules relevant to its functionality, such as, social networking integrations, interactive learning tools, payments or ecommerce features, but they are not described herein since they are not directly material to the invention. Furthermore, conventional components like firewalls, authentication systems, encryption methods, network management tools, load balancers, and the like are not shown as they are not crucial to the core functionality of the invention. The automated learning system 111 can be implemented using a single computer or a network of computers, including cloud-based implementations, utilizing server-class machines equipped with high-performance graphic processing units, processors and main memory running an operating system, such as, Linux or derivatives thereof.
The operations of the automated learning system 111 described herein can be controlled through either hardware or through computer programs installed on non-transitory storage devices and executed by the processors to perform the functions outlined herein. The database 107 is implemented using non-transitory computer readable storage devices, employing suitable database management systems for data access and retrieval, such as, relational or non-relational databases (e.g., MySQL, MongoDB). The automated learning system 111 includes other essential hardware elements required for its operation, like network interfaces and protocols, input devices for data entry, and output devices for display, printing, or other presentation formats of content. As will become apparent below, the intricate operations and functions of the automated learning system 111 requires their implementation on computer systems and cannot be performed in the human mind alone.
This application claims the benefit of the U.S. Provisional Application No. 63/545,061 filed on Oct. 20, 2023, which is hereby incorporated by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
63545061 | Oct 2023 | US |