This application includes material which is subject or may be subject to copyright and/or trademark protection. The copyright and trademark owner(s) has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the Patent and Trademark Office files or records, but otherwise reserves all copyright and trademark rights whatsoever.
The disclosed subject matter relates generally to a system and method for enhancing and augmenting the accessibility of multimedia presentations. More particularly, the present disclosure relates to a system and method for providing real-time audio and text narration to augment the accessibility of slide presentations for visually impaired users.
Multimedia presentations, ranging from slide shows to video and interactive content, serve as a crucial form of information dissemination in fields such as education, business, and entertainment. While these presentations offer a compelling medium for conveying complex ideas, they pose significant accessibility challenges for individuals with visual impairments.
Current solutions aimed at enhancing accessibility, like embedded descriptive text or manual narration, fall short in providing a real-time, adaptive experience. Automated methods, such as screen readers or text-to-speech engines, often lack full integration with multimedia presentation platforms, thereby offering a limited and disjointed user experience. These methods are generally not adaptable to real-time changes in the content or the behavior of the presenter and typically require pre-configured setups that may not be easily transferable across different platforms or types of multimedia content.
In addition, existing systems usually offer only limited personalization options, such as language preferences, narration speed, and volume control. These constraints hinder visually impaired individuals from fully engaging with the multimedia content being presented. The situation is further exacerbated by the lack of real-time user feedback mechanisms and supplementary features, such as special audio effects, which could otherwise enhance comprehension and user experience.
In light of the aforementioned discussion, there exists a need for a system and method capable of providing real-time, context-rich audio and text narration for multimedia presentations. Such a system would be fully integrated with a range of multimedia platforms, adaptable to real-time changes, and customizable to individual user preferences, thereby significantly enhancing accessibility.
The following presents a simplified summary of the disclosure in order to provide a basic understanding to the reader. This summary is not an extensive overview of the disclosure, and it does not identify key or critical elements of the invention or delineate the scope of the invention. Its sole purpose is to present some concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later.
An objective of the present disclosure is directed towards a system and method for enhancing and augmenting presentation accessibility through real-time audio and text narration.
Another objective of the present disclosure is directed towards a comprehensive solution for enhancing and augmenting the accessibility of multimedia presentations for visually impaired individuals through real-time audio and text narration.
Another objective of the present disclosure is directed towards a system that can adapt in real-time to changes in multimedia content or presenter behavior, thereby ensuring that the provided audio and text narrations are contextually appropriate and up-to-date.
Yet another objective of the present disclosure is directed towards a system that is fully integrated with a variety of multimedia platforms, ranging from slide presentation software to video and interactive content providers, ensuring a seamless user experience across diverse forms of media.
A further objective of the present disclosure is directed towards providing a highly customizable user experience, including but not limited to preferences for language, narration speed, and volume, thereby addressing the individual needs of each user.
An additional objective of the present disclosure is directed towards incorporating real-time user feedback mechanisms and supplementary features like special audio effects to further enhance the comprehension and overall user experience for visually impaired individuals.
An exemplary aspect of the present disclosure includes a system and method for enhancing and augmenting presentation accessibility through real-time audio and text narration.
Another exemplary aspect of the present disclosure includes a Real-Time Audio and Text Narration Engine that processes multimedia content to generate contextual audio and text narrations. The engine utilizes advanced artificial intelligence modules to interpret multimedia content and produce narrations that are then delivered to users' devices in real-time.
Another exemplary aspect of the present disclosure is directed towards full integration with various multimedia platforms. Through application programming interfaces (APIs) or embedded bots, the system is designed to be compatible with slide presentation modules, video streaming services, and interactive content providers. This ensures seamless narration delivery across multiple types of multimedia content.
Another exemplary aspect of the present disclosure is directed towards an Adaptive Real-Time Response feature, which tracks changes in the multimedia presentation or the presenter's behavior. This feature uses real-time analytics to adapt the narration dynamically, ensuring that the content is always contextually appropriate and up-to-date.
Another exemplary aspect of the present disclosure is directed towards Customizable User Settings that allow users to personalize their experience by selecting their preferred language, narration speed, and volume. These settings can be adjusted on-the-fly, providing a tailored experience for each individual user.
Another exemplary aspect of the present disclosure is directed towards a Real-Time Feedback Mechanism that allows users to rate the quality of the narrations and the show, as well as any supplementary features like special audio effects. This feedback is processed in real-time to further enhance the system's performance and user experience.
Another exemplary aspect of the present disclosure is directed towards a Presenter-Side Script and Control System that interacts with the presentation module. It enables the presenter to control the timing and type of narrations sent out, offering yet another layer of customization and adaptability.
Another exemplary aspect of the present disclosure is directed towards an AI-Generated Slide Summary and Description feature, where the AI engine scans through the multimedia content and generates concise summaries or descriptions, which are then converted into audio or text narrations.
Another exemplary aspect of the present disclosure is directed towards integration with Voiceover and Special Effect Audio. In this setup, text descriptions can be read out by a voiceover function while special effect audio plays in the background, enhancing the sensory experience for the user.
According to another exemplary aspect of the present disclosure, a computing and/or communication device comprises a display unit for showing presentation content, and a processor for executing instructions from a real-time audio and text narration engine located within the computing and/or communication device.
According to another exemplary aspect of the present disclosure, the real-time audio and text narration engine comprises a presenter-side script and control module configured to enable a presenter to initiate a presentation and control slide transitions, whereby the real-time audio and text narration engine actively listens to the presenter's speech to detect specific cues and markers embedded within the presentation content using natural language processing algorithms.
According to another exemplary aspect of the present disclosure, the real-time audio and text narration engine initiates dynamic responses for changing slides, and selects and delivers appropriate images, audio playback, and text file descriptions in real-time, generating audio playback with an into-sound indicator, an AI voice reading description, and sound effects related to a slide, generating text file descriptions with a voiceover accessibility feature and special effect audio playing simultaneously to enhance user experience, presenting selected images on a user interface of the computing and/or communication device, and transmitting audio playback and text file descriptions to the computing and/or communication device to enhance accessibility for users, including visually impaired members.
According to another exemplary aspect of the present disclosure, a server is communicatively coupled to the computing and/or communication device via a network, wherein the server comprises a receiver module configured to receive the presentation content from the computing and/or communication device.
According to another exemplary aspect of the present disclosure, the server comprises a processing module configured to generate real-time audio and text narration from the presentation content.
According to another exemplary aspect of the present disclosure, the processing module comprises an adaptive real-time response module configured to monitor the presenter's presentation content for cues and markers by tracking the slides being displayed, the adaptive real-time response module further configured to monitor users' interactions and feedback, analyze the context of the presentation, and assess the importance of slides and the presenter's mood, whereby the adaptive real-time response module makes real-time adjustments, alters the speed of the narration, modifies the sequence of slides, and integrates feedback, thereby providing synchronized real-time audio and text narration that enhances presentation accessibility for users.
In the following, numerous specific details are set forth to provide a thorough description of various embodiments. Certain embodiments may be practiced without these specific details or with some variations in detail. In some instances, certain features are described in less detail so as not to obscure other aspects. The level of detail associated with each of the elements or features should not be construed to qualify the novelty or importance of one feature over the others.
Furthermore, the objects and advantages of this invention will become apparent from the following description and the accompanying annexed drawings.
It is to be understood that the present disclosure is not limited in its application to the details of construction and the arrangement of components set forth in the following description or illustrated in the drawings. The present disclosure is capable of other embodiments and of being practiced or of being carried out in various ways. Also, it is to be understood that the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting.
The use of “including”, “comprising” or “having” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. The terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced item. Further, the use of terms “first”, “second”, and “third”, and so forth, herein do not denote any order, quantity, or importance, but rather are used to distinguish one element from another.
Referring to FIG. 1, a block diagram 100 depicts an exemplary environment in which a computing and/or communication device 102 and a server 104 interact to deliver real-time audio and text narration.
Element 106 may symbolize the network through which the computing and/or communication device 102 and server 104 may communicate. This network 106 could be the internet for online presentations, or a local area network (LAN) for onsite presentations. Element 108 may represent memory, which could be either server-side storage or client-side storage. This memory 108 may store the audio and text files generated for each slide or piece of multimedia content, along with other data like user settings and preferences.
Element 110 may be the Real-Time Audio and Text Narration Engine, which may serve as the core functionality of the system. This engine may be responsible for the actual generation of audio and text descriptions based on the slides and user inputs. It may reside within both the server 104 and the client-side computing and/or communication device 102, potentially enabling real-time communication and adaptation based on user feedback.
Element 102 represents a computing and/or communication device, which can be a desktop, laptop, tablet, or mobile device. This computing and/or communication device 102 may serve as the client-side interface where users, including both the presenter and individuals with visual impairments, interact with the system. The computing and/or communication device 102 includes a display unit 118, which is responsible for showing the presentation content; a processor 112, which executes instructions from the real-time audio and text narration engine 110; and memory 108, which stores the real-time audio and text narration engine 110. The real-time audio and text narration engine 110 comprises several functional modules, including a presenter-side script and control module, a special purpose audio effects module, a customizable user settings module, a real-time feedback module, and a user interface module, all working together to facilitate the presentation. Element 104 represents the server, which hosts the core functionalities of the system. The server 104 is responsible for generating real-time audio and text narration based on the slides or multimedia content provided by the presenter, and it handles the processing of user inputs and settings, among other administrative tasks. The server 104 includes a receiver module 114, which is configured to receive the presentation content from the computing and/or communication device 102 via the network 106; the real-time audio and text narration engine 110, which is responsible for generating the audio and text narration; and a processing module 116, which processes the presentation content to generate real-time audio and text narration. The processing module 116 comprises an adaptive real-time response module that monitors the presenter's presentation content for cues and markers, tracks the slides being displayed, and analyzes user interactions and feedback. This module makes real-time adjustments, alters the speed of the narration, modifies the sequence of slides, and integrates feedback, thereby providing synchronized real-time audio and text narration that enhances presentation accessibility for users. The network 106 facilitates communication between the computing and/or communication device 102 and the server 104, and may be the internet for online presentations or a local area network (LAN) for onsite presentations. In summary, the block diagram 100 offers a high-level overview of how the different components of the system interact to deliver real-time audio and text narration, thereby potentially enhancing the accessibility of presentations or multimedia content for visually impaired individuals.
Referring to FIG. 2, a block diagram depicts the functional modules of the client-side computing and/or communication device 202 and the server 204.
Element 202 may represent the client-side computing and/or communication device, which may include various functional modules designed to improve user interaction and experience. This computing and/or communication device may be the same as element 102 in FIG. 1.
Element 204 may denote the server, which may also include a set of functional modules aimed at executing core functionalities and administrative tasks. This server 204 may correspond to element 104 in FIG. 1.
Module 206, or the Customizable User Settings Module, may allow users to personalize various aspects of the presentation, such as the speed of the narration, the volume of the audio, and the display settings. Module 208, known as the Special Purpose Audio Effects Module, may introduce ambient sounds or special audio effects to enhance the narration. Module 210, termed the Real-Time Feedback Module, may capture user feedback in real time to adapt the presentation's audio and text narration accordingly. Module 212, the Presenter-Side Script and Control Module, may enable the presenter to control the flow of the presentation, including slides and associated narrations. Module 214, called the Voiceover and Special Effect Integration Module, may handle the integration of voiceovers and special effects into the presentation. Module 216, the User Interface (UI) Module, may be responsible for the design and interaction of the system's interface from the client side.
Modules 218 through 226 may be primarily located on the server side, with Module 218, the Multi-Platform Integration Module, enabling compatibility with various presentation software platforms; Module 220, the Adaptive Real-Time Response Module, automatically adapting the system's output based on real-time conditions; Module 222, the Slide Summary and Description Module, generating brief summaries and detailed descriptions of each slide; Module 224, the Communication and API Module, managing the data exchange between the server and client; and Module 226, the Data Storage and Retrieval Module, storing and retrieving presentation data.
Referring to FIG. 3, a block diagram depicts the components of the Customizable User Settings Module 206.
Component 301, termed the Preference Manager Component, may be the central hub for managing all user-specific settings. It may interface with other components within this module to store and retrieve user preferences, such as language, narration speed, and volume levels. Component 302, known as the Language Selector Component, may enable users to choose the language in which they wish to receive the audio and text narration. This may be especially useful in multi-lingual settings or for users who are more comfortable with a language other than the default. Component 306, called the Speed and Volume Control Component, may allow users to adjust the speed of the audio narration as well as the volume. Users may customize these settings to match their listening capabilities or to suit the ambient noise conditions. Together, these components may form the Customizable User Settings Module 206, which may be designed to enhance user experience by providing a flexible and personalized interface.
Referring to FIG. 4, a block diagram depicts the Special Purpose Audio Effects Module 208.
The Effect Generator Component 402 may be the primary element within the Special Purpose Audio Effects Module 208. It may be responsible for generating various special audio effects, such as echoes, reverberation, pitch modulation, and other sound manipulations. This component may interact with the Real-Time Audio and Text Narration Engine to integrate these effects into the live or pre-recorded narration, depending on user settings or preset conditions. In summary, the Special Purpose Audio Effects Module 208, primarily through its Effect Generator Component 402, may add an additional layer of engagement and interest to the audio component of the presentation.
Referring to FIG. 5, a block diagram depicts the Real-Time Feedback Module 210, which comprises a Feedback Collector Component 502 and a Feedback Analysis Component 504.
The Feedback Collector Component 502 may be tasked with gathering user feedback in various forms, such as likes, dislikes, comments, or other user interactions during the presentation. This feedback may be collected from multiple sources, including in-app user inputs, voice commands, or other third-party integrations. The Feedback Analysis Component 504 may work in conjunction with the Feedback Collector Component 502 to analyze the collected data. The analysis may include sentiment analysis, frequency counts, or other relevant metrics that help in understanding the effectiveness of the presentation or narration. The results of this analysis may then be utilized to make real-time adjustments to the presentation or for future reference. In summary, the Real-Time Feedback Module 210, primarily consisting of the Feedback Collector Component 502 and Feedback Analysis Component 504, may offer a comprehensive system for enhancing user experience by gathering and interpreting user feedback in real-time.
Referring to FIG. 6, a block diagram depicts the Presenter-Side Script and Control Module 212, which comprises a Presenter Interface Component 602, a Timing Manager Component 604, and a Narration Trigger Component 606.
The Presenter Interface Component 602 may serve as the primary interaction point for the presenter, enabling them to manage various aspects of the presentation. This could include slide navigation, activation of specific narration or audio effects, and other customizable controls that the presenter may need during the presentation. The Timing Manager Component 604 may be responsible for coordinating the timing aspects of the presentation. This could include synchronizing the audio and text narration with the slide transitions, providing countdowns, or triggering specific actions based on pre-set times or conditions. The Narration Trigger Component 606 may function to initiate the real-time audio and text narration. It may operate based on cues from the Timing Manager Component 604 or direct input from the Presenter Interface Component 602. This component may also interact with other system modules to ensure that the narration is consistent with the presentation's current state and user settings. In summary, the Presenter-Side Script and Control Module 212, featuring the Presenter Interface Component 602, Timing Manager Component 604, and Narration Trigger Component 606, may offer an advanced toolset to presenters for optimizing the flow and interactivity of their presentations.
Referring to FIG. 7, a block diagram depicts the Voiceover and Special Effect Integration Module 214, which comprises a Text-to-Voice Converter Component 702, an Effect Integration Component 704, and a Multimedia Output Component 706.
The Text-to-Voice Converter Component 702 may serve as the main element responsible for converting written text into spoken words. It may take input from the Presenter-Side Script and Control Module 212 or directly from pre-written scripts and convert it into real-time audio narration, which can be included in the presentation. The Effect Integration Component 704 may be tasked with incorporating various special audio effects into the presentation. These could range from background music to specific sound effects that may enhance the overall presentation experience. The Effect Integration Component 704 may synchronize these effects with the audio narration or other media, as directed by other system modules or user settings. The Multimedia Output Component 706 may serve as the final stage for the assembled audio stream. It may combine the audio narration generated by the Text-to-Voice Converter Component 702 and any special audio effects integrated by the Effect Integration Component 704. The Multimedia Output Component 706 may then output this complete audio package to the presentation system or directly to the audience's devices. In summary, the Voiceover and Special Effect Integration Module 214, featuring the Text-to-Voice Converter Component 702, Effect Integration Component 704, and Multimedia Output Component 706, may offer a comprehensive solution for enriching presentations with advanced audio capabilities.
Referring to FIG. 8, a block diagram depicts the User Interface (UI) Module 216, which comprises a Navigation Component 802, a Display Component 804, and a Settings Interface Component 806.
The Navigation Component 802 may be responsible for enabling the users to navigate through the various functionalities and settings available in the system. This could include moving from one section or feature to another, accessing additional information, or interacting with the presentation in real-time. The Display Component 804 may manage the visualization of the presentation as well as other pertinent information and settings. It may include a graphical user interface (GUI) designed to make it easy for both presenters and audience members to view and understand the presentation content, audio narrations, and associated features. The Settings Interface Component 806 may allow users to customize various aspects of the system to their preference. This could include altering audio settings, changing the language, and adjusting the display among other options. The Settings Interface Component 806 may interact closely with the Customizable User Settings Module 206 to provide a personalized experience for each user. In summary, the User Interface (UI) Module 216, equipped with the Navigation Component 802, Display Component 804, and Settings Interface Component 806, may serve as the front-end of the system, offering an intuitive and customizable experience for both presenters and audience members.
Referring to FIG. 9, a block diagram depicts the Multi-Platform Integration Module 218, which comprises a Platform Identification Component 902, an API Connector Component 906, and a Data Mapping Component 908.
The Platform Identification Component 902 may be responsible for identifying the various platforms with which the system can interact. This could include different types of presentation software, third-party applications, or even various operating systems. This component may also handle compatibility checks and ensure that the system can operate efficiently across these platforms. The API Connector Component 906 may handle the interaction between the Multi-Platform Integration Module 218 and other systems through the use of Application Programming Interfaces (APIs). This can include transmitting data, sending commands, or receiving information from third-party platforms or services. The Data Mapping Component 908 may be designed to map the data and functionalities from the system onto the platform it is integrated with. This could involve converting the system's data formats to those that can be understood by third-party platforms, or vice versa, and ensuring that functionalities in one system translate effectively into another. In summary, the Multi-Platform Integration Module 218 may serve as the conduit for ensuring that the system is compatible and can function seamlessly across various platforms. It achieves this through the Platform Identification Component 902, API Connector Component 906, and Data Mapping Component 908.
Referring to FIG. 10, a block diagram depicts the Adaptive Real-Time Response Module 220, which comprises a Content Monitoring Component 1002, a Context Analysis Component 1004, and a Real-Time Update Component 1006.
The Content Monitoring Component 1002 may be configured to continuously observe the presentation content in real-time. This could include tracking the slides being displayed, the pace of the presentation, as well as any audio or text narratives that are part of it. It may also monitor user interactions and feedback to help adapt the presentation accordingly. The Context Analysis Component 1004 may work in tandem with the Content Monitoring Component 1002 to analyze the context in which the presentation is being made. This could involve understanding the audience demographics, the importance of particular slides or sections, or even the mood of the presenter. Based on this analysis, the component may make suggestions for real-time adjustments. The Real-Time Update Component 1006 may be responsible for implementing any changes suggested by the Context Analysis Component 1004. This could include altering the speed of the narration, modifying the sequence of slides, or integrating real-time feedback into the presentation. These updates are meant to enhance the accessibility and effectiveness of the presentation as it is happening. In summary, the Adaptive Real-Time Response Module 220 may serve to make real-time adaptations to a presentation, thereby enhancing its accessibility and effectiveness. It accomplishes this through its Content Monitoring Component 1002, Context Analysis Component 1004, and Real-Time Update Component 1006.
Referring to FIG. 11, a block diagram depicts the Slide Summary and Description Module 222, which comprises a Content Scanning Component 1102, a Summary Creation Component 1104, and a Description Creation Component 1106.
The Content Scanning Component 1102 may be tasked with examining the content of each slide in a presentation. It may analyze text, images, graphics, and other multimedia elements present on the slide. By scanning the content, it can gather the necessary data required to generate a comprehensive summary and description. The Summary Creation Component 1104 may utilize the data collected by the Content Scanning Component 1102 to create a concise summary of the slide. The goal of this component may be to offer a quick overview of the main points or topics covered in the slide, allowing listeners or readers to quickly grasp the core message. The Description Creation Component 1106 may delve deeper, creating a detailed description of the slide's content. This can be particularly beneficial for individuals who may have visual impairments, providing them with a thorough understanding of the slide's content through audio narration. The description may cover all elements on the slide, including text, graphics, and any other multimedia components, ensuring a comprehensive understanding. In essence, the Slide Summary and Description Module 222 may aim to make presentations more accessible by providing both brief summaries and in-depth descriptions of each slide. This is accomplished through the combined efforts of the Content Scanning Component 1102, Summary Creation Component 1104, and Description Creation Component 1106.
Referring to FIG. 12, a block diagram depicts the Communication and API Module 224, which comprises a Data Send/Receive Component 1202, an API Management Component 1204, and a Data Synchronization Component 1206.
The Data Send/Receive Component 1202 may be responsible for managing the transmission of data to and from the client-side computing and/or communication device and the server. This can include the sending of audio files, text narrations, presentation slides, user settings, and other relevant data that the system may require for real-time audio and text narration. The API Management Component 1204 may handle interactions with third-party services or platforms, which could range from social media networks to other types of multimedia content providers. Through the use of APIs (Application Programming Interfaces), this component may enable seamless integration and data exchange, thus extending the capabilities of the Real-Time Audio and Text Narration Engine.
The Data Synchronization Component 1206 may work in conjunction with the Data Send/Receive Component 1202 and the API Management Component 1204 to ensure that all data across the system is up-to-date and consistent. This might be particularly important when the system is being used on multiple platforms or devices simultaneously. In summary, the Communication and API Module 224 may serve as the backbone for data transfer and communication within the system, and possibly with external platforms. This is enabled by the Data Send/Receive Component 1202, the API Management Component 1204, and the Data Synchronization Component 1206.
Referring to FIG. 13, a block diagram depicts the Data Storage and Retrieval Module 226, which comprises a User Data Storage Component 1302, a Multimedia Data Storage Component 1304, and a Data Retrieval Component 1306.
The User Data Storage Component 1302 may be responsible for securely storing all user-specific data. This could include user settings, preferences, customized profiles, and other data that are crucial for tailoring the presentation experience according to individual needs. The Multimedia Data Storage Component 1304 may handle the storage of all multimedia content that the system processes or generates. This can consist of audio files, text narrations, presentation slides, and even potentially video files or other types of multimedia. The Data Retrieval Component 1306 may manage the fetching of stored data upon request. Whether it is the user who requires a past setting or the system itself that needs to pull up a specific multimedia file for real-time audio and text narration, this component may ensure that the correct data is retrieved quickly and efficiently. In summary, the Data Storage and Retrieval Module 226 may serve as the central repository for all data that the system uses. It may be composed of the User Data Storage Component 1302 for personal settings and profiles, the Multimedia Data Storage Component 1304 for all types of multimedia content, and the Data Retrieval Component 1306 to facilitate timely and accurate data fetching.
Referring to FIG. 14, a block diagram depicts an exemplary digital processing system 1400 in which several aspects of the present disclosure may be implemented.
CPU 1410 may execute instructions stored in RAM 1420 to provide several features of the present disclosure. CPU 1410 may contain multiple processing units, with each processing unit potentially being designed for a specific task. Alternatively, CPU 1410 may contain only a single general-purpose processing unit. RAM 1420 may receive instructions from secondary memory 1430 using communication path 1450. RAM 1420 is shown currently containing software instructions, such as those used in threads and stacks, constituting shared environment 1425 and/or user programs 1426. Shared environment 1425 includes operating systems, device drivers, virtual machines, etc., which provide a (common) run time environment for execution of user programs 1426.
Graphics controller 1460 generates display signals (e.g., in RGB format) to display unit 1470 based on data/instructions received from CPU 1410. Display unit 1470 contains a display screen to display the images defined by the display signals. Input interface 1490 may correspond to a keyboard and a pointing device (e.g., touch-pad, mouse) and may be used to provide inputs. Network interface 1480 provides connectivity to a network (e.g., using Internet Protocol), and may be used to communicate with other systems (such as those shown in FIG. 1).
Some or all of the data and instructions may be provided on removable storage unit 1440, and the data and instructions may be read and provided by removable storage drive 1437 to CPU 1410. A floppy drive, magnetic tape drive, CD-ROM drive, DVD drive, memory, and removable memory chip (PCMCIA card, EEPROM) are examples of such a removable storage drive 1437. Removable storage unit 1440 may be implemented using a medium and storage format compatible with removable storage drive 1437 such that removable storage drive 1437 can read the data and instructions. Thus, removable storage unit 1440 includes a computer readable (storage) medium having stored therein computer software and/or data. However, the computer (or machine, in general) readable medium can be in other forms (e.g., non-removable, random access, etc.).
In this document, the term “computer program product” is used to generally refer to removable storage unit 1440 or hard disk installed in hard drive 1435. These computer program products are means for providing software to digital processing system 1400. CPU 1410 may retrieve the software instructions, and execute the instructions to provide various features of the present disclosure described above.
The term “storage media/medium” as used herein refers to any non-transitory media that store data and/or instructions that cause a machine to operate in a specific fashion. Such storage media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical disks, magnetic disks, or solid-state drives, such as secondary memory 1430. Volatile media includes dynamic memory, such as RAM 1420. Common forms of storage media include, for example, a floppy disk, a flexible disk, a hard disk, a solid-state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, an NVRAM, and any other memory chip or cartridge.
Storage media is distinct from but may be used in conjunction with transmission media. Transmission media participates in transferring information between storage media. For example, transmission media includes coaxial cables, copper wire, and fiber optics, including the wires that comprise bus (communication path) 1450. Transmission media can also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
Referring to FIG. 15, a block diagram depicts the overall workflow of the Slide-Teller application, from the creation of a presentation deck to the delivery of audio and text narrations.
The Presentation Deck 1502 may represent the collection of slides that the presenter wishes to showcase. It may contain various forms of content including text, images, videos, and other multimedia elements. The SlideTeller AI Description Engine 1504 may be the core component that generates real-time audio and text narrations for each slide in the Presentation Deck 1502. This engine may utilize advanced algorithms to interpret and describe the contents of each slide accurately. The Edit Slide Description 1506 may be a feature that allows the presenter to manually edit or approve the AI-generated slide descriptions. This feature may offer greater control over the content and its representation during the presentation. The Publish 1508 function may be used to finalize the presentation, making it ready for public or private viewing. The system may generate two types of output, namely Audio 1508A and Text 1508B. Audio 1508A can be an audio narration of the slide contents, while Text 1508B can be a textual description or summary.
The SlideTeller AI API 1510 may serve as the interface between the Slide-Teller application and other third-party services or applications. This may allow for extended functionality and integration beyond the native capabilities of the Slide-Teller application.
The SlideTeller APP 1512 may be the client-side application through which end-users interact with the presentation. It may provide various features such as real-time narration playback, textual summaries, and possibly other interactive elements to enrich the presentation experience. In summary, FIG. 15 depicts the end-to-end flow of the Slide-Teller application, from the Presentation Deck 1502 and the SlideTeller AI Description Engine 1504, through editing 1506 and publishing 1508, to delivery via the SlideTeller AI API 1510 and the SlideTeller APP 1512.
Referring to FIG. 16, a block diagram depicts the structure of the audio playback generated by the Slide-Teller Publisher.
The audio playback generated by the Slide-Teller Publisher may consist of several segments, providing a rich and interactive user experience: an Into-sound Indicator, an AI Voice Reading Description, and Sound Effects Related to the Slide. The Into-sound Indicator serves as an initial auditory cue or signal to the user that an audio file is about to play; it may use a unique sound pattern or melody to capture attention and prepare the user for the upcoming audio content. After the Into-sound Indicator, the AI Voice Reading Description component takes over. This feature uses advanced text-to-speech algorithms to convert the slide descriptions into a natural, easy-to-understand voice narration, which aims to provide users with a detailed explanation or summary of the slide content, enhancing the overall accessibility and understanding of the presentation. To further enrich the audio experience, Sound Effects Related to the Slide are integrated into the audio file. These sound effects may be customized to match the slide content, adding an extra layer of engagement: for example, a slide discussing oceanography might include the sound of waves, while a slide focusing on automotive innovation could incorporate engine sounds.
In summary, FIG. 16 illustrates how the generated audio playback combines an Into-sound Indicator, an AI Voice Reading Description, and slide-related sound effects to provide a rich and accessible listening experience.
Referring to FIG. 17, an exemplary text file description generated by the Slide-Teller Publisher for a presentation slide is depicted.
The generated text file is designed to be read by the voiceover accessibility feature commonly found in smartphones and other computing and/or communication devices. The voiceover function may read the text description in the background to enhance the accessibility of the presentation for visually impaired users or those who prefer auditory learning. For example, if the exemplary content shown in the presentation slide is “Point A: Japan, Point B: Los Angeles. Connected by a dashed line with an airplane flying toward Los Angeles,” then the voiceover accessibility feature would read this text description out loud. It provides the user with an auditory understanding of the slide's visual elements, narrating the objects and their interactions, such as the airplane flying from Japan to Los Angeles.
Concurrent with the voiceover reading, special effect audio may be played to enrich the user's experience. These sound effects are custom-tailored to the slide's content. In the given example, one could imagine the sound of an airplane taking off or flying in the sky, adding an additional layer of context and immersion for the user. In summary, FIG. 17 illustrates how a generated text description, read aloud by a device's voiceover accessibility feature and accompanied by special effect audio, enhances the accessibility of a presentation slide.
Referring to FIG. 18, exemplary interactions between a user and the Slide-Teller bot are depicted, in which the bot offers the user different ways to receive a slide's description.
In one of the scenarios illustrated in FIG. 18, the Slide-Teller bot presents the user with options for how to receive a slide's description, including a "By Audio" option.
If the “By Audio” option is selected, the Slide-Teller bot will provide a “play slide 1 button” and a “replay button.” Upon selecting the “play slide 1 button,” the user will hear an audio narration saying, “Point A is Japan, and Point B is Los Angeles, connected by a dashed line with an airplane flying toward Los Angeles.” This narration would be accompanied by the same special sound effects, adding a layer of immersion to the presentation.
According to exemplary embodiments that are not limiting, if certain third-party applications like Apple Keynote or Google Slides do not support plugins, an alternative script-based approach can be taken. This script controls the respective presentation application and connects it to Slide-Teller's API. This enables a Slide-Teller bot to manage various functionalities like audio playback, thereby enhancing accessibility for visually impaired users.
The Slide-Teller script, when run on a Mac, can control PowerPoint or Keynote applications. As the presenter navigates from one slide to another, the script sends either audio or text files to the Slide-Teller bot. The visually impaired user can then access this information via their screen reader. An exemplary script to accomplish this is described below.
In the example script, variables are defined to specify the folder containing MP3 files (theMp3FilesFolderName) and a separator (theSepearator). The script then uses System Events and AppleScript to interact with the Keynote application. It starts the presentation and monitors the current slide. If a slide that is not to be ignored is displayed (nonIgnoredSlides), and it has not already been processed (msgSent2), the script will send the corresponding message and/or audio file. In essence, the Slide-Teller system employs this script as an intermediary between the presentation software and its bot, streamlining the delivery of audio or text descriptions to enhance the experience for all users, including those with visual impairments.
In another exemplary embodiment, PowerPoint users can benefit from Slide-Teller's functionality through a feature known as Add-Ons. Upon installing the Slide-Teller Add-On, users have the option to either manually add slide descriptions or have the Slide-Teller AI generate descriptions for them. These descriptions can be edited as needed, and the presentation can be published to generate the necessary assets and upload them to the server.
Referring to FIG. 19, a process flow diagram depicts how the Slide-Teller system responds to cues and markers embedded in the presenter's script.
As the presenter delivers their predetermined script, which is enriched with embedded markers for multimedia cues, Slide-Teller is ever-attentive. In this example, the script leads up to the phrase, “This is exactly what I wanted. Thank you!” Accompanying this phrase is a special marker that instructs the Slide-Teller system to initiate a slide change and send specific files and descriptions, tagged as “(H)” in the process flow. Upon recognizing this phrase, the Slide-Teller app springs into action. It promptly selects the appropriate image, audio, and text files, denoted as “(I),” and sends them for display on the presentation screen and to mobile devices tailored for visually impaired individuals. The image that aligns with the script's context is projected onto the presentation screen, an action identified as “(J)” in the diagram. Concurrently, the application sends audio descriptions to the mobile app, a feature marked as “(K),” thereby offering an inclusive and enhanced experience for visually impaired attendees. This seamless orchestration of multimedia elements exemplifies Slide-Teller's capability to make presentations more inclusive and accessible.
Referring to FIG. 20, an exemplary main screen of the Slide-Teller application during a live show is depicted.
The interface may include multiple components tailored to the needs of the presenter, such as slide thumbnails, a timer, and audio controls, among others. These components are organized in an intuitive layout that may adapt as the show progresses, offering a seamless and interactive user experience. The dynamism of the screen is evident as slides advance, new audio files are cued, or as new data is input into the system, offering a truly interactive and responsive environment for the presenter.
Features like live captions and real-time annotations may also be available on the main screen, offering auxiliary channels of communication to the audience. These elements not only serve to improve the presentation but may also offer additional layers of accessibility for differently-abled individuals. The main screen may also interface with other modules of the Slide-Teller system, such as the Communication and API Module or the Data Storage and Retrieval Module, enabling a holistic and interconnected operation. In summary, FIG. 20 depicts the main screen of the Slide-Teller application as the presenter's central, interactive control surface, interconnected with the other modules of the system.
Referring to FIG. 21, an exemplary Playbill slide of the Slide-Teller application is depicted, illustrating how the show's overview is made accessible to visually impaired users.
In an exemplary scenario, upon reaching the Playbill slide, the Slide-Teller application may automatically cue an audio file that provides a comprehensive description of the Playbill's content. This auditory description may serve as an invaluable aid to those who rely on audio cues for comprehension. Furthermore, the screen layout is designed to be intuitive, with easy-to-navigate controls for audio playback, such as play, pause, and skip buttons, allowing the user to control the auditory experience. Additionally, the application may offer the flexibility to switch to a text version of the Playbill description. This option allows users to utilize their own screen reader or voiceover software to read the text, thereby offering another layer of accessibility. Such a feature may be particularly beneficial for those who prefer reading at their own pace or those who may want to use a specific voiceover service that they are accustomed to.
In summary, FIG. 21 illustrates how the Playbill slide is made accessible through automatically cued audio descriptions, intuitive playback controls, and an optional text version for use with the user's own screen reader or voiceover software.
Referring to FIG. 22, the delivery of audio files and text descriptions to the user's phone is depicted.
In line with the operation of Slide-Teller, the first step involves the Admin or AI sending an audio file or text description to the user's phone. The design takes into account the public nature of presentations; hence the phone may not play the audio unless headphones are connected to prevent disturbing the audience around the user. Upon displaying a new slide in the app, the user may receive a haptic notification or vibration as an initial alert. This is particularly useful for visually impaired users as it serves as a tactile indicator that new content is available for review. Following the haptic notification, the user may then select the “Play” button on the interface. Once activated, the app may first play a sound indicator as an additional prompt, immediately followed by the audio description of the slide. This sequence ensures that the user is adequately prepared to receive the information being presented.
For those who prefer reading, the application may also offer an option to display a text version of the slide description. Users can let their own voiceover reader interpret this text, thus giving them the freedom to utilize accessibility features they are comfortable with. After the audio has been played or the text has been read, the user may have the option to replay either the text or the audio, affording them the opportunity to review the content as many times as needed for comprehension.
Lastly, the user interface may include settings that allow personalization based on individual preferences. These settings may cover options for language selection, speed of playback, and volume control. In summary, FIG. 22 outlines the sequence by which slide content reaches the user's phone, from haptic notification and sound indicator to audio playback, text reading, replay options, and personalized settings.
Referring to FIG. 23, the use of special effect audio within the Slide-Teller application is depicted.
Moreover, the audio files sent to the users may include special effects layered in the background. These special effects aim to elevate the auditory experience and contribute to a more engaging and dynamic presentation. By incorporating auditory embellishments, the application not only conveys the slide information but also aims to capture and sustain user attention throughout the show.
In cases where text is sent to the users, the Slide-Teller application may possess the capability to concurrently handle both text and audio. Specifically, while the voiceover feature reads the text aloud for the user, special effect audio may play in the background. This dual functionality could further enrich the multi-sensory experience for the user, combining textual, auditory, and even haptic elements for a more comprehensive engagement with the content. In summary, FIG. 23 illustrates how special effect audio, layered behind the narration or the voiceover-read text, enriches the multi-sensory experience of the presentation.
Referring to FIG. 24, the basic features of the dynamic app managed by the Slide-Teller API are depicted.
The Slide-Teller API may be designed to handle various functionalities of the dynamic app, categorically segmented into basic features for user-friendly navigation. These features may include the Main screen, which serves as the landing page and central hub for accessing other functionalities. Another notable feature is the Playbill for the show, which offers an overview of the presentation and can also include additional media such as descriptive audio. Within the presentation, the Show slides feature is where the core of the presentation takes place, displaying the slides in real-time as the presenter advances through them. It is within this section that a rating system for the show may be embedded, offering users the opportunity to provide real-time feedback on individual slides or the presentation as a whole.
The How Slide-Teller Works feature is another fundamental aspect managed by the API. This section serves an educational purpose, explaining to users how to get the most out of the Slide-Teller experience, including how to switch between audio and text descriptions. Lastly, a Join Mailing List feature may be included, serving as an opt-in mechanism for users who wish to stay updated on future presentations or updates to the Slide-Teller application itself. In essence, FIG. 24 maps the core user-facing features managed by the Slide-Teller API: the Main screen, the Playbill for the show, the Show slides with an embedded rating system, the How Slide-Teller Works section, and the Join Mailing List feature.
Referring to FIG. 25, a flow diagram depicts the user's journey through the Slide-Teller application, from joining the show to the conclusion of the presentation.
Immediately after step 2506, the user may transition to step 2508. In this phase, the user could receive real-time audio and/or text descriptions as the presenter initiates the talk. The slides may automatically display on the user's screen, offering a potentially interactive and informative experience throughout the presentation. The final segment of the user's journey is outlined in step 2510. Here, the user may be prompted to submit feedback and ratings at the conclusion of the presentation. This step not only allows for immediate user interaction but may also contribute valuable insights that could be used for future improvements in the application. In summary, FIG. 25 charts the user's journey from joining the show, through real-time narration and automatically displayed slides, to final feedback and ratings.
Referring to FIG. 26, a flow diagram depicts the presenter's journey through the Slide-Teller application.
After step 2604, the sequence moves to step 2606. In this stage, the presenter may initiate the talk. The application might listen for specific speech cues from the presenter to auto-switch slides. This feature could ensure a smooth flow during the presentation, reducing manual intervention to a minimum. Step 2606 is followed by step 2608, in which the application may transmit real-time updates. During this phase, the application could harmonize the slides, audio, and text with the attendees' mobile application, maintaining a coherent and synchronized experience for all involved.
Finally, the presenter's journey culminates in step 2610. At this point, the presenter may conclude the talk and could receive instant feedback and ratings from the audience via the application. This final step may provide the presenter with immediate insights into the success of, and areas for improvement in, their presentation. To sum it up, FIG. 26 traces the presenter's journey from initiating the talk, through cue-driven slide switching and real-time synchronization with attendees' devices, to instant audience feedback.
Referring to FIG. 27, a flow diagram depicts the operations handled by the Slide-Teller API during a presentation.
Step 2704 is succeeded by step 2706, where the API might be tasked with updating slide information for all connected users in real-time. Whenever the presenter alters a slide, the API may ensure immediate synchronization across all user interfaces. This step is crucial for maintaining coherence and a unified experience during the presentation. Step 2708 follows, focusing on managing individual actions from users, such as requests for slide ratings or replays of audio descriptions. In this step, the API may be responsible for processing these specific requests and ensuring they are met in a timely manner. This capability allows for a personalized and interactive experience for each user.
The last step in this flow is step 2710, where the API may send out prompts to all users for their feedback and ratings at the end of the presentation. By doing so, the API could gather valuable user insights, which may be used for further refinement and improvement of the application. In summary, FIG. 27 outlines the API's responsibilities during a show: synchronizing slide updates in real-time, servicing individual user requests, and prompting for feedback at the presentation's end.
Referring to FIG. 28, a flow diagram depicts the accessibility-oriented options offered to users of the Slide-Teller application.
Step 2806 leads to step 2808, where the application may allow users to opt for their device's voiceover reader to read aloud the text description of a slide. This option provides an alternative means of accessibility, granting users another layer of customization and interaction with the content. Concluding the sequence is step 2810. In this step, users may have the option to personalize language, speed, and volume settings for a more accessible experience. This level of customization allows users to tailor the application to better suit their needs and preferences. In summary, FIG. 28 describes the accessibility flow, from voiceover reading of slide text descriptions to personalized language, speed, and volume settings.
According to an exemplary aspect of the present disclosure, the processor executes instructions from the real-time audio and text narration engine through a special purpose audio effects module, the special purpose audio effects module comprising an audio processor and a sound library stored in the computing and/or communication device, configured to generate ambient sounds and special audio effects to enhance the narration.
According to an exemplary aspect of the present disclosure, the processor executes instructions from the real-time audio and text narration engine through a customizable user settings module, the customizable user settings module comprising a preference manager component configured to manage user-specific settings, including language, narration speed, and volume levels.
According to an exemplary aspect of the present disclosure, the processor executes instructions from the real-time audio and text narration engine through the customizable user settings module, the customizable user settings module comprising a language selector component configured to enable users to choose the language in which they wish to receive the audio and text narration.
According to an exemplary aspect of the present disclosure, the processor executes instructions from the real-time audio and text narration engine through the customizable user settings module, the customizable user settings module comprising a speed and volume control component configured to enable users to adjust the speed of the audio narration and the volume to match the users' listening capabilities and to suit the ambient noise conditions.
According to an exemplary aspect of the present disclosure, the processor executes instructions from the real-time audio and text narration engine through a real-time feedback module, the real-time feedback module comprising a feedback collector component configured to collect user feedback in various forms, including likes, dislikes, comments, and other user interactions during the presentation.
According to an exemplary aspect of the present disclosure, the processor executes instructions from the real-time audio and text narration engine through the real-time feedback module, the real-time feedback module comprising a feedback analysis component configured to work in conjunction with a feedback collector component to analyze collected data for understanding the effectiveness of the presenter's presentation.
According to an exemplary aspect of the present disclosure, the processor executes instructions from the real-time audio and text narration engine through the presenter-side script and control module, the presenter-side script and control module comprising a presenter interface component configured to enable the presenter to manage various aspects of the presentation, including slide navigation, activation of specific narration, audio effects, and other customizable controls during the presentation.
According to an exemplary aspect of the present disclosure, the processor executes instructions from the real-time audio and text narration engine through the presenter-side script and control module, the presenter-side script and control module comprising a timing manager component configured to coordinate the timing aspects of the presentation, including synchronizing the audio and text narration with the slide transitions, providing countdowns, and triggering specific actions based on preset times and conditions.
According to an exemplary aspect of the present disclosure, the processor executes instructions from the real-time audio and text narration engine through the presenter-side script and control module, the presenter-side script and control module comprising a narration trigger component configured to initiate the real-time audio and text narration based on cues from the timing manager component and direct input from the presenter interface component.
According to an exemplary aspect of the present disclosure, the processor executes instructions from the real-time audio and text narration engine through a voiceover and special effect integration module, the voiceover and special effect integration module comprising a text-to-voice converter component configured to convert written text into spoken words.
According to an exemplary aspect of the present disclosure, the processor executes instructions from the real-time audio and text narration engine through the voiceover and special effect integration module, the voiceover and special effect integration module comprising an effect integration component configured to incorporate various special audio effects into the presentation to enhance the overall presentation experience.
According to an exemplary aspect of the present disclosure, the processor executes instructions from the real-time audio and text narration engine through a user interface module, the user interface module comprising a navigation component configured to enable users to navigate through various functionalities.
According to an exemplary aspect of the present disclosure, the processor executes instructions from the real-time audio and text narration engine through the user interface module, the user interface module comprising a display component configured to manage the visualization of the presentation and related information.
According to an exemplary aspect of the present disclosure, the server executes instructions from the real-time audio and text narration engine through an adaptive real-time response module, the adaptive real-time response module comprising a content monitoring component configured to continuously observe the presentation content in real-time, including tracking the slides being displayed, the pace of the presentation, and the audio and text narratives.
According to an exemplary aspect of the present disclosure, the server executes instructions from the real-time audio and text narration engine through the adaptive real-time response module, the adaptive real-time response module comprising a context analysis component configured to analyze the context of the presentation, including understanding audience demographics, the importance of particular slides and sections, and the mood of the presenter.
According to an exemplary aspect of the present disclosure, the processor executes instructions from the real-time audio and text narration engine through a special purpose audio effects module, the special purpose audio effects module comprising an effect generator component configured to generate various special audio effects for incorporation into the presentation to enhance the overall presentation experience.
According to an exemplary aspect of the present disclosure, the processor executes instructions from the real-time audio and text narration engine through the user interface module, the user interface module comprising a settings interface component configured to enable users to personalize their experience by selecting preferred language, narration speed, and volume.
According to an exemplary aspect of the present disclosure, the server executes instructions from the real-time audio and text narration engine through the adaptive real-time response module, the adaptive real-time response module comprising a real-time update component configured to implement changes suggested by the context analysis component.
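The interplay of the content monitoring, context analysis, and real-time update components might be sketched as follows; the pace thresholds and the specific adjustments returned are illustrative assumptions.

```python
from dataclasses import dataclass
import time

@dataclass
class PresentationState:
    current_slide: int
    slide_shown_at: float   # epoch seconds when the slide appeared

def monitor_pace(state: PresentationState, planned_s: float) -> float:
    """Content monitoring: seconds ahead of (negative) or behind
    (positive) the planned per-slide pace."""
    elapsed = time.time() - state.slide_shown_at
    return elapsed - planned_s

def suggest_adjustment(pace_delta: float) -> dict:
    """Context analysis: translate pace into a narration adjustment
    that the real-time update component would then apply."""
    if pace_delta > 30:      # running well behind schedule (assumed cutoff)
        return {"narration_speed": 1.25, "skip_optional_detail": True}
    if pace_delta < -30:     # running well ahead of schedule
        return {"narration_speed": 0.9, "add_descriptions": True}
    return {"narration_speed": 1.0}
```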
According to an exemplary aspect of the present disclosure, the server executes instructions from the real-time audio and text narration engine through a slide summary and description module, the slide summary and description module comprising a content scanning component configured to analyze text, images, graphics, and other multimedia elements present on the slide.
According to an exemplary aspect of the present disclosure, the server executes instructions from the real-time audio and text narration engine through the slide summary and description module, the slide summary and description module comprising a summary creation component configured to create concise summaries of the slide content.
According to an exemplary aspect of the present disclosure, the server executes instructions from the real-time audio and text narration engine through the slide summary and description module, the slide summary and description module comprising a description creation component configured to generate detailed descriptions of the slide content.
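A deliberately naive sketch of the content scanning, summary creation, and description creation components follows; the slide field names and the truncation-based summarizer are assumptions, and a production system would likely substitute a learned summarization model.

```python
import re

def scan_slide(slide: dict) -> list[str]:
    """Content scanning: gather text from the title, bullets, and image
    alt text (the field names here are illustrative)."""
    parts = [slide.get("title", "")]
    parts += slide.get("bullets", [])
    parts += [img.get("alt", "") for img in slide.get("images", [])]
    return [p for p in parts if p]

def summarize(parts: list[str], max_words: int = 25) -> str:
    """Summary creation: a naive truncation-based summary."""
    words = " ".join(parts).split()
    return " ".join(words[:max_words]) + ("…" if len(words) > max_words else "")

def describe(parts: list[str]) -> str:
    """Description creation: a fuller, sentence-per-element description."""
    return " ".join(p if re.search(r"[.!?]$", p) else p + "." for p in parts)
```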
According to an exemplary aspect of the present disclosure, the server executes instructions through a multi-platform integration module, the multi-platform integration module comprising an API connector component configured to handle interactions between the multi-platform integration module and other systems using Application Programming Interfaces (APIs).
According to an exemplary aspect of the present disclosure, the processor executes instructions from the real-time audio and text narration engine through the voiceover and special effect integration module, the voiceover and special effect integration module comprising a multimedia output component configured to combine the audio narration generated by the text-to-voice converter and any special audio effects for playback.
According to an exemplary aspect of the present disclosure, the server executes instructions through the multi-platform integration module, the multi-platform integration module comprising a data mapping component configured to map the data and functionalities from the system onto the platform it is integrated with.
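One way the API connector and data mapping components might be realized is sketched below; the adapter interface and the per-platform field map are assumptions.

```python
from typing import Protocol

class PlatformConnector(Protocol):
    """API connector: one adapter per presentation platform."""
    def push_narration(self, slide_id: str, audio: bytes, text: str) -> None: ...

def map_slide_fields(native: dict, field_map: dict[str, str]) -> dict:
    """Data mapping: rename platform-native fields to the system's
    schema; `field_map` is supplied per platform."""
    return {field_map.get(k, k): v for k, v in native.items()}

# Example: a hypothetical platform that calls slides "pages".
# map_slide_fields({"page_title": "Q3 Results"}, {"page_title": "title"})
# -> {"title": "Q3 Results"}
```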
According to an exemplary aspect of the present disclosure, the server executes instructions through a communication and API module, the communication and API module comprising a data send/receive component configured to manage the transmission of data to and from the client-side computing and/or communication device and the server.
According to an exemplary aspect of the present disclosure, the server executes instructions through the communication and API module, the communication and API module comprising an API management component configured to handle interactions with third-party services or platforms through APIs.
According to an exemplary aspect of the present disclosure, the server executes instructions through the communication and API module, the communication and API module comprising a data synchronization component configured to ensure that all data across the system is up-to-date and consistent.
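For illustration, the data send/receive and data synchronization components might be sketched as follows; the JSON-over-HTTP transport and the version-counter synchronization check are assumptions, as the disclosure names no particular protocol.

```python
import json
import urllib.request

def send_payload(url: str, payload: dict, timeout_s: float = 5.0) -> dict:
    """Data send/receive: POST a JSON payload to the server and return
    the JSON reply. The endpoint URL is supplied by the caller."""
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)

def needs_sync(local_version: int, remote_version: int) -> bool:
    """Data synchronization: version counters decide whether client
    and server state have diverged and require reconciliation."""
    return local_version != remote_version
```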
According to an exemplary aspect of the present disclosure, the server executes instructions through a data storage and retrieval module, the data storage and retrieval module comprising a user data storage component configured to securely store user-specific data including settings, preferences, and profiles.
According to an exemplary aspect of the present disclosure, the server executes instructions through the data storage and retrieval module, the data storage and retrieval module comprising a multimedia data storage component configured to store multimedia content including audio files, text narrations, and presentation slides.
According to an exemplary aspect of the present disclosure, the server executes instructions through the data storage and retrieval module, the data storage and retrieval module comprising a data retrieval component configured to fetch stored data upon request for use by the system and users.
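A minimal sketch of the user data storage and data retrieval components follows, using an in-memory SQLite store for brevity; encryption and access control, implied by "securely store", are omitted here and would be required in practice.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # illustrative; production would persist
conn.execute("CREATE TABLE user_data (user_id TEXT PRIMARY KEY, prefs TEXT)")

def store_prefs(user_id: str, prefs: dict) -> None:
    """User data storage: settings, preferences, and profile data
    keyed by user."""
    conn.execute("INSERT OR REPLACE INTO user_data VALUES (?, ?)",
                 (user_id, json.dumps(prefs)))
    conn.commit()

def fetch_prefs(user_id: str) -> dict | None:
    """Data retrieval: fetch stored data upon request."""
    row = conn.execute("SELECT prefs FROM user_data WHERE user_id = ?",
                       (user_id,)).fetchone()
    return json.loads(row[0]) if row else None
```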
According to an exemplary aspect of the present disclosure, the method includes enabling a presenter to initiate and manage various aspects of the presentation on a computing and/or communication device using a presenter-side script and control module.
According to an exemplary aspect of the present disclosure, the method includes actively listening to the presenter's speech using the real-time audio and text narration engine to detect, using natural language processing algorithms, specific cues and markers embedded within the presenter's presentation content.
According to an exemplary aspect of the present disclosure, the method includes initiating dynamic responses, such as changing slides and providing additional information, upon detecting the cues and markers within the presenter's presentation content using the real-time audio and text narration engine.
According to an exemplary aspect of the present disclosure, the method includes selecting and delivering appropriate images, audio playback, and text file descriptions in real-time upon detecting the cues and markers within the presenter's presentation content.
According to an exemplary aspect of the present disclosure, the method includes presenting the selected images on a user interface of the computing and/or communication device and transmitting the audio playback and text file descriptions to the computing and/or communication device to enhance accessibility for users, including visually impaired users.
According to an exemplary aspect of the present disclosure, the method includes allowing users to personalize various aspects of the presentation using a customizable user settings module in the real-time audio and text narration engine.
According to an exemplary aspect of the present disclosure, the method includes collecting and analyzing user feedback in real-time using a real-time feedback module in the real-time audio and text narration engine.
According to an exemplary aspect of the present disclosure, the method includes continuously monitoring the presenter's presentation content for additional cues and markers by tracking the slides being displayed, using an adaptive real-time response module enabled in the server.
According to an exemplary aspect of the present disclosure, the method includes monitoring user interactions and feedback, and analyzing the context of the presentation, the importance of particular slides, and the mood of the presenter using the adaptive real-time response module.
According to an exemplary aspect of the present disclosure, the method includes altering the speed of the narration, modifying the sequence of slides, and integrating real-time feedback using the adaptive real-time response module.
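By way of illustration of the cue and marker detection recited in the method steps above, a minimal keyword-matching sketch over a live speech transcript is shown below; the cue phrases and resulting actions are assumptions, and an actual implementation would employ richer natural language processing models.

```python
import re

# Hypothetical cue phrases a presenter might embed in speech; the
# disclosure leaves the actual markers and NLP model unspecified.
CUE_PATTERNS = {
    re.compile(r"\bnext slide\b", re.I): ("advance_slide", None),
    re.compile(r"\bas shown in the chart\b", re.I): ("describe_visual", "chart"),
    re.compile(r"\bin summary\b", re.I): ("read_summary", None),
}

def detect_cues(transcript_chunk: str):
    """Scan a chunk of live transcript for cues and yield the dynamic
    responses (slide change, extra description) they should trigger."""
    for pattern, action in CUE_PATTERNS.items():
        if pattern.search(transcript_chunk):
            yield action

# e.g. list(detect_cues("And as shown in the chart, sales doubled."))
# -> [("describe_visual", "chart")]
```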
Reference throughout this specification to “one embodiment”, “an embodiment”, or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of the phrases “in one embodiment”, “in an embodiment” and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
Although the present disclosure has been described in terms of certain preferred embodiments and illustrations thereof, other embodiments and modifications to preferred embodiments may be possible that are within the principles and spirit of the invention. The above descriptions and figures are therefore to be regarded as illustrative and not restrictive.
Thus the scope of the present disclosure is defined by the appended claims and includes both combinations and sub-combinations of the various features described hereinabove as well as variations and modifications thereof, which would occur to persons skilled in the art upon reading the foregoing description.
This patent application claims the priority benefit of U.S. Provisional Patent Application No. 63/535,664, entitled “SYSTEM AND METHOD FOR ENHANCING AND AUGMENTING PRESENTATION ACCESSIBILITY THROUGH REAL TIME AUDIO AND TEXT NARRATION”, filed on 31 Aug. 2023, the entire contents of which are hereby incorporated by reference herein.
Number | Date | Country
---|---|---
63535664 | Aug 2023 | US