SYSTEMS, METHODS, AND APPARATUS FOR PROVIDING ACCESSIBLE USER INTERFACES

Information

  • Patent Application
    20210117152
  • Publication Number
    20210117152
  • Date Filed
    December 23, 2020
  • Date Published
    April 22, 2021
Abstract
Systems, methods, and apparatus for providing accessible user interfaces are disclosed. An example apparatus includes a display region analyzer to identify one or more of text or graphics in display frame image data, the display frame image data corresponding to a portion of a display frame associated with a touch event on a display screen of an electronic device; an audio controller interface to transmit, in response to the identification of the text in the display frame image data, an instruction including audio output corresponding to the text to be output by the electronic device; and a haptic feedback controller interface to transmit, in response to the identification of the graphics in the display frame image data, an instruction including a haptic feedback response to be output by the electronic device.
Description
FIELD OF THE DISCLOSURE

This disclosure relates generally to electronic computing devices and, more particularly, to systems, methods, and apparatus for providing accessible user interfaces.


BACKGROUND

An electronic user device can include user accessibility features to facilitate ease of access for users who are visually impaired, hearing impaired, neurologically impaired, and/or motor impaired when interacting with the device. Some user accessibility features include peripheral devices such as a Braille display for visually impaired users.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 illustrates an example system constructed in accordance with teachings of this disclosure and including an example user device and an example display region analyzer for identifying content in display frame(s) displayed/to be displayed via a display screen of the user device.



FIG. 2 is a communication diagram showing an example data exchange between a touch controller, a display controller, an audio controller, a haptic feedback controller, and the display region analyzer of FIG. 1 in accordance with teachings of this disclosure.



FIG. 3 illustrates a user touch event relative to an example display frame presented via the display screen of the example user device of FIG. 1.



FIG. 4 is a block diagram of the example system of FIGS. 1 and/or 2 including an example implementation of the display region analyzer and one or more computing systems for training neural network(s) to generate model(s) for use during analysis of display frames in accordance with teachings of this disclosure.



FIG. 5 is a flowchart representative of example machine readable instructions that, when executed by a display controller of the example system of FIGS. 1, 2, and/or 4, cause the display controller to generate display frame data in response to a touch event.



FIG. 6 is a flowchart representative of example machine readable instructions that, when executed by a first computing system of the example system of FIG. 4, cause the first computing system to train a neural network to identify content in a display frame.



FIG. 7 is a flowchart representative of example machine readable instructions that, when executed by a second computing system of the example system of FIG. 4, cause the second computing system to train a neural network to perform text recognition.



FIG. 8 is a flowchart representative of example machine readable instructions that, when executed, cause the example display region analyzer of FIGS. 1, 2, and/or 4 to analyze display frame content and to generate output(s) representative of the content.



FIG. 9 is a block diagram of an example processing platform structured to execute the instructions of FIG. 6 to implement the example first computing system of FIG. 4.



FIG. 10 is a block diagram of an example processing platform structured to execute the instructions of FIG. 7 to implement the example second computing system of FIG. 4.



FIG. 11 is a block diagram of an example processing platform structured to execute the instructions of FIG. 8 to implement the example user device of FIG. 1.



FIG. 12 is a block diagram of an example implementation of the system-on-chip of FIG. 1.



FIG. 13 is a block diagram of an example software distribution platform to distribute software (e.g., software corresponding to the example computer readable instructions of FIG. 8) to client devices such as consumers (e.g., for license, sale and/or use), retailers (e.g., for sale, re-sale, license, and/or sub-license), and/or original equipment manufacturers (OEMs) (e.g., for inclusion in products to be distributed to, for example, retailers and/or to direct buy customers).





The figures are not to scale. Instead, the thickness of the layers or regions may be enlarged in the drawings. In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts.


Unless specifically stated otherwise, descriptors such as “first,” “second,” “third,” etc. are used herein without imputing or otherwise indicating any meaning of priority, physical order, arrangement in a list, and/or ordering in any way, but are merely used as labels and/or arbitrary names to distinguish elements for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for identifying those elements distinctly that might, for example, otherwise share a same name.


DETAILED DESCRIPTION

An electronic user device can include user accessibility features to facilitate ease of access for users who are visually impaired, hearing impaired, neurologically impaired, and/or motor impaired when interacting with the device. Some user accessibility features are provided by an operating system of the user device and/or user applications installed on the user device to increase ease of interaction of, for instance, a visually impaired user with the device. Such user accessibility features can include adjustable sizing of icons, font, or cursors; screen contrast options; magnifiers; and/or keyboard shortcuts. Some user devices provide hardware support for peripheral Braille displays that translate text in a user interface displayed by the user device into Braille, which can be read by the user at the peripheral Braille display.


Although known user accessibility features can facilitate interactions by a visually impaired user with the user device, such user accessibility features are limited with respect to an amount of accessibility provided. For instance, known user accessibility features are typically associated with an operating system and, thus, may not be available with third-party applications installed on the device. Therefore, user accessibility features such as increased font sizing may not be compatible with all applications installed on the user device. Further, user accessibility features that are provided by an operating system are not available if, for instance, the user device is in a pre-operating system boot mode, such as a Basic Input Output System (BIOS) mode, because the operating system is not running when the user device is in the preboot BIOS mode. As such, the user accessibility features are not available to a user who wishes to change a BIOS setting, perform troubleshooting of the device in BIOS mode, etc. Additionally, peripheral Braille displays are costly add-on devices that are limited to translating text into Braille, but do not provide information as to, for instance, graphical or non-text content displayed.


Disclosed herein are example systems, apparatus, and methods that provide for audio and/or haptic feedback representation(s) of content in display frame(s) (e.g., graphical user interface(s)) displayed via a display screen of an electronic user device in response to a touch event by the user on the display screen. The touch event can include a touch by a user's finger(s) and/or by an input device such as a stylus.


Examples disclosed herein sample or capture a portion of a display frame associated with or corresponding to the location of the touch event on the display screen. In examples disclosed herein, a display region analyzer executes neural network model(s) to identify content such as text and/or non-textual or graphical elements (e.g., shapes, icons, border lines of menu windows, non-text character(s), etc.) in the sampled portion of the display frame. In examples in which text is identified in the display frame at or near the location of the user's touch, the neural network model(s) recognize or predict the text. The predicted text is converted to audio waveforms (e.g., using text-to-speech synthesis) and output as audio data by speakers of the user device and/or peripheral audio devices (e.g., Bluetooth® headphones). Thus, examples disclosed herein provide a visually impaired user with an audio stream or read-out of text on the display screen in response to the touch event(s) on the display screen. In examples in which graphical or non-textual elements such as shapes and/or icons are identified in display frame content associated with a touch event, haptic feedback output(s) (e.g., vibrations) can be generated to provide the user with a sense of feedback in response to the touch. For example, haptic feedback can be provided when a user touches the display screen proximate to a border or line defining a user application window or menu to alert the user that the user's finger is near an edge of the window or menu and to orient the user relative to the display frame. Thus, examples disclosed herein inform a visually, neurologically, and/or motor impaired user of textual and/or graphical information displayed on the display screen.
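For illustration only, the following Python sketch outlines the routing behavior described above: text near the touch point is spoken aloud, a graphical element triggers a vibration, and an empty region produces no output. The names (RegionContent, respond_to_touch) and the string-based outputs are hypothetical placeholders, not the disclosed implementation.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class RegionContent:
        kind: str                  # "text", "graphic", or "empty"
        text: Optional[str] = None

    def respond_to_touch(region: RegionContent) -> dict:
        """Return the output(s) the device would produce for a sampled region."""
        outputs = {}
        if region.kind == "text" and region.text:
            # Text near the touch point is converted to speech and played back.
            outputs["audio"] = f"speak:{region.text}"
        elif region.kind == "graphic":
            # Graphics (e.g., a window border) trigger a haptic pulse.
            outputs["haptic"] = "pulse"
        # An "empty" region produces no output, avoiding nonsensical feedback.
        return outputs

    print(respond_to_touch(RegionContent("text", "spelling")))   # {'audio': 'speak:spelling'}
    print(respond_to_touch(RegionContent("graphic")))            # {'haptic': 'pulse'}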


In some examples disclosed herein, the display region analyzer is implemented by a system-on-chip of the user device. This implementation enables the display region analyzer to analyze display frame(s) and determine the corresponding audio and/or haptic output(s) independent of the operating system of the user device. For example, the system-on-chip architecture enables the display region analyzer to operate when the user device is in BIOS mode before the operating system has been loaded. Thus, examples disclosed herein provide users with accessibility features that are not limited to certain applications or operating systems and, therefore, provide a more complete accessibility experience when interacting with the device.



FIG. 1 illustrates an example system 100 constructed in accordance with teachings of this disclosure for providing accessible graphical user interfaces to a user of a user device 102 who may be visually, neurologically, or motor-impaired. (The terms “user” and “subject” are used interchangeably herein and both refer to a biological (e.g., carbon-based) creature such as a human being). The user device 102 can be, for example, a personal computing (PC) device such as a laptop computer, a desktop computer, an electronic tablet, an all-in-one PC, a hybrid or convertible PC, a mobile phone, a monitor, etc.


The example user device 102 of FIG. 1 includes a display screen 104. In the example of FIG. 1, the display screen 104 is a touch screen that enables a user to interact with data presented on the display screen 104 by touching the screen with a stylus and/or one or more fingers or a hand of the user. The example display screen 104 includes one or more display screen touch sensor(s) 106 that detect electrical changes (e.g., changes in capacitance, changes in resistance) in response to touches on the display screen. In some examples, the display screen is a capacitive display screen. In such examples, the display screen touch sensors 106 include sense lines that intersect with drive lines carrying current. The sense lines transmit signal data when a change in voltage is detected at locations where the sense lines intersect with drive lines in response to touches on the display screen 104. In other examples, the display screen 104 is a resistive touch screen and the display screen touch sensor(s) 106 include sensors that detect changes in voltage when conductive layers of the resistive display screen 104 are pressed together in response to pressure on the display screen from the touch. In some examples, the display screen touch sensor(s) 106 can include force sensor(s) that detect an amount of force or pressure applied to the display screen 104 by the user's finger or stylus.


The example user device 102 of FIG. 1 includes a touch controller 108 to process the signal data generated by the display screen touch sensor(s) 106 when the user touches the display screen 104. The touch controller 108 interprets the signal data to identify particular locations of touch events on the display screen 104 (e.g., where voltage change(s) were detected by the sense line(s) in a capacitive touch screen). The touch controller 108 communicates the touch event(s) to, for example, logic circuitry such as a microcontroller 130 on a system-on-chip (SoC) 128 and/or a processor 110 (e.g., a central processing unit) of the user device 102. Additionally or alternatively, the user can interact with data presented on the display screen 104 via one or more user input devices 107, such as microphone(s) 119 that detect sounds in the environment in which the user device 102 is located, a keyboard, a mouse, a touch pad, etc. In some examples, the touch controller 108 is implemented by the processor 110.
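A minimal sketch, assuming a capacitive sense/drive grid, of how a touch controller might reduce raw sensor readings to a touch coordinate is shown below; the grid values, threshold, and function name are illustrative assumptions rather than the touch controller 108 itself.

    from typing import Optional, Tuple

    def locate_touch(delta_grid, threshold: float = 5.0) -> Optional[Tuple[int, int]]:
        """delta_grid[row][col] holds the capacitance change at each sense/drive intersection."""
        best, best_pos = threshold, None
        for row, line in enumerate(delta_grid):
            for col, delta in enumerate(line):
                if delta > best:
                    # Keep the intersection with the largest change above the noise floor.
                    best, best_pos = delta, (row, col)
        return best_pos  # None means no touch was detected

    readings = [
        [0.2, 0.5, 0.1],
        [0.3, 9.8, 1.2],   # strong change where the finger rests
        [0.1, 0.7, 0.4],
    ]
    print(locate_touch(readings))  # (1, 1)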


The processor 110 of the illustrated example is a semiconductor-based hardware logic device. The hardware processor 110 may implement a central processing unit (CPU) of the user device 102, may include any number of cores, and may be implemented, for example, by a processor commercially available from Intel® Corporation. The processor 110 executes machine readable instructions (e.g., software) including, for example, an operating system 112 and/or other user application(s) 113 installed on the user device 102, to interpret and output response(s) based on the user input event(s) (e.g., touch event(s), keyboard input(s), etc.). The example user device 102 includes a Basic Input/Output System (BIOS) 114, which may be implemented by firmware that provides for initialization of hardware of the user device 102 during start-up of the user device 102 prior to loading of the operating system software 112. The operating system 112, the user application(s) 113, and the BIOS 114 are stored in one or more storage devices 115. The user device 102 of FIG. 1 includes a power source 116 such as a battery and/or a transformer and AC/DC convertor to provide power to the processor 110 and/or other components of the user device 102 communicatively coupled via a bus 118. Some or all of the processor 110 and storage device(s) 115 may be located on a same die and/or on a same printed circuit board (PCB). The semiconductor die may be separate from a die of the SoC 128. The dies of the SoC 128 and the CPU may be mounted to the same PCB or different PCBs.


A display controller 120 (e.g., a graphics processing unit (GPU)) of the example user device 102 of FIG. 1 controls operation of the display screen 104 and facilitates rendering of content (e.g., display frame(s) associated with graphical user interface(s)) via the display screen 104. As discussed above, the display screen 104 is a touch screen that enables the user to interact with data presented on the display screen 104 by touching the screen with a stylus and/or one or more fingers of a hand of the user. In some examples, the display controller 120 is implemented by the processor 110. In some examples, the display controller 120 is implemented by the SoC 128. In some examples, the processor 110, the touch controller 108, the display controller 120 (e.g., a GPU), and the SoC 128 are implemented on separate chips (e.g., separate integrated circuits), which may be carried by the same or different PCBs.


The example user device 102 includes one or more output devices 117 such as speakers 121 to provide audible outputs to a user. The example user device 102 includes an audio controller 122 to control operation of the speaker(s) 121 and facilitate rendering of audio content via the speaker(s) 121. In some examples, the audio controller 122 is implemented by the processor 110. In some examples, the audio controller 122 is implemented by the SoC 128 (e.g., by the microcontroller 130 of the SoC 128). In other examples, the audio controller 122 is implemented by stand-alone circuitry in communication with one or more of the processor 110 and/or the SoC 128.


The example user device 102 of FIG. 1 can provide haptic feedback or touch experiences to the user of the user device 102 via vibrations, forces, etc. that are output in response to, for example, touch event(s) on the display screen 104 of the device 102. The example user device 102 includes one or more haptic feedback actuator(s) 123 (e.g., piezoelectric actuator(s)) to produce, for instance, vibrations. The example user device 102 includes a haptic feedback controller 124 to control the actuator(s) 123. In some examples, the haptic feedback controller 124 is implemented by the processor 110. In some examples, the haptic feedback controller 124 is implemented by the SoC 128 (e.g., by the microcontroller 130 of the SoC 128). In other examples, the haptic feedback controller 124 is implemented by stand-alone circuitry in communication with one or more of the processor 110 and/or the SoC 128.


Although shown as one device 102, any or all of the components of the user device 102 may be in separate housings and, thus, the user device 102 may be implemented as a collection of two or more user devices. In other words, the user device 102 may include more than one physical housing. For example, the logic circuitry (e.g., the SoC 128 and the processor 110) along with support devices such as the one or more storage devices 115, the power source 116, etc. may be a first user device contained in a first housing of, for example, a desktop computer, and the display screen 104, the touch sensor(s) 106, and the haptic feedback actuator(s) 123 may be contained in a second housing separate from the first housing. The second housing may be, for example, a display housing. Similarly, the user input device(s) 107 (e.g., microphone(s) 119, camera(s), keyboard(s), touchpad(s), mouse, etc.) and/or the output device(s) (e.g., the speaker(s) 121 and/or the haptic feedback actuator(s) 123) may be carried by the first housing, by the second housing, and/or by any other number of additional housings. Thus, although FIG. 1 and the accompanying description refer to the components as components of the user device 102, these components can be arranged in any number of manners with any number of housings of any number of user devices.


In the example of FIG. 1, the touch event(s) (e.g., user finger and/or stylus touch input(s)) detected by the display screen touch sensor(s) 106 are processed by the touch controller 108 to facilitate analysis of user interface content in the display frame(s) rendered via the display screen 104 and associated with the location(s) of the touch event(s) on the display screen 104. The touch controller 108 generates touch coordinate position data indicative of location(s) or coordinate(s) of the touch event(s) detected by the display screen touch sensor(s) 106 on the display screen 104. In response to a communication from the touch controller 108 identifying a touch event on the display screen 104, the display controller 120 captures or samples a region or area of a display frame rendered via the display screen 104 at the time of the touch event and associated with the location of the touch event. As disclosed herein, the sampled display region(s) are used in connection with position data of the touch event(s) to identify content on the screen 104 with which the user is interacting and to generate audio and/or haptic feedback output(s) indicative of the content.


In the example of FIG. 1, the touch position data generated by the touch controller 108 and the display region data captured by the display controller 120 in response to the touch event(s) are passed to a display region analyzer 126. The display region analyzer 126 analyzes the data from the touch controller 108 and the display controller 120 to identify content in the sampled display region(s) and to generate output(s) representative of the content. In some examples, the display region analyzer 126 generates audio output(s) corresponding to text in the sampled display region proximate to the location of the user's touch. In some examples, the display region analyzer 126 generates haptic feedback output(s) indicative of content such as an outline of a menu box or user application window proximate to the location of the user's touch.


In the example of FIG. 1, the display region analyzer 126 is implemented by the SoC 128 carried by the user device 102, which is separate from the (e.g., main or central) processing platform/integrated circuit that executes, for example, the operating system of the device 102 (e.g., the processor 110 of FIG. 1). The SoC 128 and the processor 110 may be mounted to the same printed circuit board (PCB) or the SoC 128 and the processor 110 may be on separate PCBs. The example SoC 128 of FIG. 1 includes a hardware processor such as a microcontroller 130 or any other type of processor such as a processor sold by Intel® Corporation. Although the display region analyzer 126 may be implemented by dedicated logic circuitry, in this example, the display region analyzer 126 is implemented by instructions executed on the microcontroller 130 of the SoC 128. The implementation of the display region analyzer 126 by the SoC 128 enables the display region analyzer 126 to execute independent of the operating system 112 of the user device 102. As a result, the display region analyzer 126 can respond to touch event(s) with audio output(s) and/or haptic feedback output(s) irrespective of the state of the operating system 112 (e.g., before the operating system 112 loads and when, for instance, the user is interacting with the BIOS 114 (e.g., a pre-boot mode when the BIOS is active and the operating system 112 is inactive)). Also, the SoC 128 can consume less power than if the display region analyzer 126 were implemented by the same processor 110 that implements the operating system 112 of the device 102 because, for example, the SoC 128 may include lower power-consuming, less complex circuitry than the main processor 110.


In some examples, one or more components of the display region analyzer 126 are implemented by a neural network accelerator 132 (e.g., of the SoC 128) to facilitate neural network processing performed by the display region analyzer 126 when analyzing the display frame(s). The neural network accelerator 132 can be implemented by, for example, an accelerator such as the Intel® Gaussian & Neural Accelerator (GNA) or an ANNA (an autonomous neural network accelerator that can be an extension to the GNA), among others. The neural network accelerator 132 can be implemented by dedicated logic circuitry or by a processor such as a microcontroller executing instructions on the SoC 128. In some examples, the display region analyzer 126 and the neural network accelerator 132 are implemented by the same microcontroller of the SoC 128. In some examples, one or more components of the neural network accelerator 132 are implemented by the microcontroller 130 of FIG. 1. In other examples, the neural network accelerator 132 is implemented separately from the microcontroller 130 (e.g., by a different processor on the SoC 128, by the processor 110, by a separate SoC, etc.).


Although in the example of FIG. 1, the example display region analyzer 126 of FIG. 1 is implemented by the microcontroller 130 executing machine readable instructions, one or more of the components of the display region analyzer 126 could additionally or alternatively be implemented by dedicated circuitry on the SoC 128, by instructions executed on the processor 110 of the user device 102, by a processor 131 of another user device 134 (e.g., a smartphone, an edge device, etc.) in communication with the user device 102 (e.g., via wired or wireless communication protocols), and/or by a cloud-based device 136 (e.g., one or more server(s), processor(s), and/or virtual machine(s)). These components may be implemented in software, in firmware, in hardware, or in any combination of two or more of software, firmware, and/or hardware. In some examples, the microcontroller 130 is communicatively coupled to one or more other processors on the device 102 and/or on other devices (e.g., a second user device 134, a cloud computing device accessible via the cloud 136, etc.). In such examples, the touch controller 108 can transmit data to the on-board microcontroller 130 of the user device 102. The on-board microcontroller 130 can then transmit the data to, for instance, the cloud-based device 136 and/or the processor 131 of the second user device 134.


In the example of FIG. 1, the display region analyzer 126 serves to process the touch event position data generated by the touch controller 108 and the display region data generated by the display controller 120 to identify content displayed on the display screen 104 at or near the location of the user's touch on the display screen using neural network processing. The display region analyzer 126 generates audio output(s) corresponding to the text in the sampled display region of the display frame displayed at the time of the touch event on the display screen 104. The audio output(s) are presented to the user via one or more audio transducers such as the speaker(s) 121 of the user device 102 or speakers of another device (e.g., headphones) to inform, for example, a visually impaired user of the content displayed on the display screen 104 at the location of the touch event. In some examples, the display region analyzer 126 analyzes the touch event position data and the display region data to generate haptic feedback output(s) via one or more haptic feedback devices such as the haptic feedback actuator(s) 123. The haptic feedback output(s) are used to alert the user to, for example, the presence of edges of a menu or window box at specific screen locations corresponding to the user's finger(s) or a stylus as the user moves his or her finger(s) and/or the stylus across the display screen 104.


In other examples, the audio controller 122 and/or the haptic feedback controller 124 are one or more components separate from the SoC 128 and separate from the processor 110. As such, the SoC 128 may communicate with the audio controller 122 and/or the haptic feedback controller 124 without involving the processor 110. Similarly, the processor 110 may communicate with the audio controller 122 and/or the haptic feedback controller 124 without the involvement of the SoC 128. In some examples, the SoC 128 communicates with the audio controller 122 and/or the haptic feedback controller 124 at least (e.g., only) prior to loading of the operating system 112 and the processor 110 communicates with the audio controller 122 and/or the haptic feedback controller 124 at least (e.g., only) after the loading of the operating system 112.



FIG. 2 is a communication diagram showing an example data exchange between the display region analyzer 126, the touch controller 108, the display controller 120, the audio controller 122, and the haptic feedback controller 124 of the example user device 102 of FIG. 1. As disclosed above, the display region analyzer 126 uses data received from the touch controller 108 and the display controller 120 to analyze the user interface content and to instruct the audio controller 122 and/or the haptic feedback controller 124 to generate output(s) that provide a visually, motor, and/or neurologically impaired user with information about the content displayed on the display screen 104 associated with a touch event on the display screen 104. As also discussed above, the display region analyzer 126 can be implemented by the microcontroller 130 of the SoC 128 of the user device 102. In some examples, one or more components of the display region analyzer 126 are implemented by the neural network accelerator 132.


In the example of FIG. 2, the touch controller 108 generates coordinate data or touch position data 200 representing a location of a user's touch on the display screen 104 in response to signal(s) received from the display screen touch sensor(s) 106 when the user and/or an input device (e.g., a stylus) touches the screen 104. The touch controller 108 can identify the coordinates (e.g., x-y coordinates) for the location(s) of the touch event(s) on the screen 104 based on, for instance, changes in capacitance or voltage captured in the signal data generated by the display screen touch sensor(s) 106. In some examples, the touch position data can include a time at which the touch event occurred. In some examples, the touch position data represents changes in the position of the user's finger(s) and/or the stylus relative to the screen 104 as the user makes gestures on the screen and/or moves his or her finger(s) and/or the stylus on the screen 104.
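A hypothetical layout for the touch position data 200, shown only to make the discussion concrete (field names and units are assumptions, not the disclosed data format):

    from dataclasses import dataclass

    @dataclass
    class TouchSample:
        x: int             # horizontal coordinate of the touch, in pixels
        y: int             # vertical coordinate of the touch, in pixels
        timestamp_ms: int  # time at which the touch event occurred
        force: float = 0.0 # optional pressure reading, if force sensors are present

    # A drag gesture is simply a sequence of samples with changing coordinates.
    drag = [TouchSample(120, 300, 0), TouchSample(140, 300, 16), TouchSample(160, 301, 33)]
    print([(s.x, s.y) for s in drag])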


The touch controller 108 transmits the touch coordinate data 200 to the display region analyzer 126. In some examples, the display region analyzer 126 receives the touch position data 200 from the touch controller 108 in substantially real-time (as used herein “substantially real time” refers to occurrence in a near instantaneous manner recognizing there may be real world delays for computing time, transmission, etc.). In other examples, the display region analyzer 126 receives the touch position data 200 at a later time (e.g., periodically and/or aperiodically based on one or more settings but sometime after the activity that caused the signal data to be generated, such as a user touching the display screen 104 of the device 102, has occurred (e.g., seconds later)).


In some examples, the touch controller 108 also transmits the touch position data 200 to the display controller 120 to alert the display controller 120 to the touch event. In some instances, the display controller 120 receives the touch position data 200 from the touch controller 108 in substantially real-time. In other examples, the display controller 120 receives the touch position data 200 at a later time (e.g., periodically and/or aperiodically based on one or more settings but sometime after the activity that caused the signal data to be generated, such as a user touching the display screen 104 of the device 102, has occurred (e.g., seconds later)).


However, in other examples, the touch controller 108 only sends the touch position data 200 to the display region analyzer 126 and the display region analyzer 126 generates instructions to alert the display controller 120 of the touch event in response to receipt of the touch position data 200.


In response to notification of a touch event from the display region analyzer 126 and/or the touch controller 108, the display controller 120 identifies and saves the display frame rendered at the time of the touch event. As shown in FIG. 2, the display controller 120 includes a display region sampler 201. The display region sampler 201 samples or captures portion(s) or region(s) of the display frame(s) rendered via the display screen 104 in response to instructions from the display region analyzer 126 and/or in response to receipt of the touch position data 200 from the touch controller 108. In particular, the display region sampler 201 samples a portion or region of the display frame corresponding to the location of the touch in substantially real-time (e.g., within milliseconds of the touch event). As a result of the sampling, the display region sampler 201 generates display region data 202 including image data of the captured or sampled portion(s) or region(s) of the display frame(s). A sampled portion of the display frame covers an area less than the area of the display screen 104 (e.g., the outer screen size). The sampled area may be centered on the touch point. For example, a rectangular (e.g., square) region surrounding the touch event and centered on the point of touch may be sampled. In some examples, the display region data generated by the display region sampler 201 includes a full display frame instead of a portion of the frame.
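The centered-rectangle sampling described above can be sketched as follows; this is an assumed illustration (the half_size value and the clamping strategy at the screen edges are not specified by the disclosure):

    def sample_region(frame, touch_x: int, touch_y: int, half_size: int = 64):
        """Crop a square around the touch point, clamped to the frame bounds.
        frame is a 2-D list of pixel rows; the result is 2*half_size wide and tall."""
        height, width = len(frame), len(frame[0])
        left = max(0, min(touch_x - half_size, width - 2 * half_size))
        top = max(0, min(touch_y - half_size, height - 2 * half_size))
        right, bottom = left + 2 * half_size, top + 2 * half_size
        return [row[left:right] for row in frame[top:bottom]]

    frame = [[(x, y) for x in range(1920)] for y in range(1080)]
    region = sample_region(frame, touch_x=10, touch_y=10)   # touch near a corner
    print(len(region), len(region[0]))                      # 128 128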


The display region sampler 201 can identify the location of the touch event relative to the display frame based on the touch position data 200 from the touch controller 108 and/or the instructions from the display region analyzer 126, which can include the location of the touch event. The boundaries that define a size or resolution of the region of the display frame that is sampled by the display region sampler 201 can be defined by one or more variables. For example, the size of the captured display region can be defined by content located within a threshold distance of the coordinates corresponding to the location of the touch event on the display screen 104. In some examples, the size of the region or area of the user interface captured by the display region sampler 201 is based on an amount of pressure applied by the user's finger and/or a stylus on the screen 104 and detected by the display screen touch sensor(s) 106. The force data can be transmitted from the touch controller 108 to the display region analyzer 126, which generates instructions for the display region sampler 201 regarding the size of the display region to sample. The size of the screen region captured by the display region sampler 201 can be proportional to the amount of pressure applied (e.g., the greater the pressure associated with the touch, the larger the size of the user interface sampled by the display region sampler 201).
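One possible (assumed) mapping from touch force to sampled-region size that follows the proportional rule described above; the bounds and the maximum force are illustrative values only:

    def region_half_size(force_newtons: float,
                         min_half: int = 32, max_half: int = 160,
                         max_force: float = 4.0) -> int:
        """Scale the capture size linearly with applied force, within fixed bounds."""
        scale = max(0.0, min(force_newtons / max_force, 1.0))
        return int(min_half + scale * (max_half - min_half))

    for force in (0.5, 2.0, 4.0):
        print(force, region_half_size(force))   # greater force -> larger sampled region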


The display region sampler 201 can (e.g., automatically) sample the display frame(s) periodically (e.g., several times a second) in response to, for instance, changes in the location of the user's touch as detected by the touch controller 108 and/or the display region analyzer 126. As such, the display region data 202 generated by the display region sampler 201 can reflect movement of, for instance, the user's finger across a region of the display screen 104. In other examples, the display region sampler 201 generates the display region data in response to requests from the display region analyzer 126 (e.g., on-demand).


The display region sampler 201 of the example display controller 120 of FIG. 2 sends the display region data 202 (e.g., image data) to the display region analyzer 126. In some examples, the display region analyzer 126 receives the display region data 202 from the display controller 120 in substantially real-time. In other examples, the display region analyzer 126 receives the display region data 202 at a later time (e.g., periodically and/or aperiodically based on one or more settings).


The display region analyzer 126 analyzes the touch position data 200 and the display region data 202 to identify the content in the sampled display region and to determine output(s) (e.g., audio output(s) and/or a haptic feedback output(s)) that inform the user as to the content displayed. In some examples, the display region analyzer 126 determines that no output should be provided (e.g., in instances when the display region analyzer 126 determines that the user is touching a portion of the display screen that is displaying neither textual content nor non-textual content such as an icon).


The display region analyzer 126 analyzes the touch position data 200 and the display region data 202 to correlate the location(s) of the touch event(s) with content in the sampled display region(s). As disclosed herein, the display region analyzer 126 implements neural network model(s) to identify user interface content proximate to the location of the touch event on the display screen 104 as either corresponding to a word (i.e., text), a non-textual or graphical element (e.g., a shape such as a line of a menu box, an icon, etc.), or content that does not include a word or graphics (e.g., a blank portion of a word document, a space between words, a space between two lines of text) and to generate the output(s) representative of the content.


If the display region analyzer 126 recognizes a word in text in the sampled display region, the display region analyzer 126 generates audio output data 204 (e.g., audio waveforms) that includes the identified word to be output in audio form. The audio output data 204 is transmitted to the audio controller 122. The audio controller 122 causes the audio output data 204 to be output by the speaker(s) 121 of the device 102 (FIG. 1).


If the display region analyzer 126 determines that the location of the touch event on the display screen 104 corresponds to non-textual or graphical content such as a line of a window or menu box, the display region analyzer 126 generates haptic feedback output data 206 to alert the user that, for instance, the user's finger has touched or moved over a border line of a menu box or an application window. The haptic feedback output data 206 includes instructions regarding haptic feedback (e.g., vibrations) to be generated by the haptic feedback actuator(s) 123 of the device 102 (FIG. 1). The haptic feedback controller 124 causes the haptic feedback output data 206 to be output via the haptic feedback actuator(s) 123.


In some examples, the display region analyzer 126 determines that the graphical element includes text (e.g., an icon with a title of the user application represented by the icon). In some such examples, the display region analyzer 126 can generate an audio output and haptic feedback output such that both outputs are produced by the device 102.


As disclosed herein, the audio controller 122 and/or the haptic feedback controller 124 may be implemented by the processor 110. The display region analyzer 126 (e.g., the microcontroller 130 of the SoC 128) may, thus, output requests to the processor 110 to cause the audio controller 122 and/or the haptic feedback controller 124 to take the actions described herein. In some examples, the audio controller 122 and/or the haptic feedback controller 124 are implemented by the BIOS 114 (the basic input output system which controls communications with input/output devices). In such examples, the SoC 128 communicates with the audio controller 122 and/or the haptic feedback controller 124 by sending requests to the processor 110 that implements the audio controller 122 and/or the haptic feedback controller 124.


In some examples, the display region analyzer 126 determines, based on the display region data 202 and the neural network model(s), that the location of the touch event on the display screen 104 corresponds to content in the display frame that does not include a word or graphics (e.g., a blank or empty portion of a window, a location that is between two words). In other examples, the display region analyzer 126 identifies a portion or fragment of a word in the sampled display region. In some such examples, the display region analyzer 126 determines that no audio output data 204 and/or haptic feedback output data 206 should be generated. As such, the display region analyzer 126 prevents outputs that would result in audio corresponding to a nonsensical word (e.g., in examples where the user's touch is located between two words) or would inaccurately represent what is on the display screen (e.g., in examples where the user has touched an empty portion of a window). However, in other examples, an empty portion of a window can prompt a haptic feedback output to alert the user that the user's touch has moved away from text.
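The fragment-suppression behavior described above might be approximated as shown below; the dictionary lookup is purely an illustrative assumption (the disclosure does not specify how fragments are detected):

    KNOWN_WORDS = {"spelling", "settings", "menu", "file"}   # stand-in vocabulary

    def should_speak(recognized: str) -> bool:
        """Only complete, recognizable words are passed on for audio output."""
        word = recognized.strip().lower()
        return bool(word) and word in KNOWN_WORDS

    print(should_speak("spelling"))   # True  -> generate audio output data
    print(should_speak("spel"))       # False -> fragment, no output
    print(should_speak(""))           # False -> empty region, no output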


The example display region analyzer 126 of FIG. 2 continues to receive touch position data 200 and display region data 202 as touch event(s) on the display screen 104 are detected by the touch controller 108. The display region analyzer 126 analyzes the data 200, 202 to identify the display frame content that correlates with the location of the touch event(s) and determine the response(s) to be provided by the user device 102 (e.g., audio output, haptic feedback output, or no output) as the user interacts with the user interface content via touch.



FIG. 3 illustrates an example display frame 300 that can be rendered via the display screen 104 of the user device 102 of FIG. 1. As shown in FIG. 3, a user touches the display screen 104 with his or her finger 302 at location 304 (or, in other examples, using an input device such as a stylus). In response to the touch event, the touch controller 108 of FIGS. 1 and 2 generates the touch position data indicative of the location or coordinates of the user's touch relative to the display screen 104 of FIG. 1.


As discussed above, in response to the touch event, the display region sampler 201 of the display controller 120 of FIGS. 1 and 2 samples a portion of the display frame 300 to generate display region data. The display region data includes a portion or region 306 of the display frame 300 including content displayed at the location of the touch event and in surrounding proximity to the touch event. As discussed above, the size of the region 306 and, thus, the captured content can be based on variables such as an amount of pressure associated with the touch event. As disclosed herein, the display region analyzer 126 of FIGS. 1 and 2 analyzes the touch position data and the sampled display region data 306 to identify (e.g., predict, estimate, classify, recognize) the presence of the word “spelling” in the display region proximate to the location of the touch event. As a result of the identification of the word “spelling” in the sampled display region 306 near the location of the touch event, the display region analyzer 126 causes the user device 102 to output audio of the word “spelling” to inform the user of the word that is displayed proximate to the user's touch.



FIG. 4 is a block diagram of an example implementation of the system 100 of FIGS. 1 and/or 2 including an example implementation of the display region analyzer 126. As mentioned above, the display region analyzer 126 is structured to identify (e.g., predict, estimate, classify, recognize) content in display frame(s) (e.g., the display frame 300 of FIG. 3) displayed via a display screen of an electronic device (e.g., the display screen 104 of the user device 102 of FIG. 1) in response to touch event(s) on the display screen and to generate audio and/or haptic feedback output(s) representative of the content for presentation to a user. In the example of FIG. 4, the display region analyzer 126 is implemented by the microcontroller 130 of the SoC 128. In some examples, one or more components of the display region analyzer 126 are implemented by the neural network accelerator 132 (e.g., the Intel® GNA) of the user device 102. In other examples, the display region analyzer 126 is implemented by one or more of the processor 110 of the user device 102, the processor 131 of another user device 134 (e.g., a smartphone), and/or the cloud-based devices 136 (e.g., server(s), processor(s), and/or virtual machine(s) in the cloud 136 of FIG. 1). In some examples, some of the display region analysis is implemented by the display region analyzer 126 via a cloud-computing environment and one or more other parts of the analysis are implemented by one or more of the microcontroller 130 of the SoC 128, the processor 110 of the user device 102, and/or the processor 131 of the second user device 134.


The display region analyzer 126 of FIG. 4 includes a touch controller interface 401. The touch controller interface 401 provides means for communicating with the touch controller 108. For example, the touch controller interface 401 can be implemented by circuitry that connects the display region analyzer 126 to communication line(s) of the touch controller 108. The touch controller interface 401 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130.


As disclosed in connection with FIG. 2, the display region analyzer 126 receives the touch position data 200 from the touch controller 108 indicative of the coordinates of the touch event(s) (e.g., touch from a user's finger and/or an input device such as a stylus) on the display screen 104. The touch position data 200 is stored in a database 400. In some examples, the display region analyzer 126 includes the database 400. In other examples, the database 400 is located external to the display region analyzer 126 in a location accessible to the display region analyzer 126 as shown in FIG. 4. The example database 400 of the illustrated example of FIG. 4 is implemented by any memor(ies), storage device(s) and/or storage disc(s) for storing data such as, for example, flash memory, magnetic media, optical media, etc. Furthermore, the data stored in the example database 400 may be in any data format such as, for example, binary data, comma delimited data, tab delimited data, structured query language (SQL) structures, image data, etc.


The example display region analyzer 126 includes a touch event analyzer 402. The touch event analyzer 402 provides means for analyzing the touch position data 200 to verify that the touch event is intended to cause the display region analyzer 126 to analyze the display frame content and is not, instead, a gesture or touch event associated with another function of the user device 102 and/or user application(s) installed thereon. The touch event analyzer 402 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130 executing block 804 of the flowchart of FIG. 8.


For example, based on the touch position data 200, the touch event analyzer 402 may determine that the user has performed a gesture intended for, for instance, the operating system 112 (e.g., a single tap on the display screen 104, a double tap on the display screen 104). In some examples, the touch event analyzer 402 recognizes that the gesture is associated with another user interface function (e.g., selection of a menu item) based on touch event rule(s) 406 stored in the database 400. As a result, the touch event analyzer 402 determines that the touch event is not a touch event intended for the display region analyzer 126. If the touch event analyzer 402 determines that the touch event is not an intended touch event for the display region analyzer 126, the touch event analyzer 402 can instruct the display region analyzer 126 to refrain from activating the display region sampler 201 of the display controller 120 and/or refrain from analyzing display frame(s). Conversely, if, based on the touch position data 200, the touch event analyzer 402 determines that, for example, the user is moving his or her finger across the display screen (e.g., as if underlining words while reading), the touch event analyzer 402 determines that the touch event is an intended touch event for the display region analyzer 126 to generate output(s) representative of the user content.


In some examples, the touch controller 108 transmits force data 408 to the display region analyzer 126. The force data 408 can be generated by the display screen touch sensor(s) 106 (e.g., resistive force sensor(s), capacitive force sensor(s), piezoelectric force sensor(s)) and can indicate an amount of force or pressure associated with the touch event on the display screen 104. In such examples, the touch event analyzer 402 can determine that the touch event is intended for the display region analyzer 126 if the force data 408 associated with the touch event exceeds a force threshold as defined by the touch event rule(s) 406 (e.g., the touch event is indicative of a hard press by the user's finger or a stylus on the screen 104).


As disclosed herein, a size of the display frame region sampled by the display region sampler 201 can be based on an amount of pressure or force associated with the touch event (e.g., the display region sampler 201 samples a larger area of the display frame in response to increased force associated with the touch event). The touch event analyzer 402 can determine that the touch event is intended for the display region analyzer 126 based on a size of the display region captured by the display region sampler 201.


Thus, the touch event rule(s) 406 can include touch gesture(s) associated with the operating system 112 and/or other user application(s) 113 on the user device 102 and/or control function(s) (e.g., a double tap gesture to select, a pinch gesture to zoom in) that, when identified by the touch event analyzer 402, cause the display region analyzer 126 to refrain from interpreting the display content to prevent interference with output(s) by the user application(s). The touch event rule(s) 406 can also include threshold touch pressure(s) or force(s) and/or touch gesture(s) that indicate that the user is requesting information about the content displayed on the screen 104 and, thus, should trigger analysis of the display frame content by the display region analyzer 126.
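A simplified, assumed version of such rule checking is sketched below: a brief tap is treated as an operating-system gesture and ignored, while a hard press or a sustained drag across the screen triggers display region analysis. The thresholds and field names are illustrative only and are not defined by the disclosure.

    def is_reading_gesture(samples, force_threshold: float = 2.0,
                           min_drag_px: int = 40) -> bool:
        """Decide whether a touch sequence should trigger display region analysis."""
        if not samples:
            return False
        if any(s["force"] >= force_threshold for s in samples):
            return True                      # hard press: the user is asking "what is here?"
        dx = samples[-1]["x"] - samples[0]["x"]
        dy = samples[-1]["y"] - samples[0]["y"]
        return (dx * dx + dy * dy) ** 0.5 >= min_drag_px   # finger swept across the text

    tap = [{"x": 100, "y": 200, "force": 0.3}]
    drag = [{"x": 100, "y": 200, "force": 0.3}, {"x": 180, "y": 202, "force": 0.4}]
    print(is_reading_gesture(tap), is_reading_gesture(drag))   # False True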


The display region analyzer 126 of FIG. 4 includes a display controller interface 405. The display controller interface 405 provides means for communicating with the display controller 120. For example, the display controller interface 405 can be implemented by circuitry that connects the display region analyzer 126 to communication line(s) of the display controller 120. The display controller interface 405 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130.


The display region analyzer 126 includes a display controller manager 403. The display controller manager 403 provides means for instructing the display region sampler 201 of the display controller 120 to capture region(s) of display frame(s). In some examples, the display controller manager 403 activates or instructs the display region sampler 201 of the display controller 120 to capture region(s) of display frame(s) (e.g., the display frame 300 of FIG. 3) rendered via the display screen 104 in response to touch event(s). For instance, the display controller manager 403 can activate the display region sampler 201 in examples where the touch controller 108 only transmits the touch position data 200 to the display region analyzer 126 and not to the display region sampler 201. The display controller manager 403 determines the location of the touch event based on the touch position data 200 (e.g., coordinate data) and generates instructions for the display region sampler 201 to sample the display frame based on the location of the touch event. The instructions to the display region sampler 201 can include the location of the touch event. In some examples, the display controller manager 403 generates the instructions in response to the verification of the touch event as an intended touch event by the touch event analyzer 402.


As disclosed above in connection with FIG. 2, the example display region analyzer 126 receives the display region data 202 (e.g., image data) from the display controller 120 including a portion of the display frame associated with the location of the touch event. In some examples, the display region sampler 201 of the display controller 120 generates the display region data 202 directly in response to touch event(s) detected at the display screen 104. Additionally, in some examples, the display region sampler 201 automatically generates and transmits the display region data 202 periodically (e.g., several times a second) after a touch event has been detected as part of periodic sampling of the display frames. The display region data 202 is stored in the database 400.


The example display region analyzer 126 includes a touch location mapper 410. The touch location mapper 410 provides means for mapping the location of the touch event (i.e., a verified touch event as determined by the touch event analyzer 402) as indicated by the touch position data 200 relative to the display region data 202 received from the display controller 120. In some examples, the touch location mapper 410 correlates or synchronizes the touch position data stream 200 with the display region data stream 202 based on time-stamps associated with the touch position data 200 and the display region data 202. The touch location mapper 410 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130 executing block 806 of the flowchart of FIG. 8.
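Timestamp-based correlation of the two data streams could look like the following sketch; the data layout and the skew tolerance are assumptions for illustration, not the disclosed mapper:

    def pair_by_timestamp(touch_stream, region_stream, max_skew_ms: int = 50):
        """Match each touch sample with the closest-in-time sampled display region."""
        pairs = []
        for touch in touch_stream:
            closest = min(region_stream,
                          key=lambda region: abs(region["t_ms"] - touch["t_ms"]),
                          default=None)
            if closest and abs(closest["t_ms"] - touch["t_ms"]) <= max_skew_ms:
                pairs.append((touch, closest))
        return pairs

    touches = [{"t_ms": 100, "x": 50, "y": 80}, {"t_ms": 180, "x": 60, "y": 80}]
    regions = [{"t_ms": 105, "id": "r0"}, {"t_ms": 185, "id": "r1"}]
    print([(t["t_ms"], r["id"]) for t, r in pair_by_timestamp(touches, regions)])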


The example display region analyzer 126 of FIG. 4 includes a neural network accelerator interface 407. The neural network accelerator interface 407 provides means for communicating with the neural network accelerator 132 in examples in which the neural network accelerator 132 is implemented by a different processor than the display region analyzer 126. For example, the neural network accelerator interface 407 can be implemented by circuitry that carries signals between the display region analyzer 126 and the neural network accelerator 132. The neural network accelerator interface 407 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130.


In the example of FIG. 4, a display region content recognizer 412 of the example display region analyzer 126 analyzes the display region data 202 from the display controller 120 to identify (e.g., predict, estimate, classify, recognize) content in a sampled display region that correlates or is associated with the location of the touch event on the display screen 104. The display region content recognizer 412 provides means for identifying content in the display region data 202 via neural network analysis. In examples disclosed herein, machine learning is used to improve efficiency of the display region content recognizer 412 in identifying content in the sampled display frame region(s) and generating outputs representative of the content. In some examples, the display region content recognizer 412 is implemented by instructions executed by the neural network accelerator 132. The display region content recognizer 412 may be implemented by dedicated hardware circuitry, the microcontroller 130, and/or the neural network accelerator 132 executing blocks 808, 810, 814, 816, 817, 818, 820, 822 of the flowchart of FIG. 8.


In some examples, a neural network model to be executed by the display region content recognizer 412 can be generated using end-to-end training of a neural network such that, for each sampled display region in the display region data 202 provided as an input to the display region content recognizer 412, the display region content recognizer 412 generates the audio output and/or haptic feedback output to be provided.


In other examples, two or more neural network models are used. For instance, the display region content recognizer 412 can execute a first neural network model to determine a type of content in the sample display frame (e.g., text, non-text characters, etc.). If the display region content recognizer 412 identifies text in the sampled display region as a result of the first neural network model, a text recognizer 414 of the example display region analyzer 126 can identify (e.g., predict, estimate, classify, recognize) the text using a second neural network model. If the display region content recognizer 412 identifies graphical element(s) in the sampled display region as a result of the first neural network model, the display region content recognizer 412 can separately determine the haptic feedback to be generated based on haptic feedback rule(s) 452, as disclosed herein.
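The two-stage arrangement can be sketched as below, where the model objects are placeholder callables standing in for the trained neural networks (the real models would execute on the neural network accelerator 132); all names are assumptions for illustration only.

    def analyze_region(region_image, content_model, text_model, haptic_rules):
        """Stage 1 classifies the region; stage 2 recognizes text only when needed."""
        kind = content_model(region_image)           # "text", "graphic", or "empty"
        if kind == "text":
            return {"audio_text": text_model(region_image)}
        if kind == "graphic":
            return {"haptic": haptic_rules.get("graphic", "pulse")}
        return {}                                    # empty region: no output

    # Stand-in models for demonstration only.
    fake_content_model = lambda image: "text"
    fake_text_model = lambda image: "spelling"
    print(analyze_region(None, fake_content_model, fake_text_model, {"graphic": "pulse"}))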


Artificial intelligence (AI), including machine learning (ML), deep learning (DL), and/or other artificial machine-driven logic, enables machines (e.g., computers, logic circuits, etc.) to use a model to process input data to generate an output based on patterns and/or associations previously learned by the model via a training process. For instance, the model may be trained with data to recognize patterns and/or associations and follow such patterns and/or associations when processing input data such that other input(s) result in output(s) consistent with the recognized patterns and/or associations.


Many different types of machine learning models and/or machine learning architectures exist. In examples disclosed herein, a convolutional neural network model is used. Using a convolutional neural network model enables rotation invariant and scale robust classification. In general, machine learning models/architectures that are suitable to use in the example approaches disclosed herein will be feed forward neural networks. However, other types of machine learning models could additionally or alternatively be used such as recurrent neural networks, graph neural networks, generative adversarial networks, etc.
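

As a non-limiting illustration, a small feed-forward convolutional classifier of the kind referred to above could resemble the following PyTorch sketch. The layer sizes and the three output classes (text, graphic, none) are assumptions chosen for brevity, not parameters taken from this disclosure.

# Minimal convolutional classifier sketch (illustrative only).
import torch
import torch.nn as nn

class RegionContentCNN(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Example: classify a single 64x64 RGB crop of a display region.
logits = RegionContentCNN()(torch.randn(1, 3, 64, 64))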


In general, implementing a ML/AI system involves two phases, a learning/training phase and an inference phase. In the learning/training phase, a training algorithm is used to train a model to operate in accordance with patterns and/or associations based on, for example, training data. In general, the model includes internal parameters that guide how input data is transformed into output data, such as through a series of nodes and connections within the model to transform input data into output data. Additionally, hyperparameters are used as part of the training process to control how the learning is performed (e.g., a learning rate, a number of layers to be used in the machine learning model, etc.). Hyperparameters are defined to be training parameters that are determined prior to initiating the training process.


Different types of training may be performed based on the type of ML/AI model and/or the expected output. For example, supervised training uses inputs and corresponding expected (e.g., labeled) outputs to select parameters (e.g., by iterating over combinations of select parameters) for the ML/AI model that reduce model error. As used herein, labelling refers to an expected output of the machine learning model (e.g., a classification, an expected output value, etc.). Alternatively, unsupervised training (e.g., used in deep learning, a subset of machine learning, etc.) involves inferring patterns from inputs to select parameters for the ML/AI model (e.g., without the benefit of expected (e.g., labeled) outputs).


In examples disclosed herein, ML/AI models are trained using stochastic gradient descent. However, any other training algorithm may additionally or alternatively be used. In examples disclosed herein, training is performed until a training set loss falls below a threshold and test/development set performance is acceptable. In examples disclosed herein, training is performed in advance on a server cluster. Training is performed using hyperparameters that control how the learning is performed (e.g., a learning rate, a number of layers to be used in the machine learning model, layer sizes, etc.). In examples disclosed herein, hyperparameters are selected by, for example, exhaustive search. In some examples, re-training may be performed. Such re-training may be performed in response to dissatisfaction with model performance, a change in model architecture(s), the availability of more and/or improved training data, etc.
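

A hedged sketch of such a training loop is shown below: mini-batch stochastic gradient descent that stops once the average training loss falls below a threshold. The dataset, batch size, learning rate, and threshold are illustrative assumptions only.

# Illustrative supervised training loop using stochastic gradient descent.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model, dataset, lr=0.01, loss_threshold=0.05, max_epochs=100):
    loader = DataLoader(dataset, batch_size=32, shuffle=True)   # mini-batches of (image, label)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(max_epochs):
        total, count = 0.0, 0
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            total, count = total + loss.item(), count + 1
        if total / max(count, 1) < loss_threshold:   # stop once the training loss is low enough
            break
    return model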


Training is performed using training data. In examples disclosed herein, the training data originates from previously generated user interfaces that include content (e.g., text, graphical element(s), blank or empty portion(s) without text or graphical element(s)) in various positions within the user interface. Because supervised training is used, the training data is labeled. Labeling is applied to the training data by expert human labelers, but in some cases training data is labelled by design (e.g., display screen data may be associated with actual text fields of windows being displayed). In some examples, the training data is pre-processed using, for example, hand-written rules or outlier detection to eliminate undesired system outputs. In some examples, the training data is sub-divided into training, development, and test sets and split into mini-batches.
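

For illustration, sub-dividing the labeled display-frame data into training, development, and test sets might resemble the following sketch; the 80/10/10 split is an assumption and not a value taken from this disclosure.

# Illustrative split of a labeled dataset into training, development, and test sets.
from torch.utils.data import random_split

def split_dataset(dataset):
    n = len(dataset)
    n_train, n_dev = int(0.8 * n), int(0.1 * n)
    n_test = n - n_train - n_dev                      # remainder goes to the test set
    return random_split(dataset, [n_train, n_dev, n_test])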


Once training is complete, the model(s) are deployed for use as an executable construct that processes an input and provides an output based on the network of nodes and connections defined in the model(s). The model(s) are stored at one or more databases (e.g., the databases 430, 446 of FIG. 4). The model(s) may then be executed by the display region content recognizer 412 (e.g., when a single joint image-to-audio model is used) and/or by the display region content recognizer 412 and the text recognizer 414.


Once trained, the deployed model may be operated in an inference phase to process data. In the inference phase, data to be analyzed (e.g., live data) is input to the model, and the model executes to create an output. This inference phase can be thought of as the AI “thinking” to generate the output based on what it learned from the training (e.g., by executing the model to apply the learned patterns and/or associations to the live data). In some examples, input data undergoes pre-processing before being used as an input to the machine learning model. Moreover, in some examples, the output data may undergo post-processing after it is generated by the AI model to transform the output into a useful result (e.g., a display of data, an instruction to be executed by a machine, etc.).


In some examples, output of the deployed model may be captured and provided as feedback. By analyzing the feedback, an accuracy of the deployed model can be determined. If the feedback indicates that the accuracy of the deployed model is less than a threshold or other criterion, training of an updated model can be triggered using the feedback and an updated training data set, hyperparameters, etc., to generate an updated, deployed model.
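

A minimal sketch of such a feedback-driven retraining trigger is shown below; the accuracy threshold and the retrain callable are hypothetical.

# Illustrative feedback loop: retrain when measured accuracy drops below a threshold.
def monitor_and_retrain(feedback, model, retrain, accuracy_threshold=0.9):
    correct = sum(1 for item in feedback if item["predicted"] == item["expected"])
    accuracy = correct / max(len(feedback), 1)
    if accuracy < accuracy_threshold:
        return retrain(model, feedback)   # updated training data includes the captured feedback
    return model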


Referring to FIG. 4, the example system 100 includes a first computing system 416 to train a neural network to detect a type of content or data included in image data representative of a user interface (e.g., display region data). The example first computing system 416 includes a first neural network processor 418. In examples disclosed herein, the first neural network processor 418 implements a first neural network.


The example first computing system 416 of FIG. 4 includes a first neural network trainer 420. The example first neural network trainer 420 of FIG. 4 performs training of the neural network implemented by the first neural network processor 418. In some examples disclosed herein, training is performed using a stochastic gradient descent algorithm. However, other approaches to training a neural network may additionally or alternatively be used.


The example first computing system 416 of FIG. 4 includes a first training controller 422. The example training controller 422 instructs the first neural network trainer 420 to perform training of the neural network based on first training data 424. In the example of FIG. 4, the first training data 424 used by the first neural network trainer 420 to train the neural network is stored in a database 426. The example database 426 of the illustrated example of FIG. 4 is implemented by any memory, storage device and/or storage disc for storing data such as, for example, flash memory, magnetic media, optical media, etc. Furthermore, the data stored in the example database 426 may be in any data format such as, for example, binary data, comma delimited data, tab delimited data, structured query language (SQL) structures, image data, etc. While in the illustrated example, the database 426 is illustrated as a single element, the database 426 and/or any other data storage elements described herein may be implemented by any number and/or type(s) of memories.


In the example of FIG. 4, the first training data 424 can include image data of known display frame(s) and/or display frame region(s) or sequences of display frame(s) that may be encountered by a user when interacting with a user device for the purposes of training. The known display frame data can be obtained via, for instance, screen scraping. The first training data 424 can include images of display frame(s) associated with user application(s) (e.g., a word processing application, a music player, a video player) and/or display frame(s) of BIOS user interface(s). In some examples, the first training data 424 includes the display region data 202 generated by the display controller 120.


The first training data 424 (e.g., display frame(s)) is labeled with content in the user interface that should prompt an audio output (e.g., text) and/or a haptic feedback output (e.g., graphical element(s) such as icon(s) or lines defining menu boxes). In some examples, the content is labeled to prompt both an audio output and a haptic feedback output (e.g., an icon that includes text, a border of a window that includes text). The first training data 424 is also labeled with content that should not prompt an audio output and/or a haptic feedback output. For example, blank or empty portions of the user interface(s) that do not include text and/or graphical element(s) may be labeled as content that should not prompt an output (e.g., classified as a "null output"). In some examples, portions or fragments of word(s), phrase(s), and/or spaces between words may be labeled as content that should not prompt an output (e.g., audio output) to prevent nonsensical output(s) (e.g., audio that does not correspond to an actual word). However, in some other examples, blank or empty portions of the user interface(s) that do not include text and/or graphical element(s) may be labeled as content that is to cause a haptic feedback output to alert a user that the user's touch has moved away from text.
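

For illustration only, such a labeling scheme might be encoded as a mapping from region content categories to the response each category should prompt, as in the following sketch; the category names and assignments are assumptions.

# Hypothetical label schema relating region content to the output it should prompt.
LABELS = {
    "text": "audio",                            # e.g., a word such as "save"
    "graphic": "haptic",                        # e.g., an icon or a menu-box line
    "graphic_with_text": "audio_and_haptic",    # e.g., an icon that includes text
    "word_fragment": "none",                    # avoid nonsensical audio output
    "blank": "none",                            # or "haptic" if blank regions should alert the user
}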


In some examples, the training data can include words such as "file," "save," and "open" in a user interface associated with a word processing application and words such as "boot," "setting," and "configuration" in a user interface associated with the BIOS. In some examples, fragments of words are used in the training data 424 to train the neural network to identify the text in the user interface (e.g., to predict or estimate that the fragment "sav" is most likely the word "save"). In other examples, fragments of words are used in the training data to train the neural network to refrain from attempting to identify a word if the corresponding output would be a nonsensical word (e.g., the phrase "ot set" in "boot settings" could be used to train the neural network to refrain from outputting a predicted word that would not correspond to an actual word).


In examples in which the neural network is trained using end-to-end training, the first training data 424 is also labeled with the audio output (e.g., text-to-speech) and/or haptic feedback output that is to be produced for the predicted content.


The first neural network trainer 420 trains the neural network implemented by the neural network processor 418 using the training data 424. Based on the type of content in the user interface(s) in the training data 424 and associated output response, the first neural network trainer 420 trains the neural network 418 to identify (e.g., predict, estimate, classify, recognize) the type of content in the display region data 202 generated by the display controller 120 and whether the content is to prompt audio and/or haptic feedback output(s). In some examples, the text recognition training is based on optical character recognition. In examples in which end-to-end training is used, the first neural network trainer 420 trains the neural network 418 to identify the content in the display region data 202 generated by the display controller 120 and to generate the corresponding audio and/or haptic feedback output(s).


A content recognition model 428 is generated as a result of the neural network training. The content recognition model 428 is stored in a database 430. The databases 426, 430 may be the same storage device or different storage devices.


The content recognition model 428 is executed by the display region content recognizer 412 of the display region analyzer 126 of FIG. 4. In particular, the display region content recognizer 412 executes the content recognition model 428 for each sampled display region in the display region data 202 (e.g., portion(s) of the display frame(s) associated with the touch event location(s)). As a result of the execution of the content recognition model 428, the display region content recognizer 412 determines whether the content in the portion of the user interface associated with the touch event includes (a) text, (b) graphical element(s), which may or may not include text, or (c) content that does not warrant an audio and/or haptic feedback output.


In some examples, the display region content recognizer 412 verifies the content identified in the sampled display region(s) (e.g., text, graphical element(s), blank portion(s)) relative to the location of the touch event as mapped by the touch location mapper 410 and/or based on the touch position data 200. For example, if the sampled display region includes text and a graphical element, multiple items of text (e.g., two words, two sentences, etc.), and/or text proximate to a blank portion, the display region content recognizer 412 uses the mapping of the coordinates of the touch event relative to the sampled display region to identify the content most closely associated with the location of the user's touch (i.e., the content located nearest to the coordinates of the touch event). Thus, the display region content recognizer 412 compares the location of the content identified in the sampled display region to the location of the touch event to improve an accuracy of the identification of the content associated with the touch event and, thus, the accuracy of the output(s).
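

As a non-limiting sketch, selecting the content item nearest to the touch coordinates might be performed as follows; the bounding-box data structure is an assumption introduced for illustration.

# Illustrative selection of the detected item whose bounding box center is
# closest to the touch event coordinates.
import math

def nearest_item(touch_xy, items):
    """items: list of dicts with 'bbox' = (x0, y0, x1, y1) and 'content'."""
    def center_distance(item):
        x0, y0, x1, y1 = item["bbox"]
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        return math.hypot(cx - touch_xy[0], cy - touch_xy[1])
    return min(items, key=center_distance) if items else None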


Based on the neural network analysis of the display region data 202, the display region content recognizer 412 generates predicted content data 427 including content associated with the touch event (e.g., text, graphical element(s), or empty portion(s)) and the corresponding response to be provided (audio output, haptic feedback output, no output). The predicted content data 427 is stored in the database 400.


The display region content recognizer 412 continues to analyze the display region data 202 received from the display controller 120 in response to changes in the location(s) of the touch event(s) occurring on the display screen 104. In some examples, the display region content recognizer 412 analyzes the display region data 202 in substantially real-time as the display region data 202 is received from the display controller 120 to enable the display region analyzer 126 to provide audio and/or haptic feedback output(s) in substantially real-time as the user interacts with the display screen 104 of the user device 102 and the interface(s) displayed thereon.


As disclosed above, in examples in which the content recognition model 428 is generated based on end-to-end training, the display region content recognizer 412 generates audio and/or haptic feedback output(s) as a result of the execution of the content recognition model 428. For example, the display region content recognizer 412 can generate audio or speech sample(s) in response to the detection of text. The example display region analyzer 126 includes an audio controller interface 415. For example, the audio controller interface 415 can be implemented by circuitry that connects the display region analyzer 126 to communication line(s) of the audio controller 122. The audio controller interface 415 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130. The audio controller interface 415 facilitates transmission of the audio sample(s) to the audio controller 122.


In some examples in which the content recognition model 428 is generated using end-to-end training, the display region content recognizer 412 determines, using the content recognition model 428, that a user interface in the display region data 202 includes a graphical element (e.g., a border of a window, an icon, etc.). In such examples, the display region content recognizer 412 determines that a haptic feedback output should be provided (e.g., based on the content recognition model 428). The display region content recognizer 412 generates instructions for the haptic feedback controller 124 to cause the haptic feedback actuator(s) 123 of the user device 102 to generate haptic feedback in response to, for example, detection of text and/or graphical elements (e.g., icon(s)) in the display region data. The haptic feedback controller interface 450 of the display region analyzer 126 provides means for communicating with the haptic feedback controller 124 to cause the actuator(s) 123 (FIG. 1) to generate haptic feedback output(s) (e.g., vibration(s)). For example, the haptic feedback controller interface 450 can be implemented by circuitry that connects the display region analyzer 126 to communication line(s) of the haptic feedback controller 124. The haptic feedback controller interface 450 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130.


Thus, as a result of end-to-end neural network training, the display region content recognizer 412 executes the content recognition model 428 to identify content in the sampled display region(s) and to generate corresponding output(s). However, in other examples, the display region analyzer 126 of FIG. 4 executes two or more neural network models to identify content in the sampled display frame(s). In such instances, the display region content recognizer 412 executes the content recognition model 428 to identify a type of content in the sampled display region(s) (e.g., text versus non-text character(s)). In this example, if the display region content recognizer 412 identifies text in the sampled display region(s), the text recognizer 414 of the display region analyzer 126 executes a second neural network model to identify (e.g., predict) the word and to cause the user device 102 to output audio corresponding to the predicted word. Also, in this example, if the display region content recognizer 412 identifies graphical element(s), the display region content recognizer 412 determines the haptic feedback to be generated based on haptic feedback rule(s) 452, as discussed herein.


The example system 100 includes a second computing system 432 to train a neural network to identify or recognize word(s) and/or phrase(s) in image data representative of user interfaces (e.g., display region data). The example second computing system 432 includes a second neural network processor 434. In examples disclosed herein, the second neural network processor 434 implements a second neural network.


The example second computing system 432 of FIG. 4 includes a second neural network trainer 436. The example second neural network trainer 436 of FIG. 4 performs training of the neural network implemented by the second neural network processor 434. In some examples disclosed herein, training is performed using a stochastic gradient descent algorithm. However, other approaches to training a neural network may additionally or alternatively be used.


The example second computing system 432 of FIG. 4 includes a second training controller 438. The example training controller 438 instructs the second neural network trainer 436 to perform training of the neural network based on second training data 440. In the example of FIG. 4, the second training data 440 used by the second neural network trainer 436 to train the neural network is stored in a database 442. The example database 442 of the illustrated example of FIG. 4 is implemented by any memory, storage device and/or storage disc for storing data such as, for example, flash memory, magnetic media, optical media, etc. Furthermore, the data stored in the example database 442 may be in any data format such as, for example, binary data, comma delimited data, tab delimited data, structured query language (SQL) structures, image data, etc. While in the illustrated example, the database 442 is illustrated as a single element, the database 442 and/or any other data storage elements described herein may be implemented by any number and/or type(s) of memories.


In the example of FIG. 4, the second training data 440 can include listings of known words and/or fragments of words that may be displayed via user interfaces associated with user applications and/or a BIOS for purposes of training. For instance, the second training data can include words such as “file,” “save,” and “open” in a user interface associated with a word processing application and words such as “boot,” “setting,” and “configuration” in a user interface associated with the BIOS. As disclosed above, in some examples, fragments of words are used in the training data to train the neural network to refrain from attempting to identify a word if the corresponding output would be a nonsensical word.


The second neural network trainer 436 trains the neural network implemented by the neural network processor 434 using the training data 440. Based on the words in the training data 440, the second neural network trainer 436 trains the neural network 434 to recognize (e.g., predict, estimate, classify, recognize) the word(s) in the portion of the user interface in the display region data 202 associated with the touch event.


A text recognition model 444 is generated as a result of the neural network training. The text recognition model 444 is stored in a database 446. The databases 442, 446 may be the same storage device or different storage devices.


As discussed above, in examples in which end-to-end training is not used to generate the content recognition model 428, the display region content recognizer 412 executes the content recognition model 428 to identify the type of content in the display frame(s). In such examples, if text is identified by the display region content recognizer 412 as a result of execution of the content recognition model 428, then the text recognizer 414 of the display region analyzer 126 of FIG. 4 executes the text recognition model 444 to predict the identified text. Put another way, the text recognizer 414 executes the text recognition model 444 to predict or identify text in the portion(s) of the sampled display region(s) proximate to the touch event location(s) that are identified by the display region content recognizer 412 as including text. As a result of the execution of the text recognition model 444, the text recognizer 414 generates predicted text data 447 for the text in the user interface(s) in the display region data 202 associated with the touch event(s) (e.g., via optical character recognition). The predicted text data 447 is stored in the database 400. In some examples, the predicted text data 447 is used as the second training data 440 for training the neural network 434. The text recognizer 414 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130 executing block 810 and 814 of the flowchart of FIG. 8 (e.g., when end-to-end neural network training is not used).


The example display region analyzer 126 includes a text-to-speech synthesizer 448 to convert the predicted text data 447 identified by the text recognizer 414 to phonemic representation(s) and, subsequently, to audio waveforms that are transmitted to the audio controller 122 (e.g., the audio output data 204) via the audio controller interface 415. The text-to-speech synthesizer 448 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130 executing block 816 of the flowchart of FIG. 8 (e.g., when end-to-end neural network training is not used). The audio controller 122 outputs the audio waveforms via the speaker(s) 121 of the user device 102 (FIG. 1).
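

For illustration only, converting predicted text to an audio waveform might resemble the following sketch, which uses the pyttsx3 library purely as an example of a text-to-speech synthesizer; this disclosure does not prescribe a particular synthesizer or phoneme pipeline.

# Illustrative text-to-speech conversion producing an audio file for the audio controller.
import pyttsx3

def synthesize(predicted_text: str, out_path: str = "output.wav"):
    engine = pyttsx3.init()
    engine.save_to_file(predicted_text, out_path)   # queue synthesis of the predicted text
    engine.runAndWait()                             # render the queued audio to the file
    return out_path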


In examples in which end-to-end training is not used, the display region content recognizer 412 may determine, using the content recognition model 428, that a sampled display frame in the display region data 202 includes a graphical element (e.g., a border of a window, an icon, etc.). In such examples, the display region content recognizer 412 determines the haptic feedback to be generated based on haptic feedback rule(s) 452 stored in the database 400. The haptic feedback rule(s) 452 can define settings for the haptic feedback (e.g., vibration type, vibration intensity) based on user input(s). In some examples, the haptic feedback rule(s) 452 are defined by user input(s) or preference(s) defining haptic feedback setting(s) for the user device 102. The haptic feedback controller interface 450 transmits instructions to cause the haptic feedback controller 124 to output haptic feedback via the actuator(s) 123 (FIG. 1).
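

A minimal sketch of applying such haptic feedback rules is shown below; the rule keys, vibration patterns, and intensities are illustrative assumptions.

# Hypothetical haptic feedback rules keyed by the type of detected graphical element.
DEFAULT_HAPTIC_RULES = {
    "icon": {"pattern": "double_pulse", "intensity": 0.8},
    "window_border": {"pattern": "single_pulse", "intensity": 0.5},
    "default": {"pattern": "single_pulse", "intensity": 0.3},
}

def haptic_response(element_type, rules=DEFAULT_HAPTIC_RULES):
    return rules.get(element_type, rules["default"])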


In some examples, the display region content recognizer 412 identifies text and a graphical element in the user interface content at the location of the touch event. For example, an icon representing a user application may include text in addition to a graphical element. As another example, text may be included in a header or border of a window. In such examples, audio and haptic feedback output(s) can be provided based on respective analyses performed by one or more of (a) the display region content recognizer 412 (when end-to-end neural network training is used) and/or (b) the display region content recognizer 412 and the text recognizer 414 (when end-to-end neural network training is not used). In other examples in which the graphical element includes text, the display region content recognizer 412 and/or the text recognizer 414 only generate an audio output indicative of the text in the graphical element and no haptic feedback is generated.


In some examples in which the display region includes a graphical element (e.g., an icon), the display region content recognizer 412 may generate audio representative of the graphical element, including for graphical elements that do not include text. For instance, in response to detection of an icon illustrating a storage disk, the display region content recognizer 412 may determine that audio including the word "save" should be output to inform the user of the presence of this menu option in the display region and to provide more guidance to the user than would be provided by (only) haptic feedback. Thus, in some examples, the neural network can be trained to correlate graphical element(s) with audio output(s).


As disclosed above, in some examples, the display region content recognizer 412 determines that the sampled display region includes data that should not trigger an audio and/or haptic feedback output. Such data can include, for instance, portion(s) of the display region not associated with text or graphical element(s) (e.g., icon(s)). In such examples, if the display region content recognizer 412 determines that the location of the touch event is most closely associated with the blank or empty portion(s) of the user interface, then no output(s) are generated for that sampled display region. However, in other examples, blank or empty portion(s) of the user interface(s) can prompt haptic feedback output(s) (depending on the labeled training data 424 and the training of the neural network 418) to alert the user that the location of the user's touch has moved away from text and/or other characters.


In some instances, the display region content recognizer 412 fails to recognize the word(s) in the text despite execution of the content recognition model 428 (i.e., when end-to-end training of the model 428 is used). In such examples, the display region content recognizer 412 refrains from generating predicted text data for the unidentified or unrecognized word. Similarly, in some examples, the text recognizer 414 fails to recognize the word(s) in the text despite execution of the text recognition model 444. In such examples, the text recognizer 414 refrains from generating predicted text data for the unidentified or unrecognized word. Thus, in these examples, an audio output is not generated to avoid an incorrect or nonsensical word from being presented.
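

For illustration, suppressing output for unrecognized or nonsensical words might resemble the following sketch; the confidence threshold and vocabulary check are assumptions.

# Illustrative guard that withholds audio output for unrecognized or nonsensical words.
def audio_text_or_none(predicted_word, confidence, vocabulary, threshold=0.6):
    if predicted_word is None or confidence < threshold:
        return None                                    # refrain from generating predicted text data
    if vocabulary and predicted_word.lower() not in vocabulary:
        return None                                    # avoid outputting a nonsensical word
    return predicted_word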


Thus, examples disclosed herein support different neural network schemes including (a) end-to-end training in which the display region content recognizer 412 executes the content recognition model 428 to identify the content in the sampled display frame(s) and to generate output(s) or (b) training of multiple neural network models in which text recognition and text-to-speech conversion are performed separately from the identification of the type of content in the sampled display frame(s).


Although some examples disclosed herein identify portion(s) of the sampled display region(s) in the display region data 202 as prompting haptic feedback output(s) using neural network model(s), in other examples, the determination of the haptic feedback output(s) can be based on image feature analysis. For instance, the example display region analyzer 126 can include an image brightness analyzer 454 to identify portion(s) of the sampled display region(s) associated with changes in luminance (e.g., brightness) or luminance contrasts of content in the sampled display region(s). For instance, a border of a window of a word processing application may be associated with a different color than the white color of the word document in the user interface. Based on this luminance contrast between the word document and the window border, the image brightness analyzer 454 determines that a haptic feedback output should be provided when the touch position data 200 indicates that a touch event has occurred proximate to the window border. Thus, in some examples, the haptic feedback output can be determined without use of a neural network and, instead, based on image feature analysis. The image brightness analyzer 454 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130.


The image brightness analyzer 454 can identify portion(s) of the user interface that should be associated with haptic feedback output(s) based on image brightness rule(s) 456 stored in the database 400. The image brightness rule(s) 456 can define differences or degrees of luminance contrasts (e.g., color contrast ratios) and associated haptic feedback output(s). The image brightness rule(s) 456 can be defined based on user input(s). The haptic feedback controller interface 450 transmits instruction(s) from the image brightness analyzer 454 to cause the haptic feedback controller 124 to output haptic feedback via the actuator(s) 123 (FIG. 1).
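

As a non-limiting illustration, the luminance-contrast check described above might resemble the following sketch, which uses a simplified (non-gamma-corrected) luminance estimate; the 1.5:1 contrast rule is an assumption standing in for a user-defined image brightness rule.

# Illustrative luminance contrast test between pixels at the touch location and
# the surrounding region.
def relative_luminance(rgb):
    r, g, b = (c / 255.0 for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b        # simplified Rec. 709 weighting

def needs_haptic(touch_rgb, surround_rgb, contrast_ratio_rule=1.5):
    lum_a = relative_luminance(touch_rgb) + 0.05
    lum_b = relative_luminance(surround_rgb) + 0.05
    ratio = max(lum_a, lum_b) / min(lum_a, lum_b)
    return ratio >= contrast_ratio_rule                # prompt haptic feedback on a sharp contrast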


In some examples, a user interacting with the user device 102 may provide verbal commands that are captured by the microphone(s) 119 of the user device 102 as microphone data 458 in addition to touch input(s). The display region analyzer 126 includes a voice command analyzer 460 to analyze the microphone data 458 received from the microphone(s) 119. The voice command analyzer 460 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130. The voice command analyzer 460 interprets the voice command using speech-to-text analysis and compares the voice command to, for example, text identified by the display region content recognizer 412 and/or the text recognizer 414 in the sampled display region associated with the touch event. If the detected voice command corresponds to the text in the user interface (e.g., within a threshold degree of accuracy), the voice command analyzer 460 confirms that the user wishes to perform the action associated with the voice command and the text in the sampled display region. The voice command analyzer 460 can communicate the verified command to user application(s) associated with the display frame(s). In such examples, the touch event(s) can mimic or replace mouse or keyboard event(s) (e.g., left-click, right-click, escape) in response to the verified voice command.


In some examples, the voice command analyzer 460 cooperates with user application(s) to cause the touch and voice commands to be executed. For example, a user application may perform a certain function in response to verbal and/or touch command(s). The voice command analyzer 460 can verify such known functions and associated command(s) based on the microphone data and touch event data. For example, if the voice command analyzer 460 detects the word "save" in the microphone data 458 and the display region content recognizer 412 or the text recognizer 414 identifies the text as the word "save," then the voice command analyzer 460 confirms that the user wishes to save an item such as a word document. The voice command analyzer 460 can send instructions to the user application with which the user is interacting to confirm that the voice command has been verified. Thus, in examples disclosed herein, the use of voice and touch can improve an accuracy with which users, such as visually impaired users, provide commands to user application(s) installed on the user device 102 and receive expected results.
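

For illustration only, comparing the spoken command to the text recognized at the touch location might be implemented as in the following sketch, which uses difflib from the Python standard library as a stand-in similarity measure; the 0.8 threshold is an assumption.

# Illustrative verification of a voice command against on-screen text.
import difflib

def verify_voice_command(spoken_text, recognized_text, threshold=0.8):
    score = difflib.SequenceMatcher(
        None, spoken_text.strip().lower(), recognized_text.strip().lower()
    ).ratio()
    return score >= threshold   # True: forward the confirmed command to the user application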


In some examples, one or more of the display region content recognizer 412 and/or the text recognizer 414 provides the outputs of the neural network processing (i.e., the predicted content data 427 and/or the predicted text data 447) to the processor 110 of the user device 102 for further processing by, for instance, the operating system and/or the user applications of the user device 102. The touch position data 200 can also be provided to the processor 110 for use by the operating system and/or user applications of the user device 102 to respond to user interactions with the device 102 in connection with the results of the neural network processing. For example, a user who is motor impaired may not be able to hold his or her hand steady enough to double tap in the same location on the display screen. In examples disclosed herein, the user can keep his or her hand resting on the display screen while the touch position data and/or identified content is provided to the user application. In such examples, the output of the user application and/or operating system can be the same as if the user had performed the double tap function.


While an example manner of implementing the display region analyzer 126 of FIGS. 1 and/or 2 is illustrated in FIG. 4, one or more of the elements, processes and/or devices illustrated in FIG. 4 may be combined, divided, re-arranged, omitted, eliminated and/or implemented in any other way. Further, the example database 400, the example touch controller interface 401, the example display controller interface 405, the example neural network accelerator interface 407, the example display controller manager 403, the example touch event analyzer 402, the example touch location mapper 410, the example display region content recognizer 412, the example text recognizer 414, the example text-to-speech synthesizer 448, the example audio controller interface 415, the example haptic feedback controller interface 450, the example image brightness analyzer 454, the example voice command analyzer 460 and/or, more generally, the example display region analyzer 126 of FIG. 4 may be implemented by hardware, software, firmware and/or any combination of hardware, software and/or firmware. Thus, for example, any of the example database 400, the example touch controller interface 401, the example display controller interface 405, the example neural network accelerator interface 407, the example display controller manager 403, the example touch event analyzer 402, the example touch location mapper 410, the example display region content recognizer 412, the example text recognizer 414, the example text-to-speech synthesizer 448, the example audio controller interface 415, the example haptic feedback controller interface 450, the example image brightness analyzer 454, the example voice command analyzer 460 and/or, more generally, the example display region analyzer 126 could be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), programmable controller(s), graphics processing unit(s) (GPU(s)), digital signal processor(s) (DSP(s)), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)) and/or field programmable logic device(s) (FPLD(s)). When reading any of the apparatus or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the example database 400, the example touch controller interface 401, the example display controller interface 405, the example neural network accelerator interface 407, the example display controller manager 403, the example touch event analyzer 402, the example touch location mapper 410, the example display region content recognizer 412, the example text recognizer 414, the example text-to-speech synthesizer 448, the example haptic feedback controller interface 450, the example image brightness analyzer 454, and/or the example voice command analyzer 460 is/are hereby expressly defined to include a non-transitory computer readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-ray disk, etc. including the software and/or firmware. Further still, the example display region analyzer 126 may include one or more elements, processes and/or devices in addition to, or instead of, those illustrated in FIG. 4, and/or may include more than one of any or all of the illustrated elements, processes, and devices. 
As used herein, the phrase “in communication,” including variations thereof, encompasses direct communication and/or indirect communication through one or more intermediary components, and does not require direct physical (e.g., wired) communication and/or constant communication, but rather additionally includes selective communication at periodic intervals, scheduled intervals, aperiodic intervals, and/or one-time events.


While an example manner of implementing the first computing system 416 is illustrated in FIG. 4, one or more of the elements, processes and/or devices illustrated in FIG. 4 may be combined, divided, re-arranged, omitted, eliminated and/or implemented in any other way. Further, the example neural network processor 418, the example trainer 420, the example training controller 422, the example database(s) 426, 430 and/or, more generally, the example first computing system 416 of FIG. 4 may be implemented by hardware, software, firmware and/or any combination of hardware, software and/or firmware. Thus, for example, any of the example neural network processor 418, the example trainer 420, the example training controller 422, the example database(s) 426, 430 and/or, more generally, the example first computing system 416 could be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), programmable controller(s), graphics processing unit(s) (GPU(s)), digital signal processor(s) (DSP(s)), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)) and/or field programmable logic device(s) (FPLD(s)). When reading any of the apparatus or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the example neural network processor 418, the example trainer 420, the example training controller 422, and/or the example database(s) 426, 430 is/are hereby expressly defined to include a non-transitory computer readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-ray disk, etc. including the software and/or firmware. Further still, the example first computing system 416 may include one or more elements, processes and/or devices in addition to, or instead of, those illustrated in FIG. 4, and/or may include more than one of any or all of the illustrated elements, processes, and devices. As used herein, the phrase "in communication," including variations thereof, encompasses direct communication and/or indirect communication through one or more intermediary components, and does not require direct physical (e.g., wired) communication and/or constant communication, but rather additionally includes selective communication at periodic intervals, scheduled intervals, aperiodic intervals, and/or one-time events.


While an example manner of implementing the second computing system 432 is illustrated in FIG. 4, one or more of the elements, processes and/or devices illustrated in FIG. 4 may be combined, divided, re-arranged, omitted, eliminated and/or implemented in any other way. Further, the example neural network processor 434, the example trainer 436, the example training controller 438, the example database(s) 442, 446 and/or, more generally, the example second computing system 432 of FIG. 4 may be implemented by hardware, software, firmware and/or any combination of hardware, software and/or firmware. Thus, for example, any of the example neural network processor 434, the example trainer 436, the example training controller 438, the example database(s) 442, 446 and/or, more generally, the example second computing system 432 could be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), programmable controller(s), graphics processing unit(s) (GPU(s)), digital signal processor(s) (DSP(s)), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)) and/or field programmable logic device(s) (FPLD(s)). When reading any of the apparatus or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the example neural network processor 434, the example trainer 436, the example training controller 438, and/or the example database(s) 442, 446 is/are hereby expressly defined to include a non-transitory computer readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disk (CD), a Blu-ray disk, etc. including the software and/or firmware. Further still, the example second computing system 432 may include one or more elements, processes and/or devices in addition to, or instead of, those illustrated in FIG. 4, and/or may include more than one of any or all of the illustrated elements, processes, and devices. As used herein, the phrase "in communication," including variations thereof, encompasses direct communication and/or indirect communication through one or more intermediary components, and does not require direct physical (e.g., wired) communication and/or constant communication, but rather additionally includes selective communication at periodic intervals, scheduled intervals, aperiodic intervals, and/or one-time events.


A flowchart representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the example display region sampler 201 of FIG. 2 is shown in FIG. 5. A flowchart representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the example first computing system 416 of FIG. 4 is shown in FIG. 6. A flowchart representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the example second computing system 432 of FIG. 4 is shown in FIG. 7. A flowchart representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the example display region analyzer 126 of FIGS. 1, 2, and/or 4 is shown in FIG. 8. The machine readable instructions of FIGS. 5-8 may be one or more executable programs or portion(s) of an executable program for execution by a computer processor and/or processor circuitry, such as the processors 110, 128, 130, 912, 1012 shown in the example processor platforms discussed below in connection with FIGS. 9-12. The program may be embodied in software stored on a non-transitory computer readable storage medium such as a CD-ROM, a floppy disk, a hard drive, a DVD, a Blu-ray disk, or a memory associated with the processor(s) 110, 128, 130, 912, 1012 but the entire program and/or parts thereof could alternatively be executed by a device other than the processor(s) 110, 128, 130, 912, 1012 and/or embodied in firmware or dedicated hardware. Further, although the example program is described with reference to the flowcharts illustrated in FIGS. 5-8, many other methods of implementing the example display region sampler 201, the example first computing system 416, the example second computing system 432, and/or the example display region analyzer 126 may alternatively be used. For example, the order of execution of the blocks may be changed, and/or some of the blocks described may be changed, eliminated, or combined. Additionally or alternatively, any or all of the blocks may be implemented by one or more hardware circuits (e.g., discrete and/or integrated analog and/or digital circuitry, an FPGA, an ASIC, a comparator, an operational-amplifier (op-amp), a logic circuit, etc.) structured to perform the corresponding operation without executing software or firmware. The processor circuitry may be distributed in different network locations and/or local to one or more devices (e.g., a multi-core processor in a single machine, multiple processors distributed across a server rack, etc.).


The machine readable instructions described herein may be stored in one or more of a compressed format, an encrypted format, a fragmented format, a compiled format, an executable format, a packaged format, etc. Machine readable instructions as described herein may be stored as data or a data structure (e.g., portions of instructions, code, representations of code, etc.) that may be utilized to create, manufacture, and/or produce machine executable instructions. For example, the machine readable instructions may be fragmented and stored on one or more storage devices and/or computing devices (e.g., servers) located at the same or different locations of a network or collection of networks (e.g., in the cloud, in edge devices, etc.). The machine readable instructions may require one or more of installation, modification, adaptation, updating, combining, supplementing, configuring, decryption, decompression, unpacking, distribution, reassignment, compilation, etc. in order to make them directly readable, interpretable, and/or executable by a computing device and/or other machine. For example, the machine readable instructions may be stored in multiple parts, which are individually compressed, encrypted, and stored on separate computing devices, wherein the parts when decrypted, decompressed, and combined form a set of executable instructions that implement one or more functions that may together form a program such as that described herein.


In another example, the machine readable instructions may be stored in a state in which they may be read by processor circuitry, but require addition of a library (e.g., a dynamic link library (DLL)), a software development kit (SDK), an application programming interface (API), etc. in order to execute the instructions on a particular computing device or other device. In another example, the machine readable instructions may need to be configured (e.g., settings stored, data input, network addresses recorded, etc.) before the machine readable instructions and/or the corresponding program(s) can be executed in whole or in part. Thus, machine readable media, as used herein, may include machine readable instructions and/or program(s) regardless of the particular format or state of the machine readable instructions and/or program(s) when stored or otherwise at rest or in transit.


The machine readable instructions described herein can be represented by any past, present, or future instruction language, scripting language, programming language, etc. For example, the machine readable instructions may be represented using any of the following languages: C, C++, Java, C #, Perl, Python, JavaScript, HyperText Markup Language (HTML), Structured Query Language (SQL), Swift, etc.


As mentioned above, the example processes of FIGS. 5-8 may be implemented using executable instructions (e.g., computer and/or machine readable instructions) stored on a non-transitory computer and/or machine readable medium such as a hard disk drive, a flash memory, a read-only memory, a compact disk, a digital versatile disk, a cache, a random-access memory and/or any other storage device or storage disk in which information is stored for any duration (e.g., for extended time periods, permanently, for brief instances, for temporarily buffering, and/or for caching of the information). As used herein, the term non-transitory computer readable medium is expressly defined to include any type of computer readable storage device and/or storage disk and to exclude propagating signals and to exclude transmission media.


“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the term “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, and (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B.


As used herein, singular references (e.g., “a”, “an”, “first”, “second”, etc.) do not exclude a plurality. The term “a” or “an” entity, as used herein, refers to one or more of that entity. The terms “a” (or “an”), “one or more”, and “at least one” can be used interchangeably herein. Furthermore, although individually listed, a plurality of means, elements or method actions may be implemented by, e.g., a single unit or processor. Additionally, although individual features may be included in different examples or claims, these may possibly be combined, and the inclusion in different examples or claims does not imply that a combination of features is not feasible and/or advantageous.



FIG. 5 is a flowchart representative of example machine readable instructions 500 that, when executed by the display region sampler 201 of the display controller 120 of the example user device 102 of FIGS. 1 and/or 2, cause the display region sampler 201 to generate the display region data 202 in response to a touch event on the display screen 104. The example instructions 500 begin with the display controller 120 receiving touch position data 200 from the touch controller 108 of the user device 102 and/or instruction(s) from the display controller manager 403 of the display region analyzer 126 (block 502). The touch position data 200 is generated in response to a touch event on the display screen 104 as detected by the display screen touch sensor(s) 106 of the display screen 104. The touch position data 200 includes coordinate data indicating the position of the touch relative to the display screen and a time of the touch event. In some examples, the instructions from the display controller manager 403 are generated in response to receipt of the touch position data 200 by the display region analyzer 126. The instructions from the display controller manager 403 can include the location of the touch event (e.g., as determined from the touch position data 200).


In response to notification of the occurrence of the touch event(s), the display region sampler 201 of the display controller 120 samples or captures a portion of a display frame (e.g., the display frame 300) displayed via the display screen 104 at the time of the touch event to generate the display region data 202 (e.g., image data) (block 504). The portion of the display frame sampled by the display region sampler 201 can include content displayed at the coordinates of the user's touch and surrounding content (e.g., content located within a threshold distance of the touch event location). In some examples, the display frame(s) sampled by the display region sampler 201 include display frame(s) associated with the operating system 112 and/or user applications 113 on the device 102. In other examples, the display frame(s) sampled by the display region sampler 201 include display frame(s) associated with the BIOS 114 of the user device 102.
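

A minimal sketch of sampling a display region around the touch coordinates is shown below; representing the display frame as a NumPy pixel array and the half-width of the sampled window are assumptions for illustration.

# Illustrative sampling of a display-frame window centered on the touch coordinates.
import numpy as np

def sample_region(frame: np.ndarray, touch_x: int, touch_y: int, half: int = 64):
    h, w = frame.shape[:2]
    x0, x1 = max(touch_x - half, 0), min(touch_x + half, w)
    y0, y1 = max(touch_y - half, 0), min(touch_y + half, h)
    return frame[y0:y1, x0:x1]                  # display region data for the display region analyzer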


The display region sampler 201 transmits the display region data 202 to the display region analyzer 126 via, for example, wired or wireless communication protocols (block 506). If additional touch position data 200 is received from the touch controller 108 and/or additional instruction(s) are received from the display controller manager 403 (e.g., in response to newly detected touch events by the display screen touch sensor(s) 106), the display region sampler 201 continues to sample the display frame(s) associated with the touch event(s) to output display region data 202 for analysis by the display region analyzer 126 (block 508). The example instructions 500 of FIG. 5 end when no further touch position data 200 and/or instruction(s) are received (block 510).



FIG. 6 is a flowchart representative of example machine readable instructions that, when executed by the example first computing system 416 of FIG. 4, cause the first computing system 416 to train a neural network to identify (e.g., predict, estimate, classify, recognize) content in a user interface and to determine a response (e.g., audio output, haptic feedback output, or no output) for the content. The example instructions 600 of FIG. 6, when executed by the first computing system 416 of FIG. 4, result in a neural network and/or a model thereof, that can be distributed to other computing systems, such as the display region content recognizer 412 of the example display region analyzer 126 of FIG. 4.


The example instructions 600 of FIG. 6 begin with the training controller 422 accessing display frame image data stored in the example database 426 (block 602). The display frame image data can include previously generated display frames associated with operating system(s), user application(s), and/or a BIOS of user device(s). In some examples, the data stored in the database 426 includes the display region data 202 from the display controller 120 and/or the predicted content data 427 generated by the display region content recognizer 412 (e.g., as part of feedback training).


The example training controller 422 labels the display frame image data (or portions thereof) with content in the display frame(s) that should prompt an audio output (e.g., text) and/or a haptic feedback output (e.g., graphical element(s) such as icon(s) or lines defining menu boxes) (block 603). The display frame image data is also labeled with content that should not prompt an audio output and/or a haptic feedback output (e.g., fragments or portions of word(s) and/or phrase(s), or blank or empty portions of the user interface(s) that do not include text and/or graphical element(s)).


In examples in which end-to-end neural network training is used (block 604), the first training data 424 is labeled with the output(s) that should be generated (e.g., audio speech sample(s) for text, haptic feedback output(s) for non-text character(s)) (block 605).


The example training controller 422 generates the training data 424 based on the labeled image data (block 606).


The example training controller 422 instructs the neural network trainer 420 to perform training of the neural network 418 using the training data 424 (block 608). In the example of FIG. 6, the training is based on supervised learning. As a result of the training, the content recognition model 428 is generated (block 610). Based on the content recognition model 428, the neural network is trained to predict a type of content in a user interface (e.g., text, graphical element, portion(s) without text or graphical element(s)). In examples in which end-to-end neural network training is performed, the neural network is trained, based on the content recognition model 428, to generate an output to be provided (e.g., audio output, haptic feedback output, no output) in response to the content. The content recognition model 428 can be stored in the database 430 for access by the display region content recognizer 412 of the display region analyzer 126. The example instructions 600 of FIG. 6 end when no additional training (e.g., retraining) is to be performed (blocks 612, 614).
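
For purposes of illustration only, a supervised training pass such as the one performed at blocks 608 and 610 might resemble the following Python sketch. The PyTorch-style architecture, class count, and hyperparameters are assumptions for illustration and do not describe the actual content recognition model 428.

```python
import torch
from torch import nn

# Hypothetical classifier over three response classes: audio, haptic, no output.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(8), nn.Flatten(),
    nn.Linear(16 * 8 * 8, 3),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_epoch(loader):
    """One pass over labeled display-region patches (image tensor, class index)."""
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```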



FIG. 7 is a flowchart representative of example machine readable instructions that, when executed by the example second computing system 432 of FIG. 4, cause the second computing system 432 to train a neural network to recognize or identify (e.g., predict, estimate, classify, recognize) word(s) and/or phrase(s) in image data representative of display frame(s). The example instructions 700 of FIG. 7, when executed by the second computing system 432 of FIG. 4, result in a neural network and/or a model thereof that can be distributed to other computing systems, such as the text recognizer 414 of the example display region analyzer 126 of FIG. 4. In some examples, the instructions of FIG. 7 are executed when end-to-end neural network training is not performed for the content recognition model 428 (block 604 of FIG. 6).


The example instructions 700 of FIG. 7 begin with the training controller 438 accessing listing(s) of known word(s) and/or phrase(s) used in user interface(s) and/or image data of user interface(s) including word(s) and/or phrase(s) stored in the example database 442 (block 702). The known words can include words displayed via interfaces associated with an operating system, user application(s), and/or a BIOS of a user device. In some examples, the data stored in the database 442 includes predicted text data 447 generated by the text recognizer 414.


The example training controller 438 labels the word(s) and/or phrase(s) in the listing(s) and/or image data to be used for training purposes (block 704). In some examples, the labeled content includes fragments or portions of words. The example training controller 438 generates the training data 440 based on the content in the labeled image data (block 706).


The example training controller 438 instructs the neural network trainer 436 to perform training of the neural network 434 using the training data 440 (block 708). In the example of FIG. 7, the training is based on supervised learning. As a result of the training, the text recognition model 444 is generated (block 710). Based on the text recognition model 444, the neural network is trained to recognize word(s) and/or phrase(s) in text in user interface(s). The text recognition model 444 can be stored in the database 446 for access by the text recognizer 414 of the display region analyzer 126. The example instructions 700 of FIG. 7 end when no additional training (e.g., retraining) is to be performed (blocks 712, 714).
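
For purposes of illustration only, one way to assemble labeled pairs such as those described at blocks 702-706 is to render known interface words as images and pair each rendering with its transcription. The use of the Pillow library and the example word listing below are assumptions for illustration, not the method of the disclosure.

```python
from PIL import Image, ImageDraw, ImageFont

def render_word(word: str, size=(160, 32)) -> Image.Image:
    """Render a known interface word onto a blank canvas as a training image."""
    image = Image.new("RGB", size, color="white")
    ImageDraw.Draw(image).text((4, 8), word, fill="black",
                               font=ImageFont.load_default())
    return image

known_words = ["Settings", "Volume", "Save", "Cancel"]  # illustrative listing
training_pairs = [(render_word(w), w) for w in known_words]
```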



FIG. 8 is a flowchart representative of example machine readable instructions 800 that, when executed by the display region analyzer 126 of FIGS. 1, 2 and/or 4, cause the display region analyzer 126 to identify content displayed via a display frame (e.g., a graphical user interface) that is associated with a touch event (e.g., a user finger touch, a touch by an input device such as a stylus) on the display screen 104 of the user device 102 and to generate output(s) representative of the content. The example instructions 800 of FIG. 8 can be executed by one or more of the microcontroller 130 of the SoC 128, the processor 110 of the user device 102, the processor 132 of a second user device, and/or the cloud-based device(s) 136.


The example instructions 800 of FIG. 8 begin with the display region analyzer 126 accessing touch position data 200 from the touch controller 108 and display region data 202 from the display controller 120 of the user device 102 (block 802). The touch event analyzer 402 of the display region analyzer 126 analyzes the touch position data 200 to confirm that the touch event is a touch event intended for the display region analyzer 126 and not a touch event associated with, for example, a gesture to control a user application on the device 102 (block 804). The touch event analyzer 402 verifies the touch event based on the touch event rule(s) 406 defining the touch event(s) and stored in the database 400.
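
For purposes of illustration only, the rule check of block 804 can be pictured as filtering the touch stream against simple predicates, such as a minimum dwell time or a maximum finger count. The data structure, field names, and threshold values in the following Python sketch are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TouchSample:
    x: int
    y: int
    duration_ms: float
    finger_count: int

def is_accessibility_touch(sample: TouchSample,
                           min_duration_ms: float = 250.0,
                           max_fingers: int = 1) -> bool:
    """Return True if the touch matches the accessibility rule(s), i.e., a single
    sustained press rather than a short tap or a multi-finger gesture."""
    return (sample.duration_ms >= min_duration_ms
            and sample.finger_count <= max_fingers)
```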


The touch location mapper 410 maps or synchronizes the touch position data 200 with the display region data 202 (block 806). In some examples, the touch location mapper 410 identifies the location of the touch event relative to a sampled display region in the display region data 202. In some examples, the touch location mapper 410 synchronizes the touch position data stream 200 and the display region data stream 202 using time stamps for each data stream.
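
For purposes of illustration only, the timestamp-based synchronization of block 806 might resemble the following Python sketch, which assumes each stream carries a monotonic timestamp and pairs samples within a hypothetical tolerance.

```python
def pair_by_timestamp(touch_events, display_regions, tolerance_s=0.05):
    """Pair each touch event with the sampled display region whose timestamp
    is closest, keeping only pairs that fall within the tolerance.

    Both arguments are iterables of (timestamp_seconds, payload) tuples.
    """
    regions = sorted(display_regions, key=lambda item: item[0])
    pairs = []
    if not regions:
        return pairs
    for t_touch, touch in sorted(touch_events, key=lambda item: item[0]):
        t_region, region = min(regions, key=lambda item: abs(item[0] - t_touch))
        if abs(t_region - t_touch) <= tolerance_s:
            pairs.append((touch, region))
    return pairs
```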


The display region content recognizer 412 executes the content recognition model 428 to identify content in the sampled display region associated with the touch event and to determine a corresponding response to be provided (e.g., audio output, haptic feedback output, no output) (block 808). As a result of the execution of the content recognition model 428, the display region content recognizer 412 generates predicted content data 427 that identifies the content associated with the touch event (e.g., text, graphics, empty portion). In some examples, the display region content recognizer 412 verifies the content most closely associated with the touch event based on the mapping performed by the touch location mapper 410 and/or the touch position data 200 (e.g., in examples where the display region data includes, for instance, text and a graphical element).
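
For purposes of illustration only, block 808 amounts to running the trained classifier on the cropped region and mapping the predicted class to a response type. The class ordering and response names in the following Python sketch are placeholders, not the output format of the content recognition model 428.

```python
import torch

CLASS_TO_RESPONSE = {0: "audio", 1: "haptic", 2: "none"}  # hypothetical ordering

def classify_region(model: torch.nn.Module, region: torch.Tensor) -> str:
    """Run the trained classifier on one display-region tensor (C, H, W) and
    return the response type associated with the most likely class."""
    model.eval()
    with torch.no_grad():
        logits = model(region.unsqueeze(0))  # add a batch dimension
        predicted = int(logits.argmax(dim=1))
    return CLASS_TO_RESPONSE[predicted]
```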


In the example of FIG. 8, if the display region content recognizer 412 identifies text in the user interface content associated with the touch event (block 810), one or more of the display region content recognizer 412 or the text recognizer 414 identifies (e.g., predicts, estimates, classifies) the word(s) in the text (block 814). In examples where end-to-end training was used to generate the content recognition model 428, the display region content recognizer 412 identifies the text as a result of execution of the content recognition model 428. In examples in which end-to-end training was not used to generate the content recognition model 428, the text recognizer 414 executes the text recognition model 444 in response to the identification of text in the sampled display frame(s).


If the display region content recognizer 412 is able to identify the word(s) in the identified text (block 814), the display region content recognizer 412 generates audio speech sample(s) including the word(s) as a result of execution of the content recognition model 428 (generated via end-to-end neural network training) (block 816). The audio controller interface 415 transmits the audio sample(s) to the audio controller 122 for output via the speaker(s) 121 of the device 102 (block 816).


Alternatively, if the text recognizer 414 is able to identify the word(s) (block 814), the text recognizer 414 generates the predicted text data 447 including the predicted word(s). The predicted text data 447 is used by the text-to-speech synthesizer 448 to convert the text to audio waveforms that are transmitted to the audio controller 122 for output via the speaker(s) 121 of the device 102 (block 816).
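
For purposes of illustration only, the text-to-speech conversion can be performed with an off-the-shelf synthesizer, as in the following Python sketch; the use of the pyttsx3 library and the speaking-rate value are assumptions and do not describe the text-to-speech synthesizer 448 of the disclosure.

```python
import pyttsx3

def speak_predicted_text(words: str) -> None:
    """Convert recognized interface text to speech and play it on the device."""
    engine = pyttsx3.init()
    engine.setProperty("rate", 170)  # speaking rate; value is illustrative
    engine.say(words)
    engine.runAndWait()

speak_predicted_text("Settings")
```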


In the example of FIG. 8, if the display region content recognizer 412 or the text recognizer 414 does not recognize the word(s) in the text or is unable to identify the word(s), no audio output(s) are generated to prevent, for instance, nonsensical words from being output as audio (block 817).


In the example of FIG. 8, if the display region content recognizer 412 identifies non-text character(s) or graphical element(s) (e.g., icons, menu border lines) in the sampled display region associated with the touch event, the display region content recognizer 412 generates instructions for the haptic feedback controller 124 to output haptic feedback (e.g., vibration(s)) via the haptic feedback actuator(s) 123 of the device 102 (block 820). The display region content recognizer 412 can identify the haptic feedback to be output based on neural network analysis (e.g., when end-to-end training is used) or based on the haptic feedback rule(s) 452. In some examples, the haptic feedback output(s) are provided in connection with audio output(s) for text (e.g., if the graphical element(s) include text). The haptic feedback controller interface 450 transmits instruction(s) to the haptic feedback controller 124 to cause the haptic feedback to be generated via the haptic feedback actuator(s) 123.
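
For purposes of illustration only, the haptic feedback rule(s) 452 can be modeled as a lookup from a recognized graphical element type to a vibration pattern, as in the following Python sketch. The element types and pattern values are hypothetical.

```python
# Vibration patterns expressed as (duration_ms, intensity 0.0-1.0) steps.
HAPTIC_RULES = {
    "icon":        [(40, 1.0)],                        # single sharp pulse
    "menu_border": [(20, 0.5), (20, 0.0), (20, 0.5)],  # double tick
    "window_edge": [(80, 0.3)],                        # long soft buzz
}

def haptic_pattern_for(element_type: str):
    """Return the vibration pattern for a recognized graphical element,
    or None if no haptic response is defined for that element type."""
    return HAPTIC_RULES.get(element_type)
```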


In some examples, the haptic feedback analysis at blocks 818 and 820 is performed by the image brightness analyzer 454 of the device 102. In such examples, the image brightness analyzer 454 analyzes properties of the user interface image data to identify changes in, for instance, luminance, which can serve as indicators of changes between user application windows, menus, etc.
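
For purposes of illustration only, such a luminance comparison might resemble the following Python sketch, which compares the mean luma of the currently and previously sampled regions against a hypothetical threshold.

```python
import numpy as np

def luminance_changed(prev_region: np.ndarray, curr_region: np.ndarray,
                      threshold: float = 12.0) -> bool:
    """Detect a boundary (e.g., entering a menu or a new window) by comparing
    the mean luma of two sampled RGB regions using the Rec. 601 weights."""
    def mean_luma(rgb: np.ndarray) -> float:
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        return float(np.mean(0.299 * r + 0.587 * g + 0.114 * b))

    return abs(mean_luma(curr_region) - mean_luma(prev_region)) >= threshold
```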


In the example of FIG. 8, when the display region content recognizer 412 does not identify text and/or graphical element(s) in the sampled display region associated with the touch event, the display region content recognizer 412 determines that the user interface includes data associated with no audio and/or haptic feedback outputs (block 822). In some other examples, the display region content recognizer 412 identifies content in the user interface as associated with no outputs (e.g., portions of phrases) based on execution of the content recognition model 428. In such examples, no output(s) are generated.


The display region analyzer 126 continues to analyze the display region data 202 as the data is received from the display controller 120 (e.g., in response to new touch event(s) detected by the touch controller and/or as part of periodic sampling of the user interface(s) presented via the display screen 104). Thus, the display region analyzer 126 can provide audio and/or haptic feedback output(s) that track user touch event(s) on the display screen relative to the displayed content. The example instructions 800 of FIG. 8 end when there is no further image data to analyze (blocks 826, 828).



FIG. 9 is a block diagram of an example processor platform 900 structured to execute the instructions of FIG. 6 to implement the first computing system 416 of FIG. 4. The processor platform 900 can be, for example, a server, a personal computer, a workstation, a self-learning machine (e.g., a neural network), a mobile device (e.g., a cell phone, a smart phone, a tablet such as an iPad™), a personal digital assistant (PDA), an Internet appliance, or any other type of computing device.


The processor platform 900 of the illustrated example includes a processor 912. The processor 912 of the illustrated example is hardware. For example, the processor 912 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor implements the example neural network processor 418, the example trainer 420, and the example training controller 422.


The processor 912 of the illustrated example includes a local memory 913 (e.g., a cache). The processor 912 of the illustrated example is in communication with a main memory including a volatile memory 914 and a non-volatile memory 916 via a bus 918. The volatile memory 914 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 916 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 914, 916 is controlled by a memory controller.


The processor platform 900 of the illustrated example also includes an interface circuit 920. The interface circuit 920 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.


In the illustrated example, one or more input devices 922 are connected to the interface circuit 920. The input device(s) 922 permit(s) a user to enter data and/or commands into the processor 912. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.


One or more output devices 924 are also connected to the interface circuit 920 of the illustrated example. The output devices 924 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker. The interface circuit 920 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.


The interface circuit 920 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 926. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.


The processor platform 900 of the illustrated example also includes one or more mass storage devices 928 for storing software and/or data. Examples of such mass storage devices 928 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.


The machine executable instructions 932 of FIG. 6 may be stored in the mass storage device 928, in the volatile memory 914, in the non-volatile memory 916, and/or on a removable non-transitory computer readable storage medium such as a CD or DVD.



FIG. 10 is a block diagram of an example processor platform 1000 structured to execute the instructions of FIG. 7 to implement the second computing system 432 of FIG. 4. The processor platform 1000 can be, for example, a server, a personal computer, a workstation, a self-learning machine (e.g., a neural network), a mobile device (e.g., a cell phone, a smart phone, a tablet such as an iPad™), a personal digital assistant (PDA), an Internet appliance, or any other type of computing device.


The processor platform 1000 of the illustrated example includes a processor 1012. The processor 1012 of the illustrated example is hardware. For example, the processor 1012 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor implements the example neural network processor 434, the example trainer 436, and the example training controller 438.


The processor 1012 of the illustrated example includes a local memory 1013 (e.g., a cache). The processor 1012 of the illustrated example is in communication with a main memory including a volatile memory 1014 and a non-volatile memory 1016 via a bus 1018. The volatile memory 1014 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 1016 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 1014, 1016 is controlled by a memory controller.


The processor platform 1000 of the illustrated example also includes an interface circuit 1020. The interface circuit 1020 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.


In the illustrated example, one or more input devices 1022 are connected to the interface circuit 1020. The input device(s) 1022 permit(s) a user to enter data and/or commands into the processor 1012. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.


One or more output devices 1024 are also connected to the interface circuit 1020 of the illustrated example. The output devices 1024 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker. The interface circuit 1020 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.


The interface circuit 1020 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 1026. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.


The processor platform 1000 of the illustrated example also includes one or more mass storage devices 1028 for storing software and/or data. Examples of such mass storage devices 1028 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.


The machine executable instructions 1032 of FIG. 7 may be stored in the mass storage device 1028, in the volatile memory 1014, in the non-volatile memory 1016, and/or on a removable non-transitory computer readable storage medium such as a CD or DVD.



FIG. 11 is a block diagram of an example processor platform 1100 structured to implement the user device 102 of FIG. 1. The processor platform 1100 can be, for example, a personal computer, a workstation, a self-learning machine (e.g., a neural network), a mobile device (e.g., a cell phone, a smart phone, a tablet such as an iPad™), a personal digital assistant (PDA), an Internet appliance, or any other type of computing device.


The processor platform 1100 of the illustrated example includes the system-on-chip (SoC) 128. In this example, the SoC 128 includes logic circuitry (e.g., an integrated circuit) encapsulated in a package such as a plastic housing. As disclosed herein, the SoC 128 implements the example display region analyzer 126 and the neural network accelerator 132. An example implementation of the SoC 128 is shown in FIG. 12.


The processor platform 1100 of the illustrated example includes the processor 110. The processor 110 of the illustrated example is hardware (e.g., an integrated circuit). For example, the processor 110 can be implemented by one or more integrated circuits, logic circuits, central processing units, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In the example of FIG. 11, the processor implements the example touch controller 108, the example display controller 120, the example audio controller 122, and the example haptic feedback controller 124. However, one or more of the touch controller 108, the display controller 120, the audio controller 122, and/or the haptic feedback controller 124 may be implemented by other circuitry.


The processor 110 of the illustrated example includes a local memory 1113 (e.g., a cache). The processor 110 of the illustrated example is in communication with a main memory including a volatile memory 1114 and a non-volatile memory 1116 via the bus 118. The volatile memory 1114 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 1116 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 1114, 1116 may be controlled by a memory controller.


The processor platform 1100 of the illustrated example also includes an interface circuit 1120. The interface circuit 1120 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.


In the illustrated example, one or more input devices 1122 are connected to the interface circuit 1120. The input device(s) 1122 permit(s) a user to enter data and/or commands into the processor 110. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.


One or more output devices 1124 are also connected to the interface circuit 1120 of the illustrated example. The output devices 1124 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker. The interface circuit 1120 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.


The interface circuit 1120 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 1126. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.


The processor platform 1100 of the illustrated example also includes one or more mass storage devices 1128 for storing software and/or data. Examples of such mass storage devices 1128 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.


Machine executable instructions 1132 corresponding to the BIOS 114, the operating system 112, the user application(s) 113, and/or some or all of the instructions of FIG. 8 may be stored in the mass storage device 1128, in the volatile memory 1114, in the non-volatile memory 1116, and/or on a removable non-transitory computer readable storage medium such as a CD or DVD.



FIG. 12 is a block diagram of an example implementation of the system-on-chip 128 of FIGS. 1 and/or 11.


The SoC 128 includes the neural network accelerator 132. The neural network accelerator 132 is implemented by one or more integrated circuits, logic circuits, microprocessors, or controllers from any desired family or manufacturer. In this example, the neural network accelerator 132 executes the example display region content recognizer 412.


The SoC 128 of the illustrated example includes the processor 130. The processor 130 of the illustrated example is hardware. For example, the processor 130 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor 130 is implemented by a microcontroller. In this example, the microcontroller 130 implements the example touch controller interface 401, the example display controller interface 405, the example neural network accelerator interface 407, the example display controller manager 403, the example text-to-speech synthesizer 448, the example audio controller interface 415, the example haptic feedback controller interface 450, the example image brightness analyzer 454, and the example voice command analyzer 460. In this example, the microcontroller 130 executes the instructions of FIG. 8 to implement the display region analyzer 126.


The processor 130 of the illustrated example includes a local memory 1213 (e.g., a cache). The processor 130 of the illustrated example is in communication with a main memory including a volatile memory 1214 and a non-volatile memory 1216 via a local bus 1218. The volatile memory 1214 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 1216 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 1214, 1216 is controlled by a memory controller.


The example SoC 128 of FIG. 12 also includes an interface circuit 1220. The interface circuit 1220 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, and/or a network interface to facilitate exchange of data with external machines (e.g., with the processor 110 and/or any other circuitry and/or computing devices of any kind). The communication can be via, for example, the bus 118 of the processor platform of FIGS. 1 and 11.


The machine executable instructions 1232 of FIG. 8 may be stored in the volatile memory 1214 and/or in the non-volatile memory 1216.


The SoC 128 of FIG. 12 is circuitry encapsulated in a housing or package. The SoC 128 may include connectors to couple the SoC 128 to a printed circuit board. These connectors may be part of the interface 1220 and serve to carry signals to/from the SoC 128.


FIG. 13 is a block diagram illustrating an example software distribution platform 1305 to distribute software, such as the example computer readable instructions 800 of FIG. 8, to third parties. The example software distribution platform 1305 may be used to update the instruction(s) 800 of FIG. 8 on the SoC 128 and/or for the system 100 when the SoC 128 is not present and the instructions 800 of FIG. 8 are executed by the processor 110.


The example software distribution platform 1305 may be implemented by any computer server, data facility, cloud service, etc., capable of storing and transmitting software to other computing devices. The third parties may be customers of the entity owning and/or operating the software distribution platform. For example, the entity that owns and/or operates the software distribution platform may be a developer, a seller, and/or a licensor of software such as the example computer readable instructions 800 of FIG. 8. The third parties may be consumers, users, retailers, OEMs, etc., who purchase and/or license the software for use and/or re-sale and/or sub-licensing. In the illustrated example, the software distribution platform 1305 includes one or more servers and one or more storage devices. The storage devices store respective computer readable instructions 800 of FIG. 8 as described above. The one or more servers of the example software distribution platform 1305 are in communication with a network 1310, which may correspond to any one or more of the Internet and/or any of the example networks 1126 described above. In some examples, the one or more servers are responsive to requests to transmit the software to a requesting party as part of a commercial transaction. Payment for the delivery, sale and/or license of the software may be handled by the one or more servers of the software distribution platform and/or via a third party payment entity. The servers enable purchasers and/or licensors to download the computer readable instructions 800 from the software distribution platform 1305. For example, the example computer readable instructions 800 of FIG. 8 may be downloaded to the example processor platform(s) 128, 1100, which are to execute the computer readable instructions 800 to implement the display region analyzer 126. In some examples, one or more servers of the software distribution platform 1305 periodically offer, transmit, and/or force updates to the software (e.g., the example computer readable instructions 800 of FIG. 8) to ensure improvements, patches, updates, etc. are distributed and applied to the software at the end user devices.


From the foregoing, it will be appreciated that example methods, systems, apparatus, and articles of manufacture have been disclosed that provide for enhanced user accessibility of an electronic user device for a visually, neurologically, and/or motor impaired user interacting with the device. Examples disclosed herein dynamically respond to touch events on a display screen of the device by generating image data corresponding to a portion of a display frame (e.g., graphical user interface) displayed on the display screen associated with the touch event. Examples disclosed herein execute neural network model(s) to identify content in the portion of the user interface associated with the touch event and to determine a response to be provided by the user device. Some examples disclosed herein generate audio outputs in response to recognition of text in the display frame to provide the user with an audio stream of words and/or phrases displayed on the screen as the user moves his or her finger relative to the screen. Additionally or alternatively, examples disclosed herein can provide haptic outputs that provide the user with feedback when, for instance, the user touch event is proximate to a graphical element such as an icon or line of a menu box. Thus, examples disclosed herein provide a visually impaired user with increased awareness of the content displayed on the screen in response to touches on the display screen. Moreover, examples disclosed herein can be implemented independent of an operating system of the device via, for instance, a system-on-chip architecture. As a result, examples disclosed herein can provide user accessibility features in connection with different user applications, operating systems, and/or computing environments such as BIOS mode.


Example methods, apparatus, systems, and articles of manufacture to provide accessible user interfaces are disclosed herein. Further examples and combinations thereof include the following:


Example 1 includes an apparatus including a display region analyzer to identify one or more of text or graphics in display frame image data, the display frame image data corresponding to a portion of a display frame associated with a touch event on a display screen of an electronic device; an audio controller interface to transmit, in response to the identification of the text in the display frame image data, an instruction including audio output corresponding to the text to be output by the electronic device; and a haptic feedback controller interface to transmit, in response to the identification of the graphics in the display frame image data, an instruction including a haptic feedback response to be output by the electronic device.


Example 2 includes the apparatus of example 1, wherein the display region analyzer is to execute a neural network model to identify the one or more of the text or graphics.


Example 3 includes the apparatus of examples 1 or 2, further including a display controller manager to cause a display controller to generate the display frame image data in response to the touch event.


Example 4 includes the apparatus of example 1 or 2, wherein a size of the portion of the display frame is to be based on an amount of force associated with the touch event.


Example 5 includes the apparatus of example 1, wherein the display region analyzer is to determine one or more of the audio output or the haptic feedback response in response to execution of a neural network model.


Example 6 includes the apparatus of example 1, wherein the display region analyzer is to execute a first neural network to identify content in the display frame image data as including the text and further including a text recognizer to execute a second neural network model to recognize the text and determine the audio output corresponding to the recognized text.


Example 7 includes the apparatus of examples 1 or 2, wherein the display region analyzer is to identify the text and the graphics in the display frame image data, the audio controller interface is to transmit the instruction including the audio output to be output and the haptic feedback control interface is to transmit the instruction including the haptic feedback response to be output in response to the identification of the text and the graphics.


Example 8 includes an electronic user device including a display screen; one or more sensors associated with the display screen; at least one processor to generate touch position data indicative of a position of a touch event on the display screen in response to one or more signals from the one or more sensors and sample a portion of a display frame in response to the touch event to generate display region data, the display frame to be displayed via the display screen; a system-on-chip to: operate independently of an operating system; determine content in the display region data, the content including one or more of text or a non-textual character; and generate one or more of an audio response or a haptic feedback response based on the content; an audio controller to transmit the audio response to a first output device; and a haptic feedback controller to transmit the haptic feedback response to a second output device.


Example 9 includes the electronic user device of example 8, wherein the at least one processor includes a touch controller and a display controller.


Example 10 includes the electronic user device of example 8, wherein the at least one processor is to sample the portion of the display frame based on the touch position data.


Example 11 includes the electronic user device of any of examples 8-10, wherein the system-on-chip is to execute one or more neural network models to analyze the content in the display region data.


Example 12 includes the electronic user device of any of examples 8-10, wherein the content includes text and the system-on-chip is to generate a word corresponding to the text, the audio response including the word.


Example 13 includes the electronic user device of examples 8-10, wherein the content includes the non-textual character and the system-on-chip is to identify the non-textual character based on a change in luminance in the portion of the display frame.


Example 14 includes the electronic user device of any of examples 8-10, wherein a size of the portion of the display frame sampled by the at least one processor is to be based on an amount of force associated with the touch event.


Example 15 includes the electronic user device of any of examples 8-10, wherein the portion of the display frame is a first portion, the touch event is a first touch event, the display region data is first display region data, and the at least one processor is to generate second touch position data indicative of a position of a second touch event on the display screen, the position of the second touch event different than the position of the first touch event; and sample a second portion of the display frame in response to the second touch event to generate second display region data. The system-on-chip is to identify content in the second display region data and generate one or more of the audio response or the haptic feedback response based on the identified content in the second display region data.


Example 16 includes a system including means for displaying a display frame; means for detecting a location of a touch event on the means for displaying; means for sampling a portion of the display frame based on the location of the touch event; means for outputting audio; and means for identifying to execute a neural network model to recognize content in the portion of the display frame, the content including text; and in response to a recognition of the text in the portion of the display frame, cause output of an audio response via the audio output means.


Example 17 includes the system of example 16, wherein the content includes graphics and further including means for generating haptic feedback, the means for identifying to, in response to a recognition of the graphics in the portion of the display frame, cause output of a haptic feedback response via the haptic feedback generating means.


Example 18 includes the system of examples 16 or 17, wherein the sampling means is to sample the portion of the display frame in response to an instruction from the detecting means.


Example 19 includes the system of examples 16 or 17, wherein the sampling means is to sample the portion of the display frame in response to an instruction from the identifying means.


Example 20 includes the system of example 16, wherein the neural network model includes a first neural network model and a second neural network model, the identifying means to execute the first neural network model to identify the content as including the text; and execute the second neural network model to generate an estimate of the text; and determine the audio response corresponding to the estimated text.


Example 21 includes the system of example 16, wherein the neural network model is to be generated using end-to-end training and the identifying means is to execute the neural network model to determine the audio response based on the text.


Example 22 includes the system of example 16, wherein the detecting means is to determine a force associated with the touch event and the sampling means is to sample the portion of the display frame having a size based on the force.


Example 23 includes at least one storage device comprising instructions that, when executed, cause a system-on-chip to at least recognize one or more of text or graphics in a region of a display frame, the region associated with a touch event on a display screen of an electronic device; cause, in response to the recognition of the text in the region, an audio output corresponding to the text to be output by the electronic device; and cause, in response to the recognition of the graphics in the region, a haptic feedback response to be output by the electronic device.


Example 24 includes the at least one storage device of example 23, wherein the instructions, when executed, cause the system-on-chip to cause a display controller to sample the display frame to generate display frame image data in response to the touch event, the display frame image data including the region.


Example 25 includes the at least one storage device of examples 23 or 24, wherein a size of the region is to be based on an amount of force associated with the touch event.


Example 26 includes the at least one storage device of examples 23 or 24, wherein the instructions, when executed, cause the system-on-chip to execute a neural network model to determine the audio output corresponding to the text.


Example 27 includes the at least one storage device of example 23, wherein the instructions, when executed, cause the system-on-chip to execute a first neural network model to identify content in the region as including the text and execute a second neural network model to recognize the text and determine the audio output corresponding to the recognized text.


Example 28 includes the at least one storage device of examples 23 or 24, wherein the instructions, when executed, cause the system-on-chip to recognize the text and the graphics in the region and cause the audio output and the haptic feedback response to be output.


Example 29 includes a method including identifying, by executing an instruction with at least one processor of a system-on-chip, one or more of text or graphics in display frame image data, the display frame image data corresponding to a portion of a display frame associated with a touch event on a display screen of an electronic device; causing, in response to identification of the text in the display frame image data and by executing an instruction with the at least one processor, an audio output corresponding to the text to be output by the electronic device; and causing, in response to the identification of the graphics in the display frame image data and by executing an instruction with the at least one processor, a haptic feedback response to be output by the electronic device.


Example 30 includes the method of example 29, further including causing a display controller to generate the display frame image data in response to the touch event.


Example 31 includes the method of examples 29 or 30, wherein a size of the portion is to be based on an amount of force associated with the touch event.


Example 32 includes the method of examples 29 or 30, further including executing a neural network model to identify the text.


Example 33 includes the method of example 29, further including executing a first neural network model to identify content in the display frame image data as including the text and executing a second neural network model to recognize the text and determine the audio output corresponding to the recognized text.


Example 34 includes the method of example 29, further including identifying the text and the graphics in the display frame image data and causing the audio output and the haptic feedback response to be output in response to the identification of the text and the graphics.


Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.


The following claims are hereby incorporated into this Detailed Description by this reference, with each claim standing on its own as a separate embodiment of the present disclosure.

Claims
  • 1. An apparatus comprising: a display region analyzer to identify one or more of text or graphics in display frame image data, the display frame image data corresponding to a portion of a display frame associated with a touch event on a display screen of an electronic device; an audio controller interface to transmit, in response to the identification of the text in the display frame image data, an instruction including audio output corresponding to the text to be output by the electronic device; and a haptic feedback controller interface to transmit, in response to the identification of the graphics in the display frame image data, an instruction including a haptic feedback response to be output by the electronic device.
  • 2. The apparatus of claim 1, wherein the display region analyzer is to execute a neural network model to identify the one or more of the text or graphics.
  • 3. The apparatus of claim 1, further including a display controller manager to cause a display controller to generate the display frame image data in response to the touch event.
  • 4. The apparatus of claim 1, wherein a size of the portion of the display frame is to be based on an amount of force associated with the touch event.
  • 5. The apparatus of claim 1, wherein the display region analyzer is to determine one or more of the audio output or the haptic feedback response in response to execution of a neural network model.
  • 6. The apparatus of claim 1, wherein the display region analyzer is to execute a first neural network to identify content in the display frame image data as including the text and further including a text recognizer to execute a second neural network model to: recognize the text; and determine the audio output corresponding to the recognized text.
  • 7. The apparatus of claim 1, wherein the display region analyzer is to identify the text and the graphics in the display frame image data, the audio controller interface to transmit the instruction including the audio output to be output and the haptic feedback control interface to transmit the instruction including the haptic feedback response to be output in response to the identification of the text and the graphics.
  • 8. An electronic user device comprising: a display screen; one or more sensors associated with the display screen; at least one processor to: generate touch position data indicative of a position of a touch event on the display screen in response to one or more signals from the one or more sensors; and sample a portion of a display frame in response to the touch event to generate display region data, the display frame to be displayed via the display screen; a system-on-chip to: operate independently of an operating system; determine content in the display region data, the content including one or more of text or a non-textual character; and generate one or more of an audio response or a haptic feedback response based on the content; an audio controller to transmit the audio response to a first output device; and a haptic feedback controller to transmit the haptic feedback response to a second output device.
  • 9. The electronic user device of claim 8, wherein the at least one processor includes a touch controller and a display controller.
  • 10. The electronic user device of claim 8, wherein the at least one processor is to sample the portion of the display frame based on the touch position data.
  • 11. The electronic user device of claim 8, wherein the system-on-chip is to execute one or more neural network models to analyze the content in the display region data.
  • 12. The electronic user device of claim 8, wherein the content includes text and the system-on-chip is to generate a word corresponding to the text, the audio response including the word.
  • 13. The electronic user device of claim 8, wherein the content includes the non-textual character and the system-on-chip is to identify the non-textual character based on a change in luminance in the portion of the display frame.
  • 14. The electronic user device of claim 8, wherein a size of the portion of the display frame sampled by the at least one processor is to be based on an amount of force associated with the touch event.
  • 15. The electronic user device of claim 8, wherein the portion of the display frame is a first portion, the touch event is a first touch event, the display region data is first display region data, and the at least one processor is to: generate second touch position data indicative of a position of a second touch event on the display screen, the position of the second touch event different than the position of the first touch event; and sample a second portion of the display frame in response to the second touch event to generate second display region data, the system-on-chip to identify content in the second display region data and generate one or more of the audio response or the haptic feedback response based on the identified content in the second display region data.
  • 16. A system comprising: means for displaying a display frame; means for detecting a location of a touch event on the means for displaying; means for sampling a portion of the display frame based on the location of the touch event; means for outputting audio; and means for identifying to: execute a neural network model to recognize content in the portion of the display frame, the content including text; and in response to a recognition of the text in the portion of the display frame, cause output of an audio response via the audio output means.
  • 17. The system of claim 16, wherein the content includes graphics and further including means for generating haptic feedback, the means for identifying to, in response to a recognition of the graphics in the portion of the display frame, cause output of a haptic feedback response via the haptic feedback generating means.
  • 18. (canceled)
  • 19. (canceled)
  • 20. The system of claim 16, wherein the neural network model includes a first neural network model and a second neural network model, the identifying means to: execute the first neural network model to identify the content as including the text; and execute the second neural network model to: generate an estimate of the text; and determine the audio response corresponding to the estimated text.
  • 21. The system of claim 16, wherein the neural network model is to be generated using end-to-end training and the identifying means is to execute the neural network model to determine the audio response based on the text.
  • 22. The system of claim 16, wherein the detecting means is to determine a force associated with the touch event and the sampling means is to sample the portion of the display frame having a size based on the force.
  • 23.-34. (canceled)
RELATED APPLICATIONS

This patent claims priority under 35 U.S.C. § 119 to U.S. Provisional Patent Application No. 63/105,021, filed on Oct. 23, 2020, and to U.S. Provisional Patent Application No. 63/105,025, filed on Oct. 23, 2020. U.S. Provisional Patent Application No. 63/105,021 and U.S. Provisional Patent Application No. 63/105,025 are hereby incorporated by reference in their entireties. Priority to U.S. Provisional Patent Application No. 63/105,021 and U.S. Provisional Patent Application No. 63/105,025 is hereby claimed.

Provisional Applications (2)
Number Date Country
63105025 Oct 2020 US
63105021 Oct 2020 US