This disclosure relates generally to electronic computing devices and, more particularly, to systems, methods, and apparatus for providing accessible user interfaces.
An electronic user device can include user accessibility features to facilitate ease of access for users who are visually impaired, hearing impaired, neurologically impaired, and/or motor impaired when interacting with the device. Some user accessibility features include peripheral devices such as a Braille display for visually impaired users.
The figures are not to scale. Instead, the thickness of the layers or regions may be enlarged in the drawings. In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts.
Unless specifically stated otherwise, descriptors such as “first,” “second,” “third,” etc. are used herein without imputing or otherwise indicating any meaning of priority, physical order, arrangement in a list, and/or ordering in any way, but are merely used as labels and/or arbitrary names to distinguish elements for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for identifying those elements distinctly that might, for example, otherwise share a same name.
An electronic user device can include user accessibility features to facilitate ease of access for users who are visually impaired, hearing impaired, neurologically impaired, and/or motor impaired when interacting with the device. Some user accessibility features are provided by an operating system of the user device and/or user applications installed on the user device to increase ease of interaction of, for instance, a visually impaired user with the device. Such user accessibility features can include adjustable sizing of icons, font, or cursors; screen contrast options; magnifiers; and/or keyboard shortcuts. Some user devices provide hardware support for peripheral Braille displays that translate text in a user interface displayed by the user device into Braille, which can be read by the user at the peripheral Braille display.
Although known user accessibility features can facilitate interactions by a visually impaired user with the user device, such user accessibility features are limited with respect to an amount of accessibility provided. For instance, known user accessibility features are typically associated with an operating system and, thus, may not be available with third-party applications installed on the device. Therefore, user accessibility features such as increased font sizing may not be compatible with all applications installed on the user device. Further, user accessibility features that are provided by an operating system are not available if, for instance, the user device is in a pre-operating system boot mode, such as a Basic Input Output System (BIOS) mode, because the operating system is not running when the user device is in the preboot BIOS mode. As such, the user accessibility features are not available to a user who wishes to change a BIOS setting, perform troubleshooting of the device in BIOS mode, etc. Additionally, peripheral Braille displays are costly add-on devices that are limited to translating text into Braille, but do not provide information as to, for instance, graphical or non-text content displayed.
Disclosed herein are example systems, apparatus, and methods that provide for audio and/or haptic feedback representation(s) of content in display frame(s) (e.g., graphical user interface(s)) displayed via a display screen of an electronic user device in response to a touch event by the user on the display screen. The touch event can include a touch by a user's finger(s) and/or by an input device such as a stylus.
Examples disclosed herein sample or capture a portion of a display frame associated with or corresponding to the location of the touch event on the display screen. In examples disclosed herein, a display region analyzer executes neural network model(s) to identify content such as text and/or non-textual or graphical elements (e.g., shapes, icons, border lines of menu windows, non-text character(s), etc.) in the sampled portion of the display frame. In examples in which text is identified in the display frame at or near the location of the user's touch, the neural network model(s) recognize or predict the text. The predicted text is converted to audio waveforms (e.g., using text-to-speech synthesis) and output as audio data by speakers of the user device and/or peripheral audio devices (e.g., Bluetooth® headphones). Thus, examples disclosed herein provide a visually impaired user with an audio stream or read-out of text on the display screen in response to the touch event(s) on the display screen. In examples in which graphical or non-textual elements such as shapes and/or icons are identified in display frame content associated with a touch event, haptic feedback output(s) (e.g., vibrations) can be generated to provide the user with a sense of feedback in response to the touch. For example, haptic feedback can be provided when a user touches the display screen proximate to a border or line defining a user application window or menu to alert the user that the user's finger is near an edge of the window or menu and to orient the user relative to the display frame. Thus, examples disclosed herein inform a visually, neurologically, and/or motor impaired user of textual and/or graphical information displayed on the display screen.
In some examples disclosed herein, the display region analyzer is implemented by a system-on-chip of the user device. This implementation enables the display region analyzer to analyze display frame(s) and determine the corresponding audio and/or haptic output(s) independent of the operating system of the user device. For example, the system-on-chip architecture enables the display region analyzer to operate when the user device is in BIOS mode before the operating system has been loaded. Thus, examples disclosed herein provide users with accessibility features that are not limited to certain applications or operating systems and, therefore, provide a more complete accessibility experience when interacting with the device.
The example user device 102 of
The example user device 102 of
The processor 110 of the illustrated example is a semiconductor-based hardware logic device. The hardware processor 110 may implement a central processing unit (CPU) of the user device 102, may include any number of cores, and may be implemented, for example, by a processor commercially available from Intel® Corporation. The processor 110 executes machine readable instructions (e.g., software) including, for example, an operating system 112 and/or other user application(s) 113 installed on the user device 102, to interpret and output response(s) based on the user input event(s) (e.g., touch event(s), keyboard input(s), etc.). The example user device 102 includes a Basic Input/Output System (BIOS) 114, which may be implemented by firmware that provides for initialization of hardware of the user device 102 during start-up of the user device 102 prior to loading of the operating system software 112. The operating system 112, the user application(s) 113, and the BIOS 114 are stored in one or more storage devices 115. The user device 102 of
A display controller 120 (e.g., a graphics processing unit (GPU)) of the example user device 102 of
The example user device 102 includes one or more output devices 117 such as speakers 121 to provide audible outputs to a user. The example user device 102 includes an audio controller 122 to control operation of the speaker(s) 121 and facilitate rendering of audio content via the speaker(s) 121. In some examples, the audio controller 122 is implemented by the processor 110. In some examples, the audio controller 122 is implemented by the SoC 128 (e.g., by the microcontroller 130 of the SoC 128). In other examples, the audio controller 122 is implemented by stand-alone circuitry in communication with one or more of the processor 110 and/or the SoC 128.
The example user device 102 of
Although shown as one device 102, any or all of the components of the user device 102 may be in separate housings and, thus, the user device 102 may be implemented as a collection of two or more user devices. In other words, the user device 102 may include more than one physical housing. For example, the logic circuitry (e.g., the SoC 128 and the processor 110) along with support devices such as the one or more storage devices 115, a power supply 116, etc. may be a first user device contained in a first housing of, for example, a desktop computer, and the display screen 104, the touch sensor(s) 106, and the haptic feedback actuator(s) 123 may be contained in a second housing separate from the first housing. The second housing may be, for example, a display housing. Similarly, the user input device(s) 107 (e.g., microphone(s) 119, camera(s), keyboard(s), touchpad(s), mouse, etc.) and/or the output device(s) (e.g., the speaker(s) 121 and/or the haptic feedback actuator(s) 123) may be carried by the first housing, by the second housing, and/or by any other number of additional housings. Thus, although
In the example of
In the example of
In the example of
In some examples, one or more components of the display region analyzer 126 is implemented by a neural network accelerator 132 (e.g., of the SoC 128) to facilitate neural network processing performed by the display region analyzer 126 when analyzing the display frame(s). The neural network accelerator 132 can be implemented by, for example, an accelerator such as the Intel® Gaussian & Neural Accelerator (GNA) or an ANNA (an autonomous neural network accelerator that can be an extension to the GNA), among others. The neural network accelerator 132 can be implemented by dedicated logic circuitry or by a processor such as a microcontroller executing instructions on the SoC 128. In some examples, the display region analyzer 126 and the neural network accelerator 132 are implemented by the same microcontroller of the SoC 128. In some examples, one or more components of the neural network accelerator 132 is implemented by the microcontroller 130 of
Although in the example of
In the example of
In other examples, the audio controller 122 and/or the haptic feedback controller 124 are one or more components separate from the SoC 128 and separate from the processor 110. As such, the SoC 128 may communicate with the audio controller 122 and/or the haptic feedback controller 124 without involving the processor 110. Similarly, the processor 110 may communicate with the audio controller 122 and/or the haptic feedback controller 124 without the involvement of the SoC 128. In some examples, the SoC 128 communicates with the audio controller 122 and/or the haptic feedback controller 124 at least (e.g., only) prior to loading of the operating system 112 and the processor 110 communicates with the audio controller 122 and/or the haptic feedback controller 124 at least (e.g., only) after the loading of the operating system 112.
In the example of
The touch controller 108 transmits the touch position data 200 to the display region analyzer 126. In some examples, the display region analyzer 126 receives the touch position data 200 from the touch controller 108 in substantially real-time (as used herein, “substantially real time” refers to occurrence in a near instantaneous manner, recognizing there may be real world delays for computing time, transmission, etc.). In other examples, the display region analyzer 126 receives the touch position data 200 at a later time (e.g., periodically and/or aperiodically based on one or more settings but sometime after the activity that caused the signal data to be generated, such as a user touching the display screen 104 of the device 102, has occurred (e.g., seconds later)).
In some examples, the touch controller 108 also transmits the touch position data 200 to the display controller 120 to alert the display controller 120 to the touch event. In some instances, the display controller 120 receives the touch position data 200 from the touch controller 108 in substantially real-time. In other examples, the display controller 120 receives the touch position data 200 at a later time (e.g., periodically and/or aperiodically based on one or more settings but sometime after the activity that caused the signal data to be generated, such as a user touching the display screen 104 of the device 102, has occurred (e.g., seconds later)).
However, in other examples, the touch controller 108 only sends the touch position data 200 to the display region analyzer 126 and the display region analyzer 126 generates instructions to alert the display controller 120 of the touch event in response to receipt of the touch position data 200.
In response to notification of a touch event from the display region analyzer 126 and/or the touch controller 108, the display controller 120 identifies and saves the display frame rendered at the time of the touch event. As shown in
The display region sampler 201 can identify the location of the touch event relative to the display frame based on the touch position data 200 from the touch controller 108 and/or the instructions from the display region analyzer 126, which can include the location of the touch event. The boundaries that define a size or resolution of the region of the display frame that is sampled by the display region sampler 201 can be defined by one or more variables. For example, the size of the captured display region can be defined by content located within a threshold distance of the coordinates corresponding to the location of the touch event on the display screen 104. In some examples, the size of the region or area of the user interface captured by the display region sampler 201 is based on an amount of pressure applied by the user's finger and/or a stylus on the screen 104 and detected by the display screen touch sensor(s) 106. The force data can be transmitted from the touch controller 108 to the display region analyzer 126, which generates instructions for the display region sampler 201 regarding the size of the display region to sample. The size of the screen region captured by the display region sampler 201 can be proportional to the amount of pressure applied (e.g., the greater the pressure associated with the touch, the larger the size of the user interface sampled by the display region sampler 201).
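By way of illustration only, the region-sizing behavior described above can be sketched in Python as follows. The function name, base radius, and pressure gain are assumptions chosen for the sketch rather than values prescribed by this disclosure.

```python
# Illustrative sketch: derive a sampling rectangle around a touch point, with the
# captured area growing in proportion to the touch pressure. The constants are
# assumed values, not values taken from this disclosure.

def sample_bounds(touch_x, touch_y, pressure, frame_w, frame_h,
                  base_radius=40, pressure_gain=60.0):
    """Return (left, top, right, bottom) pixel bounds of the region to capture."""
    radius = int(base_radius + pressure_gain * pressure)  # harder press -> larger region
    left = max(0, touch_x - radius)
    top = max(0, touch_y - radius)
    right = min(frame_w, touch_x + radius)
    bottom = min(frame_h, touch_y + radius)
    return left, top, right, bottom

# Example: a firm press (pressure of about 0.8 on a 0..1 scale) near the top-left corner
print(sample_bounds(touch_x=120, touch_y=80, pressure=0.8, frame_w=1920, frame_h=1080))
```

In this sketch, the bounds are clamped to the frame dimensions so that touches near a screen edge still yield a valid capture region.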
The display region sampler 201 can (e.g., automatically) sample the display frame(s) periodically (e.g., several times a second) in response to, for instance, changes in the location of the user's touch as detected by the touch controller 108 and/or the display region analyzer 126. As such, the display region data 202 generated by the display region sampler 201 can reflect movement of, for instance, the user's finger across a region of the display screen 104. In other examples, the display region sampler 201 generates the display region data in response to requests from the display region analyzer 126 (e.g., on-demand).
The display region sampler 201 of the example display controller 120 of
The display region analyzer 126 analyzes the touch position data 200 and the display region data 202 to identify the content in the sampled display region and to determine output(s) (e.g., audio output(s) and/or a haptic feedback output(s)) that inform the user as to the content displayed. In some examples, the display region analyzer 126 determines that no output should be provided (e.g., in instances when the display region analyzer 126 determines that the user is touching a portion of the display screen that is displaying neither textual content nor non-textual content such as an icon).
The display region analyzer 126 analyzes the touch position data 200 and the display region data 202 to correlate the location(s) of the touch event(s) with content in the sampled display region(s). As disclosed herein, the display region analyzer 126 implements neural network model(s) to identify user interface content proximate to the location of the touch event on the display screen 104 as either corresponding to a word (i.e., text), a non-textual or graphical element (e.g., a shape such as a line of a menu box, an icon, etc.), or content that does not include a word or graphics (e.g., a blank portion of a word document, a space between words, a space between two lines of text) and to generate the output(s) representative of the content.
If the display region analyzer 126 recognizes a word in text in the sampled display region, the display region analyzer 126 generates audio output data 204 (e.g., audio waveforms) that includes the identified word to be output in audio form. The audio output data 204 is transmitted to the audio controller 122. The audio controller 122 causes the audio output data 204 to be output by the speaker(s) 121 of the device 102 (
If the display region analyzer 126 determines that the location of the touch event on the display screen 104 corresponds to non-textual or graphical content such as a line of a window or menu box, the display region analyzer 126 generates haptic feedback output data 206 to alert the user that, for instance, the user's finger has touched or moved over a border line of a menu box or an application window. The haptic feedback output data 206 includes instructions regarding haptic feedback (e.g., vibrations) to be generated by the haptic feedback actuator(s) 123 of the device 102 (
In some examples, the display region analyzer 126 determines that the graphical element includes text (e.g., an icon with a title of the user application represented by the icon). In some such examples, the display region analyzer 126 can generate an audio output and haptic feedback output such that both outputs are produced by the device 102.
As disclosed herein, the audio controller 122 and/or the haptic feedback controller 124 may be implemented by the processor 110. The display region analyzer 126 (e.g., the microcontroller 130 of the SoC 128) may, thus, output requests to the processor 110 to cause the audio controller 122 and/or the haptic feedback controller 124 to take the actions described herein. In some examples, the audio controller 122 and/or the haptic feedback controller 124 are implemented by the BIOS 114 (the basic input output system which controls communications with input/output devices). In such examples, the SoC 128 communicates with the audio controller 122 and/or the haptic feedback controller 124 by sending requests to the processor 110 that implements the audio controller 122 and/or the haptic feedback controller 124.
In some examples, the display region analyzer 126 determines, based on the display region data 202 and the neural network model(s), that the location of the touch event on the display screen 104 corresponds to content in the display frame that does not include a word or graphics (e.g., a blank or empty portion of a window, a location that is between two words). In other examples, the display region analyzer 126 identifies a portion or fragment of a word in the sampled display region. In some such examples, the display region analyzer 126 determines that no audio output data 204 and/or haptic feedback output data 206 should be generated. As such, the display region analyzer 126 prevents outputs that would result in audio corresponding to a nonsensical word (e.g., in examples where the user's touch is located between two words) or would inaccurately represent what is on the display screen (e.g., in examples where the user has touched an empty portion of a window). However, in other examples, an empty portion of a window can prompt a haptic feedback output to alert the user that the user's touch has moved away from text.
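A minimal sketch of the output-routing decision summarized above is shown below, assuming the classifier reports one of four illustrative content categories; the RegionContent structure and the category names are placeholders, not elements of the example apparatus.

```python
# Illustrative routing of classified display-region content to audio and/or
# haptic outputs; the category names and data structure are assumptions.

from typing import NamedTuple, Optional

class RegionContent(NamedTuple):
    kind: str                  # "text", "graphic", "graphic_with_text", or "blank"
    text: Optional[str] = None

def route_output(content: RegionContent):
    """Decide which feedback channel(s) to drive for a sampled display region."""
    if content.kind == "text" and content.text:
        return {"audio": content.text, "haptic": False}
    if content.kind == "graphic_with_text":
        return {"audio": content.text, "haptic": True}   # both outputs, as in the icon example
    if content.kind == "graphic":
        return {"audio": None, "haptic": True}
    return {"audio": None, "haptic": False}              # blank region or word fragment: no output

print(route_output(RegionContent("graphic_with_text", "Save")))
```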
The example display region analyzer 126 of
As discussed above, in response to the touch event, the display region sampler 201 of the display controller 120 of
The display region analyzer 126 of
As disclosed in connection with
The example display region analyzer 126 includes a touch event analyzer 402. The touch event analyzer 402 provides means for analyzing the touch position data 200 to verify that the touch event is intended to cause the display region analyzer 126 to analyze the display frame content and is not, instead, a gesture or touch event associated with another function of the user device 102 and/or user application(s) installed thereon. The touch event analyzer 402 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130 executing block 804 of the flowchart of
For example, based on the touch position data 200, the touch event analyzer 402 may determine that the user has performed a gesture intended for, for instance, the operating system 112 (e.g., a single tap on the display screen 104, a double tap on the display screen 104). In some examples, the touch event analyzer 402 recognizes that the gesture is associated with another user interface function (e.g., selection of a menu item) based on touch event rule(s) 406 stored in the database 400. As a result, the touch event analyzer 402 determines that the touch event is not a touch event intended for the display region analyzer 126. If the touch event analyzer 402 determines that the touch event is not an intended touch event for the display region analyzer 126, the touch event analyzer 402 can instruct the display region analyzer 126 to refrain from activating the display region sampler 201 of the display controller 120 and/or refrain from analyzing display frame(s). Conversely, if, based on the touch position data 200, the touch event analyzer 402 determines that, for example, the user is moving his or her finger across the display screen (e.g., as if underlining words while reading), the touch event analyzer 402 determines that the touch event is an intended touch event for the display region analyzer 126 to generate output(s) representative of the user content.
In some examples, the touch controller 108 transmits force data 408 to the display region analyzer 126. The force data 408 can be generated by the display screen touch sensor(s) 106 (e.g., resistive force sensor(s), capacitive force sensor(s), piezoelectric force sensor(s)) and can indicate an amount of force or pressure associated with the touch event on the display screen 104. In such examples, the touch event analyzer 402 can determine that the touch event is intended for the display region analyzer 126 if the force data 408 associated with the touch event exceeds a force threshold as defined by the touch event rule(s) 406 (e.g., the touch event is indicative of a hard press by the user's finger or a stylus on the screen 104).
As disclosed herein, a size of the display frame sampled by the display region sampler 201 can be based on an amount of pressure or force associated with the touch event (e.g., the display region sampler 201 samples a larger area of the display frame in response to increased force associated with the touch event). The touch event analyzer 402 can determine that the touch event is intended for the display region analyzer 126 based on a size of the display region captured by the display region sampler 201.
Thus, the touch event rule(s) 406 can include touch gesture(s) associated with the operating system 112 and/or other user application(s) 113 on the user device 102 and/or control function(s) (e.g., a double tap gesture to select, a pinch gesture to zoom in) that, when identified by the touch event analyzer 402, cause the display region analyzer 126 to refrain from interpreting the display content to prevent interference with output(s) by the user application(s). The touch event rule(s) 406 can also include threshold touch pressure(s) or force(s) and/or touch gesture(s) that indicate that the user is requesting information about the content displayed on the screen 104 and, thus, should trigger analysis of the display frame content by the display region analyzer 126.
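The gating behavior captured by the touch event rule(s) 406 can be sketched, for illustration only, as follows; the reserved gesture names and the 0.6 force threshold are assumptions made for the sketch and are not values defined by this disclosure.

```python
# Illustrative touch-event gating based on gesture type and touch force,
# assuming a 0..1 normalized force value; gesture names and threshold are
# placeholders, not values from this disclosure.

RESERVED_GESTURES = {"single_tap", "double_tap", "pinch"}   # handled by the OS / applications
FORCE_THRESHOLD = 0.6                                       # assumed "hard press" cutoff

def is_intended_for_analyzer(gesture: str, force: float) -> bool:
    """Return True if the touch should trigger display-region analysis."""
    if gesture in RESERVED_GESTURES:
        return False                    # defer to the operating system or application
    if gesture == "drag":               # e.g., a finger moving as if underlining words
        return True
    return force >= FORCE_THRESHOLD     # otherwise require a deliberate hard press

print(is_intended_for_analyzer("drag", 0.2))        # True
print(is_intended_for_analyzer("double_tap", 0.9))  # False
```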
The display region analyzer 126 of
The display region analyzer 126 includes a display controller manager 403. The display controller manager 403 provides means for instructing the display region sampler 201 of the display controller 120 to capture region(s) of display frame(s). In some examples, the display controller manager 403 activates or instructs the display region sampler 201 of the display controller 120 to capture region(s) of display frame(s) (e.g., the display frame 300 of
As disclosed above in connection with
The example display region analyzer 126 includes a touch location mapper 410. The touch location mapper 410 provides means for mapping the location of the touch event (i.e., a verified touch event as determined by the touch event analyzer 402) as indicated by the touch position data 200 relative to the display region data 202 received from the display controller 120. In some examples, the touch location mapper 410 correlates or synchronizes the touch position data stream 200 with the display region data stream 202 based on time-stamps associated with the touch position data 200 and the display region data 202. The touch location mapper 410 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130 executing block 806 of the flowchart of
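For illustration, the time-stamp-based correlation of the two data streams can be sketched as follows; the tolerance value and the tuple format are assumptions made only for this sketch.

```python
# Illustrative pairing of the touch position stream with the display region
# stream by nearest timestamp; the 50 ms tolerance is an assumed value.

def pair_streams(touch_samples, region_samples, tolerance_s=0.05):
    """Pair each touch sample with the display-region capture closest in time.

    Both inputs are lists of (timestamp_seconds, payload) tuples.
    """
    pairs = []
    for t_ts, touch in touch_samples:
        r_ts, region = min(region_samples, key=lambda r: abs(r[0] - t_ts))
        if abs(r_ts - t_ts) <= tolerance_s:
            pairs.append((touch, region))
    return pairs

touches = [(0.000, (120, 80)), (0.110, (130, 82))]
regions = [(0.004, "region_a"), (0.108, "region_b")]
print(pair_streams(touches, regions))
```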
The example display region analyzer 126 of
In the example of
In some examples, a neural network model to be executed by the display region content recognizer 412 can be generated using end-to-end training of a neural network such that, for each sampled display region in the display region data 202 provided as an input to the display region content recognizer 412, the display region content recognizer 412 generates the audio output and/or haptic feedback output to be provided.
In other examples, two or more neural network models are used. For instance, the display region content recognizer 412 can execute a first neural network model to determine a type of content in the sampled display frame (e.g., text, non-text characters, etc.). If the display region content recognizer 412 identifies text in the sampled display region as a result of the first neural network model, a text recognizer 414 of the example display region analyzer 126 can identify (e.g., predict, estimate, classify, recognize) the text using a second neural network model. If the display region content recognizer 412 identifies graphical element(s) in the sampled display region as a result of the first neural network model, the display region content recognizer 412 can separately determine the haptic feedback to be generated based on haptic feedback rule(s) 452, as disclosed herein.
Artificial intelligence (AI), including machine learning (ML), deep learning (DL), and/or other artificial machine-driven logic, enables machines (e.g., computers, logic circuits, etc.) to use a model to process input data to generate an output based on patterns and/or associations previously learned by the model via a training process. For instance, the model may be trained with data to recognize patterns and/or associations and follow such patterns and/or associations when processing input data such that other input(s) result in output(s) consistent with the recognized patterns and/or associations.
Many different types of machine learning models and/or machine learning architectures exist. In examples disclosed herein, a convolutional neural network model is used. Using a convolutional neural network model enables rotation invariant and scale robust classification. In general, machine learning models/architectures that are suitable to use in the example approaches disclosed herein will be feed forward neural networks. However, other types of machine learning models could additionally or alternatively be used such as recurrent neural networks, graph neural networks, generative adversarial networks, etc.
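As one possible illustration of such a convolutional model, the following PyTorch sketch classifies a sampled display-region crop into one of three content classes; the input size, layer sizes, and class count are assumptions and are not dictated by this disclosure.

```python
# Illustrative convolutional classifier for sampled display-region crops,
# assuming 64x64 grayscale inputs and three content classes
# (e.g., text / graphic / blank); layer sizes are assumed.

import torch
import torch.nn as nn

class RegionContentNet(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):                        # x: (batch, 1, 64, 64)
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = RegionContentNet()
logits = model(torch.randn(1, 1, 64, 64))        # one sampled display-region crop
print(logits.argmax(dim=1))                      # predicted content class index
```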
In general, implementing a ML/AI system involves two phases, a learning/training phase and an inference phase. In the learning/training phase, a training algorithm is used to train a model to operate in accordance with patterns and/or associations based on, for example, training data. In general, the model includes internal parameters that guide how input data is transformed into output data, such as through a series of nodes and connections within the model to transform input data into output data. Additionally, hyperparameters are used as part of the training process to control how the learning is performed (e.g., a learning rate, a number of layers to be used in the machine learning model, etc.). Hyperparameters are defined to be training parameters that are determined prior to initiating the training process.
Different types of training may be performed based on the type of ML/AI model and/or the expected output. For example, supervised training uses inputs and corresponding expected (e.g., labeled) outputs to select parameters (e.g., by iterating over combinations of select parameters) for the ML/AI model that reduce model error. As used herein, labelling refers to an expected output of the machine learning model (e.g., a classification, an expected output value, etc.). Alternatively, unsupervised training (e.g., used in deep learning, a subset of machine learning, etc.) involves inferring patterns from inputs to select parameters for the ML/AI model (e.g., without the benefit of expected (e.g., labeled) outputs).
In examples disclosed herein, ML/AI models are trained using stochastic gradient descent. However, any other training algorithm may additionally or alternatively be used. In examples disclosed herein, training is performed until a training set loss falls below a threshold and test/development set performance is acceptable. In examples disclosed herein, training is performed in advance on a server cluster. Training is performed using hyperparameters that control how the learning is performed (e.g., a learning rate, a number of layers to be used in the machine learning model, layer sizes, etc.). In examples disclosed herein, hyperparameters are selected by, for example, exhaustive search. In some examples re-training may be performed. Such re-training may be performed in response to dissatisfaction with model performance, change in model architecture(s), the availability of more and/or improved training data, etc.
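A minimal training-loop sketch consistent with the stochastic gradient descent and loss-threshold stopping criterion described above is shown below; the stand-in model, synthetic mini-batches, labels, and threshold are assumptions made only for the sketch.

```python
# Illustrative supervised training loop using stochastic gradient descent and
# a stop criterion of training loss falling below a threshold; the stand-in
# model, synthetic data, and constants are assumptions.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 3))   # stand-in classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)     # stochastic gradient descent
loss_fn = nn.CrossEntropyLoss()
LOSS_THRESHOLD = 0.1                                         # assumed stopping threshold

# Toy mini-batches of labeled display-region crops (class 0=text, 1=graphic, 2=blank)
batches = [(torch.randn(8, 1, 64, 64), torch.randint(0, 3, (8,))) for _ in range(100)]

for regions, labels in batches:
    optimizer.zero_grad()
    loss = loss_fn(model(regions), labels)
    loss.backward()
    optimizer.step()
    if loss.item() < LOSS_THRESHOLD:    # training-set loss below threshold: stop training
        break
```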
Training is performed using training data. In examples disclosed herein, the training data originates from previously generated user interfaces that include content (e.g., text, graphical element(s), blank or empty portion(s) without text or graphical element(s)) in various positions within the user interface. Because supervised training is used, the training data is labeled. Labeling is applied to the training data by expert human labelers, but in some cases training data is labeled by design (e.g., display screen data may be associated with actual text fields of windows being displayed). In some examples, the training data is pre-processed using, for example, hand-written rules or outlier detection to eliminate undesired system outputs. In some examples, the training data is sub-divided into training, development, and test sets and split into mini-batches.
Once training is complete, the model(s) are deployed for use as an executable construct that processes an input and provides an output based on the network of nodes and connections defined in the model(s). The model(s) are stored at one or more databases (e.g., the databases 430, 446 of
Once trained, the deployed model may be operated in an inference phase to process data. In the inference phase, data to be analyzed (e.g., live data) is input to the model, and the model executes to create an output. This inference phase can be thought of as the AI “thinking” to generate the output based on what it learned from the training (e.g., by executing the model to apply the learned patterns and/or associations to the live data). In some examples, input data undergoes pre-processing before being used as an input to the machine learning model. Moreover, in some examples, the output data may undergo post-processing after it is generated by the AI model to transform the output into a useful result (e.g., a display of data, an instruction to be executed by a machine, etc.).
In some examples, output of the deployed model may be captured and provided as feedback. By analyzing the feedback, an accuracy of the deployed model can be determined. If the feedback indicates that the accuracy of the deployed model is less than a threshold or other criterion, training of an updated model can be triggered using the feedback and an updated training data set, hyperparameters, etc., to generate an updated, deployed model.
Referring to
The example first computing system 416 of
The example first computing system 416 of
In the example of
The first training data 424 (e.g., display frame(s)) is labeled with content in the user interface that should prompt an audio output (e.g., text) and/or a haptic feedback output (e.g., graphical element(s) such as icon(s) or lines defining menu boxes). In some examples, the content is labeled to prompt both an audio output and a haptic feedback output (e.g., an icon that includes text, a border of a window that includes text). The first training data 424 is also labeled with content that should not prompt an audio output and/or a haptic feedback output. For example, blank or empty portions of the user interface(s) that do not include text and/or graphical element(s) may be labeled as content that should not prompt an output (e.g., classified as a “null output”). In some examples, portions or fragments of word(s), phrase(s), and/or spaces between words may be labeled as content that should not prompt an output (e.g., an audio output) to prevent nonsensical output(s) (e.g., audio that does not correspond to an actual word). However, in some other examples, blank or empty portions of the user interface(s) that do not include text and/or graphical element(s) may be labeled as content that is to cause a haptic feedback output to alert a user that the user's touch has moved away from text.
In some examples, the training data can include words such as "file," "save," and "open" in a user interface associated with a word processing application and words such as "boot," "setting," and "configuration" in a user interface associated with the BIOS. In some examples, fragments of words are used in the training data 424 to train the neural network to identify the text in the user interface (e.g., to predict or estimate that the fragment "sav" is most likely the word "save"). In other examples, fragments of words are used in the training data to train the neural network to refrain from attempting to identify a word if the corresponding output would be a nonsensical word (e.g., the phrase "ot set" in "boot settings" could be used to train the neural network to refrain from outputting a predicted word that would not correspond to an actual word).
In examples in which the neural network is trained using end-to-end training, the first training data 424 is also labeled with the audio output (e.g., text-to-speech) and/or haptic feedback output that is to be produced for the predicted content.
The first neural network trainer 420 trains the neural network implemented by the neural network processor 418 using the training data 424. Based on the type of content in the user interface(s) in the training data 424 and associated output response, the first neural network trainer 420 trains the neural network 418 to identify (e.g., predict, estimate, classify, recognize) the type of content in the display region data 202 generated by the display controller 120 and whether the content is to prompt audio and/or haptic feedback output(s). In some examples, the text recognition training is based on optical character recognition. In examples in which end-to-end training is used, the first neural network trainer 420 trains the neural network 418 to identify the content in the display region data 202 generated by the display controller 120 and to generate the corresponding audio and/or haptic feedback output(s).
A content recognition model 428 is generated as a result of the neural network training. The content recognition model 428 is stored in a database 430. The databases 426, 430 may be the same storage device or different storage devices.
The content recognition model 428 is executed by the display region content recognizer 412 of the display region analyzer 126 of
In some examples, the display region content recognizer 412 verifies the content identified in the sampled display region(s) (e.g., text, graphical element(s), blank portion(s)) relative to the location of the touch event as mapped by the touch location mapper 410 and/or based on the touch position data 200. For example, if the sampled display region includes text and a graphical element, multiple items of text (e.g., two words, two sentences, etc.), and/or text proximate to a blank portion, the display region content recognizer 412 uses the mapping of the coordinates of the touch event relative to the sampled display region to identify the content most closely associated with the location of the user's touch (i.e., the content located nearest to the coordinates of the touch event). Thus, the display region content recognizer 412 compares the location of the content identified in the sampled display region to the location of the touch event to improve an accuracy of the identification of the content associated with the touch event and, thus, the accuracy of the output(s).
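The selection of the content item nearest to the touch coordinates can be sketched as follows for illustration; the item format (a label with a bounding-box center) is an assumed representation, not one specified by this disclosure.

```python
# Illustrative nearest-content selection when a sampled region contains more
# than one detected item; the item format is an assumed representation.

import math

def nearest_item(touch_x, touch_y, items):
    """items: list of dicts with a 'label' and bounding-box center 'cx', 'cy'."""
    return min(items, key=lambda it: math.hypot(it["cx"] - touch_x, it["cy"] - touch_y))

detected = [
    {"label": "Save", "cx": 110, "cy": 85},            # text near the touch
    {"label": "window_border", "cx": 300, "cy": 85},   # graphical element farther away
]
print(nearest_item(120, 80, detected)["label"])        # -> "Save"
```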
Based on the neural network analysis of the display region data 202, the display region content recognizer 412 generates predicted content data 427 including content associated with the touch event (e.g., text, graphical element(s), or empty portion(s)) and the corresponding response to be provided (audio output, haptic feedback output, no output). The predicted content data 427 is stored in the database 400.
The display region content recognizer 412 continues to analyze the display region data 202 received from the display controller 120 in response to changes in the location(s) of the touch event(s) occurring on the display screen 104. In some examples, the display region content recognizer 412 analyzes the display region data 202 in substantially real-time as the display region data 202 is received from the display controller 120 to enable the display region analyzer 126 to provide audio and/or haptic feedback output(s) in substantially real-time as the user interacts with the display screen 104 of the user device 102 and the interface(s) displayed thereon.
As disclosed above, in examples in which the content recognition model 428 is generated based on end-to-end training, the display region content recognizer 412 generates audio and/or haptic feedback output(s) as a result of the execution of the content recognition model 428. For example, the display region content recognizer 412 can generate audio or speech sample(s) in response to the detection of text. The example display region analyzer 126 includes an audio controller interface 415. For example, the audio controller interface 415 can be implemented by circuitry that connects the display region analyzer 126 to communication line(s) of the audio controller 122. The audio controller interface 415 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130. The audio controller interface 415 facilitates transmission of the audio sample(s) to the audio controller 122.
In some examples in which the content recognition model 428 is generated using end-to-end training, the display region content recognizer 412 determines, using the content recognition model 428, that a user interface in the display region data 202 includes a graphical element (e.g., a border of a window, an icon, etc.). In such examples, the display region content recognizer 412 determines that a haptic feedback output should be provided (e.g., based on the content recognition model 428). The display region content recognizer 412 generates instructions for the haptic feedback controller 124 to cause the haptic feedback actuator(s) 123 of the user device 102 to generate haptic feedback in response to, for example, detection of text and/or graphical elements (e.g., icon(s)) in the display region data. The haptic feedback controller interface 450 of the display region analyzer 126 provides means for communicating with the haptic feedback controller 124 to cause the actuator(s) 123 (
Thus, as a result of end-to-end neural network training, the display region content recognizer 412 executes the content recognition model 428 to identify content in the sampled display region(s) and to generate corresponding output(s). However, in other examples, the display region analyzer 126 of
The example system 100 includes a second computing system 432 to train a neural network to identify or recognize word(s) and/or phrase(s) in image data representative of user interfaces (e.g., display region data). The example second computing system 432 includes a second neural network processor 434. In examples disclosed herein, the second neural network processor 434 implements a second neural network.
The example second computing system 432 of
The example second computing system 432 of
In the example of
The second neural network trainer 436 trains the neural network implemented by the neural network processor 434 using the training data 440. Based on the words in the training data 440, the second neural network trainer 436 trains the neural network 434 to recognize (e.g., predict, estimate, classify, recognize) the word(s) in the portion of the user interface in the display region data 202 associated with the touch event.
A text recognition model 444 is generated as a result of the neural network training. The text recognition model 444 is stored in a database 446. The databases 442, 446 may be the same storage device or different storage devices.
As discussed above, in examples in which end-to-end training is not used to generate the content recognition model 428, the display region content recognizer 412 executes the content recognition model 428 to identify the type of content in the display frame(s). In such examples, if text is identified by the display region content recognizer 412 as a result of execution of the content recognition model 428, then the text recognizer 414 of the display region analyzer 126 of
The example display region analyzer 126 includes a text-to-speech synthesizer 448 to convert the predicted written text data 447 identified by the text recognizer 414 to phonemic representation(s) and, subsequently, to audio waveforms that are transmitted to the audio controller 122 (e.g., the audio output data 204) via the audio controller interface 415. The text-to-speech synthesizer 448 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130 executing block 816 of the flowchart of
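As one illustration of the text-to-speech step, an off-the-shelf synthesis engine can be driven from the predicted text; pyttsx3 is shown here only as an example backend and is not required by this disclosure.

```python
# Illustrative text-to-audio conversion using the pyttsx3 speech engine as one
# possible backend; the disclosure does not mandate a particular synthesizer.

import pyttsx3

def speak_predicted_text(predicted_text: str) -> None:
    engine = pyttsx3.init()        # selects the platform speech driver
    engine.say(predicted_text)     # text -> phonemic representation -> audio waveform
    engine.runAndWait()            # render the audio through the device speakers

speak_predicted_text("save")
```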
In examples in which end-to-end training is not used, the display region content recognizer 412 may determine, using the content recognition model 428, that a sampled display frame in the display region data 202 includes a graphical element (e.g., a border of a window, an icon, etc.). In such examples, the display region content recognizer 412 determines the haptic feedback to be generated based on haptic feedback rule(s) 452 stored in the database 400. The haptic feedback rule(s) 452 can define settings for the haptic feedback (e.g., vibration type, vibration intensity) based on user input(s). In some examples, the haptic feedback rule(s) 452 are defined by user input(s) or preference(s) defining haptic feedback setting(s) for the user device 102. The haptic feedback controller interface 450 transmits instructions to cause the haptic feedback controller 124 to output haptic feedback via the actuator(s) 123 (
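The haptic feedback rule(s) 452 can be pictured, for illustration only, as a lookup from detected element type to vibration settings; the element names, patterns, and intensities below are assumed, user-configurable values rather than values specified by this disclosure.

```python
# Illustrative haptic feedback rule lookup; the element types, vibration
# patterns, and intensities are assumed, user-configurable values.

HAPTIC_RULES = {
    "window_border": {"pattern": "short_pulse", "intensity": 0.4},
    "icon":          {"pattern": "double_pulse", "intensity": 0.6},
    "menu_line":     {"pattern": "short_pulse", "intensity": 0.3},
}

def haptic_output_for(element_kind: str):
    """Return the vibration settings for a detected graphical element, if any."""
    return HAPTIC_RULES.get(element_kind)   # None -> no haptic feedback

print(haptic_output_for("icon"))
```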
In some examples, the display region content recognizer 412 identifies text and a graphical element in the user interface content at the location of the touch event. For example, an icon representing a user application may include text in addition to a graphical element. As another example, text may be included in a header or border of a window. In such examples, audio and haptic feedback output(s) can be provided based on respective analyses performed by one or more of (a) the display region content recognizer 412 (when end-to-end neural network training is used) and/or (b) the display region content recognizer 412 and the text recognizer 414 (when end-to-end neural network training is not used). In other examples in which the graphical element includes text, the display region content recognizer 412 and/or the text recognizer 414 only generate an audio output indicative of the text in the graphical element and no haptic feedback is generated.
In some examples in which the display region includes a graphical element (e.g., an icon), the display region content recognizer 412 may generate audio representative of the graphical element, including for graphical elements that do not include text. For instance, in response to detection of an icon illustrating a storage disk, the display region content recognizer 412 may determine that audio including the word "save" should be output to inform the user of the presence of this menu option in the display region and to provide more guidance to the user than would be provided by (only) haptic feedback. Thus, in some examples, the neural network can be trained to correlate graphical element(s) with audio output(s).
As disclosed above, in some examples, the display region content recognizer 412 determines that the sampled display region includes data that should not trigger an audio and/or haptic feedback output. Such data can include, for instance, portion(s) of the display region not associated with text or graphical element(s) (e.g., icon(s)). In such examples, if the display region content recognizer 412 determines that the location of the touch event is most closely associated with the blank or empty portion(s) of the user interface, then no outputs are generated for that sampled display region. However, in other examples, blank or empty portion(s) of the user interface(s) can prompt haptic feedback output(s) (depending on the labeled training data 424 and the training of the neural network 418) to alert the user that the location of the user's touch has moved away from text and/or other characters.
In some instances, the display region content recognizer 412 fails to recognize the word(s) in the text despite execution of the content recognition model 428 (i.e., when end-to-end training of the model 428 is used). In such examples, the display region content recognizer 412 refrains from generating predicted text data for the unidentified or unrecognized word. Similarly, in some examples, the text recognizer 414 fails to recognize the word(s) in the text despite execution of the text recognition model 444. In such examples, the text recognizer 414 refrains from generating predicted text data for the unidentified or unrecognized word. Thus, in these examples, an audio output is not generated to avoid an incorrect or nonsensical word from being presented.
Thus, examples disclosed herein support different neural network schemes including (a) end-to-end training in which the display region content recognizer 412 executes the content recognition model 428 to identify the content in the sampled display frame(s) and to generate output(s) or (b) training of multiple neural network models in which text recognition and text-to-speech conversion are performed separately from the identification of the type of content in the sampled display frame(s).
Although some examples disclosed herein identify portion(s) of the sampled display region(s) in the display region data 202 as prompting haptic feedback output(s) using neural network model(s), in other examples, the determination of the haptic feedback output(s) can be based on image feature analysis. For instance, the example display region analyzer 126 can include an image brightness analyzer 454 to identify portion(s) of the sampled display region(s) associated with changes in luminance (e.g., brightness) or luminance contrasts of content in the sampled display region(s). For instance, a border of a window of a word processing application may be associated with a different color than the white color of the word document in the user interface. Based on this luminance contrast between the word document and the window border, the image brightness analyzer 454 determines that a haptic feedback output should be provided when the touch position data 200 indicates that a touch event has occurred proximate to the window border. Thus, in some examples, the haptic feedback output can be determined without use of a neural network and, instead, based on image feature analysis. The image brightness analyzer 454 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130.
The image brightness analyzer 454 can identify portion(s) of the user interface that should be associated with haptic feedback output(s) based on image brightness rule(s) 456 stored in the database 400. The image brightness rule(s) 456 can define differences or degrees of luminance contrasts (e.g., color contrast ratios) and associated haptic feedback output(s). The image brightness rule(s) 456 can be defined based on user input(s). The haptic feedback controller interface 450 transmits instruction(s) from the image brightness analyzer 454 to cause the haptic feedback controller 124 to output haptic feedback via the actuator(s) 123 (
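For illustration, a luminance-contrast check of the type described above can be sketched using a standard relative-luminance formula; the 3.0 trigger threshold is an assumed value rather than one specified by this disclosure.

```python
# Illustrative luminance-contrast check between the pixels under the touch and
# their surroundings; the trigger threshold is an assumed value.

def relative_luminance(r, g, b):
    """Approximate relative luminance of an sRGB color (components 0..255)."""
    def channel(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b)

def contrast_ratio(color_a, color_b):
    la, lb = relative_luminance(*color_a), relative_luminance(*color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)

# Dark window border against the white background of a word document
ratio = contrast_ratio((60, 60, 60), (255, 255, 255))
print(ratio, ratio >= 3.0)   # a sufficiently high ratio would prompt a haptic output here
```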
In some examples, a user interacting with the user device 102 may provide verbal commands that are captured by the microphone(s) 119 of the user device 102 as microphone data 458 in addition to touch input(s). The display region analyzer 126 includes a voice command analyzer 460 to analyze the microphone data 458 received from the microphone(s) 119. The voice command analyzer 460 may be implemented by dedicated hardware circuitry and/or by the microcontroller 130. The voice command analyzer 460 interprets the voice command using speech-to-text analysis and compares the voice command to, for example, text identified by the display region content recognizer 412 and/or the text recognizer 414 in the sampled display region associated with the touch event. If the detected voice command corresponds to the text in the user interface (e.g., within a threshold degree of accuracy), the voice command analyzer 460 confirms that the user wishes to perform the action associated with the voice command and the text in the sampled display region. The voice command analyzer 460 can communicate the verified command to user application(s) associated with the display frame(s). In such examples, the touch event(s) can mimic or replace mouse or keyboard event(s) (e.g., left-click, right-click, escape) in response to the verified voice command.
In some examples, the voice command analyzer 460 cooperates with user application(s) to cause the touch and voice commands to be executed. For example, a user application may perform a certain function in response to verbal and/or touch command(s). The voice command analyzer 460 can verify such known functions and associated command(s) based on the microphone data and touch event data. For example, if the voice command analyzer 460 detects the word "save" in the microphone data 458 and the display region content recognizer 412 or the text recognizer 414 identifies the text as the word "save," then the voice command analyzer 460 confirms that the user wishes to save an item such as a word document. The voice command analyzer 460 can send instructions to the user application with which the user is interacting to confirm that the voice command has been verified. Thus, in examples disclosed herein, the use of voice and touch can improve an accuracy with which users, such as visually impaired users, provide commands to user application(s) installed on the user device 102 and receive expected results.
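The matching of a spoken command against the text recognized at the touch location can be sketched as follows; the string-similarity measure and the 0.8 threshold stand in for the unspecified "threshold degree of accuracy" and are assumptions made only for the sketch.

```python
# Illustrative verification of a spoken command against the recognized text at
# the touch location; the similarity measure and threshold are assumed.

import difflib

def verify_voice_command(spoken: str, recognized_text: str, threshold: float = 0.8) -> bool:
    """Return True when the spoken word sufficiently matches the touched text."""
    similarity = difflib.SequenceMatcher(
        None, spoken.strip().lower(), recognized_text.strip().lower()).ratio()
    return similarity >= threshold

print(verify_voice_command("save", "Save"))      # True -> forward the command to the application
print(verify_voice_command("open", "Settings"))  # False -> do not act on the command
```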
In some examples, one or more of the display region content recognizer 412 and/or the text recognizer 414 provides the outputs of the neural network processing (i.e., the predicted content data 427 and/or the predicted text data 447) to the processor 110 of the user device 102 for further processing by, for instance, the operating system and/or the user applications of the user device 102. The touch position data 200 can also be provided to the processor 110 for use by the operating system and/or user applications of the user device 102 to respond to user interactions with the device 102 in connection with the results of the neural network processing. For example, a user who is motor impaired may not be able to hold his or her hand steady enough to double tap in the same location on the display screen. In examples disclosed herein, the user can keep his or her hand resting on the display screen while the touch position data and/or identified content is provided to the user application. In such examples, the output of the user application and/or operating system can be the same as if the user had performed the double tap function.
While an example manner of implementing the display region analyzer 126 of
While an example manner of implementing the first computing system 416 is illustrated in
While an example manner of implementing the second computing system 432 is illustrated in
A flowchart representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing the example display region sampler 201 of
The machine readable instructions described herein may be stored in one or more of a compressed format, an encrypted format, a fragmented format, a compiled format, an executable format, a packaged format, etc. Machine readable instructions as described herein may be stored as data or a data structure (e.g., portions of instructions, code, representations of code, etc.) that may be utilized to create, manufacture, and/or produce machine executable instructions. For example, the machine readable instructions may be fragmented and stored on one or more storage devices and/or computing devices (e.g., servers) located at the same or different locations of a network or collection of networks (e.g., in the cloud, in edge devices, etc.). The machine readable instructions may require one or more of installation, modification, adaptation, updating, combining, supplementing, configuring, decryption, decompression, unpacking, distribution, reassignment, compilation, etc. in order to make them directly readable, interpretable, and/or executable by a computing device and/or other machine. For example, the machine readable instructions may be stored in multiple parts, which are individually compressed, encrypted, and stored on separate computing devices, wherein the parts when decrypted, decompressed, and combined form a set of executable instructions that implement one or more functions that may together form a program such as that described herein.
In another example, the machine readable instructions may be stored in a state in which they may be read by processor circuitry, but require addition of a library (e.g., a dynamic link library (DLL)), a software development kit (SDK), an application programming interface (API), etc. in order to execute the instructions on a particular computing device or other device. In another example, the machine readable instructions may need to be configured (e.g., settings stored, data input, network addresses recorded, etc.) before the machine readable instructions and/or the corresponding program(s) can be executed in whole or in part. Thus, machine readable media, as used herein, may include machine readable instructions and/or program(s) regardless of the particular format or state of the machine readable instructions and/or program(s) when stored or otherwise at rest or in transit.
The machine readable instructions described herein can be represented by any past, present, or future instruction language, scripting language, programming language, etc. For example, the machine readable instructions may be represented using any of the following languages: C, C++, Java, C#, Perl, Python, JavaScript, HyperText Markup Language (HTML), Structured Query Language (SQL), Swift, etc.
As mentioned above, the example processes of
“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the terms “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, and (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B.
As used herein, singular references (e.g., “a”, “an”, “first”, “second”, etc.) do not exclude a plurality. The term “a” or “an” entity, as used herein, refers to one or more of that entity. The terms “a” (or “an”), “one or more”, and “at least one” can be used interchangeably herein. Furthermore, although individually listed, a plurality of means, elements or method actions may be implemented by, e.g., a single unit or processor. Additionally, although individual features may be included in different examples or claims, these may possibly be combined, and the inclusion in different examples or claims does not imply that a combination of features is not feasible and/or advantageous.
In response to notification of the occurrence of the touch event(s), the display region sampler 201 of the display controller 120 samples or captures a portion of a display frame (e.g., the display frame 300) displayed via the display screen 104 at the time of the touch event to generate the display region data 202 (e.g., image data) (block 504). The portion of the display frame sampled by the display region sampler 201 can include content displayed at the coordinates of the user's touch and surrounding content (e.g., content located within a threshold distance of the touch event location). In some examples, the display frame(s) sampled by the display region sampler 201 include display frame(s) associated with the operating system 112 and/or user applications 113 on the device 102. In other examples, the display frame(s) sampled by the display region sampler 201 include display frame(s) associated with the BIOS 114 of the user device 102.
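As a non-limiting illustration of the sampling at block 504, the following Python sketch crops the pixels within a fixed radius of the touch coordinates, clamped to the frame bounds. The NumPy frame representation, the radius_px parameter, and the sample_display_region name are assumptions made for this sketch only.

```python
import numpy as np

def sample_display_region(frame: np.ndarray, touch_x: int, touch_y: int,
                          radius_px: int = 64) -> np.ndarray:
    """Return the pixels within `radius_px` of the touch location, clamped
    to the frame bounds (an approximation of display region data 202)."""
    height, width = frame.shape[:2]
    left   = max(touch_x - radius_px, 0)
    right  = min(touch_x + radius_px, width)
    top    = max(touch_y - radius_px, 0)
    bottom = min(touch_y + radius_px, height)
    return frame[top:bottom, left:right].copy()

# Usage: crop a neighborhood around a touch at (400, 300) of a placeholder frame.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
region = sample_display_region(frame, touch_x=400, touch_y=300)
```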
The display region sampler 201 transmits the display region data 202 to the display region analyzer 126 via, for example, wired or wireless communication protocols (block 506). If additional touch position data 200 is received from the touch controller 108 and/or additional instruction(s) are received from the display controller manager 403 (e.g., in response to newly detected touch events by the display screen touch sensor(s) 106), the display region sampler 201 continues to sample the display frame(s) associated with the touch event(s) to output display region data 202 for analysis by the display region analyzer 126 (block 508). The example instructions 500 of
The example instructions 600 of
The example training controller 422 labels the display image data (or portions thereof) with content in the display frame(s) that should prompt an audio output (e.g., text) and/or a haptic feedback output (e.g., graphical element(s) such as icon(s) or lines defining menu boxes) (block 603). The first training data 424 is also labeled with content that should not prompt an audio output and/or a haptic feedback output. For example, fragments or portions of word(s) and/or phrase(s) and blank or empty portions of the user interface(s) that do not include text and/or graphical element(s) may be labeled as content that should not prompt an output.
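A minimal sketch of one possible labeling scheme follows, assuming each sampled region is tagged with a content class that maps to the response it should prompt; the class names, file names, and mapping are illustrative assumptions rather than the labeling format of the disclosure.

```python
# Hypothetical labeling scheme for the first training data 424: each sampled
# display region is paired with the type of response it should prompt.
LABELS = {
    "text":     "audio",    # full words/phrases -> audio output
    "graphic":  "haptic",   # icons, menu-box lines -> haptic feedback
    "fragment": "none",     # partial words/phrases -> no output
    "empty":    "none",     # blank regions -> no output
}

training_examples = [
    # (image_region, content_class) pairs; regions would be crops of
    # reference user-interface display frames.
    ("region_0001.png", "text"),
    ("region_0002.png", "graphic"),
    ("region_0003.png", "empty"),
]

labeled = [(img, cls, LABELS[cls]) for img, cls in training_examples]
```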
In examples in which end-to-end neural network training is used (block 604), the first training data 424 is labeled with the output(s) that should be generated (e.g., audio speech sample(s) for text, haptic feedback output(s) for non-text character(s)) (block 605).
The example training controller 422 generates the training data 424 based on the labeled image data (block 606).
The example training controller 422 instructs the neural network trainer 420 to perform training of the neural network 418 using the training data 424 (block 608). In the example of
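The disclosure does not specify a training framework or network architecture; purely for illustration, the following PyTorch-based sketch shows a generic supervised training loop over labeled region crops of the kind described above. The toy convolutional model, class count, and hyperparameters are assumptions, and an analogous loop would apply to the training performed by the second computing system described below (block 708).

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for labeled region images and their content classes
# ("text" = 0, "graphic" = 1, "none" = 2); real training data would be
# crops of reference user interfaces.
images = torch.randn(256, 1, 64, 64)
labels = torch.randint(0, 3, (256,))
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

model = nn.Sequential(               # illustrative stand-in for a content recognition model
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(8),
    nn.Flatten(), nn.Linear(8 * 8 * 8, 3),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):               # train until an acceptable error is reached
    for batch_images, batch_labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(batch_images), batch_labels)
        loss.backward()
        optimizer.step()
```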
The example instructions 700 of
The example training controller 438 labels the word(s) and/or phrase(s) in the listing(s) and/or image data to be used for training purposes (block 704). In some examples, the labeled content includes fragments or portions of words. The example training controller 438 generates the training data 440 based on the content in the labeled image data (block 706).
The example training controller 438 instructs the neural network trainer 436 to perform training of the neural network 434 using the training data 440 (block 708). In the example of
The example instructions 800 of
The touch location mapper 410 maps or synchronizes the touch position data 200 with the display region data 202 (block 806). In some examples, the touch location mapper 410 identifies the location of the touch event relative to a sampled display region in the display region data 202. In some examples, the touch location mapper 410 synchronizes the touch position data stream 200 and the display region data stream 202 using time stamps for each data stream.
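One simple way to synchronize the two data streams by time stamp is sketched below; the map_touch_to_region function, the tuple format, and the skew tolerance are hypothetical choices for this illustration, not the mapping algorithm of the touch location mapper 410.

```python
import bisect

def map_touch_to_region(touch_events, region_samples, max_skew_ms=50):
    """Pair each touch event with the display region sample whose timestamp
    is closest. Both inputs are lists of (timestamp_ms, payload) tuples
    sorted by timestamp."""
    region_times = [t for t, _ in region_samples]
    pairs = []
    for touch_time, touch_payload in touch_events:
        i = bisect.bisect_left(region_times, touch_time)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(region_samples)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(region_times[j] - touch_time))
        if abs(region_times[best] - touch_time) <= max_skew_ms:
            pairs.append((touch_payload, region_samples[best][1]))
    return pairs
```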
The display region content recognizer 412 executes the content recognition model 428 to identify content in the sampled display region associated with the touch event and to determine a corresponding response to be provided (e.g., audio output, haptic feedback output, or no output) (block 808). As a result of the execution of the content recognition model 428, the display region content recognizer 412 generates predicted content data 427 that identifies the content associated with the touch event (e.g., text, graphics, empty portion). In some examples, the display region content recognizer 412 verifies the content most closely associated with the touch event based on the mapping performed by the touch location mapper 410 and/or the touch position data 200 (e.g., in examples where the display region data includes, for instance, text and a graphical element).
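For illustration, a sketch of the inference step follows, assuming a trained classifier of the kind sketched earlier and an assumed mapping from predicted class to response type; the class identifiers and the classify_region function are illustrative only.

```python
import torch

RESPONSE_FOR_CLASS = {0: "audio", 1: "haptic", 2: "none"}   # assumed class ids

def classify_region(model, region_tensor):
    """Run an already trained content classifier on one sampled region and
    map the predicted class to the response that should follow (audio for
    text, haptic feedback for graphics, no output otherwise)."""
    model.eval()
    with torch.no_grad():
        logits = model(region_tensor.unsqueeze(0))      # add batch dimension
        predicted_class = int(logits.argmax(dim=1))
    return predicted_class, RESPONSE_FOR_CLASS[predicted_class]
```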
In the example of
If the display region content recognizer 412 is able to identify the word(s) in the identified text (block 814), the display region content recognizer 412 generates audio speech sample(s) including the word(s) as a result of execution of the content recognition model 428 (generated via end-to-end neural network training) (block 816). The audio controller interface 415 transmits the audio sample(s) to the audio controller 122 for output via the speaker(s) 121 of the device 102 (block 816).
Alternatively, if the text recognizer 414 is able to identify the word(s) (block 814), the text recognizer 414 generates the predicted text data 447 including the predicted word(s). The predicted text data 447 is used by the text-to-speech synthesizer 448 to convert the text to audio waveforms that are transmitted to the audio controller 122 for output via the speaker(s) 121 of the device 102 (block 816).
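The two audio paths described above (end-to-end speech generation versus text recognition followed by text-to-speech) can be summarized by the following hedged sketch, in which end_to_end_model, ocr, and tts are placeholder callables standing in for the content recognition model 428, the text recognizer 414, and the text-to-speech synthesizer 448, respectively; none of these names are APIs from the disclosure.

```python
def text_region_to_audio(region, end_to_end_model=None, ocr=None, tts=None):
    """Produce audio for a region classified as text, using either an
    end-to-end model that emits speech samples directly or a two-stage
    path (text recognition followed by text-to-speech)."""
    if end_to_end_model is not None:
        return end_to_end_model(region)          # speech samples directly
    if ocr is None or tts is None:
        raise ValueError("provide either end_to_end_model or both ocr and tts")
    words = ocr(region)                          # predicted text
    if not words:
        return None                              # nothing recognizable
    return tts(words)                            # waveform for the speaker(s)
```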
In the example of
In the example of
In some examples, the haptic feedback analysis at blocks 818 and 820 is performed by the image brightness analyzer 454 of the device 102. In such examples, the image brightness analyzer 454 analyzes properties of the user interface image data to identify changes in, for instance, luminance, which can serve as indicators of changes between user application windows, menus, etc.
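A minimal sketch of a luminance-based change check follows; the Rec. 601 luma weights and the threshold value are illustrative assumptions rather than the specific analysis performed by the image brightness analyzer 454.

```python
import numpy as np

def luminance_change(prev_region: np.ndarray, curr_region: np.ndarray,
                     threshold: float = 12.0) -> bool:
    """Flag a salient change between two sampled regions (e.g., the touch
    moving from a window body onto a menu border) by comparing mean luminance."""
    def mean_luma(rgb: np.ndarray) -> float:
        return float(np.mean(0.299 * rgb[..., 0] +
                             0.587 * rgb[..., 1] +
                             0.114 * rgb[..., 2]))
    return abs(mean_luma(curr_region) - mean_luma(prev_region)) > threshold
```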
In the example of
The display region analyzer 126 continues to analyze the display region data 202 as the data is received from the display controller 120 (e.g., in response to new touch event(s) detected by the touch controller and/or as part of periodic sampling of the user interface(s) presented via the display screen 104). Thus, the display region analyzer 126 can provide audio and/or haptic feedback output(s) that track user touch event(s) on the display screen relative to the displayed content. The example instructions 800 of
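Tying the pieces together, the following sketch shows a high-level event loop of the kind implied by the instructions 800, with all stages supplied as placeholder callables; it is a conceptual outline under the assumptions noted above, not the disclosed implementation.

```python
def accessibility_loop(next_touch, sample_region, classify, emit_audio, emit_haptic):
    """For each reported touch event, sample the surrounding display region,
    classify its content, and trigger the matching output. All callables are
    illustrative placeholders."""
    while True:
        touch = next_touch()                     # blocks until a touch event
        if touch is None:
            break                                # no more events
        region = sample_region(touch)
        kind, response = classify(region)
        if response == "audio":
            emit_audio(region)
        elif response == "haptic":
            emit_haptic()
        # "none": a finger resting on empty space produces no output
```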
The processor platform 900 of the illustrated example includes a processor 912. The processor 912 of the illustrated example is hardware. For example, the processor 912 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor implements the example neural network processor 418, the example trainer 420, and the example training controller 422.
The processor 912 of the illustrated example includes a local memory 913 (e.g., a cache). The processor 912 of the illustrated example is in communication with a main memory including a volatile memory 914 and a non-volatile memory 916 via a bus 918. The volatile memory 914 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 916 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 914, 916 is controlled by a memory controller.
The processor platform 900 of the illustrated example also includes an interface circuit 920. The interface circuit 920 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.
In the illustrated example, one or more input devices 922 are connected to the interface circuit 920. The input device(s) 922 permit(s) a user to enter data and/or commands into the processor 912. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.
One or more output devices 924 are also connected to the interface circuit 920 of the illustrated example. The output devices 924 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker. The interface circuit 920 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.
The interface circuit 920 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 926. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.
The processor platform 900 of the illustrated example also includes one or more mass storage devices 928 for storing software and/or data. Examples of such mass storage devices 928 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.
The machine executable instructions 932 of
The processor platform 1000 of the illustrated example includes a processor 1012. The processor 1012 of the illustrated example is hardware. For example, the processor 1012 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor implements the example neural network processor 434, the example trainer 436, and the example training controller 438.
The processor 1012 of the illustrated example includes a local memory 1013 (e.g., a cache). The processor 1012 of the illustrated example is in communication with a main memory including a volatile memory 1014 and a non-volatile memory 1016 via a bus 1018. The volatile memory 1014 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 1016 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 1014, 1016 is controlled by a memory controller.
The processor platform 1000 of the illustrated example also includes an interface circuit 1020. The interface circuit 1020 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.
In the illustrated example, one or more input devices 1022 are connected to the interface circuit 1020. The input device(s) 1022 permit(s) a user to enter data and/or commands into the processor 1012. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.
One or more output devices 1024 are also connected to the interface circuit 1020 of the illustrated example. The output devices 1024 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker. The interface circuit 1020 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.
The interface circuit 1020 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 1026. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.
The processor platform 1000 of the illustrated example also includes one or more mass storage devices 1028 for storing software and/or data. Examples of such mass storage devices 1028 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.
The machine executable instructions 1032 of
The processor platform 1100 of the illustrated example includes the system-on-chip (SoC) 128. In this example, the SoC 128 includes logic circuitry (e.g., an integrated circuit) encapsulated in a package such as a plastic housing. As disclosed herein, the SoC 128 implements the example display region analyzer 126 and the neural network accelerator 132. An example implementation of the SoC 128 is shown in
The processor platform 1100 of the illustrated example includes the processor 110. The processor 110 of the illustrated example is hardware (e.g., an integrated circuit). For example, the processor 110 can be implemented by one or more integrated circuits, logic circuits, central processing units, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In the example of
The processor 110 of the illustrated example includes a local memory 1113 (e.g., a cache). The processor 110 of the illustrated example is in communication with a main memory including a volatile memory 1114 and a non-volatile memory 1116 via the bus 118. The volatile memory 1114 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 1116 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 1114, 1116 may be controlled by a memory controller.
The processor platform 1100 of the illustrated example also includes an interface circuit 1120. The interface circuit 1120 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.
In the illustrated example, one or more input devices 1122 are connected to the interface circuit 1120. The input device(s) 1122 permit(s) a user to enter data and/or commands into the processor 110. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.
One or more output devices 1124 are also connected to the interface circuit 1120 of the illustrated example. The output devices 1124 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube display (CRT), an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker. The interface circuit 1120 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor.
The interface circuit 1120 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 1126. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc.
The processor platform 1100 of the illustrated example also includes one or more mass storage devices 1128 for storing software and/or data. Examples of such mass storage devices 1128 include floppy disk drives, hard drive disks, compact disk drives, Blu-ray disk drives, redundant array of independent disks (RAID) systems, and digital versatile disk (DVD) drives.
Machine executable instructions 1132 corresponding to the BIOS 114, the operating system 112, the user application(s) 113, and/or some or all of the instructions of
The SoC 128 includes the neural network accelerator 132. The neural network accelerator 132 is implemented by one or more integrated circuits, logic circuits, microprocessors, or controllers from any desired family or manufacturer. In this example, the neural network accelerator 132 executes the example display region content recognizer 412.
The SoC 128 of the illustrated example includes the processor 130. The processor 130 of the illustrated example is hardware. For example, the processor 130 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor based (e.g., silicon based) device. In this example, the processor 130 is implemented by a microcontroller. In this example, the microcontroller 130 implements the example touch controller interface 401, the example display controller interface 405, the example neural network accelerator interface 407, the example display controller manager 403, the example text-to-speech synthesizer 448, the example audio controller interface 415, the example haptic feedback controller interface 450, the example image brightness analyzer 454, and the example voice command analyzer 460. In this example, the microcontroller 130 executes the instructions of
The processor 130 of the illustrated example includes a local memory 1213 (e.g., a cache). The processor 130 of the illustrated example is in communication with a main memory including a volatile memory 1214 and a non-volatile memory 1216 via a local bus 1218. The volatile memory 1214 may be implemented by Synchronous Dynamic Random Access Memory (SDRAM), Dynamic Random Access Memory (DRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 1216 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 1214, 1216 is controlled by a memory controller.
The example processor platform of
The machine executable instructions 1232 of
The SoC 128 of
A block diagram illustrating an example software distribution platform 1305 to distribute software such as the example computer readable instructions 800 of
The example software distribution platform 1305 may be implemented by any computer server, data facility, cloud service, etc., capable of storing and transmitting software to other computing devices. The third parties may be customers of the entity owning and/or operating the software distribution platform. For example, the entity that owns and/or operates the software distribution platform may be a developer, a seller, and/or a licensor of software such as the example computer readable instructions 800 of
From the foregoing, it will be appreciated that example methods, systems, apparatus, and articles of manufacture have been disclosed that provide for enhanced user accessibility of an electronic user device for a visually, neurologically, and/or motor impaired user interacting with the device. Examples disclosed herein dynamically respond to touch events on a display screen of the device by generating image data corresponding to a portion of a display frame (e.g., graphical user interface) displayed on the display screen associated with the touch event. Examples disclosed herein execute neural network model(s) to identify content in the portion of the user interface associated with the touch event and to determine a response to be provided by the user device. Some examples disclosed herein generate audio outputs in response to recognition of text in the display frame to provide the user with an audio stream of words and/or phrases displayed on the screen as the user moves his or her finger relative to the screen. Additionally or alternatively, examples disclosed herein can provide haptic outputs that provide the user with feedback when, for instance, the user touch event is proximate to a graphical element such as an icon or line of a menu box. Thus, examples disclosed herein provide a visually impaired user with increased awareness of the content displayed on the screen in response to touches on the display screen. Moreover, examples disclosed herein can be implemented independently of an operating system of the device via, for instance, a system-on-chip architecture. As a result, examples disclosed herein can provide user accessibility features in connection with different user applications, operating systems, and/or computing environments such as BIOS mode.
Example methods, apparatus, systems, and articles of manufacture to provide accessible user interfaces are disclosed herein. Further examples and combinations thereof include the following:
Example 1 includes an apparatus including a display region analyzer to identify one or more of text or graphics in display frame image data, the display frame image data corresponding to a portion of a display frame associated with a touch event on a display screen of an electronic device; an audio controller interface to transmit, in response to the identification of the text in the display frame image data, an instruction including audio output corresponding to the text to be output by the electronic device; and a haptic feedback controller interface to transmit, in response to the identification of the graphics in the display frame image data, an instruction including a haptic feedback response to be output by the electronic device.
Example 2 includes the apparatus of example 1, wherein the display region analyzer is to execute a neural network model to identify the one or more of the text or graphics.
Example 3 includes the apparatus of examples 1 or 2, further including a display controller manager to cause a display controller to generate the display frame image data in response to the touch event.
Example 4 includes the apparatus of examples 1 or 2, wherein a size of the portion of the display frame is to be based on an amount of force associated with the touch event.
Example 5 includes the apparatus of example 1, wherein the display region analyzer is to determine one or more of the audio output or the haptic feedback response in response to execution of a neural network model.
Example 6 includes the apparatus of example 1, wherein the display region analyzer is to execute a first neural network to identify content in the display frame image data as including the text, and further including a text recognizer to execute a second neural network model to recognize the text and determine the audio output corresponding to the recognized text.
Example 7 includes the apparatus of examples 1 or 2, wherein the display region analyzer is to identify the text and the graphics in the display frame image data, the audio controller interface is to transmit the instruction including the audio output to be output and the haptic feedback control interface is to transmit the instruction including the haptic feedback response to be output in response to the identification of the text and the graphics.
Example 8 includes an electronic user device including a display screen; one or more sensors associated with the display screen; at least one processor to generate touch position data indicative of a position of a touch event on the display screen in response to one or more signals from the one or more sensors and sample a portion of a display frame in response to the touch event to generate display region data, the display frame to be displayed via the display screen; a system-on-chip to: operate independently of an operating system; determine content in the display region data, the content including one or more of text or a non-textual character; and generate one or more of an audio response or a haptic feedback response based on the content; an audio controller to transmit the audio response to a first output device; and a haptic feedback controller to transmit the haptic feedback response to a second output device.
Example 9 includes the electronic user device of example 8, wherein the at least one processor includes a touch controller and a display controller.
Example 10 includes the electronic user device of example 8, wherein the at least one processor is to sample the portion of the display frame based on the touch position data.
Example 11 includes the electronic user device of any of example 8-10, wherein the system-on-chip is to execute one or more neural network models to analyze the content in the display region data.
Example 12 includes the electronic user device of any of examples 8-10, wherein the content includes text and the system-on-chip is to generate a word corresponding to the text, the audio response including the word.
Example 13 includes the electronic user device of any of examples 8-10, wherein the content includes the non-textual character and the system-on-chip is to identify the non-textual character based on a change in luminance in the portion of the display frame.
Example 14 includes the electronic user device of any of examples 8-10, wherein a size of the portion of the display frame sampled by the at least one processor is to be based on an amount of force associated with the touch event.
Example 15 includes the electronic user device of any of examples 8-10, wherein the portion of the display frame is a first portion, the touch event is a first touch event, the display region data is first display region data, and the at least one processor is to generate second touch position data indicative of a position of a second touch event on the display screen, the position of the second touch event different than the position of the first touch event; and sample a second portion of the display frame in response to the second touch event to generate second display region data. The system-on-chip is to identify content in the second display region data and generate one or more of the audio response or the haptic feedback response based on the identified content in the second display region data.
Example 16 includes a system including means for displaying a display frame; means for detecting a location of a touch event on the means for displaying; means for sampling a portion of the display frame based on the location of the touch event; means for outputting audio; and means for identifying to execute a neural network model to recognize content in the portion of the display frame, the content including text; and in response to a recognition of the text in the portion of the display frame, cause output of an audio response via the audio output means.
Example 17 includes the system of example 16, wherein the content includes graphics and further including means for generating haptic feedback, the means for identifying to, in response to a recognition of the graphics in the portion of the display frame, cause output of a haptic feedback response via the haptic feedback generating means.
Example 18 includes the system of examples 16 or 17, wherein the sampling means is to sample the portion of the display frame in response to an instruction from the detecting means.
Example 19 includes the system of examples 16 or 17, wherein the sampling means is to sample the portion of the display frame in response to an instruction from the identifying means.
Example 20 includes the system of example 16, wherein the neural network model includes a first neural network model and a second neural network model, the identifying means to execute the first neural network model to identify the content as including the text, execute the second neural network model to generate an estimate of the text, and determine the audio response corresponding to the estimated text.
Example 21 includes the system of example 16, wherein the neural network model is to be generated using end-to-end training and the identifying means is to execute the neural network model to determine the audio response based on the text.
Example 22 includes the system of example 16, wherein the detecting means is to determine a force associated with the touch event and the sampling means is to sample the portion of the display frame having a size based on the force.
Example 23 includes at least one storage device comprising instructions that, when executed, cause a system-on-chip to at least recognize one or more of text or graphics in a region of a display frame, the region associated with a touch event on a display screen of an electronic device; cause, in response to the recognition of the text in the region, an audio output corresponding to the text to be output by the electronic device; and cause, in response to the recognition of the graphics in the region, a haptic feedback response to be output by the electronic device.
Example 24 includes the at least one storage device of example 23, wherein the instructions, when executed, cause the system-on-chip to cause a display controller to sample the display frame to generate display frame image data in response to the touch event, the display frame image data including the region.
Example 25 includes the at least one storage device of examples 23 or 24, wherein a size of the region is to be based on an amount of force associated with the touch event.
Example 26 includes the at least one storage device of examples 23 or 24, wherein the instructions, when executed, cause the system-on-chip to execute a neural network model to determine the audio output corresponding to the text.
Example 27 includes the at least one storage device of example 23, wherein the instructions, when executed, cause the system-on-chip to execute a first neural network model to identify content in the region as including the text and execute a second neural network model to recognize the text and determine the audio output corresponding to the recognized text.
Example 28 includes the at least one storage device of examples 23 or 24, wherein the instructions, when executed, cause the system-on-chip to recognize the text and the graphics in the region and cause the audio output and the haptic feedback response to be output.
Example 29 includes a method including identifying, by executing an instruction with at least one processor of a system-on-chip, one or more of text or graphics in display frame image data, the display frame image data corresponding to a portion of a display frame associated with a touch event on a display screen of an electronic device; causing, in response to identification of the text in the display frame image data and by executing an instruction with the at least one processor, an audio output corresponding to the text to be output by the electronic device; and causing, in response to the identification of the graphics in the display frame image data and by executing an instruction with the at least one processor, a haptic feedback response to be output by the electronic device.
Example 30 includes the method of example 29, further including causing a display controller to generate the display frame image data in response to the touch event.
Example 31 includes the method of examples 29 or 30, wherein a size of the portion is to be based on an amount of force associated with the touch event.
Example 32 includes the method of examples 29 or 30, further including executing a neural network model to identify the text.
Example 33 includes the method of example 29, further including executing a first neural network model to identify content in the display frame image data as including the text and executing a second neural network model to recognize the text and determine the audio output corresponding to the recognized text.
Example 34 includes the method of example 29, further including identifying the text and the graphics in the display frame image data and causing the audio output and the haptic feedback response to be output in response to the identification of the text and the graphics.
Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.
The following claims are hereby incorporated into this Detailed Description by this reference, with each claim standing on its own as a separate embodiment of the present disclosure.
This patent claims priority under 35 U.S.C. § 119 to U.S. Provisional Patent Application No. 63/105,021, filed on Oct. 23, 2020, and to U.S. Provisional Patent Application No. 63/105,025, filed on Oct. 23, 2020. U.S. Provisional Patent Application No. 63/105,021 and U.S. Provisional Patent Application No. 63/105,025 are hereby incorporated by reference in their entireties. Priority to U.S. Provisional Patent Application No. 63/105,021 and U.S. Provisional Patent Application No. 63/105,025 is hereby claimed.