The present specification generally relates to vision-assist devices, and more particularly, to vision-assist devices that provide auditory qualifying information regarding detected objects in an environment.
Blind or visually impaired persons have difficulty navigating within their environment because of their inability to detect the location and type of objects within the environment. Blind or visually impaired persons often use a cane to assist them in navigating a space. Although computer-based vision systems are able to detect objects present within image data, such vision systems often misidentify the type of object that is detected. If the computer-based vision system leads a person to believe that an object is one type of object, but the object turns out to be something different, the person may no longer trust the information provided by the computer-based vision system. Therefore, such computer-based vision systems may be ineffective in assisting the blind and visually impaired because of a lack of trust in the information provided with respect to detected objects.
Accordingly, a need exists for alternative vision-assist devices for blind or visually impaired persons.
In one embodiment, a vision-assist device includes at least one image sensor for generating image data corresponding to a scene, a processor, and an auditory device. The processor is programmed to receive the image data from the at least one image sensor, perform object recognition on the image data to determine a classification of a detected object that is present within the scene, and determine a confidence value with respect to the classification of the detected object. The confidence value is based on a confidence that the classification of the detected object matches an actual classification of the detected object. The processor is further programmed to generate an auditory signal based on the confidence value. The auditory device receives the auditory signal from the processor and produces an auditory message from the auditory signal. The auditory message is indicative of the classification of the detected object and the confidence value.
In another embodiment, a method of detecting a classification of an object includes receiving image data of a scene from at least one image sensor, determining, by a processor and from the image data, a classification of a detected object that is present within the scene, and determining a confidence value with respect to the classification of the detected object. The confidence value is based on a confidence that the classification of the detected object matches an actual classification of the detected object. The method further includes producing an auditory message that is indicative of the classification of the detected object and the confidence value.
These and additional features provided by the embodiments described herein will be more fully understood in view of the following detailed description, in conjunction with the drawings.
The embodiments set forth in the drawings are illustrative and exemplary in nature and not intended to limit the subject matter defined by the claims. The following description of the illustrative embodiments can be understood when read in conjunction with the following drawings, where like structure is indicated with like reference numerals and in which:
Referring generally to the figures, embodiments of the present disclosure are directed to vision-assist devices for assisting blind or visually impaired individuals in navigating their environment. Generally, embodiments described herein may be configured as devices that capture image data of the user's environment using one or more image sensors (e.g., one or more cameras), and perform object recognition analysis to detect objects or people within the user's environment. Such information may be useful to the blind or visually impaired individual as he or she navigates the environment. Because object recognition analysis may not always yield 100% accurate results, the embodiments described herein produce an auditory message that informs the user not only of the type of object detected by the device, but also of the degree of confidence that the type or classification of object determined by the device matches the type or classification of the object physically present within the environment. In this manner, the user may decide for herself whether or not to trust or otherwise accept the auditory information provided by the vision-assist device.
As a non-limiting example, the vision-assist device may detect that a staircase is in front of the user, and may therefore produce an auditory message that says “I am 60% sure that a staircase is directly in front of you.” As described in more detail below, the auditory message may also provide the object recognition information in more general terms of degree, such as: “I am fairly certain that a staircase is directly in front of you,” or “I think that the staircase is in front of you, but I am not sure.” The user may then decide to investigate the object further, disregard the auditory message, or take any other action.
Various embodiments of vision-assist devices and methods of detecting the classification of an object are described in detail herein.
Referring now to the drawings, an example vision-assist device 100 is schematically illustrated. The vision-assist device 100 generally includes one or more processors 110, a memory component 140, one or more image sensors 130, one or more auditory devices 150, and one or more user input devices 160.
The memory component 140 may be configured as volatile and/or nonvolatile non-transitory computer readable medium and, as such, may include random access memory (including SRAM, DRAM, and/or other types of random access memory), flash memory, registers, compact discs (CD), digital versatile discs (DVD), magnetic disks, and/or other types of storage components. Additionally, the memory component 140 may be configured to store, among other things, operation logic, object recognition logic, and auditory message generation logic, as described in more detail below. The memory component 140 may also store data, such as image data captured by the one or more image sensors or externally acquired image data, for performing the object recognition analysis described hereinbelow.
A local interface 120 is also included and may be implemented as a bus or other interface to facilitate communication among the components of the vision-assist device 100.
The one or more processors 110 may include any processing component configured to receive information and execute instructions (such as from the memory component 140).
The one or more image sensors 130 are configured to capture image data of the environment (i.e., the scene) in which the vision-assist device 100 operates. The image data digitally represents the scene in which the vision-assist device 100 operates, such as objects and people within the scene. The image sensor 130 may be configured as any sensor operable to capture image data, such as, without limitation, a charge-coupled device (CCD) image sensor or a complementary metal-oxide-semiconductor (CMOS) sensor capable of detecting optical radiation having wavelengths in the visual spectrum. The one or more image sensors 130 may also be configured to detect optical radiation in wavelengths outside of the visual spectrum, such as wavelengths within the infrared spectrum. In some embodiments, two image sensors 130 are provided to create stereo image data capable of capturing depth information.
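By way of a non-limiting software illustration, the following Python sketch shows one way a stereo pair of frames might be captured from two image sensors using the OpenCV library; the library choice and the camera indices are assumptions made for illustration only and do not form part of the described device.

```python
import cv2

# Indices 0 and 1 are assumptions for the left and right image sensors; actual
# device enumeration depends on the platform and how the sensors are attached.
left_cam = cv2.VideoCapture(0)
right_cam = cv2.VideoCapture(1)

ok_left, left_frame = left_cam.read()
ok_right, right_frame = right_cam.read()

if ok_left and ok_right:
    # The pair of frames digitally represents the scene and can be handed to the
    # object recognition stage; a stereo pair also allows depth estimation.
    print("captured stereo frames:", left_frame.shape, right_frame.shape)

left_cam.release()
right_cam.release()
```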
The one or more auditory devices 150 may be configured as speakers capable of receiving auditory signals from the processor 110 (either directly or indirectly from other hardware, such as amplifiers, drivers, digital-to-analog converters, and the like) to produce auditory messages capable of being heard by the user. In some embodiments, the one or more auditory devices 150 include a first speaker and a second speaker so that the auditory message is provided to the user in stereo.
The one or more user input devices 160 are provided for the user to communicate with the vision-assist device 100. The one or more user input devices 160 may be used by the user to complete tasks such as programming preferences or settings, providing commands, and providing feedback to the vision-assist device 100. The one or more user input devices 160 may take on any appropriate form, such as, without limitation, a keyboard, buttons, switches, touch-sensitive pads, or microphones.
It should be understood that the vision-assist device 100 may include additional components not illustrated in the figures.
Referring now to the drawings, an example vision-assist device 100 that is configured to be worn by a user 170 is illustrated. The components described above are disposed within or on a housing 180.
In some embodiments, the housing 180 is made from a pliable material, such as, without limitation, ethylene-vinyl acetate. In other embodiments, the housing 180 is made from a rigid material.
Referring specifically to the illustrated embodiment, the vision-assist device 100 includes first and second image sensors 130A, 130B and first and second audio devices 150A, 150B.
The first and second image sensors 130A, 130B are configured to capture image data of the scene as the user navigates the environment and to produce three-dimensional images that are used by the object recognition algorithm(s) to detect objects and people, as described in detail below.
The first and second audio devices 150A, 150B produce auditory messages that are intended to be received by the user 170. The auditory messages may provide menu navigation options to the user so that the user may program or otherwise set parameters of the vision-assist device 100. Auditory messages also include environmental information about the scene, as described in detail below. Although two audio devices are illustrated, more or fewer audio devices may be provided. In some embodiments, a microphone is also provided as a user input device to enable voice control of the vision-assist device 100. In this manner, the user may provide feedback to the vision-assist device 100 using voice commands. As a non-limiting example, the first and/or second audio device 150A, 150B may be configured as a combination speaker/microphone device capable of both receiving voice commands and emitting auditory messages/sounds.
Referring now to the drawings, another example vision-assist device 200 is illustrated. The vision-assist device 200 includes a housing 280 configured as an eyeglass frame to be worn by the user.
The illustrated vision-assist device 200 further includes an earpiece 290 configured to be worn around the ear of a user. The earpiece includes an audio device 250 that is inserted into the user's ear and produces the auditory messages described herein. The example earpiece 290 further includes a microphone 260 as a user input device for inputting information into the vision-assist device 200 (i.e., voice controls). Accordingly, the earpiece 290 acts as an input/output device for the vision-assist device 200. As shown by symbol 295, the earpiece 290 may be in wireless communication with the components (e.g., the processor) within the housing 280. In other embodiments, the earpiece 290 is integrated into the eyeglass frame housing 280.
Operation of a vision-assist device 100 will now be described. Generally, image data of the scene is received from the one or more image sensors 130, and one or more object recognition algorithms are applied to the image data to detect objects that are present within the scene.
Any known or yet-to-be-developed object recognition algorithms may be utilized to detect objects within the image data representing the environment. Example object recognition algorithms include, but are not limited to, edge detection algorithms, corner detection algorithms, blob detection algorithms, and feature description algorithms (e.g., scale-invariant feature transform (“SIFT”), speeded up robust features (“SURF”), gradient location and orientation histogram (“GLOH”), and the like). It should be understood that the phrase “object recognition algorithm” as used herein also includes facial recognition algorithms used to detect people present within image data.
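As a non-limiting software illustration of a feature-based recognition step, the following Python sketch matches feature descriptors between a stored reference image of a known object class and a captured scene image using OpenCV's ORB detector, used here only as a stand-in for the SIFT/SURF/GLOH-style algorithms named above; the file names and the use of the match ratio as a crude confidence proxy are illustrative assumptions.

```python
import cv2

# File names are hypothetical placeholders for captured image data and a stored
# reference image of a known object class.
scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
reference = cv2.imread("trashcan_reference.jpg", cv2.IMREAD_GRAYSCALE)

# ORB is a freely available feature detector/descriptor, used here as a stand-in
# for SIFT/SURF/GLOH-style feature description.
orb = cv2.ORB_create()
kp_ref, des_ref = orb.detectAndCompute(reference, None)
kp_scene, des_scene = orb.detectAndCompute(scene, None)

# Match binary ORB descriptors between the reference image and the scene.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des_ref, des_scene)

# The fraction of reference features that found a match serves as a crude
# confidence proxy that the reference object appears in the scene.
confidence = len(matches) / max(len(kp_ref), 1)
print(f"match-based confidence: {confidence:.0%}")
```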
At block 430, the classification of the object is determined as a result of the object recognition process. The classification represents the type of object that is detected. For example, the image data representing the environment 300 may include a detected object 301A, and the object recognition process determines the type of the detected object 301A (e.g., a trashcan, a staircase, a person, or the like).
Object recognition algorithms are not one hundred percent accurate and may misclassify objects for a variety of reasons. Non-limiting reasons that objects may be classified incorrectly include low ambient light, errors in the image data, the pose of the object in the environment, similarity in shape between different types of objects, and unfamiliarity with the object. Object recognition algorithms of the present disclosure produce a confidence value with respect to the objects that are detected from the image data (block 440). The confidence value represents a degree of confidence regarding whether or not the actual object in the physical environment is the type of object determined from the image data. The confidence value may range from a minimum value (i.e., lowest confidence) to a maximum value (i.e., highest confidence). Low confidence values are produced when the object recognition algorithm cannot determine the proper classification of the detected object with high confidence. Conversely, when the object recognition algorithm is sure of the classification of the detected object, it produces a high confidence value. In some embodiments, the confidence value is a percentage ranging from 0% as the minimum value to 100% as the maximum value. In other embodiments, the confidence value may not be in the form of a percentage, but rather a number on a predetermined scale (e.g., 0 to 1.0, or 0 to 20).
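As a non-limiting illustration, the confidence value may be carried alongside the classification in a simple data structure and converted between scales; the following Python sketch assumes a raw score on a 0.0 to 1.0 scale and converts it to the percentage form described above.

```python
from dataclasses import dataclass

@dataclass
class DetectedObject:
    classification: str  # e.g., "trashcan" or "staircase"
    confidence: float    # raw score from 0.0 (minimum) to 1.0 (maximum)

def confidence_percentage(detection: DetectedObject) -> int:
    # Convert the 0.0-1.0 scale to the 0%-100% form used in the auditory messages.
    return round(detection.confidence * 100)

detection = DetectedObject(classification="trashcan", confidence=0.4)
print(confidence_percentage(detection))  # 40
```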
A user of the vision-assist device 100 needs to be able to trust the information it provides. If the vision-assist device 100 repeatedly tells the user that objects are something different from the actual objects in the environment, the user may no longer wish to use the vision-assist device 100. Embodiments of the present disclosure convert the confidence value produced by the object recognition algorithm(s) into an auditory message capable of being heard and understood by the blind or visually impaired user. The auditory message provides a degree of certainty as to whether or not the classification of the detected object is actually the type of object physically present in the environment. By providing the user with auditory information in such a qualifying manner, the user may decide whether or not to trust each piece of environmental information, may maintain trust in the vision-assist device 100 as a whole, and may continue to use it.
At block 450, the processor 110 generates an auditory signal that includes at least the classification of the detected object as well as a representation of the confidence value for the detected object. An auditory signal, as used herein, is a digital or analog signal produced by the processor (alone or in conjunction with additional circuits or integrated circuits) that is provided to the one or more audio devices 150 (block 460) and represents the auditory message that is emitted by the one or more audio devices 150.
In one embodiment, the auditory signal produced by the processor 110 may be generated by any known or yet-to-be-developed computer speech synthesis process. In some embodiments, the auditory signal produces an auditory message that is conversational in nature (i.e., in a complete sentence). For example, the auditory signal provided to the audio device 150 may state: “I am 40% sure that there is a trashcan in front of you.” Accordingly, the auditory message includes both the classification of the object (e.g., “trashcan”) and an auditory representation of the confidence value (e.g., “40% sure”).
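By way of a non-limiting software illustration, the following Python sketch composes such a conversational message and emits it through a text-to-speech engine; the pyttsx3 package is an arbitrary off-the-shelf choice and is not the speech synthesis process employed by the vision-assist device 100.

```python
import pyttsx3  # off-the-shelf text-to-speech package, used here only for illustration

def build_auditory_message(classification: str, confidence_pct: int) -> str:
    # The message contains both the classification and the confidence value.
    return f"I am {confidence_pct}% sure that there is a {classification} in front of you."

message = build_auditory_message("trashcan", 40)

engine = pyttsx3.init()   # initialize the speech synthesizer
engine.say(message)       # queue the auditory message
engine.runAndWait()       # emit the message through the audio device
```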
Other types of auditory messages produced by the audio device 150 at block 460 do not recite an actual percentage of confidence but rather qualify the classification of the object detected by the vision-assist device 100 in other ways. As a non-limiting example, the auditory messages may qualify the classification by use of words and phrases such as “not sure,” “I think,” “pretty sure,” “fairly certain,” “positive,” “absolutely sure,” and the like. For example, the auditory message may state “I think that there is a trashcan in front of you but I am not sure” when the confidence value is very low, and “I am absolutely sure that there is a trashcan in front of you” when the confidence value is very high. With this type of qualifying information, the user of the vision-assist device 100 may decide whether or not to trust the information, and may use it to determine how he or she may respond (e.g., disregard the information, get a closer look, etc.).
A non-limiting method of selecting conversational qualifying phrases for the confidence of the auditory message will now be described. In this method, the range of possible confidence values, from the minimum value to the maximum value, is divided into a plurality of sub-increments 502A-502E.
Each sub-increment 502A-502E has one or more auditory descriptions associated therewith. The auditory descriptions are the qualifying statements regarding the confidence value. Auditory descriptions in sub-increment 502A represent less of a degree of certainty with respect to the classification of the detected object than the auditory descriptions in sub-increment 502E. As an example and not a limitation, auditory descriptions associated with sub-increment 502A may include “not sure” and “I think,” and auditory descriptions associated with sub-increment 502E may include “I am sure” and “I am positive.” It should be understood that many auditory descriptions are possible.
When generating the auditory signal, the sub-increment associated with the confidence value of the classification of the particular detected object is determined. Using the example from above, if the object recognition algorithm has detected object 301A as a trashcan with a confidence value of 40%, the confidence value falls within one of the lower sub-increments, and the auditory signal may therefore be generated using an auditory description associated with that sub-increment, such as “I think that there is a trashcan in front of you.”
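The following Python sketch illustrates one possible implementation of this sub-increment lookup; the five cut points and the qualifying phrases are illustrative assumptions that mirror sub-increments 502A-502E.

```python
from bisect import bisect_left

# Upper bounds (in percent) of five sub-increments spanning the confidence scale,
# mirroring sub-increments 502A-502E; the cut points are illustrative assumptions.
SUB_INCREMENT_BOUNDS = [20, 40, 60, 80, 100]
QUALIFYING_PHRASES = [
    "I think that there may be a {obj} in front of you, but I am not sure",  # 502A
    "I think that there is a {obj} in front of you",                         # 502B
    "I am fairly certain that there is a {obj} in front of you",             # 502C
    "I am sure that there is a {obj} in front of you",                       # 502D
    "I am positive that there is a {obj} in front of you",                   # 502E
]

def qualified_message(classification: str, confidence_pct: int) -> str:
    # bisect_left maps a confidence equal to an upper bound into that sub-increment
    # (e.g., 40% falls in the second sub-increment, 21%-40%); the clamp guards
    # against values above the maximum.
    index = min(bisect_left(SUB_INCREMENT_BOUNDS, confidence_pct),
                len(QUALIFYING_PHRASES) - 1)
    return QUALIFYING_PHRASES[index].format(obj=classification)

print(qualified_message("trashcan", 40))
# "I think that there is a trashcan in front of you"
```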
The processor 110 may also incorporate directional information into the auditory message, such as how many degrees to the left or right of the user the detected object is located. Distance information may also be provided (e.g., “I am fairly certain that there is a trashcan 10 yards away from you and about 30 degrees to your left”).
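As a non-limiting illustration, the directional and distance details may be appended to the qualified message as shown in the following Python sketch; the units, phrasing, and sign convention for the bearing are assumptions made for illustration.

```python
from typing import Optional

def directional_message(base_message: str,
                        distance_yards: Optional[float] = None,
                        bearing_degrees: Optional[float] = None) -> str:
    # Append optional distance and direction details (e.g., estimated from stereo
    # image data); negative bearings are taken here to mean "to the left".
    details = []
    if distance_yards is not None:
        details.append(f"about {distance_yards:.0f} yards away from you")
    if bearing_degrees is not None:
        side = "left" if bearing_degrees < 0 else "right"
        details.append(f"about {abs(bearing_degrees):.0f} degrees to your {side}")
    if not details:
        return base_message
    return f"{base_message}, {' and '.join(details)}"

print(directional_message("I am fairly certain that there is a trashcan", 10, -30))
# "I am fairly certain that there is a trashcan, about 10 yards away from you
#  and about 30 degrees to your left"
```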
In some embodiments, the vision-assist device 100 may learn the preferences of the user based on user feedback. The vision-assist device 100 may then adjust how and/or what type of auditory information is provided. Additionally, user feedback may be used in storing image data that is used by the object recognition algorithm in classifying detected objects, as described in more detail below.
Referring to the flowchart 600 of the drawings, an example method of adjusting the auditory messages based on user feedback is illustrated. User feedback regarding an auditory message is received (e.g., by way of the one or more user input devices 160), and it is evaluated whether the user feedback confirms the classification of the detected object provided by the vision-assist device 100.
However, if at block 620 the user feedback does not confirm the classification provided by the vision-assist device 100, the process moves to block 640, where it is evaluated whether or not the user feedback rejects the classification of the detected object provided by the vision-assist device 100. If the user feedback does not reject the classification provided by the vision-assist device 100 (e.g., the user feedback is unrelated to classification of objects), the process ends at block 650, where perhaps some other action is taken. If the user feedback does in fact reject the classification provided by the vision-assist device 100 (either the user indicates he is going to ignore the auditory message or he says it is an object that is different from what the vision-assist device 100 said that it was), then the process moves to block 660 where auditory messages are filtered or altered. For example, the vision-assist device may have indicated that it is 40% sure that detected object is a trashcan. If the user rejects the information, the vision-assist device may then not provide any more auditory messages to the user where the vision-assist device 100 is less than 40% sure that the object is a trashcan.
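By way of a non-limiting illustration, one way to realize the filtering at block 660 is to maintain a per-classification minimum confidence threshold that is raised whenever the user rejects an announcement, as in the following Python sketch; the data structure and thresholding rule are assumptions consistent with the trashcan example above.

```python
class FeedbackFilter:
    """Suppresses announcements the user has previously rejected at or below a
    given confidence level (block 660)."""

    def __init__(self) -> None:
        # Minimum confidence (in percent) required before announcing each classification.
        self.min_confidence: dict[str, int] = {}

    def record_rejection(self, classification: str, announced_confidence: int) -> None:
        # If the user rejected "trashcan" announced at 40% confidence, require more
        # than 40% confidence before announcing "trashcan" again.
        current = self.min_confidence.get(classification, 0)
        self.min_confidence[classification] = max(current, announced_confidence)

    def should_announce(self, classification: str, confidence: int) -> bool:
        return confidence > self.min_confidence.get(classification, 0)

feedback = FeedbackFilter()
feedback.record_rejection("trashcan", 40)
print(feedback.should_announce("trashcan", 35))  # False: at or below the rejected level
print(feedback.should_announce("trashcan", 70))  # True
```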
In other scenarios, if the user feedback indicates that the target object had a classification different than what the vision-assist device 100 said that it was, the information associated with the rejected classification may be stored and used to improve object recognition in the future.
It should now be understood that embodiments described herein are directed to vision-assist devices for use by blind or visually impaired persons. The vision-assist devices described herein detect objects within the user's environment. The vision-assist devices produce auditory messages regarding the types of objects detected in the environment. Further, the auditory messages include qualifying statements regarding the confidence in which the vision-assist device believes that it is correct with respect to the type of object it has detected. The user may then know how much faith she should place in the information provided by the vision-assist device, thereby increasing the trust that the user has in the vision-assist device.
While particular embodiments have been illustrated and described herein, it should be understood that various other changes and modifications may be made without departing from the spirit and scope of the claimed subject matter. Moreover, although various aspects of the claimed subject matter have been described herein, such aspects need not be utilized in combination. It is therefore intended that the appended claims cover all such changes and modifications that are within the scope of the claimed subject matter.