1. Field of the Invention
This invention relates generally to computer networks, and, more particularly, to computer networks including multiple computer systems, wherein one of the computer systems sends screen image information to another one of the computer systems.
2. Description of the Related Art
The United States government has enacted legislation that requires all information technology purchased by the government to be accessible to the disabled. The legislation establishes certain standards for accessible Web content, accessible user agents (i.e., Web browsers), and accessible applications running on client desktop computers. Web content, Web browsers, and client applications developed according to these standards are enabled to work with assistive technologies, such as screen reading programs (i.e., screen readers) used by visually impaired users.
There is one class of applications, however, for which there is currently no accessible solution for visually impaired users. This class includes applications that allow computer system users (i.e., users of client computer systems, or “clients”) to share a remote desktop running on another user's computer (e.g., on a server computer system, or “server”). At least some of these applications allow a user of a client to control an input device (e.g., a keyboard or mouse) of the server, and display the updated desktop on the client. Examples of these types of application include Lotus® Sametime®, Microsoft® NetMeeting®, Microsoft® Terminal Service, and Symantec® PCAnywhere® on Windows® platforms, and the Distributed Console Access Facility (DCAF) on OS/2® platforms. In these applications, bitmap images (i.e., bitmaps) of the server display screen are sent to the client for rerendering. Keyboard and mouse inputs (i.e., events) are sent from the client to the server to simulate the client user interacting with the server desktop.
An accessibility problem arises in the above described class of applications in that the application resides on the server machine, and only an image of the server display screen is displayed on the client. As a result, there is no semantic information at the client about the objects within the screen image being displayed. For example, if an application window being shared has a menu bar, a sighted user of the client will see the menu, and understand that he or she can select items in the menu. On the other hand, a visually impaired user of the client typically depends on a screen reader to interpret the screen, verbally describe that there is a menu bar (i.e., menu) displayed, and then verbally describe (i.e., read) the choices on the menu.
With no semantic information available at the client, a screen reader running on the client will only know that there is an image displayed. The screen reader will not know that there is a menu inside the image and, therefore, will not be able to convey that significance or meaning to the visually-impaired user of the client.
Current attempts to solve this problem have included use of optical character recognition (OCR) technology to extract text from the image, and create an off-screen model for processing by a screen reader. These methods are inadequate because they do not provide semantic information, are prone to error, and are difficult to translate.
A computer network is described including a first computer system and a second computer system. The first computer system transmits screen image information and corresponding speech information to the second computer system. The screen image information includes information corresponding to a screen image intended for display within the first computer system. The speech information conveys a verbal description of the screen image, and, when the screen image includes one or more objects (e.g., menus, dialog boxes, icons, and the like) having corresponding semantic information, the speech information includes the corresponding semantic information.
The second computer system may receive the speech information, and respond to the received speech information by producing an output (e.g., human speech via an audio output device, a tactile output via a Braille output device, and the like). When the screen image includes an object having corresponding semantic information, the output conveys the semantic information. The semantic information conveyed by the output allows a visually-impaired user of the second computer system to know intended purposes of the one or more objects in the screen image.
The second computer system may also receive user input, generate an input signal corresponding to the user input, and transmit the input signal to the first computer system. In response to the input signal, the first computer system may update the screen image. Where the user of the second computer system is visually impaired, the semantic information conveyed by the output enables the visually-impaired user to properly interact with the first computer system.
The invention may be understood by reference to the following description taken in conjunction with the accompanying drawings, in which like reference numerals identify similar elements, and in which:
Illustrative embodiments of the invention are described below. In the interest of clarity, not all features of an actual implementation are described in this specification. It will, of course, be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which will vary from one implementation to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.
As will become evident, the computer network 100 requires only 2 computer systems to operate as described below: the server 102, and one of the clients, either the client 104A or client 104B. Thus, in general, the computer network 100 includes 2 or more computer systems.
As indicated in
In general, the screen image information is information regarding a screen image generated within the server 102, and intended for display within the server 102 (e.g., on a display screen of a display system of the server 102). The corresponding speech information conveys a verbal description of the screen image. The speech information may include, for example, general information about the screen image, and also any objects within the screen image. Common objects, or display elements, include menus, boxes (e.g., dialog boxes, list boxes, combination boxes, and the like), icons, text, tables, spreadsheets, Web documents, Web page plugins, scroll bars, buttons, scroll panes, title bars, frames, split bars, tool bars, and status bars. An “icon” is a picture or image that represents a resource, such as a file, device, or software program. General information about the screen image, and also any objects within the screen image, may include, for example, colors, shapes, and sizes.
More importantly, the speech information also includes semantic information corresponding to objects within the screen image. As will be described in detail below, this semantic information about the objects allows a visually-impaired user of the client 104A to interact with the objects in a proper, meaningful, and expected way.
In general, the server 102 and the clients 104A–104B communicate via signals, and the communication medium 106 provides means for conveying the signals. The server 102 and the clients 104A–104B may each include hardware and/or software for transmitting and receiving the signals. For example, the server 102 and the clients 104A–104B may communicate via electrical signals. In this case, the communication medium 106 may include one or more electrical cables for conveying the electrical signals. The server 102 and the clients 104A–104B may each include a network interface card (NIC) for generating the electrical signals, driving the electrical signals on the one or more electrical cables, and receiving electrical signals from the one or more electrical cables. The server 102 and the clients 104A–104B may also communicate via optical signals, and communication medium 106 may include optical cables. The server 102 and the clients 104A–104B may also communicate via electromagnetic signals (e.g., radio waves), and communication medium 106 may include air.
It is noted that communication medium 106 may, for example, include the Internet, and various means for connecting to the Internet. In this case, the clients 104A–104B and the server 102 may each include a modem (e.g., telephone system modem, cable television modem, satellite modem, and the like). Alternately, or in addition, communication medium 106 may include the public switched telephone network (PSTN), and clients 104A–104B and the server 102 may each include a telephone system modem.
In the embodiment of
In the embodiment of
The screen image information is information regarding a screen image generated within the server 102, and intended for display to a user of the server 102. Thus the screen image would expectedly be displayed on a display screen of a display system of the server 102. The screen image information may include, for example, a bit map representation of the screen image, wherein the screen image is divided into rows and columns of “dots,” and one or more bits are used to represent specific characteristics (e.g., color, shades of gray, and the like) of each of the dots.
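As a rough illustration of the bit map representation described above, the following Python sketch stores a screen image as rows and columns of dots with a fixed number of bits per dot, and computes the storage such an image would occupy. The class name `Bitmap`, its fields, and the 1024 by 768, 8-bits-per-dot figures used in the usage note are assumptions chosen for illustration; they are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bitmap:
    """Hypothetical bit map: rows x columns of dots, bits_per_dot bits each."""
    rows: int
    columns: int
    bits_per_dot: int
    dots: List[List[int]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start with every dot set to 0 if no dot data was supplied.
        if not self.dots:
            self.dots = [[0] * self.columns for _ in range(self.rows)]

    def set_dot(self, row: int, column: int, value: int) -> None:
        # A dot value (e.g., a color or gray level) must fit in the available bits.
        if not 0 <= value < (1 << self.bits_per_dot):
            raise ValueError("value does not fit in bits_per_dot")
        self.dots[row][column] = value

    def size_in_bytes(self) -> int:
        # Total number of bits, rounded up to whole bytes.
        total_bits = self.rows * self.columns * self.bits_per_dot
        return (total_bits + 7) // 8
```

Under these assumptions, `Bitmap(rows=768, columns=1024, bits_per_dot=8).size_in_bytes()` evaluates to 786432, which suggests why only the image itself, and no description of the objects in it, is practical to derive from such data at the client.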
In the embodiment of
It is noted that where the server 102 includes a display system similar to that of the display system 208 of the client 104A, the screen image is expectedly displayed on the display screens of the server 102 and the client 104A at substantially the same time. (It is noted that communication delays between the server 102 and the client 104A may prevent the screen image from being displayed on the display screens of the server 102 and the client 104A at exactly the same time.)
The communication path or channel 206 is formed through the communication medium 106 of
In the embodiment of
During execution, the assistive technology application 212 also produces speech information corresponding to the screen image information. In the embodiment of FIG. 2, the speech information conveys human speech which verbally describes general attributes (e.g., color, shape, size, and the like) of the screen image and any objects (e.g., menus, dialog boxes, icons, text, and the like) within the screen image, and also includes semantic information conveying the meaning, significance, or intended purpose of each of the objects within the screen image. The speech information may include, for example, text-to-speech (TTS) commands and/or audio output signals. Suitable assistive technology applications are known and commercially available.
In the embodiment of
In the embodiment of
During execution, the generic application 216 also produces accessibility information, and provides the accessibility information to a screen reader 218. Further, the screen reader 218 may monitor the behavior of the generic application 216, and produce accessibility information dependent upon the behavior of the generic application 216. In general, a screen reader is a software program that uses screen image information to produce speech information, wherein the speech information includes semantic information of objects (e.g., menus, dialog boxes, icons, and the like) within the screen image. This semantic information allows a visually impaired user to interact with the objects in a proper, meaningful, and expected way. The screen reader 218 uses the received accessibility information, and the screen image information available within the server 102, to produce the above described speech information. The screen reader 218 provides the speech information to the speech application program interface (API) 214. Suitable screen reading applications (i.e., screen readers) are known and commercially available.
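The way a screen reader might turn accessibility information into speech information carrying semantic information can be sketched as follows. This is a minimal Python illustration only: the `AccessibleObject` record, its `role`, `name`, and `children` fields, and the wording of the generated text are all hypothetical and do not represent the interface of any actual screen reader or accessibility API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AccessibleObject:
    """Hypothetical accessibility record for one object in the screen image."""
    role: str                                           # e.g., "menu", "dialog box"
    name: str                                           # e.g., "File"
    children: List[str] = field(default_factory=list)   # e.g., menu choices


def describe(obj: AccessibleObject) -> str:
    """Build one line of speech text that states the object's name and role."""
    text = f"{obj.name} {obj.role}"
    if obj.children:
        # Reading the choices is what lets the listener act on the object.
        text += " with choices: " + ", ".join(obj.children)
    return text + "."


def speech_information(objects: List[AccessibleObject]) -> List[str]:
    """Produce speech text for every object reported for the screen image."""
    return [describe(obj) for obj in objects]
```

For a menu named "File" with the choices "New", "Open", and "Exit", `describe` returns the text "File menu with choices: New, Open, Exit.", which states the role of the object rather than merely that an image is present.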
It is noted that the server 102 need not include both the assistive technology application 212, and the combination of the generic application 216 and the screen reader 218, at the same time. For example, the server 102 may include the assistive technology application 212, and may not include the generic application 216 and the screen reader 218. Conversely, the server 102 may include the generic application 216 and the screen reader 218, and may not include the assistive technology application 212. This is supported by the fact that in a typical multi-tasking computer system operating environment, only one software program is actually being executed at any given time.
In the embodiment of
The distributed console access application 200 provides the input signals to either the assistive technology application 212 or the generic application 216 (e.g., just as if the user had activated a similar input device of the server 102). The assistive technology application 212 or the generic application 216 typically responds to the input signals by updating the screen image information, and providing the updated screen image information to the distributed console access application 200 as described above. As a result, a new screen image is typically displayed on the display screen 210 of the client 104A.
For example, where the input device 220 is a mouse used to control the position of a pointer displayed on the display screen 210 of the display system 208, the user of the client 104A may move the mouse to position the pointer over an icon within the displayed screen image. Where the icon represents a software program (e.g., the assistive technology application 212 or the generic application 216), the user of the client 104A may initiate execution of the software program by activating (i.e., clicking) a button of the mouse. In response, the distributed console access application 200 of the server 102 may provide the mouse click input signal to the operating system of the server 102, and the operating system may initiate execution of the software program. During this process, the screen image, displayed on the display screen 210 of the client 104A, may be updated to reflect initiation of the software program execution.
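The mouse example above can be sketched in Python as a client that serializes an input signal and a server that decodes it and determines which icon, if any, was clicked. The `InputSignal` record, the JSON encoding, and the icon rectangle table are assumptions made for illustration; the specification does not prescribe a wire format or a hit-testing method.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional, Tuple


@dataclass
class InputSignal:
    """Hypothetical input signal sent from the client to the server."""
    device: str      # "mouse" or "keyboard"
    action: str      # e.g., "click" or "key_press"
    x: int = 0       # pointer position, used for mouse signals
    y: int = 0
    key: str = ""    # key name, used for keyboard signals


def encode(signal: InputSignal) -> bytes:
    """Client side: serialize the signal for transmission."""
    return json.dumps(asdict(signal)).encode("utf-8")


def decode(payload: bytes) -> InputSignal:
    """Server side: rebuild the signal from the received bytes."""
    return InputSignal(**json.loads(payload.decode("utf-8")))


# Icon name -> (left, top, width, height) in screen coordinates (hypothetical).
IconMap = Dict[str, Tuple[int, int, int, int]]


def icon_under_pointer(signal: InputSignal, icons: IconMap) -> Optional[str]:
    """Server side: return the name of the icon a mouse click landed on, if any."""
    if signal.device != "mouse" or signal.action != "click":
        return None
    for name, (left, top, width, height) in icons.items():
        if left <= signal.x < left + width and top <= signal.y < top + height:
            return name
    return None
```

In this sketch the server would pass the returned icon name to whatever component starts the corresponding software program; that step is omitted because it depends on the operating system of the server.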
In the embodiment of
As described above, the speech information may include text-to-speech (TTS) commands. In this situation, the text-to-speech (TTS) engine 228 converts the text-to-speech (TTS) commands to audio output signals, and provides the audio output signals to an audio output device 230. The audio output device 230 may include, for example, a sound card and one or more speakers. As described above, the speech information may also include audio output signals. In this situation, the text-to-speech (TTS) engine 228 may simply pass the audio output signals to the audio output device 230.
The speech information transmitter 222 may also transmit audio information (e.g., beeps) to the speech information receiver 224 of the client 104A in addition to the speech information. The text-to-speech (TTS) engine 228 may simply pass the audio information to the audio output device 230.
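The two paths through the text-to-speech (TTS) engine described above, conversion of TTS commands and pass-through of ready-made audio, might be sketched as below. The `TtsCommand` and `AudioSignal` types and the `synthesize` and `audio_out` callables are hypothetical stand-ins; no real speech synthesis library is used or implied.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class TtsCommand:
    """Hypothetical TTS command: text that still has to be synthesized."""
    text: str


@dataclass
class AudioSignal:
    """Hypothetical audio output signal (e.g., synthesized speech or a beep)."""
    samples: bytes


def tts_engine(
    item: Union[TtsCommand, AudioSignal],
    synthesize: Callable[[str], AudioSignal],
    audio_out: Callable[[AudioSignal], None],
) -> None:
    """Route one item of received information to the audio output device."""
    if isinstance(item, TtsCommand):
        # TTS commands are converted to audio output signals first.
        audio_out(synthesize(item.text))
    elif isinstance(item, AudioSignal):
        # Audio output signals and audio information are passed through as-is.
        audio_out(item)
    else:
        raise TypeError("unsupported item type")
```

Passing the synthesizer and the output device in as callables keeps the routing logic independent of any particular TTS engine or sound hardware, which is a design choice of this sketch rather than something the specification requires.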
When the user of the client 104A is visually impaired, the user may not be able to see the screen image displayed on the display screen 210 of the client 104A. However, when the audio output device 230 produces the verbal description of the screen image, the visually-impaired user may hear the description, and understand not only the general appearance of the screen image and any objects within the screen image (e.g., color, shape, size, and the like), but also the meaning, significance, or intended purpose of any objects within the screen image as well (e.g., menus, dialog boxes, icons, and the like). This ability for a visually-impaired user to hear the verbal description of the screen image and to know the meaning, significance, or intended purpose of any objects within the screen image allows the user of the client 104A to interact with the objects in a proper, meaningful, and expected way.
The various components of the server 102 typically synchronize their actions via various handshaking signals, referred to generally herein as response signals, or responses. In the embodiment of
As indicated in
It is noted that the speech information transmitter 222 may transmit speech information to, and receive responses from, multiple clients. In this situation, the speech information transmitter 222 may receive the multiple responses, possibly at different times, and provide a single, unified, representative response to the speech application program interface (API) 214 (e.g., after the speech information transmitter 222 receives the last response).
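One way the collection of multiple client responses into a single response could work is sketched below in Python. The `ResponseCollector` class is hypothetical, and the rule that the unified response is successful only if every client response was successful is an assumption; the specification states only that a single, representative response is provided, for example after the last response is received.

```python
from typing import Iterable, Optional, Set


class ResponseCollector:
    """Hypothetical helper that merges responses from several clients."""

    def __init__(self, client_ids: Iterable[str]) -> None:
        self._pending: Set[str] = set(client_ids)   # clients not yet heard from
        self._all_ok = True

    def receive(self, client_id: str, ok: bool) -> Optional[bool]:
        """Record one response.

        Returns None while responses are still outstanding, and the unified
        response once the last expected response has arrived.
        """
        if client_id not in self._pending:
            raise KeyError(f"unexpected or duplicate response from {client_id}")
        self._pending.remove(client_id)
        # Assumption: the unified response succeeds only if every client did.
        self._all_ok = self._all_ok and ok
        return self._all_ok if not self._pending else None
```

With clients "104A" and "104B" registered, the first call to `receive` returns `None`, and the second returns the unified result that would then be handed to the speech application program interface (API) 214.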
As indicated in
It is noted that the speech information transmitter 222 and/or the speech information receiver 224 may be embodied within hardware and/or software. A carrier medium 236 may be used to convey software of the speech information transmitter 222 to the server 102. For example, the server 102 may include a disk drive for receiving removable disks (e.g., a floppy disk drive, a compact disk read only memory or CD-ROM drive, and the like), and the carrier medium 236 may be a disk (e.g., a floppy disk, a CD-ROM disk, and the like) embodying software (e.g., computer program code) for receiving the speech information corresponding to the screen image information, and transmitting the speech information to the client 104A.
Similarly, a carrier medium 238 may be used to convey software of the speech information receiver 224 to the client 104A. For example, the client 104A may include a disk drive for receiving removable disks (e.g., a floppy disk drive, a compact disk read only memory or CD-ROM drive, and the like), and the carrier medium 238 may be a disk (e.g., a floppy disk, a CD-ROM disk, and the like) embodying software (e.g., computer program code) for receiving the speech information corresponding to the screen image information from the server 102, and providing the speech information to an output device of the client 104A (e.g., the audio output device 230 via the TTS engine 228).
In the embodiment of
Further, the server 102 may send speech information to the client 104A without updating the screen image displayed on the display screen 210 of the client 104A (i.e., without sending corresponding screen image information). For example, where the input device 220 of the client 104A is a keyboard, the user of the client 104A may enter a key sequence via the input device 220 that forms a command to the screen reader 218 in the server 102 to “read the whole screen.” In this situation, the key sequence input signals may be transmitted to the server 102, and passed to the screen reader 218 in the server 102. The screen reader 218 may respond to the command to “read the whole screen” by producing speech information indicative of the contents of the current screen image. As a result, the speech information indicative of the contents of the current screen image may be passed to the client 104A, and the audio output device 230 of the client 104A may produce a verbal description of the contents of the current screen image. During this process, the screen image, displayed on the display screen 210 of the client 104A, expectedly does not change, and no new screen image information is transferred from the server 102 to the client 104A. In this situation, the screen image transmitting process is not involved.
In the peer-to-peer embodiment, any one of the computer systems of the computer network 100 may generate and provide the screen image information and the speech information to one or more of the other computer systems, and receive input signals and/or responses from the one or more of the other computer systems, and thus be viewed as the master computer system as described above. In this situation, the one or more of the other computer systems are considered slave computer systems.
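The "read the whole screen" case, in which speech information is returned without any new screen image information, can be sketched as follows. The `ServerReply` record, the `handle_key_sequence` function, and in particular the key sequence `("Insert", "Down")` are hypothetical; actual screen readers define their own key commands.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical key sequence standing in for "read the whole screen".
READ_WHOLE_SCREEN = ("Insert", "Down")


@dataclass
class ServerReply:
    """Hypothetical reply from the server to the client."""
    speech: List[str]
    screen_image: Optional[bytes] = None   # None: no new screen image information


def handle_key_sequence(
    keys: Sequence[str], current_descriptions: Sequence[str]
) -> ServerReply:
    """Server side: answer a screen reader command with speech information only."""
    if tuple(keys) == READ_WHOLE_SCREEN:
        # The screen image is unchanged, so only speech information is returned.
        return ServerReply(speech=list(current_descriptions))
    # Any other key sequence is outside the scope of this sketch.
    return ServerReply(speech=[])
```

Because `screen_image` remains `None` in the reply, the client in this sketch has nothing to redraw and only its audio output path is exercised.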
In the embodiment of
When the Braille output device 402 produces the Braille characters, the visually-impaired user of the client 104A may understand not only the general appearance of the screen image and any objects within the screen image (e.g., color, shape, size, and the like), but also the meaning, significance, or intended purpose of any objects within the screen image as well (e.g., menus, dialog boxes, icons, and the like). This ability allows the visually-impaired user to interact with the objects in a proper, meaningful, and expected way.
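As an illustration of how text carried in the speech information could be rendered as Braille characters, the following Python sketch maps lowercase letters to six-dot cells using the Unicode Braille Patterns block. It covers only uncontracted letters and the space character, which is a simplifying assumption; the function names are hypothetical, and a real Braille output device would be driven through its own interface rather than through Unicode strings.

```python
from typing import Dict, Tuple

# Raised dots (numbered 1-6) for the letters a-j in the standard six-dot cell.
_A_TO_J: Dict[str, Tuple[int, ...]] = {
    "a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5),
    "f": (1, 2, 4), "g": (1, 2, 4, 5), "h": (1, 2, 5), "i": (2, 4),
    "j": (2, 4, 5),
}


def _cell(dots: Tuple[int, ...]) -> str:
    # Unicode Braille patterns start at U+2800; dot n sets bit n - 1.
    return chr(0x2800 + sum(1 << (dot - 1) for dot in dots))


def _build_table() -> Dict[str, str]:
    table = {" ": chr(0x2800)}   # blank cell
    for first, second in zip("abcdefghij", "klmnopqrst"):
        table[first] = _cell(_A_TO_J[first])
        # k-t repeat a-j with dot 3 added.
        table[second] = _cell(_A_TO_J[first] + (3,))
    # u, v, x, y, z repeat a-e with dots 3 and 6 added; w is irregular.
    for letter, base in zip("uvxyz", "abcde"):
        table[letter] = _cell(_A_TO_J[base] + (3, 6))
    table["w"] = _cell((2, 4, 5, 6))
    return table


BRAILLE = _build_table()


def to_braille(text: str) -> str:
    """Convert text to Braille cells; unsupported characters become '?'."""
    return "".join(BRAILLE.get(ch, "?") for ch in text.lower())
```

Under these assumptions, `to_braille("menu")` yields the four cells U+280D, U+2811, U+281D, and U+2825.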
The particular embodiments disclosed above are illustrative only, as the invention may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. Furthermore, no limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope and spirit of the invention. Accordingly, the protection sought herein is as set forth in the claims below.
Number | Date | Country |
---|---|---|
20030208356 A1 | Nov 2003 | US |