The present invention relates to a system and method for presenting and browsing information.
Visually impaired people, or those temporarily unable to “look” at a text, for example because of lighting conditions or the demands of a task being performed, e.g., driving, can today “read” or perceive a textual document by using variable-speed Text-To-Speech translating devices. Similarly, a person can listen to a speech pre-recorded on a particular medium, such as an audiotape or a compact disk (CD), which can be played back, perhaps under variable-speed control.
The listening process, however, is by nature a sequential scan of an audio stream. It requires the listener to take in the information linearly, from the beginning of the text to the end, to obtain an overall understanding of what is being presented. Listeners cannot effectively browse or navigate a textual document using a device interfacing with a tape or CD player, such as a speech recognition or switch interface. Additionally, and most importantly, the audio signal comes from a source that is fixed in space, in a single perceived direction.
The ability to precisely control the perceived direction of a sound has been described in U.S. Pat. No. 5,974,152, titled “SOUND IMAGE LOCALIZATION CONTROL DEVICE”. That patent describes how a sound image localization control device reproduces an acoustic signal on the basis of a plurality of simulated delay times and a plurality of simulated filtering characteristics, as if a sound image were located at an arbitrary position other than the positions of separately arranged transducers.
Several patents describe various techniques for achieving such control, for example U.S. Pat. No. 5,974,152, and U.S. Pat. No. 5,771,041, titled “SYSTEM FOR PRODUCING DIRECTIONAL SOUND IN COMPUTER BASED VIRTUAL ENVIRONMENT”. The latter describes how the sound associated with a sound source is reproduced from a sound track at a determined level, to produce an output sound that creates a sense of place within the environment.
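As a minimal illustration of placing a sound at a perceived direction, the sketch below uses simple constant-power stereo panning. This is a stand-in for, not a reproduction of, the delay- and filter-based localization the cited patents describe; the function name and the azimuth convention are assumptions for illustration only.

```python
import math

def pan_gains(azimuth_deg):
    """Constant-power stereo gains for a source at the given azimuth.

    azimuth_deg: -90 (hard left) .. +90 (hard right), 0 = center.
    Returns (left_gain, right_gain). A crude stand-in for the
    delay/filter-based sound image localization in the cited patents.
    """
    # Map azimuth onto a pan angle of 0..pi/2 and apply sin/cos panning,
    # which keeps total power (left^2 + right^2) constant.
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)
```

A centered source (`pan_gains(0)`) drives both channels equally, while `pan_gains(-90)` sends all energy to the left channel.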
Another patent, U.S. Pat. No. 5,979,586, titled “VEHICLE COLLISION WARNING SYSTEM”, describes a vehicle collision warning system that converts collision threat messages from a predictive collision sensor into intuitive sounds perceived by the occupant of the vehicle; the sounds are directed from the direction of a potential or imminent collision.
Human beings live in a three-dimensional space and can benefit or take special advantage of auditory cues that emanate from different locations in that space.
Current technology lacks any system or method for directing the delivery of auditory information so that it is perceived as coming from specific directions in the auditory field, based on a predetermined classification of the type of information being transmitted, and lacks the ability to directionally navigate that information. This deficiency increases the difficulty and cost of facilitating tasks, recognition, and recall. An object of the present invention is therefore to substantially solve at least the above problems and/or disadvantages and to provide at least the advantages below.
Accordingly, an object of the present invention is to provide a system and method for presenting and browsing information, comprising the steps of classifying the information into a plurality of classes and sub-classes, each class having at least one sub-class; and presenting the plurality of classes of information to a user.
A further object of the present invention is to provide a system and method for presenting and browsing information, comprising the step of interactively controlling the presentation of the sub-classes.
The foregoing and other objects, aspects, and advantages of the present invention will be better understood from the following detailed description of preferred embodiments of the invention with reference to the accompanying drawings that include the following.
Several preferred embodiments of the present invention will now be described in detail herein below with reference to the annexed drawings. In the drawings, the same or similar elements are denoted by the same reference numerals even though they are depicted in different drawings. In the following description, a detailed description of known functions and configurations incorporated herein has been omitted for conciseness.
The present invention describes a system that can present categorized audio information to specific locations in a listener's aural field and allows the listener to navigate through this directionally “tagged” or “annotated” information, attending to details in sections that may be of interest while skipping over others that are not. Using this inventive navigation system the listener can quickly assess the “nature” of the information, can hierarchically ascend or descend into sections to explore them in more detail, and can navigate through the information to review previously read sections or study them in greater detail.
One embodiment of the present invention presents categorized information perceived in different locations of the listener's aural field and allows navigation through speech or other interface devices. Listeners can easily navigate the presented information and can associate certain information with a particular location, thus aiding recall. Listeners can also index or ask for replay of the information by referring to the location from which they perceived the information to originate. For example, when traveling in a car, news can come from the perceived left of the listener, while stock exchange notifications can come from the right. Navigation directions from an in-car navigation system may come from the rear, or even from the direction in which the driver/listener is supposed to turn. For example, when a left turn is suggested, the notification comes from the left of the driver/listener's perceived auditory field. The advantage of the present invention is that listeners can quickly browse and navigate information in a more “random access” or hierarchical manner, allowing them to more quickly assess their interest, to focus on the parts of the audio information that are relevant to them, and to quickly navigate previously explored information to attend to items of interest.
Many existing documents and other information sources today are classified into sections, and their content can be interpreted as hierarchical. For example, word processing document files typically have an abstract, headings, and paragraph tags, which define a hierarchical structure of a given document. Hypertext Markup Language (HTML) files have a similar classification structure that can be interpreted as hierarchical. Document headings, for example, are hierarchical in nature, and their label or associated text can be interpreted as a description of the content of the document. Content (any information that is to be presented) may be classified based on the source/origin of the content. For example, news may come from a “News Service”, stock quotes may come from a “Stock Service”, and email may come from a “Message Service”. The origin of the content may be enough of a classification to determine its presentation. The user, for example, may define a profile for the system that tags the content, which in turn determines where in the aural field the information is delivered. In the above examples, the different content is output from different directions.
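A user profile mapping content origin to an output direction can be sketched as a simple lookup, as below. The specific angle values and the source names are only illustrative assumptions drawn from the examples above, not a mandated format.

```python
# Hypothetical user profile: each content source is tagged with the
# azimuth (degrees in the listener's aural field) from which its
# content should be perceived. Angles here are illustrative.
PROFILE = {
    "News Service": -90,     # perceived left
    "Stock Service": 90,     # perceived right
    "Message Service": 180,  # perceived behind
}

def direction_for(source, profile=PROFILE, default=0):
    """Return the output direction for content based on its origin;
    unknown sources fall back to a default (e.g., front/center)."""
    return profile.get(source, default)
```

With this profile, news content is directionally tagged to the perceived left (`direction_for("News Service")` yields -90), matching the in-car example above.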
Hierarchical content such as technical papers that exist in a classification form (e.g., HTML or any markup language format) can also be easily presented to the user based on a user-specified profile. The system could be delivered with a set of default locations for information delivery to facilitate easy use. The sections are tagged and sequentially mapped, based on the directional tagging, to appear to be coming from locations that are separated by 60 degrees in the user's aural field. The tagging and mapping are arbitrary and definable by the user through a profile. It is possible to take any unstructured document, classify it according to its hierarchical structure using annotation systems, and then directionally tag the classifications. A “Section/Hierarchy” annotator “marks up” the document with hierarchy classifications that could be used for presentation. The present invention then interprets this classification and assists the user in examining the document. Another Section/Hierarchy annotator could use many heuristics and could be a very complex text analysis component, depending on the type of documents processed. It could use some simple heuristics, such as looking for section numbers that often appear in technical documents. For example, these documents often have sections that are numbered and subsections have successive numberings. For example,
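The numbering heuristic and the 60-degree mapping described above can be sketched together. This is a deliberately simple illustration, assuming headings of the form "1 Title" / "1.1 Subtitle"; a production annotator would need much richer text analysis.

```python
import re

# Matches headings such as "1 Introduction" or "2.3 Results".
SECTION_RE = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$")

def tag_sections(lines, spacing_deg=60):
    """Detect top-level numbered section headings and directionally
    tag each one, spacing them `spacing_deg` apart in the aural field.

    Returns a list of (number, title, angle) tuples. Subsections
    (numbers containing a dot) are skipped in this simple sketch.
    """
    tagged = []
    angle = 0
    for line in lines:
        m = SECTION_RE.match(line.strip())
        if m and "." not in m.group(1):  # top-level sections only
            tagged.append((m.group(1), m.group(2), angle))
            angle += spacing_deg
    return tagged
```

Applied to a document with sections "1 Introduction" and "2 Method", the two sections would be tagged to appear from 0 and 60 degrees respectively.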
As can be seen, “classification” herein relates to the preset or user defined section or hierarchy of the input data, whereas “directional tagging” or “tagging” relates to how the system according to the present invention will direct the output of the data.
As another example, the first sentence of a paragraph is usually a topic sentence describing what will be elaborated in the following paragraph. The last sentence often makes the major point. So, by classifying this inherent hierarchy that exists in many documents, the present invention enables the listener or user to preview or skim the structure of a document by listening to just the abstract and the headings. The abstract or heading can be considered the top level of the hierarchy. The user can then “jump” to other levels, e.g. the “abstract”, “summary”, “conclusion” or the heading of interest, and examine the sub-headings in the section. Similarly, the user can examine the topic sentence (first sentence) of each paragraph of a terminal sub-heading for a quick overview of that section. Additionally, the user can listen to each sentence of the paragraph for the fine grain details.
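The skimming behavior described above, previewing a document by its topic sentences, can be sketched as follows. The sentence splitting here is naive (a period followed by a space); it is an assumption for illustration, not the annotation method of the invention.

```python
def skim(paragraphs):
    """Return the first (topic) sentence of each paragraph, giving a
    quick structural overview a listener could browse before
    descending into fine-grain details.

    Naive sentence splitting on '. '; real text analysis would need
    to handle abbreviations, quotations, and similar cases.
    """
    overview = []
    for para in paragraphs:
        first = para.split(". ")[0].rstrip(".") + "."
        overview.append(first)
    return overview
```

A listener could first hear only the `skim` output, then “jump” into any paragraph whose topic sentence sounds relevant.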
Many existing documents have a structure that can be interpreted as hierarchical and can be used directly by such a system. However, it is also possible to annotate any information input into the system of the present invention with meta-information, for example relating to hierarchy, meaning, or category, to afford presentation, browsing, and navigation; this is especially useful for the blind or for those who cannot look at written text because of the task they are performing. Information sources may also be used to create a category for a piece of information. For example, all information coming from a stock quote service falls into the category “stocks”, news originating from a news service may fall into the category “news”, etc. The classification of “stock” or “news” can then be used to directionally tag the information, direct its output, and control the browsing commands.
In addition and according to another embodiment of the present invention, the user can directly control the ability to classify and tag the information and access these classifications and tags, thus giving the user greater ability to navigate previously explored information. Extending the system to support annotation and editing provides a powerful tool for the generation of documents facilitating their reading, browsing, and reuse.
According to another embodiment of the present invention, to facilitate recall and browsing, the hierarchical information is associated with specific locations in the aural field: each specific heading label and its associated sub-information may be presented as coming from a unique direction, and navigation can then be performed by taking advantage of this association. For example, the document could be browsed by jumping to a specific “Heading” via a pointing gesture (interpreted by an associated gesture recognition system) toward the location in space where that information was first heard; by turning an indicator dial that points to that location; or by using speech to go to that named location, e.g., 35 degrees left. Ascending and descending the hierarchy can be achieved by similar methods referring, however, to an orthogonal axis, e.g., up and down. Humans, especially the blind, have an exceptionally well-developed spatial auditory memory and will greatly benefit from the present invention as a powerful mechanism for textual “landmarking” and navigation.
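Resolving a pointed or spoken angle to the heading tagged nearest that direction can be sketched as below. The tuple format and function name are assumptions for illustration; distances are compared on the circle so that, e.g., 350 degrees is close to 0 degrees.

```python
def nearest_section(angle_deg, tags):
    """Given a pointed/spoken angle and a list of (label, angle)
    directional tags, return the label whose tagged direction is
    closest to the input angle.

    A sketch of direction-based navigation: the gesture, dial, or
    speech front end supplies `angle_deg`; this resolves the target.
    """
    def circular_dist(a, b):
        # Smallest angular separation on a 360-degree circle.
        d = abs(a - b) % 360
        return min(d, 360 - d)

    return min(tags, key=lambda t: circular_dist(angle_deg, t[1]))[0]
```

So a gesture interpreted as roughly 55 degrees would select a heading tagged at 60 degrees rather than one tagged at 0 degrees.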
Input device 305 and the set of commands for navigation will now be described. Three input modalities will be elaborated: speech, electro/mechanical devices, and virtual reality gestures.
Speech is particularly useful in environments where the user is engaged in some other activity and does not have his hands free, such as when driving. Speech input systems are well known in the art. These speech input systems generally include a microphone for receiving the spoken words of a user, and a processor for analyzing the spoken words and performing a specific command or function based on the analysis. For example, many mobile telephones currently on the market are voice activated and will perform calling functions based on an input phrase, such as dialing a telephone number of a person stored in memory. The system according to the present invention can be programmed to respond to spoken degrees in the aural field. As shown in
Input devices are also contemplated as electro/mechanical devices that may include dials, buttons, or graphical user interface devices (e.g., a computer mouse). These electro/mechanical or standard computer input devices are quite common, and all are contemplated herein. By turning a dial to point in a predefined direction, or moving a joystick to point in a predefined direction, the system can navigate the information accordingly.
A third input device that is contemplated is a virtual reality input device. The virtual reality input device of the preferred embodiment is a device that will recognize the direction that a user is pointing and translate that direction into a command. The industry is replete with devices that can recognize a hand gesture of a user, whether that device is a user-worn glove, finger contacts, or an external recognition system. Whichever virtual reality input device is used, the object is to translate the direction of the user's gesture into a browsing command through the browsing manager 204.
Returning again to
In the above example where the user desires to hear section 2, it is possible that section 2 has been sub-tagged into further sections or categories, as discussed above; the system can be programmed to output the section 2 classifications or to play back the section itself. These sub-processes can be preset or user defined, and can also be controlled by particular user input. For example, the user can have the option to input several commands based on the directional output, such as “read 60 degrees” or “highlight 60 degrees”. If “read 60 degrees” is input, the system begins full playback of section 2; if “highlight 60 degrees” is input, the system plays back the section headings of section 2. The classification and tagging of the data, and the range of input commands, are limited only by system design and resources.
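The two example commands above can be dispatched with a small parser, sketched below. The command grammar ("verb, degrees, 'degrees'") and the section data layout are assumptions for illustration, not a fixed command set of the invention.

```python
def handle_command(command, sections):
    """Dispatch a spoken command such as 'read 60 degrees' or
    'highlight 60 degrees' against sections keyed by direction.

    'read' returns the section's full text for playback; 'highlight'
    returns only its headings. The grammar here is illustrative.
    """
    verb, degrees, _unit = command.split()
    section = sections[int(degrees)]
    if verb == "read":
        return section["text"]
    if verb == "highlight":
        return section["headings"]
    raise ValueError("unknown command: " + verb)
```

With section 2 tagged at 60 degrees, "read 60 degrees" would return its full text and "highlight 60 degrees" only its heading labels.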
The example illustrated in
While the invention has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.