The present invention relates generally to extracting geographic information from TV images using optical character recognition (OCR) or from audio to superimpose a relevant map on the image.
Present principles understand that when viewing a TV show of a scene, e.g., a news show reporting a fire or an ongoing police chase, a viewer may wish to know where the event is occurring apart from a verbal report by the TV reporter. As also understood herein, merely extracting geographic information from a TV image as it is being recorded is insufficient to satisfy the viewer's real-time curiosity.
Furthermore, present principles understand that simply obtaining a map image that might be related to a TV show can still impede the viewer's understanding of the event location if the map is displayed in an inconvenient manner.
A TV system includes a TV display, a processor controlling the TV display to present TV images, and one or more audio speakers which are caused by the processor to present audio associated with the TV images. A computer-readable medium is accessible to the processor and bears instructions to cause the processor to extract text information from the audio and/or images. The instructions also cause the processor to determine whether the text information represents a geographic place name, and if the text information represents a geographic place name, to present a map of a geographic place corresponding to the geographic place name in a picture-in-picture window on the TV display.
In some embodiments the processor can receive user input indicating whether maps should be presented during operation. If the user input indicates maps are to be presented, the processor can prompt the user to enter a desired time period defining how long a map is presented on the TV display. The processor then presents maps on the TV display for time periods conforming to the user-entered desired time period. Similarly, if the user input indicates maps are to be presented, the processor can prompt the user to enter a desired map scale and then present maps on the TV display conforming to the desired map scale. If desired, the processor may extract text information from both the audio and the images and present a corresponding map only if text from the audio representing a geographic place name matches text in the video.
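By way of non-limiting illustration, the prompting sequence described above might be sketched as follows. The names `MapPreferences` and `collect_preferences`, the wording of the prompts, and the use of seconds as the time unit are assumptions made for the example and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MapPreferences:
    """Hypothetical record of the user's map-related choices."""
    enabled: bool
    display_seconds: Optional[float] = None  # how long a map stays on screen
    scale: Optional[str] = None              # e.g. "city", "county"


def collect_preferences(ask: Callable[[str], str]) -> MapPreferences:
    """Prompt for duration and scale only when maps are to be presented.

    `ask` stands in for whatever input path the TV uses (assumed here to
    return the user's answer as a string).
    """
    enabled = ask("Show maps? (yes/no)").strip().lower() == "yes"
    if not enabled:
        # No further prompts when the user declines maps.
        return MapPreferences(enabled=False)
    seconds = float(ask("How long should each map be shown (seconds)?"))
    scale = ask("Desired map scale?").strip().lower()
    return MapPreferences(enabled=True, display_seconds=seconds, scale=scale)
```

Passing the input path in as a callable keeps the sketch independent of any particular remote control or on-screen menu.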
In another aspect, a TV system includes a TV display, a processor controlling the TV display to present TV images, and one or more audio speakers which are caused by the processor to present audio associated with the TV images. A computer-readable medium is accessible to the processor and bears instructions to cause the processor to receive user input indicating whether a map feature is to be enabled and only if the user input indicates that the map feature is to be enabled, to extract text information from the audio and/or images. The processor correlates the text information to a map of a geographic place corresponding to the text information.
In yet another aspect, a TV processor executes a method that includes receiving a TV signal, presenting the TV signal on a TV display and at least one TV speaker, and analyzing the TV signal for geographic words. In response to detecting a geographic word, the method executed by the processor includes presenting on the TV display, along with the TV signal and in real time without first recording the TV signal, an image of a map showing the geographic location indicated by the geographic word.
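The real-time character of the method may be illustrated, without limitation, by a generator that examines each segment of the signal as it arrives and never buffers the whole program. The function name `overlay_stream`, the representation of a segment as an identifier paired with already-extracted text, and the single-word place matching are simplifying assumptions for the example.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple


def overlay_stream(
    segments: Iterable[Tuple[str, str]],
    known_places: Set[str],
) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (segment_id, place or None) for each segment as it arrives.

    `segments` is assumed to carry text already extracted from the signal;
    `known_places` is an assumed set of lower-case place names.
    """
    for segment_id, text in segments:
        place = None
        for word in text.lower().split():
            candidate = word.strip(".,;:!?")
            if candidate in known_places:
                place = candidate
                break  # first geographic word in the segment wins
        # The result is emitted immediately, so a map can accompany the
        # segment that triggered it rather than a later recording.
        yield segment_id, place
```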
The details of the present invention, both as to its structure and operation, can best be understood in reference to the accompanying drawings, in which like reference numerals refer to like parts, and in which:
Referring initially to
The processor 20 can access one or more computer-readable media 24 such as solid state storages, disk storages, etc. The media 24 may include instructions executable by the processor 20 to undertake the logic disclosed below. Also, the media 24 may store map information. In addition or alternatively the TV system 10 may include a computer interface 26 such as but not limited to a modem or a local area network (LAN) connection such as an Ethernet interface that establishes communication with a wide area network (WAN) 28, and map information can be downloaded from one or more servers on the WAN 28 in real time on an as-needed basis.
To support the below-described text extraction from audio, a microphone 30 may be provided and may be in communication with the processor 20. The processor alternatively may process the received electrical signal representing the audio without need for a microphone. Also, a wireless command signal receiver 32 such as an RF or infrared receiver can receive user input from, e.g., a remote control 34 and send the user input to the processor 20.
Cross-referencing
The user may also be given the opportunity to select a desired map scale. For example, the user can be given the opportunity to input a textual scale designation (e.g., “neighborhood”, “city”, “county”, “region”, “state”, etc.). Alternatively, a drop-down menu with predefined scales can be presented from which the user can select a desired scale.
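As one possible sketch of handling a textual scale designation, the entered text may be normalized and checked against the predefined scales, falling back to a default when it is not recognized. The approximate spans in kilometers and the default of “city” are assumed values chosen only for illustration.

```python
# Assumed mapping from scale designation to approximate map width in km.
SCALE_SPAN_KM = {
    "neighborhood": 2,
    "city": 20,
    "county": 80,
    "region": 300,
    "state": 800,
}


def resolve_scale(user_text: str, default: str = "city") -> str:
    """Return a predefined scale name for the user's text, or the default."""
    key = user_text.strip().lower()
    return key if key in SCALE_SPAN_KM else default
```

The same keys could populate a drop-down menu, so that free-text entry and menu selection resolve to identical scale names.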
These selections are shown on the example user interface of
Now referring to
As recognized herein, it may be desirable to limit map display to only geographic place names that appear in both the image and the audio of the TV signal, underscoring the importance of the particular place name. If this is determined to be the case as represented by decision diamond 44, the logic flows to block 46 to enter a DO loop when the match feature is active. At block 48, it is determined for text in the image whether the same word is in the accompanying audio. To this end, the output of the microphone 30 shown in
At optional block 50, the logic classifies text extracted at block 42 into genres using classification engine techniques. For example, an index of geographic place names may be stored in the medium 24 or accessed on the WAN 28 and if text matches an entry in the index it is classified as “geographic”. In addition or alternatively if text contains geo-centric terms such as “lake”, “township”, “burg”, “street”, it may be classified as geographic.
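The two classification tests just described, index lookup and geo-centric term detection, can be sketched as below. The function name `classify`, the label “other” for non-geographic text, and the use of simple substring matching are assumptions for the example; an actual classification engine may differ.

```python
from typing import Set

# Geo-centric terms named in the description.
GEO_TERMS = ("lake", "township", "burg", "street")


def classify(text: str, place_index: Set[str]) -> str:
    """Label text as "geographic" or "other".

    `place_index` is assumed to hold lower-case place names.
    """
    normalized = text.strip().lower()
    if normalized in place_index:
        return "geographic"  # exact match against the place-name index
    if any(term in normalized for term in GEO_TERMS):
        return "geographic"  # contains a geo-centric term
    return "other"
```

Substring matching is deliberately crude: it accepts “Pittsburgh” through “burg” but would also accept “hamburger”, so the index lookup is the more reliable of the two tests.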
If the text is determined to be a geographic place name at decision diamond 52, the logic moves to block 54 to obtain a computer-stored map of the place name. The map may be accessed from a map database in the medium 24 and/or downloaded from the WAN 28 through the computer interface 26.
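A minimal sketch of the map retrieval follows, assuming the local database is consulted first and the network is used only as a fallback. That ordering, the function name `obtain_map`, and the `download` callable standing in for a request over the WAN are assumptions, since the description permits either source or both.

```python
from typing import Callable, Dict, Optional


def obtain_map(
    place: str,
    local_maps: Dict[str, bytes],
    download: Optional[Callable[[str], Optional[bytes]]] = None,
) -> Optional[bytes]:
    """Return map data for `place`, or None if no source has it."""
    key = place.strip().lower()
    if key in local_maps:
        return local_maps[key]  # map found in the locally stored database
    if download is not None:
        return download(key)    # hypothetical fetch from a server on the WAN
    return None
```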
Proceeding to block 56, the map obtained at block 54 is presented on the TV display 14 for the user-selected time duration and at the user-selected scale. To this end, the processor 20 scales the map according to the user selection, if enabled.
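For illustration only, the presentation step can be reduced to two small calculations: where a picture-in-picture window sits on the screen and whether the user-selected duration has elapsed. The lower-right placement, the quarter-size window, and the 20-pixel margin are assumed values not specified in the description.

```python
from typing import Tuple


def pip_rect(
    screen_w: int, screen_h: int, fraction: float = 0.25, margin: int = 20
) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of a window in the lower-right corner."""
    w = int(screen_w * fraction)
    h = int(screen_h * fraction)
    x = screen_w - w - margin
    y = screen_h - h - margin
    return x, y, w, h


def map_visible(shown_at: float, now: float, display_seconds: float) -> bool:
    """True while the map is still within its user-selected display time."""
    return 0 <= now - shown_at < display_seconds
```

On a 1920 by 1080 display, `pip_rect(1920, 1080)` gives a 480 by 270 window at position (1420, 790).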
Referring briefly to
Instead of using OCR to extract text from the TV image for map selection, present principles may apply to using voice recognition to extract words from the audio for map selection. Such an embodiment is shown in
As recognized herein, it may be desirable to limit map display to only geographic place names that appear in both the image and the audio of the TV signal, underscoring the importance of the particular place name. If this is determined to be the case as represented by decision diamond 66, the logic flows to block 68 to enter a DO loop when the match feature is active. At block 70, it is determined for text extracted from the audio whether the same word is in the accompanying image. To this end, the processor 20 can execute an OCR engine that can be stored on the medium 24. Only if a match is found when this feature is activated does the logic proceed to block 72. If the matching feature is not activated, the logic moves from decision diamond 66 to block 72.
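The match feature amounts to a set intersection between words recognized in the audio and words recognized in the image, which may be sketched as follows. The function name `places_to_map` and the case-insensitive comparison are assumptions made for the example.

```python
from typing import Iterable, List


def places_to_map(
    audio_words: Iterable[str],
    image_words: Iterable[str],
    match_required: bool,
) -> List[str]:
    """Return the audio-derived words that may proceed to classification.

    With the match feature active, only words also found in the image
    survive; otherwise every audio word passes through.
    """
    audio = {w.strip().lower() for w in audio_words}
    if not match_required:
        return sorted(audio)
    image = {w.strip().lower() for w in image_words}
    return sorted(audio & image)
```

For example, with audio words “Fresno” and “Reno” and image words “FRESNO” and “Weather”, only “fresno” passes when the match feature is active.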
At optional block 72, the logic classifies text extracted at block 64 into genres using classification engine techniques. For example, an index of geographic place names may be stored in the medium 24 or accessed on the WAN 28 and if text matches an entry in the index it is classified as “geographic”. In addition or alternatively if text contains geo-centric terms such as “lake”, “township”, “burg”, “street”, it may be classified as geographic.
If the text is determined to be a geographic place name at decision diamond 74, the logic moves to block 76 to obtain a computer-stored map of the place name. The map may be accessed from a map database in the medium 24 and/or downloaded from the WAN 28 through the computer interface 26.
Proceeding to block 78, the map obtained at block 76 is presented on the TV display 14 for the user-selected time duration and at the user-selected scale. To this end, the processor 20 scales the map according to the user selection, if enabled.
While the particular EXTRACTING GEOGRAPHIC INFORMATION FROM TV SIGNAL TO SUPERIMPOSE MAP ON IMAGE is herein shown and described in detail, it is to be understood that the subject matter which is encompassed by the present invention is limited only by the claims.