In many different scenarios, a user of a portable electronic device may retrieve data via an interface of the device; the interface may oblige the user to enter a search query to identify the data to be retrieved. The user may be a motorist wishing to retrieve driving instructions from a navigation device, for example, or to play a song from the library of a portable music player. In these and other examples, the query may be entered directly as text, or it may be entered in some other form—e.g., handwriting or speech—and then converted to text. Text entry, however, whether direct or indirect, may be inconvenient, tedious, and/or prone to user error. This is true especially when the interface requires precise entry of long or hard-to-remember search queries. Naturally, the user that enters an erroneous search query on such an interface may have difficulty retrieving the desired data, which may cause frustration.
Some user interfaces automatically invoke an auto-correction, auto-completion, or so-called “partial search” method to modify search-query input from a user. However, some such methods rely on extensive network resources and services, making them more applicable to server systems than to portable devices. Other methods may be implemented on portable devices, but are less robust; some may be undone by the initial entry of a single erroneous character.
Therefore, one embodiment of this disclosure provides a data-retrieval method suitable for use on a portable electronic device, the device having a user interface and a database where a plurality of data items are indexed each to a corresponding index string. The method comprises receiving a query string at the user interface and displaying one or more index strings on the user interface such that the relative prominence of each index string displayed increases with increasing resemblance of that index string to the query string. The method further comprises displaying an index string with greater prominence when a fixed-length substring of the query string occurs anywhere in the index string, regardless of position. In this manner, the relevance of prominently displayed index strings increases as more characters are appended to the query string, even if the query string contains errors.
It will be understood that the summary above is provided to introduce in simplified form a selected part of this disclosure, which is further described hereinafter. It is not meant to identify key or essential features of the claimed subject matter. Rather, the claimed subject matter is defined only by the claims and is not limited to implementations that solve any disadvantages noted herein.
The subject matter of this disclosure is now described by example and with reference to the illustrated embodiments listed above. Components, process steps, and other elements that may be substantially the same in one or more embodiments are identified coordinately and are described with minimal repetition. It will be noted, however, that elements identified coordinately may also differ to some degree. It will be further noted that the drawing figures included in this disclosure are schematic and generally not drawn to scale. Rather, the various drawing scales, aspect ratios, and numbers of components shown in the figures may be purposely distorted to make certain features or relationships easier to see.
Keypad 20, irrespective of its particular configuration, enables user 12 to enter text in the form of a character string—i.e., a sequence of characters. The characters of the character string may include alphanumeric characters (e.g., 0 through 9 and A through Z) in addition to punctuation characters and control characters, such as a line-feed character. In one embodiment, the characters forming a character string may be coded according to the ASCII standard, while other standards are equally contemplated. Throughout this disclosure, the terms “string” and “character string” are used interchangeably. “Query string” refers to a character string provided as input to specify an item to be retrieved from a database. “Index string” refers to a character string included in a database and used to index a particular data item therein.
Continuing in
Computer system 26 may be configured to enact any computation, processing, or control function of portable device 14. The computer system may be configured to receive input from keypad 20 and/or microphone 22 and to direct output to display 18 and/or loudspeaker 24. In one embodiment, the computer system may receive the electrical signal from the microphone and translate the audible speech received by the microphone into text. More specifically, the computer system may be configured to construct a query string from the translated, audible speech and use that query string in the various data-retrieval methods described hereinafter.
Aspects of data retrieval from portable device 14 will now be described with reference to an example scenario. In this scenario, the portable device is a motor-vehicle navigation system, and the user of the portable device is a motorist in Honolulu. The user is preparing to drive to 123 Kamehameha Street. If no auto-completion, auto-correction, or partial search feature were available on the portable device, the user would be obliged to enter the complete street address, which could be tedious and/or prone to error.
Suppose, however, that portable device 14 includes a database listing every street address on Oahu. If primitive auto-completion, auto-correction, or partial searching were available on the portable device, then the short query “123 KA” may result in the desired address appearing on display 18 as one of several options—e.g.,
Primitive auto-completion, auto-correction, and partial searching for portable devices may depend on the query string matching an index string from the database over the first N characters of the query string. However, if the query includes a spelling error early in the word (e.g., “123 KHA”), such primitive methods may fail, and the desired address may not be among the options displayed. This is true no matter how many correct characters are subsequently entered. Instead of the desired address, the user will see options that match the erroneous query string over the first N characters, e.g.,
In view of this issue and others, primitive auto-completion, auto-correction, and partial search methods that rely on perfect agreement over the first N characters of a query string may not provide robust data retrieval.
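The failure mode described above may be illustrated with a minimal sketch of prefix-only matching. The function name prefix_matches and the three addresses are hypothetical and chosen only for illustration; they are not taken from any database described in this disclosure.

```python
def prefix_matches(query, index_strings):
    """Return the index strings that begin with the query string.

    This models a primitive auto-completion that requires perfect
    agreement over the first N characters, N being len(query).
    """
    return [s for s in index_strings if s.startswith(query)]


# Hypothetical index strings, for illustration only.
ADDRESSES = [
    "123 KAMEHAMEHA STREET",
    "123 KALAKAUA AVENUE",
    "123 KHALID COURT",
]

# A correct short query offers the desired address as one option.
good = prefix_matches("123 KA", ADDRESSES)

# One early error ("KHA" for "KA") excludes the desired address,
# no matter how many correct characters follow.
bad_short = prefix_matches("123 KHA", ADDRESSES)
bad_long = prefix_matches("123 KHAMEHAMEHA", ADDRESSES)
```

In this sketch the erroneous short query returns only the unwanted address, and the longer erroneous query returns nothing at all.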
In another scenario, portable device 14 may be configured to enact so-called “regular expression” or wildcard searching. These methods may be used to accommodate uncertainties in spelling and improve efficiency in data retrieval. However, they too are not robust and cannot remedy unexpected errors in the query string. In the above example, the query string “123 K*MEHA” would return the desired street address, but “123 KH*MEHA” would not.
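The wildcard behavior noted above may be sketched with the fnmatch module of the Python standard library. This is only one possible reading: the sketch assumes that the query is treated as a partial search, so a trailing "*" is appended to each pattern, and the addresses are hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_matches(pattern, index_strings):
    """Return the index strings matched by a shell-style wildcard pattern.

    Assumption: the query is a partial search, so a trailing "*" is
    appended to let the pattern match the start of an index string.
    """
    return [s for s in index_strings if fnmatchcase(s, pattern + "*")]


# Hypothetical index strings, for illustration only.
ADDRESSES = [
    "123 KAMEHAMEHA STREET",
    "123 KALAKAUA AVENUE",
]

# The wildcard absorbs the uncertain character after "K".
found = wildcard_matches("123 K*MEHA", ADDRESSES)

# An unexpected "H" before the wildcard still defeats the search.
missed = wildcard_matches("123 KH*MEHA", ADDRESSES)
```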
In yet another scenario, portable device 14 could be configured, in principle, to enact so-called “typo-detection” or “query suggestion.” These methods are more robust and can be used to remedy unexpected errors in the query string. However, they may require connection of portable device 14 to an extensive database on a server. To function properly, the server may be configured to learn from search queries entered by multiple users. Accordingly, this approach may be difficult, slow, or costly to adapt to some data-retrieval environments.
To address the issues noted above and to secure still other advantages, the configurations here illustrated may be adapted to enable various data-retrieval methods suitable for use on a portable electronic device. As described hereinabove, one contemplated portable electronic device has a user interface and a database where a plurality of data items are indexed each to a corresponding index string. It will be understood, however, that the methods here described, and others fully within the scope of this disclosure, may be enabled via other configurations as well. The methods here described may be entered upon any time portable device 14 is operating, and may be executed repeatedly. Naturally, execution of any method may change the entry conditions for subsequent execution and thereby invoke a complex decision-making logic. Such logic is fully contemplated in this disclosure.
At 36 a new query string is received via the user interface of the portable device, or an existing query string is augmented via the user interface. In one embodiment, the query string may be received or augmented by typographic character entry on keypad 20. In another embodiment, the query string may be received or augmented through translation of audible speech into text, as noted above. In another embodiment, the user interface may be configured to receive handwriting as a form of input. Using a stylus, for example, the user may write an initial part of a query string on a touch-sensitive display, and the computer system may translate the user's handwriting into text.
At 38 one or more index strings are displayed on the user interface such that the relative prominence of each index string displayed increases with increasing resemblance of that index string to the query string. For example, candidate index strings may be selected from the database and displayed in the form of a list. The index strings that better resemble the query string may be promoted to higher positions in the list. Likewise, the index string that most resembles the query string may be displayed in a larger or bolder typeface. In a more particular embodiment, display of the index strings may proceed as described below in the context of
In these and other embodiments, the user may have the option of typing a query string in its entirety or selecting from one or more index strings chosen from the database. At 40, therefore, it is determined whether the user has accepted any query string. The user may signal acceptance of a query string, for example, by pressing the enter key on keypad 20. If the query string is not accepted, then the method returns to 36. However, if the query string is accepted, then the method advances to 42. At 42 the desired data item is retrieved from the database based on the query. The result of data retrieval will vary according to the particular embodiment being enacted. In the case of navigation, for example, matching the query string to the desired street address (e.g., a destination address) may allow the portable device to begin searching for an advisable route. In the case of media play, matching the query string to the desired song title may allow the desired song to be played. From 42, method 32 returns.
No aspect of
At 44 a set of substrings is enumerated for each index string of the database. In one embodiment, the enumerated substrings may be fixed-length substrings—e.g., two-character or three-character substrings, each beginning at a different character position of the string. In one embodiment, the set of substrings may be enumerated as described below in the context of
Accordingly, if the database contains only two index strings, e.g.,
then the following three-character substrings may be enumerated:
At 46 an inverted index is compiled based on the set of substrings enumerated. The inverted index groups together all of the database entries that contain a given enumerated substring. For the example given above, a suitable inverted index based on the substrings would be:
From 46, method 34 returns.
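Steps 44 and 46 may be sketched as follows, under stated assumptions. The two index strings are hypothetical stand-ins rather than the example strings of this disclosure, the substring length of three is one of the lengths mentioned above, and database entries are identified here by their list position.

```python
from collections import defaultdict


def enumerate_substrings(s, m=3):
    """Return every m-character substring of s, one per start position."""
    return [s[i:i + m] for i in range(len(s) - m + 1)]


def build_inverted_index(index_strings, m=3):
    """Map each enumerated substring to the set of entries containing it.

    Entries are identified by their position in index_strings, which is
    an assumption of this sketch.
    """
    inverted = defaultdict(set)
    for entry_id, s in enumerate(index_strings):
        for sub in enumerate_substrings(s, m):
            inverted[sub].add(entry_id)
    return dict(inverted)


# Hypothetical two-entry database, for illustration only.
DATABASE = ["KAMEHAMEHA", "KAMAKEE"]
inverted_index = build_inverted_index(DATABASE)

# "KAM" occurs in both entries; "MEH" occurs only in the first.
shared = inverted_index["KAM"]
unique = inverted_index["MEH"]
```

Grouping entries under each substring lets a later lookup find candidate index strings for a query substring without scanning the whole database.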
In the next example, suppose that the string being processed in method 48 is an index string of a database that includes a music library. In its original form, the index string may be the complete title of a song in the library—e.g.,
After 50, this index string becomes:
At 52 a control character is prepended to the string. In one embodiment, the control character may comprise a caret symbol “^”. This control character, or another, may be used in subsequent processing to identify (viz., to left-delimit) the starting character of the string. In some cases, the starting character of an index string (song title, street address, etc.) will be remembered particularly as being the starting character. The starting character may be especially useful, therefore, in distinguishing one index string from another.
After 52, the index string in the current example becomes:
At 54 a set of fixed-length substrings of the string are enumerated. As noted above, the enumerated substrings may be fixed-length substrings—e.g., two-character or three-character substrings, each beginning at a different character position of the string. In one embodiment, where N is the length of the string, and M is the length of the fixed-length substrings, the set of substrings may include N−M+1 substrings. These substrings may begin at positions spanning the first N−M+1 characters in the string. For the current example, one possible set of enumerated substrings is:
From 54, method 48 returns.
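Steps 52 and 54 may be sketched as shown below. The title "SUNRISE" is a hypothetical index string, and the processing performed at 50 is not modeled. Here N is taken as the length of the string after the control character has been prepended.

```python
def enumerate_fixed_length_substrings(s, m=3, start_marker="^"):
    """Prepend a start marker, then list all m-character substrings.

    With N = len(marked string), the result holds N - M + 1 substrings,
    each beginning at a different character position.
    """
    marked = start_marker + s          # step 52: left-delimit the string
    n = len(marked)
    return [marked[i:i + m] for i in range(n - m + 1)]  # step 54


# Hypothetical song title, for illustration only.
subs = enumerate_fixed_length_substrings("SUNRISE")
# -> ['^SU', 'SUN', 'UNR', 'NRI', 'RIS', 'ISE']
```

Because only the first substring carries the marker, a match on "^SU" indicates agreement at the start of the string, whereas a match on "SUN" indicates agreement at any position.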
At 62, the index strings found at 60 are ranked based on increasing resemblance to the query string. In particular, the rank of a given index string may be increased when a fixed-length substring of the query string occurs anywhere in the index string, regardless of position. However, because the starting character of the query string and of each index string are also specifically identified, the rank of an index string may also increase when a fixed-length substring of the query string starting at an initial character position of the query string occurs at an initial character position of the index string.
At this stage of the method, a suitable weighting algorithm may be used to rank the various index strings from the database. In one embodiment, a term frequency-inverse document frequency (TF-IDF) weighting approach may be used. Specifically, the rank may increase with the number of times that the fixed-length substring of the query string occurs in the index string, and decrease with the number of times that the fixed-length substring occurs in all index strings of the database. In another embodiment, a language model for information retrieval approach may be used. Other embodiments may invoke still other weighting/ranking algorithms. These algorithms help to determine how much each of the found substrings is ‘worth’ by correcting for the prevalence of the found substring in the database at large.
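One possible TF-IDF style weighting is sketched below. It is not presented as the weighting of any particular embodiment. The sketch assumes three-character substrings, a "^" start marker, an inverse weight of log(1 + D/df) where D is the number of index strings and df is the number of index strings containing the substring, and hypothetical addresses.

```python
import math
from collections import Counter


def marked_substrings(s, m=3):
    """List the m-character substrings of s after prepending '^'."""
    marked = "^" + s
    return [marked[i:i + m] for i in range(len(marked) - m + 1)]


def rank(query, index_strings, m=3):
    """Return (index_string, score) pairs, best match first."""
    docs = [Counter(marked_substrings(s, m)) for s in index_strings]
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(d.keys())      # count index strings per substring
    n_docs = len(docs)
    scored = []
    for s, d in zip(index_strings, docs):
        score = 0.0
        for sub in set(marked_substrings(query, m)):
            tf = d[sub]                # occurrences in this index string
            if tf:
                # Rare substrings are worth more than common ones.
                score += tf * math.log(1 + n_docs / doc_freq[sub])
        if score > 0:
            scored.append((s, score))
    return sorted(scored, key=lambda pair: -pair[1])


# Hypothetical index strings, for illustration only.
ADDRESSES = [
    "123 KAMEHAMEHA STREET",
    "123 KALAKAUA AVENUE",
    "123 KHALID COURT",
]
ranked = rank("123 KHAMEHAMEHA", ADDRESSES)
order = [s for s, _ in ranked]
```

In this sketch the misspelled query still ranks the intended address first, because the substrings shared with it ("AME", "MEH", "EHA", "HAM") are rare in the database and therefore weighted more heavily than the common "123" substrings.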
At 64 each of the index strings found is displayed on the user interface, with a relative prominence adjusted according to the ranking determined at 62. In one embodiment, the one or more index strings may be displayed in the form of a list with higher ranked index strings occupying higher positions on the list. In another embodiment, the highest-ranked index string may be rendered in a larger or bolder typeface. Thus, in view of the ranking described hereinabove, adjusting the relative prominence involves computing the resemblance of each of the one or more index strings to the query string and adjusting the prominence of the one or more index strings based on the resemblance computed. In this embodiment, the computed resemblance is increased with every fixed-length substring of the query string that occurs in the index string. From 64, method 56 returns.
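The simplest form of the resemblance described at 64, in which the score grows by one for every fixed-length substring of the query string found in the index string, may be sketched as follows. The "* " prefix is a hypothetical text stand-in for a larger or bolder typeface, the start marker is omitted for brevity, and the addresses are hypothetical.

```python
def substrings(s, m=3):
    """Return the set of m-character substrings of s."""
    return {s[i:i + m] for i in range(len(s) - m + 1)}


def resemblance(query, index_string, m=3):
    """Count the query substrings that occur anywhere in the index string."""
    return len(substrings(query, m) & substrings(index_string, m))


def render(query, index_strings, m=3):
    """Return display lines, most similar first, top entry emphasized."""
    scored = [(resemblance(query, s, m), s) for s in index_strings]
    ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: -p[0])
    lines = []
    for position, (_, s) in enumerate(ranked):
        # "* " marks the most prominent entry (assumed stand-in for bold).
        marker = "* " if position == 0 else "  "
        lines.append(marker + s)
    return lines


# Hypothetical index strings, for illustration only.
ADDRESSES = [
    "123 KALAKAUA AVENUE",
    "123 KHALID COURT",
    "123 KAMEHAMEHA STREET",
]
lines = render("123 KHAMEHA", ADDRESSES)
```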
It will be understood that some of the process steps described and/or illustrated herein may in some embodiments be omitted without departing from the scope of this disclosure. Likewise, the indicated sequence of the process steps may not always be required to achieve the intended results, but is provided for ease of illustration and description. One or more of the illustrated actions, functions, or operations may be performed repeatedly, depending on the particular strategy being used.
As noted above, the methods and functions described in this disclosure may be enacted via computer system 26, shown schematically in
Memory subsystem 30 may include one or more physical, non-transitory devices configured to hold data and/or instructions executable by logic subsystem 28 to implement the methods and functions described herein. When such methods and functions are implemented, the state of the memory subsystem may be transformed (e.g., to hold different data). The memory subsystem may include removable media and/or built-in devices. The memory subsystem may include optical memory devices, semiconductor memory devices, and/or magnetic memory devices, among others. The memory subsystem may include devices with one or more of the following characteristics: volatile, nonvolatile, dynamic, static, read/write, read-only, random access, sequential access, location addressable, file addressable, and content addressable. In one embodiment, the logic subsystem and the memory subsystem may be integrated into one or more common devices, such as an application-specific integrated circuit (ASIC) or so-called system-on-a-chip. In another embodiment, the memory subsystem may include computer-system readable removable media, which may be used to store and/or transfer data and/or instructions executable to implement the herein-described methods and processes.
The terms “module” and “engine” may be used to describe an aspect of computer system 26 that is implemented to perform one or more particular functions. In some cases, such a module or engine may be instantiated via logic subsystem 28 executing instructions held by memory subsystem 30. It will be understood that different modules and/or engines may be instantiated from the same application, code block, object, routine, and/or function. Likewise, the same module and/or engine may be instantiated by different applications, code blocks, objects, routines, and/or functions in some cases.
Display 18 may be used to present a visual representation of data held by memory subsystem 30. As the herein-described methods and processes change the data held by the memory subsystem, and thus transform the state of the memory subsystem, the state of the display may likewise be transformed to visually represent changes in the underlying data. The display may include one or more display devices utilizing virtually any type of technology. Such display devices may be combined with logic subsystem 28 and/or memory subsystem 30 in a shared enclosure, or such display devices may be peripheral display devices.
Finally, it will be understood that the articles, systems, and methods described hereinabove are embodiments of this disclosure—non-limiting examples for which numerous variations and extensions are contemplated as well. Accordingly, this disclosure includes all novel and non-obvious combinations and sub-combinations of the articles, systems, and methods disclosed herein, as well as any and all equivalents thereof.