The present invention relates to an aural similarity measuring system and method for text and to a software product for measuring aural similarity of texts. The present invention is particularly suited for, but is not limited to, use in the assessment of trademark similarity.
The general function of trademarks is to distinguish a person's or an organisation's products or services from those of other people, companies or organisations in order to engender customer loyalty. It is therefore important that a trademark is capable of being recognised by customers and of not being confused with other trademarks. When a person, company or other organisation is deciding upon a new trademark, it is usual for searches to be conducted to check that the preferred new trademark is neither identical nor confusingly similar to an existing trademark. Such searches usually involve checks with individuals familiar with the relevant industry and through relevant trade journals to identify trademarks in use in that industry, as well as checks through national trademark registers. In addition, where an application is filed to officially register a new trademark with a registration authority, many registration authorities conduct searches through their own registers to identify earlier registrations or pending applications of trademarks identical or similar to the new trademark. When considering the potential for confusion between two trademarks, not only must visual similarity be considered but also conceptual and aural similarity.
In the past, searches through the official trademark registers have been carried out manually. In view of the vast number of registered trademarks, such manual searches are time consuming and also potentially unreliable. A person manually searching through such a plethora of registered trademarks is liable to overlook a potentially similar trademark. Such a failure can prove extremely costly where a person or organisation in the process of adopting a new trademark is forced to abandon the new trademark and to destroy all packaging or other material bearing it because of a previously unidentified conflict with a similar prior-existing trademark.
Because of the problems inherent in manual searching, attempts have been made to computerise the searching of trademark data on official trademark registers. Whilst such searching software is effective in the identification of identical trademarks, the identification of similar marks remains semi-manual. To identify similar trademarks, a user of the searching software is required to identify permutations of the trademark being searched, so that identical searches can be performed in respect of those permutations, and to identify key, distinctive elements of the trademark, e.g. its suffix or prefix, for which identical searches are then performed irrespective of any other elements that might be present.
The semi-manual nature of searches for confusingly similar trademarks means that such searches remain prone to error. Also, the decision on whether or not two trademarks are similar remains the decision of the user and, as such, is subjective.
So-called fuzzy matching techniques are known for automatically determining the similarity of two objects, for example in DNA sequence matching, in generating ‘suggested corrections’ in spell checkers and in directory enquiries database searches. Such techniques have not, though, been employed in automated trademark searching.
Conventional fuzzy matching techniques fall broadly into two categories, which might be called ‘edit distance methods’ and ‘mapping methods’. In the case of edit distance methods, the similarity of two words A and B is measured by answering a question along the lines of “what is the minimum number of key strokes it would take to edit word A into word B using a word processor?” The Levenshtein distance is the most popular of these measures. Edit distance methods are essentially a measure of visual similarity and are not directly suitable for measuring aural similarity. They also lack flexibility and are not very discriminating.
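Purely by way of illustration, a minimal sketch of the Levenshtein distance is given below; this is a generic textbook formulation, not any particular prior-art product:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute ca with cb
        prev = curr
    return prev[-1]

# e.g. levenshtein("stripe", "trumps") == 4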
Mapping methods work by assigning a key value to each possible word. However, there are many times fewer different keys than different words, and so several words are mapped onto each key. The mapping is designed so that similar-sounding words receive identical keys, and so a direct look-up from the key is possible. Popular mapping methods include Soundex™, Metaphone™ and Double Metaphone™. It is possible to imagine the space of words divided into regions, where each region contains all the words mapped to a given key: mapping methods work poorly with words near to the edge of a region of this space as their similarity to nearby words that happen to be in an adjacent region is not recognised. They have the further disadvantage of simply providing a yes-no answer to the similarity question rather than assigning a value representative of similarity. Mapping methods are also unsuitable for matching substrings.
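For illustration only, a simplified Soundex-style key function might look like the following sketch; the real Soundex algorithm has additional rules (e.g. for ‘h’ and ‘w’) which are omitted here:

```python
def soundex_key(word: str) -> str:
    """Simplified Soundex-style key: the first letter followed by up to three
    digit codes for the remaining consonants, padded to four characters."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = word.lower()
    key, prev = word[0].upper(), codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:       # vowels are dropped; repeated codes collapse
            key += code
        prev = code
    return (key + "000")[:4]

# Similar-sounding words receive the same key, so a direct look-up is possible:
# soundex_key("stripe") == soundex_key("strip") == "S361"
```

The sketch also illustrates the yes-no nature of such methods: two words either share a key or they do not, and no graded similarity value is produced.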
The present invention seeks to overcome the disadvantages in the trademark searching procedures described above and thus seeks to provide a trademark searching system and a trademark searching software product which automates assessment of aural and/or visual similarity between trademarks.
Moreover, with the present invention substrings (where the sound of one mark is contained entirely within that of another) can be more readily matched, which is an important aspect of trademark similarity assessment.
Also, the present invention seeks to identify permutations of a trademark to be searched.
Furthermore, the present invention seeks to provide a trademark searching system and method which provides comparison between its own results and those generated through alternative search strategies and methodologies.
The present invention therefore provides an aural similarity measuring system for measuring the aural similarity of texts comprising: a text input interface; a reference text source; an output interface; and a processor adapted to convert the input text into a string of phonemes, to adjust the phoneme string of the input text and/or a phoneme string of a reference text so that the two phoneme strings are equal in length, and to assign a score to the reference text representative of the similarity of the two phoneme strings, which score is output via the output interface.
Preferably, the system includes a data store in which is stored a plurality of reference texts. More preferably, the data store may further contain a plurality of phoneme strings, each string being associated with a reference text.
In a preferred embodiment the processor is further adapted to select one or more reference texts from the plurality of reference texts for outputting via the output interface, the selection being based on the score assigned to each reference text.
Also, ideally the processor is adapted to determine all possible adjustments of one or both of the input text phoneme string and the reference text phoneme string and is adapted to identify the score representing the highest measure of similarity with respect to all of the possible phoneme string adjustments.
In the preferred embodiment, the processor is adapted to adjust the phoneme strings of either or both of the input text and the reference text by inserting gaps into the phoneme strings. Furthermore, the processor may be adapted to identify aligned phonemes which differ; to allocate predetermined phoneme scores for each pair of differing aligned phonemes; and to sum the individual phoneme scores to thereby assign a score to the reference text.
With the preferred embodiment, the processor may be adapted to weight the phoneme scores in dependence upon the position of the pair of phonemes in the phoneme strings. Also, the processor may be adapted to weight the phoneme scores such that phoneme scores arising from partial text of the input text predetermined as less relevant are lower than equivalent phoneme scores arising from other partial text of the input text. In addition, the processor may be adapted to allocate a higher phoneme score to a grouping of non-adjusted aligned phonemes than to an equivalent grouping of aligned phonemes which have been adjusted.
In a preferred embodiment the processor may be adapted to weight the phoneme scores on the basis of user indicated descriptiveness of a word of which a phoneme forms at least a part or on the basis of automated identification of descriptiveness of a word of which the phoneme forms at least a part.
In this way those phonemes which form at least a part of words of the input text that a user has indicated are wholly descriptive are assigned a lower weight than equivalent phoneme scores for phonemes which form at least a part of words of the input text that the user has indicated are non-descriptive. Similarly, those phonemes arising from words of the input text and/or the reference text that the processor determines to be wholly descriptive are assigned a lower weight than equivalent phoneme scores arising from words of the input text that the processor has determined are non-descriptive.
The processor may be adapted to determine which words of the input text and/or the reference text are descriptive by reference to the frequency with which a word occurs in ordinary language.
In a second aspect the present invention provides a trademark searching system comprising an aural similarity measuring system as described above, wherein the reference text source comprises a trademark data source and the processor is adapted to generate a similarity score with respect to the aural similarity between an input trademark and at least one reference trademark from the trademark data source.
The processor may be adapted to determine which words of the input trademark and/or which words of the trademark data source are descriptive, by reference to the frequency with which words occur in ordinary language. Separately or in combination with the above, the processor may be adapted to determine which words of the input trademark and/or which words of the trademark data source are descriptive, by reference to the frequency with which words occur in registered trademarks in general and/or by reference to the frequency with which words occur in registered trademarks associated with goods or services identical or similar to those with which the input text is associated. Insofar as automatically determining which words are descriptive is concerned, the processor may be adapted to determine which words of the input trademark and/or which words of the trademark data source are descriptive by reference to the number of unrelated proprietors owning registered trademarks containing the word.
In a third aspect the present invention provides an aural similarity measuring server comprising: an input/output interface adapted for communication with one or more remote user terminals and further adapted to receive an input text and to output one or more reference texts each associated with a similarity score; a data store in which is stored a plurality of reference texts; and a processor adapted to convert an input text into a string of phonemes, to adjust the phoneme string of the input text and/or a phoneme string of a reference text so that the two phoneme strings are equal in length, and to assign a score to the reference text representative of the similarity of the two phoneme strings, which score is output via the input/output interface.
In a fourth aspect the present invention provides a trademark searching server comprising an aural similarity server as described above, wherein a plurality of reference trademarks are stored in the data store and the processor is adapted to generate a similarity score with respect to the aural similarity between an input trademark and at least one reference trademark from the data store.
In a fifth aspect the present invention provides an aural similarity measuring software program product comprising program instructions for performing the following steps: a receiving step in which an input text for which an aural similarity score is required is received; a conversion step in which the input text is converted into a string of phonemes; an adjustment step in which the phoneme string for the input text and/or a phoneme string associated with a reference text is adjusted so that the two phoneme strings are equal in length; a ranking step in which the similarity of the two phoneme strings is assigned a score; and an output step in which the reference text and its ranking are output to the user.
Ideally, the adjustment step and the ranking step are repeated for a plurality of reference texts.
In a preferred embodiment the program product further comprises program instructions for a selection step in advance of the output step in which one or more reference texts are selected from the plurality of reference texts for outputting, the selection being based on the ranking assigned to each reference text.
The adjustment step may comprise determining all possible adjustments of one or both of the input text phoneme string and the reference text phoneme string, in which case the ranking step identifies the lowest similarity score, representing the best similarity, with respect to all of the possible phoneme string adjustments.
Preferably, the adjustment step comprises adding one or more gaps to the phoneme string, and the gaps may be added to the beginning or end of a phoneme string.
With the preferred embodiment the ranking step comprises identifying aligned phonemes which differ; allocating predetermined phoneme scores for each pair of differing aligned phonemes and summing the individual phoneme scores. Also, the phoneme scores may be weighted in dependence upon the position of the pair of phonemes in the phoneme strings and/or the phoneme scores may be weighted such that phoneme scores arising from partial text predetermined as less relevant in the input text are lower than equivalent phoneme scores arising from other partial text in the input text. Furthermore, a higher phoneme score may be allocated to a grouping of non-adjusted aligned phonemes than to an equivalent grouping of aligned phonemes which have been adjusted.
The phoneme scores may be weighted on the basis of user indicated descriptiveness of a word of which a phoneme forms at least a part or on the basis of automated identification of descriptiveness of a word of which the phoneme forms at least a part.
Which words of the input text and/or which words of the reference text are descriptive may be determined by reference to the frequency with which words occur in ordinary language. Similarly, in the case where the input text is a trademark, which words of the input trademark and/or which words of the trademark data source are descriptive may be determined by reference to the frequency with which words occur in ordinary language. Alternatively, or in combination with the former, the descriptiveness of words of the input trademark and/or the descriptiveness of words of the trademark data source may be determined by reference to the frequency with which words occur in registered trademarks in general and/or by reference to the frequency with which words occur in registered trademarks associated with goods or services identical or similar to those goods or services with which the input trademark is associated. Insofar as automatically determining which words are descriptive is concerned, which words of the input trademark and/or which words of the trademark data source are descriptive may be determined by reference to the number of distinct or unrelated proprietors owning registered trademarks that contain the word.
In a sixth aspect the present invention provides an aural similarity measuring method for measuring the aural similarity of texts, the method comprising the steps of: receiving an input text for which an aural similarity score is required; converting the input text into a string of phonemes; adjusting the phoneme string for the input text and/or a phoneme string associated with a reference text so that the two phoneme strings are equal in length; ranking the similarity of the two phoneme strings to assign a score; and outputting the reference text and its ranking to the user.
In a seventh aspect the present invention provides a trademark searching method for measuring the aural similarity of trademarks, the method comprising the steps of: receiving an input trademark for which an aural similarity score is required; converting the input trademark into a string of phonemes; adjusting the phoneme string for the input trademark and/or a phoneme string associated with a reference trademark so that the two phoneme strings are equal in length; ranking the similarity of the two phoneme strings to assign a score; repeating the adjusting and ranking steps for further reference trademarks; and outputting the reference trademarks and their associated rankings to the user.
Thus, with the present invention the aural similarity measurement of texts and the searching of aurally similar trademarks are wholly automated, which significantly reduces the risk of errors and omissions and also provides an objective assessment of similarity. The present invention is also able to indicate whether one trademark is more similar to the target than another and thus enables similar trademarks to be ranked. This has the further benefit of helping the user weigh the relative merits of, for example, a distant match with a trademark associated with the same goods or services against a close match with a trademark associated with different goods or services.
The present invention is also suitable for use in the training of individuals in the performance of trademark searching. In this case an individual may perform semi-manual trademark searching in which they program the searching strategy and select similar trademarks found through the performance of their searching criteria. The results are then compared against those generated using the system and method of the present invention for the purposes of identifying errors or omissions in the semi-manual trademark searching strategy. In a similar way, the system and method of the present invention may be used to provide a ‘second opinion’ of the search results produced using semi-manual searching strategies.
Exemplary embodiments of the present invention will now be described by way of example with reference to the accompanying drawings, in which:
The trademark searching system 1 illustrated in
An overall block diagram of the trademark search system for performing aural similarity searching is shown in
In order to perform the comparison of the target with a reference trademark the following sequence of steps is performed. Both the target and each reference are converted 15 from a plain textual form into a phonetic form by means of a conversion unit. Although two conversion units are illustrated in
Thus, the similarity search generally comprises an inputting step in which the trademark to be searched (the target) is input into the system; a conversion step in which the target trademark is converted into a string of phonemes; an alignment step in which the phoneme string for the target trademark is aligned with a plurality of phoneme strings associated with a respective plurality of reference trademarks; a ranking step in which the similarity of the aligned phoneme strings is assigned a score; and an output step in which the reference trademarks and their assigned similarity scores are output to the user.
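Purely by way of illustration, the overall flow of these steps might be sketched as follows; the phoneme conversion and alignment scoring functions are supplied by the caller and are placeholders rather than the specific implementations described below:

```python
from typing import Callable, Sequence

def similarity_search(target: str,
                      references: Sequence[str],
                      to_phonemes: Callable[[str], list],
                      alignment_score: Callable[[list, list], float],
                      top_n: int = 20) -> list:
    """Sketch of the search flow: convert, align and score, then sort.
    Lower scores represent greater aural similarity."""
    target_phonemes = to_phonemes(target)                # conversion step
    scored = []
    for reference in references:
        reference_phonemes = to_phonemes(reference)      # in practice pre-computed and stored
        scored.append((alignment_score(target_phonemes, reference_phonemes), reference))
    scored.sort(key=lambda pair: pair[0])                # lowest score = most similar
    return scored[:top_n]                                # output step
```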
The text-to-phoneme conversion 15 is illustrated in more detail in
The word-to-phoneme conversion 18 is illustrated in
The choice between the two alternative phonetic representations of a word is made as follows. If the word consists of a single letter or contains digits it is spelt out. (This case is omitted from
An alternative, but less desirable, method for determining whether a word to be converted is a pronounceable word or a series of individually pronounced letters bases the decision on whether the word is present in a dictionary. This method is less desirable because a very large number of trademarks include, for example, proper names which might not be covered by a dictionary, as well as made-up but nevertheless pronounceable words.
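A minimal sketch of the preferred decision (single letters and words containing digits are spelt out; other words are passed to letter-to-sound rules) is given below; the letter-name table is hypothetical and only partially populated, and the letter-to-sound rules are supplied by the caller as a placeholder:

```python
# Hypothetical letter-name phoneme table; only a few entries are shown.
LETTER_NAMES = {"a": ["/ei/"], "b": ["/b/", "/ii/"], "c": ["/s/", "/ii/"],
                "x": ["/e/", "/k/", "/s/"], "1": ["/w/", "/uh/", "/n/"]}

def word_to_phonemes(word: str, letter_to_sound) -> list:
    """Spell out single letters and words containing digits; otherwise apply
    letter-to-sound rules (passed in as a placeholder callable)."""
    if len(word) == 1 or any(ch.isdigit() for ch in word):
        phonemes = []
        for ch in word.lower():
            phonemes.extend(LETTER_NAMES.get(ch, []))    # spell the word out
        return phonemes
    return letter_to_sound(word)
```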
Returning to
The two trademarks being examined are ‘stripe’ and ‘trumps’. As strings of phonemes these might be represented respectively as ‘/s/ /t/ /r/ /ai/ /p/’ (five phonemes in total) and ‘/t/ /r/ /uh/ /m/ /p/ /s/’ (six phonemes in total). To make these the same length by inserting gaps into them the aligner 16 needs to insert one more gap into the first string than into the second. For example, it could insert a gap at the start of the first string and leave the second string alone: this alignment is illustrated in
Ideally, all possible alignment permutations between the target string of phonemes and the reference string of phonemes are considered. This generates a plurality of sets of aligned pairs of phoneme strings with the alignment within each set being different. Each set of phoneme strings is then assigned a score and the lowest score (representing the highest possible similarity) of all of the sets of phoneme strings is then allocated to the reference trademark as a similarity ranking.
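A naive sketch of generating all such alignment permutations is shown below; a practical system would prune or replace this search (see the dynamic programming note later), and the gap symbol is an arbitrary choice:

```python
GAP = "-"

def alignments(a: tuple, b: tuple):
    """Yield every pair of equal-length strings obtained by padding the two
    phoneme strings with gaps (gaps are never aligned with gaps)."""
    if not a and not b:
        yield (), ()
        return
    if a and b:
        for rest_a, rest_b in alignments(a[1:], b[1:]):
            yield (a[0],) + rest_a, (b[0],) + rest_b      # align the two phonemes
    if a:
        for rest_a, rest_b in alignments(a[1:], b):
            yield (a[0],) + rest_a, (GAP,) + rest_b       # gap inserted into the second string
    if b:
        for rest_a, rest_b in alignments(a, b[1:]):
            yield (GAP,) + rest_a, (b[0],) + rest_b       # gap inserted into the first string

stripe = ("/s/", "/t/", "/r/", "/ai/", "/p/")
trumps = ("/t/", "/r/", "/uh/", "/m/", "/p/", "/s/")
# One of the generated alignments simply places a single gap at the start of 'stripe',
# leaving 'trumps' unchanged, as in the example described above.
```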
As mentioned above, each set of aligned strings is input into the comparator 12 (see
The phoneme-by-phoneme difference element is calculated as the sum of difference values between phonemes in corresponding positions in the two aligned strings. Two phonemes have a difference value of zero if they are identical; otherwise the difference value is a small offset plus a combination of individual phonetic feature difference values. These phonetic feature differences include whether the phoneme is a consonantal or vowel sound, whether the sound is voiced or not, and the position in the mouth where the sound is made. Where a phoneme in one string is aligned with a gap in the other, the contribution to the score is based on the features of that phoneme in a similar way. The example in
As mentioned above, the gap positions chosen by the aligner 16 can contribute to the score. The exact contribution depends on the relative and absolute positions of the gaps. Gaps inserted at the beginning or end of either string are given a smaller difference value than normal: the effect of this is to reduce the total difference score when one string is a substring or similar to a substring of the other. Gaps inserted between consecutive pairs of phonemes incur a greater difference score than normal: the effect of this is to reduce the total difference score when there are consecutive runs of matching or similar phonemes in the two strings. The gap positions shown in
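As a purely illustrative sketch of this style of scoring, the feature table, offset and gap penalties below are invented values chosen only to show the shape of the calculation; the actual system bases the gap contribution on the features of the phoneme concerned, which the sketch simplifies to flat edge and internal gap penalties:

```python
# Invented feature values and penalty constants, for illustration only.
FEATURES = {
    "/s/":  {"vowel": 0, "voiced": 0, "place": 3},
    "/t/":  {"vowel": 0, "voiced": 0, "place": 3},
    "/r/":  {"vowel": 0, "voiced": 1, "place": 4},
    "/ai/": {"vowel": 1, "voiced": 1, "place": 5},
    "/uh/": {"vowel": 1, "voiced": 1, "place": 6},
    "/m/":  {"vowel": 0, "voiced": 1, "place": 1},
    "/p/":  {"vowel": 0, "voiced": 0, "place": 1},
}
OFFSET, EDGE_GAP, INTERNAL_GAP = 1.0, 2.0, 4.0

def phoneme_difference(p: str, q: str) -> float:
    """Zero for identical phonemes; otherwise a small offset plus a combination
    of the individual phonetic feature differences."""
    if p == q:
        return 0.0
    fp, fq = FEATURES[p], FEATURES[q]
    return OFFSET + sum(abs(fp[k] - fq[k]) for k in fp)

def alignment_difference(a: list, b: list, gap: str = "-") -> float:
    """Sum the phoneme-by-phoneme differences, charging gaps at the beginning
    or end of the aligned strings less than gaps inserted between phonemes."""
    total = 0.0
    last = len(a) - 1
    for i, (p, q) in enumerate(zip(a, b)):
        if p == gap or q == gap:
            total += EDGE_GAP if i in (0, last) else INTERNAL_GAP
        else:
            total += phoneme_difference(p, q)
    return total
```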
As an aid in understanding the scoring of phoneme alignment, Tables 1 and 2 below set out an example of the scoring respectively for each of the two alignments illustrated in
The similarity ranking for the phoneme alignment of
Thus, the similarity ranking for the phoneme alignment illustrated in
In practice a large number of different alignments between the same two phoneme strings are analysed and only the one with the best (i.e., lowest) similarity ranking is retained.
In some cases the efficiency of the alignment and scoring process can be improved using a conventional algorithm known as ‘dynamic programming’. This algorithm may optionally be used to examine all possible alignments of the target and reference strings in an efficient manner.
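A generic dynamic-programming sketch of this idea follows; for brevity it uses a single flat gap cost, whereas the scoring described above varies the gap contribution with position and phoneme features:

```python
def best_alignment_difference(a: list, b: list, pair_cost, gap_cost: float) -> float:
    """Compute the lowest total difference over all alignments of a and b
    without enumerating the alignments explicitly."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = d[i - 1][0] + gap_cost
    for j in range(1, cols):
        d[0][j] = d[0][j - 1] + gap_cost
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(d[i - 1][j - 1] + pair_cost(a[i - 1], b[j - 1]),  # align two phonemes
                          d[i - 1][j] + gap_cost,                           # gap in b
                          d[i][j - 1] + gap_cost)                           # gap in a
    return d[-1][-1]

# Using the earlier sketches:
# best_alignment_difference(list(stripe), list(trumps), phoneme_difference, INTERNAL_GAP)
```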
The scoring rules described above can be modified so that scores derived from parts of the alignment nearer to the beginning of the strings are amplified and scores derived from later parts of the alignment attenuated. The effect of this is to bias the similarity ranking to favour (other things being equal) those matches whose initial parts are similar over those whose final parts are similar. This is in accordance with how the similarity of trademarks is judged manually and leads to more accurate results.
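One illustrative way of realising such a bias is to multiply each per-position contribution by a weight that decreases along the strings; the particular curve below is invented for the sketch:

```python
def positional_weight(index: int, length: int) -> float:
    """Weight contributions near the start of the aligned strings more heavily
    than those near the end (1.5 at the first position, 0.5 at the last)."""
    return 1.5 - index / max(length - 1, 1)
```

In the scoring sketch above, each term added to the running total would simply be multiplied by positional_weight(i, len(a)).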
The scoring rules described above can be modified to enable the user of the searching system to identify parts of a target trademark which are more significant than others. This indication is preserved by the text-to-phoneme units 15 so that some of the phonemes in the target phoneme string are marked as significant. The scores derived from parts of the alignment involving these more significant phonemes are amplified. The effect of this is to bias the scoring to favour (other things being equal) those strings where there is an aural match in the parts indicated as more significant. The system can therefore emulate more accurately the manual process of judging the similarity of trademarks, where generic parts of a trademark e.g. “company” or wholly descriptive words are normally given less weight.
Which parts of a trademark are generic or descriptive and which are not may be indicated manually to the system by the user. Alternatively, it is possible for the system to make an automatic determination. Preferably, automatic determination of descriptiveness is applied by the searching system to both the input trademark and the reference trademark.
In the case of automatic determination of descriptiveness, the descriptiveness of a given word is calculated as a simple combination, biased towards the latter, of its frequency of occurrence in ordinary language and the number of distinct or unrelated proprietors holding already-registered trademarks incorporating the given word for identical and/or similar goods or services, e.g. in the same or related classifications of goods or services. This ensures that the system can determine, for example, that “ALE” is a descriptive word in trademarks relating to alcoholic beverages, in addition to determining that certain other everyday words offer little distinctiveness in a trademark irrespective of the goods or services involved. Counting distinct or unrelated proprietors ensures that the searching system is not unduly biased by the existence of a single trademark proprietor holding a large number of registrations including a distinctive brand name or ‘house’ name.
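As a sketch of the combination described, with entirely illustrative scaling and bias values:

```python
def descriptiveness(corpus_frequency: float, distinct_proprietors: int, bias: float = 3.0) -> float:
    """Combine ordinary-language frequency with the number of unrelated proprietors
    already using the word for identical or similar goods, biased towards the latter."""
    return corpus_frequency + bias * distinct_proprietors

# A weight for the phonemes of a word might then be derived as, for example,
# 1.0 / (1.0 + descriptiveness(freq, proprietors)), so that more descriptive
# words contribute less to the similarity ranking.
```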
The scored results are then sorted into increasing order of score, in the unit marked ‘sort’ in
Although the ranking is described in relation to a low score representing high similarity, it is possible for the reciprocal of the score to be determined in which case a low score will represent a low degree of similarity.
Registered trademarks are assigned to one or more classes which are representative of the specific businesses in relation to which the registered trademark is intended to be used. Where the trademark class (or classes) of the target trademark is known, it is possible to divide the search results into three groups: those reference marks registered in the same trademark class as the (or a) class of the target trademark; reference marks registered in classes related to the (or a) class of the target trademark, i.e., those classes in which a cross-search (pre-defined associations between classes) would be triggered; and reference marks in other classes.
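A sketch of this grouping is given below; the shapes of the result and cross-search structures are assumptions made for the illustration:

```python
def group_by_class(results, target_classes: set, cross_search: dict):
    """Split scored results into marks in the same class as the target, marks in
    cross-searched (related) classes, and marks in other classes. Each result is
    assumed to be a (score, mark, classes) tuple; cross_search maps a class number
    to the set of class numbers it triggers."""
    related = set()
    for c in target_classes:
        related |= cross_search.get(c, set())
    same, cross, other = [], [], []
    for score, mark, classes in results:
        if classes & target_classes:
            same.append((score, mark))
        elif classes & related:
            cross.append((score, mark))
        else:
            other.append((score, mark))
    return same, cross, other
```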
It can be seen that the searching system described above can readily be adapted to analyse the visual similarity of two marks by omitting the text-to-phoneme conversion units and treating letters of the alphabet as if they were phonemes in the subsequent components of the system.
As already noted, in many instances the identification of similar marks remains semi-manual. To identify similar trademarks a user of the searching software may wish to devise their own searching strategy in which they identify permutations of a trademark to be searched and then perform identical searches in respect of each permutation. Also, the user may choose to identify key, distinctive elements of a trademark, e.g. its suffix or prefix, for which an identical search is then performed, irrespective of any other elements that might be present. The semi-manual nature of such searches for confusingly similar trademarks means that they remain prone to error, and not all potentially similar trademarks may be identified.
For a variety of reasons some users will wish to continue to perform semi-manual searches. However, the trademark searching system described herein may be combined with a series of semi-manual identical searches and thus may be used to identify similar trademarks which fail to be identified in the semi-manual searches: in effect the trademark similarity system and method described herein can be employed as a back-up or training service to more conventional semi-manual searching. In this regard, the results of the trademark similarity search may be combined with the results from one or more semi-manual identical searches to identify omissions from either set of results.
The results obtained for each semi-manual identical search and for the automated similarity search may be stored in individual data stores. This enables the results to be combined automatically or reserved for combining at the user's request. In the latter case the contents of the individual data stores may be compared and where the similarity search results identify trademarks not to be found in the search results of the semi-manual searches, the user may be informed that additional results are available for combining with the original semi-manual search results, if desired.
Although in
The specific example given above of a trademark searching system contains details which are not essential to the present invention and which may be altered and adjusted where necessary. In particular, to aid understanding the searching method has been described in relation to functional units. In practice, such functional units are preferably implemented in a software program product or alternatively in an ASIC. The scope of the present invention is defined solely in the accompanying claims.
Number | Date | Country | Kind |
---|---|---|---|
0704772.3 | Mar 2007 | GB | national |
This application is a continuation-in-part of, and claims priority from, U.S. patent application Ser. No. 12/042,690, which in turn claims priority from United Kingdom Patent Application Serial No. 0704772.3, filed 12 Mar. 2007, inventor Mark Owen, entitled “Aural Similarity Measuring System For Text”, the contents of which are incorporated herein by reference, and with priority claimed for all commonly disclosed subject matter.
Relation | Number | Date | Country
---|---|---|---
Parent | 12042690 | Mar 2008 | US
Child | 12537498 | | US