1. Field of the Invention
The present invention relates generally to a personalized arrangement method of data and, more particularly, to a personalized ranking method of audio and video data on the Internet.
2. Description of the Related Art
U.S. Patent Application Publication No. 2010/0138413 A1 disclosed a system and method for personalized search, which includes a search engine that receives an input from a user, processes a user identification, and generates a search result based on the input; and a profiling engine that gathers profile data, generates a user profile associated with a user, and ranks the search result personalized to the specific user using the user profile.
European Patent Application No. 1647903 A1 disclosed systems and methods that employ user models to personalize queries and/or search results according to information that is relevant to the respective user's characteristics. The user model may be assembled automatically via an analysis of the user's behavior and other features, such as the user's past events, previous search history, and interactions with the system. Additionally, the user's address or e-mail address can be used to infer the city where the user is located. For example, when the user searches for “weather,” information about the weather in the city where the user is located can be found automatically.
Taiwan Patent No. 579478 disclosed that the users' Internet behaviors were recorded and statistically processed via a variety of Internet services, where the users' frequencies of utilization, semantic correlation, and satisfaction with the services were compared and analyzed, and the results of the analyses were then employed to recommend which Internet services were applicable to the users.
U.S. Pat. No. 7,620,964 disclosed a recommended television (TV) or broadcast program search device and method, which record the user's viewed programs and viewing times to recommend the user's favorite programs and channels. In this patent, the recommendation refers to the program types and viewing times, and the viewing history information is erased after a period of time passes.
Taiwan Patent No. 446933 disclosed a device capable of analyzing voice to identify emotion; the device can be applied to multimedia applications, especially lie detection.
However, none of the above devices or methods is directed to searching video and audio data on the Internet and arranging or ranking the data according to the user's personal preference after the data are downloaded.
The primary objective of the present invention is to provide a personalized ranking method that can rank audio and video data located on and downloaded from the Internet according to the user's preference to meet the user's requirements.
The foregoing objective of the present invention is attained by the personalized ranking method having the steps of: a) locating and downloading video and audio data on the Internet corresponding to at least one keyword selected by the user; b) getting a user index from the user's input, or picking a history behavior index if the user does not input the user index, where each of the user index and the history behavior index indicates the user activity preference, audio emotion type, or video content type, or a combination thereof; c) capturing one or more characteristics from the aforesaid downloaded audio and/or video data according to the user index or the history behavior index; d) comparing the captured characteristics with a user profile or a history behavior for similarity to attain a similarity score corresponding to each audio and/or video datum, where the similarity score corresponds to the user activity preference, audio emotion type, or video content type, or a combination thereof; and e) ranking the audio and/or video data according to the corresponding similarity scores to accomplish a ranking outcome of the audio and/or video data.
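The steps a) through e) above can be sketched in miniature as follows. This is only an illustrative outline, not part of the claimed method: the tiny in-memory catalog standing in for the download step, the tag-overlap similarity measure, and every function name are assumptions made for the sketch.

```python
# Minimal sketch of steps a)-e); all names and the toy data are illustrative.

def download_matching_media(keywords):
    # a) stand-in for locating/downloading data matching the keywords
    catalog = [
        {"title": "clip1", "tags": {"music", "jazz", "calm"}},
        {"title": "clip2", "tags": {"sports", "highlights"}},
        {"title": "clip3", "tags": {"music", "live", "rock"}},
    ]
    return [m for m in catalog if m["tags"] & set(keywords)]

def rank_by_profile(items, profile_tags):
    # c) captured characteristic: the metadata tags of each datum
    # d) similarity score: tag overlap with the user profile (an assumption)
    scored = [(len(m["tags"] & profile_tags), m) for m in items]
    # e) rank by similarity score, highest first
    scored.sort(key=lambda p: p[0], reverse=True)
    return [m["title"] for _, m in scored]

items = download_matching_media({"music"})
print(rank_by_profile(items, {"jazz", "calm"}))  # ['clip1', 'clip3']
```

In a real embodiment the similarity in step d) would be one of the comparisons detailed below (tags, audio emotion type, or movement and brightness) rather than a simple tag overlap.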
Referring to the accompanying drawings, the method of the first preferred embodiment includes the following steps.
a) Enter at least one keyword selected by the user via an Internet-accessible device to locate corresponding audio and/or video data on specific websites through the Internet, and then download the corresponding audio and/or video data into the Internet-accessible device. The Internet-accessible device can be, but is not limited to, a computer, a smart phone, or an Internet television (TV); in this embodiment, a computer is taken as an example. Besides, each audio and video datum has metadata, including category, tags, keywords, duration, rating, favorite count, view count, publishing date, and so on.
b) Obtain a user index from the user's input, or pick a history behavior index if the user does not specify the user index. Each of the user index and the history behavior index indicates the user activity preference, audio emotion type, or video content type, or a combination thereof.
c) Capture one or more characteristics from the aforesaid downloaded audio and/or video data subject to the user index or the history behavior index via a computing device. If the user index is the user activity preference, the captured characteristic is a metadata tag of each audio and/or video datum. The user activity preference contains the history of keywords, as well as the frequency and time that the user has listened to and/or watched this kind of audio and/or video. If the user index is the audio emotion type, the captured characteristic is an emotion type corresponding to the audio part of each audio and video datum. If the user index is the video content type, the captured characteristics are the movement and brightness of the video part of each audio and video datum.
In this step, the audio characteristic capturing and the audio emotion-type identification include the sub-steps of audio preprocessing, characteristic capturing, and classifier-based classification.
For example, in the process of capturing the characteristics from the movement and brightness corresponding to the video part of the audio and video data, the movement and brightness are categorized into four classes, “swift/bright”, “swift/dark”, “slow/bright”, and “slow/dark”, and scored from 0 to 100. The swift/bright class indicates highly moving and bright video data, and the slow/dark class indicates slowly moving and dark video data. According to such classification, the movement and brightness degrees of each audio and video datum can be acquired.
d) Compare the captured characteristics of each audio and/or video datum with a user profile or a history behavior for similarity via a computing device, so as to attain a similarity score corresponding to each audio and/or video datum.
The aforesaid similarity analysis can employ a cosine similarity method to figure out an audio emotion score SIMemotion of a film via the audio emotion-type identification of the audio part and the emotion type value in the user index indicated in step b). The formula is presented as follows:

SIMemotion = (S · E) / (|S| × |E|) = (Σi=1..8 si × ei) / (√(Σi si²) × √(Σi ei²))
where S = (s1, . . . , s8) is a vector composed of the initial scores of the eight emotion categories, and si is the emotion type value in the user index or the history behavior index for emotion i. In the audio emotion analysis, the audio and video data of a film can be analyzed to come up with the ratios of the eight emotion types, where the result of the analysis is presented by a vector E = (e1, . . . , e8), the vector of coverage ratios of the eight emotion types after the audio emotion is analyzed; ei indicates the coverage ratio of emotion i in the audio and video data of one film. An example is indicated in the following Table 1.
If the emotion type value in the user index or the history behavior index is set to “calm”, it (s4) will score 8 (the highest score), and the similar emotion types, like “surprised” and “sad”, will score 7 (the second highest score) each. The other emotion types follow the same rule, so the initial score vector of the eight emotion types is presented by S = (5, 6, 7, 8, 7, 6, 5, 4). Another example is indicated in the following Table 2.
If the emotion type value in the user profile is set to “excited”, it (s1) will score 8 (the highest score), and the adjacent emotion type “happy” will score 7 (the second highest score). The other emotion types follow the same rule, so the initial score vector of the eight emotion types is presented by S = (8, 7, 6, 5, 4, 3, 2, 1). The audio part of the audio and video data of a film is then processed by the audio emotion analysis to come up with the vector E. Provided that the audio part is analyzed and the ratios of the eight emotions are 10%, 30%, 10%, 20%, 10%, 5%, 10%, and 5% respectively, the vector of the ratios of the audio emotion can be inferred as E = (0.1, 0.3, 0.1, 0.2, 0.1, 0.05, 0.1, 0.05), and finally an audio emotion score of the audio and video data can be figured out via the aforesaid formula.
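The worked example above can be reproduced numerically. The sketch below computes the cosine similarity between the initial score vector S for “excited” and the analyzed emotion-ratio vector E:

```python
import math

# Cosine similarity between the user's initial emotion-score vector S and
# the analyzed emotion-ratio vector E, per the formula given above.

def cosine_similarity(S, E):
    dot = sum(s * e for s, e in zip(S, E))
    norm_s = math.sqrt(sum(s * s for s in S))
    norm_e = math.sqrt(sum(e * e for e in E))
    return dot / (norm_s * norm_e)

S = (8, 7, 6, 5, 4, 3, 2, 1)                             # "excited" scores
E = (0.1, 0.3, 0.1, 0.2, 0.1, 0.05, 0.1, 0.05)           # analyzed ratios
print(round(cosine_similarity(S, E), 3))  # 0.887
```

A film whose audio emotion ratios lean toward the high-scoring emotions in S yields a score closer to 1, and is therefore ranked higher in step e).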
e) Rank the audio and video data according to the corresponding similarity scores via a computing device to get a ranking outcome of the audio and video data. The ranking can be based on any one of the three kinds of similarity scores (i.e., tags, audio, and video) indicated in step d), or on multiple similarity scores. When the ranking is based on multiple similarity scores, weight allocation can be applied to any one or more of the three kinds of similarity scores according to an operator's configuration.
As indicated above, in the first embodiment, the present invention includes the steps of downloading a plurality of audio and video data after at least one keyword is defined by the user on the Internet; capturing the characteristics from each of the aforesaid downloaded audio and/or video data to obtain information such as the metadata tags, emotion types, and brightness and movement of each audio and video datum; further comparing the aforesaid information with the user profile via the Internet-accessible device (e.g., a computer) to get the similarity scores based on the user's preference; and finally ranking the audio and video data according to the corresponding similarity scores to get a sorting of the audio and video data that conforms to the user's preference.
In this embodiment, the keywords, metadata tags, emotion types, and movement and brightness serve as conditions for comparison to get a ranking outcome; however, if the movement and brightness are not taken into account and only the audio emotion type and the tags are used for comparison and ranking, an outcome in conformity with the user's preference can also be concluded. Adding the movement and brightness to the other aforesaid conditions can produce a more accurate outcome for the audio and video data. In other words, the present invention is not limited to the addition of the movement and brightness of the video part.
In addition, in this embodiment, using only the metadata tags, or the emotion types, or the movement and brightness in coordination with the keywords as the conditions for comparison can also come up with a ranking outcome conforming to the user's preference. Although such an outcome is less accurate than one obtained when all three conditions are used for comparison, it is still based on the user's preference.
A personalized ranking method of audio and/or video data on the Internet in accordance with a second preferred embodiment is similar to that of the first embodiment, having the differences recited below.
A sub-step d1) is included between steps d) and e), and a weight ranking method or a hierarchy ranking method can be applied to this sub-step d1).
When the weight ranking method is applied, the similarity scores corresponding to the tag, the audio emotion type, and/or the movement and brightness of the video part can be processed by a combination operation to get a synthetic value. In step e), all of the audio and/or video data can then be ranked subject to the synthetic values instead of the corresponding similarity scores.
When the weight ranking method is applied, for example, provided K films are intended to be ranked, the film A is ranked A1 based on the metadata tags combined with the emotion type and ranked A2 based on its video movement and brightness, and the weight values of such two rankings are R1 and R2 respectively, then the final ranking value of the film A is Ta=A1×R1+A2×R2, and the final ranking values for the K films will be Ta, Tb . . . Tk. The film with the smaller final ranking value is recommended first.
An example is indicated in the following Table 3. Three currently available films A, B, and C are listed for the ranking. The rankings based on the metadata tags in combination with the emotion types for the three films A-C are 1, 2, and 3 respectively. The rankings based on the movement and brightness for the three films A-C are 2, 1, and 3 respectively. Each ranking based on the metadata tags in combination with the emotion types is multiplied by a weight of 0.7, and each ranking based on the movement and brightness is multiplied by a weight of 0.3; adding these two products yields a final value. The film with a smaller value is ranked prior to that with a larger value, so the final rankings for the three films are still 1, 2, and 3. The weighted rankings for multiple films can follow such a concept to get the final rankings.
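The Table 3 example can be verified with a short computation, using the ranks and the weights 0.7 and 0.3 given above:

```python
# Weighted ranking per the Table 3 example: combined value
# T = (tag/emotion rank) x R1 + (movement/brightness rank) x R2,
# and the film with the smaller value is recommended first.

tag_emotion_rank = {"A": 1, "B": 2, "C": 3}
movement_rank = {"A": 2, "B": 1, "C": 3}
R1, R2 = 0.7, 0.3

final_value = {f: tag_emotion_rank[f] * R1 + movement_rank[f] * R2
               for f in ("A", "B", "C")}  # A: 1.3, B: 1.7, C: 3.0
ranking = sorted(final_value, key=final_value.get)
print(ranking)  # ['A', 'B', 'C']
```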
When the hierarchy ranking method is applied, the user index is categorized into three levels: (1) the emotion type of the audio part, (2) the metadata tag, and (3) the movement and brightness of the video part, and the recommended films are then ranked based on such levels of the user index. Provided K films are listed for ranking, in the first level of emotion type, the K films will be classified into two groups: those conforming to the emotion the user selects or previously used, and those not conforming to it. The conforming group is ranked in front of the non-conforming group. In the second level of tag classification, the films are ranked subject to how high or low the scores of the tags are; the films with high scores are ranked high. In the process of the second-level classification, when the tags score the same, the third-level comparison proceeds. In the third-level classification of the movement and brightness of the video part, one more ranking is applied to the films whose tags have the same score, according to the user's preference for the movement and brightness of the video part. If the scores of the movement and brightness of the video part conform to the user's preference, the films will be prioritized.
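The three-level hierarchy ranking above amounts to sorting on a three-part key, where each later level only breaks ties in the earlier one. The film records below and their field names are illustrative assumptions:

```python
# Sketch of the hierarchy ranking: level 1 = emotion conformity,
# level 2 = tag score (higher first), level 3 = movement/brightness
# conformity. Later levels only break ties in earlier levels.

films = [
    {"name": "A", "emotion_match": True,  "tag_score": 80, "mb_match": False},
    {"name": "B", "emotion_match": False, "tag_score": 95, "mb_match": True},
    {"name": "C", "emotion_match": True,  "tag_score": 80, "mb_match": True},
    {"name": "D", "emotion_match": True,  "tag_score": 60, "mb_match": True},
]

ranked = sorted(films, key=lambda f: (not f["emotion_match"],  # level 1
                                      -f["tag_score"],         # level 2
                                      not f["mb_match"]))      # level 3
print([f["name"] for f in ranked])  # ['C', 'A', 'D', 'B']
```

Note that film B, despite the highest tag score, is ranked last because it fails the first-level emotion conformity test, and C precedes A because the third level breaks their tag-score tie.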
In conclusion, the present invention can rank the audio and/or video data located on and downloaded from the Internet according to the user's preference to meet the user's requirements.
Although the present invention has been described with respect to two specific preferred embodiments thereof, it is in no way limited to the specifics of the illustrated structures, but changes and modifications may be made within the scope of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
100127105 | Jul 2011 | TW | national |