The present application relates to a method of conducting a DJ commentary analysis for indexing and search.
Just-in-time (JIT) disc jockey (DJ) snippet services such as vendor-branded Internet radio, JIT near-live DJ for Internet radio and on-demand DJ service for personal media players benefit from having contextual information and metadata to provide relevant media snippets or segments to listeners.
However, a method is needed whereby such information is automatically extracted so that the large volume of DJ commentary produced by broadcast radio stations or other sources can be tagged, stored, and searched.
According to one aspect, a method consistent with the present invention provides for automatically generating metadata related to commentary of media segments to enable tagging, storing and context relevant searching. Speech-to-text conversion technology and audio/video analysis are used to generate content and metadata. Subject matter is then identified and filtered to a predetermined set of subjects. Metadata tags and context profiles for the media segments are generated to index the media segments.
According to another aspect of the present invention, context information of the user is used to generate a context profile of the user in a format similar to that of the media segment. Indexed commentary media segments are searched to match with the user context profile and a relevant commentary media segment is presented to the user.
Thus, the present invention provides a method of generating metadata for disc jockey (DJ) commentary media segments to enable contextually relevant searches, the method comprising: generating data including using at least one of speech-to-text conversion or audio/video analysis; analyzing the generated data to extract subject matters; filtering the extracted subject matters such that they only refer to a pre-determined set of subjects; accepting any other contextual information; generating metadata tags for each of the media segments using the predetermined set of subjects referenced during the filtering step; generating a context profile for each media segment using the metadata tags and the other contextual information; and indexing the media segments using at least one of the metadata tags or the context profile.
The predetermined set of subjects of the filtering step includes at least one of: media content, artist or category, events and conditions, time, location, or opinions.
The method of the present invention may further comprise: receiving user context information, including time, location and interests; building a context profile from the received user context information in the same format as the metadata tag generating step; finding one or more commentary media segments by searching the index constructed by the metadata tag generating step using a profile of the extracted subject matters of the analyzing step; and identifying a most relevant commentary media segment by determining that the most relevant commentary media segment's profile most closely matches the profile of the analyzing step.
The present invention also contemplates a system and a computer readable medium comprising a program for instructing the system to perform the above-described operations.
There have thus been outlined some features consistent with the present invention in order that the detailed description thereof that follows may be better understood, and in order that the present contribution to the art may be better appreciated. There are, of course, additional features consistent with the present invention that will be described below and which will form the subject matter of the claims appended hereto.
In this respect, before explaining at least one embodiment consistent with the present invention in detail, it is to be understood that the invention is not limited in its application to the details of construction and to the arrangements of the components set forth in the following description or illustrated in the drawings. Methods and apparatuses consistent with the present invention are capable of other embodiments and of being practiced and carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein, as well as the abstract included below, are for the purpose of description and should not be regarded as limiting.
As such, those skilled in the art will appreciate that the conception upon which this disclosure is based may readily be utilized as a basis for the designing of other structures, methods and systems for carrying out the several purposes of the present invention. It is important, therefore, that the claims be regarded as including such equivalent constructions insofar as they do not depart from the spirit and scope of the methods and apparatuses consistent with the present invention.
The accompanying drawing figures incorporated in and forming a part of this specification illustrate several aspects of the invention, and together with the description serve to explain the principles of the invention.
The embodiments set forth below represent the necessary information to enable those skilled in the art to practice the invention and illustrate the best mode of practicing the invention. Upon reading the following description in light of the accompanying drawing figures, those skilled in the art will understand the concepts of the invention and will recognize applications of these concepts not particularly addressed herein. It should be understood that these concepts and applications fall within the scope of the disclosure and the accompanying claims.
More specifically, DJ commentary is composed of audio snippets gathered from a number of sources, such as a specialized snippet providing service, audio archives of actual satellite and terrestrial radio stations, user-generated comments, text-to-speech of online textual commentary, etc.
With reference to
With reference to
In parallel to the tagging functions, unique identification (ID) 85 and (optionally) digital rights management (DRM) encryption keys 86 are generated for each snippet. As content is sent for tagging, it is also encrypted to the key as at 87. Finally, the snippet ID, key, and encrypted content are sent to a content packet generation function 90 and output to, for example, a holding buffer (not shown). Likewise, metadata, IDs, and keys are sent to a metadata packet generation function 91 and output to, for example, a repository (discussed in more detail below with respect to
The method of the present invention is related to providing server-side enablement for metadata and context extraction for DJ-provided commentary snippets as discussed above to enable indexing and search for Just-in-time (JIT) near-live DJ for Internet and search, JIT near-live DJ for Internet radio, and Protected Distribution and location-based aggregation service. It is comparable to a Google-like search service, except that the method according to the present invention accepts textual as well as speech data and extracts only those keywords that are highly relevant for media commentary search. Since only those keywords are indexed, searches for snippets using those keywords are fast and accurate.
With reference to the flow diagram of
More specifically, in step 101, the method receives DJ media snippets, along with any transcripts, identification data, content metadata and context information (such as the DJ's current location, current date and time, etc.). Depending on the source of the snippet, the transcript may be automatically available, or the snippet itself may originally be in text form (for example, if the source is textual commentary such as text-to-speech converter or server 20 in
In step 102, if transcripts are not available, speech-to-text conversion is performed on the media snippet, and any metadata is extracted from the context and/or the audio and/or video components of the snippet.
In step 103, techniques such as voice recognition and laughter detection are used to assign a “tone”.
In step 104, voice classification techniques are used to categorize the voice of the snippet, e.g., “gruff”, “soft”, etc.
In step 105, the data from steps 102-104 are analyzed, to extract subject matters, using: semantic analysis, keyword analysis, natural language processing, and other techniques known in the art.
In step 106, the method checks whether at least one of the subject matters covers at least one media content item, category or artist. It can also check for indirect references to such items using semantic analysis and an ontology.
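By way of non-limiting illustration, the ontology-based check of step 106 may be sketched as a lookup that resolves a mentioned name to the media item or artist it refers to. The `ONTOLOGY` table, its entries, and the function name `media_references` are hypothetical and are not part of the described system; an actual ontology would be far larger and would be combined with semantic analysis.

```python
# Hypothetical, hand-built ontology: lower-cased mention -> (kind, canonical name).
# The entries are illustrative assumptions, not data from the described system.
ONTOLOGY = {
    "guns n' roses": ("artist", "Guns N' Roses"),
    "axl rose": ("artist", "Guns N' Roses"),
    "buckethead": ("artist", "Guns N' Roses"),
    "november rain": ("media", "November Rain"),
}


def media_references(subject_matters):
    """Return the set of (kind, canonical name) pairs referenced by the subjects."""
    found = set()
    for subject in subject_matters:
        entry = ONTOLOGY.get(subject.strip().lower())
        if entry is not None:
            found.add(entry)
    return found


def covers_media(subject_matters):
    """Step 106 style check: does any subject refer to media content or an artist?"""
    return bool(media_references(subject_matters))
```

In this sketch, a snippet that mentions only a band member is still associated with the band, which is the kind of indirect reference the ontology is meant to capture.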
In step 107, the method analyzes the remaining subject matters as well as the metadata and context information of step 101 to check for references to events or conditions (“weather”, “traffic”, “rain”, “concert”) of: i) the present (look for keywords “today”, “now”, “right now” etc.), ii) the past (look for keywords “yesterday”, “last week”, “last year” etc.), iii) the (anticipated) future (look for keywords “tomorrow”, “next week”, “next year” etc.), iv) a location (look for geographical keywords like “Raleigh”, “I-40”, “downtown” etc.).
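The keyword checks of step 107 may, for example, be sketched as simple case-insensitive phrase matching. The keyword lists below reuse the examples given for step 107; the future-oriented keywords other than “tomorrow”, the dictionary layout, and the function names are assumptions made for illustration only.

```python
import re

# Example keywords from step 107; real lists would be larger and configurable.
EVENT_KEYWORDS = ["weather", "traffic", "rain", "concert"]
LOCATION_KEYWORDS = ["Raleigh", "I-40", "downtown"]
TEMPORAL_KEYWORDS = {
    "present": ["today", "now", "right now"],
    "past": ["yesterday", "last week", "last year"],
    # Entries other than "tomorrow" are assumptions added for this sketch.
    "future": ["tomorrow", "next week", "next year"],
}


def _matches(text, keywords):
    """Return the keywords that occur in text as whole words or phrases."""
    found = []
    for keyword in keywords:
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        if re.search(pattern, text, re.IGNORECASE):
            found.append(keyword)
    return found


def find_context_references(text):
    """Collect event, location and temporal references found in a transcript."""
    refs = {
        "events": _matches(text, EVENT_KEYWORDS),
        "locations": _matches(text, LOCATION_KEYWORDS),
    }
    for tense, keywords in TEMPORAL_KEYWORDS.items():
        refs[tense] = _matches(text, keywords)
    return refs
```

Whole-word matching is used so that, for instance, “rain” is not reported for a transcript that only contains “training”.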
In step 108, the method checks for references to certain subjects, such as politics, products, movies, or people, and classifies these remarks as good, bad or neutral, if possible.
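One simple way to approximate the good/bad/neutral classification of step 108 is a word-list score, sketched below. The word lists and the function name `classify_opinion` are hypothetical; the description does not specify a classification technique, and a practical system would more likely use a trained sentiment model.

```python
import re

# Hypothetical opinion lexicons, assumed for illustration only.
POSITIVE_WORDS = {"great", "flawless", "superb", "pleaser"}
NEGATIVE_WORDS = {"bore", "boring", "awful", "terrible"}


def classify_opinion(remark):
    """Classify a remark as 'good', 'bad' or 'neutral' by counting lexicon hits."""
    words = re.findall(r"[a-z']+", remark.lower())
    score = sum(word in POSITIVE_WORDS for word in words)
    score -= sum(word in NEGATIVE_WORDS for word in words)
    if score > 0:
        return "good"
    if score < 0:
        return "bad"
    return "neutral"
```

A remark with no lexicon hits, or with equal positive and negative hits, is reported as neutral, which corresponds to the “if possible” qualification of step 108.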
In step 109, the method checks if the snippet is usable for future use by checking if, for example, it makes references to: a) events or conditions of the present, past or future (from step 107) that will not apply beyond a certain specified date, b) events or conditions of a location (from step 107) that will not apply to other locations.
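The usability check of step 109 may be sketched as deriving an expiry date and a validity location from the references found in step 107. The rule used here (an event tied to the present or future expires after a fixed number of days, and an event tied to a location is valid only at the recording location) is an assumption made for illustration, as are the function name, the argument names and the `shelf_life_days` default; the description leaves the actual policy open.

```python
from datetime import date, timedelta


def usability_limits(refs, recorded_on, recorded_location, shelf_life_days=1):
    """Derive limits on future use of a snippet from its step 107 references.

    refs is a dict with optional 'events', 'present', 'future' and 'locations'
    lists. Returns a dict with 'expiry_date' (a date or None) and
    'validity_location' (a string or None).
    """
    has_event = bool(refs.get("events"))
    # Assumed rule 109a: a current or upcoming event stops applying after a while.
    time_bound = has_event and bool(refs.get("present") or refs.get("future"))
    # Assumed rule 109b: an event tied to a place does not apply elsewhere.
    place_bound = has_event and bool(refs.get("locations"))
    return {
        "expiry_date": recorded_on + timedelta(days=shelf_life_days) if time_bound else None,
        "validity_location": recorded_location if place_bound else None,
    }
```

A snippet for which both values are `None` carries no restriction under these assumed rules and remains usable in any later context.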
In step 110, a profile is generated for the media snippet content, which could be an Extensible Markup Language (XML) or other indexable data structure describing the subject matters of the snippet and their contexts. The profile is built using information from steps 106-109, comprising: a) metadata about media content, category or artist from step 106, b) an “expiry date” if step 109a applies, c) a “validity location” if step 109b applies, d) other relevant metadata.
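A profile of the kind described in step 110 could, for instance, be assembled with a standard XML library as sketched below. The element names (`snippet-profile`, `media`, `snippet-context`, `expiry-date`, `validity-location`) and the function signature are assumptions chosen for this sketch; they are not asserted to match the schema of the exemplary listings.

```python
import xml.etree.ElementTree as ET


def build_snippet_profile(snippet_id, media, expiry_date=None,
                          validity_location=None, other=None):
    """Serialize snippet metadata into an XML profile string.

    media and other are dicts mapping element names to text values.
    Optional limits are emitted only when they apply to the snippet.
    """
    root = ET.Element("snippet-profile", {"id": snippet_id})

    media_el = ET.SubElement(root, "media")
    for name, value in media.items():
        ET.SubElement(media_el, name).text = value

    context_el = ET.SubElement(root, "snippet-context")
    if expiry_date is not None:
        ET.SubElement(context_el, "expiry-date").text = expiry_date
    if validity_location is not None:
        ET.SubElement(context_el, "validity-location").text = validity_location
    for name, value in (other or {}).items():
        ET.SubElement(context_el, name).text = value

    return ET.tostring(root, encoding="unicode")
```

Omitting the optional elements when they do not apply keeps unrestricted snippets distinguishable from restricted ones by a simple presence test at search time.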
In step 111, it is determined if the snippet is too narrow in context to be used in other contexts using: a) heuristics, b) keyword filtering using pre-configured keywords (for instance, references to local celebrities), c) pre-configured rules that operate on the extracted metadata (for example, the snippet talks about traffic at a given date and location), d) other inference techniques known in the art.
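The keyword filtering and rule-based checks of step 111 may be sketched as follows. The pre-configured keyword set, the metadata field names and the single rule shown are hypothetical examples; heuristics and other inference techniques are outside the scope of this sketch.

```python
# Hypothetical pre-configured keywords that mark a snippet as locally bound.
NARROW_KEYWORDS = {"county fair", "our mayor"}


def is_too_narrow(transcript, metadata):
    """Return True if the snippet is unlikely to be reusable in other contexts.

    metadata is a dict with optional 'events', 'date' and 'location' entries
    produced by the earlier extraction steps (field names are assumed).
    """
    text = transcript.lower()
    # b) keyword filtering using pre-configured keywords
    if any(keyword in text for keyword in NARROW_KEYWORDS):
        return True
    # c) example rule: traffic talk tied to a specific date and location
    if ("traffic" in metadata.get("events", [])
            and metadata.get("date") and metadata.get("location")):
        return True
    return False
```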
In step 112, all or part of the media snippet and the information of step 101 is indexed using the keywords or metadata in the profile of step 110 and then stored in a repository such as snippet database 32 as described in more detail below with respect to
Referring to
It receives the user's context information, which includes: a) location, b) time, c) interests and preferences, d) current activity, e) mood, etc. (see step 201).
In step 202, the service searches the index using one or more of the received items of information.
It identifies one or more media snippets based on the results of the index search, and ranks them if necessary (step 203).
It forwards the identified snippet (or one or more of the top ranked snippets) to the personal media player or client device (step 204).
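The index search and ranking of steps 202 to 204 may be illustrated with a small in-memory inverted index. The class name `SnippetIndex`, the overlap-count ranking and the tie-breaking by snippet ID are assumptions made for this sketch; the description does not prescribe a particular index structure or ranking function.

```python
from collections import defaultdict


class SnippetIndex:
    """Minimal inverted index from lower-cased keyword to snippet IDs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, snippet_id, keywords):
        """Index a snippet under each of its profile keywords."""
        for keyword in keywords:
            self._postings[keyword.lower()].add(snippet_id)

    def search(self, context_terms, top_n=1):
        """Return up to top_n snippet IDs ranked by number of matching terms."""
        scores = defaultdict(int)
        for term in context_terms:
            for snippet_id in self._postings.get(term.lower(), ()):
                scores[snippet_id] += 1
        # Highest score first; ties are broken by snippet ID for determinism.
        ranked = sorted(scores, key=lambda sid: (-scores[sid], sid))
        return ranked[:top_n]
```

For example, after indexing one snippet under “concert” and “Columbus” and another under “traffic” and “Raleigh”, a search with the user context terms “concert” and “Columbus” returns the first snippet as the top-ranked result.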
A snippet search service 34 performs the search function depicted in
The DJ snippet and ad server 30 also includes a snippet request/response interface 38 which is an interface for receiving snippet queries from client devices 50 of users 40 and thereafter responding with results, typically over a WAN or LAN network. In an exemplary embodiment, the snippet request/response interface 38 may be, for example, an HTTP server.
In an example of the method according to the present invention, Bob the Blade talks about his Guns N' Roses concert experience: “I was there at November 2002 concert at Columbus Ohio. Axl Rose was great, the band flawless, the video presentation superb and the set list a definite crowd pleaser. Guitar solos usually bore me, but Buckethead treated the crowd to a variety of songs and musical styles, from funk to twangy banjo-sounding licks. Never have I heard a crowd sing along to a guitar solo, but tonight they did. Also showing a sense of humor, the solo went into ‘Old McDonald’ and the crowd responded with the E-I-E-I-O's. As the song played, Buckethead passed out things to the audience from two huge bags. It was like a twisted Santa moment.”
The snippet analysis and organization service analyzes the speech-to-text (or transcript of the commentary) and generates an exemplary snippet profile as shown below in XML format:
Listing 1. Example of a Snippet Profile in XML Format.
This profile is indexed using the metadata and contextual information and stored in a repository (e.g., snippet database 32).
At a later date, Joe (representing a user 40) is listening to music on his iPod® while driving to work. He has subscribed to the “On-Demand DJ Service”, and “November Rain” comes up. Such an “On-Demand DJ Service” provides DJ-like commentary for songs played locally on the device of the user 40. His device knows from his interests and past history that he is an avid concert-goer. He has also configured it to prefer male DJs because he feels they make the best Rock commentaries, and he prefers serious comments to sad attempts at humor. Hence, the device prepares a snippet request with the media information, his context, his preferences, and the IDs of previously received snippets annotated with their play-through (or “success”) information ([Y]=played, [N]=skipped). The exemplary request in XML format is shown below (note that “!=” means “not equal to”):
Listing 2. Example of a Snippet Request in XML Format.
The search service receives this request and searches the repository using the media keywords, user interests and user context against the indexed metadata and snippet contexts. The request of listing 2 hence returns the profile of listing 1, which is used to retrieve the appropriate snippet to forward to Joe.
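The matching of a request against indexed profiles, including “=” and “!=” preference constraints and the exclusion of previously received snippets, may be sketched as follows. The dictionary field names (`artist`, `dj_gender`, `tone`, `topics`, `interests`, `constraints`, `history`) and the overlap-based scoring are hypothetical and do not reproduce the exemplary XML listings.

```python
def satisfies(profile, constraints):
    """Check (field, operator, value) constraints, where operator is '=' or '!='."""
    for field, operator, value in constraints:
        actual = profile.get(field)
        if operator == "=" and actual != value:
            return False
        if operator == "!=" and actual == value:
            return False
    return True


def select_snippet(profiles, request):
    """Return the best matching profile for a request, or None if none match."""
    already_received = set(request.get("history", {}))
    interests = set(request.get("interests", []))
    best, best_score = None, -1
    for profile in profiles:
        if profile["id"] in already_received:
            continue  # do not repeat a snippet the user has already received
        if profile.get("artist") != request.get("artist"):
            continue  # snippet must relate to the media being played
        if not satisfies(profile, request.get("constraints", [])):
            continue
        score = len(interests & set(profile.get("topics", [])))
        if score > best_score:
            best, best_score = profile, score
    return best
```

In this sketch the play-through history is used only to avoid repeats; the [Y]/[N] annotations could additionally be used to weight candidates, which is not shown here.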
In another example, people at the Wal-mart in Columbus, Ohio, can subscribe to Wal-mart's “WM.FM” LBS Internet radio. The playlist strategy brings up “Civil War”. The near-real-time DJ service generates a single request for all its users, and since it is for a large collection of users, personal preferences and interests are either not included, or aggregated to find the statistically most common interests. In the exemplary request XML of Listing 3, only the location context is included, since that is common for all users:
Listing 3. Example of a Snippet Request in XML Format.
Since the current user location is in Columbus, Ohio, it has a strong contextual relation to the snippet with the profile of Listing 1 via the user-context/location field in the request and the snippet-context/location reference field in the profile. Hence, that snippet is returned and inserted into the WM.FM radio stream.
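Where interests are aggregated for a group request rather than omitted, the statistically most common interests may be found by simple counting, as sketched below. The function name `aggregate_interests` and the `top_n` parameter are assumptions; each user is counted at most once per interest.

```python
from collections import Counter


def aggregate_interests(interests_per_user, top_n=3):
    """Return the top_n interests shared by the largest number of users."""
    counts = Counter()
    for interests in interests_per_user:
        # A set ensures a user who lists an interest twice is counted once.
        counts.update(set(interests))
    return [interest for interest, _ in counts.most_common(top_n)]
```

The resulting list can then be placed in the group request in the same way an individual user's interests would be.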
The present invention has substantial opportunity for variation without departing from the spirit or scope of the present invention. For example, while the embodiments discussed herein are directed to DJ media snippet content profiles generated in XML format, the present invention is not limited thereto.
It should be emphasized that the above-described embodiments of the invention are merely possible examples of implementations set forth for a clear understanding of the principles of the invention. Variations and modifications may be made to the above-described embodiments of the invention without departing from the spirit and principles of the invention. All such modifications and variations are intended to be included herein within the scope of the invention and protected by the following claims.