1. Field of the Invention
This invention relates to a distributed digital media metering and reporting system; the system meters the use of digital media files, such as digital music tracks.
2. Description of the Prior Art
On-line, web-based music stores, such as iTunes, have become the dominant mechanism for consumers to obtain media files, such as music and video tracks. But these typically work on a pay-per-track-downloaded basis, i.e. you pay to download the music track or video file but can then listen/view as often as you like; in particular, there is no feedback to the computer-based infrastructure (e.g. servers) supplying the media files of any metering or measurement data relating to whether the tracks you have downloaded are in fact played back, or the extent of any such playback.
The present invention is a method of metering the use of digital media files, comprising the steps of:
In an implementation of the present invention, we monitor actual plays of a media file that last beyond a certain extent and automatically share that information with the computer-based infrastructure that supplied the media file. This is better because it gives much richer data concerning the files that are really of interest to the listening public and hence can allow the technical infrastructure that provides the downloads (e.g. handling and delivery of media files) to be optimised.
For example, top 20 charts are normally based on downloads. Say two different tracks are both downloaded 10,000 times in a week. Both would be given the same position in a weekly chart. But say one track was played twice as often as the other; we can measure that and hence support that track with more technical resources, such as greater server capacity and higher-priority downloads, so that later consumers of that track get a better download experience and the technical resources of the music download infrastructure are utilised more efficiently.
Tracks that are downloaded soon after release but played very heavily by the first wave of listeners are also more likely to be hits than those that are not played so heavily; more technical resources can then be made available for those potential hit tracks (for example, more server capacity, prioritised downloading, more prominence in on-line music sites visited by potential listeners, etc.). The 'track play' information can also be used for accounting and reporting purposes at the infrastructure side.
The metering data can be used to automatically:
The predefined extent of the playback can be configurable; it can be sufficiently long to distinguish a user playing a track from a user skipping past tracks.
Another aspect of the invention is a system for metering the use of digital media files, including:
In the preferred embodiment, there is a content ingestion engine that includes a highly scalable and adaptable content ingestion services framework. The ingestion services framework supports a full double-byte character set throughout and can ingest and prepare content for any part of the world in any character set including APAC territories. Content is ingested directly from the digital catalogues of the four major labels, the world's largest Indies and from major music content aggregators.
The enterprise-class content ingestion service framework enables the rapid integration of new content sources and quickly facilitates service deployment in new territories. The framework supports the rapid visual and programmatic building of new ingestion connections dealing with multiple transport mechanisms, handshakes and metadata formats. Automatic verification, validation and loading of content and metadata is supported, along with integration into third-party content metadata sources (e.g. Muze™, AGM™, Gracenote™) for value added validation and verification.
In-built process monitoring is supported, to ensure correct operations and completion of scheduled task cycles, while integrated monitoring and alerting of exception conditions are provided for high process visibility and response.
There are many challenges in the area of content ingestion and consolidation, such as:
An implementation of the present invention resolves all of these issues via a sophisticated suite of data cleansing tools and human supported processes.
After the cleansing and consolidation of music catalogues from multiple sources, the content files themselves need preparation and management so that the content provided by a service is compatible with and relevant to the plethora of devices which will access it.
Delivery of services to multiple devices on multiple platforms requires the content to be available in many formats, such as AAC+, eAAC+, WMA and MP3, in assorted bitrates, as required for a specific device or territory or as a result of a particular contractual obligation. Sometimes the final content format is available from the music label, sometimes the format needs to be created (transcoded) from a high quality reference version.
Different platforms have different Digital Rights Management solutions (e.g. Windows DRM™, OmniPlay™, OMAv2™, PlayReady™). Content files also have different containers/wrappers which are particular to different platforms.
Before publishing music content into a live service, various checks need to be performed, including:
An implementation of the present invention provides the infrastructure and services required to achieve all of these goals and deliver a highly capable multi-device, multi-platform unlimited download music content service.
The stages of the overall process are:
The phases of the ingestion and publication process are broken down below, and are illustrated in
In
Each stage in the staging area 24 consists of a tools-supported manual process: the tools analyse the metadata from the various sources available and, where possible, automatically identify duplicated data (i.e. descriptive metadata entries which refer to the same piece of digital media); items are flagged for manual correction where the automated process does not have sufficient information available from the data sources to perform de-duplication and consolidation automatically.
Incoming data to be ingested may arrive in a variety of different forms, including XML of differing formats (according to the internal standards of the source data holder), plain text files and Excel spreadsheets. All such formats are loaded into a Loading Area 23 and are then passed through a variety of Staging Areas 24, each of which increases the standardisation of that metadata. In the description of the process which follows, the various types of analysis, transformation and de-duplication of metadata are presented as if they take place within a single Staging Area prior to ingesting the cleansed data into a production database for distribution and use. In the preferred embodiment, those actions take place across multiple Staging Areas, each utilising its own data store.
Supplementary data, such as images and digital media files, may accompany metadata and needs to be analysed and, where appropriate, transcoded. For example, in the preferred embodiment the track duration specified in the metadata would be cross-checked against the track duration extracted from the actual digital media file as one method of validating the metadata.
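That duration cross-check can be sketched as follows; the function name and the two-second tolerance are illustrative assumptions, not details taken from the description:

```python
def duration_matches(metadata_seconds: float, file_seconds: float,
                     tolerance: float = 2.0) -> bool:
    """Validate a metadata record by comparing the declared track
    duration against the duration extracted from the media file
    itself. The tolerance value is an illustrative assumption."""
    return abs(metadata_seconds - file_seconds) <= tolerance
```

A mismatch beyond the tolerance would flag the record for the manual-correction path described above rather than rejecting it outright.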
Incoming data is cleansed by checking for common typographical/transcription errors—such as transposed letters and variant spellings (such as US and UK English)—and by comparison to a known clean dataset, where possible.
The known clean dataset is a reference database which includes information, which is known to be accurate, concerning variant artist names—for example, that “George Scott” and “George C. Scott” refer to the same artist—together with variant album titles and other hints to assist with data de-duplication and cleansing. As additional volumes of metadata are ingested and cleansed, the reference database increases in size and coverage accordingly, essentially permitting the system to “learn” from previous data ingestion experiences.
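A minimal sketch of how such a "learning" reference database might behave is given below; the class and method names (`ReferenceDB`, `canonical`, `learn`) and the in-memory dictionary representation are illustrative assumptions, standing in for the reference database described above:

```python
class ReferenceDB:
    """Toy in-memory stand-in for the clean reference database:
    maps known variant artist names to a canonical form, and
    learns new variants as they are confirmed during ingestion."""

    def __init__(self):
        # Seeded with the example variant given in the description.
        self._aliases = {"george c. scott": "George Scott"}

    def canonical(self, name: str) -> str:
        # Normalise case and whitespace before lookup; unknown
        # names pass through unchanged.
        key = " ".join(name.lower().split())
        return self._aliases.get(key, name)

    def learn(self, variant: str, canonical_name: str) -> None:
        # Record a newly confirmed variant so that later ingestion
        # runs resolve it automatically.
        key = " ".join(variant.lower().split())
        self._aliases[key] = canonical_name


db = ReferenceDB()
resolved = db.canonical("George C. Scott")   # resolves to "George Scott"
db.learn("G. Scott", "George Scott")         # the database "learns"
```

Each confirmed de-duplication enlarges the alias map, which is the sense in which the system "learns" from previous ingestion experiences.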
Where data is provided from multiple different sources, the tool compares the different versions and selects the “correct” metadata item based on a majority-vote system, weighted according to the information available in the reference database.
For example, suppose that three data sources provide information about a given track; the incoming data may be as given in the table below, the FINAL column of which indicates the final data selected for inclusion by the tool:
In the example above, it can be seen that Source A contains correct information for all elements except for the Track Number, while Source B and Source C contain incorrect or missing information in other fields. The reference database and transcription errors assessment protocols assist in identifying that Source B refers to the same track as the other two data sources, while majority voting ensures that the FINAL column picks up the best quality (i.e. the most common, and therefore most likely to be correct) metadata descriptions for each element.
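The per-field weighted majority vote can be sketched as follows. The function name, the weighting scheme and the sample field values are illustrative assumptions (the actual weights, per the description, come from the reference database):

```python
from collections import Counter


def select_final(values, weights=None):
    """Pick the most common non-empty value for one metadata field,
    weighting each source's vote. Sources absent from the weights
    mapping get a default weight of 1.0 (an assumption)."""
    weights = weights or {}
    tally = Counter()
    for source, value in values.items():
        if value:  # missing or empty fields do not vote
            tally[value] += weights.get(source, 1.0)
    return tally.most_common(1)[0][0] if tally else None


# Hypothetical per-source values for two fields of one track.
track = {
    "Title":   {"A": "Example Title", "B": "Example Title", "C": "Exampel Title"},
    "TrackNo": {"A": "7", "B": "6", "C": "6"},
}
final = {field: select_final(votes) for field, votes in track.items()}
```

Here the FINAL value for each field is simply the highest-weighted vote, which reproduces the behaviour described for the table above: the majority value wins per field even when no single source is correct throughout.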
Where a user-configurable threshold of similarity is reached (typically 65%-85% similarity in the preferred embodiment), the final data is flagged for manual confirmation before being passed into the core database for production use. Items which exhibit similarity values outside of that range are automatically discarded as being duplicates of existing content or passed automatically into the core database as having been clearly identified as new content.
The purpose of manual confirmation is to ensure that similar but interesting variants—such as a release of an album with additional bonus tracks—are preserved in the system, as well as to provide an additional check where automated analysis results in sufficiently ambiguous data as to require human judgement.
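The three-way routing implied by those thresholds can be sketched as follows; the function and outcome names are illustrative assumptions, while the 65%/85% bounds are the typical values stated above:

```python
def triage(similarity: float, low: float = 0.65, high: float = 0.85) -> str:
    """Route an incoming item based on its similarity to existing
    content. Thresholds are user-configurable per the description;
    the outcome labels are illustrative."""
    if similarity >= high:
        return "discard-duplicate"  # clearly matches existing content
    if similarity >= low:
        return "manual-review"      # ambiguous: flag for human confirmation
    return "ingest-new"             # clearly new content
```

Only the middle band reaches a human, which keeps the manual workload proportional to genuinely ambiguous cases such as bonus-track variants.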
The threshold of similarity is calculated as a statistical function of the relationship between the FINAL data and the source data from which it was derived and by making use of the clean reference database disclosed previously, using a variety of fuzzy logic pattern matching techniques, including but not limited to one or more of the following, where the relevant data is available:
During the data cleansing process, the procedure makes use of both a clean “reference database”, as described above, and also references the “core” content database, which in the preferred embodiment is the same database, though accessed for a slightly different purpose.
The core content database is accessed to distinguish new data—data which is not previously present in the core content database—from data updates when ingesting metadata from a data source. Similar fuzzy logic matching techniques are used to identify where incoming data is an update to an existing media content descriptor.
Such updates may constitute actual changes required to the metadata—such as a change of album title—or the “backfilling” of additional information about an existing album, track or other digital media release, whereby newly-ingested metadata is to be added to an existing metadata record.
During the ingestion process, such updates are subject to the same checks as provided for new metadata.
Content ingestion data is, in the preferred embodiment, recorded in audit database tables, for subsequent report generation. Recorded details include one or more of: artist, title, success or a reason for failure of the ingestion process for the item, a notation indicating whether this represents new, updated, backfilled or deleted items, the source(s) of the metadata and a notation as to which items of metadata were modified as a result.
This auditing provides for rollback of a given ingestion, for report generation as to the published content available at any given time, and for analyses to be performed to determine coverage of, for example, popular music or the contents of local or international charts in the currently published content database.
Data Management Tools Suite 32—Each box indicates a particular type of metadata management required for the overall process of dealing with metadata. The only two which are directly relevant to this system are Deduplication and Release Versioning 41 and, for metering/reporting activities, the Content tracking 55.
The loading areas include:
The overall process is that raw metadata is obtained from the loading areas 33, 34, 35 and 36 and reaches the various staging areas 37. That metadata is then cleansed (Validation and preparation 38) using Fuzzy logic services 39, including automatic cleansing using the reference database (OMNI data warehousing services database 40) and manual cleansing where indicated (Deduplication and Release Versioning 41). Also, any additional media file formats are produced by transcoding from a reference file, if necessary (Encoding services 42).
Additional metadata, such as Charts data, is obtained from reference metadata data sources (Chart Ripper 43) and from various additional sources (HTTP 44, feeding into the Volumes/Chart Comps 45), and is also ingested and consolidated/de-duplicated with the generally ingested metadata to form the Consolidated Content Universe 46.
The now-cleansed data is then published to the pre-production (Headquarters 47) database for testing and then to the production databases (Publishing Services 48), leading to Data Centres 49. That data is accessible using a variety of services, such as the Gracenote™ Batch Services 50, and publishable to external locations (Publishers/Collecting Societies 51).
Content Enhancement 52 indicates the metering, reporting and data analysis procedures (track playing stats, synchronisation of user- and supplier-generated track ratings, the generation of charts and so on). The Audit Database 53 indicates the storage of metering/auditing data which feeds into that process. Finally, DRM services 54 is both the publication of the DRM-protected media files and the mechanism for generating the audit data for that Audit database 53.
In the main implementation, digital media files are made available from the main production database (e.g. database 27 in
In addition:
In an example embodiment, the system supports the creation, collection, consolidation and administration of content usage metering files across multiple platforms, together with reporting facilities including, but not limited to, calculating and reporting complex financial and usage statistics to the plethora of stakeholders requiring reports in multiple territories. Stakeholders requiring reports include major music labels, independent music labels, content aggregators, publishing societies and business partners. In the preferred embodiment, the reporting facility also provides highly sophisticated analyses such as churn analysis and subscriber behaviour reporting.
The core metering action in this system is the recording of a track play, or the playing of some other digital media file, such as a movie, a game, an article or a news story. For convenience, all such digital media content will be referred to herein as “tracks”, with defined collections of “tracks” being referred to as “albums” or “releases.”
The system identifies a track as having been played on a client device when some minimum portion of that track has been played, the minimum portion being configurable based on media type but in the case of music files typically being either 4%-5% of the track length or 30 seconds. Track plays below the defined threshold would not be recorded for metrics or reporting purposes, since such brief plays may be generated by users skipping past tracks.
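That threshold test can be sketched as follows. The function name and default parameters are illustrative; in particular, the description says "either 4%-5% of the track length or 30 seconds" without stating how the two criteria combine, so treating satisfaction of either one as sufficient is an assumption:

```python
def counts_as_play(played_seconds: float, track_length_seconds: float,
                   min_fraction: float = 0.05, min_seconds: float = 30.0) -> bool:
    """Decide whether a playback event is metered as a track play.
    Both thresholds are configurable; whether they combine as OR
    (as here) or AND is an assumption."""
    return (played_seconds >= min_seconds or
            played_seconds >= min_fraction * track_length_seconds)
```

Playback events failing this test would simply not be written to the metrics, filtering out track-skipping noise before it reaches the audit tables.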
The context of a track play is also recorded in the metrics. Contextual information includes, in an example embodiment, the album/release, playlist, chart or other context from which the played track originated as well as basic information including, but not limited to, one or more of: the client device on which the track was played, the user who played that track, the duration/proportion of the track which was in fact played and the internal session context of the track play, such as the tracks played immediately prior to or after that track.
Metering information (“metrics”) is gathered on the client device and is communicated to the server. The frequency and method of transport of metrics to the server is dependent on the type of device but, in the preferred embodiment, typical scenarios would include:
The method of transportation, in the preferred embodiment, is to piggyback the metrics on an existing communication which the client device would have had to send to the server in any event, such as a request for recommendations or for a media file or a polling event asking the server for messages to be delivered to the client device's inbox. Another example embodiment may send specific messages to deliver metrics, and that approach may be taken in the preferred embodiment if the client device has metrics but no other requests queued for sending to the server in excess of some configurable period of time (typically 60 minutes).
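The piggyback-plus-timeout behaviour can be sketched as follows; the class and method names (`MetricsQueue`, `attach_to`) and the in-memory list representation are illustrative assumptions, with the 60-minute window taken from the description:

```python
import time


class MetricsQueue:
    """Sketch of client-side metric batching: metrics piggyback on
    the next outgoing request, with a standalone flush if no such
    request occurs within a configurable window."""

    def __init__(self, max_wait_seconds: float = 3600.0, clock=time.monotonic):
        self._pending = []
        self._oldest = None          # timestamp of the oldest unsent metric
        self._max_wait = max_wait_seconds
        self._clock = clock          # injectable for testing

    def record(self, metric: dict) -> None:
        # Queue a metric (e.g. a track-play event) for later delivery.
        if self._oldest is None:
            self._oldest = self._clock()
        self._pending.append(metric)

    def attach_to(self, request: dict) -> dict:
        # Piggyback all pending metrics on an outgoing request the
        # client was sending anyway.
        request["metrics"] = self._drain()
        return request

    def needs_standalone_flush(self) -> bool:
        # True when metrics have waited longer than the window and a
        # dedicated metrics message should be sent.
        return (self._oldest is not None and
                self._clock() - self._oldest >= self._max_wait)

    def _drain(self) -> list:
        batch, self._pending, self._oldest = self._pending, [], None
        return batch
```

In use, the client calls `attach_to` whenever it sends any request to the server, and falls back to a dedicated metrics message only when `needs_standalone_flush` becomes true.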
Metrics received by the server are, in the preferred embodiment, stored in auditing database tables. Such metrics may also be enriched with one or more items of additional metadata, including the genre, artist, era, music publisher, copyright holder, demographic information about the user, downloaded or streamed file sizes, bandwidth available to a client device at the time and any additional information about which reporting analyses are desired. In the preferred embodiment, metrics stored for reporting purposes are anonymised in order to protect the user's privacy.
A second major area for which metrics are recorded is that of user subscriptions and purchasing. Specifically, the system provides a mechanism whereby it is recorded when a user performs one or more of the following actions: signing up to a subscription service, purchasing one or more digital media files, modifying or cancelling a subscription or playing a preview of a track. All such requests made to the server are stored, suitably anonymised in the preferred embodiment, in the audit database tables for subsequent report generation.
The auditing database tables may then be used to generate reports, both internally and for third parties such as music labels or movie studios.
Typical reports generated by the present invention in its preferred embodiment include:
Reports may also, in the preferred embodiment, be capable of being broken down by one or more of the following classifications: genre, adult content status, era, publication or other dates, artist, publisher, copyright holder, time period, chart rankings, director, writer/composer, client device type, digital media service or any other stored metadata.
Numeric details may be presentable as overall figures, averages, medians, some other statistical measure or a combination thereof. The reporting period, the format of generated reports and the frequency with which they are generated is also, in the preferred embodiment, configurable.
Reports may be updated frequently, as is typical for realtime reports which may refresh at intervals defined in seconds or fractions thereof, or generated as documents intended for viewing on a computer or for printing.
Application server 47 uses the metering data to provide usage reporting to support services 48. User recommendations are also made based on gathered playing metrics, using Content Team tools 49.
Number | Date | Country | Kind
---|---|---|---
0815651.5 | Aug 2008 | GB | national
Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/GB2009/051091 | 8/28/2009 | WO | 00 | 5/24/2011