The present invention relates to the automatic management of digital archives, in particular to the automatic management of archives of files relating to audio and/or video sequences.
The ever increasing spread of computer networks (especially the Internet), together with the availability of huge amounts of audio and video contents, has made it extremely easy and common to exchange audio and/or video contents among network nodes, in particular among users.
Of course, such a huge amount of data should be managed appropriately.
In a large digital archive, e.g. the file system stored on the hard disk of a personal computer, there may be several copies of the same file (in general having the same file name); there may also be several copies or several slightly different versions of the same audio and/or video sequence (in general having different file names). The user is often unaware of this situation; even a user who notices it tends to avoid “cleaning” the digital archive, because this is a time-consuming and difficult task (especially for those audio and/or video sequences for which the file name cannot be used as a criterion for identifying two identical or similar sequences).
Audio and/or video files often include descriptive data (also referred to as “metadata”) which is added to the audio and/or video data in order to provide information about the files themselves, such as: title, duration, image resolution, compression and coding algorithms, quality, etc.
This data is used by electronic audio and/or video players for decoding the file correctly and for providing information about the audio and/or video sequence being played.
Software packages are available on the market which allow the user to search for files based on one or more descriptive features. The operating systems of the Microsoft Windows family include a tool for searching the file system for a file on the basis of the file name and/or of words contained in its text. Many e-mail programs have a function for searching the message archive for an e-mail message according to subject, date, sender, receiver, or words contained in its text. Even though these packages can help the user manage his/her digital archives, managing large numbers of files remains burdensome for the user, and is therefore practiced only to a small extent or not at all.
The general object of the present invention is to facilitate and improve the management of great amounts of files, in particular of files relating to audio and/or video sequences.
More specifically, the object of the present invention is to facilitate and improve the storage of great amounts of files, in particular of files relating to audio and/or video sequences.
Said objects are substantially achieved through selection and deletion methods having the functionalities set out in the appended method claims, which are intended as an integral part of the present description.
The present invention is based on the idea of finding substantially duplicate files, selecting the best one and deleting the others, this process being carried out in an automatic or quasi-automatic manner, i.e. with the user having to answer one or more confirmation requests.
The methods according to the present invention are executed, for example, upon the user's request or whenever a new file is stored on the medium, or else at preset time intervals.
The method according to the present invention may provide for extracting a subset of parameters contained in the descriptive data of the audio-video files, and for calculating one or more significant values for each file depending on the criteria specified for choosing the file to be retained: best quality, best compromise between quality and occupied space, type of compression algorithm used, etc. These choice criteria can be set and configured by the user who, through a suitable interface, selects the criteria to be applied and the respective parameters, and then assigns a priority to each one of them.
The analysis of the extracted parameters and of the calculated significant values allows these criteria to be applied, and thus provides the automatic selection of a single file out of the set of duplicates, resulting in all other duplicates being eliminated or moved to another area of the medium.
According to a further aspect, the present invention also relates to an electronic apparatus adapted to implement said methods and having the features set out in the appended apparatus claims, which are intended as an integral part of the present description.
Further objects, features and advantages of the present invention will become apparent from the following detailed description.
For a better understanding of the invention, some embodiments thereof will now be described by way of non-limiting examples with reference to the annexed drawings, wherein:
Said method carries out a selection of one file in a subset of N files found to be duplicates of the same audio or video or audio-video sequence by using prior-art methods, or else specified as duplicates by the user.
The method consists in the sequential application, to the N duplicates, of different selection criteria according to a decreasing priority order.
In the chart of
If no file meets the requirements imposed by the function block, then the process will proceed to the next block.
If nk files meet the requirements imposed by function block Ck, then all other nk-1−nk files will be removed.
Each block Ck is immediately followed by a check of the number of duplicates nk remaining after the application of the function block. If nk=1, then the file that best meets the requirements has been found, and the other nk-1−1 files can be removed. If nP-1>1 after the application of P−1 function blocks, then block P will make a random selection of one file among the remaining nP-1 files, and all other nP-1−1 files will be removed.
At the end of the chain, only one file of those belonging to the initial set of N duplicates will remain stored on the medium (i.e. nP=1).
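By way of non-limiting illustration, the chain of function blocks described above may be sketched as follows. The names `select_one` and `Criterion` are hypothetical and do not appear in the description; each criterion is assumed to be a function that receives the remaining duplicates and returns those meeting its requirements.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical type: a function block Ck that returns the files meeting its requirements.
Criterion = Callable[[List[str]], List[str]]


def select_one(duplicates: Sequence[str],
               criteria: Sequence[Criterion],
               rng: Optional[random.Random] = None) -> str:
    """Apply the criteria in decreasing priority order and return the retained file."""
    if not duplicates:
        raise ValueError("at least one duplicate is required")
    remaining = list(duplicates)
    for criterion in criteria:
        if len(remaining) == 1:
            break  # a single file is left: the selection is complete
        kept = criterion(remaining)
        # If no file meets the requirements, nothing is removed and
        # the process proceeds to the next block.
        if kept:
            remaining = kept
    # Last block P: random selection among the files still remaining.
    return (rng or random.Random()).choice(remaining)
```

In this sketch the files not returned by `select_one` are the ones to be removed or moved; the actual removal is deliberately left outside the function.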
In an example of embodiment, the method according to the invention makes a selection of one file in a subset of N audio files found to be duplicates of the same audio sequence by using prior-art methods, or else specified as duplicates by the user.
The flow chart of
Block C1 makes a selection among the N duplicate files based on sequence duration. The application of this criterion aims at removing incomplete duplicate files.
This function operates as follows: first, the audio sequence having the longest duration dmax is found (2-1 in the chart). Then block 2-2 initializes the variable i=1; for each file xi having a duration di (i being a whole number, 1≦i≦N) in the set of duplicates, the following value is calculated:
ri=di/dmax
which is compared with a customizable value R, R being a real number, 0<R≦1 (block 2-3 in the chart). The files xi with the parameter ri smaller than R are considered to be incomplete and are automatically removed (2-4). Block 2-5 increments said variable by one unit in order to analyze the next file. Check block 2-6 verifies if i>N; if yes, it means that all N files have been analyzed and the process can proceed to the next function; otherwise, the control will return to check block 2-3.
The selection thus carried out reduces the number of duplicates from N to n1, wherein n1≦N.
If n1=1, the file to be retained has been selected and the process is complete.
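A minimal sketch of block C1 is given below, assuming that the value ri is the ratio of the duration di to the longest duration dmax, as suggested by its comparison with R in the range 0<R≦1. The function name `filter_by_duration` and the default value R=0.9 are assumptions introduced for illustration only.

```python
from typing import Dict, List


def filter_by_duration(durations: Dict[str, float], R: float = 0.9) -> List[str]:
    """Return the files whose duration ratio ri = di / dmax is at least R.

    `durations` maps a file name to its duration di (any unit, all positive).
    R = 0.9 is an illustrative default, not a value taken from the description.
    """
    if not 0 < R <= 1:
        raise ValueError("R must satisfy 0 < R <= 1")
    d_max = max(durations.values())  # longest duration among the duplicates
    # Files with ri < R are considered incomplete and are dropped.
    return [name for name, d in durations.items() if d / d_max >= R]
```

For example, with durations of 200 s, 190 s and 120 s and R=0.9, the ratios are 1.0, 0.95 and 0.6, so only the third file is considered incomplete.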
The flow chart of
Block C2 makes a selection among the n1 duplicate files based on file format, i.e. depending on how the information is coded in order to be stored on the medium.
The format fe (wherein 1≦e≦n1) of each file xe in the set of n1 duplicates is compared with a list of J preferential formats Fj (wherein 1≦j≦J) created by the user; all files xe in a format fe not included in said list will be automatically removed. In the event that none of the duplicate files falls within the preferential list, no file will be removed since it will be necessary to use another selection criterion having lower priority.
The selection thus carried out reduces the number of duplicates from n1 to n2, wherein n2≦n1.
If n2=1, the file to be retained has been selected and the process is complete.
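The format-based selection of block C2 may be sketched as follows. The function name `filter_by_format` and the representation of formats as plain strings are assumptions made for the purpose of the example.

```python
from typing import Dict, List, Set


def filter_by_format(formats: Dict[str, str], preferred: Set[str]) -> List[str]:
    """Return the files whose format fe belongs to the preferential list.

    `formats` maps a file name to its format; `preferred` is the
    user-created list of preferential formats Fj.
    """
    kept = [name for name, fmt in formats.items() if fmt in preferred]
    # If no duplicate is in a preferential format, no file is removed and
    # a lower-priority criterion has to decide.
    return kept if kept else list(formats)
```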
In
Block C3 makes a selection among the n2 duplicate files based on the quality of the audio sequence; this means that the best, worst or average file in terms of perceived quality (as desired by the user) will be retained.
The quality of an audio file can be estimated roughly by considering the following factors: algorithm used for data compression, sampling frequency (hereafter referred to as fs, measured in Hz) and bit-rate (referred to as BRa, measured in bit/s), i.e. the number of bits used for representing one second of audio sequence.
The first step compares BRa and fs with user-definable thresholds, which represent minimum and maximum levels of BRa and fs. This comparison results in the removal of those duplicates having parameters outside the preset limits. In the event that no file complies with the imposed limits, no file will be removed and a quality estimation must be carried out.
If only one file among the n2 duplicates is within the limits, the file to be retained has been selected and the process is complete.
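The threshold comparison forming the first step of block C3 may be sketched as follows. The names `Audio` and `within_limits`, and the representation of each pair of thresholds as a (minimum, maximum) tuple, are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Audio(NamedTuple):
    name: str
    bitrate: float      # BRa, in bit/s
    sample_rate: float  # fs, in Hz


def within_limits(files: List[Audio],
                  br_limits: Tuple[float, float],
                  fs_limits: Tuple[float, float]) -> List[Audio]:
    """Keep the files whose BRa and fs both lie within the user-defined limits."""
    br_min, br_max = br_limits
    fs_min, fs_max = fs_limits
    kept = [f for f in files
            if br_min <= f.bitrate <= br_max and fs_min <= f.sample_rate <= fs_max]
    # If no file complies with the limits, no file is removed and the
    # quality estimation is carried out on all of them.
    return kept if kept else list(files)
```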
The quality estimation of files which have not been removed during the above step can be obtained by calculating for each file the following value:
qa=BRa/fs
The value of qa represents the mean number of binary digits used for representing a single audio sample.
When the compression algorithm used changes from file to file, the values qa of different files cannot be compared directly; it is in fact known that, BRa and fs being equal, different compression algorithms may lead to appreciably different qualitative levels.
It is then necessary to use a corrective factor ka and calculate:
Qa=ka*qa
wherein the value of the factor ka depends on the type of algorithm used, and must be obtained empirically based on psycho-acoustic studies.
If the compression algorithm used is the same for all duplicate files, this step will not be required (ka=1 for all files).
At this point, it will be necessary to find the maximum (or minimum, or mean, as desired by the user) value of Qa and retain all files associated with this value, while removing all other duplicates.
Files having the same Qa are considered to be equivalent.
The selection thus carried out reduces the number of duplicates from n2 to n3, wherein n3≦n2.
If n3=1, the file to be retained has been selected and the process is complete.
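The audio quality estimation may be sketched as follows, assuming that qa is the bit-rate BRa divided by the sampling frequency fs, consistently with its description as the mean number of bits per audio sample. The corrective factors in `K_A` are purely illustrative placeholders, not values obtained from psycho-acoustic studies, and only the "maximum" option is shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AudioFile:
    name: str
    codec: str
    bitrate: float      # BRa, in bit/s
    sample_rate: float  # fs, in Hz


# Hypothetical corrective factors ka per compression algorithm (illustrative only).
K_A = {"mp3": 1.0, "aac": 1.3}


def q_a(f: AudioFile) -> float:
    """Mean number of bits per audio sample, assumed to be BRa / fs."""
    return f.bitrate / f.sample_rate


def Q_a(f: AudioFile) -> float:
    """Corrected quality value Qa = ka * qa."""
    return K_A[f.codec] * q_a(f)


def keep_best(files: List[AudioFile]) -> List[AudioFile]:
    """Retain all files sharing the maximum Qa; equal values are equivalent."""
    best = max(Q_a(f) for f in files)
    return [f for f in files if Q_a(f) == best]
```

With these placeholder factors, a 112 kbit/s file in the second format would be preferred to a 128 kbit/s file in the first one at the same sampling frequency, which illustrates why the uncorrected values qa cannot be compared directly.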
Block CP makes a selection among the nP-1 duplicates not yet removed by the previous blocks. All these files comply with the selection criteria set by the user. Since this is the last block, the selection criterion is not important, and it is possible to remove nP-1−1 files randomly or by an arbitrary rule (e.g. the first nP-1−1 files in alphabetical order), or the user may be asked to choose the files to be retained and those to be removed.
In another example of embodiment, the method according to the invention makes a selection of one file in a subset of video files found to be duplicates of the same video sequence by using prior-art methods, or else specified as duplicates by the user.
Blocks C1 and C2 remain the same, the only difference being that the parameters used (file duration and format) refer to the video sequence, not to the audio sequence. Block CP remains unchanged.
Function block C3 makes a selection among the n2 duplicate files based on the quality of the video sequence; this means that the best, worst or average file in terms of perceived quality (as desired by the user) will be retained.
The quality of a video file can be estimated roughly by considering the following factors: algorithm used for data compression, frame refresh frequency (hereafter referred to as fr, measured in frame/s), bit-rate (referred to as BRv, measured in bit/s), i.e. the number of bits used for representing one second of video sequence, and video resolution (referred to as ris, measured in pixel/frame).
The first step compares BRv, fr and ris with user-definable thresholds, which represent minimum and maximum levels of BRv, fr and ris. This comparison results in the removal of those duplicates having parameters outside the preset limits. In the event that no file complies with the imposed limits, no file will be removed and a quality estimation must be carried out.
If only one file among the n2 duplicates is within the limits, the file to be retained has been selected and the process is complete.
The quality estimation of files which have not been removed during the above step can be obtained by calculating for each file the following value:
qv=BRv/(fr*ris)
This value represents the mean number of binary digits used for representing a single video sample, i.e. one pixel in a frame.
When the compression algorithm used changes from file to file, the values qv of different files cannot be compared directly; it is in fact known that, BRv, fr, and ris being equal, different compression algorithms may lead to appreciably different qualitative levels.
It is then necessary to use a corrective factor kv and calculate:
Qv=kv*qv
wherein kv is a factor depending on the type of algorithm used, and must be obtained empirically based on psycho-visual studies.
If the compression algorithm used is the same for all duplicate files, this step will not be required (kv=1 for all files).
At this point, it will be necessary to find the maximum (or minimum, or mean, as desired by the user) value of Qv and retain all files associated with this value, while removing all other duplicates.
Files having the same Qv are considered to be equivalent.
The selection thus carried out reduces the number of duplicates from n2 to n3, wherein n3≦n2.
If n3=1, the file to be retained has been selected and the process is complete.
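The video quality estimation may be sketched in the same manner, assuming that qv is the bit-rate BRv divided by the product of the frame frequency fr and the resolution ris, which yields the mean number of bits per pixel. The factors in `K_V` are illustrative placeholders, not results of psycho-visual studies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VideoFile:
    name: str
    codec: str
    bitrate: float     # BRv, in bit/s
    frame_rate: float  # fr, in frame/s
    resolution: int    # ris, in pixel/frame


# Hypothetical corrective factors kv per compression algorithm (illustrative only).
K_V = {"mpeg2": 1.0, "h264": 2.0}


def q_v(f: VideoFile) -> float:
    """Mean number of bits per pixel, assumed to be BRv / (fr * ris)."""
    return f.bitrate / (f.frame_rate * f.resolution)


def Q_v(f: VideoFile) -> float:
    """Corrected quality value Qv = kv * qv."""
    return K_V[f.codec] * q_v(f)


def keep_best_video(files: List[VideoFile]) -> List[VideoFile]:
    """Retain all files sharing the maximum Qv."""
    best = max(Q_v(f) for f in files)
    return [f for f in files if Q_v(f) == best]
```

For a sequence of 720x576 pixels at 25 frame/s and 4 Mbit/s, qv is roughly 0.39 bit per pixel.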
In another example of embodiment, the method according to the invention makes a selection of one file in a subset of N files found to be duplicates of the same sequence comprising both video and audio by using prior-art methods, or else specified as duplicates by the user.
Blocks C1 and C2 remain the same, the only difference being that the parameters used (file duration and format) refer to the video-audio sequence, not to the audio sequence. Block CP remains unchanged.
Block C3 evaluates the quality of the audio and video streams separately according to the above-described methods, and the file to be retained is chosen on the basis of either stream as desired by the user.
In another example of embodiment, the audio-video file to be retained is chosen by calculating for each duplicate the following parameter:
Qva=Qv*Qa
which takes into account the video and audio quality starting from the values of Qa associated with the audio stream and of Qv associated with the video stream.
It is therefore possible to find the maximum (or minimum, or mean, as desired by the user) value of Qva and retain all files associated with this value, while removing all other duplicates.
Files having the same Qva are considered to be equivalent.
The selection thus carried out reduces the number of duplicates from n2 to n3, wherein n3≦n2.
If n3=1, the file to be retained has been selected and the process is complete.
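The combined parameter Qva may be illustrated with a short sketch. The numeric values are invented for the example; they merely show that the file having the best product Qv*Qa need not be the file having the best video stream.

```python
from typing import Dict, Tuple


def best_by_product(values: Dict[str, Tuple[float, float]]) -> str:
    """Return the file with the maximum Qva = Qv * Qa.

    `values` maps a file name to its pair (Qv, Qa).
    """
    return max(values, key=lambda name: values[name][0] * values[name][1])


# Invented values: X has the better video stream, Y the better audio stream.
example = {"X": (0.4, 3.0), "Y": (0.3, 4.5)}
chosen = best_by_product(example)
```

Here Qva is about 1.2 for X and about 1.35 for Y, so Y is retained although X has the higher Qv.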
Sometimes, files representing audio and/or video works are accompanied by additional data describing the license granted for using said works by specifying what is allowed and what is forbidden, thus limiting the use of said works (digital rights management).
Block C4 (not shown in
In a variant of the invention, the order of the function blocks changes according to the priority assigned to each selection criterion by the user. For example, if the user assigns a higher priority level to file quality than to file format, block C3 must precede C2 in the chain.
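The reordering of the function blocks may be sketched as follows, under the assumption that the user expresses each priority as an integer and that a larger integer means a higher priority. The function name `order_blocks` is hypothetical.

```python
from typing import Dict, List


def order_blocks(priorities: Dict[str, int]) -> List[str]:
    """Return the criteria names sorted by decreasing user-assigned priority."""
    return sorted(priorities, key=lambda name: priorities[name], reverse=True)


# Quality is given a higher priority than format, so it is applied first.
chain = order_blocks({"duration": 2, "format": 1, "quality": 3})
```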
In another variant of the invention, the file to be retained is proposed to the user, who is then requested to confirm the choice before the duplicates are actually removed; the selection of the file to be retained is still automatic, but it is guided or conditioned by the user, who only has to give his/her final approval.
In other words, the various function blocks make a selection without deleting the file from the medium and propose said selection to the user: in this manner, the user can keep control of the process.
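This quasi-automatic variant may be sketched as follows. The callables `confirm` and `delete` are hypothetical stand-ins for the user interface and for the removal from the medium, respectively.

```python
from typing import Callable, List


def propose_and_confirm(duplicates: List[str],
                        chosen: str,
                        confirm: Callable[[str, List[str]], bool],
                        delete: Callable[[str], None]) -> List[str]:
    """Delete the non-chosen duplicates only after the user's approval.

    Returns the list of files actually deleted (empty if the user declines).
    """
    to_remove = [f for f in duplicates if f != chosen]
    # Nothing has been deleted yet: the selection is only proposed.
    if not confirm(chosen, to_remove):
        return []
    for f in to_remove:
        delete(f)
    return to_remove
```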
As aforesaid, the selection and/or deletion methods according to the present invention can advantageously be implemented and/or integrated in an electronic apparatus, e.g. in a program executed in the apparatus.
Typical apparatuses whereto the deletion method according to the present invention may be applied are, for example, audio and/or video reproduction devices such as the so-called “MP3 players” with semiconductor memory; in these portable devices, the memory available for storing sequences is rather limited (though it is constantly growing—nowadays it holds about 1 Gbyte), and it is therefore important to avoid keeping several copies of the same sequence.
In devices like those mentioned above, it is very advantageous that the deletion method is executed in an essentially automatic manner, so that the user is not bothered or required to do anything.
The device may execute repetitively a cycle for finding duplicates, possibly followed by the deletion thereof, preferably at regular time intervals. Such a solution may become very burdensome (from a data processing viewpoint), especially when applied to a large number of files; in such a case, it may be provided that the duplicate file deletion cycle is only executed upon a user's command.
Alternatively or additionally, it is very effective and efficient to carry out a verification every time a new file is stored in the device; in other words, when a new file is stored in the device, the device will search the old, previously stored files for a file being a duplicate of the new one; if such a file is found, the device will automatically or semi-automatically decide which one to retain and which one to delete.
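The verification carried out when a new file is stored may be sketched as follows. The duplicate detection itself is outside the scope of this sketch and is represented by the hypothetical callable `is_duplicate`; the callable `choose` stands for the selection method and returns the file to be retained.

```python
from typing import Callable, List


def on_store(archive: List[str],
             new_file: str,
             is_duplicate: Callable[[str, str], bool],
             choose: Callable[[str, str], str]) -> str:
    """Store `new_file`, keeping a single copy if a duplicate already exists.

    Returns the file that is retained in the archive.
    """
    for i, old in enumerate(archive):
        if is_duplicate(old, new_file):
            # Retain only the file selected by the criteria.
            archive[i] = choose(old, new_file)
            return archive[i]
    archive.append(new_file)  # no duplicate found: store normally
    return new_file
```

Compared with a periodic scan of the whole archive, this approach only compares the new file against the files already stored.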
The above-described embodiments of the present invention are merely exemplary; the principles of the present invention may find application in other embodiments as well.
The scope and extent of the present invention are therefore determined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
TO2006A0534 | Jul 2006 | IT | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB2007/002010 | 7/17/2007 | WO | 00 | 10/19/2009 |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2008/012619 | 1/31/2008 | WO | A |
Number | Date | Country | |
---|---|---|---|
20100049768 A1 | Feb 2010 | US |