Claims
- 1. A method for partitioning a digital sequence comprising:
performing a hash function on at least a portion of said digital sequence; monitoring hash values produced by said hash function for a first predetermined numeric pattern found in a range of numeric values; and marking a breakpoint in said digital sequence when said first predetermined numeric pattern occurs; wherein said step of performing said hash function comprises a rolling hash function adapted to scan portions of said digital sequence and to progressively reduce a contribution of more distant bits in said digital sequence.
- 2. The method of claim 1, wherein said first predetermined numeric pattern is a bit pattern.
- 3. The method of claim 1, wherein said rolling hash function comprises a 32-bit hash function.
- 4. The method of claim 1, wherein said first predetermined numeric pattern comprises a consecutive sequence of bits.
- 5. The method of claim 4, wherein during said monitoring said consecutive sequence of bits is found in said hash values, the hash values comprising hash sequences that are not numerically consecutive.
- 6. The method of claim 4, wherein said consecutive sequence of bits comprises a plurality of endmost bits.
- 7. The method of claim 1, further comprising:
determining a threshold restriction for said step of monitoring said hash values; and increasing a probability of said marking of said breakpoint in said digital sequence.
- 8. The method of claim 7, wherein said step of increasing said probability of said marking of said breakpoint in said digital sequence is a function of at least a desired chunk size.
- 9. The method of claim 7, wherein said step of increasing said probability of said marking of said breakpoint in said digital sequence is a function of at least a length of a current portion of said digital sequence.
- 10. The method of claim 7, wherein said step of increasing said probability of said marking of said breakpoint in said digital sequence is carried out by the step of:
utilizing a second predetermined numeric pattern for said step of monitoring said hash values; and alternatively marking said breakpoint when said second predetermined numeric pattern occurs.
- 11. The method of claim 7, wherein said step of increasing said probability of said marking of said breakpoint in said digital sequence is a function of some content portion of said sequence.
- 12. A method for determining a first breakpoint in a first digital sequence comprising:
determining a subset group of said first digital sequence; performing a hash function on said subset group of said first digital sequence beginning at a starting position in said first digital sequence until a first predetermined numeric pattern, which is found in a range of numeric values, in said hash value is obtained; and marking said first breakpoint when said first predetermined numeric pattern in said hash value is obtained.
- 13. The method of claim 12, wherein said numeric pattern comprises a bit pattern.
- 14. The method of claim 12, wherein said step of performing a hash function comprises a rolling hash function that scans portions of said digital sequence and progressively reduces an impact of more distant bits in said digital sequence.
- 15. The method of claim 12, further comprising the steps of:
further performing a hash function on a subset group of said first digital sequence from said first breakpoint until said first predetermined numeric pattern in said hash value is again obtained; and marking another breakpoint in said first digital sequence when said first predetermined numeric pattern in said hash value is again obtained.
- 16. The method of claim 15, wherein said step of further performing said hash function is carried out by means of a rolling hash function.
- 17. The method of claim 12, further comprising the steps of:
determining a second predetermined numeric pattern in said hash value; and continuing said step of performing said hash function on said subset group of said first digital sequence until an established threshold restriction has been met.
- 18. The method of claim 12, further comprising the steps of:
performing a hash function on said subset group beginning at a starting position in a second digital sequence until said first predetermined numeric pattern in said hash value is obtained; marking a second breakpoint in said second digital sequence when said first predetermined numeric pattern in said hash value is obtained; and comparing said predetermined hash value at said first breakpoint with that at said second breakpoint.
- 19. A computer program product comprising:
a computer usable medium having computer readable code embodied therein for determining a first breakpoint in a first digital sequence comprising: computer readable program code devices configured to cause a computer to effect determining a subset group of said first digital sequence; computer readable program code devices configured to cause a computer to effect performing a hash function on said subset group of said first digital sequence beginning at a starting position in said first digital sequence until a first predetermined numeric pattern in said hash value is obtained; computer readable program code devices configured to cause a computer to effect marking said first breakpoint when said first predetermined numeric pattern in said hash value is obtained; computer readable program code devices configured to cause a computer to effect performing a hash function on said subset group beginning at a starting position in a second digital sequence until said first predetermined numeric pattern in said hash value is obtained; computer readable program code devices configured to cause a computer to effect marking a second breakpoint in said second digital sequence when said first predetermined numeric pattern in said hash value is obtained; and computer readable program code devices configured to cause a computer to effect comparing said predetermined hash value at said first breakpoint with that at said second breakpoint.
- 20. The computer program product of claim 19, wherein said numeric pattern comprises a bit pattern.
- 21. The computer program product of claim 19, wherein said computer readable program code devices configured to cause a computer to effect performing a hash function is carried out by means of a rolling hash function.
- 22. The computer program product of claim 19, further comprising:
computer readable program code devices configured to cause a computer to effect further performing a hash function on a subset group of said first digital sequence from said first breakpoint until said first predetermined numeric pattern in said hash value is again obtained; and computer readable program code devices configured to cause a computer to effect marking another breakpoint in said first digital sequence when said first predetermined numeric pattern in said hash value is again obtained.
- 23. The computer program product of claim 22, wherein said computer readable program code devices configured to cause a computer to effect further performing said hash function is carried out by means of a rolling hash function.
- 24. The computer program product of claim 19, further comprising:
computer readable program code devices configured to cause a computer to effect determining a second predetermined numeric pattern in said hash value; computer readable program code devices configured to cause a computer to effect continuing said step of performing said hash function on said subset group of said first digital sequence until an established threshold restriction has been met; and computer readable program code devices configured to cause a computer to effect alternatively marking said first breakpoint when said second predetermined numeric pattern in said hash value is obtained.
- 25. A method for determining a first breakpoint in a first digital sequence comprising:
determining a subset group of said first digital sequence; performing a hash function on said subset group of said first digital sequence beginning at a starting position in said first digital sequence until a first predetermined numeric pattern in the hash values from said hash function is obtained, wherein the hash function is a 32-bit rolling hash function and the subset group is a 32-bit pattern derived directly or indirectly from said first digital sequence, and the performing comprises shifting the 32-bit pattern over one bit, reading a character from the first digital sequence and deriving directly or indirectly another 32-bit pattern, and repeating the shifting and the reading, until there are 32 bits in sequence that can be hashed together to provide one said hash value; marking said first breakpoint when said first predetermined numeric pattern in said hash value is obtained; further performing a hash function on a subset group of said first digital sequence from said first breakpoint until said first predetermined numeric pattern in said hash value is again obtained; and marking another breakpoint in said first digital sequence when said first predetermined numeric pattern in said hash value is again obtained.
- 26. The method of claim 25, wherein said step of further performing said hash function is carried out by means of a rolling hash function.
- 27. The method of claim 25, further comprising the steps of:
performing a hash function on said subset group beginning at a starting position in a second digital sequence until said first predetermined numeric pattern in said hash value is obtained; marking a second breakpoint in said second digital sequence when said first predetermined numeric pattern in said hash value is obtained; comparing said predetermined hash value at said first breakpoint with that at said second breakpoint; and equating a corresponding portion of said first digital sequence from said starting point to said first breakpoint with a corresponding portion of said second digital sequence from said starting position to said second breakpoint.
- 28. A computer program product comprising:
a computer usable medium having computer readable code embodied therein for determining a first breakpoint in a first digital sequence comprising: computer readable program code devices configured to cause a computer to effect determining a subset group of said first digital sequence; computer readable program code devices configured to cause a computer to effect performing a hash function on said subset group of said first digital sequence beginning at a starting position in said first digital sequence until a first predetermined numeric pattern in said hash value is obtained, wherein the hash function is a 32-bit rolling hash function and the subset group is a 32-bit pattern, and the performing comprises shifting the 32-bit pattern over one bit, reading a character from the first digital sequence, and repeating the shifting and the reading; and computer readable program code devices configured to cause a computer to effect marking said first breakpoint when said first predetermined numeric pattern in said hash value is obtained.
- 29. The computer program product of claim 28, further comprising:
computer readable program code devices configured to cause a computer to effect further performing a hash function on a subset group of said first digital sequence from said first breakpoint until said first predetermined numeric pattern in said hash value is again obtained; and computer readable program code devices configured to cause a computer to effect marking another breakpoint in said first digital sequence when said first predetermined numeric pattern in said hash value is again obtained.
- 30. The computer program product of claim 29, wherein said computer readable program code devices configured to cause a computer to effect further performing said hash function is carried out by means of a rolling hash function.
- 31. The computer program product of claim 28, further comprising:
computer readable program code devices configured to cause a computer to effect determining a second predetermined numeric pattern in said hash value; computer readable program code devices configured to cause a computer to effect continuing said step of performing said hash function on said subset group of said first digital sequence until an established threshold restriction has been met; and computer readable program code devices configured to cause a computer to effect alternatively marking said first breakpoint when said second predetermined numeric pattern in said hash value is obtained.
- 32. The computer program product of claim 28, further comprising:
computer readable program code devices configured to cause a computer to effect performing a hash function on said subset group beginning at a starting position in a second digital sequence until said first predetermined numeric pattern in said hash value is obtained; computer readable program code devices configured to cause a computer to effect marking a second breakpoint in said second digital sequence when said first predetermined numeric pattern in said hash value is obtained; computer readable program code devices configured to cause a computer to effect comparing said predetermined hash value at said first breakpoint with that at said second breakpoint; and computer readable program code devices configured to cause a computer to effect equating a corresponding portion of said first digital sequence from said starting point to said first breakpoint with a corresponding portion of said second digital sequence from said starting position to said second breakpoint.
- 33. A method for partitioning a digital sequence comprising:
performing a hash function on at least a portion of a digital sequence; monitoring binary sequences produced by said hash function for a predetermined pattern of bits; and marking a breakpoint in said digital sequence when said predetermined pattern occurs.
- 34. The method of claim 33, wherein the predetermined pattern of bits comprises a consecutive sequence of bits.
- 35. The method of claim 34, wherein the consecutive sequence of bits comprises a plurality of endmost bits.
- 36. The method of claim 33, wherein the performing of the hash function comprises performing a rolling hash function adapted to scan portions of said digital sequence and to progressively reduce a contribution of more distant bits in said digital sequence.
- 37. A computer program product comprising:
computer readable program code devices configured to cause a computer to effect performing a hash function on at least a portion of a digital sequence; computer readable program code devices configured to cause a computer to effect monitoring binary sequences produced by said hash function for a predetermined pattern of bits; and computer readable program code devices configured to cause a computer to effect marking a breakpoint in said digital sequence when said predetermined pattern occurs.
- 38. A method for partitioning a digital sequence comprising:
performing an indirect hash function on at least a portion of a digital sequence; monitoring binary sequences produced by the indirect hash function for a first predetermined pattern of bits; and marking a breakpoint in the digital sequence when the first predetermined pattern is identified as occurring during the monitoring.
- 39. The method of claim 38, wherein said step of performing said hash function comprises a rolling hash function adapted to scan portions of said digital sequence and to progressively reduce a contribution of more distant bits in said digital sequence.
- 40. The method of claim 38, further comprising:
determining a threshold restriction for said step of monitoring said hash values; and increasing a probability of said marking of said breakpoint in said digital sequence.
- 41. The method of claim 40, wherein the increasing of the probability of said marking of said breakpoint in said digital sequence is a function of at least a length of a current portion of said digital sequence.
- 42. The method of claim 40, wherein the increasing of said probability of said marking of said breakpoint in said digital sequence is carried out by the steps of:
utilizing a second predetermined numeric pattern for said step of monitoring said hash values; and alternatively marking said breakpoint when said second predetermined numeric pattern occurs.
- 43. The method of claim 40, wherein the increasing of said probability of said marking of said breakpoint in said digital sequence is a function of some content portion of said sequence.
- 44. A computer program product comprising:
computer readable program code devices configured to cause a computer to effect performing an indirect hash function on at least a portion of a digital sequence; computer readable program code devices configured to cause a computer to effect monitoring binary sequences produced by the indirect hash function for a first predetermined pattern of bits; and computer readable program code devices configured to cause a computer to effect marking a breakpoint in the digital sequence when the first predetermined pattern is identified as occurring during the monitoring.
- 45. The computer program product of claim 44, wherein said performing said indirect hash function comprises performing a rolling hash function adapted to scan portions of said digital sequence and to progressively reduce a contribution of more distant bits in said digital sequence.
- 46. The computer program product of claim 44, further comprising:
computer readable program code devices configured to cause a computer to effect determining a threshold restriction for said step of monitoring said hash values; and computer readable program code devices configured to cause a computer to effect increasing a probability of said marking of said breakpoint in said digital sequence.
- 47. The computer program product of claim 46, wherein the increasing of the probability of said marking of said breakpoint in said digital sequence is a function of at least a length of a current portion of said digital sequence.
- 48. The computer program product of claim 46, wherein the increasing of said probability of said marking of said breakpoint in said digital sequence comprises:
utilizing a second predetermined numeric pattern for said step of monitoring said hash values; and alternatively marking said breakpoint when said second predetermined numeric pattern occurs.
- 49. The computer program product of claim 44, wherein the increasing of said probability of said marking of said breakpoint in said digital sequence is a function of some content portion of said sequence.
- 50. A method for partitioning a digital sequence comprising:
performing an indirect hash function on at least a portion of a digital sequence; monitoring hash values produced by the indirect hash function for a numeric pattern selected from a range of numeric values; and marking a breakpoint in the digital sequence when the numeric pattern occurs according to the monitoring.
- 51. A computer program product comprising:
computer readable program code devices configured to cause a computer to effect performing an indirect hash function on at least a portion of a digital sequence; computer readable program code devices configured to cause a computer to effect monitoring hash values produced by the indirect hash function for a numeric pattern selected from a range of numeric values; and computer readable program code devices configured to cause a computer to effect marking a breakpoint in the digital sequence when the numeric pattern occurs according to the monitoring.
- 52. A method for partitioning a digital sequence comprising:
performing an indirect function on at least a portion of said digital sequence, the performing comprising: indexing bytes from the digital sequence into alternative pre-determined bit sequences; performing a hash function on the pre-determined bit sequences; monitoring the bit sequences produced by the said hash function for a first predetermined bit pattern; and marking a breakpoint in said digital sequence when said first predetermined bit pattern occurs.
- 53. The method of claim 52, wherein said alternative pre-determined bit sequences are 32-bit sequences.
- 54. The method of claim 52, wherein said alternative pre-determined bit sequences are 32-bit sequences selected to normalize breakpoints around a certain data block size.
- 55. A computer program product comprising:
computer readable program code devices configured to cause a computer to effect performing an indirect function on at least a portion of said digital sequence, the computer readable program code devices comprising additional computer readable program code devices configured to cause a computer to effect: indexing bytes from the digital sequence into alternative pre-determined bit sequences; performing a hash function on the pre-determined bit sequences; monitoring the bit sequences produced by the said hash function for a first predetermined bit pattern; and marking a breakpoint in said digital sequence when said first predetermined bit pattern occurs.
- 56. A method for partitioning a digital sequence comprising:
processing at least a portion of a digital sequence to produce a transformation; performing a hash function on the transformation; monitoring binary sequences produced by the hash function for a pattern of bits; and marking a breakpoint in the digital sequence when said first predetermined pattern occurs as determined by the monitoring.
- 57. The method of claim 56, wherein the processing to produce the transformation and the hash function are configured to produce a plurality of the breakpoints aggregating around a pre-determined data block size as opposed to a flat distribution of data block sizes.
- 58. The method of claim 56, wherein the transformation indexes the processed portion of the digital sequence into one of “n” pre-determined numbers or numeric sequences.
- 59. A computer program product comprising:
computer readable program code devices configured to cause a computer to effect processing at least a portion of a digital sequence to produce a transformation; computer readable program code devices configured to cause a computer to effect performing a hash function on the transformation; computer readable program code devices configured to cause a computer to effect monitoring binary sequences produced by the hash function for a pattern of bits; and computer readable program code devices configured to cause a computer to effect marking a breakpoint in the digital sequence when said first predetermined pattern occurs as determined by the monitoring.
- 60. The computer program product of claim 59, wherein the processing to produce the transformation and the hash function are configured to produce a plurality of the breakpoints aggregating around a pre-determined data block size as opposed to a flat distribution of data block sizes.
- 61. The computer program product of claim 60, wherein the transformation indexes the processed portion of the digital sequence into one of “n” pre-determined numbers or numeric sequences.
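By way of illustration only, the partitioning recited in the claims above might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular embodiment: the names `make_table`, `roll`, `find_breakpoints` and `split_chunks`, the seeded random lookup table, the choice of all-zero endmost bits as the predetermined pattern, and the values of `bits`, `relaxed_bits` and `threshold` are all hypothetical. The sketch indexes each byte into a pre-determined 32-bit sequence, shifts the running 32-bit value over one bit per byte so that more distant bytes contribute progressively less, and alternatively marks a breakpoint on a second, shorter pattern once the current portion reaches a threshold length.

```python
import random

MASK32 = 0xFFFFFFFF  # keep the running hash to 32 bits


def make_table(seed=0):
    """Assumed lookup table: one pre-determined 32-bit sequence per byte value."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(256)]


def roll(h, byte, table):
    """Shift the 32-bit pattern over one bit and mix in the next byte.

    A byte's contribution moves one bit further left on every step and
    has been shifted out entirely after 32 further bytes.
    """
    return ((h << 1) ^ table[byte]) & MASK32


def find_breakpoints(data, table, bits=11, relaxed_bits=9, threshold=4096):
    """Return offsets just past each marked breakpoint in `data`.

    First pattern: the `bits` endmost (low-order) bits are all zero.
    Second pattern: only `relaxed_bits` endmost bits are zero; it is
    consulted once the current portion is at least `threshold` bytes
    long, which raises the probability of marking a breakpoint.
    """
    primary = (1 << bits) - 1
    relaxed = (1 << relaxed_bits) - 1
    breakpoints = []
    h = 0
    start = 0
    for i, byte in enumerate(data):
        h = roll(h, byte, table)
        length = i + 1 - start
        hit = (h & primary) == 0
        if not hit and length >= threshold:
            hit = (h & relaxed) == 0
        if hit:
            breakpoints.append(i + 1)
            start = i + 1
            h = 0  # hashing resumes from the breakpoint just marked
    return breakpoints


def split_chunks(data, breakpoints):
    """Cut `data` at the breakpoints; any unmarked tail is the last chunk."""
    chunks = []
    start = 0
    for bp in breakpoints:
        chunks.append(data[start:bp])
        start = bp
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

Under these assumptions, `split_chunks(data, find_breakpoints(data, make_table()))` would return the portions of `data` between successive breakpoints. Because the breakpoints depend on the content rather than on fixed offsets, two sequences that share a run of content may produce identical portions over that run, and those portions can then be compared, for example by hash value.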
CROSS REFERENCE TO RELATED PATENT APPLICATIONS
[0001] This application is a continuation of U.S. application Ser. No. 09/777,149, filed Feb. 5, 2001, which claims priority from U.S. Provisional Application No. 60/245,920, filed Nov. 6, 2000, the disclosures of which are herein specifically incorporated by this reference.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60245920 | Nov 2000 | US |
Continuations (1)
| | Number | Date | Country |
|---|---|---|---|
| Parent | 09777149 | Feb 2001 | US |
| Child | 10861796 | Jun 2004 | US |