The present disclosure generally relates to data compression, and more specifically to compression of data associated with a certificate of authenticity protecting a product.
A certificate of authenticity (COA) is a label, a marking, or a digitally signed physical object that gives the consumer assurance that a product is genuine, and is not a pirated copy or a shoddily made fake. Preferably, a COA has a random unique structure that satisfies three requirements. First, the cost of creating and signing an original COA should be small relative to a desired level of security. That is, the cost to the genuine manufacturer of the product should be small compared to the cost or retail price of the product. Second, the cost of manufacturing a COA instance should be several orders of magnitude lower than the cost of exact or near-exact replication of the unique and random physical structure of this instance. That is, while the COA is inexpensive for the legitimate manufacturer to create, it is expensive for the pirate manufacturer to replicate. Third, the cost of verifying the authenticity of a signed COA should be small, again relative to a desired level of security. The desired level of security is typically closely tied to the cost of the product to which the COA is attached.
The uniqueness of each COA instance is a crucial element in fighting counterfeiting. Any single, unvarying feature can easily be mass-produced by a specialized machine whose construction cost is amortized over millions of counterfeit copies it produces. As an example, counterfeiters have been economically successful in forging en masse anti-counterfeiting holographic features, regardless of their sophistication.
Accordingly, a need exists for a new and improved certificate of authenticity.
Systems and methods for compressing data, particularly for use in manufacturing and verifying certificates of authenticity (COA), are described herein. Data elements obtained from a COA are ordered based on an iterative selection process. First, one or more data ranges are defined. Having defined the ranges, a data element from within each of the ranges is selected. The selected data elements are then encoded. The encoding of each data element is based on a position of that data element within a range from which the data element was selected.
The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.
Note that fiber statistics f are extracted from the scanned barcode (e.g. at block 304 of FIG. 3).
In a second strategy, the adversary could devise a manufacturing process that can exactly replicate an already signed COA instance.
In a third strategy, the adversary misappropriates (i.e. steals) signed COA instances. Of course, the organization that issues COA instances is responsible for guarding against such misappropriation, and must take appropriate steps to prevent this occurrence.
At block 404, the binary bit string (produced at the end of block 402) is decompressed and compared with the original scanned fiber locations.
At block 406, if the de-compression engine output (produced at block 404) and the compression engine input (input to block 402) are the same, then the compression engine is fully functional for that particular input. Accordingly, the binary string produced at block 402 is output as the compression engine result.
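The self-check of blocks 402-406 amounts to a round trip through the compression and de-compression engines. Below is a minimal Python sketch; `encode` and `decode` are hypothetical stand-ins for those engines, and their names and signatures are assumptions for illustration, not part of the disclosure:

```python
def verified_compress(fiber_locations, universe_size, encode, decode):
    """Round-trip self-check sketch of blocks 402-406."""
    bits = encode(fiber_locations, universe_size)                  # block 402: compress
    recovered = decode(bits, len(fiber_locations), universe_size)  # block 404: decompress
    if sorted(recovered) == sorted(fiber_locations):               # block 406: compare
        return bits  # engine verified for this input; output the compressed string
    raise ValueError("round-trip mismatch: compressed string not usable for this input")
```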
Lossless Compression
A lossless compression algorithm is adapted for use in compressing COA fiber location data, as well as for generic lossless compression applications. In a COA application, it is important to create a binary representation for an ensemble of distinct points with the minimum number of bits possible. From a practical perspective, this is because the message region 104 (FIG. 1) can store only a limited number of bits.
Accordingly, the following notation will be used throughout this disclosure. We are given a set P of M distinct points in the N-dimensional finite integer grid: P ≜ {p_1, p_2, . . . , p_M}, where p_i ∈ U ≜ {1, 2, . . . , L}^N, p_i ≠ p_j for i ≠ j, and 1 ≤ i ≤ M. In the COA implementation, the set P is derived from scanning a fiber-based COA. However, the set P could be based on other data requiring compression. Furthermore, P is a uniformly-randomly-chosen, cardinality-M subset of the universal set U under our assumptions. We would like to describe P by a binary bit string that is as short as possible; such a compression algorithm can also be thought of as an "enumeration technique" that efficiently labels all possible cardinality-M subsets of U. Let R be the total number of bits spent for compression.
A straightforward approach would be to use ⌈log_2 L⌉ bits to encode each coordinate of each point, where ⌈•⌉ denotes the ceiling operator (rounding up to the nearest integer). This means encoding each point with N⌈log_2 L⌉ bits, which results in R = MN⌈log_2 L⌉ bits to describe any set P. On the other hand, noting that there are a total of

C(L^N, M) = (L^N)! / [M! (L^N − M)!]

cardinality-M, equally-likely subsets of U, we observe the following standard information-theoretic entropy bound, which holds for any compression algorithm:

R ≥ log_2 C(L^N, M),   (4.1)

where we relied on the uniform distribution of P over all cardinality-M subsets of U. Here the right hand side of (4.1) is the entropy of the underlying distribution. Note that the entropy bound is maximized by the uniform distribution on discrete finite sets (which is the case for our problem). Naturally, the bound would yield different values (to be more precise, strictly less than log_2 C(L^N, M)) in the case of non-uniform distributions (i.e., the cases where the distinct points are not uniformly randomly scattered in the space).
Lemma 1: The straightforward algorithm introduces a redundancy of at least log M! bits with respect to the entropy bound:

MN⌈log_2 L⌉ − log_2 C(L^N, M) ≥ log_2 M!.

Furthermore, the bound is asymptotically tight as L^N/M → ∞.

Proof: Since C(L^N, M) = [L^N (L^N − 1) · · · (L^N − M + 1)] / M! ≤ L^{NM} / M!, we have log_2 C(L^N, M) ≤ MN log_2 L − log_2 M! ≤ MN⌈log_2 L⌉ − log_2 M!, which establishes the first part. The second part of the lemma follows from using the approximation C(L^N, M) ≈ L^{NM} / M!, which becomes exact in the limit as L^N/M → ∞.
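To make the bound concrete, the following Python snippet evaluates the straightforward rate, the entropy bound of (4.1), and the gap between them for assumed example values (L = 256, N = 2, M = 50, chosen for illustration only); it uses only the standard library:

```python
import math

L, N, M = 256, 2, 50                                # assumed example values
straightforward = M * N * math.ceil(math.log2(L))   # R = M*N*ceil(log2 L) = 800 bits
entropy_bound = math.log2(math.comb(L**N, M))       # log2 C(L^N, M) ≈ 585.8 bits
redundancy = straightforward - entropy_bound        # ≈ 214.2 bits
print(straightforward, round(entropy_bound, 1), round(redundancy, 1))
print(round(math.log2(math.factorial(M)), 1))       # log2 M! ≈ 214.2, matching Lemma 1
```

Here L^N/M = 65536/50 is large, so the redundancy of the straightforward method essentially equals log_2 M!, illustrating the asymptotic tightness claimed in the lemma.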
An embodiment of a compression algorithm is motivated by Lemma 1, and provides much better performance than the straightforward method. It can be shown experimentally that the embodiment has a redundancy (i.e., the gap between the algorithm performance and the entropy bound) that is small enough for most practical purposes. For convenience, the remainder of this disclosure will refer to the straightforward approach as the case of “no compression.”
Compression Algorithm
As a first step, it can be shown that an equivalent problem in one dimension corresponds to our original problem of lossless compression in high dimensions. For any point p ∈ U, let p(k) denote its k-th component, 1 ≤ k ≤ N. Note that, for all k, p(k) ∈ {1, 2, . . . , L}. Now define the mapping M(•): U → V such that

M(p) = 1 + Σ_{k=1}^{N} (p(k) − 1) L^{k−1},   (2)

where it can easily be verified that V = {1, 2, . . . , L^N}. Furthermore, it can be shown that M is a one-to-one mapping. In fact, given c = M(p), p can be iteratively computed by first noting

p(1) = 1 + {(c − 1) mod L},   (3)

and, for all 1 < k ≤ N, given {p(l)}, l = 1, . . . , k − 1, via calculating

p(k) = 1 + {[(c − 1 − Σ_{l=1}^{k−1} (p(l) − 1) L^{l−1}) / L^{k−1}] mod L}.   (4)
Recall that, per the problem construction, there is a uniform probability distribution on P among all possible cardinality-M subsets of U; this implies that there is also a uniform probability distribution on M(P) among all possible cardinality-M subsets of M(U), since M is a one-to-one mapping. In this section, the latter problem is considered, and a method is disclosed to compress uniformly randomly-distributed cardinality-M subsets of M(U) = {1, 2, . . . , L^N}. In practice, at the encoder side, scattered N-dimensional points from U can be mapped to one-dimensional points via equation (2) prior to using the encoding algorithm; at the decoder side, after applying the de-compression algorithm, one-dimensional points from M(U) can be re-mapped back to N-dimensional space via equations (3) and (4).
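A minimal Python sketch of the mapping (2) and its inverse (3)-(4) follows; the function names and the tuple representation of points are illustrative conveniences, not part of the disclosure:

```python
def map_to_1d(p, L):
    """Equation (2): map an N-dimensional point (1-based coords in 1..L) to c in 1..L**N."""
    return 1 + sum((coord - 1) * L**k for k, coord in enumerate(p))

def map_to_nd(c, L, N):
    """Equations (3) and (4): recover the N coordinates of p from c."""
    rem = c - 1
    p = []
    for _ in range(N):
        p.append(1 + rem % L)  # p(k) = 1 + {... mod L}
        rem //= L              # discard the already-recovered low-order digit
    return tuple(p)

assert map_to_nd(map_to_1d((2, 5, 1), L=7), L=7, N=3) == (2, 5, 1)
```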
An assumption may be made that for any set P = {p_1, p_2, . . . , p_M} ⊂ U, we are given M(P) = {c_1, c_2, . . . , c_M} ⊂ {1, 2, . . . , L^N}, where c_i ≜ M(p_i), 1 ≤ i ≤ M. Furthermore, since the order is unimportant, we assume without loss of generality that the input points {c_i} are given in ascending order: c_1 < c_2 < . . . < c_M. In one embodiment, the first step is to form a tree from these points. The basic idea is to use an iterative process to decrease the range (i.e., the difference between the known minimum upper and maximum lower bounds) bounding each point as the depth of the tree increases, in a balanced fashion. The next step is to encode these points using the formed tree, thereby encoding the points in an order that saves bits. Bits are saved because successively encoded points are associated with smaller and smaller ranges (since the binary tree successively bifurcates the ranges from which a median point was selected), and the encoding of each point is expressed as a distance from that point to a bound of its range.
Data Ordering and Tree Formation
In one implementation, the basic idea is to choose the order of encoding the points such that the previously encoded points can "help" the encoding of the current point as much as possible. We assume that the number of bits spent to encode each point is equal to the (base-2) logarithm of the size of the gap between the known maximum lower bound and the known minimum upper bound. Thus, previously encoded points help in encoding the current point by decreasing this gap (i.e., by either decreasing the known minimum upper bound or increasing the known maximum lower bound). As it turns out, gains are achieved in compression by selecting the order of encoding carefully. This basic idea is illustrated in the following very simple example.
A simple example is instructive of these concepts. Suppose M = 3 symbols c_1 < c_2 < c_3 are drawn from {1, 2, . . . , 16}, with c_2 = 9, and suppose the symbols are encoded in ascending order (c_1, c_2, c_3). Encoding c_1 first costs ⌈log 16⌉ = 4 bits, since all that is known is 0 < c_1 < 17. Encoding c_2 next, knowing c_1 < c_2 < 17, costs ⌈log(17 − c_1 − 1)⌉ bits, which is 4 bits whenever c_1 ≤ 7. Encoding c_3 last, knowing c_2 = 9 < c_3 < 17, costs ⌈log(17 − 9 − 1)⌉ = 3 bits. In total, 4 + 4 + 3 = 11 bits are spent.
In a somewhat changed example, suppose the symbols are encoded in the order (c_2, c_1, c_3), and that order is known at the decoder. In encoding c_2, we spend 4 bits. For the next symbol c_1, we know that 0 < c_1 < c_2 = 9, which means spending ⌈log(9 − 0 − 1)⌉ = 3 bits. Similarly for c_3, we know that c_2 = 9 < c_3 < 17, which means spending ⌈log(17 − 9 − 1)⌉ = 3 bits; overall we spend 4 + 3 + 3 = 10 bits in total, which is one bit less than the bit rate of the previous method, where we chose the symbols in ascending order. This is because we had a better partitioning of the interval {1, 2, . . . , 16} in the second method; on average, we decreased the ranges of the second and third symbols after encoding the first symbol, because the first symbol was selected as the median of the three.
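The arithmetic of both orders can be checked mechanically. In the Python sketch below, the values c_1 = 3 and c_3 = 14 are assumed for illustration (the example above fixes only c_2 = 9); the helper charges ⌈log2(MinUpperBound − MaxLowerBound − 1)⌉ bits per symbol, as in the example:

```python
from math import ceil, log2

def cost(order, values, lower, upper):
    """Total bits to encode `values` (indexed 1..M) in the given index order."""
    bits = 0
    known = {0: lower, len(values) + 1: upper}  # sentinel bounds
    c = {i + 1: v for i, v in enumerate(values)}
    for idx in order:
        max_lower = max(v for k, v in known.items() if k < idx)  # tightest lower bound
        min_upper = min(v for k, v in known.items() if k > idx)  # tightest upper bound
        bits += ceil(log2(min_upper - max_lower - 1))
        known[idx] = c[idx]  # this symbol now helps later ones
    return bits

values = (3, 9, 14)                    # c_1, c_2, c_3 with c_2 = 9 (c_1, c_3 assumed)
print(cost((1, 2, 3), values, 0, 17))  # ascending order: 11 bits
print(cost((2, 1, 3), values, 0, 17))  # median first:    10 bits
```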
Accordingly, an embodiment of the compression algorithm uses a particular encoding order, where at each step we encode the median symbol within a given range. The rationale is to minimize the ranges of the future symbols to be encoded as much as possible. As it turns out, such an encoding order can be represented by a binary tree (i.e., a tree where each parent can have at most 2 children). Next, a tree formation algorithm is disclosed, which readily gives the encoding order; furthermore, since this algorithm is known at the receiver, the decoder also knows this order, thereby providing synchronization between the encoder and the decoder.
At block 602, variables are established and initialized according to the above indicated notation, such that i_{1,1} = ⌈M/2⌉, d = 1, and d_max = ⌈log(M + 1)⌉.
At block 604, a loop, such as a "while" or "do" loop, is entered. More particularly, the loop is performed "while d < d_max do," wherein d is incremented at block 618.
At block 606, the definition of S_d is refined by adding to the set of encoded symbols in an iterative manner. In a more specific example, we define S_d = {i_{l,j} | 1 ≤ l ≤ d, 1 ≤ j ≤ 2^{l−1}} ∪ {0, M + 1} as the set of symbol indices encoded so far.
At block 608, another loop, such as a "while" or "do" loop, is entered. In a more specific example, the loop is performed "for all 1 ≤ j ≤ 2^{d−1} do," wherein j is incremented at block 616.
At block 610, values for the variables MinUpperBound and MaxLowerBound are found. For example, these bounds may be found according to: MinUpperBound = min{l ∈ S_d | l > i_{d,j}} and MaxLowerBound = max{l ∈ S_d | l < i_{d,j}}.
At block 612, a check is made to determine if a value should be inserted into a particular tree location. In a more specific example, if i_{d,j} > (MaxLowerBound + 1), set i_{d+1,2j−1} = ⌊(MaxLowerBound + i_{d,j})/2⌋; else, leave i_{d+1,2j−1} blank or empty.
At block 614, a second check is made to determine if a value should be inserted into a further tree location. In a more specific example, if i_{d,j} < (MinUpperBound − 1), set i_{d+1,2j} = ⌊(MinUpperBound + i_{d,j})/2⌋; else, leave i_{d+1,2j} blank or empty.
At block 616, j is incremented; if j is still within its range, the process returns to block 608. Otherwise, at block 618, d is incremented, and if d is still less than d_max, the process returns to block 604. Otherwise, the binary tree is complete, meaning that the data elements have been placed within the tree in an order that is helpful during the compression algorithm.
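The tree formation of blocks 602-618 can be expressed compactly as a breadth-first traversal over index ranges. The Python sketch below returns the resulting encoding order rather than an explicit tree; tracking (MaxLowerBound, MinUpperBound) index pairs in a queue reproduces the same medians without maintaining d, d_max, or S_d explicitly, which is an implementation choice, not a requirement of the disclosure:

```python
from collections import deque

def encoding_order(M):
    """Breadth-first median selection over the index range 1..M (sketch of blocks 602-618)."""
    order = []
    queue = deque([(0, M + 1)])  # sentinel index bounds (MaxLowerBound, MinUpperBound)
    while queue:
        lo, hi = queue.popleft()
        if hi - lo < 2:          # no index strictly between the bounds: leaf is empty
            continue
        mid = (lo + hi) // 2     # median; at the root this equals ceil(M/2) = i_{1,1}
        order.append(mid)
        queue.append((lo, mid))  # left child i_{d+1,2j-1}
        queue.append((mid, hi))  # right child i_{d+1,2j}
    return order

print(encoding_order(7))  # [4, 2, 6, 1, 3, 5, 7]
```

For M = 7 the order is 4, 2, 6, 1, 3, 5, 7: the root ⌈M/2⌉ = 4 first, then the medians of each half, and so on, exactly the depth-by-depth order described above.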
Encoding Data
This section describes an encoding process, wherein values are assigned to the individual elements of a given set {c_1, c_2, . . . , c_M}, thereby producing a binary codeword for the whole set M(P). In the bit assignment, the order by which the symbols are encoded is significant, and may be obtained from a binary tree (e.g. a tree formed according to blocks 602-618 of FIG. 6).
At block 908, a further loop, such as a "while" or "do" loop, is entered. More particularly, for a given d, the loop is performed for each 1 ≤ j ≤ 2^{d−1}, wherein j is incremented at block 916. Within the loop, consider c_{i_{d,j}}, the symbol whose index is i_{d,j}.
At block 910, a value for the variable MaxLowerBound is found. In particular, the MaxLowerBound (and, similarly, the MinUpperBound) may be found among the already encoded symbols, as described above with respect to block 610 of FIG. 6.
At block 912, a value for the data point is found, based on the data point's distance from the MaxLowerBound. Alternatively, the value could be based on the point's distance from the MinUpperBound. The value is then expressed using a number of bits based on the distance between the MaxLowerBound and the MinUpperBound. Thus, the value for the data point is based on the data point's position within the range associated with the data point. Accordingly, as the tree is traversed and the ranges become narrower, the number of bits required to express each data point is reduced. In particular, the binary (encoded or compressed) representation of (c_{i_{d,j}} − c_{MaxLowerBound} − 1) is expressed using ⌈log(c_{MinUpperBound} − c_{MaxLowerBound} − 1)⌉ bits.
At block 914, the resulting bits are appended to the end of the compressed bit sequence.
It is important to note that, because the tree structure from which {i_{d,j}} are derived is known at the encoder and decoder simultaneously, perfect synchronization is achieved between the two. Therefore, the decoder simply performs the operation by repeating the steps of the encoder, except that conversion from c_{i_{d,j}} to bits at the encoder is replaced by conversion from the received bits back to c_{i_{d,j}} at the decoder.
It is also important to note that the complexity of the algorithm is linear in M (the number of symbols to be compressed). As it turns out, in practice, it is not necessary to search for the MaxLowerBound and MinUpperBound values in tree formation or encoding, because the input symbols are already given in ascending order. Thus, the overall complexity is linear in the number of symbols, and the per-symbol complexity is roughly that of converting a decimal number to binary bits at the encoder (and of converting binary bits back to a decimal number at the decoder).
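The encoder and the synchronized decoder can be sketched end to end in Python, under the conventions used in the examples above: sentinels c_0 = 0 and c_{M+1} = L^N + 1, offsets measured from the MaxLowerBound, and ⌈log2⌉-sized bit widths. This is a minimal sketch of one embodiment's bit assignment, not the only one the disclosure contemplates:

```python
from collections import deque

def encode(points, universe_size):
    """Encode sorted distinct integers in 1..universe_size (sketch of blocks 908-916)."""
    M = len(points)
    c = [0] + sorted(points) + [universe_size + 1]  # sentinels c[0], c[M+1]
    out = []
    queue = deque([(0, M + 1)])
    while queue:
        lo, hi = queue.popleft()
        if hi - lo < 2:
            continue
        mid = (lo + hi) // 2
        span = c[hi] - c[lo] - 1         # admissible values for c[mid]
        width = (span - 1).bit_length()  # = ceil(log2(span)); 0 bits if span == 1
        out.append(format(c[mid] - c[lo] - 1, f"0{width}b") if width else "")
        queue.append((lo, mid))
        queue.append((mid, hi))
    return "".join(out)

def decode(bits, M, universe_size):
    """Invert encode() by replaying the same traversal (decoder synchronization)."""
    c = [None] * (M + 2)
    c[0], c[M + 1] = 0, universe_size + 1
    pos = 0
    queue = deque([(0, M + 1)])
    while queue:
        lo, hi = queue.popleft()
        if hi - lo < 2:
            continue
        mid = (lo + hi) // 2
        span = c[hi] - c[lo] - 1
        width = (span - 1).bit_length()
        c[mid] = c[lo] + 1 + (int(bits[pos:pos + width], 2) if width else 0)
        pos += width
        queue.append((lo, mid))
        queue.append((mid, hi))
    return c[1:M + 1]

assert encode([3, 9, 14], 16) == "1000010100"  # the 10-bit example above
assert decode("1000010100", 3, 16) == [3, 9, 14]
```

The asserts replay the three-symbol example: the same 10-bit string is produced, and the decoder recovers the points by repeating the encoder's traversal, illustrating the synchronization property noted above.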
At block 1010, each selected data element is encoded based on a position of that data element within a range from which it was selected. Blocks 1012-1016 are representative of implementations wherein data elements are encoded, and may be performed singly or in combination. In the implementation of block 1012, a distance is calculated between the data element and a bound of the range. For example, at block 912 of FIG. 9, the value for the data point is found based on its distance from the MaxLowerBound.
At block 1018, a result of the encoding for each data element is appended to an output. This output can be encoded directly onto the certificate of authenticity (e.g. see block 210 of FIG. 2).
Exemplary Computer
The computing environment 1100 includes a general-purpose computing system in the form of a computer 1102. The components of computer 1102 can include, but are not limited to, one or more processors or processing units 1104, a system memory 1106, and a system bus 1108 that couples various system components including the processor 1104 to the system memory 1106. The system bus 1108 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a Peripheral Component Interconnect (PCI) bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
Computer 1102 typically includes a variety of computer readable media. Such media can be any available media that is accessible by computer 1102 and includes both volatile and non-volatile media, removable and non-removable media. The system memory 1106 includes computer readable media in the form of volatile memory, such as random access memory (RAM) 1110, and/or non-volatile memory, such as read only memory (ROM) 1112. A basic input/output system (BIOS) 1114, containing the basic routines that help to transfer information between elements within computer 1102, such as during start-up, is stored in ROM 1112. RAM 1110 typically contains data and/or program modules that are immediately accessible to and/or presently operated on by the processing unit 1104.
Computer 1102 can also include other removable/non-removable, volatile/non-volatile computer storage media. By way of example, a hard disk drive can read from and write to a non-removable, non-volatile hard disk 1116, a magnetic disk drive can read from and write to a removable, non-volatile magnetic disk 1120, and an optical disk drive can read from and write to a removable, non-volatile optical disk 1124.
The disk drives and their associated computer-readable media provide non-volatile storage of computer readable instructions, data structures, program modules, and other data for computer 1102. Although the example illustrates a hard disk 1116, a removable magnetic disk 1120, and a removable optical disk 1124, it is to be appreciated that other types of computer readable media which can store data that is accessible by a computer, such as magnetic cassettes or other magnetic storage devices, flash memory cards, CD-ROM, digital versatile disks (DVD) or other optical storage, random access memories (RAM), read only memories (ROM), electrically erasable programmable read-only memory (EEPROM), and the like, can also be utilized to implement the exemplary computing system and environment.
Any number of program modules can be stored on the hard disk 1116, magnetic disk 1120, optical disk 1124, ROM 1112, and/or RAM 1110, including by way of example, an operating system 1126, one or more application programs 1128, other program modules 1130, and program data 1132. Each of such operating system 1126, one or more application programs 1128, other program modules 1130, and program data 1132 (or some combination thereof) may include an embodiment of a caching scheme for user network access information.
Computer 1102 can include a variety of computer/processor readable media identified as communication media. Communication media typically embodies computer readable instructions, data structures, program modules, or other data in a computer readable media.
A user can enter commands and information into computer system 1102 via input devices such as a keyboard 1134 and a pointing device 1136 (e.g., a “mouse”). Other input devices 1138 (not shown specifically) may include a microphone, joystick, game pad, satellite dish, serial port, scanner, and/or the like. These and other input devices are connected to the processing unit 1104 via input/output interfaces 1140 that are coupled to the system bus 1108, but may be connected by other interface and bus structures, such as a parallel port, game port, or a universal serial bus (USB).
A monitor 1142 or other type of display device can also be connected to the system bus 1108 via an interface, such as a video adapter 1144. In addition to the monitor 1142, other output peripheral devices can include components such as speakers (not shown) and a printer 1146 which can be connected to computer 1102 via the input/output interfaces 1140.
Computer 1102 can operate in a networked environment using logical connections to one or more remote computers, such as a remote computing device 1148. By way of example, the remote computing device 1148 can be a personal computer, portable computer, a server, a router, a network computer, a peer device or other common network node, and the like. The remote computing device 1148 is illustrated as a portable computer that can include many or all of the elements and features described herein relative to computer system 1102.
Logical connections between computer 1102 and the remote computer 1148 are depicted as a local area network (LAN) 1150 and a general wide area network (WAN) 1152. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets, and the Internet. When implemented in a LAN networking environment, the computer 1102 is connected to a local network 1150 via a network interface or adapter 1154. When implemented in a WAN networking environment, the computer 1102 typically includes a modem 1156 or other means for establishing communications over the wide area network 1152. The modem 1156, which can be internal or external to computer 1102, can be connected to the system bus 1108 via the input/output interfaces 1140 or other appropriate mechanisms. It is to be appreciated that the illustrated network connections are exemplary and that other means of establishing communication link(s) between the computers 1102 and 1148 can be employed.
In a networked environment, such as that illustrated with computing environment 1100, program modules depicted relative to the computer 1102, or portions thereof, may be stored in a remote memory storage device. By way of example, remote application programs 1158 reside on a memory device of remote computer 1148. For purposes of illustration, application programs and other executable program components, such as the operating system, are illustrated herein as discrete blocks, although it is recognized that such programs and components reside at various times in different storage components of the computer system 1102, and are executed by the data processor(s) of the computer.
Although aspects of this disclosure include language specifically describing structural and/or methodological features of preferred embodiments, it is to be understood that the appended claims are not limited to the specific features or acts described. Rather, the specific features and acts are disclosed only as exemplary implementations, and are representative of more general concepts.
Exemplary methods for implementing aspects of compression of fiber-based certificate of authenticity data were described with primary reference to the flow diagrams of the accompanying figures.