The present application claims priority to Korean Patent Application No. 10-2018-017337, filed Feb. 13, 2018, which is incorporated herein by reference.
The present invention relates generally to a reversible DNA information hiding method based on prediction-error expansion and histogram shifting, the method being capable of false start codon prevention, original sequence length preservation, high watermark capacity, and blind detection based on prediction-error expansion and histogram shifting without biological mutation.
A DNA sequence consists of a coding DNA and a non-coding DNA, and watermarks are inserted into the two regions, respectively, such that data can be hidden. In the case of the coding DNA, a redundancy codon range is extremely small, and thus the coding DNA is not suitable for reversible watermarking. In the case of the non-coding DNA, a watermark available range is wide compared to the coding DNA due to no condition for protein code preservation, and thus the non-coding DNA is suitable for DNA reversible watermarking.
Lossless compression and difference expansion (DE)-based methods widely used in conventional reversible image watermarking have been proposed by T. Chen, et al. (reference [1]). A histogram-based reversible DNA watermarking method with a low modification rate of bases has been proposed by Huang, et al. (reference [2]). In this method, the modification rate of bases is low, but the number of embedded bits per base (bpn) is extremely low and a false start codon occurs, as in Chen's method.
Furthermore, a piecewise linear chaotic map (PWLCM)-based information hiding method has been proposed by Liu, et al. (reference [3]). Information hiding methods for tamper location detection and restoration of a DNA sequence have been proposed by J. Fu (reference [4]) and Ma (reference [5]). These methods hide data using substitution by the complementary rule, and are non-blind methods that require a reference (or original) DNA sequence for extraction and restoration.
The foregoing is intended merely to aid in the understanding of the background of the present invention, and is not intended to mean that the present invention falls within the purview of the related art that is already known to those skilled in the art.
Accordingly, the present invention has been made keeping in mind the above problems occurring in the related art, and the present invention is intended to propose a reversible DNA information hiding method based on prediction-error expansion and histogram shifting, the method being capable of false start codon prevention, original sequence length preservation, high watermark capacity, and blind detection based on prediction-error expansion and histogram shifting without biological mutation.
In order to achieve the above object, according to one aspect of the present invention, there is provided a reversible DNA information hiding method based on prediction-error expansion and histogram shifting, the method including: coding, at a first step, a four-letter base sequence of a non-coding region DNA to an n order code value; embedding, at a second step, multiple bits for each code value by a least squares (LS) prediction error; embedding, at a third step, an n order watermark bit by non-circular histogram and circular histogram multi-level shifting; and verifying, at a fourth step, occurrence of a start codon in a watermarked intra code value and a watermarked inter code value.
At the first step, $\mathbf{b}$ may be a four-letter base $\mathbf{b}=\{\text{'A'},\text{'T'},\text{'C'},\text{'G'}\}$, $b$ may be a base value of $\mathbf{b}$, $\mathbf{x}$ may be a base block consisting of $n$ bases, $x$ may be a code value for the base block $\mathbf{x}$, and $n$ may be a coding order. Coding to a $2n$-bit code value $x$ in units of the base block $\mathbf{x}$ consisting of the $n$ bases may be performed as follows
$x=f(\mathbf{x})=\sum_{k=1}^{n}b_k\,2^{2(n-k)}$
where $\mathbf{x}=(b_1,b_2,\dots,b_n)$, $x\in[0,2^{2n}-1]$. The bases of the base block may be restored from the code value $x$ as follows: $f^{-1}(x)=\mathbf{x}$ where $b_k=(x\gg 2(n-k))\,\%\,4$ for $k=1,\dots,n$.
At the fourth step, preventing of a false start codon in the watermarked intra code value may include: generating a code value table containing the false start codon in advance; and embedding such that a watermarked code value is not contained in the code value table.
At the fourth step, preventing of a false start codon between the watermarked inter code values may include: when a previous watermarked code value $x'_{i-1}$ is given, controlling a number of embedded bits for a current processed code value such that the current processed code value $x'_i$ does not satisfy
$x'_{i-1}(n-1,n)\,\|\,x'_i(1,2)\in Z_c$
if $(x'_{i-1}\,\%\,2^4)=f(\text{'AT'})=1$ and $(x'_i\gg 2(n-1))\,\%\,2^2=f(\text{'G'})=3$, or
if $(x'_{i-1}\,\%\,2^2)=f(\text{'A'})=0$ and $(x'_i\gg 2(n-2))\,\%\,2^4=f(\text{'TG'})=7$.
At the second step, the code value may be predicted through local prediction for each embedding region.
According to the reversible DNA information hiding method based on prediction-error expansion and histogram shifting of the present invention, false start codon prevention, original sequence length preservation, high watermark capacity, and blind detection are possible without biological mutation.
The above and other objects, features and other advantages of the present invention will be more clearly understood from the following detailed description when taken in conjunction with the accompanying drawings, in which:
when all watermark bits have values of one, $w=\{1\}_{1}^{2n-1}$.
According to a preferred embodiment of the present invention, a reversible DNA information hiding method based on prediction-error expansion and histogram shifting is a method using difference expansion (DE) of a multi-bit base code value and histogram shifting, and main features of the present invention are as follows.
1. Blind Reversibility: a reversible watermark is hidden without change in the length of the DNA sequence or in the amino acids, and extraction and restoration are possible without the original DNA sequence.
2. Watermarking Usability: a base sequence of 2 bits per base is encoded to a code value sequence of 2n bits per code value, such that reversible watermark hiding, extraction, and restoration processes are easily performed.
3. Watermark Capacity: based on DE and histogram shifting of a code value sequence, multi-bit embedding for each target code value is enabled, and thus watermark capacity is increased.
4. No false start codon: through a code value table of false start codons and a comparison search between adjacent code values, occurrence of a false start codon within a code value (intra) and between code values (inter) is prevented.
Before description of the present invention, symbols used in the present invention are defined as follows.
Cardinality |D| of a set or sequence D indicates the number of elements or length of D.
1. Coding of Four-Letter Base
For ease of watermarking signal processing on a four-letter base sequence, multi-bit coding processing is essential. In this section, the multi-bit coding processing for ease of watermarking signal processing and false start codon prevention will be described.
1-1. Coding Based on a Coding Order
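Generally, a nucleotide base is expressed as four letters, $\mathbf{b}=(\text{A},\text{T},\text{C},\text{G})$, and each letter is assigned a 2-bit base value $b$ as follows.
$b=(0,1,2,3)_{10}=(00,01,10,11)_2\leftarrow\mathbf{b}=(\text{A},\text{T},\text{C},\text{G})$ (1)
For ease of signal processing, rather than a 2-bit value per base, a base block $\mathbf{x}=(b_1,\dots,b_n)$ of $n$ bases is coded to a $2n$-bit code value $x$ as follows.
$x=f(\mathbf{x})=\sum_{k=1}^{n}b_k\,2^{2(n-k)},\quad x\in[0,2^{2n}-1]$ (2)
The bases of the base block are easily restored from the code value $x$ as follows.
$f^{-1}(x)=\mathbf{x}$ where $b_k=(x\gg 2(n-k))\,\%\,4$ for $k=1,\dots,n$ (3)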
In the present invention, the number $n$ of bases of the base block is called a coding order. Bases in the embedding region $D_i$ are coded to a code value sequence $X_i$ based on the coding order $n$; $X_i=\{x_k\mid k\in[1,N_i]\}$, $N_i=\lfloor|D_i|/n\rfloor$. Here, the number $N_i$ of code values is determined by the coding order $n$.
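The base block coding and its inverse described above can be sketched as follows. This is a minimal illustration, assuming the mapping A=0, T=1, C=2, G=3 of Formula (1); the function names `encode_block`, `decode_block`, and `encode_region` are hypothetical and are not taken from the specification.

```python
# Base values from Formula (1): A=0, T=1, C=2, G=3 (2 bits per base).
BASE_VALUE = {"A": 0, "T": 1, "C": 2, "G": 3}
VALUE_BASE = "ATCG"


def encode_block(block):
    """Code a base block of n bases to a 2n-bit code value (first base is most significant)."""
    x = 0
    for base in block:
        x = (x << 2) | BASE_VALUE[base]
    return x


def decode_block(x, n):
    """Restore the n bases from a code value: b_k = (x >> 2(n-k)) % 4 for k = 1..n."""
    return "".join(VALUE_BASE[(x >> (2 * (n - k))) % 4] for k in range(1, n + 1))


def encode_region(region, n):
    """Code a region to N = floor(len(region) / n) code values; leftover bases are ignored here."""
    count = len(region) // n
    return [encode_block(region[i * n:(i + 1) * n]) for i in range(count)]
```

For example, with n=3 the block "ATG" is coded to 7, and `decode_block(7, 3)` returns "ATG" again. How leftover bases at the end of a region are handled is not stated in the text, so this sketch simply leaves them uncoded.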
1-2. False Start Codon Prevention
The false start codon may occur in an intra code value or inter code values as follows.
1) Intra Code Value
A code value domain based on the coding order $n$ is $z\in Z=[0,2^{2n}-1]$. In the case of $n>2$, a code value $z_c$ that contains the start codon 'ATG' inside its base block is given as follows.
$z_c=f(b_1,\dots,b_{j-1},\text{A},\text{T},\text{G},b_{j+3},\dots,b_n)$ (4)
for $\forall j\in[1,n-2]$ and $\forall b_k\in\{\text{A},\text{T},\text{C},\text{G}\}$, $k=1,2,\dots,j-1,j+3,\dots,n$
Here, the symbols 'A', 'T', and 'G' correspond to 0, 1, and 3 as shown in Formula (1), and except for the consecutive bases {A, T, G} at an arbitrary position, the bases at all remaining positions may take any of {A, T, C, G}. According to the present invention, in coding of the base, a code value table $Z_c=\{z_c\}$ including the false start codon is generated in advance, and then the embedding process is performed such that a watermarked code value $x'$ is not included in $Z_c$.
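The intra code value table can be generated by brute force for small coding orders. The sketch below is one possible way to do this, assuming the base values A=0, T=1, C=2, G=3, so that the three bases 'ATG' form the 6-bit pattern 7; the name `false_start_table` is hypothetical.

```python
ATG = 7  # f('ATG') = 0*16 + 1*4 + 3 with A=0, T=1, G=3


def false_start_table(n):
    """Return the set of n order code values whose base block contains 'ATG'."""
    table = set()
    for z in range(4 ** n):
        # j is the 1-based position of 'A'; the codon occupies bases j, j+1, j+2.
        for j in range(1, n - 1):
            shift = 2 * (n - j - 2)  # bits to the right of base j+2
            if (z >> shift) % 64 == ATG:
                table.add(z)
                break
    return table
```

For n=3 the table contains only the code value 7, and for n=4 it contains eight values (the blocks 'ATGb' and 'bATG'). For n<3 the table is empty, which matches the condition n>2 stated above.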
2) Inter Code Values
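The false start codon may occur between the base block $\mathbf{x}'_{i-1}$ of a previous watermarked code value $x'_{i-1}$ and the base block $\mathbf{x}'_i$ of a current processed code value $x'_i$. This occurs when the following condition is satisfied.
$x'_{i-1}(n-1,n)\,\|\,x'_i(1,2)\in Z_c$ (5)
if $(x'_{i-1}\,\%\,2^4)=f(\text{'AT'})=1$ and $(x'_i\gg 2(n-1))\,\%\,2^2=f(\text{'G'})=3$, or
if $(x'_{i-1}\,\%\,2^2)=f(\text{'A'})=0$ and $(x'_i\gg 2(n-2))\,\%\,2^4=f(\text{'TG'})=7$.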
x(j,j+1) indicates the j-th and j+1-th bases of the code value x, and ∥ indicates a concatenation operator. x′i−1(n−1,n)∥x′i(1,2) indicates a code value where the n−1-th and n-th bases of x′i−1 are concatenated with the first and second bases of x′i. In the present invention, when the previous watermarked code value x′i−1 is provided, the number of embedded bits for the code value xi is controlled to prevent the current watermarked code x′i from satisfying the above condition.
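The two inter code value conditions can be checked directly on the integer code values. The following sketch assumes a coding order n of at least 2 and the base values A=0, T=1, C=2, G=3; the function name `inter_false_start` is hypothetical.

```python
def inter_false_start(prev_code, cur_code, n):
    """Return True if 'ATG' straddles the boundary between two n order code values.

    Case 1: the previous block ends with 'AT' and the current block starts with 'G'.
    Case 2: the previous block ends with 'A' and the current block starts with 'TG'.
    """
    ends_with_at = prev_code % 16 == 1                      # f('AT') = 1
    starts_with_g = (cur_code >> (2 * (n - 1))) % 4 == 3    # f('G') = 3
    ends_with_a = prev_code % 4 == 0                        # f('A') = 0
    starts_with_tg = (cur_code >> (2 * (n - 2))) % 16 == 7  # f('TG') = 7
    return (ends_with_at and starts_with_g) or (ends_with_a and starts_with_tg)
```

With n=3, the pair 'CAT' (33) and 'GAA' (48) triggers the first case, and the pair 'CCA' (40) and 'TGC' (30) triggers the second.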
2. Embedding Region (Target Region) Selection
In the present invention, a watermark is embedded into a code value string generated in units of a base block. Here, a region with a short sequence length is not suitable as a watermark embedding target because its code value string is short. Thus, the embedding region is a region having $\alpha_p$ or more code values, and a set $\Gamma(n)$ of embedding regions for the coding order $n$ is defined as follows.
$\Gamma(n)=\{D_i\mid |D_i|>\alpha_p\times n\},\quad D_i=\{b_{ij}\mid j\in[1,|D_i|]\}$ (6)
Here, $D_i$ indicates the i-th embedding region, $b_{ij}$ indicates the j-th four-letter base in the $D_i$ region, and $|D_i|$ indicates the number of bases in $D_i$. $\alpha_p$ indicates the minimum number of code values in the embedding region, and $p$ indicates a prediction order, which will be described in section 3. According to an embodiment of the present invention, the minimum number of code values is set to 10 or more, and the embedding region is selected based on the prediction order $p$.
A ratio of the number of embedding regions to the total number of non-coding regions on the given DNA sequence is designated by Rregion(n), and a ratio of the number of bases in embedding regions to the number of bases in total non-coding regions is designated by Rbase(n).
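The region selection and the two ratios can be illustrated as follows. This sketch assumes that a region qualifies when its number of bases exceeds the minimum number of code values multiplied by the coding order n, which is one reading of Formula (6); the names `select_regions`, `region_ratios`, and `min_codes` are hypothetical.

```python
def select_regions(noncoding_regions, n, min_codes):
    """Keep the non-coding regions with more than min_codes * n bases."""
    return [d for d in noncoding_regions if len(d) > min_codes * n]


def region_ratios(noncoding_regions, n, min_codes):
    """Return (R_region, R_base): selected share of regions and selected share of bases."""
    selected = select_regions(noncoding_regions, n, min_codes)
    r_region = len(selected) / len(noncoding_regions)
    r_base = sum(len(d) for d in selected) / sum(len(d) for d in noncoding_regions)
    return r_region, r_base
```

For three regions of 10, 40, and 50 bases with n=3 and a minimum of 10 code values, the threshold is 30 bases, so two of the three regions are selected and they hold 90 of the 100 bases.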
3. Code Value Prediction-Error Expansion (PE)-Based Reversible Watermarking
When a code value of the non-coding region is given, a prediction-error expansion method used for conventional image data may be used to embed a bit in a pair of code values. For example, when a prediction value $\hat{x}$ with respect to an arbitrary code value $x$ and a watermark bit $w$ are given, the embedded code value $x'$ is as follows.
$x'=\hat{x}+2(x-\hat{x})+w=2x-\hat{x}+w$ (7)
Watermark extraction and code value restoration are easily obtained from $\hat{x}$ and $x'$ as $w=(x'-\hat{x})\,\%\,2$ and $x=\hat{x}+((x'-\hat{x})\gg 1)$.
This method is suitable for image data with high correlation between adjacent pixels. By a prediction error modeled as Laplacian distribution, one bit can be embedded into each of pixel pairs.
However, code values of the DNA sequence have a low correlation between successive values, and thus an adaptive prediction is required. Also, apart from the false start codon constraint, code values can be moved without limitation, and thus multiple bits can be embedded in a pair of code values. Thus, in this section, a code value prediction-error expansion-based multi-bit embedding method will be described.
3-1. Code Value Error Expansion Condition for Multi-Bit Embedding
Except for false start codon values, DNA code values are under no definitional constraint and may move without limitation within the valid range. Thus, the prediction error $d$ for a pair of code values can be expanded $2^k$ times according to an expansion condition to embed $k$ bits, and at most $2n-1$ bits can be embedded; $k_{max}=2n-1$.
When $k$ bits of watermark $\{w_j\}_1^k$ and a prediction value $\hat{x}$ are given, a $k$-bit embedded code value $x'$ is obtained by the $2^k$ times expanded prediction error $d=x-\hat{x}$ as follows.
$x'=\hat{x}+2^k d+\sum_{j=1}^{k}w_j2^{j-1}$ (8)
When the embedded code value $x'$ and the number $k$ of bits are given, watermark extraction and restoration are easily performed as follows.
$w_j=((x'-\hat{x})\gg(j-1))\,\%\,2$ for $j=1,\dots,k$ (9)
$x=\hat{x}+d=\hat{x}+((x'-\hat{x})\gg k)$ (10)
Since the embedded code value $x'$ is required to satisfy $0\le x'\le 2^{2n}-1$, the expansion condition of the prediction error $d$ for $2^k$ times expansion is as follows.
$-2^{-k}(\hat{x}+\alpha(k))\le d\le 2^{-k}(2^{2n}-1-\hat{x}-\alpha(k))$ (11)
The code value $x$ is required to satisfy the condition as follows.
$x\in[\max(0,\lceil\hat{x}+2^{-k}(-\hat{x}-\alpha(k))\rceil),\ \min(2^{2n}-1,\lfloor\hat{x}+2^{-k}(2^{2n}-1-\hat{x}-\alpha(k))\rfloor)]$ (12)
where $\alpha(k)=\sum_{j=1}^{k}w_j2^{j-1}$ is the value of the $k$ watermark bits.
This expansion condition depends on the $k$ watermark bits $\{w_j\}_1^k$ and the prediction value $\hat{x}$, and the number of bits to be embedded in the code value $x$ is determined by the expansion condition.
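The multi-bit embedding, extraction, and restoration of Formulas (9) and (10) can be tried out on single code values. The sketch below assumes that the embedded value is the prediction plus the prediction error expanded by a factor of two to the power k plus the integer formed by the k watermark bits (least significant bit first), which is the form implied by the extraction formulas; the names `embed_bits` and `extract_bits` are hypothetical.

```python
def embed_bits(x, x_hat, bits, n):
    """Embed k = len(bits) watermark bits into code value x given its prediction x_hat."""
    k = len(bits)
    payload = sum(w << j for j, w in enumerate(bits))  # bits[0] is the least significant bit
    x_marked = x_hat + ((x - x_hat) << k) + payload
    # Expansion condition: the embedded value must stay inside [0, 2^(2n) - 1].
    if not 0 <= x_marked <= 4 ** n - 1:
        raise ValueError("prediction error too large to embed k bits")
    return x_marked


def extract_bits(x_marked, x_hat, k):
    """Return (bits, x): the k watermark bits and the restored original code value."""
    e = x_marked - x_hat
    bits = [(e >> j) % 2 for j in range(k)]
    # Python's >> floors toward minus infinity, so negative prediction errors are restored too.
    return bits, x_hat + (e >> k)
```

For example, with n=3, x=20, and a prediction of 18, embedding the bits [1, 0] gives 27, and `extract_bits(27, 18, 2)` returns the bits [1, 0] and the original value 20. The false start codon checks are left out of this sketch.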
3.2 Code Value Prediction
A row vector of $p$ code values for predicting the current code value $x_i$ is $\mathbf{x}_i=(x_{i-1},\dots,x_{i-p})$ and a row vector of $p$ parameters is $\mathbf{b}=(\beta_1,\dots,\beta_p)$. Here, $p$ indicates a prediction order. When $\mathbf{x}_i$ is observed, the prediction value $\hat{x}_i$ of $x_i$ is defined by a linear regression function $f_\beta(\mathbf{x}_i)$ as follows.
$\hat{x}_i=f_\beta(\mathbf{x}_i)=\mathbf{x}_i\mathbf{b}'=\sum_{j=1}^{p}\beta_j x_{i-j}$ (13)
When a row vector of all code values in an arbitrary embedding region is $\mathbf{y}=(x_1,\dots,x_N)$ and the $N\times p$ matrix of $N$ observed previous code values is $\mathbf{X}=(\mathbf{x}'_1,\dots,\mathbf{x}'_N)$, the LS predictor computes the parameter t, that is, the vector $\mathbf{b}$ that minimizes the squared distance $\|\mathbf{y}'-\mathbf{X}\mathbf{b}'\|^2=(\mathbf{y}'-\mathbf{X}\mathbf{b}')'(\mathbf{y}'-\mathbf{X}\mathbf{b}')$ between $\mathbf{y}'$ and $\mathbf{X}\mathbf{b}'$, as follows.
$\mathbf{b}'=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}'$ (14)
In the present invention, rather than a global prediction over all embedding regions, local prediction for each embedding region is performed to predict the code value. Thus, in the decoding process, additional information consisting of the parameter t for each of the $|\Gamma(n)|$ embedding regions of the DNA sequence is required.
The code value may be predicted using a successive predictor $\hat{x}_i=x_{i-1}$ or a mean predictor.
The prediction error histogram of an image is modeled as a Laplacian distribution, but the LS prediction error histogram of the code value is modeled as a normal distribution with $(\mu,\sigma)=(0,20)$ for $n=3$ and $p=10$, $(\mu,\sigma)=(0,19)$ for $n=3$ and $p=20$, $(\mu,\sigma)=(0,80)$ for $n=4$ and $p=10$, and $(\mu,\sigma)=(0,76)$ for $n=4$ and $p=20$.
3.3 Coding Process
In the coding process of the present invention, when the coding order $n$ and the prediction order $p$ are given, an LS prediction parameter t is obtained for each embedding region. The LS predictor by t is used for the code value $x_i$ with $i>p$, and the mean predictor is used for the code value with $i\le p$, thereby obtaining $\hat{x}_i$.
After determining the number $k_i$ ($0\le k_i\le 2n-1$) of embedded bits based on the expansion condition of the prediction error $d_i=x_i-\hat{x}_i$, the $k_i$ bits $\{w_l\}_{l=1}^{k_i}$ are embedded by expanding $d_i$ by $2^{k_i}$ times, such that the embedded code value $x'_i$ satisfies
$x'_i\notin Z_t$ and $x'_{i-1}(n-1,n)\,\|\,x'_i(1,2)\notin Z_t$
When the embedded code value $x'_i$ is included in the false start codon table $Z_t$ or forms a false start codon together with the previous code value $x'_{i-1}$, the number $k_i$ of embedded bits is reduced by one, and then the above-described process is repeated until $k_i$ is zero. In this way, multiple bits are embedded in code values of all embedding regions, and then a watermarked region Γ′(n) is obtained. When $k_i$ is 0, it indicates a non-embedding region of the prediction error or a case where the false start codon occurs.
The number $K=\{k_i\}$ of embedded bits for each code value and the prediction parameter t for each embedding region are additional information required in watermark extraction and original sequence restoration. The additional information is required to be included in the watermarked region Γ′(n) and transmitted without occurrence of the false start codon and without generating further additional information. In the present invention, by arithmetic coding, lossless compression is performed on the number $K$ of embedded bits, the prediction parameter t, and an LSB bit $E$ of a 2-bit base binary number in Γ′(n), thereby generating a compression bit string $C=\{c_i\}$. The compression bit $c_i$ is substituted into the LSB of the binary number $b'_i$ of the four-letter base as follows.
$b'_i=((b'_i\gg 1)\ll 1)+c_i$, if $b'_{i-2}\ne\text{'A'}$ and $b'_{i-1}\ne\text{'T'}$ (17)
Here, in a case where the two previous embedded bases $(b'_{i-2},b'_{i-1})$ are "AT", when the current base is $b'_i=\text{'G'}$, $b'_i$ is substituted by one of 'A', 'T', and 'C'. When $b'_i\ne\text{'G'}$, embedding is omitted. Finally, a base string "AT" in the embedding region Γ″(n) including the compression string $C$ serves as a marker directly indicating that the subsequent base does not include a compression bit. The length of the compression string $C$ is determined by the compression algorithm; in the present invention, arithmetic coding, which is a general lossless compression algorithm, is used. Consequently, the DNA sequence D′=Dnc+Dc, Dnc=Γ″(n)+Γc(n), containing the additional information and the non-coding region Γ″(n) where the watermark is embedded, is transmitted.
3.4 Decoding and Restoration Processes
In the decoding process, first, from the non-coding region Γ″(n) of the transmitted DNA sequence D′, the additional information compression string $C$ is read from the LSBs of all bases except for the base following "AT", and the number $K$ of embedded bits, the prediction parameter t, and the base LSB bit $E$ are obtained from it. The code sequence $X'$ of Γ′(n), in which the base LSB bit $E$ is substituted back into Γ″(n), is obtained by the coding order $n$. From all code values in $X'$, the watermark is extracted by the number $K$ of embedded bits and the prediction parameter t, and the original code value is restored.
For example, when the number of embedded bits $k_i>0$ and an arbitrary code value $x'_i$ are given, the prediction value $\hat{x}_i$ is obtained from the previously restored code values $(x_{i-1},\dots,x_{i-p})$, and then the $k_i$ watermark bits are extracted from the prediction error $d_i=x'_i-\hat{x}_i$ as $w_l=((x'_i-\hat{x}_i)\gg(l-1))\,\%\,2$ for $l=1,\dots,k_i$. The original code value $x_i$ is restored by $k_i$-bit shifting of the prediction error $d_i$ as $x_i=\hat{x}_i+((x'_i-\hat{x}_i)\gg k_i)$.
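The sequential nature of decoding, where each prediction is computed from already restored code values, can be illustrated with a round trip over a short code sequence. For brevity this sketch swaps in the successive predictor (the previous code value) in place of the LS predictor, leaves the first code value unmarked, checks only the value range and not the false start codon conditions, and passes the per-value bit counts directly instead of compressing them; the names `embed_sequence` and `restore_sequence` are hypothetical.

```python
def embed_sequence(codes, bits, n, k_max):
    """Embed watermark bits into a code sequence; return (marked codes, bits used per code)."""
    top = 4 ** n - 1
    marked, ks, pos = [codes[0]], [0], 0
    for i in range(1, len(codes)):
        x_hat = codes[i - 1]            # successive predictor on the ORIGINAL values
        d = codes[i] - x_hat
        k = min(k_max, len(bits) - pos)
        cand = codes[i]
        while k > 0:
            payload = sum(w << j for j, w in enumerate(bits[pos:pos + k]))
            cand = x_hat + (d << k) + payload
            if 0 <= cand <= top:
                break
            k -= 1                      # reduce the bit count until the value fits
        if k == 0:
            cand = codes[i]             # nothing embedded in this code value
        marked.append(cand)
        ks.append(k)
        pos += k
    return marked, ks


def restore_sequence(marked, ks):
    """Extract the watermark bits and restore the original code sequence."""
    codes, bits = [marked[0]], []
    for x_marked, k in zip(marked[1:], ks[1:]):
        e = x_marked - codes[-1]        # prediction from the already restored value
        bits.extend((e >> j) % 2 for j in range(k))
        codes.append(codes[-1] + (e >> k))
    return codes, bits
```

With n=3, the sequence [20, 22, 21, 60], and at most two bits per code value, the first four watermark bits are embedded and the last code value is left unchanged because its prediction error is too large to expand.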
3.5 Watermark Capacity and Additional Information Amount
Watermark capacity is affected by the coding order $n$ and the prediction order $p$. When $n$ and $p$ are given, the number of watermark bits embedded in the embedding region $\Gamma(n)=\{D_i\}_{i=1}^{|\Gamma(n)|}$ is the sum of the number $K$ of embedded bits for each code value in the region. Thus, the number of bits per base (bpn) $bpn_{PE}(n,p)$ is as follows.
where $N_i=\lfloor|D_i|/n\rfloor$ and $0\le k_i\le 2n-1$
|Γ(n)| indicates the number of embedding regions, and Ni indicates the number of code values in the region Di.
The LSB-substitutable bit amount available for embedding the additional information compression string $C$ is determined by the number of bases skipped in the substituting process to avoid the false start codon. Its maximum is equal to the total number of bases in Γ′(n). The length of the additional information compression string $C$ is required to be less than the substitutable bit amount; therefore, either the amount of the additional information, that is, the number $K$ of embedded bits, the prediction parameter t, and the LSB $E$ of the 2-bit base, must be small, or an algorithm with high compression efficiency is required. When an arbitrary watermarked region $D'_i$ ($\in\Gamma'(n)$) is given, $E$ consists of $|D_i|$ bits, the number $K$ of embedded bits is expressed by $N_i\lceil\log_2 2n\rceil$ bits, and the prediction parameter t for each embedding region is expressed by $p$ floating-point numbers of 32 bits. Thus, additional information $Extra_{PE}(n,p)$ for Γ′(n) is as follows.
$Extra_{PE}(n,p)=\sum_{i=1}^{|\Gamma(n)|}\left(|D_i|+N_i\lceil\log_2 2n\rceil+32p\right)$ (19)
When the length of the additional information compression string $C$ is $\rho\times Extra_{PE}(n,p)$, compression is performed such that this length is less than the substitutable bit amount.
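The uncompressed size of the additional information can be tallied per region from the three components named above. The sketch below assumes one LSB bit per base for E, a fixed-width field per code value for K, and p 32-bit parameters per region; the function name `extra_bits_pe` is hypothetical.

```python
def extra_bits_pe(region_lengths, n, p):
    """Uncompressed additional information in bits for prediction-error expansion.

    region_lengths: number of bases |D_i| of each embedding region.
    """
    # ceil(log2(2n)) bits are enough to store any k_i in [0, 2n - 1];
    # (2n - 1).bit_length() computes this without floating point.
    bits_per_k = (2 * n - 1).bit_length()
    total = 0
    for length in region_lengths:
        code_count = length // n          # N_i = floor(|D_i| / n)
        total += length + code_count * bits_per_k + 32 * p
    return total
```

For two regions of 30 and 45 bases with n=3 and p=10, this gives 380 + 410 = 790 bits before compression, most of which comes from the 32-bit prediction parameters.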
4. Code Value Histogram Shifting-Based Method
Code values in a non-coding region may be shifted to any value outside the code value table containing the false start codon. In this section, non-circular and circular code value histogram shifting-based methods for increasing data capacity will be described.
4.1 Non-Circular Histogram Shifting (HS)
(1) Coding Process
In the present invention, an $n$ order code value histogram domain $Z=[0,2^{2n}-1]$ is divided into $M$ sections $\{P_i\}_{i=1}^{M}$. Here, each section is bilaterally symmetric with respect to a center value $R_i$, and $R_i$ is used as the reference value of shifting. Thus, the length of each section is an odd number, and is determined by the number of embedded bits.
When the maximum number of shifting bits in the section is $k_{max}$ and the center value is $R_i=z$, $P_i$ consists of $2\times2^{k_{max}}-1$ values as follows.
$P_i=\{z-2^{k_{max}}+1,\dots,z,\dots,z+2^{k_{max}}-1\},\quad R_i=z$ (21)
The number $M$ of sections is as follows.
$M=\lfloor 2^{2n}/(2\times2^{k_{max}}-1)\rfloor$ (22)
Here, a residual section of $2^{2n}-(2\times2^{k_{max}}-1)M$ values is $Z^c=Z-\bigcup_{i=1}^{M}P_i$, and is not selected for watermark embedding.
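The division of the histogram domain into sections can be illustrated as follows. This sketch assumes that the sections are laid out contiguously from the value 0 upward and that the residual section is whatever remains at the top of the domain; the text does not state where the sections are placed, and the name `build_sections` is hypothetical.

```python
def build_sections(n, k_max):
    """Split [0, 4^n - 1] into M symmetric sections of 2 * 2^k_max - 1 values each.

    Returns (sections, residual), where sections is a list of (center, range)
    pairs and residual is the range of values left outside every section.
    """
    size = 4 ** n
    half = 2 ** k_max - 1        # number of values on each side of the center
    length = 2 * half + 1        # = 2 * 2^k_max - 1, always odd
    m = size // length           # number M of whole sections that fit
    sections = []
    for i in range(m):
        start = i * length
        sections.append((start + half, range(start, start + length)))
    residual = range(m * length, size)
    return sections, residual
```

For n=3 and a maximum of two shifting bits, the 64 code values split into nine sections of seven values each, with centers 3, 10, 17, and so on, and a single residual value 63.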
When an arbitrary code value $x_i$ belongs to the section $P_i$, its difference from the center value $R_i$ of the section is $d_i=x_i-R_i$, $x_i\in P_i$. Here, based on the range of $|d_i|$, the number $k_i$ of bits to be embedded in $x_i$ is determined as follows.
$k_i=0$, if $x_i=R_i$
Next, the $k_i$ bits $\{w_l\}_{l=1}^{k_i}$ are embedded by shifting $x_i$ relative to $R_i$, such that the shifted code value $x'_i$ satisfies
$x'_i\notin Z_t$ and $x'_{i-1}(n-1,n)\,\|\,x'_i(1,2)\notin Z_t$
A code value equal to the center value of its section, $x_i=R_i$, has $k_i=0$ embedded bits and is excluded from bit embedding. Here, when a shifted code value $x'_i$ is in the false start codon table $Z_t$ or when the false start codon occurs between $x'_i$ and the previous shifted code value $x'_{i-1}$, the number $k_i$ of embedded bits is reduced by one and the process is repeated until $k_i$ reaches zero. Thus, the false start codon is prevented in the same manner as in the successive code value pair DE method. In this way, for all code values in the embedding target region, multiple bits are embedded depending on the number of embedded bits for each code value, and then the watermarked non-coding region Γ′(n) is obtained.
As additional information for watermark extraction and original sequence restoration, the number K={ki} of embedded bits for each code value, a marker T={τ} of a section shifted based on a section reference value and the LSB bit E of the 2-bit base binary number in the watermarked non-coding region Γ′(n) are required. Like the successive code value pair DE method, a bit string C of the additional information (K,T,E) is generated with lossless compression, and then the bit string is substituted by the LSB bit of the base binary number in Γ′(n). The DNA sequence D′=Dnc+Dc, Dnc=Γ″(n)+Γc(n) containing the final additional information and the non-coding region Γ″(n) where the watermark is embedded is transmitted.
The code value $x$ in the right subsection $P_i^+$ ($d>0$) of the section $P_i$ is shifted by the watermark bits to the left subsection $P_{i+1}^-$ ($d\le0$) of the right section $P_{i+1}$. In contrast, $x$ in the left subsection $P_i^-$ ($d<0$) of the section $P_i$ is shifted by the watermark bits to the right subsection $P_{i-1}^+$ ($d\ge0$) of the left section $P_{i-1}$.
Among the watermarked code values, a code value equal to the center value, $x'_i=R_i$, is generated in three cases. First, when the original code value is the center value $x_i=R_i$ ($k_i=0$), it is excluded from shifting, and thus the original code value $x_i=R_i$ is not shifted.
(2) Decoding and Restoration Processes
In the decoding process of the present invention, from the non-coding region Γ″(n) of the previously transmitted DNA sequence D′, the additional information $(K,T,E)$ of the compressed bit string is obtained, and then the watermarked non-coding region Γ′(n) is obtained by base binary number substitution of $E$. From the code sequence $X'$ of Γ′(n), watermark extraction and original value restoration are performed by the number $K$ of shifting bits for each code value and the marker $T=\{\tau\}$ of the shifted section.
When the code value $x'_i$ of the code sequence $X'$ is given, the center value $R$ of the original section of $x'_i$ is required to be obtained first. That is, when the shifted section $P_i$ of $x'_i$ is not the boundary section ($x'_i\in P_i$) and the number $k_i$ of shifting bits is $k_i>0$, the center value $R$ of the previous section of $x'_i$ is obtained as follows.
Here, based on the shifted section $P_i$ of $x'_i$, the center value $R$ of the section before embedding is easily obtained. However, when $x'_i$ is the center value $R_i$ of the shifted section $P_i$ ($x'_i=R_i$), $R$ is obtained by the marker $\tau_i$ of the previous section. The $k_i$ watermark bits $\{w_l\}_{l=1}^{k_i}$ are extracted and the original code value $x_i$ is restored as follows.
$w_l=((x'_i-R)\gg(l-1))\,\%\,2$ for $l=1,\dots,k_i$ (27)
$x_i=R+((x'_i-R)\gg k_i)$ (28)
(3) Watermark Capacity and Additional Information
When the coding order $n$ and the maximum number $k_{max}$ of section shifting bits are given, the number of watermark bits embedded in the embedding region is determined based on the number of bits defined by the difference range from the center value in the histogram domain section $P_i$ and the frequency at which the code value belongs to each section.
The frequency of the value $z$ on the code value histogram is designated by $p(z)$. Here, the number of shifting bits on an arbitrary section $P_i$ is calculated by the sum of the number $C(P_i^-)$ of shifting bits in the left subsection $P_i^-$ and the number $C(P_i^+)$ of shifting bits in the right subsection $P_i^+$.
The total number of watermark bits embedded in $\Gamma(n)=\{D_i\}_{i=1}^{|\Gamma(n)|}$ is the sum of the number of shifting bits on the remaining sections, except for the boundary subsections $P_1^-$ and $P_M^+$ among the total $M$ sections, and the number of bits per base (bpn) $bpn_{HS}(n,k_{max})$ is defined as follows.
$|\Gamma(n)|$ is the number of embedding regions, $N_i$ is the number of code values in the region $D_i$, and the bit count is normalized by the total number of bases in the embedding target region.
The additional information $Extra_{HS}(n,k_{max})$ for watermark extraction and restoration is the number $K$ of shifting bits for each code value, the marker $T$ of the section shifted based on the section reference value, and the LSB bit $E$ of the 2-bit base binary number of the watermarked non-coding region Γ′(n). When the maximum number of shifting bits in the histogram domain section is $k_{max}$, the number of embedded bits of one code value is expressed by $\lceil\log_2 k_{max}\rceil$ bits, and the number $K$ of shifting bits for all code values is expressed by the total of these bits over all code values. The marker $T$ of the shifted section is binary information indicating whether a code value $x'=R_i$ shifted to the center value of an adjacent section came from the left section or the right section. $E$ has as many bits as the number of bases of all regions in Γ′(n). Thus, additional information $Extra_{HS}(n,k_{max})$ is as follows.
When the compression rate is $\rho$, lossless compression is performed such that the compressed additional information $\rho\times Extra_{HS}(n,k_{max})$ fits within the LSB-substitutable bit amount.
A watermark bit is not embedded ($k=0$) when the code value lies in a boundary section of the histogram domain, in the residual section that does not belong to any section, or at the center value of a section. That is, the $k=0$ probability $P(k=0|x)$ is as follows.
The terms of this probability are, in order, the probability of the code value being in the $P_1^-$ subsection, the probability of the code value being in the $P_M^+$ subsection, the probability of the value being in the residual section that does not belong to any $P_i$, and, last, the probability of the code values that are the center values of all sections.
4.2 Circular Histogram Shifting (CHS)
Unlike the pixel value of the image, code values in the non-coding region have no condition for definition, and thus shifting between the maximum value and the minimum value is possible. In the circular histogram shifting method, histogram section shifting is changed to circular histogram shifting such that embedding is possible in the left subsection $P_1^-$ ($d<0$) of $P_1$ and in the right subsection $P_M^+$ ($d>0$) of $P_M$, which are the boundary sections, thereby increasing watermark capacity compared to the non-circular histogram shifting method.
(1) Coding Process
In the remaining sections, except for the boundary sections and the residual section, the watermark is embedded in the same manner as in the embedding process of the non-circular histogram shifting method. In the circular form of the histogram domain, the section $P_M$ is as follows.
$P_M=P_M^-+P_M^+$ (33)
where $P_M^-=\{z-2^{k_{max}}+1,\dots,z\}$ and $P_M^+=\{z+\delta,z+\delta+1,\dots,z+\delta+2^{k_{max}}-1\}$.
That is, $P_M$ is divided into a subsection $P_M^-$ smaller than $R_M^-=z$ and a subsection $P_M^+$ larger than $R_M^+=z+\delta$. In the $P_M$ section, two center reference values are generated.
Based on the center value $R$ of the section $P_i$ to which an arbitrary code value $x_i$ belongs, the $k_i$ bits $\{w_l\}_{l=1}^{k_i}$ are embedded into $x_i$ by circular shifting as follows.
$x'_i=(R+2^{k_i}d_i+\alpha(k_i))\,\%\,2^{2n}$ (36)
where $d_i=x_i-R$ and $\alpha(k_i)=\sum_{l=1}^{k_i}w_l2^{l-1}$.
Here, the number of shifting bits is zero for the residual values $[R_M^-+1,R_M^+-1]$ between $P_M^-$ and $P_M^+$ and for the code values that are the center values of the respective sections.
Information $T$ on the previous section for the value $x'_i$ shifted to the center value of the adjacent section is determined as follows.
In this way, watermarks are embedded into all code values in the code sequence $X$ without occurrence of an intra code or inter code false start codon, and the watermarked non-coding region Γ′(n) is obtained. The additional information required for watermark decoding and restoration of the original code value is the number $K$ of shifting bits for each code value, the marker $T$ of the shifted section, and the LSB bit $E$ of a 2-bit base binary number, as in the non-circular method. LSB substitution of the compressed additional information is applied in the same manner as in the two preceding methods, and the final watermarked DNA sequence D′ with the substituted region Γ″(n) is transmitted.
(2) Decoding and Restoration Processes
From the substituted region Γ″(n) of the transmitted DNA sequence, the watermarked region Γ′(n) is obtained by inverse substitution, and then from the code sequence $X'$ in Γ′(n), the watermark is decoded by $(K,T)$ and the original code sequence is restored.
When the code value $x'_i$ with $k_i>0$ is provided in the code sequence $X'$, the center value $R$ of the previous section of $x'_i$ is obtained depending on whether the section is a boundary section or a non-boundary section as follows.
The $k_i$ bits $\{w_l\}_{l=1}^{k_i}$ are extracted and the original code value $x_i$ is restored as follows.
$w_l=(((x'_i-R)\,\%\,2^{2n})\gg(l-1))\,\%\,2$ for $l=1,\dots,k_i$ (39)
$x_i=R+(((x'_i-R)\,\%\,2^{2n})\gg k_i)$ (40)
(3) Watermark Capacity and Additional Information
In the circular histogram shifting method, the watermark is embedded in all sections except for the residual section in the code value histogram domain range. Thus, when the coding order $n$ and the maximum number $k_{max}$ of section shifting bits are given, the number of watermark bits in the embedding region $\Gamma(n)$ is the sum of the number of shifting bits on the left subsection $P_i^-$ ($d<0$) and the right subsection $P_i^+$ ($d>0$) of each section, and the bpn $bpn_{CHS}(n,k_{max})$ thereof is as follows.
The additional information $Extra_{CHS}(n,k_{max})$ for watermark extraction and restoration is the same as in the non-circular histogram shifting method, $Extra_{HS}(n,k_{max})=Extra_{CHS}(n,k_{max})$. Like the above-described methods, lossless compression is performed such that the compressed additional information $\rho\times Extra_{CHS}(n,k_{max})$ fits within the LSB-substitutable bit amount.
The circular histogram shifting method has the same additional information but higher watermark capacity, compared to the non-circular histogram shifting method.
The previous region information of the code value shifted to the center value and the information on the number of embedded bits of the code values that belong to all regions except for the residual value region are as follows.
Here, the probability term is the probability of a code value belonging to the residual values, and $R$ is the set of reference values $R=\{R_1,R_2,\dots,R_{M-1},R_{M1},R_{M2}\}$ of the regions. Thus, the bpn of additional data is $bpn_{ECHS}=N_{ECHS}/N_D$ [bit/base]. Capacity efficiency $C_{CHS}$, which is the ratio of the embedded data to the additional data, is $C_{CHS}=N_{WCHS}/N_{ECHS}=bpn_{WCHS}/bpn_{ECHS}$.
Although a preferred embodiment of the present invention has been described for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2018-017337 | Feb 2018 | KR | national |