The exemplary embodiment relates to n-gram statistics generated from documents. It finds particular application in connection with a system and method for modifying n-gram statistics for inhibiting document reconstruction from the statistics.
Organizations often see advantages to releasing part of the data they own, for the general good, for prestige, to harness the work of those to whom the data is released, or to open access to new resources for financial gain. It may not be feasible to release the data in its original form due to privacy concerns, legal constraints, or economic interest. In such cases, a compromise is to release some statistics computed over the data. For example, the statistics released may include n-gram counts for text documents. Here, n-grams are sequences of n words. Examples of the production of such information include the release of copyrighted material (for example, the Google Ngram Corpus) and the exchange of phrase tables for machine translation when the original parallel corpora are private or confidential.
However, there has been considerable interest in trying to reconstruct at least part of a document, given the count of all its n-grams, as disclosed, for example, in Matias Tealdi and Matthias Gallé, “Reconstructing documents from perfect n-gram information,” SeqBio, pp. 41-42, 2013, and U.S. application Ser. No. 14/083,483, filed on Nov. 19, 2013, entitled RECONSTRUCTING DOCUMENTS FROM n-GRAM INFORMATION, by Tealdi and Gallé (hereinafter, collectively referred to as Tealdi and Gallé). The method enables reconstruction of parts of a document from the set of its n-grams and their respective counts. The possibility of retrieving large chunks of the original document with absolute certainty is feasible only when the complete n-gram data (that is, all n-grams and their respective counts) is released. In the method of Tealdi and Gallé, a de Bruijn graph of the given n-grams is constructed. In the graph, each n-gram becomes an edge between two nodes, each one denoting an n−1-gram. Each edge is associated with a multiplicity, denoting the number of times it occurs in the corpus. Any Eulerian path through such a graph therefore denotes a plausible document that would produce an n-gram set such as the one provided as input. Two reduction steps are used to merge adjacent edges, whenever certain conditions are met, that ensure that such a merge corresponds to a substring which has to exist in the original corpus (that is, two edges are merged if and only if any Eulerian path has to traverse these edges sequentially). This allows iterative reconstruction of chunks of text that are larger than size n. It can be shown that the iterative application of these two reduction steps results in an irreducible graph, that is, a graph where no other reduction step is possible. In practice, one of the steps (a global one, involving division points) is hardly used (less than 1% of the time), while being the computationally costlier of the two, and can be omitted.
In experiments on the Gutenberg corpus, the method of Tealdi and Gallé is able to reconstruct chunks of an average length of 55.44 words, and an average maximal length of 658.34, starting from the corpus of all 5-grams.
To inhibit such a reconstruction from n-gram statistics, it has been proposed to remove some of the n-grams, for example, by removing all n-grams that occur fewer than a predefined threshold number of times M. This approach has been applied on the Google Ngram corpus (see, Jean-Baptiste Michel, et al., “Quantitative analysis of culture using millions of digitized books,” Science, 331(6014):176-182, 2011, hereinafter “Michel”). Only less frequent evidence is eliminated, which may be less interesting for most applications. However, there are problems with this method. In particular, reconstruction is still possible for many fragments, most of which are correct. Additionally, the utility of such a corpus is greatly reduced for some applications. For example, the measured perplexity of a language model obtained from the corpus is increased considerably.
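For illustration, the thresholding approach attributed to Michel can be sketched as follows (a hypothetical sketch, not the published implementation; `counts` maps each n-gram to its frequency and `M` is the assumed removal threshold):

```python
def remove_infrequent(counts, M):
    """Michel-style thresholding: drop every n-gram that occurs
    fewer than M times in the statistics.

    counts: dict mapping an n-gram (a tuple of words) to its count.
    Returns a new dict keeping only n-grams with count >= M.
    """
    return {ngram: c for ngram, c in counts.items() if c >= M}
```

As the section notes, this removes only low-frequency evidence; higher-frequency n-grams, and hence much of the graph structure exploited for reconstruction, survive.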
There remains a need for a system and method for inhibiting the ability to reconstruct documents from their n-gram statistics while minimizing the impact on the usefulness of the statistics for other purposes.
The following reference, the disclosure of which is incorporated herein in its entirety by reference, is mentioned:
U.S. application Ser. No. 14/083,483, filed on Nov. 19, 2013, entitled RECONSTRUCTING DOCUMENTS FROM n-GRAM INFORMATION, by Tealdi, et al.
In accordance with one aspect of the exemplary embodiment, a method for modifying n-gram statistics includes obtaining n-gram statistics for a sequence of symbols. The n-gram statistics include, for each of a set of n-grams present in the sequence, an associated measure of occurrence in the sequence. An initial directed graph is generated from the n-gram statistics, the directed graph including nodes connected by edges, each edge corresponding to one of the n-grams in the set of n-grams and being associated with a multiplicity which is based on the measure of occurrence. A modified directed graph is generated. This includes adding a plurality of edges to the initial directed graph, the plurality of added edges corresponding to n-grams that are not present in the sequence of symbols, each added edge being associated with a multiplicity. Modified n-gram statistics are generated for the modified graph. The modified n-gram statistics include, for n-grams represented in the graph, an associated measure of occurrence.
At least one of the generating an initial directed graph, generating a modified directed graph, and generating modified n-gram statistics from the modified directed graph may be performed with a processor.
In accordance with another aspect of the exemplary embodiment, a system for modifying n-gram statistics includes a graphing component for generating an initial directed graph from n-gram statistics for a set of n-grams. The initial directed graph includes nodes connected by edges. Each edge corresponds to one of the n-grams in the set of n-grams and is associated with a multiplicity derived from the n-gram statistics. A modification component generates a modified directed graph. The modification component performs at least one of: a) for a plurality of iterations, selecting an irregular node from the directed graph and adding an edge to each of two other nodes of the directed graph, each added edge being associated with a multiplicity that reduces the irregularity of the irregular node, and b) for a plurality of iterations, selecting a regular node from the directed graph and adding an edge to each of two other nodes of the graph, each added edge being associated with a multiplicity that increases the irregularity of the regular node. A reconstruction component generates modified n-gram statistics for the modified graph, the modified n-gram statistics including, for n-grams represented in the modified directed graph, an associated measure of occurrence. A processor implements the graphing component, modification component, and reconstruction component.
In accordance with another aspect of the exemplary embodiment, a method for modifying n-gram statistics includes obtaining n-gram statistics for a sequence of symbols. The n-gram statistics include, for each of a set of n-grams present in the sequence, an associated measure of occurrence in the sequence. An initial directed graph is generated from the n-gram statistics. The initial directed graph includes nodes connected by edges. Each of the edges corresponds to one of the n-grams in the set of n-grams and is associated with a multiplicity which is based on the measure of occurrence. A modified directed graph is generated. This includes adding a plurality of edges to the initial directed graph, including at least one of: a) for a plurality of iterations, selecting an irregular node from the directed graph and adding an edge to each of two other nodes of the graph, each added edge being associated with a multiplicity that reduces the irregularity of the irregular node, and b) for a plurality of iterations, selecting a regular node from the directed graph and adding an edge to each of two other nodes of the graph, each added edge being associated with a multiplicity that increases the irregularity of the regular node. Modified n-gram statistics are generated for the modified directed graph. The modified n-gram statistics include, for n-grams represented in the graph, an associated measure of occurrence.
At least one of the generating an initial directed graph, generating a modified graph, and generating modified n-gram statistics from the modified graph may be performed with a processor.
In the exemplary system and method, n-grams are added in a non-deterministic way to n-gram statistics generated from a document or corpus of documents prior to release of the n-gram statistics.
It is assumed that the data to be released is text and that it takes the form of a corpus of n-grams. The goal is to inhibit the reconstruction, from the released n-gram statistics, of chunks of text larger than the released n-grams, while maintaining the utility of the statistics. While the utility may vary from application to application, as an example, the construction of a traditional language model is considered, as a very generic but still concrete usage. Language models are used in several applications, such as machine translation, speech recognition, and other document-access applications.
With reference to
The system includes memory 18 which stores instructions 20 for performing the exemplary method and a processor 22 in communication with the memory for executing the instructions. One or more input/output (I/O) interfaces 24, 26 allow the system to communicate with external devices via a link 28, such as a wired or wireless network, such as the Internet. Hardware components 18, 22, 24, 26 of the system communicate via a data/control bus 30. The system may be hosted by one or more computing devices 32.
The instructions 20 include an n-gram statistics generator 40, a graphing component 42, a modification component 43 including one or more n-gram statistics modification components 44, 46, a reconstruction component 48, and an output component 50. The statistics generator 40 generates initial n-gram statistics 14 from an input text string 12, if this has not already been performed. The graphing component 42 generates a directed graph, specifically a de Bruijn graph 52, from the initial n-gram statistics 14. The exemplary n-gram statistics modification components 44, 46 include a polishing component 44 and a disturbing component 46, which modify edges of the graph 52 to generate a modified directed graph 54, as described in greater detail below. The reconstruction component 48 computes the n-gram statistics 16 of the modified graph 54. The modified n-gram statistics include, for each n-gram represented in the modified graph 54, an associated measure of occurrence, such as a count, although in this case, at least some of the counts do not correspond to the count of the respective n-gram in the text string 12 (which, for added n-grams, is zero). The output component 50 outputs the modified n-gram statistics 16.
A de Bruijn graph 52, denoted G, is readily constructed from the n-gram statistics 14. In the graph, each n-gram becomes an edge between two nodes, each one denoting an n−1-gram. Each edge is associated with a multiplicity, denoting the number of times the n-gram occurs in the n-gram statistics 14. Each multiplicity is thus an integer value which is greater than 0, such as 1, 2, 3, or 4, etc. For each node x in a de Bruijn graph, there is at least one incoming edge ei with multiplicity ki and at least one outgoing edge gj with multiplicity kj.
The indegree din(x) of a node x of the graph is defined as Σ_{e∈E: head(e)=x} multiplicity(e), i.e., the sum, over all incoming edges, of their multiplicities, and the outdegree dout(x) of a node x is defined as Σ_{g∈E: tail(g)=x} multiplicity(g), i.e., the sum, over all outgoing edges, of their multiplicities. A graph is Eulerian if it is connected and din(x)=dout(x) for all nodes x. In this case, the degree of the node is d(x)=din(x)=dout(x). For node x in
An Eulerian cycle through the de Bruijn graph is a cycle that visits each edge e exactly multiplicity(e) times. The set of all Eulerian cycles of G is denoted by ec(G). Given one such Eulerian cycle, its label sequence is the list of labels of its edges and the sequence it represents is the concatenation of these labels.
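These definitions can be sketched in code (an illustrative representation, not the patented implementation: each edge is keyed by its n-gram, a tuple of words, and mapped to its multiplicity):

```python
from collections import defaultdict

def degrees(edges):
    """Weighted in- and out-degrees of a de Bruijn graph.

    edges: dict mapping an n-gram (tuple of words) to its multiplicity.
    Each n-gram is an edge from its (n-1)-gram prefix (tail) to its
    (n-1)-gram suffix (head).
    Returns (din, dout), each a dict from node to summed multiplicity.
    """
    din, dout = defaultdict(int), defaultdict(int)
    for ngram, k in edges.items():
        tail, head = ngram[:-1], ngram[1:]
        dout[tail] += k
        din[head] += k
    return din, dout

def is_balanced(edges):
    """True when din(x) == dout(x) for every node, a necessary
    condition for the graph to be Eulerian (connectivity is a
    separate check, omitted here)."""
    din, dout = degrees(edges)
    return all(din[x] == dout[x] for x in set(din) | set(dout))
```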
Without the modifications described herein, given the statistics 14, some reconstruction of the original corpus 12 is feasible, using, for example, the method of Tealdi and Gallé. One local rule applied to the de Bruijn graph in that method is referred to as the Pigeonhole Rule, which is illustrated in the example shown in
More generally, for any node x, with incoming edges ei, outgoing edges gj, and their respective multiplicities ki, kj, the Pigeonhole rule is applied whenever ki > d(x) − kj (or kj > d(x) − ki), where d(x) denotes the degree of the node x. Such nodes are referred to as irregular nodes. For an irregular node of the graph, the irregularity value is defined as:
δ(x) = max_{e∈incoming(x)} multiplicity(e) + max_{g∈outgoing(x)} multiplicity(g) − d(x) > 0.  (1)
i.e., the sum of the multiplicity of the incoming edge e with the highest multiplicity and the multiplicity of the outgoing edge g with the highest multiplicity minus the degree of the node has a value of δ(x) which is greater than 0. The irregularity values of irregular nodes are thus integer values, such as 1, 2, 3, etc. In general a range of irregularity values is present in the initial graph 52.
As an example, consider the node x in
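In an illustrative representation where each edge is keyed by its n-gram (a tuple of words) and mapped to its multiplicity, Eqn. (1) can be evaluated as follows (a sketch with hypothetical names; it assumes node x is balanced, so d(x) equals the sum of the incoming multiplicities):

```python
def irregularity(edges, x):
    """delta(x) of Eqn. (1): the largest incoming multiplicity plus
    the largest outgoing multiplicity, minus the degree d(x) of node x.

    edges: dict mapping an n-gram (tuple of words) to its multiplicity.
    x: a node, i.e., an (n-1)-gram tuple.
    """
    incoming = [k for ngram, k in edges.items() if ngram[1:] == x]
    outgoing = [k for ngram, k in edges.items() if ngram[:-1] == x]
    d = sum(incoming)  # d(x) = din(x) = dout(x) for a balanced node
    return max(incoming) + max(outgoing) - d

def is_irregular(edges, x):
    """A node is irregular when its irregularity value is positive."""
    return irregularity(edges, x) > 0
```

For instance, a node with incoming multiplicities {3, 1} and outgoing multiplicities {3, 1} has d(x) = 4 and δ(x) = 3 + 3 − 4 = 2, so the Pigeonhole rule applies to it.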
In the exemplary system and method, instead of removing infrequent n-grams to obfuscate the data as in the method of Michel, n-grams are added in a strategic and non-deterministic way. The system and method specifically targets the application of the Pigeonhole rule. The exemplary system and method operate by polishing irregular nodes so that they become regular and/or by disturbing nodes of any type, creating false irregular nodes.
The computer-implemented system 10 may include one or more computing devices 32, such as a PC, such as a desktop, a laptop, palmtop computer, portable digital assistant (PDA), server computer, cellular telephone, tablet computer, pager, combination thereof, or other computing device capable of executing instructions for performing the exemplary method.
The memory 18 may represent any type of non-transitory computer readable medium such as random access memory (RAM), read only memory (ROM), magnetic disk or tape, optical disk, flash memory, or holographic memory. In one embodiment, the memory 18 comprises a combination of random access memory and read only memory. In some embodiments, the processor 22 and memory 18 may be combined in a single chip. Memory 18 stores instructions for performing the exemplary method as well as the processed data 14, 52, 54.
The network interface 24, 26 allows the computer to communicate with other devices via a computer network, such as a local area network (LAN) or wide area network (WAN), or the Internet, and may comprise a modulator/demodulator (MODEM), a router, a cable, and/or an Ethernet port.
The digital processor 22 can be variously embodied, such as by a single-core processor, a dual-core processor (or more generally by a multiple-core processor), a digital processor and cooperating math coprocessor, a digital controller, or the like. The digital processor 22, in addition to controlling the operation of the computer 32, executes instructions stored in memory 18 for performing the method outlined in
The term “software,” as used herein, is intended to encompass any collection or set of instructions executable by a computer or other digital system so as to configure the computer or other digital system to perform the task that is the intent of the software. The term “software” as used herein is intended to encompass such instructions stored in storage medium such as RAM, a hard disk, optical disk, or so forth, and is also intended to encompass so-called “firmware” that is software stored on a ROM or so forth. Such software may be organized in various ways, and may include software components organized as libraries, Internet-based programs stored on a remote server or so forth, source code, interpretive code, object code, directly executable code, and so forth. It is contemplated that the software may invoke system-level code or calls to other software residing on a server or other location to perform certain functions.
At S102, a text string 12 is received, such as a sequence of words forming a document or a document corpus.
At S104, from the string 12, initial n-gram statistics 14 (a list of n-grams and their respective counts in the sequence 12) are obtained. In one embodiment, they are generated by the statistics generator 40. In another embodiment, there is no access provided to the original document(s) 12 and the n-gram statistics 14 are received from an external source that has access to the original document(s) from which the n-grams statistics are generated.
At S106, an input directed (de Bruijn) graph 52 is generated, based on the initial n-gram statistics 14.
At S108, the input graph 52 is modified a plurality of times using one or both of the modification methods (polishing S110, and disturbing S112), to generate a modified graph 54. This may be performed by one or both of the modification component(s) 44, 46. S110 and S112 are described in greater detail below with reference to
At S114, the modified graph 54 is processed, by the reconstruction component 48, to identify the modified n-gram statistics 16 corresponding to some or all of the edges of the modified graph 54 and the multiplicities of these edges.
Optionally, at S116, the statistics 16 may be evaluated, for example, by computing the error rate for reconstruction of sequences longer than n from the modified n-gram statistics 16 (using the method of Tealdi and Gallé, for example) and/or by measuring the perplexity of a language model generated from the modified n-gram statistics, as described, for example, in the Examples below. This may be used to confirm that the statistics would not reveal too much of the original text sequence on the one hand, and on the other, are useful for their intended purpose. In the event that the reconstruction error is not low enough, the method may return to S108 for further iterations of S110 and/or S112 to be performed on the modified graph.
At S118, information is output which may include or be based on the modified statistics 16.
The method ends at S120.
Aspects of the exemplary system and method will now be described in further detail.
Creation of n-gram Statistics (S104)
n-gram statistics 14 can be generated from a text document or document corpus 12. The words (unigrams) can be automatically identified in a text document based on the white spaces between them. The extraction of n-gram statistics may include generating a document-length sequence, which may include adding unique beginning and ending symbols which do not appear as symbols (words) in the document (here illustrated by “$” and “#”). For example, given a very small document which consists solely of the famous quote: A rose is a rose is a rose, based on Gertrude Stein's poem, the document-length sequence generated is $ A rose is a rose is a rose #. Then, starting at a first end, all possible n-grams are extracted, i.e., including overlapping n-grams. For each n-gram, a measure of occurrence, such as a count of the number of occurrences in the text sequence 12, is stored. For example, if n is 3, the n-gram statistics would be as shown in TABLE 1:
As will be appreciated, the n-grams may be listed in any order, such as alphabetical order, number of occurrences, or randomly. While the list is illustrated as a table, any suitable data structure may be employed for storing the n-gram statistics.
In the case of two or more documents, their sequences may be concatenated. Punctuation, capitalization, and/or numbers may be ignored in some embodiments. Other pre-processing may also be performed.
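The extraction procedure above can be sketched as follows (an illustrative sketch with a hypothetical function name; the sentinel symbols default to the “$” and “#” of the example):

```python
from collections import Counter

def ngram_counts(words, n, start="$", end="#"):
    """Count all overlapping n-grams in a word sequence after
    adding unique beginning and ending sentinel symbols."""
    seq = [start] + list(words) + [end]
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
```

Applied to the quote with n = 3, this yields six distinct trigrams, with “rose is a” and “is a rose” each counted twice.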
Various methods and software packages exist for creation of an initial de Bruijn graph 52 based on n-gram statistics 14. See, for example, Compeau, et al., “How to apply de Bruijn graphs to genome assembly,” Nature Biotechnology 29(11) 987-991 (2011); and Weisstein, Eric W., “De Bruijn Graph”, from MathWorld (http://mathworld.wolfram.com/deBruijnGraph.html).
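As a minimal, package-agnostic sketch (hypothetical names, not any particular toolkit), the graph can be assembled directly from the n-gram statistics; here nodes are (n−1)-gram tuples and edges are keyed by (prefix, suffix) node pairs:

```python
def de_bruijn_graph(counts):
    """Build a de Bruijn graph from n-gram statistics.

    counts: dict mapping an n-gram (tuple of words) to its count.
    Returns (nodes, edges): nodes is the set of (n-1)-gram tuples, and
    edges maps a (prefix, suffix) node pair to the edge multiplicity.
    """
    nodes, edges = set(), {}
    for ngram, k in counts.items():
        u, v = ngram[:-1], ngram[1:]
        nodes.update((u, v))
        edges[(u, v)] = edges.get((u, v), 0) + k
    return nodes, edges
```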
For example, as illustrated in
The input graph is modified by adding edges ei,gj to the graph with associated multiplicities k. This is achieved by performing polishing and disturbing steps S110, S112, each of which is performed a number of times. While in the illustrated embodiments, all of the polishing iterations (S110) are performed prior to the disturbing iterations (S112), the order can differ. For example, one or more polishing iterations may be preceded and/or followed by one or more disturbing iterations. The total number of added edges may be, for example, at least 0.5%, or at least 1%, or at least 2% or at least 5% of the number of edges in the initial graph, such as at least 100, or at least 1000 or at least 10000 added edges, or more, depending on the size of the text sequence.
In the polishing stage, some (but not all) of the irregular nodes, as defined according to Eqn. 1 above, are converted into regular nodes or are made less irregular, by reducing their irregularity value δ(x). This is achieved by adding new edges to the graph.
Step S110 is illustrated in the flow chart shown in
An example of a polishing iteration is illustrated in
Algorithm 1 illustrates an exemplary implementation of the polishing method. Values for two parameters are selected: K, the number of iterations, and a maximum value of δ, denoted δmax.
The algorithm adds 2K edges to the graph (2 at each iteration), converting, at most, K nodes into regular ones. In order not to add a false n-gram with too large a multiplicity, the multiplicity δ is thresholded by the parameter δmax, which provides an upper bound for δ, i.e., δ is no greater than δmax. In the exemplary Algorithm, δ is selected as the minimum of the two values δmax and δ(x) (although it could be even smaller). If δmax is large, it has little or no impact on the result. In one embodiment, δmax may be selected by observing the effect of different values of δmax on performance. In another embodiment, it may be selected to be up to a certain multiple of the average multiplicity over the n-gram statistics, e.g., ≦5×average k, ≦2×average k, or ≦0.5×average k. In another embodiment, δmax may be a user-selectable parameter, for example, up to 50, or up to 20, or up to 10, or up to 6.
K can be selected to provide a high error rate for reconstructing the original sequence, given the n-gram statistics, while at the same time, maintaining the usefulness of the data, which can be measured in terms of perplexity (a measurement of how well a language model generated from the statistics 16 predicts the next word of the original sequence). The selection of K is a tradeoff and a suitable value may be determined by evaluating the two objectives for different values of K. K may be, for example at least 0.1% or at least 0.5% or at least 1% of the number of edges in the initial graph, such as at least 10, or at least 20, or at least 50, or at least 100, or at least 1000, or more.
The added edges do not increase the degree of node x. The exemplary algorithm breaks the Eulerian nature of the graph (which requires that all nodes are balanced, i.e., din(x)=dout(x)). In particular, although node x is still balanced, nodes u and v each have an additional single edge which makes their incoming and outgoing degrees differ. One way to avoid this is by grouping the K selected nodes by their δ(x) and creating an Eulerian cycle for each group.
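Since Algorithm 1 itself is not reproduced here, the following is one plausible reading of the polishing step, sketched in Python (hypothetical helper names; edges are keyed by (u, v) node pairs mapped to multiplicities; the sketch adds a false incoming and a false outgoing edge of multiplicity min(δmax, δ(x)), raising din(x) and dout(x) equally so that δ(x) is driven toward zero; details may differ from the patented Algorithm 1):

```python
import random

def nodes_of(edges):
    """All nodes touched by the (u, v)-keyed edge dict."""
    return {u for u, _ in edges} | {v for _, v in edges}

def delta(edges, x):
    """Irregularity value of node x (0 if x lacks incoming or
    outgoing edges); d(x) is taken as the incoming-degree din(x)."""
    inc = [k for (u, v), k in edges.items() if v == x]
    out = [k for (u, v), k in edges.items() if u == x]
    if not inc or not out:
        return 0
    return max(inc) + max(out) - sum(inc)

def polish(edges, K, delta_max, rng=random):
    """Sketch of the polishing step: for up to K iterations, pick a
    random irregular node x and add a false incoming edge (u, x) and
    a false outgoing edge (x, v), each of multiplicity
    min(delta_max, delta(x)). Only node pairs not already joined by
    an edge are used, so no existing maximum multiplicity is raised.
    Modifies `edges` in place."""
    for _ in range(K):
        nodes = sorted(nodes_of(edges))
        irregular = [x for x in nodes if delta(edges, x) > 0]
        if not irregular:
            break
        x = rng.choice(irregular)
        d = min(delta_max, delta(edges, x))
        cand_u = [y for y in nodes if y != x and (y, x) not in edges]
        cand_v = [y for y in nodes if y != x and (x, y) not in edges]
        if not cand_u or not cand_v:
            continue
        edges[(rng.choice(cand_u), x)] = d  # false incoming edge
        edges[(x, rng.choice(cand_v))] = d  # false outgoing edge
    return edges
```

With δ = δ(x) the selected node becomes regular; as noted above, nodes u and v are left unbalanced unless a separate balancing pass is applied.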
In addition to (or as an alternative to) removing/reducing irregularities in the graph as described for S110, another way of misleading the application of the Pigeonhole rule is by creating false irregular nodes. To do so, edges are added so that δ(x) of a regular node becomes positive. This step converts some (but not all) of the regular nodes of the graph to irregular ones. The method is illustrated in
At S304, two other nodes u and v are selected, e.g., drawn randomly, from the nodes in the graph. These two nodes may be irregular or regular nodes. For each of nodes u and v, an edge is created to the selected node x (S306, S308). Each edge has the same multiplicity m. If, at S310, a predefined number K′ of disturbing iterations has not yet been performed, the method returns to S300 for another iteration of S300-S308; otherwise, the method proceeds to S114.
An example of a disturbing iteration is illustrated in
Algorithm 2 shows one method for performing the disturbing step S112.
The Algorithm takes as input a number of iterations K′ and a value λ. A suitable value of K′ may be selected as for K. K′ may be, for example at least 0.1% or at least 0.5% or at least 1% of the number of edges in the initial graph, such as at least 10, or at least 20, or at least 50, or at least 100, or at least 1000, or more. In some embodiments, K=K′, although it is to be appreciated that different values can be selected.
In step 2 of the Algorithm, a random node is selected which may be regular or irregular, although in other embodiments, the node x is selected from only regular nodes of the graph.
At step 3, p is chosen using the probability distribution exp(λ). At step 4, the multiplicity m is computed as the degree of node x, plus the floor of p (i.e., p rounded down to the nearest integer), plus 1, to ensure that the multiplicity is always greater than the degree of node x. Steps 5 and 6 proceed as for steps 4 and 5 of Algorithm 1.
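One plausible implementation of the disturbing step, following the description above, is sketched below (hypothetical helper names; edges are keyed by (u, v) node pairs mapped to multiplicities; `rng.expovariate(lam)` draws p from the exp(λ) distribution; details may differ from the patented Algorithm 2):

```python
import math
import random

def nodes_of(edges):
    """All nodes touched by the (u, v)-keyed edge dict."""
    return {u for u, _ in edges} | {v for _, v in edges}

def delta(edges, x):
    """Irregularity value of node x (0 if x lacks incoming or
    outgoing edges); d(x) is taken as the incoming-degree din(x)."""
    inc = [k for (u, v), k in edges.items() if v == x]
    out = [k for (u, v), k in edges.items() if u == x]
    if not inc or not out:
        return 0
    return max(inc) + max(out) - sum(inc)

def disturb(edges, K, lam, rng=random):
    """Sketch of the disturbing step: for K iterations, pick a random
    node x, draw p ~ exp(lam), and add a false incoming edge (u, x)
    and a false outgoing edge (x, v), both of multiplicity
    m = d(x) + floor(p) + 1. Because m exceeds d(x), node x is
    guaranteed to become irregular. Modifies `edges` in place."""
    for _ in range(K):
        nodes = sorted(nodes_of(edges))
        x = rng.choice(nodes)
        d = sum(k for (u, v), k in edges.items() if v == x)  # din(x)
        m = d + math.floor(rng.expovariate(lam)) + 1
        cand_u = [y for y in nodes if y != x and (y, x) not in edges]
        cand_v = [y for y in nodes if y != x and (x, y) not in edges]
        if not cand_u or not cand_v:
            continue
        edges[(rng.choice(cand_u), x)] = m  # false incoming edge
        edges[(x, rng.choice(cand_v))] = m  # false outgoing edge
    return edges
```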
In some embodiments, the modification methods can be modified slightly in order to keep each node balanced (indegree=outdegree), thereby hiding the fact that the corpus has been modified at all. In one embodiment, these modifications may be performed on the graph after the polishing and disturbing steps are complete. This can be performed by adjusting, as far as reasonably feasible, the multiplicities of those nodes for which indegree ≠ outdegree, while maintaining the irregularity value δ(x) of the node.
The exemplary system and method thus add noise to a corpus of n-grams in a way which (1) inhibits the reconstruction of substrings larger than those disclosed in the n-gram statistics while (2) maintaining the utility of the corpus, as measured by the quality of a language model obtained from it. The noise is added to focus on the irregular nodes, which are the key nodes for inferring larger substrings. The irregularity value of a random set of those nodes is removed/reduced and false irregularities are created by making regular nodes irregular (and, potentially, by making irregular nodes more irregular).
The method illustrated in
Alternatively, the method may be implemented in transitory media, such as a transmittable carrier wave in which the control program is embodied as a data signal using transmission media, such as acoustic or light waves, such as those generated during radio wave and infrared data communications, and the like.
The exemplary method may be implemented on one or more general purpose computers, special purpose computer(s), a programmed microprocessor or microcontroller and peripheral integrated circuit elements, an ASIC or other integrated circuit, a digital signal processor, a hardwired electronic or logic circuit such as a discrete element circuit, a programmable logic device such as a PLD, PLA, FPGA, graphics processing unit (GPU), or PAL, or the like. In general, any device capable of implementing a finite state machine that is in turn capable of implementing the flowchart shown in
As will be appreciated, the steps of the method need not all proceed in the order illustrated and fewer, more, or different steps may be performed.
Without intending to limit the scope of the exemplary embodiment, the following examples demonstrate the applicability of the method using a set of documents.
A random sample of 100 books was taken from the Gutenberg project (https://www.gutenberg.org/). All of the 5-grams were extracted from these books and their statistics generated.
First, the robustness of the reconstruction method of Tealdi and Gallé was evaluated when the approach of Michel is taken for removing all n-grams occurring less than M times, with M ranging from 1 to 10. The method of Tealdi and Gallé ensures that all obtained chunks are correct, but only if the underlying graph is complete, meaning that it reflects exactly all n-grams and their counts. However, the method can still yield many correct chunks even when the dataset is incomplete. This is demonstrated by running the method on the incomplete dataset and determining what proportion of the reconstructed chunks are actually correct. Ideally, a significant proportion of the reconstructed chunks should be wrong, so that a potential attacker is not able to trust the outcome of the reconstruction.
The presence of reconstructed subsequences longer than 5 words that are present in the original text was evaluated (for efficiency reasons a random sample of 1000 reconstructed subsequences was employed). The results are shown in
This was compared with the exemplary method described above. For simplicity, the same value of K (i.e., K=K′) was used for Algorithms 1 and 2, resulting in the addition of a total of 4K edges. δmax was set to 20 and λ to 0.5. The error rate of the method is also shown in
In the exemplary method, the aim is not simply to hinder reconstruction, but also to allow the dataset to remain useful if released in that manner. To provide a measure of the utility, it is assumed that the goal is to construct a language model out of the collected n-grams, a goal that covers many different applications. The perplexity of a language model created from the modified corpora (either through removing edges, as in the method of Michel, or by adding edges, as in the present method) is thus evaluated. For this purpose, the CMU-Cambridge Statistical Language Modeling Toolkit v2 was used (http://svr-www.eng.cam.ac.uk/prc14/toolkit.html, described in Clarkson, et al., “Statistical Language Modeling Using the CMU-Cambridge Toolkit,” Proc. ESCA Eurospeech 1997), with default options (Good-Turing discounting is used).
For testing, an additional 100 random books (not contained in the ones used to create the modified n-gram statistics) were used and average perplexity (ppx) is shown for the language models in
It should be noted that the baseline perplexity, corresponding to M=0, is 7.55, which is only slightly lower than the values obtained by adding edges.
The evaluations suggest that the exemplary method of adding n-grams performs better than the standard method of removing less-frequent n-grams. First, the substrings inferred using the method of Tealdi and Gallé are more likely to be wrong. Second, the utility of the modified n-gram corpus for language modeling is only slightly worse than that obtained with the unmodified (perfect) n-gram statistics.
It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.