Determination of data object properties, such as syntax and semantics, is a fundamental feature of all data management products. Knowledge of a data object's properties enables correct data manipulation and processing. Knowledge of the data object's properties also enables establishment of proper security controls for that data. For example, data masking, or redacting, is an important data management technology which prevents access to sensitive data by unauthorized users. In order to properly mask a data element, the masking application must know at least the data element's syntax.
The process of discovering a data object's syntax and semantics is commonly referred to as “data profiling.” A traditional data profiling application takes a “metadata+data” approach in which it first attempts to glean the data object's type or domain from the available metadata and then tries to match the data object's internal structure to a collection of known syntactic patterns, each of which is associated with a semantic category such as US Social Security Number, credit card number, geographic location, etc.
This traditional data object profiling approach suffers from uncertainty in the metadata assessment: there are no metadata naming conventions or rules. For example, a database column containing ABA routing numbers may not contain any indication of its content in its name and may be called something like “FI”—an acronym for “Financial Institution.” Furthermore, metadata may be totally misleading. For example, a database column containing “SSN” in its name may not contain US Social Security Numbers, as the name may imply, but rather the hull classification of a nuclear-powered general purpose attack submarine (e.g., SSN-774—the Virginia class).
The traditional approach to profiling data objects typically uses regular expressions (“RegExp”), which provide a binary “match” or “no match” answer when assessing said data object's syntax. The RegExp-based approach does not produce any indicative result when a data object's syntax is even slightly different from the template. Also, due to its binary nature, the RegExp-based profiling approach is incapable of providing hints about a direction in which the data object's type and domain discovery may proceed.
The above limitations of traditional data profiling methods lead to bloated and often imprecise data discovery tools which are hard to extend and manage. Furthermore, the inability to discern a data object's syntax using traditional RegExp-based methods impedes the ability to protect said data object by means of format-preserving methods such as format-preserving masking or format-preserving encryption, thus creating unnecessary security risks with potentially costly consequences.
Accordingly, improvements are needed in systems for data profiling and for masking data while preserving formatting in a deterministic fashion, such that each instance of an original data element, when transformed by the data masking system under the same conditions, results in the same masked data element having the same format.
While methods, apparatuses, and computer-readable media are described herein by way of examples and embodiments, those skilled in the art recognize that methods, apparatuses, and computer-readable media for determining a data domain of a data object are not limited to the embodiments or drawings described. It should be understood that the drawings and description are not intended to be limited to the particular form disclosed. Rather, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the appended claims. Any headings used herein are for organizational purposes only and are not meant to limit the scope of the description or the claims. As used herein, the word “can” is used in a permissive sense (i.e., meaning having the potential to) rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to.
Applicant has discovered a method, apparatus, and medium which alleviates low adaptability problems related to traditional data object profiling mechanisms. In particular, the present application introduces a profiling technology which is configured to produce multinomial classifications of data objects with an indication of closeness to the ideal model.
The methods for profiling disclosed herein can be implemented as part of a data profiling component, which can be software or hardware, and which can be implemented as a standalone system or be incorporated in an application such as, without limitation, a data masking system.
The probabilistic methods of data profiling described herein can adhere to Bayesian reasoning by making a priori probabilistic assumptions and rejecting a null hypothesis if further study of the data object in question disproves that hypothesis. As a result, each examined data object can be associated with a probability of being a member of a certain class (“data domain”). Said Bayesian reasoning can be implemented in a Bayesian inference engine. As discussed further below, upon completion of the examination, the data object can be associated with the data domain having the largest computed probability.
An important benefit of the disclosed method and system is an approach which minimizes the expert input required to describe a data domain. This feature reduces the time required for configuration of a data profiling system. Additionally, the disclosed method and system dramatically simplify data domain descriptions by applying unsupervised machine learning methods to computing similarity between a data object instance and a data object model.
The data objects which are profiled can be received or retrieved by the data profiling component from one or more data sources, such as databases, servers, user input, or from any computing device or software application. The data objects can be received in any format, for example, database entries, database columns, database rows, database tables, files, input from a user, or any other computer-readable format.
Data objects can include, without limitation, continuous numbers, discontinuous numbers, strings, symbols, or any combination of these. Data objects can also have associated metadata and characteristics which can be derived and/or extracted from the data object.
As shown in
As used herein, a data domain refers to a data type which optionally can have associated constraints. Data domains can also include object classes, such as those used in object-oriented programming languages. Examples of data domains in databases can include a Social Security Number domain, an address domain, a name domain, etc.
The data domain name is a human readable name of a data domain. The data domain name enables an analyst to associate a certain semantic with the instances of said data domain. Examples of data domain names are “US Social Security Number (SSN),” “VISA Credit Card Number,” etc.
The data domain identifier (“id”) is a unique identifier shared by all syntactic variations of a data domain notation. For example, a data domain for a US SSN can be represented by a 9-digit number, a 9-symbol character string, or an 11-character string comprised of three groups of digits, 3, 2, and 4 digits long respectively, separated by dash (“-”) symbols. All of these variations would have the same data domain id.
The data domain type is a data object interpretation hint. Data object types such as string, integer, date, timestamp, and others can enable narrowing of relevant data domains to a subset of data domains that match a type of a data object being profiled.
The data domain object size provides upper and lower data domain object instance size bounds and acceptable variations. For example, a string of either 9 or 11 characters may represent a US SSN data domain object, while a string between 6 and 11 characters long may represent a German passenger car license plate number. During profiling, data domain object size enables narrowing of relevant data domains to a subset whose object instance sizes satisfy the stated limitations. In the example shown in
The list of alphabets comprises one or more strings, each of which is comprised of the characters found in data objects of the data domain. Each alphabet can be a string comprising a sequence of characters with ascending or descending encoding, in which each next character's encoding exceeds or falls below the previous character's encoding by 1. For example, the sequence of characters “ABCD” in the ASCII encoding can be considered an alphabet, while the sequence of characters “ABDE” can be considered to not comprise an alphabet. Alternatively, alphabets can be configured to not require a strict ordering or particular increment. For example, an alphabet can be the sequence of characters “DARZT.” In the example shown in
A positional map is an array, each element of which indicates which alphabets are a source of characters at a given position in an object of the data domain. Each array element indicates at least one alphabet associated with a given position in the data domain object instance. The positions in the positional map are counted left to right, with the leftmost position denoted as position 0. In the example shown in FIG. 3 of California license plate numbers, all objects of this domain adhere to the format [A3][A2][A2][A2][A1][A1][A1], where A1 = “0123456789”, A2 = “ABCDEFGHJKLMNOPRQSTUVWXYZ”, and A3 = “234567.” This means, for example, that a standard-issue California license plate cannot start with the number “1.”
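For illustration only, the alphabets and positional map of the California license plate example can be represented and checked as in the following Python sketch; the disclosure does not prescribe an implementation language, and the names used here are hypothetical:

```python
# Alphabets from the California passenger license plate example.
A1 = "0123456789"
A2 = "ABCDEFGHJKLMNOPRQSTUVWXYZ"
A3 = "234567"

# Positional map: the alphabet supplying characters for each position,
# counted left to right starting at position 0.
POSITIONAL_MAP = [A3, A2, A2, A2, A1, A1, A1]

def matches_positional_map(value: str, positional_map: list[str]) -> bool:
    """Return True if every character of `value` is drawn from the
    alphabet mapped to its position."""
    if len(value) != len(positional_map):
        return False
    return all(ch in alphabet for ch, alphabet in zip(value, positional_map))

print(matches_positional_map("7ABC123", POSITIONAL_MAP))  # True
print(matches_positional_map("1ABC123", POSITIONAL_MAP))  # False: a plate cannot start with "1"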
Data domain special conditions are data domain-specific semantic rules. The data profiling component can utilize the special conditions to corroborate an input data object with the corresponding data domain-specific semantic rule. Data domain-specific rules can be provided or entered by an analyst with specialized domain knowledge. For example, while US SSNs are generated randomly, a US SSN string may not have all “0” characters in any of the three groups which constitute a US SSN. As shown in
A lookup table handle is a designator of a list of known values associated with data domains of nominal type. Data objects in nominal data domains are not ordered and do not possess a distinguishable internal structure. Examples of nominal data domains include a list of world countries, a list of street names in a city, etc. Another type of nominal data domain is the binary data domain, such as gender designation: “M”, “F”, “male”, “female.”
Locality is an ISO 3166-1 locality code associated with the data domain. The locality code, combined with the geographic location in which the input data object is evaluated, can provide additional corroboration of the data object's profiling outcome. Examples of ISO 3166-1 locality codes are 840—USA and 450—Madagascar. In situations wherein the data domain is locality neutral, the locality code can be set to 0. A locality neutral data domain's locality code does not influence the outcome of a data object profiling process.
The quality coefficient estimates the quality of the input data object's data domain identification in case of a match. This coefficient reflects the commonality of a data domain representation described by the data domain characteristics. In other words, the quality coefficient indicates a likelihood that a data object which matches the characteristics of the data domain belongs to the data domain. For example, a US SSN formatted as a 9-digit string can be assigned a default quality assurance coefficient of 0.80, and a US SSN formatted as an 11-character string of 9 digits separated by two dash (“-”) symbols in positions 4 and 7 counting from left can be assigned a default quality assurance coefficient of 0.99.
Of course, these characteristics are provided for illustration only, and a data domain may contain fewer, more, and/or different characteristics.
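By way of a non-limiting illustration, the characteristics described above could be gathered into a single record; the following Python dataclass is one hypothetical organization and is not mandated by this disclosure:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DataDomain:
    """Illustrative container for the data domain characteristics
    described above; all field names are hypothetical."""
    name: str                  # human readable name, e.g. "US Social Security Number (SSN)"
    domain_id: int             # shared by all syntactic variations of the domain
    domain_type: str           # interpretation hint: "string", "integer", "date", ...
    min_size: int              # lower object instance size bound
    max_size: int              # upper object instance size bound
    alphabets: list[str]       # list of alphabets
    positional_map: list[int]  # index into `alphabets` for each position
    special_conditions: list = field(default_factory=list)  # semantic rules
    lookup_table: Optional[str] = None  # lookup table handle for nominal domains
    locality: int = 0          # ISO 3166-1 code; 0 means locality neutral
    quality_coefficient: float = 0.8    # P(q), in the range (0, 1]
```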
Returning to
The syntactic definition of data domain can be expressed as one or more alphabets and a positional map, as discussed previously with respect to
For example, the data object 601 has a value of “2” at position 1. Based on positional map 602, the alphabet corresponding to position 1 is alphabet A3. As shown in the list of alphabets, alphabet A3 includes the value “2.” Therefore, a determination is made that the value of the data object at position 1 is included in the alphabet corresponding to position 1 (“2” ∈ A3). In another example, the data object 601 has a value of “3” at position 2. Based on positional map 602, the alphabet corresponding to position 2 is alphabet A2. As shown in the list of alphabets, alphabet A2 does not include the value “3.” Therefore, a determination is made that the value of the data object at position 2 is not included in the alphabet corresponding to position 2 (“3” ∉ A2).
For every position in which the value of the data object at that position is not included in the alphabet corresponding to that position in the positional map, the syntactic distance is incremented, as shown in box 604. In the example shown in
Additionally, the syntactic distance is incremented by the difference between the length of the data object 601 and the length of the positional map 602. As shown in
This syntactic distance can be used to compute a probabilistic variable P(s), which is the syntactic match probability between the data object and the data domain. The syntactic match probability is established by means of a feature called the “divergence factor,” which is computed via the distance calculation discussed above. This syntactic distance, as explained earlier, is the distance of an input data object a to a set of data objects generated by a list of alphabets and a positional map of a data domain. Examples of such sets are the collection of all US SSN instances, the collection of license plates issued for passenger automobiles in California, etc. The divergence factor is therefore a measure of closeness between the syntax of a sample data object and the syntax (as expressed by the syntactic definition) of a target set of the data objects in the data domain.
The syntactic distance calculation described with respect to
df(a,S)=infimum{df(a,s):s∈S}
where S is a non-empty set of data objects, s ranges over the members of S, and a is the sample data object whose distance from said set of objects S is being assessed. The set of data objects S can differ from the data domain for which said input data object a is considered for membership because the syntactic definition may not account for special conditions associated with the data domain. In other words, the closed set generated by a positional map and the alphabets can contain elements not permissible in said data domain d. For example, a set of alphabets and a positional map for the US SSN data domain generates values including those with all zeros in the second and the third group of characters.
As discussed with respect to
For example, the distance between a 9-digit number representing a US SSN data object and the set of California license plate numbers (format mxxxddd) is either 5 or 6: 3 unmatched character positions (xxx are uppercase ASCII letters), a size difference of 2 characters, and possibly a first digit equal to “1” (m is a digit greater than 1).
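A minimal sketch of this distance computation follows, assuming one alphabet per position and using a minimum over a domain's syntactic variations in place of the infimum over the generated set S:

```python
def syntactic_distance(value: str, positional_map: list[str]) -> int:
    """Distance between a sample data object and one syntactic
    variation of a data domain, per the rules described above."""
    overlap = min(len(value), len(positional_map))
    # One increment per position whose character falls outside the
    # alphabet mapped to that position.
    mismatches = sum(
        1 for i in range(overlap) if value[i] not in positional_map[i]
    )
    # Plus the size of the length differential.
    return mismatches + abs(len(value) - len(positional_map))

def divergence_factor(value: str, variations: list[list[str]]) -> int:
    """df(a, S): the infimum (here, a minimum) of the distances from
    the sample to each syntactic variation of the domain."""
    return min(syntactic_distance(value, pm) for pm in variations)

# The US SSN vs. California plate example: a 9-digit SSN against the
# 7-position plate map yields 3 letter-position mismatches + 2 length
# difference = 5 (or 6 if the first digit is "1", outside "234567").
DIGITS = "0123456789"
LETTERS = "ABCDEFGHJKLMNOPRQSTUVWXYZ"
PLATE = ["234567", LETTERS, LETTERS, LETTERS, DIGITS, DIGITS, DIGITS]
print(syntactic_distance("232436130", PLATE))     # 5
print(divergence_factor("232436130", [PLATE]))    # 5
```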
A larger distance between said input data object a and the set of data objects S leads to a smaller probability of said input data object a being a member of said set of data objects S. The syntactic match probability can be computed as:

P(s) = cnorm·tanh(1 − d/n)

where d is the distance between sample data object a and a set of data objects S (having member data objects s), n = max(|s|, |a|), and cnorm = 1.313. cnorm is a normalization coefficient which ensures that the syntactic match probability equals 1 when the distance between said input data object a and said set of data objects S is 0 (cnorm·tanh(1) ≈ 1).
The above formula gives a longer data object with a few mismatched positions a better chance of being a member of a class of objects than a shorter data object with the same number of mismatches. This important property mitigates the appearance of false positive matches by short, partially matching data objects.
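A sketch of the syntactic match probability follows; the tanh form is an assumption reconstructed from the stated normalization (cnorm = 1.313 ≈ 1/tanh(1), which makes P(s) equal 1 at zero distance), and the example values demonstrate the length-related property just described:

```python
import math

C_NORM = 1.313  # normalization coefficient: C_NORM * tanh(1) ≈ 1

def syntactic_match_probability(d: int, n: int) -> float:
    """P(s) for distance d and n = max(|s|, |a|); the tanh form is an
    assumption consistent with the stated normalization coefficient."""
    return C_NORM * math.tanh(1 - d / n)

print(syntactic_match_probability(0, 9))   # ≈ 1.00: exact syntactic match
print(syntactic_match_probability(2, 20))  # ≈ 0.94: long object, few mismatches
print(syntactic_match_probability(2, 5))   # ≈ 0.70: short object, same mismatches
```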
Of course, divergence metrics other than Hausdorff distance can be used for estimation of similarity between a data object and a set of data objects. Similarly, computation of the distance between a sample data object and a set of data objects may be carried out using an alternative approach while the probability of a match can be computed using a sigmoid or a similar function.
Returning to
The characteristic probability value can be given by P(φk|d), which is the probability of data object a having the k-th characteristic φk of the data domain d. Data domain characteristics are responsible for adherence to semantics expressed by means of the data domain special conditions. Probability values associated with semantic characteristics can be empirical and can be supplied by an analyst.
For example, a US SSN data domain instance may take the form of a 9-digit number, a 9-character string which represents a 9-digit number, or an 11-character string containing 9 digits separated by two dash (“-”) symbols in positions 4 and 7 counting from left. Semantically, none of the digit groups may consist of only “0” characters, and the value of the leftmost three-digit group cannot exceed 899. While the syntactic characteristics of an input data object are verified by means of the divergence factor/syntactic distance computation, an analyst can supply probabilities of a data domain match for semantic characteristics. Said empirical probabilities can reflect local data quality tolerance levels.
Alternatively, probability values can be determined based upon an automated comparison of characteristics of the data object which can be extracted from the data object and the characteristics associated with the data domain. For example, the probability values can be assigned by a probabilistic classifier or software module which relies upon a corpus of training data and analyzes the characteristics extracted from the data object in conjunction with the characteristics of the data domain.
Box 800 illustrates the determined characteristic probability values for each characteristic of data domain 801. As shown in box 800, the characteristic probability value for the data object having a name matching the domain name is 1%. This indicates that there is a 1% chance that the characters “232-43-613” belong to a domain having the name “California passenger license plate.” The characteristic probability value for the data object having a type matching the domain type is 70%. This indicates that there is a 70% chance that the characters “232-43-613” are a string (as opposed to, for example, a sequence of three independent numbers). The characteristic probability value for the data object having an object size matching the domain object size is 0%. This is because the data object size of 10 is plainly greater than the upper bound size of 7 encoded in the data domain size characteristic. The characteristic probability value for the data object satisfying the domain special conditions is 100%. This can be, for example, because the data object 802 does not violate the special conditions of the data domain 801. Additionally, the characteristic probability value for the data object having a locality code matching the domain locality is 85%. This can be based, for example, on an assessment that the format of the data object could fit the profile of a US SSN and therefore could likely have a locality code of 840, corresponding to the US.
Returning to
The data profiling component can utilize a multinomial Naïve Bayes model for its operation. The probability of an input data object a being a representative of data domain d, P(d|a), can be computed as:
P(d|a) ∝ P(s)·Π1≤k≤n_d P(φk|d)

where P(s) is the syntactic match probability that is based on the divergence factor or syntactic distance, n_d is the number of characteristics in data domain d, and Π1≤k≤n_d P(φk|d) is the product of the characteristic probability values over those n_d characteristics.
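Using the characteristic probability values from the worked example above (1%, 70%, 0%, 100%, and 85%) and an illustrative syntactic match probability, the proportional score can be combined as a simple product, as in this sketch:

```python
def domain_score(p_s: float, characteristic_probs: list[float]) -> float:
    """P(d|a) up to a constant of proportionality: the syntactic match
    probability times the product of the characteristic probabilities."""
    score = p_s
    for p in characteristic_probs:
        score *= p
    return score

# Characteristic values from the worked example: name, type, object
# size, special conditions, locality.  The 0% object-size probability
# drives the whole score to zero, ruling the domain out.  The value
# 0.5 for P(s) is purely illustrative.
print(domain_score(0.5, [0.01, 0.70, 0.0, 1.0, 0.85]))  # 0.0
```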
At step 104 a data domain in the one or more data domains is determined which corresponds to the data object based at least in part on the probability of the data object belonging to each of the one or more data domains. This step can involve simply selecting the data domain with the highest associated probability out of all of the data domains as corresponding to the data object. This step can also include verifying that the probability associated with the highest ranking domain exceeds a minimum threshold. The minimum threshold can be set by an analyst and can be used to ensure that the data object is not linked to a data domain to which it has only a minimal probability of belonging. If the highest ranking domain does not exceed the minimum threshold, then a domain can be selected by the analyst.
Analyst intervention in the decision making process can also be requested when the number of potential data domains which have a probability of corresponding to the data object above a certain probability threshold exceeds a certain threshold. For example, if six different domains have a probability above 80%, then an analyst can make a final determination regarding which domain to select as corresponding to the data object.
When user/analyst input is required, this step can include outputting one or more probabilities associated with one or more of the data domains (for example, the top N domains), along with relevant information about the domains, and determining a domain corresponding to the data object based upon a user selection of one of the domains outputted.
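The selection logic of this step can be sketched as follows; the threshold values and the analyst-escalation behavior shown are illustrative assumptions rather than prescribed parameters:

```python
def select_domain(scores: dict[str, float],
                  min_threshold: float = 0.5,
                  ambiguity_threshold: float = 0.8,
                  max_candidates: int = 5):
    """Pick the highest-probability domain, deferring to an analyst
    when no domain clears the minimum threshold or too many domains
    remain plausible.  Threshold defaults here are illustrative."""
    best_domain, best_p = max(scores.items(), key=lambda kv: kv[1])
    contenders = [d for d, p in scores.items() if p > ambiguity_threshold]
    if best_p < min_threshold or len(contenders) > max_candidates:
        # Output the top candidates and let the analyst decide.
        top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return ("analyst_review", top[:max_candidates])
    return ("selected", best_domain)

print(select_domain({"US SSN": 0.93, "CA license plate": 0.02}))
# ('selected', 'US SSN')
```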
At step 901 one or more syntactic match probabilities corresponding to one or more data domains are computed, each syntactic match probability being based at least in part on a syntactic distance between the data object and a syntactic definition of a corresponding data domain. This step is similar to step 101 of
At step 902 a plurality of characteristic probability values corresponding to each data domain in the one or more data domains are determined, wherein each characteristic probability value corresponds to a probability of the data object having a characteristic of a corresponding data domain. This step is similar to step 102 of
At step 903 one or more ratios of syntactic variations, P(d), corresponding to the one or more domains are determined. Each ratio of syntactic variations comprises a quantity of syntactic variations corresponding to each data domain divided by a total quantity of data domains.
A data domain syntactic variation is a pattern recognized as a representative of a given data domain. For example, in an exemplary collection of data domains, a US SSN may be represented by a 9-digit number, a 9-character sequence of decimal digits, or a sequence of decimal digits with a dash symbol after the third and the fifth digits. In this case, the US SSN data domain can be considered to have three syntactic variations.
Returning to
The contextual coefficient, P(c), falls in the range 0 < P(c) ≤ 1. The contextual coefficient can initially be set to P(c) = 0.5. The contextual coefficient increases when a data object's instance profiling context is supportive of the data object belonging to a particular data domain and decreases otherwise. In other words, the contextual coefficient reflects the influence of the context in which profiling of a given data object a is taking place.
The factors which constitute data profiling context can include, without limitation, presence of known related information, metadata (such as the metadata associated with a data object and described with reference to
At step 905 of
The quality coefficient, P(q), falls in the range 0 < P(q) ≤ 1. This coefficient estimates the quality of the input data object's data domain identification in case of a match and reflects the commonality of a data domain representation described by the data domain characteristics.
The quality coefficient can be assigned to data domains by an analyst based on the analyst's prior experience, or can be assigned through an automated process based upon an analysis of training data or previous data sets.
For example, a US SSN formatted as a 9-digit string can be assigned a default quality assurance coefficient of 0.80, and a US SSN formatted as an 11-character string of 9 digits separated by two dash (“-”) symbols in positions 4 and 7 counting from left can be assigned a default quality assurance coefficient of 0.99, meaning that the 11-character string is more likely to correspond to a US SSN. Similarly, a 16-digit number in 4 groups of 4 digits separated by spaces can be considered more likely to be a credit card number than a bare 16-digit number, which may instead be an international phone number. This can result in the spaced 16-digit number being assigned a quality coefficient of 0.99 and the non-spaced 16-digit number being assigned a quality coefficient of 0.9.
At step 906 of
The data profiling component can utilize a multinomial Naïve Bayes model for its operation. Probability of an input data object a being a representative of data domain d, P(d|a), can be computed as:
P(d|a) ∝ P(d)·P(c)·P(q)·P(s)·Π1≤k≤n_d P(φk|d)
P(d) is the ratio of syntactic variations and is given by

P(d) = N_d/N

where N is the total number of data domains in the collection of data domains and N_d is the number of data domain d syntactic variations in the collection of data domains.
P(c) is the contextual coefficient, discussed earlier.
P(q) is the quality coefficient, also discussed earlier.
P(s) is the syntactic match probability that is based on the divergence factor or syntactic distance, n_d is the number of characteristics in data domain d, and Π1≤k≤n_d P(φk|d) is the product of the characteristic probability values of data domain d.
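Combining all of the factors, a sketch of the expanded score might read as follows, with illustrative values for a US SSN domain having three syntactic variations in a collection of 100 domains:

```python
def full_domain_score(n_variations: int, n_domains: int,
                      p_context: float, p_quality: float,
                      p_syntax: float,
                      characteristic_probs: list[float]) -> float:
    """P(d|a) up to proportionality:
    P(d) * P(c) * P(q) * P(s) * product of P(phi_k|d)."""
    p_d = n_variations / n_domains  # ratio of syntactic variations
    score = p_d * p_context * p_quality * p_syntax
    for p in characteristic_probs:
        score *= p
    return score

# Illustrative values: 3 SSN variations among 100 domains, a neutral
# contextual coefficient of 0.5, a 0.99 quality coefficient, a 0.94
# syntactic match probability, and hypothetical characteristic values.
print(full_domain_score(3, 100, 0.5, 0.99, 0.94, [0.9, 0.95, 1.0, 1.0, 0.85]))
```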
At step 907 a data domain in the one or more data domains is determined which corresponds to the data object based at least in part on the probability of the data object belonging to each of the one or more data domains. This step can involve simply selecting the data domain with the highest associated probability out of all of the data domains as corresponding to the data object. This step can also include verifying that the probability associated with the highest ranking domain exceeds a minimum threshold. The minimum threshold can be set by an analyst and can be used to ensure that the data object is not linked to a data domain to which it has only a minimal probability of belonging. If the highest ranking domain does not exceed the minimum threshold, then a domain can be selected by the analyst.
Analyst intervention in the decision making process can also be requested when the number of potential data domains which have a probability of corresponding to the data object above a certain probability threshold exceeds a certain threshold. For example, if six different domains have a probability above 80%, then an analyst can make a final determination regarding which domain to select as corresponding to the data object.
When user/analyst input is required, this step can include outputting one or more probabilities associated with one or more of the data domains (for example, the top N domains), along with relevant information about the domains, and determining a domain corresponding to the data object based upon a user selection of one of the domains outputted.
A result of the process performed by the data profiling component is a probability of said input data object a being a member of said data domain d. In practice said input data object a is matched against a plurality of data domains D = {d_i}, i = 1, . . . , k, possibly resulting in a plurality of results P(D) = {p(a|d_j)}, j = 1, . . . , r, where p(a|d_j) is the probability of said input data object a being a member of data domain d_j.
The probabilistic method of data profiling disclosed herein can also be used to establish a metric of data quality. Consider a non-empty collection of a plurality of data objects X̂ = {x_i}, i = 1, . . . , m, which were determined to belong to data domain X. Application of said probabilistic method of data profiling to said plurality of data objects produces a plurality of probabilities {p_i}, i = 1, . . . , m, where p_i is the probability of data object x_i being a member of data domain X. As discussed below, the plurality of probabilities can then be used to compute a metric of data quality for the data domain.
At step 1101 a standard deviation of a plurality of probabilities of the plurality of data objects belonging to the at least one data domain in the one or more data domains is computed. The standard deviation, s_m, of the collection of probabilities p_i is given by the equation

s_m = √(Σ1≤i≤m (p_i − p̄)² / (m − 1))

where p̄ is the mean of the plurality of probabilities, p̄ = (1/m)·Σ1≤i≤m p_i.
At step 1102 a t value is computed based at least in part on the standard deviation and a mean probability of the plurality of probabilities. The t value is computed as
At step 1103 a degree of correlation between the plurality of data objects and the data domain is determined based at least in part on a t-distribution and the t value.
Referring to
Returning to
The degree of correlation between the plurality of data objects and the data domain can itself serve as the metric of data quality, in which case this step merely involves assigning the degree of correlation to serve as the metric of data quality. Using the above methods, a standard deviation of zero would indicate the highest possible metric of data quality.
Additionally, the metric of data quality can be represented by bands established by an analyst, with the bands expressing data quality in semantic terms. For example, when a degree of correlation for a plurality of data objects is below 0.9, then the data quality metric can be considered to be poor. When a degree of correlation for a plurality of data objects is above 0.90 but below 0.97, then the data quality metric can be considered to be medium. When a degree of correlation for a plurality of data objects exceeds 0.97 but is below 0.99, then the data quality metric can be considered to be good. When a degree of correlation for a plurality of data objects exceeds 0.99, then the data quality metric can be considered to be excellent.
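These bands translate directly into code; in the following sketch the band edges are taken from the text, while the handling of exact boundary values is an illustrative choice:

```python
def quality_band(degree_of_correlation: float) -> str:
    """Map a degree of correlation to the semantic bands described
    above (band edges as stated; boundary handling is illustrative)."""
    if degree_of_correlation < 0.90:
        return "poor"
    if degree_of_correlation < 0.97:
        return "medium"
    if degree_of_correlation <= 0.99:
        return "good"
    return "excellent"

print(quality_band(0.95))   # medium
print(quality_band(0.995))  # excellent
```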
Of course, other methods of data quality characterization can also be used based upon the degree of correlation. For example, the degree of correlation can be rescaled to some other interval such as [0,100] or a different number of ranges can be employed for a semantic data quality determination.
The data quality metric disclosed herein can be used to determine a similarity of a first plurality of data domains to a second plurality of data domains.
At step 1301 a first plurality of metrics of data quality are computed for the first plurality of data domains. At step 1302 a second plurality of metrics of data quality are computed for the second plurality of data domains. The metrics of data quality can be computed as described with respect to
At step 1303 a similarity is determined between the first plurality of data domains and the second plurality of data domains based at least in part on the first plurality of metrics of data quality and the second plurality of metrics of data quality. This step is explained in greater detail below.
Consider a plurality of data domains A = {d_i^A}, i = 1 . . . n, and a plurality of data domains B = {d_j^B}, j = 1 . . . m, and respective vector representations of their computed data quality metrics, P_A = (p_{d_1^A}, . . . , p_{d_n^A}) and P_B = (p_{d_1^B}, . . . , p_{d_m^B}). Because A and B may contain different data domains, third and fourth pluralities of data domains can be formed over the union of A and B, with a data quality metric of 0 assigned to any data domain absent from a collection, yielding data quality metrics vectors P_A* and P_B* of equal dimension.
Similarity between said third and fourth pluralities of data domains can be established by computing the cosine similarity between the third and fourth data domains quality metrics vectors P_A* and P_B*:

similarity(A, B) = (P_A* · P_B*) / (‖P_A*‖ · ‖P_B*‖)

where the numerator is the dot product (“inner product”) of said third and fourth data domains quality metrics vectors and the denominator is the product of their Euclidean lengths.
The similarity computation between third and fourth data domains quality metrics vectors also reflects similarity between the first and the second data domains quality metrics vectors which in turn establishes similarity between said pluralities of data domains A and B.
The first plurality of data domains can represent a predefined pattern such as a collection of data domains which comprise Personal Identification Information (PII). For such a predefined collection of data domains, the respective data domains quality metrics vector will be a unit vector with data quality metric 1 for each vector coordinate.
Computation of similarity between an arbitrary collection of data domains and a predefined collection of data domains can be illustrated by the following example. Consider a predefined collection of four data domains A, B, C and D and a database table in which columns containing values corresponding to data domains A, B and C have data quality metrics of 0.9, 0.95 and 0.99 respectively while data domain D is not present in said database table.
The inner product of the data quality metrics vectors is 0.9·1 + 0.95·1 + 0.99·1 + 0·1 = 2.84. The Euclidean length of said predefined collection's vector is √4 = 2, and the Euclidean length of the data quality metrics vector of said database data domains is √(0.9² + 0.95² + 0.99² + 0²) ≈ 1.64. The similarity between the two collections of data domains is therefore 2.84/(2 × 1.64) ≈ 0.87, which indicates that said collection of data domains in said database table is close to said predefined collection of data domains.
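The worked example can be checked with a short cosine similarity routine; the following sketch reproduces the 0.87 result:

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the
    Euclidean lengths of the two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Predefined collection (unit vector over domains A, B, C, D) vs. the
# database table from the example, with 0 for the absent domain D.
predefined = [1.0, 1.0, 1.0, 1.0]
observed = [0.9, 0.95, 0.99, 0.0]
print(round(cosine_similarity(predefined, observed), 2))  # ≈ 0.87
```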
Of course, other methods of computing similarity between collections of data domains can be used. For example, a Pearson-r correlation-based similarity metric can be utilized for this purpose.
Additionally, by comparing a computed quality metrics vector for a plurality of data domains with a quality metrics vector of a predefined collection of data domains, a similarity can be established between the plurality of data domains and a predefined collection of data domains, thus enabling discovery of sensitive information in data repositories. In this case, the sensitive information data domain can be used as the predefined collection of data domains and used to compute a similarity with data domains stored in data repositories.
The timely discovery of sensitive information in disparate data repositories enables prevention of inference attacks in which an adversary capable of combining information from low sensitivity sources can reconstruct information of much higher sensitivity than the original information sensitivity. For example, by combining information from three databases each containing a fraction of personal information such as user full name, US Social Security number and a credit card number an adversary can impersonate the actual user while each of the above items taken separately is not sufficient for a successful impersonation attack.
Additionally, the method described above for determining data quality metrics and similarity between pluralities of data domains can operate on probabilities produced by methods other than the specific probabilistic profiling methods disclosed herein. All that is required is a plurality of probabilities corresponding to the plurality of data domains.
One or more of the above-described techniques can be implemented in or involve one or more computer systems.
With reference to
A computing environment can have additional features. For example, the computing environment 1300 includes storage 1340, one or more input devices 1350, one or more output devices 1360, and one or more communication connections 1390. An interconnection mechanism 1370, such as a bus, controller, or network interconnects the components of the computing environment 1300. Typically, operating system software or firmware (not shown) provides an operating environment for other software executing in the computing environment 1300, and coordinates activities of the components of the computing environment 1300.
The storage 1340 can be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, CD-RWs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment 1300. The storage 1340 can store instructions for the software 1380.
The input device(s) 1350 can be a touch input device such as a keyboard, mouse, pen, trackball, touch screen, or game controller, a voice input device, a scanning device, a digital camera, remote control, or another device that provides input to the computing environment 1300. The output device(s) 1360 can be a display, television, monitor, printer, speaker, or another device that provides output from the computing environment 1300.
The communication connection(s) 1390 enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video information, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
Implementations can be described in the context of computer-readable media. Computer-readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, within the computing environment 1300, computer-readable media include memory 1320, storage 1340, communication media, and combinations of any of the above.
Of course,
Having described and illustrated the principles of our invention with reference to the described embodiment, it will be recognized that the described embodiment can be modified in arrangement and detail without departing from such principles. Elements of the described embodiment shown in software can be implemented in hardware and vice versa.
In view of the many possible embodiments to which the principles of our invention can be applied, we claim as our invention all such embodiments as can come within the scope and spirit of the following claims and equivalents thereto.
This application is a continuation-in-part of application Ser. No. 15/591,661, filed May 10, 2017 and titled “METHOD, APPARATUS, AND COMPUTER-READABLE MEDIUM FOR AUTOMATED CONSTRUCTION OF DATA MASKS,” which is itself a continuation-in-part of application Ser. No. 15/161,586, filed May 23, 2016 and titled “METHOD, APPARATUS, AND COMPUTER-READABLE MEDIUM FOR MASKING DATA,” the disclosures of which are hereby incorporated by reference in their entirety.
| | Number | Date | Country |
|---|---|---|---|
| Parent | 15591661 | May 2017 | US |
| Child | 15645843 | | US |
| Parent | 15161586 | May 2016 | US |
| Child | 15591661 | | US |