Particular embodiments of the invention relate generally to the field of data, and more particularly to a metadata data catalog.
One way to classify data is through the use of metadata. Generally, metadata is used to describe digital data. Metadata may describe the contents and context of data files. In some instances, metadata may be described by a number of categories. Further, data may, in some instances, be stored on multiple physical devices. Metadata is useful in allowing a user to determine the characteristics of a digital data source and make decisions based on those characteristics.
In certain embodiments, a system maintains a plurality of metadata elements. Each metadata element indicates a current classification value for user data described by that metadata element. The system detects the occurrence of an event and automatically determines which of the metadata elements are affected by the event. For each metadata element affected by the event, the system automatically determines an updated classification value for the user data described by that metadata element and dynamically modifies the metadata element to indicate the updated classification value.
Certain embodiments of the present disclosure may provide one or more technical advantages. For example, a technical advantage of one embodiment includes classifying digital data. A technical advantage of an embodiment includes controlling access to digital files. A technical advantage of an embodiment includes changing the classification of data across multiple platforms. For example, a system may detect a trigger, such as a user-indicated event or a time-based event, and may update the classification for the affected data, which may span multiple platforms in certain embodiments.
Certain embodiments of the present disclosure include some, all, or none of the above advantages. One or more other technical advantages may be readily apparent to those skilled in the art from the figures, descriptions, and claims included herein.
For a more complete understanding of the present invention and its features and advantages, reference is now made to the following description, taken in conjunction with the accompanying drawings, in which:
A basic and pervasive problem facing businesses is that increasing volumes of data must be tracked and safeguarded according to increasingly complex combinations of legal, regulatory, and business requirements. Conventional methods for tracking and safeguarding data involve manually designating data as either confidential or non-confidential. The conventional methods may be prone to error and are no longer sufficient. For example, as requirements and status change, it may be difficult to identify and change all of the affected data protections using a manual process.
To meet current and future needs, a broader, flexible classification approach is called for. A flexible classification approach may classify data along multiple dimensions, acknowledge and provide for changes to classification based on time or other trigger events, and/or allow for protections to be automated and dynamic. For example, an aggregate risk may be determined dynamically as requirements and status change, and changes may be made to the protections that are commensurate with the aggregate risk. Embodiments of the present disclosure may provide a flexible classification approach, as further described with respect to
The classification value associated with user data may be changed in response to an event. In some embodiments, user 102, using device 104, may generate an event. For example, an authorized user 102 may input information into device 104 indicating that a financial report has been approved for publication. Device 104 may send the publication event to data classification service module 112 via network 110. Data classification service module 112 communicates the event to an appropriate metadata manager module 106 to update a logical data element 126 associated with the financial report. Thus, logical data element 126 reflects that, as a result of the publication event, the classification value of the financial report has changed from confidential to non-confidential. Metadata manager module 106 communicates the updated logical data element 126 to an appropriate business application module 130 where access control is checked and a corresponding physical data element 146 is updated. The physical data element 146 may exist on hardware (e.g., a source that stores the affected user data, such as the financial report) and may be managed according to access rules that define permissions for modifying the physical data element 146.
In some embodiments, user 102 includes clients, customers, employees, entities, or automated systems that can utilize system 100. As an example, an automated system may monitor or receive information from any suitable source and may generate an event based on the information. Examples of sources may include a person, one or more documents (such as a spreadsheet that contains data), the Internet (which may include articles and other information containing data), an open source intelligence report, a media outlet (such as a television station or a radio station that broadcasts information), a clock or calendar, any other suitable source of information, or any combination of the preceding. Certain users 102, such as employees or other persons, may interact with system 100 via device 104. Other users 102, such as automated systems, may run on device 104 (which may refer to any suitable computing resources). In general, device 104 sends event information to data classification service module 112 via network 110.
Network 110 facilitates communications between device 104, data classification service module 112, metadata manager 106, business application module 130, and/or any other suitable device. This disclosure contemplates any suitable network 110 operable to facilitate communication between the components of system 100. Network 110 may include any interconnecting system capable of transmitting audio, video, signals, data, messages, or any combination of the preceding. Network 110 may include all or a portion of a public switched telephone network (PSTN), a public or private data network, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a local, regional, or global communication or computer network, such as the Internet, a wireline or wireless network, an enterprise intranet, or any other suitable communication link, including combinations thereof, operable to facilitate communication between the components of system 100. This disclosure contemplates end networks having one or more of the described properties of network 110.
In some embodiments, device 104 may be representative of a personal computer, an electronic notebook, a cellular telephone, an electronic tablet device, a laptop, a personal digital assistant (PDA), or any other suitable device (wireless or otherwise, some of which can perform web browsing), component, or element capable of accessing one or more elements within system 100. Device 104 may optionally comprise any suitable interface for a user such as a video camera, a microphone, a keyboard, a set of buttons, a mouse, a touch-sensitive display, a touch-sensitive area, or any other appropriate equipment according to particular configurations and arrangements. In addition, device 104 may contain an element or set of elements designed specifically for communications involving system 100. Such elements may be fabricated or produced specifically for use in system 100. Although examples of device 104 could include end user devices in certain embodiments, device 104 need not be limited to end user devices. For example, for embodiments in which an automated system acts as a user 102, the device 104 that runs the automated system may be a server or an enterprise-level computing system.
In some embodiments, device 104 may include a graphical user interface (GUI) 105. GUI 105 is generally operable to tailor and filter data entered by and presented to user 102. GUI 105 may provide user 102 with an efficient and user-friendly presentation of information and allow user 102 to input an event. GUI 105 may comprise a plurality of displays having interactive fields, pull-down lists, and buttons operated by user 102. GUI 105 may include multiple levels of abstraction including groupings and boundaries. It should be understood that the term GUI 105 may be used in the singular or in the plural to describe one or more GUIs 105 and each of the displays of a particular GUI 105.
In some embodiments, system 100 may include one or more data classification service modules 112. In general, data classification service module 112 detects the occurrence of an event associated with a metadata element, applies classification rules 122 to automatically determine an updated classification value for user data described by a metadata element, and communicates instructions to metadata manager module 106 via network 110. Specific components of data classification service module 112 are described in more detail in
In some embodiments, data classification service module 112 may receive an event from device 104. An event may include anything that could change the classification of user data. For example, an event could indicate that user data is no longer confidential (e.g., a report that was classified as confidential before being filed with the Securities and Exchange Commission (SEC) becomes public information after being filed with the SEC and thus may be reclassified as non-confidential). An event could also indicate that a party has transacted N payment card transactions and, as a result, the party should be reclassified from a current payment card industry (PCI) compliance level X to a new PCI compliance level Y. As another example, an event may be a time-based event (e.g., one that occurs after the expiry of a timer or at a pre-defined date or time). As a further example, an event may be initiated by a user. For example, user 102 could communicate an event to device 104 via GUI 105. In some embodiments, device 104 communicates the event to data classification service module 112 via network 110.
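By way of illustration only, and not limitation, the distinction between user-initiated and time-based events might be sketched as follows in Python. The `Event` record, its field names, and the `is_due` helper are hypothetical and are introduced solely for this example; the disclosure does not prescribe any particular event format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; all field names are illustrative assumptions."""
    kind: str                              # e.g. "publication" or "timer_expired"
    subject: str                           # identifier of the user data concerned
    source: str                            # who or what generated the event
    occurs_at: Optional[datetime] = None   # set only for time-based events


def is_due(event: Event, now: datetime) -> bool:
    """Return True if the event should be processed at time `now`.

    Events without a scheduled time (e.g. user-initiated events) are due
    immediately; time-based events are due once their time has been reached.
    """
    return event.occurs_at is None or now >= event.occurs_at
```

In such a sketch, a device could forward only those events for which `is_due` returns true, although other arrangements are equally possible.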
In the illustrated embodiment, data classification service module 112 contains classification rules 122. Data classification service module 112 may utilize classification rules 122 to automatically determine which, if any, metadata elements are affected by the event. The basic principle of data classification is that data classification is based on business and regulatory requirements. Data classification rules are expressed in business terms. Each classification rule may be defined in a table and linked to one or more logical data elements 126 or a group of logical data elements 126. In general, data classification service module 112 applies classification rules 122 to an event to determine which, if any, metadata elements require an updated classification value. For example, a classification value could indicate whether user data located in a physical data element is confidential. As a further example, a classification value could include a PCI compliance level for certain user data associated with a financial account.
Data classification service module 112, through application of classification rules 122, may determine that user data that is categorized as confidential may no longer need to be categorized as confidential, or vice versa. If data classification service module 112 determines that a classification value associated with user data needs to be updated, data classification service module 112 may communicate instructions for updating the classification value to one or more metadata manager modules 106 via network 110, each metadata manager module 106 associated with a logical data element 126 that corresponds to the affected user data.
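As a non-limiting sketch, classification rules defined in a table and linked to logical data elements might be applied to an event as shown below. The `ClassificationRule` structure, the `apply_rules` function, and the identifiers used are assumptions made for illustration and do not represent the only way such rules could be expressed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ClassificationRule:
    """Hypothetical rule row: an event kind, the linked elements, and a new value."""
    event_kind: str
    element_ids: Tuple[str, ...]
    new_value: str


def apply_rules(
    rules: Iterable[ClassificationRule],
    event_kind: str,
    current: Dict[str, str],
) -> Dict[str, str]:
    """Return {element_id: updated_value} for elements affected by the event.

    An element is treated as affected only if a rule matching the event is
    linked to it and the rule's value differs from the element's current value.
    """
    updates: Dict[str, str] = {}
    for rule in rules:
        if rule.event_kind != event_kind:
            continue
        for element_id in rule.element_ids:
            if element_id in current and current[element_id] != rule.new_value:
                updates[element_id] = rule.new_value
    return updates
```

The returned mapping corresponds loosely to the instructions that may be communicated to one or more metadata manager modules; an empty mapping indicates that no metadata element is affected.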
System 100 may also include metadata manager module 106. Metadata manager module 106 facilitates dynamically modifying a metadata element to indicate an updated classification value. In general, metadata manager module 106 receives instructions for updating a classification value, applies standardization rules 124 and transformation rules 140 to a logical data element 126, and communicates the logical data element 126 to one or more business application modules 130 via network 110. Specific components of metadata manager module 106 are described in more detail in
In the illustrated embodiment, metadata manager module 106 is communicatively coupled to metadata manager database 108. Once metadata manager module 106 receives instructions from data classification service module 112, metadata manager module 106 may request one or more logical data elements 126 associated with the instructions from metadata manager database 108. Metadata manager database 108 may provide the requested logical data elements 126 to metadata manager module 106 via network 110.
In general, metadata manager database 108 includes logical data elements 126 and/or other suitable data. Metadata manager database 108 may refer to any suitable device capable of storing and facilitating retrieval of data and/or instructions. Examples of metadata manager database 108 include computer memory (for example, Random Access Memory (RAM) or Read Only Memory (ROM)), mass storage media (for example, a hard disk), removable storage media (for example, a Compact Disk (CD) or a Digital Video Disk (DVD)), database and/or network storage (for example, a server), and/or any other volatile or non-volatile, non-transitory computer-readable memory devices that store one or more files, lists, tables, or other arrangements of information. Although
Metadata manager module 106 includes standardization rules 124. Standardization rules 124 generally refer to logic, rules, algorithms, code, tables, and/or other suitable instructions embodied in a computer-readable storage medium for performing the described functions and operations of metadata manager module 106. For example, standardization rules 124 facilitate transforming instructions received from data classification service module 112 via network 110 into a common data format associated with metadata manager module 106. Each metadata manager module 106 may contain the same or different standardization rules 124. In an embodiment, metadata manager module 106 and/or an associated business application module 130 may work with data in a particular format. In this example, standardization rules 124 transform instructions provided by data classification service module 112 into the suitable format. While illustrated as including a particular module, standardization rules 124 may include any suitable information for use in the operation of metadata manager module 106.
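For purposes of illustration only, the transformation of received instructions into a common data format might resemble the following sketch. The key aliases, the target field names `element_id` and `value`, and the normalization of strings to trimmed lowercase are all assumptions chosen for the example; an actual embodiment may use any suitable format.

```python
from typing import Any, Dict

# Hypothetical aliases mapping sender-specific keys to the assumed common keys.
KEY_ALIASES = {
    "elementId": "element_id",
    "element": "element_id",
    "classification": "value",
    "newValue": "value",
}


def standardize(instruction: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize a raw instruction into the assumed common format.

    Keys are renamed through KEY_ALIASES (unknown keys pass through unchanged),
    and string values are trimmed and lowercased.
    """
    standardized: Dict[str, Any] = {}
    for key, value in instruction.items():
        name = KEY_ALIASES.get(key, key)
        standardized[name] = value.strip().lower() if isinstance(value, str) else value
    return standardized
```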
In the illustrated embodiment, metadata manager module 106 includes transformation rules 140. Transformation rules 140 generally refer to logic, rules, algorithms, code, tables, and/or other suitable instructions embodied in a computer-readable storage medium for performing the described functions and operations of metadata manager module 106. For example, metadata manager module 106 may apply transformation rules 140 to the received logical data element 126 to update its classification value. For example, transformation rules 140 could change a logical data element's classification from confidential to public or vice versa. In an embodiment, system 100 may apply transformation rules 140 to a plurality of logical data elements 126. In an example, the same or different transformation rules 140 may be applied to each logical data element 126. Each logical data element 126 may be associated with one or more physical data elements 146. In an embodiment, metadata manager module 106 maps the updated logical data element 126 to one or more physical data elements 146 and communicates the updated logical data element 126 to one or more business application modules 130 associated with the one or more physical data elements 146 via network 110.
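As one hedged example, updating a logical data element and mapping it to its associated physical data elements might be sketched as follows. The `LogicalDataElement` type, the `transform` and `map_to_physical` functions, and the physical identifiers are hypothetical names used only to make the relationship concrete.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class LogicalDataElement:
    """Hypothetical logical data element carrying a classification value."""
    element_id: str
    classification: str


def transform(element: LogicalDataElement, new_value: str) -> LogicalDataElement:
    """Return a copy of the element with its classification value updated."""
    return replace(element, classification=new_value)


def map_to_physical(
    element: LogicalDataElement,
    mapping: Dict[str, List[str]],
) -> List[str]:
    """Return identifiers of the physical data elements linked to the element."""
    return list(mapping.get(element.element_id, []))
```

Because one logical data element may correspond to several physical data elements, a single call to `transform` in this sketch can lead to updates on multiple platforms.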
In some embodiments, system 100 may include one or more business application modules 130. In general, business application module 130 receives an updated logical data element classification associated with a physical data element 146, determines whether the source of the event has permission to modify the classification value of physical data element 146, and communicates instructions to change the classification value of physical data element 146 located in application data store 144.
Application data store 144 may refer to any suitable device capable of storing and facilitating retrieval of data and/or instructions. In general, application data store 144 includes physical data elements 146, user data, and/or any other suitable data. Examples of application data store 144 include computer memory (for example, RAM or ROM), mass storage media (for example, a hard disk), removable storage media (for example, a CD or DVD), database and/or network storage (for example, a server), and/or any other volatile or non-volatile, non-transitory computer-readable memory devices that store one or more files, lists, tables, or other arrangements of information. Although
Business application module 130 receives an updated logical data element 126 from metadata manager module 106 via network 110. In an embodiment, business application module 130 applies access rules 110 to the received information. Generally, access rules 110 determine whether the source of the event has permission to modify the physical data element 146. For example, some user data may be classified in a way that only certain sources may update the corresponding physical data element. For example, a junior level employee may not have permission to make an SEC report public, but a senior level employee may have permission to make the SEC report public. In this example, business application module 130 may not update a corresponding physical data element 146 if the junior level employee attempts to make the SEC report public. However, if the senior level employee attempts to make the SEC report public, business application module 130 will utilize information received from metadata manager module 106 to update the physical data element 146 associated with the SEC report.
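The junior-level and senior-level employee example above might be sketched, without limitation, as a simple permission lookup. The role names, the `may_modify` function, and the structure of the rules table are assumptions for illustration; the disclosure does not limit access rules to role-based checks.

```python
from typing import Dict, Set


def may_modify(
    source_role: str,
    physical_element_id: str,
    access_rules: Dict[str, Set[str]],
) -> bool:
    """Return True if the event source is permitted to modify the element.

    `access_rules` maps each physical element identifier to the set of roles
    allowed to modify it; elements with no entry are treated as not modifiable.
    """
    return source_role in access_rules.get(physical_element_id, set())


# Hypothetical rules: only a senior-level employee may reclassify the SEC report.
EXAMPLE_ACCESS_RULES = {"sec_report": {"senior_employee"}}
```

Treating an element with no entry as not modifiable is a conservative default chosen for this sketch; other embodiments could adopt a different default.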
In the illustrated embodiment, module 200 includes interface 202, processor 204, memory 206, input device 212, and output device 214. Memory 206 may refer to any suitable device capable of storing and facilitating retrieval of data and/or instructions. Examples of memory 206 include computer memory (for example, RAM or ROM), mass storage media (for example, a hard disk), removable storage media (for example, a CD or DVD), database and/or network storage (for example, a server), and/or any other volatile or non-volatile, non-transitory computer-readable memory devices that store one or more files, lists, tables, or other arrangements of information. Although
Memory 206 is generally operable to store rules 208 and data elements 210. Rules 208 generally refer to logic, rules, algorithms, code, tables, and/or other suitable instructions embodied in a computer-readable storage medium for performing the described functions and operations of module 200. For example, rules 208 may be representative of classification rules 122, standardization rules 124, transformation rules 140, and/or access rules 110. While illustrated as including a particular module, rules 208 may include any suitable information for use in the operation of module 200.
Memory 206 may also store data elements 210. Data elements 210 generally refer to logic, rules, algorithms, code, tables, and/or other suitable instructions embodied in a computer-readable storage medium for performing the described functions and operations of module 200. For example, data elements 210 could include logical data elements, physical data elements, user data, any other suitable data, or any combination of the preceding. While illustrated as including a particular module, data elements 210 may include any suitable information for use in the operation of module 200.
Memory 206 communicatively couples to processor 204. Processor 204 is generally operable to execute rules 208 stored in memory 206. Processor 204 may comprise any suitable combination of hardware and software implemented in one or more modules to execute instructions and manipulate data to perform the described functions for module 200. In some embodiments, processor 204 may include, for example, one or more computers, one or more central processing units (CPUs), one or more microprocessors, one or more applications, and/or other logic.
In some embodiments, interface 202 is communicatively coupled to processor 204 and may refer to any suitable device operable to receive input for module 200, send output from module 200, perform suitable processing of the input or output or both, communicate to other devices, or any combination of the preceding. Interface 202 may include appropriate hardware (e.g., modem, network interface card, etc.) and software, including protocol conversion and data processing capabilities, to communicate through network 110 or other communication system that allows module 200 to communicate to other devices. Interface 202 may include any suitable software operable to access data from various devices such as device 104, data classification service module 112, business application module 130, metadata manager module 106, and/or any other suitable data source. Interface 202 may also include any suitable software operable to transmit data to various devices such as user 102, device 104, data classification service module 112, business application module 130, metadata manager module 106, and/or any other suitable device. Interface 202 may include one or more ports, conversion software, or both.
In some embodiments, input device 212 may refer to any suitable device operable to input, select, and/or manipulate various data and information. Input device 212 may include, for example, a keyboard, mouse, graphics tablet, joystick, light pen, microphone, scanner, or other suitable input device. Output device 214 may refer to any suitable device operable for displaying information to a user. Output device 214 may include, for example, a video display, a printer, a plotter, or other suitable output device.
Modifications, additions, or omissions may be made to module 200 without departing from the scope of the invention. For example, module 200 may include any number of processors 204, memories 206, interfaces 202, input devices 212, and/or output devices 214. Furthermore, the components of module 200 may be integrated or separated. For example, in particular implementations, memory 206 may be integrated as a single component with metadata manager database 108 or application data stores 144.
At step 306, data classification service module 112 determines an updated classification value for the affected metadata element as discussed previously. Data classification service module 112 may apply classification rules 122 to make this determination as discussed. Data classification service module 112 may communicate the updated classification value to metadata manager module 106 and/or business application module 130 via network 110.
At step 308, system 100 dynamically modifies metadata elements associated with the updated classification value. This step may be completed by metadata manager module 106 and/or business application module 130. This step is discussed in more detail in the disclosure relating to
Modifications, additions, or omissions may be made to the method depicted in
At step 404, metadata manager module 106 updates the identified logical data element 126's classification value. As discussed previously, metadata manager module 106 applies standardization rules 124 and transformation rules 140 to update the identified logical data element 126's classification value. The method proceeds to step 406 where metadata manager module 106 maps the logical data element 126 to each associated physical data element 146 and communicates the logical data element 126 to the physical data element 146's corresponding business application module 130.
Business application module 130 determines whether the source of the event has permission to modify the associated physical data element 146 at step 408. As discussed previously, physical data elements 146 may be classified in a way where only certain sources or users may modify the physical data elements 146. If business application module 130 determines that the source does not have permission, the method proceeds to step 412 where it is terminated. If, however, business application module 130 determines that the source does have permission, then the method proceeds to step 410 where business application module 130 updates the classification value of physical data element 146. After the physical data element classification value is updated, the method proceeds to step 412 where the method is terminated.
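Steps 404 through 412 described above might be sketched, for illustration only, as a single function operating on in-memory dictionaries. The function name `run_update`, its parameters, and the use of dictionaries in place of metadata manager database 108 and application data store 144 are simplifying assumptions; in an actual embodiment these steps may be distributed across separate modules communicating over a network.

```python
from typing import Dict, List, Set


def run_update(
    element_id: str,
    new_value: str,
    source: str,
    logical: Dict[str, str],
    physical_map: Dict[str, List[str]],
    physical: Dict[str, str],
    permitted: Dict[str, Set[str]],
) -> List[str]:
    """Apply one classification update and return the physical elements changed."""
    # Step 404: update the logical data element's classification value.
    logical[element_id] = new_value

    updated: List[str] = []
    # Step 406: map the logical element to each associated physical element.
    for physical_id in physical_map.get(element_id, []):
        # Step 408: check whether the event source may modify this element.
        if source not in permitted.get(physical_id, set()):
            continue  # Step 412: end for this element without updating it.
        # Step 410: update the physical element's classification value.
        physical[physical_id] = new_value
        updated.append(physical_id)
    # Step 412: the method terminates.
    return updated
```

In this sketch, as in the described method, the logical data element is updated before the permission check, so a source lacking permission leaves the physical data element unchanged even though the logical data element reflects the new value.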
Modifications, additions, or omissions may be made to the method depicted in
Although the present disclosure has been described with several embodiments, a myriad of changes, variations, alterations, transformations, and modifications may be suggested to one skilled in the art, and it is intended that the present invention encompass such changes, variations, alterations, transformations, and modifications as fall within the scope of the appended claims.