Short-circuit data access

Information

  • Patent Grant
  • 11157641
  • Patent Number
    11,157,641
  • Date Filed
    Friday, July 1, 2016
  • Date Issued
    Tuesday, October 26, 2021
Abstract
A policy system enforces data security policies for requests, received from a client device, to access data stored on a distributed data storage system. The policy system can determine user credentials from the requests. The policy system then determines whether the user credentials allow the request to retrieve the data and, if so, whether the user credentials allow the request to retrieve the data without obligations. Upon determining that the user credentials allow the request to retrieve the data without obligations, the policy system directs the client device to communicate directly with a name node of the data storage system, short-circuiting additional data retrieval and filtering by the policy system.
Description
BACKGROUND

In a distributed data storage system, data can be stored on multiple file systems. Various applications from business users, data scientists, analysts or developers can access the data. Each application can correspond to a specific user or a specific group of users. Each user or group of users can have particular access privileges to certain portions of the data. Various data security policies can specify which user can access which portions of data. A client device can submit a request to access data stored in the distributed data storage system. The request can include user credentials. A security system can determine whether to grant the request to access data based on the user credentials.


SUMMARY

In general, this specification describes a policy system for enforcing one or more data access policies.


A policy system performs a method that includes receiving, from a client device, a request to access data from a distributed database system. The distributed database system includes a name node and one or more data nodes. The request includes user credentials. The method includes determining, according to one or more data access policies and the user credentials, whether the request is to be denied, to be allowed with obligations, or to be allowed without obligations. The method includes, in response to different results of the determining, performing different actions. The actions include, upon determining that the request is to be denied, notifying the client device of request denial; upon determining that the request is to be allowed with obligations, forwarding the request to the name node; upon determining that the request is to be allowed without obligations, instructing the client device to submit the request directly to a short-circuit handler executing on the name node of the distributed database system that stores the data. The short-circuit handler is programmed to process the request and to handle cases where no data redaction is required.


Particular embodiments of the subject matter described in this specification can be implemented to realize one or more advantages. For example, compared to conventional technology, in which a policy system retrieves and processes data from a data storage system and then forwards the retrieved data to a client device, the techniques described in this specification can bypass certain stages of communication between a policy system and a data storage system upon determining that certain conditions are satisfied. Bypassing the communication can avoid unnecessary network bandwidth consumption, increase throughput, increase reliability, and, accordingly, enhance performance of the distributed data storage system.


The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other aspects, features, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1A is a block diagram illustrating conventional techniques of proxy-managed data access.



FIG. 1B is a block diagram illustrating a system implementing short-circuit data access techniques.



FIG. 2 is a block diagram illustrating an example proxy-managed data storage system implementing short-circuit data access techniques.



FIG. 3 is a flowchart illustrating an example method of short-circuit data access.



FIG. 4 is a flowchart illustrating an example method of short-circuit data access.





Like reference numbers and designations in the various drawings indicate like elements.


DETAILED DESCRIPTION

Generally, a policy system can act like a proxy between the client device and the distributed data storage system. In response to a request from a client device to access data, the policy system enforces the data security policies. In enforcing the data security policies, the policy system can request data from the distributed data storage system on behalf of the client device, receive the data in response, and perform security actions on the retrieved data according to the user credentials and the policies. The policy system then provides the processed data to the client device. In this example scheme, data flows from the distributed data storage system to the policy system, and then from the policy system to the client device.


A policy system as described in this specification enforces data security policies for requests to access data stored on a distributed data storage system received from a client device. The policy enforcement system can determine user credentials associated with the requests. The policy system then determines whether the user credentials allow the request to access the data and if yes, whether the user credentials allow the request to access the data without obligations. Upon determining that user credentials allow the request to retrieve the data without obligations, the policy enforcement system directs the client device to communicate directly with a name node, short-circuiting additional data retrieval and filtering by the policy system.



FIG. 1A is a block diagram illustrating conventional techniques of proxy-managed data access. A user of a client device 12 executes an application program that requests to access data stored on a data storage system 14. Access to data storage system 14 is shared by multiple users and multiple client devices. A security proxy 16 manages access privileges of the users on the data stored on the data storage system 14.


The security proxy 16 receives a request from the client device 12. The request includes user credentials. The security proxy 16 determines if the user represented by the user credentials has privileges to access some or all of the requested data. If yes, the security proxy 16 fetches the requested data from the data storage system 14. The data storage system 14 provides the requested data to the security proxy 16.


If the security proxy 16 determines that the user has access to only a portion of the requested data, the security proxy 16 can process the retrieved data such that only that portion of the data is visible to the user. The security proxy 16 then sends the processed data to the client device 12 as a response to the request. In this scheme, data flows from the data storage system 14 to the security proxy 16, and then from the security proxy 16 to the client device 12. In this scheme, the security proxy 16 can be a bottleneck in flow of data. Having all the requests flowing through the security proxy 16 may unnecessarily consume bandwidth.



FIG. 1B is a block diagram illustrating a system implementing short-circuit data access techniques. A policy system 102 acts as a security proxy between a data storage system 104 and one or more client devices 106, and manages access privileges on data stored on the data storage system 104. A user of a client device 106 executes an application program that requests access to data stored on data storage system 104. Access to data storage system 104 is shared by multiple users and multiple client devices.


The policy system 102 receives a request from the client device 106. The request includes user credentials. The policy system 102 determines whether or not the user represented by the user credentials has privileges to access the data. In addition, assuming the user has privilege to access the data, the policy system 102 determines whether the privilege is with or without obligations, e.g., whether the user has privilege to access only a portion of the requested data or all of the requested data. In this specification, the term “obligation” generally refers to conditions or limitations on a privilege or on accessing a data item. For example, a user U can have privileges to access a database table T. An obligation can specify that under condition A, user U can only access data column X of database table T. The condition can be based on, for example, time, location, data type, or a calculated value.
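The notion of an obligation as a condition attached to a privilege can be illustrated with a minimal sketch. The `Obligation` class, the `columns_visible_to_user` function, and the time-of-day condition are hypothetical names and choices made for illustration; the specification does not prescribe any particular representation.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable


@dataclass(frozen=True)
class Obligation:
    """Hypothetical obligation: when `condition` holds, only some columns are visible."""
    condition: Callable[[dict], bool]   # evaluated against the request context
    visible_columns: frozenset


def columns_visible_to_user(all_columns, obligations, context):
    """Intersect the column set with every obligation whose condition applies."""
    visible = set(all_columns)
    for obligation in obligations:
        if obligation.condition(context):
            visible &= obligation.visible_columns
    return visible


def outside_business_hours(context):
    # Assumed "condition A": the request arrives outside 09:00-17:00.
    return not (time(9, 0) <= context["time"] < time(17, 0))


# Under condition A, user U can only access column X of table T.
only_x_after_hours = Obligation(outside_business_hours, frozenset({"X"}))

print(columns_visible_to_user({"X", "Y"}, [only_x_after_hours], {"time": time(22, 0)}))
print(columns_visible_to_user({"X", "Y"}, [only_x_after_hours], {"time": time(10, 0)}))
```

When no obligation applies, the full column set is returned, which corresponds to access without obligations.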


Upon determining that the user does not have access privilege, the policy system 102 can notify the client device 106 that access has been denied. Upon determining that the user has access privilege with obligations, the policy system 102 fetches the data from the data storage system 104, processes the data as appropriate for any restrictions imposed by the obligations, and provides the processed data to the client device 106 as a response.


Upon determining that the user has access privilege without obligations, the policy system 102 directs the client device 106 to communicate with the data storage system 104 directly. The client device 106 then fetches the data from the data storage system 104. In this scenario, data flows from the data storage system 104 to the client device 106 directly, “short-circuiting” the policy system 102 once the policy system 102 has made the determination. Details on short-circuit data access techniques are described below in reference to FIG. 2.



FIG. 2 is a block diagram illustrating an example proxy-managed data storage system 200 implementing short-circuit data access techniques. The proxy-managed data storage system 200 includes a policy system 102 and a data storage system 104. The policy system 102 serves as a security proxy, and acts like an intermediary between the data storage system 104 and one or more client devices 106. The policy system 102 provides access control to the data storage system 104 by implementing one or more data access policies, or simply referred to as policies. Each policy specifies whether a user of a client device 106 is allowed to access data stored in data storage system 104 and, if yes, whether the access shall be subject to obligations by the policy system 102. Data access is subject to obligations if, due to limited access privileges of the user, the data being accessed needs to be redacted (e.g., filtered or modified) before being returned to the user of a client device 106.


The data storage system 104 can be a distributed storage system having a master/slave architecture. For example, the data storage system 104 can be a Hadoop Distributed File System (HDFS). The data storage system 104 can include multiple nodes. Each node can include a processor associated with one or more storage devices. In various implementations, each node can correspond to a standalone computer, or to a virtual machine. In the example shown, the data storage system 104 includes a name node 208 and data nodes 210A, 210B and 210C. The name node 208 can be a master node and the data nodes 210A, 210B and 210C can be slave nodes.


The name node 208 stores metadata on mapping between data and data nodes 210A, 210B and 210C. Each of data node 210A, 210B and 210C can store a respective portion of the data. Therefore, when the name node 208 receives a request for a particular portion of data, the name node 208 can respond with a list of identifiers or addresses, e.g., Internet Protocol (IP) addresses, of data nodes that store the particular portion of data based on the mapping.
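The lookup performed by the name node can be sketched as follows. This is only an illustration: the `NameNode` class, the `block_map` layout, and the addresses are assumptions, not the interface of HDFS or of any real name node.

```python
from dataclasses import dataclass, field


@dataclass
class NameNode:
    """Holds only metadata: which data nodes store which portion of the data."""
    # Hypothetical layout: data identifier -> addresses of data nodes holding it.
    block_map: dict[str, list[str]] = field(default_factory=dict)

    def locate(self, data_id: str) -> list[str]:
        """Return the addresses of the data nodes that store `data_id`."""
        return list(self.block_map.get(data_id, []))


name_node = NameNode(block_map={
    "table_T": ["10.0.0.11", "10.0.0.12"],  # e.g., data nodes 210A and 210B
    "file_F": ["10.0.0.13"],                # e.g., data node 210C
})

print(name_node.locate("table_T"))  # ['10.0.0.11', '10.0.0.12']
```

The name node returns locations rather than the data itself, so a caller still has to contact the listed data nodes to read the content.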


In the example shown, a client device 106 executing an application that seeks to access data stored on the data storage system can communicate with the policy system 102. The client device 106 can include a memory, e.g., a random access memory (RAM), for storing instructions and data and a processor for executing stored instructions. The memory can include both read only and writable memory. For example, the client device 106 can be a computer coupled to the policy system 102 through a data communication network, e.g., local area network (LAN) or wide area network (WAN), e.g., the Internet, or a combination of networks. Communication over the network can occur using TCP/IP protocols.


In the communication between the client device and the policy system 102, the client device 106 submits a request 232 for accessing data to the policy system 102. The request 232 includes user credentials and a data identifier. The user credentials (e.g., a user identifier and associated proof of identity) identify a user. The data identifier (e.g., a table name or a file name) identifies data for access. The request 232 also specifies an access type, e.g., whether the access is a read, write, or manage. Each user can correspond to one or more client devices 106. Each client device 106 can correspond to one or more users. For example, different users can log in on the same client device under different user accounts, and each user can log in on different client devices.


The policy system 102 can include one or more computers configured to implement policies for accessing data stored in the data storage system 104. In particular, the policy system 102 includes a policy enforcement point 216 and a policy decision point 218. The policy enforcement point 216 and policy decision point 218 are components of the policy system 102 that are configured to perform a security check on the request 232.


The policy enforcement point 216 receives the request 232, and determines from the request 232 the user credentials, identifier of the data requested, and access type. The policy enforcement point 216 submits a policy check request 235 to the policy decision point 218. The policy check request 235 includes the user credentials, the data identifier, and access type.
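One way to picture the two messages is as plain records. The sketch below is an assumption about shape only; the class and field names (`AccessRequest`, `PolicyCheckRequest`, `proof_of_identity`, and so on) are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from enum import Enum


class AccessType(Enum):
    READ = "read"
    WRITE = "write"
    MANAGE = "manage"


@dataclass(frozen=True)
class Credentials:
    user_id: str
    proof_of_identity: str   # e.g., a token; the format is an assumption


@dataclass(frozen=True)
class AccessRequest:
    """Plays the role of request 232 sent by the client device."""
    credentials: Credentials
    data_id: str             # e.g., a table name or a file name
    access_type: AccessType


@dataclass(frozen=True)
class PolicyCheckRequest:
    """Plays the role of policy check request 235 sent to the decision point."""
    credentials: Credentials
    data_id: str
    access_type: AccessType


def to_policy_check(request: AccessRequest) -> PolicyCheckRequest:
    """Extract the three fields the policy decision point needs."""
    return PolicyCheckRequest(request.credentials, request.data_id, request.access_type)


request_232 = AccessRequest(Credentials("U3", "token-abc"), "T", AccessType.READ)
check_235 = to_policy_check(request_232)
print(check_235.credentials.user_id, check_235.data_id, check_235.access_type.value)
```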


The policy decision point 218 of the policy system 102 determines, based on one or more policies, whether the user identified in the user credentials has privileges to access the identified data using the access type. The policy decision point 218 can receive the policies from a policy administration point programmed to receive the policies from an administrator process, or to generate the policies based on specifications of the policy system 102.


The policies restrict access based on a mapping between a user, particular data, and an access type. A policy can be a three-dimensional mapping between the user, the data, and the access type, specifying whether a user U has rights to access a database table T (or a file F) using access type A. In addition, the policy can specify whether the privileges to access certain data is with obligations or without obligations. If a policy specifies that the user U has privilege to access certain data without obligations, the policy system 102 can grant the user U access to the requested data in the entirety. If a policy specifies that the user U has privilege to access the requested data with obligations, the policy system 102 can grant the user U access to a portion of the requested data by redacting other portions of the requested data.


The example below illustrates the relationship between user, data, and access type as specified in a policy. Data stored in the data storage system 104 includes a table T of rows and columns, and a file F. Table T is a data table in a relational database. File F is an unstructured file. Some columns in the table T may include data about people, e.g., names, dates of birth, phone numbers, credit card information, social security numbers, or other personal information. The rows can include data about the individuals, e.g. sorted by unique identifier. A policy P1 specifies that a user U1 has no privilege to access the table T, but read only privilege on file F without obligations. A policy P2 specifies that a user U2 has privileges to access table T with obligations, e.g., user U2 has privileges to read data of some, but not all, columns of the table T, and has no privileges to write to table T. Policy P2 can specify that user U2 has read only privileges on file F. A policy P3 specifies that a user U3 has privileges to access table T without obligations, e.g., user U3 can access all data in table T without restriction and without redaction, and has privileges to access file F.
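Policies P1, P2, and P3 can be written down as a lookup keyed on user, data, and access type. The sketch below is a simplified illustration, not the claimed implementation: the `Decision` enum, the `POLICIES` table, and the rule that a multi-item request receives the most restrictive decision are assumptions, as are the entries marked "assumed" where the example leaves a detail open.

```python
from enum import IntEnum


class Decision(IntEnum):
    # Ordered from most to least restrictive so that min() picks the strictest.
    DENY = 0
    ALLOW_WITH_OBLIGATIONS = 1
    ALLOW_WITHOUT_OBLIGATIONS = 2


# (user, data, access type) -> decision. A missing entry means "deny".
POLICIES = {
    ("U1", "F", "read"): Decision.ALLOW_WITHOUT_OBLIGATIONS,  # P1
    ("U2", "T", "read"): Decision.ALLOW_WITH_OBLIGATIONS,     # P2: some columns only
    ("U2", "F", "read"): Decision.ALLOW_WITHOUT_OBLIGATIONS,  # P2 (obligation status assumed)
    ("U3", "T", "read"): Decision.ALLOW_WITHOUT_OBLIGATIONS,  # P3
    ("U3", "F", "read"): Decision.ALLOW_WITHOUT_OBLIGATIONS,  # P3 (access type assumed)
}


def decide(user: str, data_ids: list[str], access_type: str) -> Decision:
    """Return the most restrictive decision across all requested data items."""
    return min(
        (POLICIES.get((user, d, access_type), Decision.DENY) for d in data_ids),
        default=Decision.DENY,
    )


print(decide("U3", ["T", "F"], "read").name)  # ALLOW_WITHOUT_OBLIGATIONS
print(decide("U2", ["T", "F"], "read").name)  # ALLOW_WITH_OBLIGATIONS
print(decide("U1", ["T"], "read").name)       # DENY
```

Defaulting to `DENY` for any combination not listed keeps the table small and fails closed, which is a design choice of this sketch rather than something the example mandates.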


Upon receiving the policy check request 235 including user credentials, identifiers of data, and access type from the policy enforcement point 216, the policy decision point 218 makes a decision 236 on the privilege request of the policy enforcement point. The decision 236 can include denying access, allowing access with obligations, and allowing access without obligations. For example, the policy decision point 218 can determine that the user identified in the credentials is user U3, that the requested data is table T and file F, and that the requested access type is read only. According to the policy P3, the policy decision point 218 can make a decision 236 that the user has privilege without obligation to access the requested data. The policy decision point 218 can provide the decision 236 to the policy enforcement point 216 as a response to the policy check request 235.


Upon receiving the decision 236 from the policy decision point 218, the policy enforcement point 216 can perform one of three actions. In a first scenario, the decision 236 indicates that data access shall be denied. This would occur if the policies indicate that the user does not have access privilege to the data. In response, the policy enforcement point 216 can notify the client device 106 that a user's request to access certain data (e.g., table T or file F) is denied.


In a second scenario, the decision 236 indicates that data access is allowed with obligations. In response, the decision can include user and data specific obligations. The obligations can specify a portion of the data that needs to be redacted before being presented to the user. In this case, the policy enforcement point 216 can enforce the decision. Enforcing the decision can include modifying a data query in the data request. Modifying the data query can include redacting or removing a column name in the query. For example, enforcing the decision can include changing a column name in a SQL query to blank or to a constant string. Enforcing the decision can include filtering data fetched from the data storage system 104, e.g., by deleting a file or by passing only a portion of the content of the file. Enforcing the decision can include masking data fetched from the data storage system 104, e.g., by replacing certain fetched data with string masks. For example, enforcing the decision can include masking at least a portion of a social security number with a string such that a masked social security number reads “XXX-XX-6789” rather than “123-45-6789.” Enforcing the decision can include encrypting data fetched from the data storage system 104. Enforcing the decision can include performing various combinations of all of the above according to the decision. The policy enforcement point 216 interacts with the name node 208 to obtain metadata, e.g., locations of the data to be retrieved. The policy enforcement point 216 then retrieves data from one or more of data nodes 210A, 210B and 210C. The policy enforcement point 216 passes redacted data to the client device 106. The client device 106 need not interact with the data storage system 104 directly.
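Two of the enforcement actions above, masking fetched data and replacing a column in a query with a constant string, can be sketched in a few lines. The function names, the regular expression, and the `REDACTED` placeholder are illustrative assumptions; a production system would rewrite queries with a real SQL parser rather than string formatting.

```python
import re

# Matches a social security number and captures its last four digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask_ssn(text: str) -> str:
    """Replace the first five digits of each SSN with X, keeping the last four."""
    return SSN_PATTERN.sub(r"XXX-XX-\1", text)


def redact_select(columns, restricted, table, placeholder="REDACTED"):
    """Build a SELECT in which restricted columns become a constant string."""
    select_list = [
        f"'{placeholder}' AS {col}" if col in restricted else col
        for col in columns
    ]
    return f"SELECT {', '.join(select_list)} FROM {table}"


print(mask_ssn("123-45-6789"))                        # XXX-XX-6789
print(redact_select(["name", "ssn"], {"ssn"}, "T"))   # SELECT name, 'REDACTED' AS ssn FROM T
```

Rewriting the query keeps restricted values from leaving the data storage system at all, whereas masking operates on data that has already been fetched.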


In the third scenario, the decision 236 from the policy decision point 218 can indicate that the user is allowed to access the requested data without obligations. Upon receiving this decision 236, a short-circuit module 220 of the policy enforcement point 216 determines that redaction of the data is unnecessary for the request 232 from the client device 106. The short-circuit module 220 then bypasses the data processing actions of the policy enforcement point 216. The short-circuit module 220 sends a response 234 to the client device 106. In the response 234, the short-circuit module 220 directs the client device 106 to communicate with the data storage system 104 directly. The response 234 can include an identifier or address of the name node 208 of the data storage system 104. The client device 106 then communicates with the name node 208 of the data storage system 104 directly, by sending a request 237 to the name node 208 of the data storage system 104. In some implementations, the request 237 can be the same as the request 232 previously sent to the policy system 102.
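The three scenarios can be summarized as a single dispatch at the policy enforcement point. The following is a minimal sketch under stated assumptions: the `Response` record, its `status` values, and the `fetch_and_redact` callback are hypothetical stand-ins for the denial notice, the redacted data path, and response 234.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    DENY = "deny"
    ALLOW_WITH_OBLIGATIONS = "allow_with_obligations"
    ALLOW_WITHOUT_OBLIGATIONS = "allow_without_obligations"


@dataclass
class Response:
    status: str                               # "denied", "data", or "redirect"
    payload: Optional[object] = None          # redacted data, when proxied
    name_node_address: Optional[str] = None   # set only for a short-circuit


def enforce(decision: Decision,
            fetch_and_redact: Callable[[], object],
            name_node_address: str) -> Response:
    """Pick one of the three actions based on the decision."""
    if decision is Decision.DENY:
        return Response("denied")
    if decision is Decision.ALLOW_WITH_OBLIGATIONS:
        # Proxy path: the policy system fetches and redacts the data itself.
        return Response("data", payload=fetch_and_redact())
    # Short-circuit path: no data passes through the policy system.
    return Response("redirect", name_node_address=name_node_address)


result = enforce(Decision.ALLOW_WITHOUT_OBLIGATIONS, lambda: ["row"], "namenode:8020")
print(result.status, result.name_node_address)  # redirect namenode:8020
```

Note that `fetch_and_redact` is never called on the short-circuit path, which is where the bandwidth saving comes from.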


The name node 208 can include a short-circuit handler 222. The short-circuit handler 222 is a component of the name node 208 configured to determine, based on a sender of the request 237, whether to perform a security check on the request. In some implementations, the short-circuit handler 222 is a plugin component of the name node 208. The short-circuit handler 222 first determines whether the request 237 is from a policy system 102 or from a client device 106. If the short-circuit handler 222 determines a request is from the policy system 102, the short-circuit handler 222 need not perform additional actions. The short-circuit handler 222 simply passes the request to other components of the name node 208 for processing.


If the short-circuit handler 222 determines that a request is from a client device 106, to ensure that the name node 208 prevents unauthorized users from accessing specific data, the short-circuit handler 222 performs a security check on the request. In this example, the short-circuit handler 222 of name node 208 determines that the request 237 is from the client device 106. In response, the short-circuit handler 222 performs a security check on the request 237. The short-circuit handler 222 can initiate communication with the policy system 102 from within the data storage system 104 to perform the security check.


The short-circuit handler 222 submits a security check request 238 to the policy decision point 218 to verify that the user having the credentials in the request 237 indeed has privilege to access the requested data without obligations. The policy decision point 218 can make a decision 240 on what access privilege the user has on the requested data, and provide the decision 240 to the short-circuit handler 222 responsive to the security check request 238. In this example, the policy decision point 218 confirms that user U3 has privileges to access the requested table T and file F without obligation.


Upon receiving the decision 240 from the policy decision point 218, the short-circuit handler 222 can determine an action based on the decision. Upon determining that the decision 240 indicates that the user has no access privilege on the requested data, or upon determining that the decision indicates that the user has access privilege with obligations, the short-circuit handler 222 can deny the request 237, and inform the client device 106 of the decision.


Upon determining that the decision 240 from the policy decision point 218 indicates that the user has access privilege on the requested data without obligations, the short-circuit handler 222 passes the request 237 from the client device 106 to other components of the name node 208, as if the request 237 is from the policy system 102. The data storage system 104 then provides the requested data in a response 242 to the client device 106 directly, bypassing the policy system 102.
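The behavior of the short-circuit handler can be sketched as follows. How the handler recognizes the sender is not detailed above, so the set of known policy system addresses is an assumption, as are the class name, the `verify` and `process` callbacks, and the use of `PermissionError` to signal a denial.

```python
from typing import Callable

ALLOW_WITHOUT_OBLIGATIONS = "allow_without_obligations"


class ShortCircuitHandler:
    """Hypothetical plugin that gates requests arriving at the name node."""

    def __init__(self,
                 policy_system_addresses: set[str],
                 verify: Callable[[dict], str],
                 process: Callable[[dict], object]) -> None:
        self.policy_system_addresses = policy_system_addresses
        self.verify = verify      # stands in for security check request 238
        self.process = process    # stands in for the rest of the name node

    def handle(self, request: dict, sender_address: str) -> object:
        # Requests from the policy system were already checked: pass through.
        if sender_address in self.policy_system_addresses:
            return self.process(request)
        # Requests from a client device are re-verified with the decision point.
        decision = self.verify(request)
        if decision != ALLOW_WITHOUT_OBLIGATIONS:
            # Both "deny" and "allow with obligations" are rejected here.
            raise PermissionError("request denied by short-circuit handler")
        return self.process(request)


handler = ShortCircuitHandler(
    policy_system_addresses={"10.0.0.2"},
    verify=lambda req: ALLOW_WITHOUT_OBLIGATIONS if req["user"] == "U3" else "deny",
    process=lambda req: f"locations for {req['data']}",
)
print(handler.handle({"user": "U3", "data": "T"}, sender_address="192.168.1.50"))
```

Rejecting "allow with obligations" at the name node matters because data served on this path is never redacted.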


In various implementations, the data storage system 104 can provide the data in various ways. For example, in some implementations, the short-circuit handler 222 can retrieve the requested data from a data node, and return the retrieved data to the client device 106. In other implementations, the short-circuit handler 222 can receive information on the data, including, for example, an identifier or address of a data node 210A storing the data, and provide the identifier or address to the client device 106 in the response 242. The client device 106 can then retrieve the data from the data node 210A directly. Data node 210A may implement another short-circuit handler that ensures a policy allows the client device 106 to directly access the data.


The example described above can have variations. For example, in some implementations, the short-circuit module 220 of the policy enforcement point 216 need not send a response 234 to the client device 106 to notify the client device 106 to resubmit a request to the data storage system 104. Instead, the short-circuit module 220 can redirect the original request 232 to the name node 208, along with a flag indicating that a security check has already been performed and along with an identifier or address of the client device 106. The short-circuit handler 222 can receive this request and associated flag and identifier. Upon determining that the request is associated with the flag, the short-circuit handler 222 can cause the name node 208 to retrieve data, and send the retrieved data directly to the client device 106 according to the identifier or address.
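This variation can be sketched with a redirected request that carries the flag and the client address. The names `RedirectedRequest`, `security_checked`, and `serve_redirected` are hypothetical, and rejecting unflagged requests is an assumption of the sketch; it also assumes the name node can already trust that the flag originated from the policy system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RedirectedRequest:
    data_id: str
    security_checked: bool   # flag set by the short-circuit module
    client_address: str      # where the retrieved data should be sent


def serve_redirected(request: RedirectedRequest,
                     read_data: Callable[[str], bytes],
                     send: Callable[[str, bytes], None]) -> None:
    """Serve a flagged request by sending data straight to the client."""
    if not request.security_checked:
        # Assumed behavior: unflagged requests are not served on this path.
        raise PermissionError("request lacks the security-check flag")
    send(request.client_address, read_data(request.data_id))


delivered = []
serve_redirected(
    RedirectedRequest("file_F", security_checked=True, client_address="192.168.1.50"),
    read_data=lambda data_id: f"contents of {data_id}".encode(),
    send=lambda address, data: delivered.append((address, data)),
)
print(delivered)  # [('192.168.1.50', b'contents of file_F')]
```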


In various implementations, the technology can be implemented on various proxy-protected resources that are different from data storage system 104. A proxy can short-circuit itself upon determining that a request from a requesting client for data does not require the proxy to perform redactions or other processing of the data. The proxy can instruct the client to access the resources directly. The resources can confirm that the request is legitimate by querying the proxy and then, upon confirmation, provide the data to the client directly.



FIG. 3 is a flowchart illustrating an example method 300 of short-circuit data access. For convenience, the method 300 will be described with respect to a policy system, e.g., the policy system 102 of FIG. 1B. The policy system includes one or more computing devices that execute software to implement the method 300.


The policy system receives (302), from a client device, a request to access data stored on a distributed file system. The request includes user credentials and information identifying the data to be accessed. The distributed file system includes a name node and one or more data nodes. In various implementations, the distributed file system can be a distributed relational database system, a distributed file system for storing unstructured files, or a combination of the above. The distributed file system can be an RDFS.


A policy decision point of the policy system determines (304) whether the request is to be denied, to be allowed with obligations, or to be allowed without obligations. The policy decision point makes the determination according to one or more data access policies, the user credentials, data requested, and requested access type. The policy decision point can receive the one or more policies from an administrator process. The obligations can be defined by a setting in the one or more data access policies. The setting can specify whether at least a portion of the data that the client requests to access shall be redacted.


The policy system performs various actions in response to results of the determination. Upon determining that the request is to be denied, the policy system notifies the client device of request denial. Upon determining that the request is to be allowed with obligations, the policy system requests data from the distributed file system. The policy system can retrieve the data and redact the retrieved data according to the obligations. For example, the policy system can filter, mask, or encrypt at least a portion of the retrieved data according to the setting. Filtering the data can include modifying a query to avoid retrieving certain portions of the data, preventing a portion of data retrieved from the distributed file system from being sent to the client device, or both. Masking the data can include replacing a portion of the data with certain masks, for example, replacing a number (e.g., a social security number) with a string (e.g., “XXX-XX-XXXX”).


Upon determining that the request is to be allowed without obligations, the policy system performs short-circuiting actions, including redirecting (306) the request to a short-circuit handler executing on the name node of the distributed file system. In some implementations, redirecting the request can include providing an identifier or address of the name node to the client device, and instructing the client device to submit the request to the name node rather than to the policy system. In some implementations, redirecting the request can include forwarding the request, as well as identifier or address of the client device, to the name node. In some implementations, the policy system can limit the short-circuit actions to certain access types, e.g., read only.


The short-circuit handler can perform (308) security checks upon receiving the request from the client device 106. The short-circuit handler verifies with the policy decision point that the request from the client device is, indeed, to be allowed without obligations. Upon verification, the short-circuit handler allows the client device to access the requested data. Performing the verification under various circumstances can include determining whether one or more requests received by the short-circuit handler are received from the client device or from the policy system. In some implementations, the short-circuit handler communicates with the policy decision point for verification only upon determining that the one or more requests are from the client device.


Upon determining that the one or more requests are from the client device, the short-circuit handler can initiate the communication with the policy decision point, and submit at least a portion of the request (e.g., the user credentials, identifiers or addresses of the data to be accessed, and the type of access) to the policy decision point. The policy decision point can respond by sending a decision indicating whether the access is denied, allowed with obligations, or without obligations, according to the policies. Upon receiving a decision that the access is denied or allowed with obligations, the short-circuit handler can determine that the request is illegitimate, and deny the request. Upon receiving a decision that the access is allowed without obligations, the short-circuit handler can permit the name node to process the request and provide the requested data to the client device.



FIG. 4 is a flowchart illustrating an example method 400 of short-circuit data access. Method 400 can be performed by a processor implementing a short circuit handler, e.g., the short circuit handler 222 of FIG. 2.


The short circuit handler executes on a name node of a distributed file system. The short circuit handler receives (402) a request to access data stored on the distributed file system. The request is associated with user credentials. The short circuit handler determines (404) whether the request received by the short circuit handler is received from a client device or from a policy system. The distributed file system can be an RDFS.


In response to determining that the request is from a client device, the short circuit handler communicates (406) with a policy decision point of the policy system for verification of the user credentials. The short circuit handler receives (408) a decision from the policy decision point. The decision can indicate whether the request is to be denied, to be allowed with obligations, or to be allowed without obligations. The obligations can be defined by a setting in the one or more data access policies. The setting can specify that at least a portion of the data to be accessed is to be redacted.


In response to the decision, the short circuit handler performs (410) various actions. The actions can include, upon determining that the request is to be denied or to be allowed with obligations, notifying the client device of a request denial. The actions can include, upon determining that the request is to be allowed without obligations, allowing the client device to access the requested data.
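The steps of method 400 can be sketched, by way of example only, as a single handler function. The `Request` fields, the `Sender` enumeration, and the `ask_pdp` callback are hypothetical names introduced for illustration. The sketch assumes that the sender of a request is already known to the handler and that requests from the policy system skip verification because the policy system has already evaluated them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Decision(Enum):
    DENY = 1
    ALLOW_WITH_OBLIGATIONS = 2
    ALLOW = 3


class Sender(Enum):
    CLIENT = "client"
    POLICY_SYSTEM = "policy_system"


@dataclass
class Request:
    credentials: str
    path: str
    sender: Sender  # assumed to be established by the transport layer


def handle(request: Request, ask_pdp: Callable[[Request], Decision]) -> str:
    """Return "allowed" or "denied" for a received request (steps 402-410)."""
    # Step 404: determine whether the request came from a client device
    # or from the policy system.
    if request.sender is Sender.POLICY_SYSTEM:
        # Assumption: the policy system already applied the policies,
        # so no verification round trip is made.
        return "allowed"
    # Steps 406 and 408: verify with the policy decision point.
    decision = ask_pdp(request)
    # Step 410: only a decision of allowed without obligations is served.
    if decision is Decision.ALLOW:
        return "allowed"
    # Denied outright, or obligations exist that this path cannot apply.
    return "denied"
```

A request allowed with obligations is denied on this path because the short circuit handler performs no redaction. Such a request would have to go through the policy system instead.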


Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them. The computer storage medium is not, however, a propagated signal.


The term “data processing apparatus” encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.


A computer program (which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.


As used in this specification, an “engine,” or “software engine,” refers to a software implemented input/output system that provides an output that is different from the input. An engine can be an encoded block of functionality, such as a library, a platform, a software development kit (“SDK”), or an object. Each engine can be implemented on any appropriate type of computing device, e.g., servers, mobile phones, tablet computers, notebook computers, music players, e-book readers, laptop or desktop computers, PDAs, smart phones, or other stationary or portable devices, that includes one or more processors and computer readable media. Additionally, two or more of the engines may be implemented on the same computing device, or on different computing devices.


The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).


Computers suitable for the execution of a computer program can be based, by way of example, on general or special purpose microprocessors or both, or on any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.


Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.


To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) monitor, an LCD (liquid crystal display) monitor, or an OLED display, for displaying information to the user, as well as input devices for providing input to the computer, e.g., a keyboard, a mouse, or a presence sensitive display or other surface. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending resources to and receiving resources from a device that is used by the user; for example, by sending web pages to a web browser on a user's client device in response to requests received from the web browser.


Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.


The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.


While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.


Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.


Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

Claims
  • 1. A method, comprising: receiving, from a client device by a policy system, a request to access data stored on a distributed file system comprising a name node and one or more data nodes, the request comprising user credentials, the name node being separate from the policy system; determining, by a policy decision point of the policy system and according to (i) one or more data access policies and (ii) the user credentials, whether the request is to be denied, to be allowed with obligations that define conditions for access to a limited portion of the data requested by the client device, or to be allowed without obligations; and in response to results of the determining, perform, by the policy system, actions including: based at least on a determination that the request is to be denied, notifying the client device of request denial by the policy system; based at least on a determination that the request is to be allowed with obligations, requesting data from the distributed file system by the policy system; and based at least on a determination that the request is to be allowed without obligations: redirecting, by the policy system, the request to a short circuit handler executing on the name node of the distributed file system, the short circuit handler configured to determine whether to perform security check actions on the request based on a sender of the request, wherein redirecting the request comprises either (i) instructing the client device to re-submit the request to the short circuit handler executing on the name node rather than to the policy system or (ii) forwarding the request on behalf of the client device to the short circuit handler, and in response to receiving a verification request from the short circuit handler, providing, by the policy decision point, a verification to the short circuit handler, the verification verifying that the redirected request requires no obligations and authorizing the name node on the distribution file system to allow the client device to access requested data specified in the redirected request without obligation.
  • 2. The method of claim 1, wherein the distributed file system is a Hadoop Distributed File System (HDFS).
  • 3. The method of claim 1, wherein the obligations are defined by a setting in the one or more data access policies, the setting specifying that at least a portion of the data to be accessed is to be redacted.
  • 4. The method of claim 3, comprising, based at least on a determination that the request is to be allowed with obligations, redacting data retrieved from the distributed file system, wherein the redacting comprises at least one of filtering, masking or encrypting at least a portion of the retrieved data according to the setting.
  • 5. The method of claim 3, further comprising performing, by the short circuit handler the security check actions, the security check actions comprising: determining whether one or more requests received by the short circuit handler are received from the client device or from the policy system; and communicating with the policy decision point for verification only based at least on a determination that the one or more requests are from the client device and not from the policy decision point.
  • 6. The method of claim 5, further comprising denying, by the short circuit handler, the one or more request upon verification from the policy decision point that the one or more requests are to be allowed with obligations.
  • 7. The method of claim 1, wherein the verification request initiated by the short circuit handler is to inquire whether the request is to be denied, to be allowed with obligations, or to be allowed without obligations.
  • 8. The method of claim 1, wherein redirecting comprises forwarding by the policy system to the short circuit handler, the request on behalf of the client device, and wherein the verification is provided, by the policy system, in form of a flag indicating that the request is allowed without obligation.
  • 9. A system comprising: one or more processors; and a non-transitory computer-readable medium storing instructions that, upon execution by the one or more processors, cause the one or more processors to perform operations comprising: receiving, from a client device by a policy system, a request to access data stored on a distributed file system comprising a name node and one or more data nodes, the request comprising user credentials, the name node being separate from the policy system; determining, by a policy decision point of the policy system and according to (i) one or more data access policies and (ii) the user credentials, whether the request is to be denied, to be allowed with obligations that define conditions for access to a limited portion of the data requested by the client device, or to be allowed without obligations; and in response to results of the determining, perform, by the policy system, actions including: based at least on a determination that the request is to be denied, notifying the client device of request denial by the policy system; based at least on a determination that the request is to be allowed with obligations, requesting data from the distributed file system by the policy system; and based at least on a determination that the request is to be allowed without obligations: redirecting, by the policy system, the request to a short circuit handler executing on the name node of the distributed file system, the short circuit handler configured to determine whether to perform security check actions on the request based on a sender of the request, wherein redirecting the request comprises (i) either instructing the client device to re-submit the request to the short circuit handler executing on the name node rather than to the policy system or (ii) forwarding the request on behalf of the client device to the short circuit handler, and in response to receiving a verification request from the short circuit handler, providing, by the policy decision point, a verification to the short-circuit handler, the verification verifying that the redirected request requires no obligations and authorizing the name node on the distribution file system to allow the client device to access requested data specified in the redirected request without obligation.
  • 10. The system of claim 9, wherein the distributed file system is a Hadoop Distributed File System (HDFS).
  • 11. The system of claim 9, wherein the obligations are defined by a setting in the one or more data access policies, the setting specifying that at least a portion of the data to be accessed is to be redacted.
  • 12. The system of claim 11, the operations comprising, based at least on a determination that the request is to be allowed with obligations, redacting data retrieved from the distributed file system, wherein the redacting comprises at least one of filtering, masking or encrypting at least a portion of the retrieved data according to the setting.
  • 13. The system of claim 11, wherein the short circuit handler is configured to perform the security check actions that comprise: determining whether one or more requests received by the short circuit handler are received from the client device or from the policy system; and communicating with the policy decision point for verification only based at least on a determination that the one or more requests are from the client device.
  • 14. The system of claim 13, wherein the short circuit handler is configured to deny the one or more request upon verification from the policy decision point that the one or more requests are to be allowed with obligations.
  • 15. The system of claim 9, wherein the short circuit handler is configured to initiate communication with policy system to inquire whether the request is to be denied, to be allowed with obligations, or to be allowed without obligations.
  • 16. A non-transitory computer-readable medium storing instructions that, upon execution by one or more processors, cause the one or more processors to perform operations comprising: receiving a request to access data by a short circuit handler executing on a name node of a Hadoop Distributed File System (HDFS), the request being associated with user credentials; determining, by the short circuit handler, whether the request received by the short circuit handler is received from a client device or from a policy system; in response to determining that the request is from a client device, performing, by the short circuit handler, actions including: communicating with a policy decision point of the policy system for verification of the user credentials, receiving a decision from the policy decision point on whether the request is to be denied, to be allowed with obligations that define conditions for access to a limited portion of the data requested by the client device, or to be allowed without obligations, and in response to the decision, performing actions including: based at least on a determination that the request is to be denied or to be allowed with obligations, notifying the client device of a request denial, and based at least on a determination that the request is to be allowed without obligations, allowing the client device to access the requested data; in response to determining that the request is from a policy system, determining whether the request includes a flag indicating that the request is to be allowed without obligation; and in response to determining that the request includes the flag, allowing a client computing device to access the requested data, the client computing device being identified based on the user credentials associated with the request.
  • 17. The non-transitory computer-readable medium of claim 16, wherein the obligations are defined by a setting in one or more data access policies, the setting specifying that at least a portion of the data to be accessed is to be redacted.
  • 18. The non-transitory computer-readable medium of claim 16, wherein the policy system is configured to enforce one or more data access policies for accessing the data.
  • 19. A non-transitory computer-readable medium storing instructions that, upon execution by one or more processors, cause the one or more processors to perform operations comprising: receiving a request to access data by a short circuit handler executing on a name node of a distributed file system, the request being associated with user credentials; determining, by the short circuit handler, whether the request received by the short circuit handler is received from a client device or from a policy system; in response to determining that the request is from a client device, performing, by the short circuit handler, actions including: communicating with a policy decision point of the policy system for verification of the user credentials, receiving a decision from the policy decision point on whether the request is to be denied, to be allowed with obligations that define conditions for access to a limited portion of the data requested by the client device, or to be allowed without obligations, the obligations being defined by a setting in one or more data access policies, the setting specifying that at least a portion of the data to be accessed is to be redacted, and in response to the decision, performing actions including: based at least on a determination that the request is to be denied or to be allowed with obligations, notifying the client device of a request denial, and based at least on a determination that the request is to be allowed without obligations, allowing the client device to access the requested data; in response to determining that the request is from a policy system, determining whether the request includes a flag indicating that the request is to be allowed without obligation; and in response to determining that the request includes the flag, allowing a client computing device to access the requested data, the client computing device being identified based on the user credentials associated with the request.
  • 20. The non-transitory computer-readable medium of claim 19, wherein the distributed file system is a Hadoop Distributed File System (HDFS).
Related Publications (1)
Number Date Country
20180004970 A1 Jan 2018 US