Machine learning and artificial intelligence (AI) are being used for various tasks, ranging from numerical value prediction and classification to system monitoring and content generation.
This specification describes a system that can identify and manage the risks inherent in using machine learning and artificial intelligence (AI) tools to perform everyday tasks. AI processes—including generative AI—are being widely used for various applications, ranging from writing professional letters, chats, blogs, white papers, and code, to fraud detection, portfolio optimization, providing customer support, and personalizing the customer experience. For example, generative AI tools based on Natural Language Processing (NLP) or Large Language Models (LLMs) are being used for generating content ranging from various forms of textual content to images, videos, and even software code, while AI models based on supervised, unsupervised, reinforcement, or deep learning are being used to predict a wide range of outcomes ranging from credit scores to patient healthcare outcomes. Often, AI is embedded in third party applications without users even being aware of the presence of AI models and the risks that they pose.
With the advent of increasingly popular and novel uses of AI processes, the need to understand where AI is being used, and the impact of AI and its associated risks, has become more important than ever, for example, to ensure that the AI tools and processes are being used in compliance with applicable ethical guidelines and regulatory rules. The technology described herein provides an intelligent risk management framework to automate the identification of AI files and models, assign a risk score based on various AI attributes, and manage the risk from those AI files. In particular, the technology can be configured to perform a scan of a file system in order to identify AI files and various risk attributes and determine candidate files for adding to inventory tracking and monitoring.
The system automatically identifies and tracks AI files within a repository to understand linkages and dependencies between the files, for example, across an enterprise-scale system. The files can then be analyzed to identify high-risk candidates, which in turn can be evaluated further to determine one or more performance criteria associated with the underlying AI processes/tools.
According to a first aspect, a method described herein includes accessing, by one or more computing devices, a repository of files stored on one or more computer-readable storage devices, and analyzing, by the one or more computing devices, files stored in the repository to identify a subset of the files as candidates that are potentially generated at least in part using one or more artificial intelligence (AI) processes, wherein the identified subset includes files each of which includes an attribute or content indicative of AI usage.
In some implementations, the method further includes identifying a corresponding set of attributes for each candidate, determining a subset of the candidates based on these attributes, and performing one or more automated tests on the determined subset of candidates.
In some implementations, the method wherein accessing the repository of files stored on one or more computer-readable storage devices further includes accessing the repository of files in a first scan in accordance with a set of one or more configurable scanning parameters, and identifying one or more files in the first scan in accordance with the set of one or more configurable scanning parameters.
In some implementations, the method further includes using results of the first scan in a second scan, wherein using the results includes filtering the one or more files identified in the first scan in accordance with a second set of one or more configurable scanning parameters selected from a group consisting of: target folders, target drives, file types, file age, file compression technology, scan depth, and keywords.
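The two-pass scan described above can be sketched as follows. This is a minimal illustration only; the `ScanConfig` class, its field names, and the split of parameters between the two passes are hypothetical choices for this sketch, not part of the specification.

```python
import os
import time
from dataclasses import dataclass, field

@dataclass
class ScanConfig:
    """Illustrative configurable scanning parameters (names are hypothetical)."""
    target_folders: list = field(default_factory=list)  # optional folder restriction
    file_types: set = field(default_factory=set)        # e.g., {".py", ".ipynb"}
    max_age_days: float = None                          # skip files older than this
    keywords: set = field(default_factory=set)          # filename keywords to match

def first_scan(root, config):
    """Fast pass: walk the repository and collect candidate paths by file type."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if config.file_types and os.path.splitext(name)[1] not in config.file_types:
                continue
            hits.append(os.path.join(dirpath, name))
    return hits

def second_scan(paths, config):
    """Detailed pass: filter first-scan results on the remaining parameters."""
    now = time.time()
    kept = []
    for path in paths:
        if config.max_age_days is not None:
            age_days = (now - os.path.getmtime(path)) / 86400
            if age_days > config.max_age_days:
                continue
        if config.keywords and not any(
                k in os.path.basename(path).lower() for k in config.keywords):
            continue
        kept.append(path)
    return kept
```

Splitting cheap checks (file extension) into the first pass and more expensive checks (age, keyword matching) into the second pass mirrors the fast-scan-then-detailed-scan pattern discussed later in this specification.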
In some implementations, the method wherein accessing the repository of files stored on one or more computer-readable storage devices includes accessing one or more of a local desktop file system, a network file system, or a cloud repository file system used by one or more users.
In some implementations, the method wherein analyzing the files stored in the repository further includes, for each file, identifying one or more attributes of the file using one or more of file metadata, file location, or file contents.
In some implementations, the method wherein analyzing the files stored in the repository further includes, for each file, identifying one or more attributes of the file using one or more of file metadata, file location, or file contents, and wherein identifying one or more attributes of the file using one or more of file metadata, file location, or file contents includes, for each file: identifying a file name, file type, attributes specific to the file type, one or more file authors, and organizational information pertaining to the file, identifying one or more dates of modification, dates of last access, and an author associated with the last modification, determining a file size, identifying a location of the file in the repository with respect to one or more file system hierarchies in the repository, determining security rights of the file in the repository, identifying one or more keywords within the file, and identifying one or more attributes or content indicative of AI usage in the file.
In some implementations, the method wherein identifying the one or more attributes or content indicative of AI usage includes identifying at least one of: a file extension associated with an AI tool, a reference to a generative AI tool, a reference to a library, algorithm, keyword or folder associated with an AI tool, outputs of an AI file, a segment of code, or a number of lines of code.
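A simplified sketch of how such AI-usage signals might be detected is shown below. The extension set and keyword list are illustrative stand-ins; a deployed system would maintain a much richer, configurable catalog of AI tools, libraries, and folder conventions.

```python
import re

# Illustrative signals only (hypothetical catalog, not from the specification).
AI_FILE_EXTENSIONS = {".h5", ".onnx", ".pt", ".pkl", ".ipynb"}
AI_KEYWORD_PATTERN = re.compile(
    r"\b(tensorflow|torch|sklearn|keras|xgboost|transformers|openai)\b",
    re.IGNORECASE,
)

def ai_usage_attributes(filename, contents=""):
    """Return attributes suggestive of AI usage found for one file."""
    found = []
    ext = "." + filename.rsplit(".", 1)[-1] if "." in filename else ""
    if ext.lower() in AI_FILE_EXTENSIONS:
        found.append(("extension", ext))
    for match in set(AI_KEYWORD_PATTERN.findall(contents)):
        found.append(("keyword", match.lower()))
    return found
```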
In some implementations, the method wherein identifying the corresponding set of attributes for each candidate includes identifying the presence of sensitive information.
In some implementations, the method wherein identifying the corresponding set of attributes for each candidate includes identifying one or more code-based factors for each code-containing candidate by performing a code quality assessment for one or more types of errors and warnings.
In some implementations, the method further includes identifying one or more libraries and library versions installed on the one or more computer-readable storage devices.
In some implementations, the method further includes identifying vulnerabilities associated with each of the identified library versions using a security vulnerability database, wherein the security vulnerability database includes a set of determined security vulnerabilities associated with each version of the identified library.
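The vulnerability lookup could be sketched as below. The in-memory dictionary is a hypothetical stand-in for the security vulnerability database; a real deployment might instead query a maintained feed such as OSV or the NVD.

```python
# Hypothetical stand-in for the security vulnerability database.
VULNERABILITY_DB = {
    ("examplelib", "1.0.2"): ["CVE-2023-0001"],
    ("examplelib", "1.1.0"): [],
}

def vulnerabilities_for(installed):
    """Map each identified (library, version) pair to its known vulnerabilities."""
    report = {}
    for lib, version in installed:
        report[(lib, version)] = VULNERABILITY_DB.get((lib, version), [])
    return report
```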
In some implementations, the method wherein identifying the corresponding set of attributes for each candidate includes: evaluating associated input and output dependencies of each candidate, wherein the associated dependencies include one or more of libraries, files, or import modules, and displaying a visual representation of the identified input and output dependencies for each candidate on a graphical user interface, wherein the visual representation includes an interconnected AI map of one or more objects representing a sequence of inputs to outputs in accordance with the dependencies at a specified hierarchy level, wherein the objects are indicative of dependency information including a type of dependency and a scan status.
In some implementations, the method further includes, through an interaction taken on the graphical user interface, for each candidate, viewing the candidate, adding the candidate to inventory or monitoring, assigning the candidate to a policy categorization, or removing the candidate from future scans.
In some implementations, the method further includes determining a risk score for each candidate using the corresponding set of attributes.
In some implementations, the method wherein determining the risk score for each candidate includes assigning a risk weight to each attribute in the corresponding set of attributes, determining a number of instances of each attribute in the corresponding set of attributes, and for each attribute, multiplying the risk weight with the number of instances of the attribute and aggregating to determine an aggregate risk score for the candidate as the risk score.
In some implementations, the method further includes receiving one or more threshold values, each threshold value indicative of a respective level of risk, and comparing the risk score of each candidate to each threshold value to determine the level of risk for each candidate.
In some implementations, the method further includes, for each candidate, calculating a percentage of risk contribution for each attribute, and displaying the percentages in a risk score card including a visual representation of the percentages on a graphical user interface.
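The scoring, thresholding, and risk-contribution steps described in the preceding paragraphs can be sketched together as follows. The function names, example weights, and threshold labels are illustrative assumptions, not values prescribed by the specification.

```python
def risk_score(attribute_counts, weights):
    """Aggregate risk score: sum over attributes of risk weight * instance count."""
    return sum(weights.get(a, 0) * n for a, n in attribute_counts.items())

def risk_level(score, thresholds):
    """Map a score to the highest threshold it meets.

    `thresholds` is an ascending list of (threshold_value, level_label) pairs.
    """
    level = "low"
    for value, label in thresholds:
        if score >= value:
            level = label
    return level

def risk_contributions(attribute_counts, weights):
    """Percentage of the total score contributed by each attribute."""
    total = risk_score(attribute_counts, weights)
    if total == 0:
        return {a: 0.0 for a in attribute_counts}
    return {a: 100.0 * weights.get(a, 0) * n / total
            for a, n in attribute_counts.items()}
```

The percentages returned by `risk_contributions` correspond to the values that would populate the risk score card on the graphical user interface.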
In some implementations, the method wherein determining the subset of the candidates further includes using a risk identification model configured to process a set of file attributes for each candidate to generate an indication of whether the candidate is a high-risk candidate, wherein the risk identification model has been trained by operations including training the risk identification model on a training data set of previously identified and assessed candidates comprising one or more risk attributes.
In some implementations, the method further includes grouping copies of a file in the subset of candidates as a single candidate, or grouping versions of a file in the subset of candidates as a single candidate.
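Grouping byte-identical copies can be done by keying candidates on a content hash, as in the sketch below; the use of SHA-256 and an in-memory mapping are illustrative choices, and a real system would stream file contents from disk.

```python
import hashlib

def group_copies(files):
    """Group byte-identical files under a single candidate keyed by content hash.

    `files` maps path -> bytes content (an in-memory stand-in for disk reads).
    """
    groups = {}
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        groups.setdefault(digest, []).append(path)
    return list(groups.values())
```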
In some implementations, the method wherein determining the subset of candidates further includes assigning a label to each candidate in the subset of candidates, and wherein assigning includes using a calculator engine configured to enact one or more arithmetical and logical operations as a calculation on the candidates based on the corresponding set of attributes, and determining one or more assigned labels based on the calculation.
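One simple way to realize the calculator engine's label assignment is an ordered rule table, where each rule is a small arithmetical or logical test over the candidate's attributes. The rule contents below are hypothetical examples, not rules from the specification.

```python
def assign_label(attributes, rules):
    """Apply ordered (predicate, label) rules to a candidate; first match wins."""
    for predicate, label in rules:
        if predicate(attributes):
            return label
    return "unlabeled"

# Hypothetical example rules combining arithmetic and logical operations.
EXAMPLE_RULES = [
    (lambda a: a.get("risk_score", 0) >= 25 and a.get("has_pii", False), "high-risk"),
    (lambda a: a.get("risk_score", 0) >= 10, "review"),
]
```

A label produced this way could then drive the downstream actions listed next (retention, deletion, inventory addition, workflow start, and so on).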
In some implementations, the method further includes enacting one or more actions on a subset of candidates identified based at least on the one or more assigned labels, the one or more actions including: retaining the subset of candidates for further assessment, deleting the subset of candidates from the repository, moving the subset of candidates to a new location in the repository, adding the subset of candidates to a maintained file inventory, starting a workflow using a pre-built workflow template, and identifying copies of files within the subset of candidates.
In some implementations, the method wherein performing one or more automated tests on the determined subset of candidates further includes identifying one or more automated tests to perform on the subset of candidates based on a determined risk score or assigned label, and adding results of the performed one or more automated tests to an inventory record.
In some implementations, the method wherein identifying one or more automated tests includes identifying one or more tests from a group consisting of a fairness assessment, an interpretability assessment, a validity assessment, a reliability assessment, and a data drift assessment.
In some implementations, the method further includes starting a workflow based on at least one result of the one or more automated tests, or sending a notification to one or more identified relevant users based on at least one result of the one or more automated tests.
In some implementations, the method wherein starting a workflow using a pre-built workflow template includes generating a workflow task form configured to accept one or more user-inputs pertaining to values associated with an outcome of the task in accordance with a user-configured identification of one or more user-inputs solicited from one or more users or groups assigned to the task for completion of the task, wherein the user-configured identification of the one or more user-inputs includes a set of questions indicated as required or optional for task completion, and wherein the one or more users or one or more groups of users are dynamically assigned using a user-input value from a pre-configured metadata form.
In some implementations, the method further includes preventing the workflow from advancing based at least on the one or more user-inputs entered into the workflow task form.
According to a second aspect, a method described herein includes identifying, based at least on one or more attributes indicative of artificial intelligence (AI) usage in a file, one or more automated tests to perform on the file, providing the identified one or more automated tests in a graphical user-interface including a set of test icons, receiving, through the graphical user-interface, an indication of an automated test to perform from a user selecting a first test icon, performing the automated test corresponding to the first test icon, storing results of the performed automated test to an inventory record, and using the results of the automated test to train a machine learning model to generate an output including one or more of model behavior, compliance, or risk status of a file.
In some implementations, the method wherein identifying one or more automated tests to perform on a file includes identifying one or more tests from a group consisting of an AI fairness assessment, an AI interpretability assessment, an AI validity assessment, an AI reliability assessment, and a data drift assessment.
In some implementations, the method further includes performing the automated test corresponding to the first test icon on a subset of candidate files based on a determined risk score or assigned label.
According to a third aspect, a method described herein includes receiving, through a user-interface including a user-configurable identification of a set of metadata fields, a first set of user-inputs including multiple attributes pertaining to data to be stored in a database, each user-input specifying, for each of the multiple attributes, an entry corresponding to at least a subset of the metadata fields, generating a data entry form configured to accept a second set of user-inputs pertaining to values associated with at least a subset of the multiple attributes, wherein the data entry form is automatically generated based on the user-configurable identification of the set of metadata fields, receiving data including the second set of user-inputs through the data entry form displayed on an electronic display device, and storing the data in the database.
In some implementations, the method wherein receiving the first set of user-inputs including multiple attributes pertaining to data to be stored in the database further includes receiving a corresponding set of color, placement, and containerization instructions relating to one or more of a tab and group layout, wherein the set of color, placement, and containerization instructions indicate at least a location and color of the first set of user-inputs for display on the electronic display device.
In some implementations, the method further includes generating the database based at least on the first set of user-inputs.
In some implementations, the method wherein receiving the first set of user-inputs further includes receiving the first set of user-inputs from an administrator, and wherein receiving the second set of user-inputs through the data entry form further includes receiving the second set of user-inputs from a user entering data in the data entry form.
In some implementations, the method wherein the multiple attributes of the first set of user-inputs include one or more calculable fields that are derived from at least one metadata field.
In some implementations, the method further includes deriving the calculable fields using a calculator engine configured to enact one or more arithmetical and logical operations to the at least one metadata field.
In some implementations, the method further includes calculating a risk assessment using the calculator engine.
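The metadata-driven form generation and calculable-field derivation of this aspect can be sketched as follows. The field dictionary keys and the example calculator are hypothetical; they stand in for the administrator's configuration and the calculator engine's configured operations.

```python
def generate_form(metadata_fields):
    """Build a data-entry form description from configured metadata fields.

    Calculated fields are excluded from the form; they are derived, not entered.
    """
    return [{"name": f["name"],
             "type": f.get("type", "text"),
             "required": f.get("required", False)}
            for f in metadata_fields if not f.get("calculated")]

def fill_form(form, values, calculators):
    """Accept user values for form fields, then derive any calculable fields."""
    record = {}
    for field_spec in form:
        if field_spec["required"] and field_spec["name"] not in values:
            raise ValueError("missing required field: " + field_spec["name"])
        if field_spec["name"] in values:
            record[field_spec["name"]] = values[field_spec["name"]]
    for name, fn in calculators.items():  # calculator-engine derivations
        record[name] = fn(record)
    return record
```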
According to a fourth aspect, a method described herein includes receiving, through a graphical user-interface including a set of task icons indicative of respective tasks and one or more placeable links for connecting pairs of task icons, a sequence of tasks in a workflow through an arrangement of the task icons into an order using the one or more placeable links and an assignment of each task icon to one or more users or one or more groups of users, enacting one or more tasks in the sequence of tasks in the workflow, wherein enacting includes generating a workflow task form configured to accept one or more user-inputs pertaining to values associated with an outcome of the task in accordance with a user-configured identification of the one or more user-inputs solicited from the one or more users or groups assigned to the task for completion of the task, receiving data including the one or more user-inputs entered through a display of the workflow task form on an electronic device, and storing the data in a database.
In some implementations, the method wherein the user-configured identification of the one or more user-inputs includes a set of questions indicated as required or optional for task completion.
In some implementations, the method further includes preventing the workflow from advancing based at least on the one or more user-inputs.
In some implementations, the method wherein the one or more users or one or more groups of users are dynamically assigned using a user-input value from a pre-configured metadata form.
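The task-form gating of this aspect, in which a workflow cannot advance until required questions are answered, might be sketched as below; the class and method names are illustrative only.

```python
class WorkflowTask:
    """One task in a workflow, gated on answers to its required questions."""

    def __init__(self, name, questions):
        # questions: {question_text: True if required, False if optional}
        self.name = name
        self.questions = questions
        self.answers = {}

    def submit(self, answers):
        """Record user-inputs entered through the workflow task form."""
        self.answers.update(answers)

    def can_advance(self):
        """The workflow may not advance until all required questions are answered."""
        return all(q in self.answers
                   for q, required in self.questions.items() if required)

def first_blocking_task(tasks):
    """Return the name of the first task blocking the workflow, or None."""
    for task in tasks:
        if not task.can_advance():
            return task.name
    return None
```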
According to a fifth aspect, a method described herein includes monitoring a file automatically for changes, and, in response to a detected change, generating an audit trail including data derived from the changes, and storing the generated audit trail in a database.
In some implementations, the method further includes determining a measure of risk for the detected change, and, in response to an identification of a high-risk change, identifying and executing one or more tasks, wherein the one or more tasks are identified from a group consisting of: alerting one or more users or groups of users, starting a workflow, securing the file, requiring verification of the detected change, and highlighting the change.
In some implementations, the method wherein determining the measure of risk for the detected change includes processing data relating to the detected change using a machine learning model to predict a measure of risk of the detected change.
In some implementations, the method further includes training the machine learning model using logged data relating to previously detected changes.
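The change-monitoring and audit-trail generation of this aspect can be illustrated with content-hash snapshots, as below. The snapshot/compare structure and the in-memory file mapping are sketch-level assumptions; a deployed system would hook into filesystem events and persist the trail to the database.

```python
import hashlib
import time

def snapshot(files):
    """Record a content hash per file; `files` maps path -> bytes."""
    return {p: hashlib.sha256(c).hexdigest() for p, c in files.items()}

def detect_changes(before, after_files):
    """Compare the current state to a prior snapshot and emit audit-trail entries.

    Only additions and modifications are shown here; deletions would be
    handled symmetrically by checking paths missing from the current state.
    """
    trail = []
    current = snapshot(after_files)
    for path, digest in current.items():
        if before.get(path) != digest:
            trail.append({"file": path,
                          "change": "modified" if path in before else "added",
                          "timestamp": time.time()})
    return trail
```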
Particular embodiments of the subject matter described in this specification can be implemented so as to realize one or more of the following advantages: (a) the automated identification of AI files and processes in a company network or embedded in third party applications, which may not otherwise become known to the firm; (b) providing further insight to the organization into these files as to code quality, cybersecurity risks, privacy risks, data and model dependencies, and human vs. generative AI content; (c) distilling the multiple risk attributes into a single risk score that can be used to prioritize a multitude (e.g., millions) of files and help organizations focus on the files of greatest risk, thereby saving valuable computing resources by first performing a fast scan, and then a more detailed scan based on the results of the fast scan; (d) providing a range of automated tests that can be run on the identified AI files to comply with laws, e.g., with minimal user AI knowledge and expertise required, thereby reducing reliance on scarce AI resources; and (e) providing a framework to manage AI risks by inventorying and monitoring the AI files with accountability and tracking. Collectively, the system identifies risks that an organization, e.g., a company, would not be able to uncover otherwise, and does so at scale by scanning the entire organization network, saving valuable time and scarce resources in the process.
The systems described herein can be configured to unify the risk management process of files into an organized framework with high configurability. The system can provide for configuration of one or more of a discovery, workflow, inventory, or change management modules in order to allow a user to identify files that the user would like to track in an inventory or monitor in an audit trail. More specifically, this system facilitates the generation, logging, and collection of metadata and audit trail data in an automated and modular manner, which reduces the amount of time and effort a user needs to dedicate to ensure that risks are properly managed. Furthermore, the system can maintain relationships between each of the modules to facilitate interactions that improve the system, e.g., by using data generated as part of one module to improve the functioning of another module with respect to a user's customized risk assessment configuration. In particular, the system can be specialized to provide a tailored risk management user experience based on learning customer-specific requirements through user configurations.
Additionally, the system can be configured through user-interfaces, e.g., without the need for manual coding, to generate form, testing and workflow segments addressing one or more configurable parameters of the discovery, workflow, inventory, and change management modules. Such automated reconfigurability can enhance usability of files and systems while managing risks associated with use of AI, especially for a user who is not well-versed in configuring and testing systems using code. This can ensure that organizations can execute the necessary AI tests to ensure that their AI systems are reliable and trustworthy, and that they are not exposed to risks due to the lack of coding knowledge at the organization.
The system can also reduce the use of computational resources by aggregating a measure of risk for any number of processed files. In particular, the system can provide a single risk profile based on a set of risk attributes and corresponding values for a number of files, e.g., 100, 500, or 1 million files. The ability of the system to distill the risk profile of many files into a risk score, e.g., one value, can facilitate the risk assessment process. For example, the system can use the risk score for one or more downstream calculations, e.g., thereby reducing the computational resources required with respect to maintaining and processing the separate risk values for the set of risk attributes for the downstream calculations. The system also saves valuable hardware and computing resources by (a) first performing a fast scan, and then filtering the results for a more detailed scan; (b) scheduling the scan during off-hours when other computing resources are not being used, for a more efficient scan; (c) targeting scans to required file locations, file types, and required parameters only, e.g., relative to performing a full scan; and (d) performing scans only for files that have changed since the last scan or have been modified within a given time interval, e.g., relative to performing a full scan.
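The incremental strategy of scanning only files changed since the last scan might be sketched as follows; the timestamp mapping is a hypothetical stand-in for filesystem modification metadata.

```python
def incremental_scan_targets(all_files, last_scan_time):
    """Select only files modified since the previous scan.

    `all_files` maps path -> last-modified timestamp (stand-in for os.path.getmtime),
    keeping the sketch self-contained.
    """
    return [path for path, mtime in all_files.items() if mtime > last_scan_time]
```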
Further, the system reduces computational and storage costs by dispositioning files based on risk level, and identifying file copies, for efficient organization and storage. For example, the system can reduce files in storage by identifying low-risk files for deletion, or for movement to another folder. In particular, low-risk files can be moved to a folder that is not monitored as frequently, thereby reducing the computational resources needed to assess and evaluate high-risk files in the first location. As another example, the system can be configured to treat a group of like files as a single entity, e.g., rather than multiple different entities, and apply the same policy to them. Grouping copies of files removes the need to scan each file, thereby saving computational resources.
In addition, the system can reduce the use of computational resources by maintaining risk assessment information in one accessible location, as compared to maintaining this information in disparate locations, which can be hard to navigate in order to determine the risk status of a file. For example, rather than repeating file scans or assessments because results were not logged or were produced by multiple users, the system can log actions taken, with associated results, for candidate files in a central inventory accessible by the multiple users. The inventory can allow users to conduct tests and attach test results to the inventory. All users can view when a file was assessed and evaluate whether or not it needs further assessment based on a number of factors. As another example, the system can prevent the need to pore through isolated documentation in different parts of the file system, e.g., in order to determine which files have been modified recently, or to exhaustively search for metadata of files to identify a user that is accountable for the file, thereby saving computational resources.
Thus, by efficiently scanning a network to identify high-risk files, maintaining an inventory of files and folders at a central repository, allowing automated testing of these files and the storage of test documentation, and automatically logging access attempts, nature of changes, etc., the system described herein obviates the need for repeated scanning and assessments across different file systems. This in turn can result in savings in computational resources as well as early identification and mitigation of any risk and/or non-compliance associated with AI files, thereby resulting in reliable usage of such files.
The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
Like reference numbers and designations in the various drawings indicate like elements.
The use of AI is ubiquitous. Generative AI tools are being used for various tasks, ranging from textual generation, e.g., writing emails, papers, code, etc., to generation of pictures, video, and other forms of content. Machine learning and AI are being used to perform various complex tasks such as analyzing large volumes of data and predicting the outcome for particular circumstances. Yet, the impact and risks of using AI continue to pose various open questions. In particular, many third party software providers are embedding AI models in their applications, and users are using them without even being aware of these models. As such, there is a need for automated tools to identify, track, and, as needed, mitigate the risks incurred through the use of AI, e.g., in order to advance responsible and ethical AI practices. The technology described herein provides a system for AI risk management that can be configured to perform a file scan in order to identify AI files, whether embedded or otherwise, determine potential risk factors for a repository of files, add files to an inventory, monitor one or more of the files in the inventory, and manage edits and revisions made to the files or their metadata, e.g., through a workflow.
More specifically, the system can provide for the configuration of a file scan to identify attributes of files in a file repository and to evaluate the files for associated risk factors. As an example, the automated tests can include identifying an attribute or content indicative of AI usage or the presence of sensitive personal-identifying information. As another example, the automated tests can include a code quality assessment or a cybersecurity assessment to determine vulnerabilities based on associated code dependencies, e.g., libraries, files, or import modules.
The system can determine a risk score for each file using the identified attributes from the file scan and can be configured to add one or more of the files in an inventory and/or for monitoring. As an example, the system can add a high-risk file to the inventory, and monitor a file automatically for changes, such that in response to a detected change, it can generate audit trail data for the file. The system can also be configured to start a workflow, e.g., a sequence of tasks in an organized process, for the files in the inventory. As an example, the workflow can require responses to questions through a form that the assigned workflow user is required to fill out. The system can allow for the configuration of the form for this and other uses, e.g., entering data into the system.
As an example, the discovery module 130, inventory module 140, and change management module 150 can all access files from a file system 110. As another example, the AI testing module 180 can receive data from a discovery module in order to perform a test, and the results of the test can be stored in the inventory module 140. As another example, the inventory module 140 can be configured using a metadata form provided from the metadata form configuration module 170. As yet another example, a workflow can be configured to be run for a file being maintained by the inventory register 145 or in response to an indication of a change from the change management module 150.
For example, the risk management system 100 can access a repository of files stored on one or more computer-readable storage devices, e.g., by performing a file scan using a discovery module 130 and the scan results stored in a database. In the particular example depicted, the file system 110 can be a file system as maintained on the computer-readable memory of a device, e.g., laptop 112, a desktop computer 114, or a tablet 116, or a remotely-accessible file system that can be accessed using a network, e.g., the internet, an intranet, etc. As another example, the system 100 can access a remote file system, e.g., a cloud-based file system 120.
The risk management system 100 can also be configured to add candidate files based on an evaluation of risk for the file to an inventory module 140 that includes a metadata form configuration module 170 for configuring the generation of a form to enter metadata into the inventory register 145. As an example, the system 100 can monitor one or more files using the change management module 150 that tracks all changes to generate an audit trail and creates a version history of all file changes. In some cases, the files can be accessed from the inventory register 145 or in another database. In other cases, the files can be accessed from the file system 110 or cloud-based file system 120. As another example, the system 100 can organize a review process for a file in the inventory module 140 or the change management module 150 through the configuration of a workflow, e.g., using the workflow module 160. In some cases, the review process can involve checkpoint tasks assigned to one or more particular users and can include a separate workflow questionnaire that must be answered before the workflow task can be completed.
More specifically, the system 100 can be configured to perform the file scan in a file repository, e.g., the file system 110, the cloud-based file system 120, or both, using the discovery module 130. In some cases, the system 100 can be configured to access the repository of files in a particular location in the file repository, e.g., rather than accessing the entire repository, the system 100 can access a subset of the files in the file repository. As an example, the system 100 can be configured to scan one or more target folders, target drives, or file directories. As another example, the system 100 can be configured to scan a repository or particular location for a particular type of file, e.g., a code-containing file, e.g., a Python, C, Java, executable file, etc., based on the file extension. As yet another example, the system 100 can be configured to search one or more keywords in the subset of files. The system 100 can also be configured to filter the results of the first scan to perform a second more detailed file scan.
Referring now to
In some implementations, the system 100 can provide a selection button 210 to identify the type of file repository to be scanned. As an example, the system 100 can scan one or more of a file share 212, e.g., a storage resource on a shared hard drive accessible to a number of users over a network, a list of folders 214, a list of files 216, or a URL (Uniform Resource Locator)-accessible file system 218. In some cases, the type of repository can be used to specify the means of accessing the files. For example, the system 100 can access local repositories located on-device using a different method than remote cloud-based repositories, code repositories, or other URL-accessible repositories. In the particular example depicted, the user has elected to scan a file share 212 using the selection button 210.
In the case in which the user has elected to browse a URL-accessible repository, the system 100 can access the remote repository by making a call to the webpage that exposes the repository. As an example, the system 100 can access one or more files accessible in remote cloud storage or a remote shared drive using a URL. In some cases, remote storage can include shared text-editing files or code repositories.
As another example, the system 100 can access a number of files by importing a list of folders 214 or list of files 216 that specify files located in different hierarchies of a file repository. In the case in which the user has elected to scan a list of folders 214 or files 216, the user can import or upload the list, e.g., a list that contains file paths for each of the folders or files, to the system. The system 100 can then access the file repository and identify the files selected using the list.
In some implementations, the system 100 can provide a file path entry field 220, e.g., for the user to specify the exact location of the files, e.g., either locally or in the cloud, or for the user to specify the location of the file containing the list of folders 214 or list of files 216. In some cases, the system 100 can additionally provide a file path scan search field 230 to allow the user to search the file repository. The file path scan search field 230 can aid the user in determining exactly which files to select for scanning. As an example, the system 100 can use the selection of the repository type to inform where to search user inputs entered in the file path scan search field 230.
In some implementations, the system 100 can provide a mechanism for the user to save a configured file scan in a log, e.g., using a scan-type identifier 240 and label input field 250. For example, the system 100 can maintain a log of previously configured and performed file scans for user access. In some cases, the system 100 can additionally provide a mechanism for the user to save and access scan templates, e.g., templates characterizing different types of scans. As an example, a user can configure a general type of scan and save the general type as a template for similar scans using a template identifier drop-down 245.
Referring back to
As an example, the system 100 can be configured to identify a subset of the files as candidates that are potentially generated using AI based on the presence of an attribute or content indicative of AI usage. For example, the system 100 can identify file extensions that pertain to code languages (Python, SQL, R, etc.) or executable files (.exe, .sh, .cmd, .bin, .run, etc.) as code-containing files that potentially include AI content. The system can also evaluate the number of lines of code in a code-containing file as an indicator of file complexity. As another example, the system 100 can identify a reference to a generative AI tool, library, algorithm, keyword, etc. in the file or in the title of the file.
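A minimal sketch of this identification step is shown below; the keyword list, function name, and the use of a word-boundary regular expression are illustrative assumptions rather than the system's actual implementation:

```python
import re

# Sketch: flag a code-containing file as an AI candidate when its contents
# reference a known AI library or keyword, and count non-empty lines of code
# as a rough complexity indicator. The keyword list is hypothetical.
AI_KEYWORDS = {"tensorflow", "torch", "sklearn", "keras", "transformers", "openai"}

def scan_source(source):
    lines = [ln for ln in source.splitlines() if ln.strip()]
    hits = {kw for kw in AI_KEYWORDS
            if re.search(rf"\b{kw}\b", source, re.IGNORECASE)}
    return {"code_lines": len(lines),
            "ai_keywords": sorted(hits),
            "ai_candidate": bool(hits)}
```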
Referring now to
The system can provide one or more dashboard visualizations to depict the results of the file scan. In some implementations, the system can provide a dashboard visualization that identifies the file distribution of the types of files in the repository. As an example, the dashboard visualization 300 is a count dashboard that represents the number of one or more different types of files. In some implementations, the system can provide a dashboard visualization that identifies the different types of code contained in the AI files. As an example, the dashboard visualization 320 is a pie chart that represents the relative amounts of the code contained in the AI files in the repository.
The system can also provide a display 340 that provides different determined characteristics of each of the files scanned, e.g., characteristics determined using the file scan. In the particular example depicted, the display 340 includes a table 350 with columns corresponding to different determined file characteristics as determined by the scan. In this case, the table 350 includes file names, the last modified date, the owner of the file, an indicator of the presence of keywords in the file, and the number of code lines in the file. As another example, the table 350 can include information about the libraries in the file or the purpose of the file. As yet another example, the table 350 can include metadata information about the files.
In some implementations, the system can identify the presence of sensitive information in the file. For example, the system can identify the presence of personal identifying information. Referring now to
The system can provide the personal identifier scan configuration display 600 to allow a user to selectively identify different types of personal identifying information. In some implementations, the display 600 can enable a user to scan files for credit card data, medical information, contact information, usernames, passwords, or other sensitive data. As an example, the system can employ a regular expression algorithm to match patterns in credit card numbers, medical information, contact information, etc.
In the particular example depicted, the personal identifier scan has been configured to scan for credit card data, as depicted in the expression label field 602. After the user selects the expression label field 602, the system can generate one or more detailed expressions 610, e.g., regular expressions. The system can use one or more regular expressions that correspond with the type of personal identifier of interest selected to search for files that contain the personal identifier information of interest. In some cases, the user can additionally add to, e.g., using the add button 612, edit the autogenerated regular expressions, e.g., using a corresponding edit button 614, or delete any one of the detailed expressions 610, e.g., using a corresponding cancel button 616.
After the personal identifier scan has been configured, e.g., using the display 600, the system can provide a display 650 of results. In the particular example depicted, the system displays a keyword 660, e.g., which can correspond with the expression label field 602 in the personal identifier scan configuration display 600, and a count 670 of the number of instances containing one or more regular expression matches in the file.
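One way to sketch such a regular-expression scan and its per-file match count is shown below; the pattern is a deliberately simplified example for common 16-digit card layouts, not a production-grade personal identifier detector:

```python
import re

# Hedged sketch of a personal-identifier scan: a regular expression matches
# 16-digit card numbers written with optional spaces or dashes, and the
# match count mirrors the per-file count displayed in the results.
CREDIT_CARD_RE = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")

def count_matches(text):
    """Return the number of pattern matches found in the file text."""
    return len(CREDIT_CARD_RE.findall(text))
```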
Referring back to
Referring now to
In some examples, the system 100 can perform a code quality analysis by evaluating the file, e.g., of file path 410, using a set of different error categories, e.g., the error categories 432 as identified in the table 430. In the particular example depicted, the table 430 includes a description 434 of each of the error categories 432 that a file can be associated with as a result of the code quality analysis as well as a count column 436 indicating the number of times that the code quality analysis identified the error category in the file.
More specifically, in display 400, the system has evaluated the file defined by file path 410 against six error categories 432. In this case, the error categories 432 include a fatal error that can prevent further processing, a bug error, an informational error that captures runtime warnings that do not prevent the code from running, a refactor error that quantifies the extent to which the code can be refactored, a convention error that represents violations of coding standards, and a warning error that represents stylistic problems or minor programming issues.
In some implementations, the system 100 can identify warnings by compiling the code, and the system can identify errors, e.g., “bugs”, by running the code. In some cases, the system 100 can identify whether or not the code violates standard convention or could be improved by being refactored, e.g., by evaluating the code's alignment with one or more best practice coding standards. In some cases, a user can provide the coding standards to the system 100 for evaluation.
The count column 436, e.g., the number of instances of the error as determined by the system 100, associated with each of the error categories 432 can be used to determine the code quality score 420. For example, the system 100 can assign weights in accordance with a measure of riskiness for each of the error categories 432. For example, the system can assign a larger weight to a fatal error than a less concerning identified error, e.g., an informational error.
In some implementations, a user can specify that one or more of the errors in the set of error categories 432 are assigned a weight of zero, e.g., so the errors do not contribute to the code quality score 420. As an example, a user can configure which of the set of error categories 432 contribute to the code quality score 420 using the display 400. In the particular example depicted, a user has specified that the informational errors do not contribute to the code quality score 420.
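The weighting scheme described above can be sketched as a weighted sum over category counts; the weights below (including the zero weight for informational errors, matching the depicted example) are hypothetical values, not the system's actual configuration:

```python
# Sketch of the weighted code quality score: each error category's count is
# multiplied by a user-assigned risk weight, and a zero weight removes that
# category from the score. Weights and counts are hypothetical.
ERROR_WEIGHTS = {"fatal": 10.0, "error": 5.0, "warning": 2.0,
                 "refactor": 1.0, "convention": 0.5, "informational": 0.0}

def code_quality_score(counts):
    """Combine per-category error counts into a single weighted score."""
    return sum(ERROR_WEIGHTS.get(cat, 0.0) * n for cat, n in counts.items())
```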
The system 100 can also provide information for identifying specifically which error was captured in the code analysis and related information, e.g., as depicted in the table 450. For example, the system 100 can provide a line number to define where the error is located, the error category (which corresponds with the error category 432), and the error description. In some implementations, the system 100 can also provide a search error category 440 to selectively filter through the errors logged during the code quality analysis. As an example, this can allow a user to search for particular error categories, error codes, etc., which can facilitate the targeted risk assessment of the code-containing file, e.g., for auditing.
Referring now to
In some implementations, the system 100 can provide a map that depicts one or more code-containing files 520 and dependencies 510, e.g., libraries, files, data, or other modules that are considered dependencies for these files 520. For example, a library can be imported or used by the files 520 through a call to an application programming interface. In some cases, libraries are open-source, e.g., available in an unrestricted manner. In other cases, libraries can be closed-source, e.g., proprietary or use-restricted.
In the particular example depicted, the system 100 provides the library linkage map display 500 for three code-containing Python files 520. The library linkage map display 500 includes arrows to link the libraries to their respective files 520 and the files 520 to the respective outputs of the files 530, e.g., the files 520 can contain code that generates a file, e.g., a text-containing file, a figure, a comma-separated value file, etc. The system 100 can also provide a designation of what type of file is being considered, e.g., a file, module, or file operation, and an indication of each file's file scan status, e.g., unscanned with respect to the file scan depicted in
More specifically, the linkage map display 500 can conveniently present a collection of related data that pertains to risk management of code-containing files and that is often diffuse, e.g., disparate and hard to unify, throughout a file repository. Because the linkage map display 500 illustrates dependencies 510, files 520, and outputs 530 together, it can facilitate identifying which dependencies 510 were used in the creation of certain output files 530. The system 100 can streamline version control of code outputs, e.g., code output files, in software development using the linkage map display 500.
In some implementations, the system 100 can also provide security vulnerability information for public libraries, e.g., libraries as evaluated in a vulnerability database 560, e.g., the National Institute of Standards and Technology (NIST) National Vulnerability Database. As an example, when a user clicks on one of the dependencies 510, the system can direct the user to a pop-up display 550 of information about the library version as well as a corresponding security vulnerability identifier 555, e.g., the common vulnerability identifier provided by a governmental entity, by an administrator of the user's system, etc., for an identified security vulnerability of the dependency.
For example, the system 100 can also detect the versions of all libraries 525 installed on the scanned file system 110. In particular, the system 100 can detect multiple versions of a library if installed on a computer, analyze those library versions for cybersecurity vulnerabilities, and provide the link to NIST National Vulnerability Database that identifies those vulnerabilities.
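In Python, building a local inventory of installed library versions of this kind could be sketched with the standard library's `importlib.metadata`; the linkage of each (name, version) pair to the NIST National Vulnerability Database itself is outside this sketch:

```python
from importlib.metadata import distributions

# Sketch of detecting installed library versions for vulnerability lookup:
# enumerate the Python distributions visible in the current environment and
# record each package name with its installed version string.
def installed_library_versions():
    return {dist.metadata["Name"]: dist.version
            for dist in distributions() if dist.metadata["Name"]}
```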
Furthermore, the corresponding security vulnerability identifier 555 can include a link that directs the user to a corresponding webpage which contains further details regarding the security vulnerability identifier 555, e.g., such that the user can browse the corresponding webpage that describes the security vulnerability identifier 555. For example, the system 100 can provide each respective security vulnerability identifier 555 link to the NIST National Vulnerability Database. As another example, the system 100 can provide other related information to the security vulnerability identifier 555, e.g., by opening a pop-up display (not depicted) that includes information from a database of security vulnerability identifiers as provided by an administrator of the user's system.
Referring back to
Referring now to
The system 100 can provide the risk score configuration display 700 for a user to assign a value to each risk factor in order to calculate a measure of riskiness for a file, e.g., a corresponding risk score, based on a combination of user inputs and the results of the file scan. In some cases, the risk score configuration display 700 can also be referred to as a no-code configuration display since the display 700 allows a user to configure the risk score calculation using a user interface, e.g., without the need to change or write code.
More specifically, the system 100 can allow a user to assign a corresponding risk score to each attribute identified from the file scan using the risk score configuration display 700, e.g., such that the user can make a determination of a relative importance of each attribute according to the particular scan being performed. In some implementations, the system 100 can additionally assign a corresponding library risk weight, e.g., by providing a risk factor 740 for each of the libraries. In some cases, the risk factor can be informed by a security vulnerability database, e.g., the NIST database.
The system 100 can then use the weights to calculate a risk score, e.g., by receiving a risk weight parameter for each risk factor and multiplying the assigned value with the risk weight parameter. As an example, the system can calculate a weighted average of the attributes using the corresponding weights to determine the risk score. As another example, the system can receive a corresponding set of threshold values for each risk factor and compare the value for each risk factor to the respective threshold value in order to determine the risk score.
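Both scoring strategies described above can be sketched as follows; the factor names, weights, and thresholds are illustrative assumptions, not values prescribed by the system:

```python
# Sketch of two risk score strategies: (1) a weighted average of scanned
# attribute values using the user-assigned risk weight parameters, and
# (2) a threshold comparison that counts how many factors exceed their
# respective threshold values. All names and numbers are hypothetical.
def weighted_risk_score(values, weights):
    total_weight = sum(weights.values())
    return sum(values[f] * weights[f] for f in weights) / total_weight

def threshold_risk_score(values, thresholds):
    return sum(1 for f in thresholds if values[f] > thresholds[f])
```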
In the particular example depicted, a user is configuring a risk-score calculation for a finance repository 705 using a number of tabs. In the expanded Python tab 710, the user has assigned code quality risk weights 712-722 to each of the error categories corresponding with a code quality analysis, e.g., the code quality analysis described in
After configuring the risk score configuration display 700, the user can elect to run the risk score calculation. In some implementations, the system 100 can output a risk score card display 750 that includes the determined risk score and associated data for each of the files. In the particular example depicted, the system provides a risk score card for the file 752. In this case, the risk score card display 750 includes a risk attribute table 755 to identify how each attribute contributes to the risk score, e.g., by providing a count of the attribute, the risk factor, e.g., the assigned weight, and overall contribution to the risk score.
In some implementations, the system 100 can also provide one or more visualizations in the risk score card display 750 to further illustrate the results. For example, the system 100 can provide a risk contribution bar graph visualization 760 or a pie chart visualization 765. In the particular example depicted, the visualizations 760 and 765 are interactive, e.g., hovering over each bar in the bar graph 760 or portion of the pie chart 765 can display information about the corresponding attribute that contributed to the risk score.
The system 100 can determine a subset of the candidate files as high-risk based at least on the risk factors or the risk score. In some implementations, the system 100 can use the determined risk score to determine whether the file meets certain criteria for further monitoring, e.g., using a risk policy, or can determine to maintain the file in the inventory register 145.
As an example, a user can use the results of the risk score calculation to determine which files to manage further. In some cases, the system 100 can provide an option for a user to elect to maintain a file, e.g., after viewing the corresponding risk score card display 750, the user can select an option to maintain a file under a policy. As an example, the user can configure one or more policies for monitoring the file by specifying a set of rules as the policy. In other cases, the system 100 can be configured to automatically maintain files with risk scores greater than a certain threshold or files with particular attributes greater than a certain threshold level.
In another implementation, the system 100 can employ a risk identification model configured to predict a measure of risk for a set of candidate files. As depicted in
As an example, the risk identification model 785 can be a rules-based model, e.g., the system can compare the input 780 to a set of preconfigured rules that specify conditions for high, medium, and low risk files and can evaluate a measure of riskiness based on the rules the candidate satisfies.
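A minimal sketch of such a rules-based model, with hypothetical attribute names and rule conditions, where the first satisfied rule determines the label:

```python
# Hedged sketch of a rules-based risk identification model: preconfigured
# rules specify conditions for high, medium, and low risk, and the first
# rule a candidate satisfies determines its label. Conditions are illustrative.
RULES = [
    ("high", lambda f: f["code_quality_score"] > 50 or f["pii_matches"] > 0),
    ("medium", lambda f: f["code_lines"] > 500 or f["ai_keywords"] > 0),
    ("low", lambda f: True),  # fallback when no higher-risk rule matched
]

def rules_based_risk(file_attributes):
    for label, condition in RULES:
        if condition(file_attributes):
            return label
    return "low"
```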
As another example, the risk identification model 785 can be a machine learning model, that can be configured to process the input 780 to generate a prediction output 795, e.g., as depicted. In this case, the system 100 can store file metadata indicating whether or not a user elected to monitor the file as a ground truth label that can be used for training the risk identification model, e.g., using the data maintained in the inventory register 145.
More specifically, the system 100 can train the risk identification model 785 using a training subsystem. In this case, the training subsystem 775 trains the risk identification model 785 as a binary classification model based on the random forest algorithm. In particular, the system 100 can train the risk identification model 785 using a machine learning algorithm on an objective function that includes a discrepancy between a predicted risk label and a ground truth risk label. For example, the risk identification model 785 can be trained with a training dataset including end-user computing (EUC) textual attributes 770 and EUC numeric attributes 772, e.g., attributes determined from previously identified and assessed candidate files from the inventory register 145 or another database. More specifically, the risk identification model 785 can be trained using a data set of previously identified and assessed candidates that were progressed and stored in the inventory register 145.
In some cases, the training data can additionally include the results of one or more tests run on files using the AI testing module 180. In particular, the results can be maintained in the inventory register 145 as data relating to model behavior, compliance, or risk status of a file that has had a test run on it. In this case, the model behavior refers to one or more of the bias, fairness, interpretability, validity, reliability, or robustness to data drift metrics of a model that is included in the file, e.g., as determined by the results of a test run using the AI testing module 180.
For example, in the case that a user has elected to run an AI bias test, the system can store the results of the AI bias test, e.g., data that indicates a measure of riskiness and compliance of the file based on whether the results of the model included in the file are biased as attributes, in the inventory register 145. More specifically, the data relating to model behavior, compliance, or risk status of a file can be stored each time a test is run. In this case, the risk identification model 785 can use the bias data from each of the AI bias tests run to learn which aspects of a file contribute to the bias of the model. Similarly, for other tests run using the AI testing module 180, the system can store the results in the inventory register 145 for training the risk identification model 785.
Referring now to
The user can use the risk policy configuration display 800 to configure a policy 810 for one or more files, e.g., an identified group of files in a particular file repository. For example, the user can select one or more report columns for the output of the risk policy using a report column selector 815. In this case, the system can output a report after enacting an action of the policy. As an example, an action can be marking a file for further review by an identified owner of the file. In this case, the system 100 can provide an owner assignment field 820, e.g., such that a user can designate an owner of the file that can access the report. The designated owner can be presented with a checkpoint, e.g., the system 100 can require further verification. In some cases, the verification can be a part of advancing a workflow.
In some embodiments, the system 100 can also provide a selection option, e.g., the radio buttons 830, for a user to select from one or more maintenance options for the file. For example, the options of the radio buttons 830 can include manually maintaining the file, excluding the file from further monitoring, or showing metadata columns in generated reports.
In some embodiments, the system 100 can also provide the user with options for configuring various aspects of the generated report. For example, the system 100 can provide a risk multiplier field 825 for a user to specify a specific risk multiplier for a file. As an example, the user can specify a risk multiplier to either upweight or downweight the risk as calculated in a previous risk score calculation as part of the report. As another example, the system 100 can allow a user to define operations on the files as part of the policy using a business operation calculator 840. In some cases, the business operation calculator can allow a user to query a database, e.g., without knowledge of SQL or NoSQL, by selecting operators and specifying how to use them in a statement. For example, configurable rules can include both arithmetical (e.g., less than, greater than, equal to, contains) and logical (e.g., AND, OR) parameters, allowing for complex rules to be defined.
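One possible sketch of evaluating such configurable rules, composing the comparison operators and nested AND/OR connectives named above (field names, values, and the rule encoding are hypothetical):

```python
import operator

# Illustrative sketch of the no-code rule builder: comparison operators and
# logical connectives are composed into a rule that is evaluated against a
# file record. A rule is either ("AND"/"OR", [subrules]) or a triple of
# (field, comparator name, value).
COMPARATORS = {"less than": operator.lt, "greater than": operator.gt,
               "equal to": operator.eq, "contains": lambda a, b: b in a}

def evaluate(rule, record):
    if rule[0] in ("AND", "OR"):
        results = [evaluate(r, record) for r in rule[1]]
        return all(results) if rule[0] == "AND" else any(results)
    field, comp, value = rule
    return COMPARATORS[comp](record[field], value)
```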
The system 100 can also provide a disposition configuration display 850 to enable a user to take a maintenance action, e.g., by selecting a radio button 865 for a particular policy 860. For example, files can be investigated (e.g., marked for further review), moved to another folder, deleted, or retained using a maintenance action. The system 100 can also allow for the file to be automatically added to inventory by selecting a box 870, e.g., a user can specify that a file should be maintained even if the determined risk score of the file is considered to be low risk. In the case that a user elects for a file to be investigated, the system 100 can allow the user to configure the cadence at which the file is monitored, e.g., by specifying an allowable time elapsed since the last scan in a field 885, e.g., as a number of days after the scan date. For example, the cadence can also be configured as a number of hours, weeks, or months.
As another example, similar files can be grouped together, e.g., using a group file copies checkbox button 875. In this case, the system 100 can group copies or versions of a file in the subset of candidates as a single candidate in the inventory.
The system can present the user with a copy group management display 900 to allow a user to manage copy groups. For example, the system 100 can be configured to perform one risk evaluation for each copy group, e.g., since the files are similar or the same, using the copy group management display 900. In some embodiments, the copy group management display 900 can include an indication of file repository 905 and a table 910 that displays the group names 920 assigned to each file grouping.
More specifically, grouping similar files together can streamline the risk assessment process and reduce computational resources by providing an option to group similar files together for assessment. For example, files in a group can be accounting or banking statements, which are all unique, but not different in terms of the content they contain. In this case, a user can elect to manage multiple copies of a balance sheet, income statement, and cash flow statement within the same copy group, e.g., since the overall content of what the file represents is the same, even though every spreadsheet is unique.
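Copy grouping of this kind could be sketched by grouping files on a content digest, so that a single risk evaluation covers each group; the file names and contents below are hypothetical:

```python
import hashlib
from collections import defaultdict

# Sketch of grouping file copies: files with an identical content hash are
# treated as one copy group, so one risk evaluation can cover the group.
def group_copies(files):
    """Map a content digest to the list of file names sharing that content."""
    groups = defaultdict(list)
    for name, content in files.items():
        digest = hashlib.sha256(content.encode()).hexdigest()
        groups[digest].append(name)
    return dict(groups)
```

A real implementation might group on near-duplicate similarity rather than exact hashes, since copies of a statement may differ slightly.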
In some embodiments, the system 100 can display a copy group list display window 950 upon receiving an indication that a user has selected a copy group, e.g., for viewing. In this case, the copy group list display window 950 includes information about the quarterly cash flows copy group list 960. The user can navigate between copy group list display windows and the copy group management display 900 to manage the copy groups.
Referring back to
Referring back to
Referring now to
The inventory configuration display 1100 can be used to generate a corresponding inventory form 1150, e.g., using the inventory module of
In some implementations, the inventory configuration display 1100 can include a configuration tab 1115 that can allow a user to add different entry items, e.g., configured data entry fields for the same data entry object, to the corresponding inventory form 1150. For example, the system 100 can provide a data entry button 1120 for adding different data entry items to any existing data entry items, e.g., the data entry items 1170, 1174, 1176, 1182, 1186, 1190, and 1194. In the case that the user selects the data entry button 1120, the user can add a data entry item, e.g., a title 1125, a data type 1130, e.g., text, number, etc., and other corresponding fields that will be included in the form to solicit the input of the respective data for each data entry item. The system 100 can also provide a data entry field identifier 1128 for each entered data entry item, e.g., as the user enters each data entry item, the system 100 can automatically assign a corresponding data entry identifier.
The system 100 can provide form display options in the configuration display 1110. As an example, the corresponding fields can be separated into different containers, e.g., portions of the form. In some implementations, the system 100 can provide the user the option to segment the data entry fields into tabs, e.g., where each tab includes a subset of the data entry items and the user can select between tabs to view different subsets of items. In this case, the user can input a tab title in a tab column 1132 for each data entry item. As another example, the corresponding fields can be separated into groups, e.g., data entry items can be grouped together on the form, e.g., on the same tab of a form. In this case, the user can input a group title in a group column 1134.
In some implementations, one or more of the corresponding fields can be calculated from one or more entered fields. As an example, the user can provide an indication that the data entry field is a calculable 1140 data entry. In this case, the system 100 can provide the user the option to calculate a data entry field using the input of one or more data entry fields, e.g., by enacting one or more operations on the data entry fields using the field identifier 1128 to identify respective fields as operands for each operation. For example, the system 100 can allow for the multiplying or dividing of two or more inputs to number data fields, concatenating two or more inputs to text fields, running a regular expression on an input to a text field, using simple rules-based calculations on one or more inputs, etc., to populate the entry for a calculable field.
In some implementations, the data entry field calculator display 1200 includes a variety of operations 1210 that can be selected by choosing a corresponding button. Each operation can operate on an operand specified by the data entry field identifier 1128. More specifically, the system 100 can provide a user with operational buttons such that a user that is not familiar with code can configure the calculation. The system 100 can also provide a corresponding dictionary 1260, e.g., to provide definitions for each of the operations 1210. As an example, the definitions can be provided in a portion of the data entry field calculation display 1200 juxtaposed with the operations.
As an example, the system 100 can provide one or more mathematical operations 1220, e.g., for addition, subtraction, multiplication, division, taking a min or max, etc., and can provide one or more logical operations 1230, e.g., if, and, or, not. As another example, the system 100 can provide other operations 1240, e.g., parentheses for asserting the order of operations, relational operations, a comma for use in relational operations, etc. As yet another example, the system 100 can provide one or more datetime operations, e.g., a button to calculate the current time or day in datetime format, a date difference, and date addition button.
In some implementations, the data entry field calculator display 1200 can additionally include an equation display portion 1270 that allows a user to view the equation they are creating as a calculable field by selecting the operations 1210. In this case, the user can select operations 1210 and edit the calculation directly in the equation display portion 1270, e.g., by inputting the data entry field identifier of the corresponding operand or correcting inadvertent errors made while selecting buttons. In the particular example depicted, the equation display portion 1270 includes a chained calculation, e.g., a calculation created with multiple operations. As an example, the user may need to edit the calculation directly using the equation display portion 1270, e.g., when proofreading the equation after entering it. As another example, the user can view the equation before saving the equation, e.g., using the save button 1280.
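A small sketch of evaluating a calculable field from operand field identifiers is shown below; the operation names, field identifiers, and entry values are illustrative assumptions:

```python
# Hedged sketch of a calculable data entry field: the calculator's named
# operations are mapped to functions, and operands are referenced by data
# entry field identifier (e.g., "F1"). Operation names are illustrative.
OPERATIONS = {
    "MULTIPLY": lambda a, b: a * b,
    "DIVIDE": lambda a, b: a / b,
    "CONCAT": lambda a, b: f"{a}{b}",
    "MIN": min,
    "MAX": max,
}

def calculate_field(op_name, operand_ids, entries):
    """Apply a named operation to field values looked up by identifier."""
    args = [entries[field_id] for field_id in operand_ids]
    return OPERATIONS[op_name](*args)
```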
Referring back to
In some implementations, the system 100 can allow the user to configure the input portions to be drop-down menus, radio buttons, multi-line inputs, etc. For example, the system 100 can provide the option for the user to specify whether the corresponding input portion for a data entry item is multiple lines using a multi-line indication column 1136 and whether the corresponding input portion for a data entry item is selected from a list, e.g., a drop-down, using a list indication column 1138. For example, the user has elected that the input portions for the data entry items 1170 and 1174 not be lists, and that the input portion for data entry item 1174 include multiple lines, e.g., as specified by the columns 1136 and 1138. In the case that the data entry item is selected from a list, e.g., for the EUC Owner 1176, the system can receive the list options from the user, e.g., the user can select the pencil icon 1140 in the row and enter the list options in a list option input portion.
As an example, the corresponding inventory form 1150 can be provided to multiple users for data entry, and the system 100 can maintain the data entered using the form in the inventory register 145. More specifically, the system 100 can process the data entered in each corresponding inventory form 1150 to add to a database, e.g., to populate a row in the database.
In some cases, the system 100 can determine the risk of a file in the inventory based on the matrix as depicted in
In particular, the data entered into the inventory, e.g., by a user using the corresponding inventory form 1150, can pertain to files. In this case, the files are maintained in the inventory in the inventory register 145. The system can monitor the data entered into the inventory, e.g., the data pertaining to the files. As an example, the system can run a risk assessment of each file entered into the inventory to provide a risk value for each of the files, e.g., a risk value corresponding with the risk assessment matrix 1300 specifying the associated risk based on a measure of significance and a complexity or likelihood of error.
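The risk assessment matrix pairing a measure of significance with a likelihood of error can be sketched as a simple lookup. The level labels and the particular cell values below are illustrative assumptions rather than the matrix 1300 itself.

```python
# Hypothetical 3x3 risk matrix: (significance, likelihood of error) -> risk.
RISK_MATRIX = {
    ("low", "low"): "low",       ("low", "medium"): "low",     ("low", "high"): "medium",
    ("medium", "low"): "low",    ("medium", "medium"): "medium", ("medium", "high"): "high",
    ("high", "low"): "medium",   ("high", "medium"): "high",   ("high", "high"): "high",
}

def assess_file_risk(significance, likelihood_of_error):
    """Return the risk value for a file in the inventory."""
    return RISK_MATRIX[(significance, likelihood_of_error)]

print(assess_file_risk("high", "low"))   # medium
print(assess_file_risk("high", "high"))  # high
```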
In some cases, the system can run a risk assessment upon receiving a completed data entry form, e.g., when adding a new row to a database. In other cases, the system can run the risk assessment as part of a policy that has been configured for a file in the inventory. In particular, files that are added to an inventory for maintenance, e.g., as described in
Referring back to
Referring now to
For example, as depicted in
In some implementations, the system 100 can provide an option 1015. In the case that a user selects the option 1015, the system 100 can present the user with suboptions, e.g., both generic suboptions and suboptions associated with the particular file type, e.g., in a second dropdown menu 1020. As an example, the system 100 can provide generic options to further monitor the file, edit metadata, add the file to or remove it from the inventory register 145, compare versions, check in the file, generate a rename event, etc. As another example, the system 100 can provide options associated with the particular file type. In this case, since the file 1010 is a code-containing file, e.g., a Python file, the system 100 can include an option to open the AI validator option 1030, which is a software suite that provides various tests that a user can select to be run on a file.
In particular, in response to selecting the AI validator option 1030, the system 100 can present the display 1040, as depicted in
As an example, the AI fairness assessment can break down the performance of an algorithm by a sensitive feature, such as a class protected by law, and compare whether there is a significant difference in performance across the classes. As another example, the AI interpretability assessment can include feature importance analysis, sensitivity analysis, or partial dependence analysis, e.g., to validate the relationship between the input and predicted output generated using the file. As yet another example, the AI validity assessment can include cross-validation, holdout validation, or comparative analysis, e.g., to assess generalization of the output of the file based on unseen input data. As a further example, the AI reliability assessment can include reproducibility testing, robustness evaluation, or adversarial testing, e.g., to validate the consistency of the output of the file. As yet a further example, the data drift assessment can include a temporal stability analysis, feature distribution comparison, or prediction drift monitoring, e.g., to evaluate the output of the file based on gradually changing input data over time.
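The fairness assessment described above can be sketched as breaking down a model's accuracy by a sensitive feature and flagging a significant gap across classes. The toy labels, group identifiers, and the 0.1 gap threshold are illustrative assumptions.

```python
def accuracy_by_group(y_true, y_pred, groups):
    """Compute per-group accuracy for a sensitive feature."""
    stats = {}
    for yt, yp, g in zip(y_true, y_pred, groups):
        correct, total = stats.get(g, (0, 0))
        stats[g] = (correct + (yt == yp), total + 1)
    return {g: c / t for g, (c, t) in stats.items()}

def fairness_gap(y_true, y_pred, groups):
    """Largest accuracy difference between any two groups."""
    acc = accuracy_by_group(y_true, y_pred, groups)
    return max(acc.values()) - min(acc.values())

# Toy predictions split across two classes of a sensitive feature.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gap = fairness_gap(y_true, y_pred, groups)
print(round(gap, 2), "significant" if gap > 0.1 else "ok")  # 0.25 significant
```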
As an example, when a user selects the AI fairness test option 1045, the system 100 can run the test, e.g., using a software module configured to generate the AI fairness test report, and present the results in a result presentation display 1050 as depicted in
The system 100 can also store data relating to the test results in the inventory register 145. In particular, the system 100 can maintain test data as attributes, e.g., text or numeric attributes, related to the file that can be used to train a machine learning model configured to evaluate one or more of model behavior, compliance, or risk status. For example, the system 100 can use data relating to test results run using the AI testing module 180 as part of the dataset that is used to train the risk identification model 785, as described in
Referring back to
The system can enact the specified sequence of actions in the workflow in response to a detected change in the file to which the workflow pertains, e.g., according to the current task, using the workflow module 160. More specifically, after each task is completed, the system 100 can enact the next checkpoint task to ensure that proper approval and monitoring is received. In some cases, the system 100 can prevent the workflow from advancing by locking access to the file to which the workflow pertains until the checkpoint is addressed.
In some embodiments, the workflow module 160 can provide a display that allows a user to configure a workflow by arranging task icons in a particular order.
Now, referring to
In the particular example depicted, the user is configuring a workflow for approval of an entered form that includes end-user computing (EUC) details by dragging-and-dropping task icons 1410 and arranging the task icons 1410 into an order using links. In this case, the workflow can include the completion of EUC details 1412, approval of the EUC registration by a different user, e.g., the direct manager 1414 and the registration reviewer 1416, a final task 1418, e.g., a task that indicates final approval as a completion, and a cancellation of the started workflow, e.g., as shown in the workflow representation display 1420, in the case that the EUC details are not approved. In the case that the workflow is cancelled, the system 100 can be configured to restart the workflow back at the first task in the workflow, e.g., in response to the completion of EUC details a subsequent time.
As an example, the checkpoints involved in the workflow depicted include awaiting approvals, e.g., the approval checkpoints 1430 and 1432, going into production 1434, requiring more information 1436, or getting cancelled 1438. Each checkpoint is placed in accordance with its role in the sequence of tasks, e.g., in order to advance the workflow. In some cases, the user can select the type of link from a set of options provided for the link. For example, the system 100 can provide different types of links based on the type of tasks being connected for user selection. In other cases, the system 100 can infer the type of link between two tasks. For example, the link between the completed end-user computing (EUC) details task 1412 and the line manager approval task 1414 can be inferred as awaiting direct manager approval. As another example, an intermediate disapproval in the sequence of tasks can be inferred as requiring more information 1436.
Each task can be assigned to a user, e.g., to generate information as part of an audit trail. In some cases, the completion of the task can depend on the user filling out a form to log information regarding to the task.
In the particular example depicted, a user is assigned a review task, e.g., the next review task 1510. In this case, the administrator has used the metadata fields tab to facilitate owner look-up for assignment 1520, e.g., the administrator has assigned the line manager as the owner 1530. In other cases, the administrator can identify the user from a list of users, e.g., using one or more desired users' names, or from an established group of users, e.g., a defined permissions group. In some cases, the task can be assigned to a configured field in the inventory form, e.g., rather than a static user or group. In other words, a configurable workflow with corresponding tasks can be created using a no-code configuration display, and the workflow task can be assigned to a field value of a configurable inventory or metadata form that is created using the configuration display, for ultimate flexibility and convenience of users.
In some embodiments, the workflow configured can specify the sending of notifications to one or more of the identified users associated with the workflow, e.g., to approve a file in order to advance the workflow. As an example, the system 100 can provide an approval checklist configuration display 1550 to allow a user to configure a checklist for the user assigned to the specific task, e.g., the system 100 can require that the user fill out and submit the checklist in order to complete the task assigned to them.
For example, the approval checklist configuration display 1550 can include a question entry 1560 portion and a question table 1570 that includes each question entered into the approval checklist, e.g., using the question entry portion. In the particular example depicted, the approval checklist configuration display 1550 includes a question input portion 1562, an indication of what the correct answer is 1564, e.g., an answer that can enable the workflow to advance to the next task, an indication of whether or not the question needs to be answered 1566, and a default answer 1568. In this case, the administrator has selected that the necessity of the correct answer is mandatory, e.g., that the workflow cannot advance unless the user enters an answer for the question.
In some implementations, each question entered can be configured for each task, e.g., as depicted in the question table 1570. Similar to the configuration of the corresponding inventory form 1150 as described in
Referring back to
In some cases, the system 100 can log audit trail data in the inventory register 145, e.g., to maintain all system-derived information pertaining to the file in one location. As an example, the system 100 can store and analyze user behavior data for exceptions, such as a user modifying a file at a time when the user usually does not interact with the file, or a user making a type of change they have never made before. As another example, the system 100 can store and analyze the types of modifications being made to a file.
In some implementations, the change management module 150 can determine a measure of risk for the detected change, e.g., using a machine learning model with any appropriate architecture configured to process data relating to the change to predict a measure of risk based on the detected change. For example, the system 100 can be configured to train the machine learning model using logged data relating to previously detected and assessed changes by updating the respective values of parameters of the machine learning model.
As an example, the system 100 can allow a user to monitor actions taken with respect to the files in the inventory register 145. In this case, a subset of the file directory display portion 1610 is depicted in the monitoring display portion 1620. In some implementations, the system 100 can be configured to monitor files with associated risk labels 1612. For example, the system 100 can identify and perform one or more automated tests based on the associated risk labels 1612, e.g., the one or more automated tests can be selected from the options presented in the display 1040
In some implementations, the system 100 can use the change management module 150 to monitor a file in order to determine when it was last opened or last modified 1614 as well as any changes made during the last modification. The modifications can be viewed in the monitoring display portion 1620, e.g., the system 100 can provide a table of the most recent changes recorded with respect to the files in the inventory. In this case, the table can include an action column 1622, e.g., the action as logged by the change management module 150. As an example, the action column 1622 can include an indication of a file change, an ownership change, the addition of a change to a repository, etc. As another example, the table can include details of what was changed 1624, e.g., modifications to code, the user associated with the change 1626, and comments associated with the change 1628.
For example, the system can provide a behavior exception configuration display 1700 to allow a user to configure behaviors the system can flag as exceptions. The administrator can use the behavior exception configuration display 1700 to configure an action in response to detected user access or editing behavior that is anomalous, e.g., with respect to previous user behavior that has been assessed to determine whether or not a logged exception was a true exception. For example, a behavior exception can include a user editing the file very late at night when historically the user had tended to open the file during the day. As another example, a behavior exception can include a user making a series of small changes to a file that had previously been in production and not edited for over three years.
The behavior exception configuration display 1700 can enable a user to configure a behavior analysis policy 1710 for when a particular user behavior exception is detected. In some implementations, the system 100 can train a machine learning model to detect user behavior access and edit exceptions relative to previously logged user behavior, e.g., based on a measure of criticality of each user behavior and edit exception in the previously logged behavior. In other implementations, the system 100 can use rules, e.g., the defined exception rules 1720, to detect anomalous user behavior.
As an example, the user can use the behavior exception configuration display 1700 to select a policy to apply for one or more defined exception rules 1720, e.g., static rules provided by the system. In the particular example depicted, the static rules can be configured as templates, e.g., the X 1724 and Y 1726 values can be variables that the user provides to enact the desired rule. In some implementations, the user can provide a support file, e.g., a text editor or spreadsheet file that contains a list of changes, as the X 1724 and Y 1726 values. More specifically, the configuration of exception templates allows for control over the specific types of changes in the one or more files that are being monitored for behavior exceptions using the behavior analysis policy 1710.
In this case, the user has elected that the behavior analysis policy 1710 pertains to any user making changes in a new sheet, e.g., a spreadsheet sheet, that matches a provided list of changes for tracking, e.g., the Significant Changes 1728 for the first time 1726. As another example, the user can select that the policy applies to a change in X values, e.g., using a list of spreadsheet column names, in Y duration, e.g., using a specified time period, e.g., 5 days, 2 weeks, 1 month, etc.
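A templated exception rule of the "change in X values in Y duration" form can be sketched as a sliding-window check over a file's change timestamps. The variable names, event format, and thresholds below stand in for the X and Y template values the user supplies and are illustrative assumptions.

```python
from datetime import datetime, timedelta

def detect_exception(change_times, x_max_changes, y_window):
    """Flag an exception when more than X changes fall inside any
    Y-length window of time."""
    change_times = sorted(change_times)
    for start in change_times:
        # Count changes inside the window beginning at this change.
        window = [t for t in change_times if start <= t < start + y_window]
        if len(window) > x_max_changes:
            return True
    return False

# A late-night burst of edits to a long-dormant production file.
base = datetime(2024, 1, 1, 23, 0)
edits = [base + timedelta(minutes=m) for m in (0, 2, 5, 7)]
print(detect_exception(edits, x_max_changes=3, y_window=timedelta(minutes=10)))  # True
```

In the described system, a rule like this would be one entry among the defined exception rules 1720, with its action chosen from the behavior exception actions 1730.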
The behavior exception configuration display 1700 can provide one or more exception action options, e.g., the system 100 can provide the behavior exception actions 1730 as checklist buttons for a user to select as part of the behavior analysis policy 1710. For example, the behavior exception actions 1730 can include highlighting an event audit trail, marking the file read-only until verified, or sending a notification to a user or an owner. As another example, the behavior exception actions 1730 can include starting a workflow or restricting file check-in to the inventory register 145.
More specifically, the design change policy 1810 can be configured to monitor design additions 1820, e.g., charts, formulas, pivot tables, etc., modifications 1830, e.g., formulas, objects, widgets, etc., deletions 1840, e.g., charts, hyperlinks, scenarios, etc., and other design changes 1850, e.g., cell format changes, comment changes, modification of conditional formatting, etc., in the one or more files. In the particular example depicted, the user has elected to monitor modifications to the links, queries, and VBA code and deletions to the links, pivot tables, queries, and VBA code, e.g., as shown by the checked checkboxes.
The system 100 can allow the user to select the action to perform in response to the specified detected change, e.g., using the action selection 1860. In the particular example depicted, the system 100 can be configured to highlight the event audit trail, send a notification to a particular user, send a notification to the owner of a file, make the file read-only until the change is verified, start a workflow pertaining to the file, restrict the file check-in to the system until exceptions are verified, or force verification before allowing a change of status. In this case, the user has elected to send a notification to a specific user, e.g., by selecting the notification checkbox 1862, but has not yet identified the user. In some cases, when the user selects the person icon 1864 to the right of the send notification checkbox 1862, the system 100 can provide a comprehensive list of the users of the system, e.g., for the user to choose the user to send the notifications to.
The system can access a repository of files (step 1910). In some embodiments, the system can access the repository of files in a first scan in accordance with a set of one or more configurable scanning parameters. In this case, the system can identify one or more files in the first scan according to the configurable scanning parameters. In some cases, the system can use the results of the first scan in a second scan, e.g., according to a second set of configurable scanning parameters. For example, the second set of scanning parameters can include one or more of target folders, target drives, file types, file age, file compression technology, scan depth, and keywords.
The repository of files can be stored on one or more computer-readable storage devices, e.g., a laptop, a desktop computer, a tablet, or a smart phone; a remotely-accessible file system that can be accessed using a network, e.g., the internet, an intranet, etc.; or a cloud repository file system, etc. In some cases, the system can be configured to access a subset of the repository of files, e.g., files in a particular directory in the repository, files including a specified keyword, or files with a particular file extension, e.g., a .py, .xlsx, .xlsm, .exe, etc.
The system can analyze the files to identify one or more candidate files that potentially include an attribute or content indicative of AI usage (step 1920). In some embodiments, the system can identify one or more attributes of the file, e.g., using one or more of file metadata, file location, or file contents. For example, the system can identify a file name, file type, attributes specific to the file type, one or more file authors, and organizational information pertaining to the file. As another example, the system can identify one or more dates of modification, a date of last access, and an author associated with the last access. As yet another example, the system can determine a file size, identify the location of the file in the repository (with respect to a file hierarchy), or determine security rights of the file in the repository.
As a further example, the system can identify one or more attributes or content indicative of AI usage in the file. In this case, the system can identify the potentially AI-generated files using a file extension, a reference to a generative AI tool, library, algorithm, keyword, etc., or a number of lines of code. In another case, the system can identify files that pertain to a certain subject or that were modified by a certain user or groups of users.
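Identifying candidate AI files by extension and keyword can be sketched as a simple two-stage check. The extension and keyword lists below are illustrative assumptions, not an exhaustive set used by the system.

```python
# Hypothetical indicators of AI usage: file extensions commonly associated
# with models or notebooks, and references to AI tools or libraries.
AI_EXTENSIONS = {".py", ".ipynb", ".pkl", ".h5", ".onnx"}
AI_KEYWORDS = {"tensorflow", "torch", "sklearn", "openai", "llm"}

def is_ai_candidate(filename, contents):
    """Return True if the file's extension or contents suggest AI usage."""
    ext = "." + filename.rsplit(".", 1)[-1] if "." in filename else ""
    if ext in AI_EXTENSIONS:
        return True
    lowered = contents.lower()
    return any(kw in lowered for kw in AI_KEYWORDS)

print(is_ai_candidate("report.txt", "Quarterly results prepared with an LLM."))  # True
print(is_ai_candidate("budget.csv", "month,amount\nJan,100"))  # False
```

A fuller scan would also weigh the other attributes named above, such as the number of lines of code, the modifying users, and the file's location in the repository hierarchy.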
The system can identify a corresponding set of attributes for each candidate (step 1930). For example, the system can determine a set of attributes using file metadata. In some cases, the system can identify the presence of sensitive personal identifying information or identify attributes based on security permissions for the file. As another example, the system can identify attributes by performing a code quality assessment to detect one or more types of errors or warnings and vulnerabilities associated with each identified library version in a code-containing file.
As yet another example, the system can evaluate associated input and output dependencies including libraries, files, or import modules for candidate files. In some implementations, the system can display a visual representation of the identified input and output dependencies for each candidate on a graphical user interface. For example, the visual representation can include an interconnected AI map of one or more objects representing a sequence of inputs to outputs in accordance with the dependencies at a specified hierarchy level. As another example, the objects can have characteristics that are indicative of dependency information including a type of dependency and a scan status. In this case, the system can allow a user to view the candidate, add the candidate to inventory or monitoring, assign the candidate to a policy categorization, or remove the candidate from future scans, e.g., through an interaction taken on the graphical user interface.
The system can determine a subset of candidates based on the attributes (step 1940). For example, the system can identify a subset of candidates based on a risk score for each candidate determined using the attributes. As another example, the system can assign a label to each candidate using a calculator engine configured to enact one or more arithmetical and logical operations as a calculation on the candidates based on the corresponding set of attributes. In some cases, the system can allow for the grouping of copies of a file as a single candidate or versions of a file as a single candidate.
In some embodiments, the system can assign a risk weight to each attribute, determine a number of instances of each attribute, and multiply the risk weight with the number of instances of the attribute to determine an aggregate risk score for the candidate as the risk score. In some cases, the system can additionally receive one or more threshold values indicative of a level of risk and can compare the risk score of each candidate to each threshold value to determine the level of risk for each candidate. In some embodiments, the system can calculate a percentage of risk contribution for each attribute and can display the percentages in a risk score card that includes a visual representation of the percentages on a graphical user interface.
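The aggregate risk score, threshold comparison, and percentage contributions described above can be sketched as follows. The attribute names, weights, counts, and threshold levels are illustrative assumptions.

```python
def aggregate_risk(weights, counts):
    """Sum of (risk weight x number of instances) over all attributes."""
    return sum(weights[a] * counts.get(a, 0) for a in weights)

def risk_level(score, thresholds):
    """Compare the score against ascending (threshold, label) pairs."""
    level = "low"
    for limit, label in thresholds:
        if score >= limit:
            level = label
    return level

def risk_contributions(weights, counts):
    """Percentage of the aggregate score contributed by each attribute."""
    total = aggregate_risk(weights, counts)
    return {a: 100 * weights[a] * counts.get(a, 0) / total for a in weights}

weights = {"pii_field": 5.0, "ai_library": 3.0, "external_link": 1.0}
counts = {"pii_field": 2, "ai_library": 4, "external_link": 3}
score = aggregate_risk(weights, counts)  # 5*2 + 3*4 + 1*3 = 25.0
print(score, risk_level(score, [(10, "medium"), (20, "high")]))  # 25.0 high
```

The contribution percentages returned by `risk_contributions` are the values a risk score card could render as a visual breakdown.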
As another example, the system can determine the subset of candidates using a risk identification model. In some cases, the risk identification model can be a machine learning model that is configured to process a set of file attributes for each candidate to generate an indication of whether the candidate is a high-risk candidate. In this case, the system can train the risk identification model on a dataset of previously identified and assessed candidate files, e.g., by predicting the subset of candidates that are high-risk using the risk attributes of the training data set and minimizing a discrepancy between predicted indications and ground truth risk indications from the training data set.
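One way to realize such a risk identification model is logistic regression trained by gradient descent, minimizing the cross-entropy discrepancy between predicted and ground truth risk indications. The toy attributes and labels below are illustrative assumptions; the actual model architecture is unspecified.

```python
import math

def train(features, labels, lr=0.5, epochs=500):
    """Fit a logistic regression on previously assessed candidates."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            err = p - y  # gradient of the cross-entropy loss
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict_high_risk(model, x):
    w, b = model
    return 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b))) > 0.5

# Toy attribute vectors: (has_pii, ai_library_count); label 1 = high risk.
X = [(1, 3), (1, 2), (0, 0), (0, 1)]
y = [1, 1, 0, 0]
model = train(X, y)
print(predict_high_risk(model, (1, 2)))  # True
print(predict_high_risk(model, (0, 0)))  # False
```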
The system can then perform one or more automated tests on the determined subset of candidates (step 1950). In some embodiments, the system can enact one or more actions to the subset of candidates based on an assigned label. For example, the system can retain the subset of candidates for further assessment. As another example, the system can delete the subset of candidates from the repository, move the subset of candidates to a new location in the repository, or add the subset of candidates to a maintained file repository. As yet another example, the system can start a workflow using a pre-built workflow template. As a further example, the system can identify and perform one or more automated tests on the subset of candidates based on the assigned label.
As a further example, the system can perform one or more of an automated fairness assessment, interpretability assessment, validity assessment, reliability assessment, and data drift assessment. In this case, the system can add results of the one or more automated tests in an inventory record. In some cases, the system can start a workflow based on a result of the one or more automated tests or send a notification to one or more identified relevant users based on a result of the one or more automated tests.
The system can receive a first set of user-inputs specifying an entry to a metadata field (step 2010). For example, the system can receive the user-inputs through a user-interface that includes a user-configurable identification of a set of metadata fields. In this case, the system can receive the first set of user-inputs including multiple attributes pertaining to data to be stored in a database, where each user-input specifies, for each of the multiple attributes, an entry corresponding to at least a subset of the metadata fields. In some cases, one or more of the multiple attributes can be calculable fields that are derived from at least one of the metadata fields. In this case, the system can derive the calculable fields using a calculator engine configured to enact one or more arithmetical and logical operations to the at least one metadata field.
In some embodiments, the first set of user-inputs can include a corresponding set of color, placement, and containerization instructions relating to one or more of a tab and group layout, wherein the set of color, placement, and containerization instructions indicate at least a location and color of the first set of user-inputs for display on the electronic display device. As an example, an administrator can configure the form using the first set of user-inputs.
The system can generate a data entry form configured to accept a second set of user-inputs pertaining to values for the metadata fields (step 2020). For example, the system can automatically generate the form based on the user-configurable identification of the set of metadata fields. In this case, the system can process the first set of user-inputs to display the multiple attributes according to the corresponding set of color, placement, and containerization instructions received in step 2010.
The system can then receive data including the second set of user inputs from the data entry form (step 2030), and the system can store the data in a database (step 2040). For example, the system can receive the second set of user inputs from the display of the data entry form on an electronic display device. In the case that a database configured to maintain the data included in the second set of user inputs does not exist, the system can generate the database based on the first set of user-inputs.
The system can receive a sequence of tasks in a workflow through an arrangement of task icons using placeable links (step 2110). For example, the system can receive the arrangement of task icons through a user interface including a set of task icons indicative of respective tasks and one or more placeable links for connecting pairs of task icons. In this case, the system can identify the sequence of tasks in the workflow through an arrangement of the task icons into an order using the one or more placeable links and can also receive an assignment of each task icon to one or more users or groups of users from the graphical user interface. In some embodiments, the one or more users or groups of users are dynamically assigned using a user-input value from a pre-configured metadata form.
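Deriving the task sequence from an arrangement of task icons connected by placeable links can be sketched as walking a chain of (from, to) pairs. The task names below are illustrative assumptions echoing the EUC approval example, and the sketch assumes a simple linear chain rather than a branching workflow.

```python
def sequence_from_links(links):
    """Recover the ordered task sequence from (from_task, to_task) link
    pairs forming a simple chain."""
    targets = {dst for _, dst in links}
    # The start task is the one no link points to.
    current = next(src for src, _ in links if src not in targets)
    order, step = [current], dict(links)
    while current in step:
        current = step[current]
        order.append(current)
    return order

links = [("euc_details", "manager_approval"),
         ("manager_approval", "registration_review"),
         ("registration_review", "final_approval")]
print(sequence_from_links(links))
# ['euc_details', 'manager_approval', 'registration_review', 'final_approval']
```

In the described system, each recovered task would additionally carry its user or group assignment, including a dynamic assignment drawn from a pre-configured metadata form field.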
The system can enact one or more tasks in the sequence of tasks by generating a workflow task form (step 2120). More specifically, the system can generate a workflow task form configured to accept one or more user-inputs pertaining to values associated with an outcome of the task in accordance with a user-configured identification of the one or more user-inputs solicited from the one or more users or groups assigned to the task for completion of the task. In some cases, the user-configured identification of the one or more user-inputs includes a set of questions indicated as required or optional for task completion.
The system can receive data including user-inputs from the workflow task form (step 2130), and the system can store the data in a database (step 2140). For example, the system can receive the data from the display of the workflow task form on an electronic display device. In some cases, the system can prevent the workflow from advancing based on one or more user-inputs.
The system can monitor a file automatically for changes (step 2210). For example, the system can receive an indication that one or more files should be monitored for modification or user behavior exceptions. As an example, modification can include insertion or deletion of content, e.g., including design modifications. As another example, user behavior exceptions can include a user modifying a file at a time when the user usually does not interact with the file or a user modifying a file that has not been modified in a particular duration of time.
In response to a detected change, the system can generate an audit trail including data derived from the detected change (step 2220). In some embodiments, the system can determine a measure of risk for the detected change. In this case, in response to an identification of a high-risk change, the system can identify and execute one or more tasks. For example, the system can alert one or more users or groups of users, start a workflow, or secure the file against further editing. As another example, the system can require verification of the detected change or highlight the change.
The system can store the generated audit trail in a database (step 2230). In some cases, determining the measure of risk for the detected change includes processing data relating to the detected change using a machine learning model to predict a measure of risk in response to the detected change. In this case, the system can train the machine learning model using the data logged in the database relating to previously detected changes and user actions taken in response to the detected change.
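Steps 2210 through 2230 can be sketched as a monitor that compares a file's current content hash against the last recorded hash and appends an audit trail record when a change is detected. The record fields below are illustrative assumptions modeled on the audit data described above.

```python
import hashlib
from datetime import datetime, timezone

class ChangeMonitor:
    def __init__(self):
        self.last_hash = {}     # path -> last recorded content hash
        self.audit_trail = []   # generated audit trail records

    def check(self, path, contents, user):
        """Record the file state; return True if a change was detected."""
        digest = hashlib.sha256(contents.encode()).hexdigest()
        previous = self.last_hash.get(path)
        self.last_hash[path] = digest
        if previous is not None and previous != digest:
            self.audit_trail.append({
                "file": path,
                "user": user,
                "time": datetime.now(timezone.utc).isoformat(),
                "action": "file change",
            })
            return True
        return False

monitor = ChangeMonitor()
monitor.check("model.py", "v1", "alice")       # baseline recorded, no change
print(monitor.check("model.py", "v2", "bob"))  # True: change detected
print(len(monitor.audit_trail))                # 1
```

A production monitor would persist the audit trail to the database rather than an in-memory list, and feed the records to the risk model described above.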
This specification uses the term “configured” in connection with systems and computer program components. For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions. For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions.
Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them. Alternatively or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
The term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can also be, or further include, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A computer program, which may also be referred to or described as a program, software, a software application, an app, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages; and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.
In this specification, the term “engine” is used broadly to refer to a software-based system, subsystem, or process that is programmed to perform one or more specific functions. Generally, an engine will be implemented as one or more software modules or components, installed on one or more computers in one or more locations. In some cases, one or more computers will be dedicated to a particular engine; in other cases, multiple engines can be installed and running on the same computer or computers.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by special purpose logic circuitry, e.g., an FPGA or an ASIC, or by a combination of special purpose logic circuitry and one or more programmed computers.
Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. The central processing unit and the memory can be supplemented by, or incorporated in, special purpose logic circuitry. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's device in response to requests received from the web browser. Also, a computer can interact with a user by sending text messages or other forms of message to a personal device, e.g., a smartphone that is running a messaging application, and receiving responsive messages from the user in return.
Data processing apparatus for implementing machine learning models can also include, for example, special-purpose hardware accelerator units for processing common and compute-intensive parts of machine learning training or production, i.e., inference, workloads.
Machine learning models can be implemented and deployed using a machine learning framework, e.g., a TensorFlow framework or a JAX framework.
Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface, a web browser, or an app through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data, e.g., an HTML page, to a user device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the device, which acts as a client. Data generated at the user device, e.g., a result of the user interaction, can be received at the server from the device.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially be claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings and recited in the claims in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.
This application claims the benefit of U.S. Provisional Application No. 63/614,478, filed Dec. 22, 2023, the contents of which are incorporated by reference herein.