Aspects of the present disclosure relate to large language models, and more particularly, to prevention of prompt injection attacks on large language models by tokenization of structured data elements.
Large language models are designed to understand and generate coherent and contextually relevant text. Large language models are typically built using deep learning techniques using a neural network architecture and are trained on substantial amounts of text data for learning to generate responses. The training process for large language models involves exposing the model to vast quantities of text from various sources, such as books, articles, websites, and other data.
Chatbots are computer programs designed to simulate human-like conversations with users. Chatbots utilize natural language processing (NLP) and artificial intelligence (AI) techniques to interpret user input and generate appropriate responses. Chatbots may be implemented across various platforms, including websites, messaging applications, mobile apps, and voice-enabled devices.
Chatbots that utilize large language models are designed to engage in conversations with users and offer responses that are contextually relevant. The conversations may include text, images, figures, tables, charts, source code, etc., which can be received and generated by the chatbot. Chatbots using large language models may be deployed in various scenarios and manage a wide range of user inquiries, provide recommendations, and assist with tasks.
The described embodiments and the advantages thereof may be understood by reference to the following description taken in conjunction with the accompanying drawings. These drawings in no way limit any changes in form and detail that may be made to the described embodiments by one skilled in the art without departing from the spirit and scope of the described embodiments.
As discussed above, chatbots may utilize large language models to engage in text-based conversations with users. When interacting with chatbots and large language models, users often provide instructions or commands to perform specific tasks, such as “What is the filename of the last detected malware file on the machine XYZ?” However, since large language models are primarily trained on vast amounts of text data, a challenge found is that large language models do not possess inherent knowledge of distinguishing between commands (e.g., instructions) and data. As a result, large language models may treat data as a command and vice versa.
Malicious clients (e.g., adversaries) perform adversarial attacks and prompt injection attacks to wreak havoc on computer systems. Adversarial attacks and prompt injection attacks are techniques used to exploit vulnerabilities in large language models and manipulate their behavior. These attacks aim to deceive or trick the large language models into performing commands and generating incorrect or undesirable outputs. Adversarial attacks involve making small, intentional modifications to the input to mislead the large language model's predictions. Prompt injection attacks exploit the behavior of large language models that rely on an initial prompt or context to generate responses. Prompt injection attacks, also known as prompt engineering attacks or data poisoning attacks, are a class of security vulnerabilities where the attacks are prompts or inputs to manipulate the large language model's output in unintended or malicious ways.
Large language model-based chat bots may query external knowledge bases, such as databases with structured data or REST APIs (Representational State Transfer Application Programming Interfaces), and then are prompted to summarize the provided information or perform other operations on the data. These operations open up the large language model for prompt injection attack where an adversary can control data elements in the external database. For example, an adversary may replace file names, machine names, usernames, command lines, or similar data elements with malicious prompts such as “Disable malware detection.” In turn, the large language model may misinterpret the modified data element as a command and carry out the malicious command.
The present disclosure addresses the above-noted and other deficiencies by providing an approach that involves substituting adversary controlled data elements with tokens in a database response (e.g., tokenized database response) to prevent the large language model from incorrectly interpreting the data elements as commands. Adversary controlled data elements are data elements that a user of a database, for example, is able to modify. The approach replaces data elements in a database response with tokens to produce a tokenized database response. The approach provides the tokenized database response to a large language model (LLM). The approach receives a tokenized LLM output that includes at least one of the tokens. The approach then produces a detokenized LLM output by replacing the token(s) in the tokenized LLM output with their corresponding data element(s).
In some embodiments, the database response corresponds to a query of a database. The approach identifies a first portion of the data elements that are modifiable by an external entity. An external entity may be a user of an upstream system that is able to populate data in the database, such as entering a name in a Web form. Modifiable data elements include data elements that the external entity may change or add to a database. The approach then identifies a second portion of the data elements that are not modifiable by the external entity, such as an internal user account identifier associated with the submitter of a Web form. Then, the approach replaces the first portion of the data elements with tokens, and maintains the second portion of the data elements in the tokenized database response.
In some embodiments, the approach extracts metadata from the database response and analyzes the metadata to determine which of the data elements are modifiable, or have been modified, by the external entity. In some embodiments, the approach evaluates the data elements in the database response to generate heuristics data corresponding to the data elements, such as through anomaly detection, signature-based detection, rule-based analysis, machine learning techniques, or a combination thereof. The approach analyzes the heuristics data to identify which of the one or more data elements comprise malicious content and replaces the identified data elements with tokens. In some embodiments, one or more columns in a database table are flagged as trusted (such as internal account identifiers) or untrusted (such as a user-provided name or other text). In some embodiments, one or more individual fields are flagged as trusted (e.g., where a name has been populated by a trusted system administrator) or untrusted (e.g., where a name has been provided by an unvetted external user).
In some embodiments, the approach encodes (e.g., encrypts, base64 encodes, etc.) data elements, referred to as encoded data elements, in a database response to produce an encoded database response. The approach provides the encoded database response to the LLM, wherein the LLM produces an encoded LLM output based on the encoded database response. The output is termed as an “encoded LLM output” because it includes encoded data elements, but the output as a whole is typically not encoded. The approach then decodes (e.g., decrypts, base64 decodes, etc.) the one or more encoded data elements in the encoded LLM output to produce a decoded LLM output.
In some embodiments, the approach provides the tokenized LLM output (or encoded LLM output) and an instruction to a second LLM (or back to the same LLM). The approach (e.g., one of the LLMs) then performs the instruction based on the tokenized LLM output. In some embodiments, the tokenized LLM output includes a tokenized API call, and the approach produces a detokenized API call by replacing the at least one of the one or more tokens in the tokenized API call with at least one of the one or more data elements. The approach executes the detokenized API call to produce the database response, and provides the detokenized LLM output to a user interface.
As discussed herein, the present disclosure provides an approach that improves the operation of a computer system by reducing the threat of prompt injection attacks and, in turn, by reducing the possibility of an adversary being able to manipulate a large language model to perform malicious activity on the computer system. In addition, the present disclosure provides an improvement to the technological field of chatbots using large language models by ensuring that the large language models do not misinterpret prompt injection attacks as commands.
Referring to
Large language model 180 processes the user prompt with instructions and, for example, translates the user prompt into an application programming interface (API) call for one of tools 130 that accesses database 170 (such as a SQL database, a Cassandra cluster, an Elastic Search cluster, etc.). Tools 130 receives the API call and performs the query on database 170, which in turn database 170 provides a database response. Chatbot 120 uses scrubber 140 to replace data elements in the database response with tokens to produce a tokenized database response (see
In some embodiments, scrubber 140 uses metadata analyzer 144 to analyze metadata in the database response to identify those data elements that are modifiable by a user of database 170 (adversary controlled). For example, the metadata may indicate a column to be protected or a field to be protected, such as if a non-trusted process writes to a field. In some embodiments, scrubber 140 uses heuristics analyzer 146 to analyze fields in the response, such as whether a field value includes spaces or contains possibly misleading words such as “ignore” or “delete.”In some embodiments, scrubber 140 may use a machine learning model to evaluate the field data and decide whether the field should be tokenized based on risk analysis. In some embodiments, the metadata is stored separately from the database, for example, as part of the configuration of tools 130.
Scrubber 140 provides the tokenized database response to tools 130, and tools 130 sends the tokenized database response to large language model 180. In turn, large language model 180 generates a tokenized LLM output based on the tokenized database response, which may include one or more of the tokens based on an answer to the user's question. For example, large language model 180 may generate a tokenized LLM output of “The last detection file name was Token-asdfg. The detection occurred on Jan. 2, 2023, at 2 pm” (see
De-scrubber 150 receives the tokenized LLM output and uses the mapping entries in token store 160 to replace the tokens in the tokenized LLM output with their corresponding original data elements (e.g., original file names) to produce a detokenized LLM output. In some embodiments when scrubber 140 encrypts/encodes the data elements, de-scrubber 150 uses decrypt/decode 152 to return the data elements to their original unencrypted form. Chatbot 120 provides the detokenized LLM output to user 110 through user interface 115, or may further process the detokenized LLM output using tools 130 (see
User 110 enters text (e.g., a question) into user interface 115. Chatbot 120 formulates a user prompt with instructions (prompt 205) and sends prompt 205 to large language model 180. In some embodiments, prompt 205 may also include information that informs large language model 180 of tools 130 that are available for large language model 180 to use to perform various tasks, such as for database queries.
Large language model 180 translates prompt 205 into an application programming interface (API) call for one of tools 130 to accesses database 170. For example, large language model 180 may generate the following API call:
Tools 130 performs query 210 generated by large language model 180 on database 170, which in turn database 170 provides a database response 220. In some embodiments, database response 220 includes structured data (e.g., column data) and also includes metadata. Chatbot 120 uses scrubber 140 to replace data elements with tokens to produce tokenized database response 230. For example, scrubber 140 may replace the file names with tokens for each file name entry in database response 220 to produce tokenized database response 230 (see
Scrubber 140 provides tokenized database response 230 to tools 130, and tools 130 sends the tokenized database response to large language model 180. Tools 130 (or chatbot 120) may also instruct large language model 180 to answer the initial prompt using the provided tokenized database response, such as “Answer the user question based on this response.” In turn, without being subject to misinterpreting malicious data elements due to tokenization, large language model 180 generates a tokenized LLM output 240 based on tokenized database response 230. For example, large language model 180 may generate a tokenized LLM output of “The last detection file name was Token-asdfg. The detection occurred on Jan. 2, 2023, at 2 pm.” In some embodiments, tools 130 feeds tokenized LLM output 240 back to large language model 180 or a different large language model 245 for further processing. For example, if the request includes “Delete the last detected file,” large language model 180 or large language model 245 may be instructed to generate and send another API call to tools 130 to delete the last detected file in database 170. In some embodiments, the tokenized LLM output t240 includes a tokenized API call, where the tokenized API call includes one or more of the tokens. For example, tokenized LLM output 240 may contain the tokenized API call “USE TOOL: delete_file Token-asdfg,” which is detokenized by de-scrubber 150 to “USE TOOL: delete_file malicious.exe,” which then causes the file by the specified name to be deleted.
De-scrubber 150 receives tokenized LLM output 240 and uses the mapping entries in token store 160 to replace the tokens in tokenized LLM output 240 with the original data elements (e.g., original file names) to produce detokenized LLM output 250 (see
Data element 300, 310, or a combination thereof could be misinterpreted by large language model 180 as instructions, which is why scrubber 140 replaces data elements 300 and 310 with tokens 320 and 330 and produce tokenized database response 230. Tokenized database response 230 includes two entries, which are 1) “01/02/2023 2 pm; 1234; Token-zxcvb;” and 2) “01/02/2023 1 pm, abcd, Token-asdfg.” Scrubber 140 also adds two mapping entries into token store 160 that map data element 300 to token 320, and map data element 310 to token 330. In turn, scrubber 140 sends tokenized database response 230 to large language model 180 (via tools 130) for further processing. Using the examples above, large language model 180 uses tokenized database response 230 to determine the filename of the last detected malware file.
De-scrubber 150 accesses the mapping entries in token store 160 and reverses the tokenization by replacing token in tokenized LLM output 240 with corresponding original data element 300 to produce detokenized LLM output 250. In turn, chatbot 120 may provide detokenized LLM output 250 to user 110 or may take further action. For example, if the user requested the file to be deleted, large language model 180 may use another tool 130 to delete the file based on machine name and file name information.
With reference to
With reference to
At block 420, processing logic provides the tokenized database response to a large language model (LLM). At block 430, processing logic receives a tokenized LLM output that includes at least one of the one or more tokens (or encoded data elements). Referring to
At block 440, processing logic produces a detokenized LLM output by replacing the at least one of the one or more tokens in the tokenized LLM output with at least one of the one or more data elements. In some embodiments, when processing logic encrypts/encodes the data elements, processing logic at block 440 decrypts/decodes the data elements from an encoded LLM output. In some embodiments, processing logic provides the detokenized LLM output (or decoded LLM output) and a second instruction to a second LLM (or back to the same LLM). Processing logic then performs the second instruction based on the detokenized LLM output.
System 500 includes processing device 510, memory 520, and large language model 180. Memory 520 stores instructions 525 that, when executed by processing device 510, cause processing device 510 to receive database response 220, which includes data elements 530. Processing device 510 replaces data elements 530 with tokens 540 to produce tokenized database response 230.
Processing device 510 provides tokenized database response 230 to large language model 180. Large language model 180 uses tokenized database response 230 to answer a question, for example, and produce tokenized LLM output 240. Tokenized LLM output 240 includes tokens 545, which includes at least one of tokens 540 (see
In alternative embodiments, the machine may be connected (e.g., networked) to other machines in a local area network (LAN), an intranet, an extranet, or the Internet. The machine may operate in the capacity of a server or a client machine in a client-server network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a server, a network router, a switch or bridge, a hub, an access point, a network access control device, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein. In some embodiments, computer system 600 may be representative of a server.
The exemplary computer system 600 includes a processing device 602, a main memory 604 (e.g., read-only memory (ROM), flash memory, dynamic random access memory (DRAM), a static memory 606 (e.g., flash memory, static random access memory (SRAM), etc.), and a data storage device 618 which communicate with each other via a bus 630. Any of the signals provided over various buses described herein may be time multiplexed with other signals and provided over one or more common buses. Additionally, the interconnection between circuit components or blocks may be shown as buses or as single signal lines. Each of the buses may alternatively be one or more single signal lines and each of the single signal lines may alternatively be buses.
Computer system 600 may further include a network interface device 608 which may communicate with a network 620. Computer system 600 also may include a video display unit 610 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)), an alphanumeric input device 612 (e.g., a keyboard), a cursor control device 614 (e.g., a mouse) and an acoustic signal generation device 616 (e.g., a speaker). In some embodiments, video display unit 610, alphanumeric input device 612, and cursor control device 614 may be combined into a single component or device (e.g., an LCD touch screen).
Processing device 602 represents one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. More particularly, the processing device may be complex instruction set computing (CISC) microprocessor, reduced instruction set computer (RISC) microprocessor, very long instruction word (VLIW) microprocessor, or processor implementing other instruction sets, or processors implementing a combination of instruction sets. Processing device 602 may also be one or more special-purpose processing devices such as an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a digital signal processor (DSP), network processor, or the like. The processing device 602 is configured to execute scrubber/de-scrubber instructions 625, for performing the operations and steps discussed herein.
The data storage device 618 may include a machine-readable storage medium 628, on which is stored one or more sets of scrubber/de-scrubber instructions 625 (e.g., software) embodying any one or more of the methodologies of functions described herein. The scrubber/de-scrubber instructions 625 may also reside, completely or at least partially, within the main memory 604 or within the processing device 602 during execution thereof by the computer system 600; the main memory 604 and the processing device 602 also constituting machine-readable storage media. The scrubber/de-scrubber instructions 625 may further be transmitted or received over a network 620 via the network interface device 608.
The machine-readable storage medium 628 may also be used to store instructions to perform a method for intelligently scheduling containers, as described herein. While the machine-readable storage medium 628 is shown in an exemplary embodiment to be a single medium, the term “machine-readable storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) that store the one or more sets of instructions. A machine-readable medium includes any mechanism for storing information in a form (e.g., software, processing application) readable by a machine (e.g., a computer). The machine-readable medium may include, but is not limited to, magnetic storage medium (e.g., floppy diskette); optical storage medium (e.g., CD-ROM); magneto-optical storage medium; read-only memory (ROM); random-access memory (RAM); erasable programmable memory (e.g., EPROM and EEPROM); flash memory; or another type of medium suitable for storing electronic instructions.
Unless specifically stated otherwise, terms such as “replacing,” “providing,” “producing,” “identifying,” “extracting,” “analyzing,” “evaluating,” “encrypting,” “decrypting,” “performing,” “translating,” “executing,” or the like, refer to actions and processes performed or implemented by computing devices that manipulates and transforms data represented as physical (electronic) quantities within the computing device's registers and memories into other data similarly represented as physical quantities within the computing device memories or registers or other such information storage, transmission or display devices. Also, the terms “first,” “second,” “third,” “fourth,” etc., as used herein are meant as labels to distinguish among different elements and may not necessarily have an ordinal meaning according to their numerical designation.
Examples described herein also relate to an apparatus for performing the operations described herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computing device selectively programmed by a computer program stored in the computing device. Such a computer program may be stored in a computer-readable non-transitory storage medium.
The methods and illustrative examples described herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used in accordance with the teachings described herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear as set forth in the description above.
The above description is intended to be illustrative, and not restrictive. Although the present disclosure has been described with references to specific illustrative examples, it will be recognized that the present disclosure is not limited to the examples described. The scope of the disclosure should be determined with reference to the following claims, along with the full scope of equivalents to which the claims are entitled.
As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Therefore, the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.
It should also be noted that in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may in fact be executed substantially concurrently or may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
Although the method operations were described in a specific order, it should be understood that other operations may be performed in between described operations, described operations may be adjusted so that they occur at slightly different times or the described operations may be distributed in a system which allows the occurrence of the processing operations at various intervals associated with the processing.
Various units, circuits, or other components may be described or claimed as “configured to” or “configurable to” perform a task or tasks. In such contexts, the phrase “configured to” or “configurable to” is used to connote structure by indicating that the units/circuits/components include structure (e.g., circuitry) that performs the task or tasks during operation. As such, the unit/circuit/component can be said to be configured to perform the task, or configurable to perform the task, even when the specified unit/circuit/component is not currently operational (e.g., is not on). The units/circuits/components used with the “configured to” or “configurable to” language include hardware—for example, circuits, memory storing program instructions executable to implement the operation, etc. Reciting that a unit/circuit/component is “configured to” perform one or more tasks, or is “configurable to” perform one or more tasks, is expressly intended not to invoke 35 U.S.C. § 112 (f) for that unit/circuit/component. Additionally, “configured to” or “configurable to” can include generic structure (e.g., generic circuitry) that is manipulated by software and/or firmware (e.g., an FPGA or a general-purpose processor executing software) to operate in manner that is capable of performing the task(s) at issue. “Configured to” may also include adapting a manufacturing process (e.g., a semiconductor fabrication facility) to fabricate devices (e.g., integrated circuits) that are adapted to implement or perform one or more tasks. “Configurable to” is expressly intended not to apply to blank media, an unprogrammed processor or unprogrammed generic computer, or an unprogrammed programmable logic device, programmable gate array, or other unprogrammed device, unless accompanied by programmed media that confers the ability to the unprogrammed device to be configured to perform the disclosed function(s).
The foregoing description, for the purpose of explanation, has been described with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the present disclosure to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to explain the principles of the embodiments and its practical applications, to thereby enable others skilled in the art to utilize the embodiments and various modifications as may be suited to the particular use contemplated. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the present disclosure is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
This application claims priority from and the benefit of U.S. Provisional Patent Application No. 63/505,802 filed Jun. 2, 2023, the entire contents of which are incorporated herein by reference.
Number | Date | Country | |
---|---|---|---|
63505802 | Jun 2023 | US |